CN111897932A

CN111897932A - Query processing method and system for text big data

Info

Publication number: CN111897932A
Application number: CN202010707345.6A
Authority: CN
Inventors: 黄海阳
Original assignee: Shenzhen Dimension Statistics Consulting Co ltd
Current assignee: Shenzhen Dimension Statistics Consulting Co ltd
Priority date: 2020-07-21
Filing date: 2020-07-21
Publication date: 2020-11-06

Abstract

The invention provides a query processing method and a query processing system for text big data, and relates to the field of computer applications. The query processing method comprises the following steps: acquiring text big data semantics; configuring a query rule and a correspondence between the query rule and a query condition; fusing SQL syntax to generate a text big data semantic instruction parsing and query model; and recording the intermediate process variables and temporary state of the query and feeding back the result. The method standardizes the semantic model, eliminates problems such as redundancy and invalidity in the query process, improves the accuracy of the query process, and quickly finds the most effective query mode for obtaining the result. The invention also provides a query processing system for text big data, which comprises a specification module, a query module, an instruction module and a control module.

Description

Query processing method and system for text big data

Technical Field

The invention relates to the field of computer application, in particular to a method and a system for query processing of text big data.

Background

With the development of the internet, the data volume in each service system has become huge, especially in the telecommunications and internet industries. When such big data must be queried, query performance and efficiency are the primary concerns.

With server hardware and database configuration fixed, the efficiency of a traditional query mode keeps falling as the data volume grows: the response time of user queries gradually lengthens, and a query may even render the database unusable. A search engine could be introduced to query mass data, but because a search engine cannot effectively understand business data, using one for rapid analysis and query of business data in a specific industry is difficult, and the workload and implementation complexity are too high.

Disclosure of Invention

The invention aims to provide a query processing method of text big data, which can standardize a semantic model, eliminate the problems of redundancy, invalidity and the like in the query process, improve the accuracy of the query process and quickly find the most effective query mode to obtain a result.

Another object of the present invention is to provide a query processing system for text big data, which can apply the query processing method for text big data.

The embodiments of the invention are realized as follows:

In a first aspect, an embodiment of the present application provides a query processing method for text big data, which includes the following steps: obtaining text big data semantics; configuring a query rule and a correspondence between the query rule and a query condition; fusing SQL syntax to generate a text big data semantic instruction parsing and query model; and recording the intermediate process variables and temporary state of the query and feeding back the result.

In some embodiments of the present invention, obtaining the text big data semantics includes: creating one or more semantic morphological change models of the query objects involved in text big data query analysis; specifying the symmetry of each object through one or more levels of morphological change semantic description; and designing, for one or more query objects, one or more levels of interconvertibility between morphological changes so as to obtain a semantic description.

In some embodiments of the present invention, configuring the query rule and the correspondence between the query rule and the query condition includes setting a pre-query threshold and a query mark.

In some embodiments of the present invention, when the amount of queried data reaches the pre-query threshold, the queried data is fed back as the query result and the query result is marked with the query mark; the next time a query is made under the same query scheme, the query continues from the query mark.

In some embodiments of the present invention, configuring the text big data query rule means configuring the query granularities that correspond to different text big data attributes.

In some embodiments of the present invention, the method further includes constructing a query model through knowledge representation learning, relationship path reasoning, query-rule-based reasoning and SQL.

In some embodiments of the present invention, recording the intermediate process variables and temporary state of the query and feeding back the result includes selecting a query method model that meets the requirements of each link of the workflow, and constructing and combining the candidate query method models of each link.

In some embodiments of the present invention, the method further includes evaluating, through the query history, the confidence that the confidence system assigns to one or more instructions; constructing a query work chain by parsing local variables through SQL; calculating the result confidence of each workflow stage on the query work chain through a 95% confidence interval; and feeding back the result with the highest confidence.

In a second aspect, an embodiment of the present application provides a query processing system for text big data, which includes: a specification module, used for obtaining text big data semantics; a query module, used for configuring query rules and the correspondence between the query rules and query conditions; an instruction module, used for fusing SQL syntax to generate a text big data semantic instruction parsing and query model; and a control module, used for recording the intermediate process variables and temporary state of the query and feeding back the result.

In some embodiments of the invention, the system further comprises at least one memory for storing computer instructions and at least one processor in communication with the memory, wherein the at least one processor, when executing the computer instructions, causes the system to implement the specification module, the instruction module and the control module.

Compared with the prior art, the embodiment of the invention has at least the following advantages or beneficial effects:

Firstly, the method is convenient and fast: a new semantic model of the query instruction is designed and a query model specification is established, so that the most effective query mode can be found quickly to obtain the result.

Secondly, the method is accurate and reliable: by refining semantics, it standardizes the semantic model, eliminates problems such as redundancy and invalidity in the query process, and improves the accuracy of the query process.

Thirdly, by configuring the query rule and the correspondence between the query rule and the query condition, the corresponding query rule can be extracted according to the query condition and an optimized query scheme constructed. A query scheme obtained in this way suits different query conditions and ensures query efficiency.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

Fig. 1 is a schematic flowchart of a query processing method for text big data according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of a method for querying and processing text big data according to an embodiment of the present invention;

Fig. 3 is a schematic block diagram of a query processing system for text big data according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the individual features of the embodiments can be combined with one another without conflict.

Example 1

Referring to Fig. 1, which is a schematic flowchart of a query processing method for text big data according to an embodiment of the present invention, the method includes:

Step S100, acquiring text big data semantics;

Specifically, the content entities that are identified in and extracted from the text big data for instruction queries are obtained.

In some embodiments, the text big data semantic model is normalized so that different query objects can be converted into one another. For example, text big data semantics can be obtained through machine-learning text classification, whose basic process has three stages: labeling, in which a batch of texts is accurately classified by hand to serve as the training set; training, in which the computer mines rules that classify these texts effectively and summarizes them into a classifier; and classifying, in which the generated classifier is applied to the text set to be classified to obtain the classification result.
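
The label, train and classify stages described above can be sketched as follows. This is a minimal illustration only: the patent does not name a specific classifier, so a multinomial naive Bayes model with Laplace smoothing is assumed here, and the training texts, the labels `billing` and `network`, and the function names `train` and `classify` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(labeled_texts):
    """Training stage: count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of documents
    vocabulary = set()
    for text, label in labeled_texts:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary


def classify(model, text):
    """Classifying stage: return the label with the highest log-posterior."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            # Laplace smoothing keeps unseen words from zeroing the score.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Labeling stage: a small hand-labeled training set (hypothetical data).
training_set = [
    ("monthly bill payment overdue", "billing"),
    ("invoice amount and payment date", "billing"),
    ("network signal dropped again", "network"),
    ("slow network speed at night", "network"),
]
model = train(training_set)
predicted = classify(model, "payment for last invoice")
```

In a real deployment the hand-labeled set would be far larger, and any classifier with the same train and classify interface could be substituted.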

Step S110, configuring a query rule and a corresponding relation between the query rule and a query condition;

Specifically, the query rule configures the query granularity that the system applies to each kind of text.

In some embodiments, the text may include time, type, parameters and the like. For example, one query rule may configure different time granularities for different attribute parameters: the minimum data granularity of the attribute SP number may be configured as minutes, and the minimum data granularity of the attribute personal number as days.

The number and the time can serve as query conditions, so the query rule corresponding to the queried number is found through the correspondence between query conditions and query rules. For example, if the queried number is an SP number, the corresponding query rule is to query at minute granularity; if the queried number is a personal number, the corresponding query rule is to query at day granularity.
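
The correspondence between query conditions and query rules can be illustrated with a small lookup table. The sketch below assumes a dictionary-based rule table; the names `QUERY_RULES`, `rule_for` and `bucket`, and the keys `sp_number` and `personal_number`, are hypothetical and only mirror the SP number and personal number example in the text.

```python
from datetime import datetime

# Hypothetical rule table: number type -> minimum time granularity.
QUERY_RULES = {"sp_number": "minute", "personal_number": "day"}


def rule_for(condition):
    """Look up the query rule that corresponds to a query condition."""
    return QUERY_RULES[condition["number_type"]]


def bucket(timestamp, granularity):
    """Truncate a timestamp to the configured granularity."""
    if granularity == "minute":
        return timestamp.replace(second=0, microsecond=0)
    if granularity == "day":
        return timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown granularity: {granularity}")


ts = datetime(2020, 7, 21, 14, 35, 52)
sp_bucket = bucket(ts, rule_for({"number_type": "sp_number"}))
personal_bucket = bucket(ts, rule_for({"number_type": "personal_number"}))
```

Here an SP number query is bucketed to the minute and a personal number query to the day, so the same timestamp lands in buckets of different widths depending on the rule that the condition selects.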

Step S120, generating a text big data semantic instruction analysis and query model by fusing SQL grammar;

Specifically, text big data semantic instructions are parsed through SQL syntax and the underlying data is queried accordingly, so that query optimization is achieved.

In some embodiments, provided the required function is still realized, the query is optimized as follows: the number of accesses to the text big database is reduced; search parameters are used to reduce the number of table rows accessed and to minimize the result set, which reduces the network load; operations are processed separately where possible to improve the response speed of each one; when a data window uses SQL, the indexed column is placed first in the select list where possible; the algorithm structure is kept as simple as possible; wildcard queries such as SELECT * FROM T1 are avoided, and only the columns that are needed are selected, for example SELECT COL1, COL2 FROM T1; and the number of rows in the result set is limited where possible, for example SELECT TOP 300 COL1, COL2, COL3 FROM T1.
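
The column-selection and row-limiting advice above can be tried out with an in-memory database. The sketch assumes SQLite through Python's standard `sqlite3` module, which the patent does not mention; SQLite writes the row cap as `LIMIT`, whereas the `TOP` form in the text is SQL Server syntax. The table contents and the extra `PAYLOAD` column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T1 (COL1 INTEGER PRIMARY KEY, COL2 TEXT, COL3 TEXT, PAYLOAD TEXT)"
)
conn.executemany(
    "INSERT INTO T1 (COL1, COL2, COL3, PAYLOAD) VALUES (?, ?, ?, ?)",
    [(i, f"name-{i}", f"tag-{i % 5}", "x" * 100) for i in range(1000)],
)

# Name only the needed columns (the wide PAYLOAD column is never fetched)
# and cap the result set at 300 rows.
rows = conn.execute(
    "SELECT COL1, COL2, COL3 FROM T1 ORDER BY COL1 LIMIT 300"
).fetchall()
conn.close()
```

Compared with `SELECT * FROM T1`, this returns 300 narrow rows instead of 1000 wide ones, which is the reduction in result-set size and network load that the paragraph aims for.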

Step S130, recording the intermediate process variable and the temporary state of the query and feeding back the result.

Specifically, the specific query method models involved in the workflow are evaluated and selected, and on this basis the selection result is verified by using the association model and the query cases in the ontology library.

In some embodiments, to record the intermediate process variables and temporary state of the query, the statement may be parsed by SQL and a corresponding parse tree generated. The parser mainly validates and parses the statement against the grammar rules, for example checking whether a wrong keyword is used or whether the keywords appear in the correct order. Preprocessing then further checks whether the parse tree is legal according to the SQL rules, for example whether the data table or data column to be queried exists.
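
The two checks described above, grammar validation followed by checking that tables and columns exist, can be observed with SQLite as a stand-in engine. This is a sketch under that assumption; the helper `check_statement` is hypothetical, and it relies on SQLite's `EXPLAIN` prefix, which compiles a statement without executing it.

```python
import sqlite3


def check_statement(conn, sql):
    """Compile a statement without running it; return 'ok' or the error text."""
    try:
        # EXPLAIN makes SQLite parse and plan the statement without executing it.
        conn.execute("EXPLAIN " + sql)
        return "ok"
    except sqlite3.OperationalError as exc:
        return str(exc)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (COL1 INTEGER, COL2 TEXT)")

valid = check_statement(conn, "SELECT COL1 FROM T1")
bad_keyword = check_statement(conn, "SELEC COL1 FROM T1")     # grammar error
missing_table = check_statement(conn, "SELECT COL1 FROM T2")  # unknown table
missing_column = check_statement(conn, "SELECT COL9 FROM T1") # unknown column
conn.close()
```

The misspelled keyword is rejected by the grammar check, while the unknown table and column pass the grammar but fail the later name-resolution check, matching the two stages in the paragraph.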

Example 2

Referring to Fig. 2, which is a schematic flowchart of a method for querying and processing text big data according to an embodiment of the present invention, the method includes:

Step S200, creating one or more query object semantic morphological change models related to text big data query analysis;

Specifically, a semantic morphological change model of the query objects in the text big data is created, semantic analysis and identification are carried out, the content entities for instruction queries are extracted, and semantic description specifications at different levels, together with their conversion criteria, are provided.

In some embodiments, semantic analysis refers to learning and understanding the semantic content represented by a piece of text by using various methods; any understanding of language can be classified as semantic analysis. A piece of text is usually composed of words, sentences and paragraphs, and according to the language unit being understood, semantic analysis can be further decomposed into vocabulary-level, sentence-level and chapter-level semantic analysis. Vocabulary-level semantic analysis focuses on how to obtain or distinguish the semantics of words; sentence-level semantic analysis attempts to analyze the semantics expressed by an entire sentence; and chapter-level semantic analysis studies the internal structure of natural language text and the semantic relationships between text units, which may be clauses, sentences or paragraphs. The aim is to realize automatic semantic analysis of the various language units, including words, sentences and chapters, by establishing an effective model and system, so that the real semantics expressed by the whole text can be understood and a semantic morphological change model of the query objects in the text big data can be established.

Step S210, describing and standardizing the symmetry of each object through one or more levels of morphological change semantics;

Specifically, the symmetry of each object is described and standardized by the semantic description specifications and conversion criteria, at different levels, that user instructions must satisfy in order to be understood by the system.

Step S220, designing, for one or more query objects, one or more levels of interconvertibility between morphological changes so as to obtain a semantic description;

Specifically, the computation model corresponding to each of the one or more instruction query methods should have semantic description specifications and conversion criteria at different levels, and the semantic description is obtained from the interconvertibility of the morphological changes.

Step S230, setting a pre-query threshold and a query mark;

Specifically, a high-performance index is established for setting and looking up the pre-query threshold and the query mark.

In some embodiments, attention is also paid to how the query optimizer uses indexes; for example, using a parameter (local variable) in a WHERE clause can cause a full table scan. Because SQL resolves local variables only at runtime, while the optimizer cannot defer the selection of an access plan until runtime and must select it at compile time, the values of the variables are unknown when the access plan is built and cannot be used as input for index selection. When the amount of data returned by the current query reaches the pre-query threshold, the queried data is returned to the client as the query result.

Step S240, when the amount of queried data reaches the pre-query threshold, feeding back the queried data as the query result, and marking the query result with the query mark;

In some embodiments, the data is sorted in a certain order and queried in that order. When the query reaches data A and the pre-query threshold is reached, the query ends, and data A is marked with the query mark.

Step S250, continuing the query from the query mark the next time a query is made under the same query scheme;

In some embodiments, at the beginning of the next query under the same query scheme, the query mark is located and the query continues from the data item following data A. Similarly, when that query finishes, for example at data B, data B is marked with the query mark, and so on until all the data has been queried.
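
The pre-query threshold and query mark behave much like keyset pagination: each call returns at most a fixed number of rows and remembers where it stopped. The sketch below assumes that the mark is the primary key of the last row returned; the table `records`, the constant `PRE_QUERY_THRESHOLD` and the function `query_batch` are hypothetical names.

```python
import sqlite3

PRE_QUERY_THRESHOLD = 3  # assumed batch size, for illustration only


def query_batch(conn, mark):
    """Return up to PRE_QUERY_THRESHOLD rows after the mark, plus the new mark."""
    rows = conn.execute(
        "SELECT id, body FROM records WHERE id > ? ORDER BY id LIMIT ?",
        (mark, PRE_QUERY_THRESHOLD),
    ).fetchall()
    # The last row returned plays the role of "data A" and becomes the mark.
    new_mark = rows[-1][0] if rows else mark
    return rows, new_mark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)", [(i, f"text-{i}") for i in range(1, 8)]
)

mark = 0
batches = []
while True:
    rows, mark = query_batch(conn, mark)
    if not rows:
        break
    batches.append([row[0] for row in rows])
conn.close()
```

Because every call filters on `id > mark` against an ordered primary key, each resumed query starts directly after the previously marked row instead of rescanning rows that were already returned.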

Step S260, configuring the text big data query rule as the query granularities corresponding to different text big data attributes;

In some embodiments, given a corpus of standard question texts, the system needs to match the query entered by the user to the standard question that the user most likely intends. The query entered by the user is usually a short text, and the standard question text big data is a closed set. Each standard question has a fixed answer and title, together with multiple extended questions and keywords. Given the query, the model must find, among the standard questions, the answer most acceptable to the user. In this scenario, text matching computes the similarity between the query and the extended questions to represent the similarity between the query and the standard question, and outputs the top-k results after sorting as the matching result.
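
The matching step can be sketched with a simple token-overlap similarity. The patent does not specify the similarity measure, so Jaccard similarity over whitespace tokens is assumed here, and the title is scored alongside the extended questions as a further assumption. The question set and the names `jaccard` and `top_k` are hypothetical.

```python
def jaccard(a, b):
    """Token-set overlap between two short texts, in the range [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def top_k(query, standard_questions, k=2):
    """Rank standard questions by their best-matching title or extended question."""
    scored = []
    for title, extended in standard_questions.items():
        best = max(jaccard(query, q) for q in [title, *extended])
        scored.append((best, title))
    # Highest similarity first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:k]]


# Hypothetical closed set: standard question -> extended questions.
standard_questions = {
    "how to reset my password": ["forgot password", "change login password"],
    "how to check my bill": ["view monthly bill", "bill amount this month"],
    "how to cancel a service": ["unsubscribe service", "stop a subscription"],
}
matches = top_k("forgot my password", standard_questions)
```

A production system would more likely use embeddings or a trained matching model; the ranking and top-k cut-off would stay the same.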

Step S270, constructing a query model through knowledge representation learning relationship path reasoning, query rule-based reasoning and SQL;

In some embodiments, existing approaches related to knowledge representation learning and relationship path reasoning fall into two classes: methods based on embedded representations and methods based on paths. Most embedding-based methods use the knowledge graph only implicitly to guide the representation learning of user-item pairs; they do not use the association information between the user and the item, and they lack reasoning ability. Path-based methods, in turn, rely heavily on predefined meta-paths.

The method constructs the association between user-item pairs through the knowledge graph and thereby provides an explanation for user behavior. The KPRN model learns the representation of the association paths through an LSTM, takes into account the sequential dependencies between entities and relations, and has reasoning capability. For example, the query model may be constructed by using a distance model (Structured Embedding, SE), a single-layer neural network model (SLM), an energy model (Semantic Matching Energy, SME), a bilinear model, a neural tensor network model (NTN), a matrix factorization model, a translation model, and the like.
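
Of the model families listed above, the translation model is the simplest to illustrate: a triple (head, relation, tail) is considered plausible when the head embedding plus the relation embedding lands close to the tail embedding. The sketch below uses hand-picked two-dimensional vectors rather than learned embeddings, and the entity names, the relation name and the helper functions are all hypothetical.

```python
import math


def translation_score(head, relation, tail):
    """Distance ||h + r - t||; a smaller value means a more plausible triple."""
    return math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Hand-picked toy embeddings (assumption: real ones would be learned).
entities = {
    "user_a": (0.0, 0.0),
    "plan_x": (1.0, 0.0),
    "plan_y": (0.0, 2.0),
}
relations = {"subscribes_to": (1.0, 0.0)}


def best_tail(head, relation, candidates):
    """Pick the candidate tail entity with the smallest translation distance."""
    return min(
        candidates,
        key=lambda name: translation_score(
            entities[head], relations[relation], entities[name]
        ),
    )


answer = best_tail("user_a", "subscribes_to", ["plan_x", "plan_y"])
```

With these vectors, `user_a` translated by `subscribes_to` coincides with `plan_x`, so that candidate is selected.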

Step S280, selecting a query method model meeting the requirements of each link of the workflow;

In some embodiments, real-time detection is performed on information such as the status of available resources, the usage status of the query method models, and the progress of the query workflow, and the intermediate process variables and temporary state of the query are recorded.

Step S290, constructing and combining candidate query method models of each link;

In some embodiments, resource scheduling, workflow reconstruction, query method reselection, or emergency restart of the query, as required in various situations, is controlled to ensure smooth execution of the query process and to improve the accuracy of result feedback.

Step S300, evaluating, through the query history, the confidence that the confidence system assigns to one or more instructions;

In some embodiments, a strategy based on result-confidence ranking is implemented, in which the confidence that the confidence system assigns to one or more instructions is evaluated through the query history.

Step S310, analyzing local variables through SQL to construct a query working chain;

In some embodiments, the process in which SQL parses local variables to build the query work chain may follow the logical processing order of a query statement. FROM: the tables after FROM identify the data sources to be queried by the statement; a Cartesian product is computed, the ON filter is applied, and outer rows are added, which generates a virtual table vt1. WHERE generates virtual table vt2; GROUP BY generates virtual table vt3; HAVING filters the groups and generates vt4; SELECT generates vt5 and evaluates the expressions, DISTINCT and TOP; ORDER BY generates vt6. The total loss of the current prediction is calculated using a cross-entropy loss function: the negative logarithm of the predicted probability of the correct class is calculated for each sample, and the cross-entropy loss is the average of these values over the (X, Y) instances. Because the natural logarithm is an increasing function, it is intuitive to define the loss as the negative logarithm of the predicted probability of the correct class: if the predicted probability of the correct class is high, the loss is low; conversely, if it is low, the loss is high. The query work chain is constructed in this way to reduce the risk of overfitting.
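
The cross-entropy loss described above is the mean, over the samples, of the negative logarithm of the probability that the model assigned to the correct class. A minimal sketch follows; the function name `cross_entropy` and the probability values are made up for illustration.

```python
import math


def cross_entropy(predicted_probs, true_labels):
    """Mean negative log-probability assigned to the correct class."""
    losses = [
        -math.log(probs[label])
        for probs, label in zip(predicted_probs, true_labels)
    ]
    return sum(losses) / len(losses)


# Two samples each; the true classes are 0 and 1 respectively.
confident = cross_entropy([[0.9, 0.1], [0.2, 0.8]], [0, 1])
unsure = cross_entropy([[0.6, 0.4], [0.5, 0.5]], [0, 1])
```

The predictions that put more probability on the correct class yield the smaller loss, which is the behavior the paragraph describes.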

Step S320, calculating the result confidence of each workflow stage on the query work chain through a 95% confidence interval;

In some embodiments, confidence values of the different workflows and query method models are comprehensively calculated from aspects such as timeliness, effectiveness and matching, to characterize the identity and reliability of the workflows and query method models; meanwhile, the probability with which the workflows and query method models were used in past history is utilized to predict their confidence probability intervals. In this embodiment, the confidence level of the interval may be 95%.

Step S330, feeding back the result with the highest confidence;

In some embodiments, all possible query work chains are formed from the different workflows and the query method models of the candidate workflow links, so that each work chain points to the instruction result it represents. Confidence is then calculated link by link for each query work chain, which yields a global confidence for the whole work chain and represents the reliability of the result fed back by that chain. Finally, the different results are sorted by the confidence of their whole work chains, and the query result with the highest confidence is fed back to the user.
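
One way to turn historical usage into a 95% interval is the normal approximation for a proportion. The patent does not say which interval construction is used, so this is an assumption; the function name `confidence_interval_95` and the counts are hypothetical.

```python
import math


def confidence_interval_95(successes, trials):
    """Normal-approximation 95% interval for a success proportion."""
    if trials == 0:
        raise ValueError("at least one historical trial is required")
    p = successes / trials
    # 1.96 is the two-sided 95% quantile of the standard normal distribution.
    margin = 1.96 * math.sqrt(p * (1 - p) / trials)
    return max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical history: a query method model succeeded in 80 of 100 past uses.
low, high = confidence_interval_95(successes=80, trials=100)
```

For small histories or proportions near 0 or 1, an interval such as the Wilson score interval would be a more robust choice than this approximation.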
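
The link-by-link calculation and final ranking can be sketched as follows. The patent does not state how link confidences are combined into a global confidence; the product rule used here assumes independent links, and the chain names and confidence values are hypothetical.

```python
import math


def chain_confidence(link_confidences):
    """Global confidence of a work chain, assuming independent links."""
    return math.prod(link_confidences)


# Hypothetical candidate work chains: result -> per-link confidence values.
candidate_chains = {
    "result_a": [0.9, 0.8, 0.95],
    "result_b": [0.99, 0.6, 0.9],
    "result_c": [0.85, 0.85, 0.85],
}

# Sort results by the global confidence of their chains, highest first.
ranked = sorted(
    candidate_chains,
    key=lambda result: chain_confidence(candidate_chains[result]),
    reverse=True,
)
best_result = ranked[0]
```

Under the product rule a single weak link drags down the whole chain, which is why `result_b` ranks last despite having the strongest individual link.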

Example 3

Referring to Fig. 3, which is a schematic block diagram of a query processing system for text big data according to an embodiment of the present invention, the system includes: a specification module, used for acquiring text big data semantics; a query module, used for configuring query rules and the correspondence between the query rules and query conditions; an instruction module, used for fusing SQL syntax to generate a text big data semantic instruction parsing and query model; and a control module, used for recording the intermediate process variables and temporary state of the query and feeding back the result.

Also included are a memory, a processor, and a communication interface, which are electrically connected, directly or indirectly, to each other to enable transmission or interaction of data. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The memory may be used to store software programs and modules, and the processor may execute various functional applications and data processing by executing the software programs and modules stored in the memory. The communication interface may be used for communicating signaling or data with other node devices.

The memory may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.

The processor may be an integrated circuit chip having signal processing capabilities. The Processor may be a general-purpose Processor including a Central Processing Unit (CPU), a Network Processor (NP), etc.; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.

It will be appreciated that the illustration of Fig. 3 is merely exemplary, and that a query processing system for text big data may include more or fewer components than those illustrated in Fig. 3, or have a different configuration from that illustrated in Fig. 3. The components shown in Fig. 3 may be implemented in hardware, software, or a combination thereof.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.

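To sum up, the query processing method and system for text big data provided by the embodiments of the present application can standardize the semantic model, eliminate problems such as redundancy and invalidity in the query process, improve the accuracy of the query process, and quickly find the most effective query mode to obtain the result.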

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims

1. A query processing method for text big data is characterized by comprising the following steps:

acquiring text big data semantics;

configuring a query rule and a corresponding relation between the query rule and a query condition;

generating a text big data semantic instruction analysis and query model by fusing SQL grammar;

and recording the intermediate process variable and the temporary state of the query and feeding back the result.

2. The method as claimed in claim 1, wherein the obtaining text big data semantics includes:

creating one or more query object semantic morphological change models involved in text big data query analysis;

specifying the symmetry of each object through one or more levels of morphological change semantic description;

designing, for one or more query objects, one or more levels of interconvertibility between morphological changes so as to obtain a semantic description.

3. The method as claimed in claim 1, wherein the configuring the query rule and the corresponding relationship between the query rule and the query condition includes:

setting a pre-query threshold and a query mark.

4. The method for processing query of text big data according to claim 3, further comprising:

when the amount of queried data reaches the pre-query threshold, feeding back the queried data as the query result, and marking the query result with the query mark;

and continuing the query from the query mark the next time a query is made under the same query scheme.

5. The method for processing query of text big data according to claim 1, further comprising:

configuring the text big data query rule as the query granularities corresponding to different text big data attributes.

6. The method for processing query of text big data according to claim 1, further comprising:

constructing a query model through knowledge representation learning, relationship path reasoning, query-rule-based reasoning and SQL.

7. The method as claimed in claim 1, wherein the recording of the intermediate process variables and the temporary state of the query and the feedback of the result comprises:

selecting a query method model meeting the requirements of each link of the workflow;

and constructing and combining candidate query method models of all links.

8. The method for processing query of text big data according to claim 7, further comprising:

evaluating, through the query history, the confidence that the confidence system assigns to one or more instructions;

parsing local variables through SQL to construct a query work chain;

calculating the result confidence of each workflow stage on the query work chain through a 95% confidence interval;

and feeding back the result with the highest confidence.

9. A query processing system for text big data, comprising:

a specification module, used for acquiring text big data semantics;

a query module, used for configuring query rules and the correspondence between the query rules and query conditions;

an instruction module, used for fusing SQL syntax to generate a text big data semantic instruction parsing and query model;

and a control module, used for recording the intermediate process variables and temporary state of the query and feeding back the result.

10. The system for processing query for text big data according to claim 9, further comprising:

at least one memory for storing computer instructions;

at least one processor in communication with the memory, wherein the at least one processor, when executing the computer instructions, causes the system to implement the specification module, the instruction module and the control module.