CN116894046A - Data analysis method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116894046A
Authority
CN
China
Prior art keywords
data analysis
data
data structure
vector
feature vector
Prior art date
Legal status
Pending
Application number
CN202310855658.XA
Other languages
Chinese (zh)
Inventor
阳成文
董慧珂
周斌
王志伟
Current Assignee
Shanghai Shizhuang Information Technology Co ltd
Original Assignee
Shanghai Shizhuang Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Shizhuang Information Technology Co ltd
Priority to CN202310855658.XA
Publication of CN116894046A
Pending (current legal status)


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
              • G06F16/24 Querying
                • G06F16/242 Query formulation
                  • G06F16/2433 Query languages
              • G06F16/22 Indexing; Data structures therefor; Storage structures
                • G06F16/2282 Tablespace storage structures; Management thereof
          • G06F40/00 Handling natural language data
            • G06F40/30 Semantic analysis
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
                  • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
                • G06N3/0499 Feedforward networks
              • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data analysis method and device, an electronic device, and a storage medium. The method comprises: acquiring data structure information and demand content information, where the data structure information comprises preset table structure keywords and the demand content information represents the user's data analysis demand; performing text feature processing on the data structure information and the demand content information, respectively, using an encoder in a conversion analysis model to obtain a data structure feature vector and a demand content feature vector; performing data analysis on the data structure feature vector and the demand content feature vector using a decoder in the conversion analysis model to obtain a data analysis statement; and obtaining a data analysis result according to the data analysis statement. Acquiring the table structure keywords in the data structure information gives the data analysis a direction: the analysis process attends more closely to the content of the data structure information, which improves how well the data analysis result matches the data analysis demand and thus improves the accuracy of the data analysis.

Description

Data analysis method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data analysis method, a data analysis device, an electronic device, and a storage medium.
Background
With the development of Internet technology, more and more enterprises recognize the importance of data, and data analysis is applied across businesses in every industry. Enterprises' analysis and query demands on their data are diverse, scattered, and often one-off, so most data analysis is handled manually and requires cooperation between personnel in different roles, such as business operators and data analysis specialists, which makes data analysis inefficient.
Disclosure of Invention
The embodiments of the application aim to provide a data analysis method and device, an electronic device, and a storage medium. Acquiring the table structure keywords in the data structure information gives the data analysis result a direction, which improves the accuracy of data analysis; obtaining the data analysis statement through the conversion analysis model turns data analysis into a tool, which improves data analysis efficiency.
In a first aspect, an embodiment of the present application provides a data analysis method, including: acquiring data structure information and demand content information, where the data structure information comprises preset table structure keywords and the demand content information represents the user's data analysis demand; performing text feature processing on the data structure information and the demand content information, respectively, using an encoder in a conversion analysis model to obtain a data structure feature vector and a demand content feature vector; performing data analysis on the data structure feature vector and the demand content feature vector using a decoder in the conversion analysis model to obtain a data analysis statement; and obtaining a data analysis result according to the data analysis statement.
In this implementation, acquiring the table structure keywords in the data structure information gives the data analysis a direction: the analysis process focuses more on the content of the data structure information, so the data analysis result better matches the data analysis demand and the accuracy of data analysis improves. In addition, obtaining the data analysis statement through the conversion analysis model turns data analysis into a tool and improves data analysis efficiency.
Optionally, in an embodiment of the application, the encoder includes a self-attention module and a feedforward neural network. Performing text feature processing on the data structure information and the demand content information, respectively, using the encoder in the conversion analysis model to obtain the data structure feature vector and the demand content feature vector includes: obtaining, through the self-attention module, the self-attention weight corresponding to each data structure feature vector; and obtaining the data structure feature vector and the demand content feature vector with a feature extraction function in the feedforward neural network according to the self-attention weights corresponding to the data structure feature vectors.
In this implementation, the encoder includes a self-attention module. The self-attention mechanism lets the encoder capture long-distance dependencies between different positions in the input sequence and dynamically determine, according to the data structure information and the demand content information, which positions matter most to the current position. More useful feature information is thus extracted, improving the representation capability and performance of the conversion analysis model and the accuracy of data analysis.
Optionally, in an embodiment of the present application, obtaining, through the self-attention module, the self-attention weight corresponding to each data structure feature vector includes: obtaining calculation vectors based on the data structure information, the calculation vectors comprising a query vector and a key vector; and performing a similarity calculation between the query vector and the key vector to obtain the self-attention weight.
In this implementation, the self-attention weight is obtained from the similarity between the query vector and the key vector, which helps the conversion analysis model concentrate on the most relevant information and adapt to the specific context of different tasks, improving the accuracy of data analysis.
Optionally, in an embodiment of the present application, the calculation vectors further include a value vector. Obtaining the data structure feature vector and the demand content feature vector with the feature extraction function in the feedforward neural network according to the self-attention weights includes: computing a weighted sum of the value vectors using the self-attention weights to obtain a context vector; and performing text feature processing on the context vector with the feature extraction function in the feedforward neural network to obtain the data structure feature vector and the demand content feature vector.
In this implementation, the context vector and the demand content input vector serve as inputs to the feedforward neural network, whose feature extraction function extracts features from them, yielding a data structure feature vector and a demand content feature vector that carry rich semantic information.
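The weighted summation and feature extraction steps can be illustrated as follows. This is a minimal sketch under stated assumptions: the "feature extraction function" is modeled as the two-layer ReLU feedforward network used in standard Transformer encoders, which the patent does not specify, and all weights are illustrative numbers rather than trained parameters. Plain Python lists are used so the example has no dependencies.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def context_vectors(weights: Matrix, values: Matrix) -> Matrix:
    """Row i = sum_j weights[i][j] * values[j]: weighted sum of value vectors."""
    return matmul(weights, values)


def feed_forward(x: Matrix, w1: Matrix, b1: List[float],
                 w2: Matrix, b2: List[float]) -> Matrix:
    """Assumed position-wise FFN: ReLU(x W1 + b1) W2 + b2, applied to each row."""
    hidden = [[max(0.0, h + b) for h, b in zip(row, b1)] for row in matmul(x, w1)]
    return [[o + b for o, b in zip(row, b2)] for row in matmul(hidden, w2)]


weights = [[0.5, 0.5], [0.1, 0.9]]   # self-attention weights, each row sums to 1
values = [[2.0, 0.0], [0.0, 4.0]]    # value vectors of two input positions
ctx = context_vectors(weights, values)               # approx. [[1.0, 2.0], [0.2, 3.6]]
features = feed_forward(ctx, [[1.0, -1.0], [1.0, 1.0]], [0.0, 0.0],
                        [[1.0], [1.0]], [0.5])        # approx. [[4.5], [7.7]]
print(ctx, features)
```

Because the FFN acts on each row independently, every position's context vector is transformed with the same weights, which is what lets the encoder process all positions at once.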
Optionally, in an embodiment of the present application, the decoder includes an attention module, and performing data analysis on the data structure feature vector and the demand content feature vector using the decoder in the conversion analysis model to obtain the data analysis statement includes: obtaining, with the attention module, a time step output vector based on the data structure feature vector and the demand content feature vector; performing a similarity calculation between the time step output vector and the data structure feature vector to obtain a matching degree; and obtaining the data analysis statement based on the time step output vector and the matching degree.
In this implementation, the attention module lets the decoder dynamically focus on the part of the input sequence relevant to the current output. The time step output vectors generated at different time steps can therefore be adjusted according to different parts of the input sequence, which improves the expressive capacity and prediction accuracy of the model, keeps the data analysis statement output by the model focused on the content of the data structure information, and improves the accuracy of data analysis.
Optionally, in an embodiment of the present application, obtaining the data analysis result according to the data analysis statement includes: acquiring configuration information, which includes a connection string and is used to connect to the database corresponding to the data analysis statement; parsing the connection string and connecting to the database corresponding to the configuration information; and querying the database based on the data analysis statement to obtain the data analysis result.
In this implementation, the connection string is parsed, the database corresponding to the configuration information is connected, and the database is queried with the data analysis statement to obtain the data analysis result. Data analysis thereby becomes a tool, and data analysis efficiency improves.
Optionally, in an embodiment of the present application, the method further includes: obtaining the contribution degree of each model parameter in the conversion analysis model according to the difference between the data analysis statement and a preset statement; and adjusting the parameters of the conversion analysis model based on the contribution degree of each model parameter.
In this implementation, the contribution degree of each model parameter is obtained by comparing the data analysis statement with the preset statement, and the parameters of the conversion analysis model are adjusted accordingly. The model can then predict better on new data, which improves its performance and further improves the accuracy of data analysis.
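The matching-degree step in the decoder can be sketched as follows. This is a hypothetical illustration, not the disclosed decoder: it assumes cosine similarity as the similarity measure and assumes a pointer-style rule in which the best-matching table or column name is copied into the statement when its matching degree reaches a threshold, and otherwise the token predicted from the ordinary vocabulary is kept. The schema vectors, threshold, and function names are all invented for the example.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matching_degrees(step_output: Sequence[float],
                     schema_vectors: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Similarity between one decoder time-step output and each schema element."""
    return {name: cosine(step_output, vec) for name, vec in schema_vectors.items()}


def emit_token(step_output: Sequence[float], schema_vectors: Dict[str, Sequence[float]],
               vocab_token: str, threshold: float = 0.8) -> str:
    """Hypothetical rule: copy the best-matching schema name if it matches strongly
    enough, otherwise keep the token predicted from the vocabulary."""
    degrees = matching_degrees(step_output, schema_vectors)
    best = max(degrees, key=degrees.get)
    return best if degrees[best] >= threshold else vocab_token


schema = {"user": [1.0, 0.0, 0.0], "user_history": [0.6, 0.8, 0.0],
          "order_amount": [0.0, 0.0, 1.0]}
steps = [
    ([0.0, 0.5, -0.5], "SELECT"),   # weak match with every schema element
    ([0.0, 0.0, 0.9], "?"),         # points at order_amount
    ([0.0, 0.5, -0.5], "FROM"),
    ([0.95, 0.05, 0.0], "?"),       # points at user rather than user_history
]
tokens = [emit_token(out, schema, vocab) for out, vocab in steps]
print(" ".join(tokens))
```

The last step shows the intended effect of the matching degree: a time step output that is close to the `user` vector selects the user table even though `user_history` is also somewhat similar.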
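The configuration-driven query step can be sketched as below. The sketch assumes a URL-style connection string (`scheme://user:password@host:port/database`) parsed with the standard library's `urllib.parse.urlparse`; the `connection_string` configuration key and the helper names are hypothetical. Only the `sqlite` scheme is actually connected (through `sqlite3`) so the example runs without a database server; other schemes would need their own drivers. The password is deliberately not kept in the parsed result.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple
from urllib.parse import urlparse


@dataclass
class ConnectionInfo:
    scheme: str
    host: Optional[str]
    port: Optional[int]
    user: Optional[str]
    database: str


def parse_connection_string(conn_str: str) -> ConnectionInfo:
    """Parse a URL-style connection string (the format is an assumption)."""
    parts = urlparse(conn_str)
    return ConnectionInfo(
        scheme=parts.scheme,
        host=parts.hostname,
        port=parts.port,
        user=parts.username,
        database=parts.path.lstrip("/"),
    )


def run_analysis_statement(config: dict, statement: str) -> List[Tuple[Any, ...]]:
    """Connect to the database named in the configuration and run the statement."""
    info = parse_connection_string(config["connection_string"])
    if info.scheme != "sqlite":
        raise NotImplementedError(f"no driver wired up for {info.scheme!r}")
    conn = sqlite3.connect(info.database or ":memory:")
    try:
        return conn.execute(statement).fetchall()
    finally:
        conn.close()


info = parse_connection_string("mysql://analyst:secret@db.example.com:3306/shop")
print(info)
print(run_analysis_statement({"connection_string": "sqlite:///:memory:"}, "SELECT 1 + 1"))
```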
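In common practice, the "contribution degree" of each parameter to the difference between a generated statement and a reference statement is the gradient of a loss with respect to that parameter. The sketch below shows this for a single decoder time step under explicit assumptions that the patent does not state: token-level cross-entropy loss, a single linear output layer, and plain gradient descent. It is an illustration of the general idea, not the applicant's training procedure.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def loss_and_grad(w: Matrix, h: List[float], target: int) -> Tuple[float, Matrix]:
    """Cross-entropy of predicting token `target` from hidden state h with
    logits = W h. Returns the loss and dLoss/dW (each parameter's contribution)."""
    logits = [sum(wij * hj for wij, hj in zip(row, h)) for row in w]
    p = softmax(logits)
    loss = -math.log(p[target])
    # dLoss/dlogit_i = p_i - 1[i == target]; dlogit_i/dW_ij = h_j
    grad = [[(p[i] - (1.0 if i == target else 0.0)) * hj for hj in h]
            for i in range(len(w))]
    return loss, grad


def sgd_step(w: Matrix, grad: Matrix, lr: float = 0.5) -> Matrix:
    """Adjust every parameter against its gradient (its contribution to the loss)."""
    return [[wij - lr * gij for wij, gij in zip(row, grow)] for row, grow in zip(w, grad)]


w = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # 3-token vocabulary, 2-dim hidden state
h = [1.0, 0.5]                             # decoder hidden state at one time step
target = 2                                 # token index taken from the preset statement
history = []
for _ in range(5):
    loss, grad = loss_and_grad(w, h, target)
    history.append(loss)
    w = sgd_step(w, grad)
print([round(v, 3) for v in history])
```

The loss starts at ln 3 (a uniform guess over three tokens) and falls with each step as the parameters that most increased the error are adjusted the most.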
In a second aspect, an embodiment of the present application further provides a data analysis apparatus, including: an acquisition module, configured to acquire data structure information and demand content information, where the data structure information comprises preset table structure keywords and the demand content information represents the user's data analysis demand; a text feature processing module, configured to perform text feature processing on the data structure information and the demand content information, respectively, using an encoder in a conversion analysis model to obtain a data structure feature vector and a demand content feature vector; an analysis module, configured to perform data analysis on the data structure feature vector and the demand content feature vector using a decoder in the conversion analysis model to obtain a data analysis statement; and a result obtaining module, configured to obtain a data analysis result according to the data analysis statement.
In a third aspect, an embodiment of the present application further provides an electronic device, including a processor and a memory, the memory storing machine-readable instructions executable by the processor which, when executed by the processor, perform the method described above.
In a fourth aspect, embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method described above.
With the data analysis method and device, the electronic device, and the storage medium, acquiring the table structure keywords in the data structure information gives the data analysis a direction, so the data analysis result attends more closely to the content of the data structure information and the accuracy of data analysis improves; obtaining the data analysis statement through the conversion analysis model turns data analysis into a tool and improves data analysis efficiency. Furthermore, the attention mechanism dynamically determines, according to the data structure information and the demand content information, which positions matter most to the current position, so more useful information is extracted, the representation capability and performance of the conversion analysis model improve, and the accuracy of data analysis is further improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a data analysis method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a conversion analysis model according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a plurality of encoders and a plurality of decoders arranged in parallel according to an embodiment of the present application;
Fig. 4 is a schematic diagram of an analysis result output flow according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a data analysis execution flow according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a data analysis device according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the technical scheme of the present application will be described in detail below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present application, and thus are merely examples, and are not intended to limit the scope of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
In the description of embodiments of the present application, the technical terms "first," "second," and the like are used merely to distinguish between different objects and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated, a particular order or a primary or secondary relationship. In the description of the embodiments of the present application, the meaning of "plurality" is two or more unless otherwise specifically defined.
Data analysis mainly consists of determining the data analysis demand for a given business scenario, writing database statements according to that demand, and querying the corresponding database to obtain the data analysis result.
Data analysis in the prior art is illustrated below. The data analysis demand is usually determined by the business operators of each business scenario and is usually expressed as natural text, for example, "help me query the users who logged in within the last seven days and, among them, select the users whose order transaction amount is greater than 1000".
However, business operators are generally not familiar with how to operate the database, so the data analysis demand has to be handed to a professional data analysis technician, who constructs a data query statement according to the demand, determines from the demand which database to connect to, and queries the data analysis result in that database. This process requires cooperation between personnel in different roles, so labor costs are high and data analysis efficiency is low.
In addition, some prior-art semantic analysis tools perform semantic analysis on the data analysis demand, extract its key information, and then construct a data query statement from it. However, such tools cannot handle phrasings that are common but more complex: for example, when a data analysis demand is long or contains several requirements, the tool cannot determine which group of records the demand refers to, so the accuracy of data analysis is low.
The present application provides a data analysis method and device, an electronic device, and a storage medium in which acquiring the table structure keywords in the data structure information gives the data analysis a direction, improving its accuracy, and obtaining the data analysis statement through the conversion analysis model turns data analysis into a tool, improving data analysis efficiency.
Please refer to Fig. 1, which shows a flow chart of a data analysis method according to an embodiment of the present application. The data analysis method can be applied to an electronic device, which may be a terminal or a server; the terminal may be a smartphone, a tablet computer, a personal digital assistant (PDA), or the like, and the server may be an application server or a Web server. The data analysis method may include:
Step S110: acquire data structure information and demand content information, where the data structure information comprises preset table structure keywords and the demand content information characterizes the user's data analysis demand.
Step S120: perform text feature processing on the data structure information and the demand content information, respectively, using the encoder in the conversion analysis model to obtain a data structure feature vector and a demand content feature vector.
Step S130: perform data analysis on the data structure feature vector and the demand content feature vector using the decoder in the conversion analysis model to obtain a data analysis statement.
Step S140: obtain a data analysis result according to the data analysis statement.
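Steps S110 to S140 can be wired together as in the sketch below. The `ConversionAnalysisModel` interface with `encode` and `decode` methods is hypothetical, since the patent does not name these operations, and `StubModel` simply returns a fixed SQL statement so that the flow runs end to end without a trained network; an in-memory SQLite database stands in for the data resource.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, List, Protocol, Tuple


class ConversionAnalysisModel(Protocol):
    """Hypothetical interface; the patent does not name these methods."""

    def encode(self, text: str) -> List[float]: ...

    def decode(self, structure_vec: List[float], demand_vec: List[float]) -> str: ...


@dataclass
class StubModel:
    """Stand-in for a trained encoder-decoder: always emits one fixed statement."""

    statement: str

    def encode(self, text: str) -> List[float]:
        return [float(len(text))]  # placeholder "feature vector"

    def decode(self, structure_vec: List[float], demand_vec: List[float]) -> str:
        return self.statement


def analyze(model: ConversionAnalysisModel, structure_info: str, demand_info: str,
            conn: sqlite3.Connection) -> Tuple[str, List[Tuple[Any, ...]]]:
    # S110: structure_info (with table structure keywords) and demand_info are the inputs.
    # S120: the encoder processes each input separately.
    structure_vec = model.encode(structure_info)
    demand_vec = model.encode(demand_info)
    # S130: the decoder turns both feature vectors into a data analysis statement.
    statement = model.decode(structure_vec, demand_vec)
    # S140: executing the statement yields the data analysis result.
    rows = conn.execute(statement).fetchall()
    return statement, rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, city TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Shanghai"), (2, "Beijing")])
model = StubModel("SELECT user_id FROM users WHERE city = 'Shanghai'")
statement, rows = analyze(model, "table users: user_id INTEGER; city TEXT",
                          "users located in Shanghai", conn)
print(statement, rows)
```

Keeping the model behind a small interface means the stub can later be replaced by a real encoder-decoder without changing how the statement is executed.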
In step S110, the data structure information includes preset table structure keywords, and the demand content information characterizes the user's data analysis demand. The demand content information is determined by the business requirement and may take the form of natural text; the data structure information is preset according to the demand content information, so the two are separate pieces of data that are nonetheless related.
By way of example, the data structure information may include table names, column names, comments, field data types, field lengths, and/or primary keys. If the demand content information is "help me query the users who logged in within the last seven days and, among them, select the users whose order transaction amount is greater than 1000", it can be determined from it that the table structure keyword in the data structure information is, specifically, the user table.
If data analysis were performed on the demand content information alone, the query would have no direction, and a great deal of data would match the demand content information. In the example above, because the demand content information contains keywords such as "login" and "user", without the data structure information the data analysis might query the user table, the user history table, or the user registration table. In practice, however, the data the user actually needs is in the user table; querying the other related tables directly creates a large amount of wasted work for subsequent analysis and reduces both the efficiency and the accuracy of data analysis.
As one embodiment, the data structure information and the demand content information can be acquired through a front-end page input layer and a back-end service receiving layer. An interactive interface is provided to the user through a B/S (browser/server) architecture, for example a web UI in the browser through which the user uploads the data structure information and the demand content information. When the front end receives the information, it transmits the data to the server through the server's API.
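One plausible way to turn such data structure information into a text sequence that the encoder can consume is sketched below. The `Column` and `TableStructure` classes and the one-line serialization format are assumptions made for illustration; the patent lists the kinds of information involved but not how they are encoded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    data_type: str
    length: Optional[int] = None
    comment: str = ""
    primary_key: bool = False


@dataclass
class TableStructure:
    table_name: str                      # the table structure keyword, e.g. "user"
    columns: List[Column] = field(default_factory=list)
    comment: str = ""


def serialize(table: TableStructure) -> str:
    """Flatten a table structure into one line of text (hypothetical format)."""
    parts = []
    for col in table.columns:
        type_str = f"{col.data_type}({col.length})" if col.length else col.data_type
        flags = " PK" if col.primary_key else ""
        note = f" -- {col.comment}" if col.comment else ""
        parts.append(f"{col.name} {type_str}{flags}{note}")
    header = f"table {table.table_name}"
    if table.comment:
        header += f" ({table.comment})"
    return header + ": " + "; ".join(parts)


user_table = TableStructure(
    "user",
    [Column("user_id", "BIGINT", primary_key=True, comment="user id"),
     Column("last_login", "DATETIME", comment="last login time"),
     Column("order_amount", "DECIMAL", 12, "total order amount")],
    comment="user table",
)
print(serialize(user_table))
```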
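On the back-end service receiving layer, the uploaded information has to be decoded and validated before the conversion analysis model is invoked. The sketch below assumes the browser posts a JSON body with two hypothetical fields, `data_structure` and `demand_content`; the real field names and transport details are not given in the patent.

```python
import json
from typing import Tuple

REQUIRED_FIELDS = ("data_structure", "demand_content")  # hypothetical field names


def parse_analysis_request(body: bytes) -> Tuple[str, str]:
    """Decode the JSON body posted by the browser UI and return
    (data structure information, demand content information)."""
    payload = json.loads(body.decode("utf-8"))
    missing = [name for name in REQUIRED_FIELDS if not str(payload.get(name, "")).strip()]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return payload["data_structure"], payload["demand_content"]


body = json.dumps({
    "data_structure": "table user: user_id BIGINT PK; last_login DATETIME",
    "demand_content": "users who logged in within seven days",
}).encode("utf-8")
print(parse_analysis_request(body))
```

Rejecting a request with an empty field early keeps the model from being run on demand content that has no accompanying data structure information, which is the input that gives the analysis its direction.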
In step S120, after receiving the data structure information and the demand content information, the server invokes the conversion analysis model to analyze the data. The conversion analysis model comprises an encoder and a decoder and converts the data structure information and the demand content information input by the user into the corresponding data analysis statement. The encoder encodes the data structure information and the demand content information, respectively, into a data structure feature vector and a demand content feature vector, i.e. mathematical representations of the inputs. The decoder generates the data analysis statement from the data structure feature vector and the demand content feature vector.
Specifically, when the data structure information and the demand content information are converted into vectors, the data structure information can be used to determine the weights of the feature vectors so that its directing effect on the data analysis result is fully reflected; the data to be analyzed and the data analysis result then always conform to the features of the data structure information, which improves the accuracy of data analysis. In a specific embodiment, an attention mechanism is introduced by adding a self-attention module or an attention module to the conversion analysis model, so that the decoder dynamically focuses on the part of the input sequence relevant to the current output and generates an accurate output.
In step S130, the decoder in the conversion analysis model performs data analysis on the encoded data structure feature vector and demand content feature vector to obtain the data analysis statement. Because the sequence to be decoded includes the data structure feature vector, the decoder can adjust how much attention it pays to different parts of the input sequence, making the output data analysis statement more accurate.
In step S140, a connection is established between the data analysis statement and a data resource, and the data analysis result is obtained from that data resource. The data resource may be any kind of database, such as a relational database, a non-relational database, or a data warehouse. The data analysis result is the data information queried from the data resource.
In this implementation, acquiring the table structure keywords in the data structure information gives the data analysis a direction: the analysis process focuses more on the content of the data structure information, so the data analysis result better matches the data analysis demand and the accuracy of data analysis improves. In addition, obtaining the data analysis statement through the conversion analysis model turns data analysis into a tool and improves data analysis efficiency.
Optionally, in a further aspect of the above embodiment, the encoder includes a self-attention module and a feedforward neural network. Performing text feature processing on the data structure information and the demand content information, respectively, using the encoder in the conversion analysis model to obtain the data structure feature vector and the demand content feature vector includes: obtaining, through the self-attention module, the self-attention weight corresponding to each data structure feature vector; and obtaining the data structure feature vector and the demand content feature vector with a feature extraction function in the feedforward neural network according to the self-attention weights corresponding to the data structure feature vectors.
In the specific implementation: the encoder includes a self-attention module and a feedforward neural network. The self-attention module assigns a different weight to each element of the input sequence and captures the dependencies between different positions in the data structure information and the demand content information. Important information is thus captured better, and the data structure information gives direction to the data analysis result.
The steps by which the encoder obtains the feature vectors are described below by way of example. First, the data structure information and the demand content information are represented as input sequences, namely a data structure input sequence and a demand content input sequence. Specifically, word embedding or character embedding is performed on the input text to obtain a set of vector representations; for example, word2vec or GloVe can be used to generate these vectors.
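The embedding step can be illustrated with a simple lookup table. The patent names word2vec or GloVe; to avoid depending on pretrained vector files, this sketch substitutes deterministic random vectors, and the regex tokenizer is likewise an assumption (Chinese demand text would need a word segmenter or character-level embedding instead). A real system would load the pretrained vectors into the same lookup structure.

```python
import random
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Split on word characters (underscores kept, so column names stay whole).
    return re.findall(r"\w+", text.lower())


class EmbeddingTable:
    """Token-to-vector lookup. Random vectors stand in for pretrained
    word2vec/GloVe weights; the same token always maps to the same vector."""

    def __init__(self, dim: int, seed: int = 0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.vectors: Dict[str, List[float]] = {}

    def lookup(self, token: str) -> List[float]:
        if token not in self.vectors:
            self.vectors[token] = [self.rng.uniform(-1, 1) for _ in range(self.dim)]
        return self.vectors[token]

    def embed(self, text: str) -> List[List[float]]:
        return [self.lookup(tok) for tok in tokenize(text)]


table = EmbeddingTable(dim=4)
structure_seq = table.embed("table user: user_id, last_login")      # data structure input sequence
demand_seq = table.embed("users who logged in within seven days")   # demand content input sequence
print(len(structure_seq), len(demand_seq))  # 4 7
```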
For vectors in the input sequence, corresponding calculation vectors are obtained, which may include query vectors, key vectors, and/or value vectors. The calculation vector is used to capture information for each position in the input sequence.
And calculating the attention score of each data structure feature vector by using the calculation vector, wherein the attention score reflects the relevance of each query vector and all key vectors. And obtaining the self-attention weight corresponding to each data structure feature vector based on the attention score calculation.
After the self-attention weight is obtained, the contribution of each element in the input sequence to the current query is obtained, and the feature extraction function in the feedforward neural network is utilized to perform feature extraction according to the weights of different input sequences, so as to obtain the data structure feature vector and the required content feature vector.
The self-attention mechanism can compute attention between all positions in parallel, rather than processing the sequence step by step as a conventional recurrent neural network (RNN) does. This lets long sequences be encoded efficiently and speeds up both training and inference of the model.
In this implementation, because the encoder includes a self-attention module, the self-attention mechanism can capture long-range dependencies between different positions in the input sequence. It can also dynamically determine, based on the data structure information and the required content information, which positions matter most for the current position. More useful information is thereby extracted, which improves the representational capacity and performance of the conversion analysis model and, in turn, the accuracy of data analysis.
Optionally, in an embodiment of the present application, obtaining, by the self-attention module, the self-attention weight corresponding to each data structure feature vector includes two steps. First, calculation vectors are obtained based on the data structure information; the calculation vectors include a query vector and a key vector. Second, a similarity calculation is performed on the query vector and the key vector to obtain the self-attention weight.
In a specific implementation, the input sequence is passed through fully connected neural network layers and mapped to a set of feature vectors, namely the calculation vectors: key vectors (keys), query vectors (queries) and value vectors (values).
A similarity calculation between the query vector and the key vectors yields attention scores, which express the degree of association between the query and each key. The similarity may be computed as a dot product or with another measure. The attention scores are then normalized, for example with a softmax function, to obtain the self-attention weights, which express the importance of each element.
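The following is a minimal sketch of the query/key computation described above. It assumes scaled dot-product similarity, which is one of the measures the description allows, and uses plain Python lists instead of a tensor library. The projection matrices `w_q` and `w_k` stand in for the learned fully connected layers. The identity matrices in the usage lines are placeholders, not trained weights.

```python
import math


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(inputs, w_q, w_k):
    """Return an n x n matrix; row i holds the weights position i gives every position."""
    queries = [matvec(w_q, x) for x in inputs]  # query vectors
    keys = [matvec(w_k, x) for x in inputs]     # key vectors
    d_k = len(keys[0])
    weights = []
    for q in queries:
        # Similarity: dot product scaled by sqrt(d_k), then normalized with softmax.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights.append(softmax(scores))
    return weights


identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for learned projections
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = self_attention_weights(inputs, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row sums to 1. A position attends most to the positions whose keys point in the same direction as its query.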
In this implementation, the self-attention weights are obtained from the similarity between the query vectors and the key vectors. This helps the conversion analysis model concentrate on the most relevant information and adapt to the specific context of different tasks, which improves the accuracy of data analysis.
Optionally, in an embodiment of the present application, the calculation vectors further include a value vector. In this case, obtaining the data structure feature vector and the required content feature vector by using the feature extraction function in the feedforward neural network, according to the self-attention weight corresponding to the data structure feature vector, includes two steps. First, a weighted summation of the value vectors is performed using the self-attention weights to obtain a context vector. Second, text feature processing is performed on the context vector by using the feature extraction function in the feedforward neural network to obtain the data structure feature vector and the required content feature vector.
In a specific implementation, the value vectors are summed using the self-attention weights to obtain a context vector that reflects the contribution of each element of the input sequence to the current query. The context vector and the required content input vector are then taken as the inputs of the feedforward neural network. The feature extraction function of that network extracts features from the input sequence to obtain the data structure feature vector and the required content feature vector.
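The weighted summation that produces the context vector can be sketched as follows. The sketch assumes the self-attention weights are already available as one row of weights per query position, for example from a softmax over attention scores. The numbers in the usage lines are illustrative only.

```python
def context_vectors(weights, values):
    """Row i of the result is sum_j weights[i][j] * values[j]."""
    dim = len(values[0])
    return [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
        for row in weights
    ]


weights = [[0.5, 0.5, 0.0], [0.1, 0.2, 0.7]]   # one row of self-attention weights per query
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # value vectors, one per input position
print(context_vectors(weights, values))
```

The first query splits its attention evenly between the first two positions, so its context vector lies halfway between their value vectors.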
For example, for each input element, the feedforward neural network produces a new feature representation through a series of nonlinear transformations, such as several fully connected layers, each followed by an activation function. Stacking multiple hidden layers, with the output of each hidden layer serving as the input of the next, increases the expressive power of the model. The final outputs are a data structure feature vector and a required content feature vector that carry higher-level semantic information than the raw input sequence.
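The sketch below shows one hedged reading of the stacked nonlinear transformation described above. Each layer is a fully connected map followed by a ReLU activation, and the final layer is left linear. The layer sizes, the choice of ReLU and the hand-picked weights are assumptions for illustration; the description does not specify them.

```python
def relu(xs):
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in xs]


def linear(weight, bias, x):
    """Fully connected layer: weight is (out x in), bias has length out."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def feed_forward(x, layers):
    """Apply (weight, bias) layers in order, with ReLU between them; the last layer is linear."""
    for i, (weight, bias) in enumerate(layers):
        x = linear(weight, bias, x)
        if i < len(layers) - 1:
            x = relu(x)
    return x


# Hypothetical two-layer network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
print(feed_forward([2.0, 1.0], layers))  # [2.5]
```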
In this implementation, the context vector and the required content input vector are fed to the feedforward neural network. Its feature extraction function then produces a data structure feature vector and a required content feature vector that are rich in semantic information.
Optionally, in an embodiment of the present application, the decoder includes an attention module. In this case, parsing the data structure feature vector and the required content feature vector by using the decoder in the conversion analysis model to obtain a data analysis statement includes three steps. First, the attention module obtains a time step output vector based on the data structure feature vector and the required content feature vector. Second, a similarity calculation is performed on the time step output vector and the data structure feature vector to obtain a matching degree. Third, the data analysis statement is obtained based on the time step output vector and the matching degree.
In a specific implementation, the attention module in the decoder determines the importance of each input at the current decoder step from the input feature vectors and the decoder hidden state of the previous step; the output of that decoder step is the time step output vector. The data structure feature vector and the required content feature vector are input into the decoder to obtain the time step output vector. The time step output vector includes the decoder hidden state, which is an internal state that the decoder maintains while generating the output sequence. The decoder computes and updates this vector from the input sequence and the output elements generated so far, so that the next output element can be generated correctly.
A similarity calculation between the time step output vector and the data structure feature vector yields a matching degree. The matching degree indicates which parts of the encoder output are most relevant to the current output: the larger the matching degree, the stronger the relevance, and the smaller the matching degree, the weaker the relevance. In other words, the larger the matching degree, the greater the contribution of the corresponding part of the encoder output to the decoder output. The similarity calculation may use, among others, dot-product attention, scaled dot-product attention or bilinear attention.
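The three similarity options named above can be written as small scoring functions. In this sketch, `q` plays the role of the time step output vector and `k` a data structure feature vector. The bilinear matrix `w` would normally be learned. All function names are hypothetical.

```python
import math


def dot_score(q, k):
    """Dot-product attention score."""
    return sum(a * b for a, b in zip(q, k))


def scaled_dot_score(q, k):
    """Scaled dot-product score: divide by sqrt of the vector dimension."""
    return dot_score(q, k) / math.sqrt(len(k))


def bilinear_score(q, k, w):
    """Bilinear score q^T W k, with W supplied by the caller (learned in practice)."""
    wk = [sum(wij * kj for wij, kj in zip(row, k)) for row in w]
    return dot_score(q, wk)


def matching_degrees(step_output, feature_vectors, score=scaled_dot_score):
    """Matching degree of one time step output vector against every encoder vector."""
    return [score(step_output, h) for h in feature_vectors]


print(dot_score([1.0, 2.0], [3.0, 4.0]))                                  # 11.0
print(bilinear_score([1.0, 2.0], [3.0, 4.0], [[0.0, 1.0], [1.0, 0.0]]))  # 10.0
```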
The data analysis statement is obtained from the time step output vector and the matching degree, for example as follows. The matching degree and the time step output vector are combined by weighted summation to obtain a weighted vector. This weighted vector is a weighted representation of the encoder output and indicates the part of the time step output vector that the decoder should attend to. The weights are normalized so that the attention weights sum to 1, giving an attention weight vector. The time step output vector and the normalized attention weight vector are then fused into a fused feature vector, which is finally input into the decoder to obtain the data analysis statement. The fusion may be performed by vector concatenation or by weighted addition.
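The sketch below shows one common way to realize such a decoder step, Luong-style attention. It is offered as an assumption, not as the exact procedure above. The matching degrees are first normalized with softmax. The encoder vectors are then weighted and summed, and the result is fused with the time step output vector by concatenation. The function name is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_and_fuse(step_output, encoder_vectors):
    """One decoder step: score, normalize, weight the encoder vectors, then concatenate."""
    # Matching degree: dot product between the time step output and each encoder vector.
    matching = [sum(a * b for a, b in zip(step_output, h)) for h in encoder_vectors]
    # Normalize so that the attention weights sum to 1.
    weights = softmax(matching)
    # Weighted sum of encoder vectors: the part of the input the decoder focuses on.
    dim = len(encoder_vectors[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_vectors)) for d in range(dim)]
    # Fuse by concatenation (weighted addition is the other option mentioned above).
    return step_output + context, weights


fused, weights = attend_and_fuse([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(len(fused), [round(w, 3) for w in weights])
```

In a full decoder, the fused vector would typically be projected onto the output vocabulary to choose the next token of the data analysis statement. That projection is omitted here.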
In the implementation described above, the attention module in the decoder lets the decoder dynamically focus on the parts of the input sequence that are relevant to the current output. The time step output vectors generated at different time steps can thus be adjusted according to different parts of the input sequence. This improves the expressive capacity and prediction accuracy of the model, keeps the output data analysis statement focused on the content of the data structure information, and improves the accuracy of data analysis.
Optionally, in an embodiment of the present application, obtaining a data analysis result according to the data analysis statement includes three steps. First, configuration information is acquired; it includes a connection string and is used to connect to the database corresponding to the data analysis statement. Second, the connection string is parsed and the database corresponding to the configuration information is connected. Third, the database is queried based on the data analysis statement to obtain the data analysis result.
In a specific implementation, the configuration information is preset and stored. It includes a connection string, which may contain the database type, the server address, the database host name, the database user name and password, the database name, and the like.
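ODBC connection strings are usually written as semicolon-separated `KEY=value` pairs, with braces around values that themselves contain semicolons. The parser below is a simplified sketch of that convention; it does not handle escaped `}}` inside braces. The keywords, host name and credentials in the usage lines are examples, not values from the source.

```python
def parse_connection_string(conn_str):
    """Split an ODBC-style 'KEY=value;KEY2={value;with;semicolons}' string into a dict."""
    fields = {}
    i, n = 0, len(conn_str)
    while i < n:
        # Skip separators and whitespace between pairs.
        while i < n and conn_str[i] in "; ":
            i += 1
        if i >= n:
            break
        eq = conn_str.find("=", i)
        if eq == -1:
            raise ValueError(f"missing '=' after {conn_str[i:]!r}")
        key = conn_str[i:eq].strip().upper()  # keywords are treated case-insensitively
        j = eq + 1
        if j < n and conn_str[j] == "{":
            # Braced value: everything up to the closing brace, semicolons included.
            end = conn_str.find("}", j)
            if end == -1:
                raise ValueError("unterminated '{' in connection string")
            value = conn_str[j + 1:end]
            j = end + 1
        else:
            end = conn_str.find(";", j)
            if end == -1:
                end = n
            value = conn_str[j:end].strip()
            j = end
        fields[key] = value
        i = j
    return fields


cfg = parse_connection_string(
    "DRIVER={PostgreSQL Unicode};SERVER=db.example.com;PORT=5432;"
    "DATABASE=analytics;UID=report;PWD={p;ss}"
)
print(cfg["SERVER"], cfg["DATABASE"], cfg["PWD"])  # db.example.com analytics p;ss
```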
As one embodiment, the data call module may be developed in the computer program using a database driver. The driver, referred to here as an SQL driver, is part of ODBC (Open Database Connectivity), a standard interface through which applications access databases. The application connects to the database by calling the SQLDriverConnect function.
The connection string is passed to the SQLDriverConnect function as a parameter and carries information such as the database type, server address, user name and password. The ODBC driver parses the connection string and establishes a connection to the database server according to the information it contains.
The database is then queried based on the data analysis statement; such statements may, for example, query, insert, update or delete data. The ODBC driver converts these statements into a format that the database server understands and sends them to the server for execution. The server returns an execution result. The ODBC driver converts this result into a format that the application understands, namely the data analysis result, and returns it to the application.
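The sketch below illustrates the query-and-return step without an ODBC driver. It uses Python's built-in `sqlite3` module as a stand-in. A DB-API driver such as pyodbc, which accepts an ODBC connection string, would be used in a similar way. The `orders` table and its rows are hypothetical.

```python
import sqlite3


def run_analysis_statement(conn, statement, params=()):
    """Execute a generated data analysis statement and return (column names, rows)."""
    cur = conn.execute(statement, params)
    # description is None for statements that return no rows (e.g. INSERT).
    columns = [d[0] for d in cur.description] if cur.description else []
    rows = cur.fetchall()
    conn.commit()  # persist inserts, updates and deletes
    return columns, rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 5.0), (2, 10, 7.5), (3, 11, 3.0)],
)
cols, rows = run_analysis_statement(
    conn,
    "SELECT user_id, SUM(amount) AS total FROM orders GROUP BY user_id ORDER BY user_id",
)
print(cols, rows)  # ['user_id', 'total'] [(10, 12.5), (11, 3.0)]
```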
In an optional embodiment, if an exception occurs during data analysis, the data exception information returned by the database server is received. The data analysis step may then be performed again based on that information until the data analysis completes and the data analysis result is obtained.
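The re-execution loop can be sketched as shown below. It assumes a hypothetical `generate_statement(error)` callable that stands in for the conversion analysis model. This callable receives the previous database error message, or `None` on the first try. `sqlite3` again stands in for the real database. A maximum attempt count is added so that the loop cannot run forever.

```python
import sqlite3


def execute_with_retry(conn, generate_statement, max_attempts=3):
    """Run generated statements until one succeeds, feeding each error back to the generator."""
    error = None
    for attempt in range(1, max_attempts + 1):
        statement = generate_statement(error)
        try:
            return conn.execute(statement).fetchall(), attempt
        except sqlite3.Error as exc:
            error = str(exc)  # data exception information returned by the database
    raise RuntimeError(f"analysis failed after {max_attempts} attempts: {error}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL)")
conn.execute("INSERT INTO orders VALUES (4.0)")


def fake_generator(error):
    # Hypothetical stand-in for the model: misspells the table first, then corrects it.
    return "SELECT amount FROM ordrs" if error is None else "SELECT amount FROM orders"


rows, attempts = execute_with_retry(conn, fake_generator)
print(rows, attempts)  # [(4.0,)] 2
```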
In this implementation, the connection string is parsed, the database corresponding to the configuration information is connected, and the database is queried based on the data analysis statement to obtain the data analysis result. This realizes a data analysis tool and improves the efficiency of data analysis.
In an optional embodiment, after the data analysis result is obtained, the report data may also be visualized and displayed as charts and graphs.
Optionally, in an embodiment of the present application, the method further includes two steps. First, the contribution degree of each model parameter in the conversion analysis model is obtained according to the difference between the data analysis statement and a preset statement. Second, the parameters of the conversion analysis model are adjusted based on the contribution degree of each model parameter.
In a specific implementation, this embodiment uses the output of the model to adjust the model itself, so that the model predicts new data better. One way to implement this feedback is the backpropagation algorithm, which lets the model adjust its parameters automatically according to its outputs and thus makes the model output more accurate.
The backpropagation algorithm measures the gap between the model output and the actual data (the loss function) and determines the contribution of each parameter to that loss. For example, the preset statement is the expected output for preset data structure information and required content information. The difference between the data analysis statement and the preset statement is calculated to obtain the contribution degree of each model parameter in the conversion analysis model. Each parameter is then updated according to its contribution, so that the output of the conversion analysis model in the next iteration is closer to the actual data.
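The sketch below is a minimal illustration of how backpropagation turns an output gap into per-parameter contributions. It uses a single linear unit with squared-error loss instead of the full conversion analysis model. The gradient of the loss with respect to each parameter plays the role of its contribution degree. The learning rate and data are arbitrary assumptions.

```python
def loss_and_gradients(w, b, x, target):
    """Squared error 0.5 * (pred - target)^2 and its gradient with respect to each parameter."""
    pred = sum(wi * xi for wi, xi in zip(w, x)) + b
    err = pred - target
    loss = 0.5 * err ** 2
    grad_w = [err * xi for xi in x]  # dL/dw_i = err * x_i
    grad_b = err                     # dL/db = err
    return loss, grad_w, grad_b


def sgd_step(w, b, grad_w, grad_b, lr=0.1):
    """Move each parameter against its gradient, scaled by the learning rate."""
    return [wi - lr * gi for wi, gi in zip(w, grad_w)], b - lr * grad_b


w, b = [0.0, 0.0], 0.0
x, target = [1.0, 2.0], 3.0  # hypothetical input and desired ("preset") output
for _ in range(50):
    loss, gw, gb = loss_and_gradients(w, b, x, target)
    w, b = sgd_step(w, b, gw, gb)
print(loss_and_gradients(w, b, x, target)[0] < 1e-9)  # True
```

For a sequence model that generates statements, the loss would usually be token-level cross-entropy against the preset statement. The gradients would then be computed by an autodiff framework rather than by hand.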
Besides backpropagation, other methods can be used to feed results back into the model, for example genetic algorithms, Bayesian optimization or reinforcement learning. All of these help fine-tune the model by feeding its output back into it. Different methods may, however, perform differently on different tasks and data sets.
An evaluation function is another important component of this feedback loop. It measures the prediction accuracy of the model and shows which aspects need improvement. For example, model performance may be evaluated with metrics such as accuracy, precision and recall, and these metrics can guide the actions taken when fine-tuning the model.
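The metrics mentioned above can be computed as in this sketch. It assumes binary labels, for example 1 when a generated statement executed correctly. It also adds an exact-match rate over normalized statement text as one possible statement-level measure. Neither choice comes from the source.

```python
def classification_metrics(predicted, actual):
    """Accuracy, precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    accuracy = correct / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def _normalize(statement):
    """Lower-case and collapse whitespace so formatting differences are ignored."""
    return " ".join(statement.lower().split())


def exact_match(generated, reference):
    """Fraction of generated statements identical to their reference after normalization."""
    matches = sum(_normalize(g) == _normalize(r) for g, r in zip(generated, reference))
    return matches / len(reference)


print(classification_metrics([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))
print(exact_match(["SELECT  a FROM t", "select b from t"],
                  ["select a from t", "SELECT c FROM t"]))  # 0.5
```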
In this implementation, the contribution degree of each model parameter in the conversion analysis model is obtained from the data analysis statement, and the parameters are adjusted based on these contribution degrees. The model therefore predicts new data better, its performance improves, and the accuracy of data analysis improves further.
Please refer to FIG. 2, which is a schematic structural diagram of a conversion analysis model according to an embodiment of the present application. The encoder in the conversion analysis model includes a feedforward neural network and a self-attention module; the decoder includes a feedforward neural network, an attention module and a self-attention module.
Please refer to FIG. 3, which is a schematic diagram of a plurality of encoders and a plurality of decoders arranged in parallel according to an embodiment of the present application.
In an optional embodiment, the conversion analysis model includes a plurality of encoders and a plurality of decoders corresponding to them, arranged in parallel. The conversion analysis model is obtained by training on both supervised and unsupervised training data.
In a specific implementation, as shown in FIG. 3, the conversion analysis model may be a Meta-NLLB model, a neural network model in which multiple encoders and multiple decoders can be trained simultaneously. In this model, the input is processed by the plurality of encoders and the output by the plurality of decoders, and the results are then combined.
In the parallel arrangement, the encoders and decoders operate independently and can therefore run simultaneously. In practice, a parallel computing framework such as Spark may distribute the encoders and decoders to different computing nodes to improve operating efficiency.
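The sketch below is a single-machine stand-in for a cluster framework such as Spark. It runs independent toy encoders concurrently with Python's `concurrent.futures` and merges their outputs by element-wise mean. The toy encoders and the merge rule are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def make_encoder(scale):
    """Hypothetical toy encoder: scales every element of its input vector."""
    def encode(vector):
        return [scale * v for v in vector]
    return encode


def run_encoders_in_parallel(encoders, inputs, max_workers=4):
    """Submit each independent encoder to a worker pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(enc, x) for enc, x in zip(encoders, inputs)]
        return [f.result() for f in futures]


def combine(vectors):
    """Element-wise mean of the encoder outputs (one hypothetical way to merge them)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


encoders = [make_encoder(s) for s in (1.0, 2.0, 3.0)]
outputs = run_encoders_in_parallel(encoders, [[1.0, 1.0]] * 3)
print(outputs)           # [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(combine(outputs))  # [2.0, 2.0]
```

In standard CPython builds, threads do not speed up pure-Python arithmetic because of the global interpreter lock. An actual speedup would need encoders running in native tensor code, a process pool or a distributed framework.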
Please refer to FIG. 4, which is a flowchart of analysis result output according to an embodiment of the present application.
A business operator fills in the business analysis requirement on a form; the requirement includes data structure information and required content information. Features are extracted from the data structure information and the required content information, yielding data entity information and the data source type. The data analysis model then performs data analysis on the extracted data structure feature vector and required content feature vector to generate a data analysis statement. Next, the grammar of the data analysis statement is checked. If the statement contains a grammar error, it is regenerated; if it is correct, the data analysis result is output. Finally, the business operator organizes and summarizes the data analysis result into a visual analysis report.
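The grammar check in this flow could be approximated by compiling the generated statement against an empty copy of the schema without executing it. The sketch below does this with SQLite's `EXPLAIN` prefix. It assumes that the schema is available as DDL and that SQLite's SQL dialect is close enough to the target database's, which will not hold for every dialect.

```python
import sqlite3


def check_statement(schema_ddl, statement):
    """Compile, but do not run, a statement against an empty copy of the schema.

    Returns None if the statement compiles, otherwise the database error message.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)          # build empty tables from the DDL
        conn.execute("EXPLAIN " + statement)    # prepares the statement without executing it
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()


ddl = "CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL);"
print(check_statement(ddl, "SELECT user_id, SUM(amount) FROM orders GROUP BY user_id"))  # None
print(check_statement(ddl, "SELEC amount FROM orders"))
```

A `None` result means the statement compiled. Otherwise, the error text can be fed back when the statement is regenerated.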
Please refer to FIG. 5, which is a flowchart of data analysis execution according to an embodiment of the present application.
In an optional embodiment, a user enters data structure information and required content information through an information input page. After these are received, feature extraction is performed on them to obtain a data structure feature vector and a required content feature vector.
The conversion analysis model performs data analysis on the data structure feature vector and the required content feature vector to obtain a data analysis statement. Once obtained, the data analysis statement can also be used as training data to train the conversion analysis model in real time.
A data analysis result is obtained according to the data analysis statement. The result is used to determine whether the execution succeeded and whether any exception occurred, and the execution information is sent to business personnel for analysis and feedback.
Please refer to FIG. 6, which is a schematic structural diagram of a data analysis device according to an embodiment of the present application. An embodiment of the present application provides a data analysis device 200, including:
an obtaining module 210, configured to obtain data structure information and required content information, where the data structure information includes preset table structure keywords and the required content information represents the user's data analysis requirement;
a text feature processing module 220, configured to perform text feature processing on the data structure information and the required content information by using an encoder in a conversion analysis model, to obtain a data structure feature vector and a required content feature vector;
a parsing module 230, configured to parse the data structure feature vector and the required content feature vector by using a decoder in the conversion analysis model, to obtain a data analysis statement; and
a result obtaining module 240, configured to obtain a data analysis result according to the data analysis statement.
Optionally, in an embodiment of the present application, the encoder of the data analysis device includes a self-attention module and a feedforward neural network. The text feature processing module 220 is specifically configured to obtain, through the self-attention module, a self-attention weight corresponding to each data structure feature vector. It then obtains the data structure feature vector and the required content feature vector by using a feature extraction function in the feedforward neural network according to those self-attention weights.
Optionally, in an embodiment of the present application, the text feature processing module 220 of the data analysis device is further configured to obtain calculation vectors based on the data structure information, the calculation vectors including a query vector and a key vector. It is also configured to perform a similarity calculation on the query vector and the key vector to obtain the self-attention weight.
Optionally, in an embodiment of the present application, the calculation vectors of the data analysis device further include a value vector. The text feature processing module 220 is further configured to perform a weighted summation of the value vectors using the self-attention weights to obtain a context vector. It then performs text feature processing on the context vector by using the feature extraction function in the feedforward neural network to obtain the data structure feature vector and the required content feature vector.
Optionally, in an embodiment of the present application, the decoder of the data analysis device includes an attention module. The parsing module 230 is specifically configured to obtain, by using the attention module, a time step output vector based on the data structure feature vector and the required content feature vector. It then performs a similarity calculation on the time step output vector and the data structure feature vector to obtain a matching degree, and obtains the data analysis statement based on the time step output vector and the matching degree.
Optionally, in an embodiment of the present application, the result obtaining module 240 of the data analysis device is specifically configured to acquire configuration information. The configuration information includes a connection string and is used to connect to the database corresponding to the data analysis statement. The module then parses the connection string, connects to the database corresponding to the configuration information, and queries the database based on the data analysis statement to obtain a data analysis result.
Optionally, in an embodiment of the present application, the data analysis device further includes a model adjustment module. This module is configured to obtain the contribution degree of each model parameter in the conversion analysis model according to the difference between the data analysis statement and a preset statement. It then adjusts the parameters of the conversion analysis model based on the contribution degree of each model parameter.
It should be understood that the device corresponds to the data analysis method embodiment above and can perform the steps of that method embodiment. For its specific functions, refer to the description above, which is not repeated here to avoid redundancy. The device includes at least one software functional module that can be stored in a memory in the form of software or firmware, or embedded in the operating system (OS) of the device.
Please refer to FIG. 7, which is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 300 provided in an embodiment of the present application includes a processor 310 and a memory 320. The memory 320 stores machine-readable instructions executable by the processor 310, and these instructions, when executed by the processor 310, perform the method described above.
An embodiment of the present application further provides a storage medium on which a computer program is stored; when executed by a processor, the computer program performs the method described above.
The storage medium may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disc.
In the embodiments of the present application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality and operation of possible implementations of apparatuses, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, segment or portion of code that contains one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order shown in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functionality involved. It should further be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated into a single part, each module may exist alone, or two or more modules may be integrated into a single part.
The foregoing descriptions are merely optional implementations of the embodiments of the present application, and the protection scope of those embodiments is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the embodiments of the present application shall fall within that protection scope.

Claims (10)

1. A method of data analysis, comprising:
acquiring data structure information and required content information, wherein the data structure information comprises preset table structure keywords, and the required content information is information representing the user's data analysis requirement;
performing text feature processing on the data structure information and the required content information, respectively, by using an encoder in a conversion analysis model, to obtain a data structure feature vector and a required content feature vector;
performing data analysis on the data structure feature vector and the required content feature vector by using a decoder in the conversion analysis model, to obtain a data analysis statement; and
obtaining a data analysis result according to the data analysis statement.
2. The method according to claim 1, wherein the encoder comprises a self-attention module and a feedforward neural network, and the performing text feature processing on the data structure information and the required content information by using the encoder in the conversion analysis model to obtain the data structure feature vector and the required content feature vector comprises:
obtaining, through the self-attention module, a self-attention weight corresponding to each data structure feature vector; and
and obtaining the data structure feature vector and the required content feature vector by utilizing a feature extraction function in the feedforward neural network according to the self-attention weight corresponding to the data structure feature vector.
3. The method according to claim 2, wherein the obtaining, by the self-attention module, the self-attention weight corresponding to each of the data structure feature vectors includes:
obtaining a calculation vector based on the data structure information; the calculation vector comprises a query vector and a key vector;
and carrying out similarity calculation on the query vector and the key vector to obtain the self-attention weight.
4. The method according to claim 3, wherein the calculation vector further comprises a value vector, and the obtaining the data structure feature vector and the required content feature vector by using the feature extraction function in the feedforward neural network according to the self-attention weight corresponding to the data structure feature vector comprises:
performing weighted summation calculation on the value vector and the self-attention weight to obtain a context vector;
and performing text feature processing on the context vector by using a feature extraction function in the feedforward neural network to obtain the data structure feature vector and the required content feature vector.
5. The method according to claim 1, wherein the decoder comprises an attention module, and the performing data analysis on the data structure feature vector and the required content feature vector by using the decoder in the conversion analysis model to obtain the data analysis statement comprises:
obtaining, by using the attention module, a time step output vector based on the data structure feature vector and the required content feature vector;
performing similarity calculation on the time step output vector and the data structure feature vector to obtain a matching degree; and
obtaining the data analysis statement based on the time step output vector and the matching degree.
6. The method according to claim 1, wherein the obtaining a data analysis result according to the data analysis statement comprises:
acquiring configuration information, wherein the configuration information comprises a connection string and is used for connecting to a database corresponding to the data analysis statement;
parsing the connection string and connecting to the database corresponding to the configuration information; and
querying the database based on the data analysis statement to obtain the data analysis result.
7. The method according to any one of claims 1-6, further comprising:
obtaining a contribution degree of each model parameter in the conversion analysis model according to a difference between the data analysis statement and a preset statement; and
performing parameter adjustment on the conversion analysis model based on the contribution degree of each model parameter.
8. A data analysis device, comprising:
an acquisition module, configured to acquire data structure information and required content information, wherein the data structure information comprises preset table structure keywords, and the required content information represents the user's data analysis requirement;
a text feature processing module, configured to perform text feature processing on the data structure information and the required content information respectively, using an encoder in the conversion analysis model, to obtain a data structure feature vector and a required content feature vector;
an analysis module, configured to perform data analysis on the data structure feature vector and the required content feature vector, using a decoder in the conversion analysis model, to obtain a data analysis statement;
and a result obtaining module, configured to obtain a data analysis result according to the data analysis statement.
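The following sketch wires together the four modules of the device in claim 8. It assumes the encoder, decoder and query executor are supplied as plain callables. The class name, method names and the keyword-joining step in `acquire` are hypothetical, since the claim names the modules but not their interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

@dataclass
class DataAnalysisDevice:
    # Hypothetical interfaces for the conversion analysis model and the database.
    encode: Callable[[str], Vector]          # encoder of the conversion analysis model
    decode: Callable[[Vector, Vector], str]  # decoder of the conversion analysis model
    execute: Callable[[str], list]           # e.g. a database query for the statement

    def acquire(self, table_keywords: Sequence[str], requirement: str) -> Tuple[str, str]:
        # Acquisition module: data structure information and required content information.
        return " ".join(table_keywords), requirement

    def run(self, table_keywords: Sequence[str], requirement: str) -> Tuple[str, list]:
        structure_info, content_info = self.acquire(table_keywords, requirement)
        # Text feature processing module: encode both inputs separately.
        structure_vec = self.encode(structure_info)
        content_vec = self.encode(content_info)
        # Analysis module: decode the two feature vectors into a data analysis statement.
        statement = self.decode(structure_vec, content_vec)
        # Result obtaining module: turn the statement into a data analysis result.
        return statement, self.execute(statement)
```

Injecting the components as callables keeps each module separately replaceable, for instance swapping in a stub decoder when testing the database side.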
9. An electronic device, comprising: a processor and a memory, the memory storing machine-readable instructions executable by the processor, wherein the machine-readable instructions, when executed by the processor, perform the method of any one of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, performs the method according to any one of claims 1 to 7.
CN202310855658.XA 2023-07-12 2023-07-12 Data analysis method and device, electronic equipment and storage medium Pending CN116894046A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310855658.XA CN116894046A (en) 2023-07-12 2023-07-12 Data analysis method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116894046A 2023-10-17

Family

ID=88314489

Country Status (1)

Country Link
CN (1) CN116894046A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117155712A (en) * 2023-10-31 2023-12-01 北京晶未科技有限公司 Method for constructing data analysis tool for information security and electronic equipment
CN117155712B (en) * 2023-10-31 2024-02-06 北京晶未科技有限公司 Method for constructing data analysis tool for information security and electronic equipment

Similar Documents

Publication Publication Date Title
CN111046152B (en) Automatic FAQ question-answer pair construction method and device, computer equipment and storage medium
CN110704588A (en) Multi-round dialogue semantic analysis method and system based on long-term and short-term memory network
CN111061847A (en) Dialogue generation and corpus expansion method and device, computer equipment and storage medium
EP3901788A2 (en) Conversation-based recommending method, conversation-based recommending apparatus, and device
US20210182680A1 (en) Processing sequential interaction data
CN112307168B (en) Artificial intelligence-based inquiry session processing method and device and computer equipment
CN110162681B (en) Text recognition method, text processing method, text recognition device, text processing device, computer equipment and storage medium
US11080563B2 (en) System and method for enrichment of OCR-extracted data
WO2021044908A1 (en) Translation device, translation method, and program
US20160275444A1 (en) Procurement System
CN111680165B (en) Information matching method and device, readable storage medium and electronic equipment
CN114840671A (en) Dialogue generation method, model training method, device, equipment and medium
CN111310440A (en) Text error correction method, device and system
CN111639247A (en) Method, apparatus, device and computer-readable storage medium for evaluating quality of review
CN111241850B (en) Method and device for providing business model
CN116894046A (en) Data analysis method and device, electronic equipment and storage medium
US11947578B2 (en) Method for retrieving multi-turn dialogue, storage medium, and electronic device
US20170034111A1 (en) Method and Apparatus for Determining Key Social Information
CN111767833A (en) Model generation method and device, electronic equipment and storage medium
CN110472025B (en) Method, device, computer equipment and storage medium for processing session information
CN113223502B (en) Speech recognition system optimization method, device, equipment and readable storage medium
CN117591659A (en) Information processing method, device, equipment and medium based on ChatGLM operation and maintenance scene
CN113761845A (en) Text generation method and device, storage medium and electronic equipment
CN116703509A (en) Online shopping assistant construction method for live marketing commodity quality perception analysis
US20230128200A1 (en) Long-range modeling of source code files by syntax hierarchy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination