CN114116771A - Voice control data analysis method and device, terminal equipment and storage medium - Google Patents

Voice control data analysis method and device, terminal equipment and storage medium

Info

Publication number
CN114116771A
Authority
CN
China
Prior art keywords
data
output result
information
voice
vocabulary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111437927.8A
Other languages
Chinese (zh)
Inventor
李保雷
孙旭东
宋占亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
If Technology Co Ltd
Original Assignee
If Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by If Technology Co Ltd filed Critical If Technology Co Ltd
Priority to CN202111437927.8A priority Critical patent/CN114116771A/en
Publication of CN114116771A publication Critical patent/CN114116771A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of voice control and provides a voice control data analysis method, a voice control data analysis device, a terminal device, and a storage medium. Voice information is converted into text information, and the text information is filtered to obtain a vocabulary set; the vector corresponding to each vocabulary in the vocabulary set is input into a preset model to obtain a first output result; the vectors corresponding to the corpora in a corpus are input into the preset model to obtain a plurality of second output results; a target output result with the highest similarity to the first output result is determined among the plurality of second output results, and the corpus corresponding to the target output result is encoded to obtain a combined sequence; the combined sequence is decoded using an abstract syntax tree to obtain an SQL statement; the SQL statement is sent to an SQL database to query data and obtain a data result, and the data result is sent to the front-end page of the data large screen so that the front-end page updates its display data according to the data result, thereby realizing intelligent interaction between the user and the data large screen.

Description

Voice control data analysis method and device, terminal equipment and storage medium
Technical Field
The application belongs to the technical field of voice control, and particularly relates to a voice control data analysis method and device, a terminal device and a storage medium.
Background
In the big data era, the phrase heard most often is "let the data speak". But data by themselves are cold numbers and cannot directly tell us which of them carry valuable information. Only when data are displayed and expressed through a suitable visualization tool does the impression conveyed to the user become more intuitive and the value of the data become easier to obtain.
The data large screen is an effective data visualization tool. It displays key business indicators on one or more large LED screens in a visual manner, so that business personnel can quickly and directly find important data within complex business data and decision makers can be assisted.
However, the visual large screens currently on the market have a single function, display a limited amount of data, and lack interactivity; once a programmer has finished development and gone online, the displayed content and interface of the data large screen are fixed. If the displayed data, format, or images are unsatisfactory, the only option is secondary development, testing, and re-deployment. A change or addition of requirements takes days, which is a common shortcoming of today's data large screens.
Disclosure of Invention
The embodiments of the present application provide a voice control data analysis method and device, a terminal device, and a storage medium, which can solve the problem that a data large screen lacks interactivity.
In a first aspect, an embodiment of the present application provides a voice control data analysis method, including:
converting voice information into text information, and filtering the text information to obtain a vocabulary set;
determining a vector of each vocabulary in the vocabulary set, and bringing the vector corresponding to each vocabulary into a preset model to obtain a first output result;
determining vectors of the corpora in the corpus, and bringing the vectors corresponding to the corpora into the preset model to obtain a plurality of second output results;
determining a target output result with the highest similarity to the first output result in the plurality of second output results, and coding the corpus corresponding to the target output result to obtain a combined sequence;
decoding the combined sequence by using an abstract syntax tree to obtain an SQL statement;
and sending the SQL statement to an SQL database to query data to obtain a data result, and sending the data result to a front-end page of a data large screen to enable the front-end page to update display data according to the data result.
In a possible implementation manner of the first aspect, the filtering the text information to obtain a vocabulary set includes:
removing non-text content in the text information;
performing word segmentation processing on the text information without the non-text content to obtain a plurality of words;
performing part-of-speech tagging on each vocabulary;
and removing stop words to obtain the vocabulary set.
In a possible implementation manner of the first aspect, the preset model is a word2vec model, a one-hot model, or a TF-IDF model.
In a possible implementation manner of the first aspect, the determining, in the plurality of second output results, a target output result with a highest similarity to the first output result, and encoding a corpus corresponding to the target output result to obtain a combined sequence includes:
calculating the similarity of each second output result and the first output result;
determining a second output result with the highest similarity to the first output result, and recording the second output result as the target output result;
acquiring a target corpus corresponding to the target output result from a corpus, and obtaining an expression mode of query, table and column in a database according to the target corpus;
and jointly coding the query, the table and the column to obtain the combined sequence.
In a possible implementation manner of the first aspect, the decoding the combined sequence by using an abstract syntax tree to obtain an SQL statement includes:
carrying out syntax analysis on the combined sequence to obtain a target abstract syntax tree;
converting the linguistic data on each node in the target abstract syntax tree according to a preset rule;
and obtaining the SQL statement according to the converted target abstract syntax tree.
In a possible implementation manner of the first aspect, before the converting the voice information into text information and filtering the text information to obtain a vocabulary set, the method further includes:
extracting feature information of the voice information, and identifying an identity corresponding to the voice information according to the feature information;
when the identity corresponding to the voice information is the speaker, executing the step of converting the voice information into text information and filtering the text information to obtain a vocabulary set;
and when the identity corresponding to the voice information is not the speaker, skipping the step of converting the voice information into text information and filtering the text information to obtain a vocabulary set.
In a possible implementation manner of the first aspect, before the extracting the feature information of the voice information and identifying the identity corresponding to the voice information according to the feature information, the method further includes:
when a preset instruction is received, acquiring voice information of a speaker;
and extracting the characteristic information of the voice information, and taking the characteristic information as standard characteristic information.
In a second aspect, an embodiment of the present application provides a voice control data analysis apparatus, including:
the text processing module is used for converting the voice information into text information and filtering the text information to obtain a vocabulary set;
the first calculation module is used for determining the vector of each vocabulary in the vocabulary set and bringing the vector corresponding to each vocabulary into a preset model to obtain a first output result;
the second calculation module is used for determining vectors of all corpora in the corpus and bringing the vectors corresponding to all the corpora into the preset model to obtain a plurality of second output results;
the encoding module is used for determining a target output result with the highest similarity to the first output result in the plurality of second output results, and encoding the corpus corresponding to the target output result to obtain a combined sequence;
the decoding module is used for decoding the combined sequence by using the abstract syntax tree to obtain an SQL statement;
and the sending module is used for sending the SQL statement to an SQL database to query data to obtain a data result, and sending the data result to a front-end page of a data large screen to update display data according to the data result.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method according to any one of the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the method of any one of the first aspect.
In a fifth aspect, the present application provides a computer program product, which when run on a terminal device, causes the terminal device to execute the method of any one of the above first aspects.
Compared with the prior art, the embodiment of the application has the advantages that:
when the data is used for large screen, the voice information of the user is obtained, the voice information is converted into text information, and the text information is filtered to obtain a vocabulary set. And determining the vector of each vocabulary in the vocabulary set, and bringing the vector corresponding to each vocabulary into a preset model to obtain a first output result. And determining the vector of each corpus in the corpus, and bringing the vector corresponding to each corpus into a preset model to obtain a plurality of second output results. And determining a target output result with the highest similarity to the first output result in the plurality of second output results, and coding the corpus corresponding to the target output result to obtain a combined sequence. And decoding the combined sequence by using the abstract syntax tree to obtain an SQL statement, sending the SQL statement to an SQL database to query data to obtain a data result, and sending the data result to a front-end page of a data large screen to update display data according to the data result. Therefore, the intelligent interaction between the user and the data large screen is realized, and the experience of the user is improved.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without inventive effort.
Fig. 1 is a schematic flowchart of a voice control data analysis method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a voice-controlled data analysis method according to another embodiment of the present application;
fig. 3 is a schematic structural diagram of a voice-controlled data analysis apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in the specification of this application and the appended claims, the term "if" may be interpreted contextually as "when" or "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Fig. 1 is a schematic flow chart illustrating a voice control data analysis method according to an embodiment of the present application. Referring to fig. 1, the voice control data analysis method includes steps S101 to S106.
Step S101, converting the voice information into text information, and filtering the text information to obtain a vocabulary set.
Specifically, when data analysis is performed using the data large screen, the user can issue a voice control instruction. The user's voice information can be acquired through a voice acquisition module (e.g., a microphone) of the data large screen. The voice information is input into a preset voice model, the features of the voice model and the voice information are compared, and a series of optimal templates matching the input voice are found according to a preset search and matching strategy. Then, according to the definition of the matched template, the recognition result can be obtained by table lookup, yielding the text information.
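As an illustration only, a minimal Python sketch of this speech-to-text step is given below. The embodiment describes template matching against a preset voice model and does not name a toolkit; the open-source speech_recognition package, the file name, and the language setting are assumptions standing in for that model.

    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile("voice_command.wav") as source:   # hypothetical recording from the microphone module
        audio = recognizer.record(source)

    # recognize_google is only a stand-in for the preset voice model described above
    text_information = recognizer.recognize_google(audio, language="zh-CN")
    print(text_information)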
After the voice information is converted into text information, the text information is filtered to obtain the vocabulary set. The filtering of the text information includes removing non-text content, word segmentation, part-of-speech tagging, stop-word removal, and the like.
Illustratively, step S101 may include steps S1011 to S1014.
In step S1011, the non-text content in the text information is removed.
Specifically, a small amount of simple non-text content can be removed directly with Python regular expressions (the re module), while complex non-text content can be removed with Beautiful Soup.
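A minimal sketch of this cleaning step is given below, assuming the raw text may contain HTML-like markup and URLs; the whitelist of retained characters is illustrative, not prescribed by the embodiment.

    import re
    from bs4 import BeautifulSoup

    def remove_non_text(raw: str) -> str:
        # Beautiful Soup strips complex non-text content such as markup
        text = BeautifulSoup(raw, "html.parser").get_text()
        # a Python regular expression removes simpler non-text fragments, e.g. URLs
        text = re.sub(r"https?://\S+", "", text)
        # keep Chinese characters, letters, digits and basic punctuation (assumed whitelist)
        text = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9，。？！,.?!\s]", "", text)
        return text.strip()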
In step S1012, the text information without the non-text content is subjected to word segmentation processing to obtain a plurality of words.
Specifically, after the non-text content is removed, word segmentation is performed on the text information. Chinese corpus data consist of short or long texts, such as sentences, article abstracts, paragraphs, or whole articles. The characters within sentences and paragraphs are continuous and carry meaning. When text mining and analysis are performed, the minimum unit of granularity for text processing is preferably a word, so word segmentation is needed at this point to split all the text into words.
In step S1013, part-of-speech tagging is performed for each vocabulary.
Specifically, after word segmentation, part-of-speech tagging is performed: each word is labeled with a word-class tag, such as adjective, verb, or noun. This allows more useful linguistic information to be incorporated when the text information is processed later. Common part-of-speech tagging methods can be divided into rule-based and statistics-based methods, the latter including maximum-entropy part-of-speech tagging, tagging based on the statistically most probable part of speech, and HMM-based part-of-speech tagging.
And step S1014, removing stop words to obtain a vocabulary set.
Specifically, stop words are removed last: these are words that contribute little to the text features, such as punctuation marks, modal particles, and person names. So in general text processing, the step that follows word segmentation is stop-word removal. For Chinese, however, the stop-word operation is not fixed; the stop-word dictionary is determined by the specific scenario. In sentiment analysis, for example, modal particles and exclamation marks should be retained because they contribute to expressing the degree of tone and emotional color.
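Steps S1012 to S1014 can be sketched with the jieba toolkit, an assumed choice since the embodiment does not prescribe a segmenter; the stop-word list is a toy example that would be replaced by a scenario-specific dictionary.

    import jieba.posseg as pseg

    STOP_WORDS = {"的", "了", "呢", "吗", "请"}      # toy stop-word list, scenario dependent

    def build_vocabulary_set(text: str):
        vocabulary_set = []
        for item in pseg.lcut(text):                 # segmentation plus part-of-speech tagging
            word, flag = item.word, item.flag
            if flag == "x" or word in STOP_WORDS:    # "x" is jieba's tag for punctuation
                continue
            vocabulary_set.append((word, flag))
        return vocabulary_set

    print(build_vocabulary_set("请查询今年华东地区的销售额"))

The result is a list of (word, part-of-speech) pairs, which serves as the vocabulary set used in the following steps.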
Step S102, determining the vector of each vocabulary in the vocabulary set, and bringing the vector corresponding to each vocabulary into a preset model to obtain a first output result.
Specifically, after the vocabulary set is obtained in step S101, the words in the vocabulary set need to be represented in a numerical form the computer can process (generally a vector). Each word in the vocabulary set is converted into a vector, and the vector corresponding to each word is then input into the preset model to obtain the first output result. The preset model can be a word2vec model, a one-hot model, or a TF-IDF model. The Word2Vec algorithm, for example, can express similarity and analogy relationships between different words well. There are also other word vector representations, such as Doc2Vec, WordRank, and fastText.
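A minimal sketch of obtaining word vectors with a word2vec model via gensim is given below; the toy corpus, the vector size, and the use of a simple average of word vectors as the "first output result" are assumptions for illustration, not details fixed by the embodiment.

    import numpy as np
    from gensim.models import Word2Vec

    # assumed toy training corpus of pre-segmented sentences; in practice the domain corpus is used
    sentences = [["查询", "销售额"], ["显示", "华东", "地区", "订单", "数量"]]
    model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)

    def first_output(vocabulary_set):
        # vocabulary_set holds (word, tag) pairs; average the per-word vectors as one possible output form
        vectors = [model.wv[word] for word, _ in vocabulary_set if word in model.wv]
        return np.mean(vectors, axis=0) if vectors else np.zeros(model.vector_size)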
Establishing the preset model involves first determining the model type and then training the model, including fine-tuning. Different models are chosen for different application requirements: traditional supervised and unsupervised machine learning models such as KNN, SVM, Naive Bayes, decision trees, GBDT, and K-means, or deep learning models such as CNN, RNN, LSTM, Seq2Seq, fastText, and TextCNN. The model is continuously optimized and adjusted during training to address over-fitting and under-fitting and to keep improving its generalization ability.
The problems of over-fitting and under-fitting can occur during model training.
Over-fitting: the model's learning ability is so strong that it also learns the features of noisy data, which reduces its generalization ability; it performs well on the training set but poorly on the test set.
Solutions: increase the amount of training data; add regularization terms such as L1 or L2 regularization; when feature selection is unreasonable, screen features manually or apply a feature selection algorithm; use Dropout.
Under-fitting: the model does not fit the data well because it is too simple.
Solutions: add further feature items; increase the complexity of the model, for example by adding more layers to a neural network or adding polynomial terms to a linear model to strengthen its generalization ability; reduce the regularization parameters (regularization is meant to prevent over-fitting, so when the model under-fits the regularization parameters need to be reduced).
Step S103, determining the vector of each corpus in the corpus, and bringing the vector corresponding to each corpus into a preset model to obtain a plurality of second output results.
Specifically, each corpus entry is converted into a vector using the method of step S102, and the vector corresponding to each corpus entry is then input into the preset model to obtain a plurality of second output results, i.e., each corpus entry corresponds to one second output result.
And step S104, determining a target output result with the highest similarity to the first output result in the plurality of second output results, and coding the corpus corresponding to the target output result to obtain a combined sequence.
For example, step S104 may specifically include step S1041 to step S1044.
Step S1041, calculating a similarity between each second output result and the first output result.
Step S1042, determining the second output result with the highest similarity to the first output result, and recording the second output result as the target output result.
Specifically, the similarity between the first output result and each second output result is calculated, the similarities are then ranked, and the second output result with the highest similarity to the first output result is finally selected as the target output result.
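A minimal sketch of steps S1041 and S1042 follows, assuming the output results are vectors and cosine similarity is used as the similarity measure; the embodiment does not fix a particular measure.

    import numpy as np

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def find_target_output(first_output, second_outputs):
        # one similarity score per second output result
        scores = [cosine_similarity(first_output, v) for v in second_outputs]
        target_index = int(np.argmax(scores))    # index of the result most similar to the first output
        return target_index, scores[target_index]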
Step S1043, obtaining a target corpus corresponding to the target output result from the corpus, and obtaining an expression manner of the query, the table, and the column in the database according to the target corpus.
Specifically, after the target output result is determined, the target corpus corresponding to it is obtained from the corpus; the target corpus is the corpus entry whose keywords correspond to the vocabulary. Once the target corpus is determined, the expressions of the query, the table, and the column in the database are obtained.
And step S1044, carrying out joint coding on the query, the table and the column to obtain a combined sequence.
Specifically, the query, the table, and the column are jointly encoded, and the implicit linking relationships between them are captured. The table (table name) and column (column name) are collectively referred to as the database schema. When a language model such as BERT is used, the input is spliced into a long sequence such as "[CLS] query [SEP] table [SEP] column1 [SEP] column2 [SEP] ... [SEP]"; through multi-layer Transformer encoding, different tables and columns acquire correlations of different weights with respect to the question query.
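A minimal sketch of splicing and encoding the query, table, and columns with a BERT model from the transformers library; the model name bert-base-chinese and the example schema are assumptions for illustration.

    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # assumed pretrained model
    encoder = BertModel.from_pretrained("bert-base-chinese")

    query = "今年华东地区的销售额是多少"
    table = "sales"
    columns = ["region", "year", "amount"]

    # splice into "query [SEP] table [SEP] column1 [SEP] column2 ..."; the tokenizer adds [CLS] and the final [SEP]
    combined = query + " [SEP] " + table + " [SEP] " + " [SEP] ".join(columns)
    inputs = tokenizer(combined, return_tensors="pt")
    outputs = encoder(**inputs)
    combined_sequence = outputs.last_hidden_state   # contextual encoding that links the query to the schema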
And step S105, decoding the combined sequence by using the abstract syntax tree to obtain the SQL statement.
Specifically, during decoding the SQL is treated as an abstract syntax tree, and each node in the abstract syntax tree is an SQL keyword (SELECT, WHERE, AND, ...) or a candidate value such as a table name or column name. Generating the SQL is equivalent to performing a depth-first search of the syntax tree starting from its root node.
Taking the "SELECT" node as an example, it can have three child leaf nodes: "Column", "AGG", and "Distinct", which respectively represent "select a certain column", "apply an aggregation operation", and "remove duplicates from a column". Searching downward from the "SELECT" node is therefore equivalent to a 3-way classification task; the cross-entropies of the nodes are computed in turn along the true path and the search path and summed to give the total loss.
The abstract-syntax-tree approach avoids designing a variety of sub-networks and works well on complex data sets involving cross-table queries and nested queries. In computer science, an Abstract Syntax Tree (AST) is a tree representation of the abstract syntactic structure of source code, here specifically the source code of a programming language. Each node in the tree represents a construct in the source code. The syntax is called abstract because it does not represent every detail that appears in the concrete syntax; for example, grouping parentheses are implicit in the tree structure and do not appear as nodes.
The purpose is to decompose the original statement into individual grammar units while preserving the hierarchical structure between them, and then, by traversing the tree again according to certain rules and reshaping or replacing grammar units, to complete the conversion from the original statement to the target statement. Taking "1 + 2" as an example, the "+" can be replaced with add, with 1 and 2 understood as the parameters of an add function; in this way the conversion from the original arithmetic statement to a function call is realized.
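The "1 + 2" example can be reproduced with Python's built-in ast module, used here purely to illustrate node replacement during AST transformation; the embodiment itself builds an SQL syntax tree rather than Python code.

    import ast

    class PlusToAdd(ast.NodeTransformer):
        # replace the "+" operator node with a call to an add() function
        def visit_BinOp(self, node):
            self.generic_visit(node)
            if isinstance(node.op, ast.Add):
                return ast.Call(func=ast.Name(id="add", ctx=ast.Load()),
                                args=[node.left, node.right], keywords=[])
            return node

    tree = ast.parse("1 + 2", mode="eval")
    tree = ast.fix_missing_locations(PlusToAdd().visit(tree))
    print(ast.unparse(tree))   # prints: add(1, 2)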
Illustratively, step S105 may include steps S1051 to S1053.
And step S1051, carrying out syntax analysis on the combined sequence to obtain a target abstract syntax tree.
Specifically, when the AST is used to parse an original statement, lexical analysis is required first. Lexical analysis splits the original statement into a one-dimensional list of grammar units (a token list) according to a predetermined grammar-unit table, and the grammar-unit table can be defined as needed for different scenarios. In general, lexical analysis uses consecutive whitespace as the separator to split grammar units automatically.
After the token list is obtained, syntactic analysis converts the one-dimensional, unstructured token list into a tree structure. The correctness of the grammar is also verified during parsing: if a statement that does not conform to the grammar is encountered, an error is thrown, which is generally what a compilation error produced at this stage is.
And step 1052, converting the linguistic data on each node in the target abstract syntax tree according to a preset rule.
Specifically, AST transformation has no fixed standard; depending on the purpose, it is sometimes only a simple replacement of a matching node and sometimes an adjustment or replacement of a matching sub-tree structure. This process typically involves two steps: traversal and transformation. A general tree-traversal method can be used for the abstract syntax tree; pre-order and post-order traversal are currently used most often.
And step S1053, obtaining the SQL statement according to the converted target abstract syntax tree.
Specifically, generation is the reverse of AST parsing, and the generation process also requires traversing the tree. Although generation and transformation can sometimes be performed simultaneously, the generation logic is more complicated than when only transformation is performed.
During the traversal for generation, different processing logic is defined for every type of syntax-unit node, and the traversal order of the subtrees under different types of syntax nodes can differ. All cases need to be enumerated.
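A minimal sketch of this generation step over a toy, dictionary-based syntax tree is given below; the node fields (column, agg, distinct, table, where) follow the SELECT example above, while the dictionary layout itself is an assumption.

    def generate_sql(node):
        if node["type"] == "SELECT":
            column = node["column"]
            if node.get("agg"):
                column = f'{node["agg"]}({column})'       # "apply an aggregation operation"
            if node.get("distinct"):
                column = f"DISTINCT {column}"             # "remove duplicates from a column"
            sql = f'SELECT {column} FROM {node["table"]}'
            if node.get("where"):
                sql += f' WHERE {generate_sql(node["where"])}'
            return sql
        if node["type"] == "CONDITION":
            return f'{node["column"]} {node["op"]} \'{node["value"]}\''
        raise ValueError(f'unhandled node type: {node["type"]}')

    tree = {"type": "SELECT", "column": "amount", "agg": "SUM", "table": "sales",
            "where": {"type": "CONDITION", "column": "region", "op": "=", "value": "华东"}}
    print(generate_sql(tree))   # SELECT SUM(amount) FROM sales WHERE region = '华东'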
This method can quickly realize the conversion from text to SQL, has strong interpretability and high SQL accuracy, avoids designing a variety of sub-networks, and works well on complex data sets involving cross-table queries and nested queries.
And step S106, sending the SQL statement to the SQL database to query data to obtain a data result, and sending the data result to a front-end page of a data large screen to enable the front-end page to update display data according to the data result.
Specifically, after the SQL statement is determined, it is sent to the SQL database to query data and obtain a data result, and the data result is sent to the front-end page of the data large screen. The front-end page rebuilds page elements, fills in data, and performs other operations according to the data result, and then displays the updated data. Data analysis and display can thus be achieved according to the voice control instruction, the interactivity between the data large screen and the user is improved, and the user experience is improved.
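A minimal sketch of step S106 follows, with SQLite standing in for the SQL database and an assumed HTTP endpoint standing in for the interface of the big-screen front-end page; the embodiment does not specify either.

    import json
    import sqlite3
    import requests

    def query_and_push(sql: str,
                       db_path: str = "bigscreen.db",                        # assumed database file
                       frontend_url: str = "http://localhost:8080/update"):  # hypothetical front-end endpoint
        connection = sqlite3.connect(db_path)
        try:
            cursor = connection.execute(sql)
            columns = [c[0] for c in cursor.description]
            data_result = [dict(zip(columns, row)) for row in cursor.fetchall()]
        finally:
            connection.close()
        # send the data result so the front-end page can rebuild its elements and refresh the display
        requests.post(frontend_url, data=json.dumps(data_result, ensure_ascii=False),
                      headers={"Content-Type": "application/json"})
        return data_result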
Fig. 2 is a schematic flow chart of a voice control data analysis method according to another embodiment of the present application. As shown in fig. 2, steps S1001 to S1003 are further included before step S101.
Step S1001, extracting feature information of the voice information, and identifying the identity corresponding to the voice information according to the feature information.
Specifically, when a user analyzes data with the data large screen, the data large screen performs the data analysis work only according to the voice of the designated speaker and is not disturbed by other speakers. After the voice information is acquired, its feature information is extracted, the feature information is then analyzed, and the identity corresponding to the voice information is determined.
Step S1002, when the identity corresponding to the voice information is the speaker, the step of converting the voice information into text information and filtering the text information to obtain a vocabulary set is executed.
Specifically, when it is determined that the identity corresponding to the voice information is the speaker, the speaker has issued a voice control instruction to the data large screen, and steps S101 to S106 are executed. The data large screen finally rebuilds page elements and fills in data according to the speaker's voice information, and displays the reconstructed picture.
Step S1003, when the identity corresponding to the voice information is not the speaker, the step of converting the voice information into text information and filtering the text information to obtain a vocabulary set is not executed.
Specifically, when it is determined that the identity corresponding to the voice information is not the speaker, steps S101 to S106 are not executed. In this way, when the speaker performs voice data analysis, the data large screen will not display wrong content because of interference from the voices of non-speakers, which improves the experience of using the data large screen.
In an embodiment of the present application, before the step S1001, a step S10001 and a step S10002 are further included.
Step S10001, after receiving the preset instruction, acquiring the voice information of the speaker.
Specifically, before using the data large screen for data analysis, the user can designate the speaker in advance, so that during analysis the data large screen performs data analysis only according to the speaker's voice and is not disturbed by other speakers. Before data analysis, the user issues a preset instruction to the data large screen, and the speaker then reads a preset sentence passage aloud.
Step S10002 extracts feature information of the speech information, and uses the feature information as standard feature information.
Specifically, after the voice information (the preset sentence passage read by the speaker) is acquired, the feature information of the voice information is extracted and used as the standard feature information.
When the user analyzes data with the data large screen, the user's voice information is acquired and its feature information is extracted. The feature information is compared with the standard feature information to identify the user's identity (speaker or non-speaker).
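A minimal sketch of this comparison is given below, using time-averaged MFCCs from librosa as the voice features and cosine similarity with an assumed threshold; the embodiment does not prescribe the feature type or the threshold value.

    import numpy as np
    import librosa

    def voice_feature(wav_path: str) -> np.ndarray:
        signal, sample_rate = librosa.load(wav_path, sr=16000)
        mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=20)
        return mfcc.mean(axis=1)          # time-averaged MFCCs as a simple voiceprint

    def is_designated_speaker(wav_path: str, standard_feature: np.ndarray,
                              threshold: float = 0.85) -> bool:   # assumed threshold, tuned per deployment
        feature = voice_feature(wav_path)
        similarity = float(np.dot(feature, standard_feature) /
                           (np.linalg.norm(feature) * np.linalg.norm(standard_feature) + 1e-8))
        return similarity >= threshold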
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 3 is a schematic structural diagram of a voice control data analysis apparatus according to an embodiment of the present application. Referring to fig. 3, the voice control data analysis apparatus includes:
the text processing module 31 is configured to convert the voice information into text information, and filter the text information to obtain a vocabulary set;
the first calculation module 32 is configured to determine a vector of each vocabulary in the vocabulary set, and bring the vector corresponding to each vocabulary into a preset model to obtain a first output result;
the second calculation module 33 is configured to determine a vector of each corpus in the corpus, and bring the vector corresponding to each corpus into the preset model to obtain a plurality of second output results;
the encoding module 34 is configured to determine, among the plurality of second output results, a target output result with the highest similarity to the first output result, and encode the corpus corresponding to the target output result to obtain a combined sequence;
a decoding module 35, configured to decode the combined sequence by using an abstract syntax tree to obtain an SQL statement;
and the sending module 36 is configured to send the SQL statement to the SQL database to query data to obtain a data result, and send the data result to a front-end page of a data large screen, so that the front-end page updates display data according to the data result.
In an embodiment of the present application, the text processing module 31 is further configured to:
removing non-text content in the text information;
performing word segmentation processing on the text information without the non-text content to obtain a plurality of words;
performing part-of-speech tagging on each vocabulary;
and removing stop words to obtain the vocabulary set.
In an embodiment of the application, the preset model is a word2vec model, a one-hot model or a TF-IDF model.
In an embodiment of the present application, the encoding module 34 is further configured to:
calculating the similarity of each second output result and the first output result;
determining a second output result with the highest similarity to the first output result, and recording the second output result as the target output result;
acquiring a target corpus corresponding to the target output result from a corpus, and obtaining an expression mode of query, table and column in a database according to the target corpus;
and jointly coding the query, the table and the column to obtain the combined sequence.
In an embodiment of the present application, the decoding module 35 is further configured to:
carrying out syntax analysis on the combined sequence to obtain a target abstract syntax tree;
converting the linguistic data on each node in the target abstract syntax tree according to a preset rule;
and obtaining the SQL statement according to the converted target abstract syntax tree.
In an embodiment of the present application, the voice control data analysis apparatus further includes:
the identity recognition module is used for extracting the characteristic information of the voice information and recognizing the identity corresponding to the voice information according to the characteristic information;
the first execution module is used for executing the step of converting the voice information into text information and filtering the text information to obtain a vocabulary set when the identity corresponding to the voice information is the speaker;
and the second execution module is used for skipping the step of converting the voice information into text information and filtering the text information to obtain a vocabulary set when the identity corresponding to the voice information is not the speaker.
In an embodiment of the present application, the voice control data analysis apparatus further includes:
the acquisition module is used for acquiring the voice information of the speaker after receiving a preset instruction;
and the standard characteristic information determining module is used for extracting the characteristic information of the voice information and taking the characteristic information as standard characteristic information.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
In addition, the voice control data analysis apparatus shown in fig. 3 may be a software unit, a hardware unit, or a unit combining software and hardware built into an existing terminal device; it may be integrated into the terminal device as an independent add-on, or it may exist as an independent terminal device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 4, the terminal device 4 of this embodiment may include: at least one processor 40 (only one processor 40 is shown in fig. 4), a memory 41, and a computer program 42 stored in the memory 41 and executable on the at least one processor 40, wherein the processor 40 executes the computer program 42 to implement the steps of any of the above-mentioned method embodiments, for example, the steps S101 to S106 in the embodiment shown in fig. 1. Alternatively, the processor 40, when executing the computer program 42, implements the functions of the modules/units in the above-described device embodiments, such as the functions of the modules 31 to 36 shown in fig. 3.
Illustratively, the computer program 42 may be partitioned into one or more modules/units that are stored in the memory 41 and executed by the processor 40 to implement the present invention. The one or more modules/units may be a series of instruction segments of the computer program 42 capable of performing specific functions, which are used to describe the execution process of the computer program 42 in the terminal device 4.
The terminal device 4 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device 4 may include, but is not limited to, a processor 40, a memory 41. Those skilled in the art will appreciate that fig. 4 is merely an example of the terminal device 4, and does not constitute a limitation of the terminal device 4, and may include more or less components than those shown, or combine some components, or different components, such as an input-output device, a network access device, and the like.
The Processor 40 may be a Central Processing Unit (CPU); it may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 41 may in some embodiments be an internal storage unit of the terminal device 4, such as a hard disk or a memory of the terminal device 4. In other embodiments, the memory 41 may also be an external storage device of the terminal device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like provided on the terminal device 4. Further, the memory 41 may also include both an internal storage unit and an external storage device of the terminal device 4. The memory 41 is used for storing an operating system, an application program, a Boot Loader (Boot Loader), data, and other programs, such as program codes of the computer program 42. The memory 41 may also be used to temporarily store data that has been output or is to be output.
The present application further provides a computer-readable storage medium, where a computer program 42 is stored, and when the computer program 42 is executed by the processor 40, the steps in the above-mentioned method embodiments may be implemented.
The embodiments of the present application provide a computer program product, which when running on a mobile terminal, enables the mobile terminal to implement the steps in the above method embodiments when executed.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. With this understanding, all or part of the processes in the methods of the above embodiments can be implemented by the computer program 42 instructing the relevant hardware; the computer program 42 can be stored in a computer readable storage medium, and when executed by the processor 40, the steps of the above method embodiments can be implemented. The computer program 42 comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer readable medium may include at least: any entity or apparatus capable of carrying the computer program code to the terminal device, a recording medium, a computer Memory, a Read-Only Memory (ROM), a Random-Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, computer-readable media may not include electrical carrier signals or telecommunications signals, in accordance with legislation and patent practice.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the above-described apparatus/network device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A voice control data analysis method is characterized by comprising the following steps:
converting voice information into text information, and filtering the text information to obtain a vocabulary set;
determining a vector of each vocabulary in the vocabulary set, and bringing the vector corresponding to each vocabulary into a preset model to obtain a first output result;
determining vectors of the corpora in the corpus, and bringing the vectors corresponding to the corpora into the preset model to obtain a plurality of second output results;
determining a target output result with the highest similarity to the first output result in the plurality of second output results, and coding the corpus corresponding to the target output result to obtain a combined sequence;
decoding the combined sequence by using an abstract syntax tree to obtain an SQL statement;
and sending the SQL statement to an SQL database to query data to obtain a data result, and sending the data result to a front-end page of a data large screen to enable the front-end page to update display data according to the data result.
2. The method according to claim 1, wherein the filtering the text information to obtain a vocabulary set comprises:
removing non-text content in the text information;
performing word segmentation processing on the text information without the non-text content to obtain a plurality of words;
performing part-of-speech tagging on each vocabulary;
and removing stop words to obtain the vocabulary set.
3. The method according to claim 1, wherein the predetermined model is a word2vec model, a one-hot model, or a TF-IDF model.
4. The method according to claim 1, wherein the determining a target output result with the highest similarity to the first output result among the plurality of second output results, and encoding the corpus corresponding to the target output result to obtain a combined sequence comprises:
calculating the similarity of each second output result and the first output result;
determining a second output result with the highest similarity to the first output result, and recording the second output result as the target output result;
acquiring a target corpus corresponding to the target output result from a corpus, and obtaining an expression mode of query, table and column in a database according to the target corpus;
and jointly coding the query, the table and the column to obtain the combined sequence.
5. The method according to claim 1, wherein the decoding the combined sequence using the abstract syntax tree to obtain the SQL statement comprises:
carrying out syntax analysis on the combined sequence to obtain a target abstract syntax tree;
converting the linguistic data on each node in the target abstract syntax tree according to a preset rule;
and obtaining the SQL statement according to the converted target abstract syntax tree.
6. The method according to any one of claims 1 to 5, further comprising, before converting the voice message into a text message and filtering the text message to obtain a vocabulary set:
extracting feature information of the voice information, and identifying an identity corresponding to the voice information according to the feature information;
when the identity corresponding to the voice information is the speaker, executing the step of converting the voice information into text information and filtering the text information to obtain a vocabulary set;
and when the identity corresponding to the voice information is not the speaker, skipping the step of converting the voice information into text information and filtering the text information to obtain a vocabulary set.
7. The method according to claim 6, wherein before extracting the feature information of the voice message and recognizing the identity corresponding to the voice message according to the feature information, the method further comprises:
when a preset instruction is received, acquiring voice information of a speaker;
and extracting the characteristic information of the voice information, and taking the characteristic information as standard characteristic information.
8. A voice-activated data analysis device, comprising:
the text processing module is used for converting the voice information into text information and filtering the text information to obtain a vocabulary set;
the first calculation module is used for determining the vector of each vocabulary in the vocabulary set and bringing the vector corresponding to each vocabulary into a preset model to obtain a first output result;
the second calculation module is used for determining vectors of all corpora in the corpus and bringing the vectors corresponding to all the corpora into the preset model to obtain a plurality of second output results;
the encoding module is used for determining a target output result with the highest similarity to the first output result in the plurality of second output results, and encoding the corpus corresponding to the target output result to obtain a combined sequence;
the decoding module is used for decoding the combined sequence by using the abstract syntax tree to obtain an SQL statement;
and the sending module is used for sending the SQL statement to an SQL database to query data to obtain a data result, and sending the data result to a front-end page of a data large screen to update display data according to the data result.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202111437927.8A 2021-11-29 2021-11-29 Voice control data analysis method and device, terminal equipment and storage medium Pending CN114116771A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111437927.8A CN114116771A (en) 2021-11-29 2021-11-29 Voice control data analysis method and device, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111437927.8A CN114116771A (en) 2021-11-29 2021-11-29 Voice control data analysis method and device, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114116771A true CN114116771A (en) 2022-03-01

Family

ID=80367823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111437927.8A Pending CN114116771A (en) 2021-11-29 2021-11-29 Voice control data analysis method and device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114116771A (en)

Similar Documents

Publication Publication Date Title
CN107315737B (en) Semantic logic processing method and system
CN113435203B (en) Multi-modal named entity recognition method and device and electronic equipment
US20220405484A1 (en) Methods for Reinforcement Document Transformer for Multimodal Conversations and Devices Thereof
KR102491172B1 (en) Natural language question-answering system and learning method
CN114580382A (en) Text error correction method and device
US11699034B2 (en) Hybrid artificial intelligence system for semi-automatic patent infringement analysis
CN113282729B (en) Knowledge graph-based question and answer method and device
CN114817465A (en) Entity error correction method and intelligent device for multi-language semantic understanding
CN115827819A (en) Intelligent question and answer processing method and device, electronic equipment and storage medium
CN111985243A (en) Emotion model training method, emotion analysis device and storage medium
Cao Generating natural language descriptions from tables
CN115497477A (en) Voice interaction method, voice interaction device, electronic equipment and storage medium
CN117271558A (en) Language query model construction method, query language acquisition method and related devices
CN111460114A (en) Retrieval method, device, equipment and computer readable storage medium
Guo et al. Prompting gpt-3.5 for text-to-sql with de-semanticization and skeleton retrieval
JP2016164707A (en) Automatic translation device and translation model learning device
CN114611520A (en) Text abstract generating method
CN114626367A (en) Sentiment analysis method, system, equipment and medium based on news article content
CN111831624A (en) Data table creating method and device, computer equipment and storage medium
CN114842982B (en) Knowledge expression method, device and system for medical information system
CN115906818A (en) Grammar knowledge prediction method, grammar knowledge prediction device, electronic equipment and storage medium
CN115203206A (en) Data content searching method and device, computer equipment and readable storage medium
CN115048927A (en) Method, device and equipment for identifying disease symptoms based on text classification
CN114925175A (en) Abstract generation method and device based on artificial intelligence, computer equipment and medium
CN114116771A (en) Voice control data analysis method and device, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination