CN118095217B

CN118095217B - Voice interactive data analysis system based on natural language processing

Info

Publication number: CN118095217B
Application number: CN202410479579.8A
Authority: CN
Inventors: 皇甫汉聪; 王永才; 关兆雄; 庞伟林; 林浩; 李沐栩; 王俊丰; 郑晓娟; 吴丽贤; 杜家兵; 宋才华; 刘胜强; 庞维欣
Original assignee: Foshan Power Supply Bureau of Guangdong Power Grid Corp
Current assignee: Foshan Power Supply Bureau of Guangdong Power Grid Corp
Priority date: 2024-04-22
Filing date: 2024-04-22
Publication date: 2024-07-05
Anticipated expiration: 2044-04-22
Also published as: CN118095217A

Abstract

The invention relates to the technical field of voice recognition, and discloses a voice interactive data analysis system based on natural language processing, which comprises a universal language recognition layer, an additional language layer, a conversion recognition layer, a query feedback layer, an exception handling layer and a privacy protection layer.

Description

Voice interactive data analysis system based on natural language processing

Technical Field

The invention relates to the technical field of voice recognition, in particular to a voice interactive data analysis system based on natural language processing.

Background

Data analysis refers to the process of identifying, analyzing, and extracting patterns, trends, correlations, and insights in data by collecting, cleaning, processing, and interpreting the data. It uses techniques and methods such as statistics, machine learning, data mining, and visualization to extract useful information from large amounts of data, providing support and guidance for decision making, problem solving, and business optimization.

Data analysis based on voice interaction yields richer data. Besides the user's voice commands, voice interaction can capture information such as the emotion, intonation, and speaking rate of the voice, which is significant for data analysis tasks such as user behavior analysis and emotion recognition.

Speech recognition is a key technology for voice interaction. Speech recognition applications often involve many industry terms and domain-specific terms for which a general language model lacks training data, which reduces recognition accuracy. Preparing a large amount of industry data for specialized training or customization sharply increases the application cost and still faces the problem of incomplete training data, so the speech recognition result may fall short of expectations. A high-accuracy, low-cost speech recognition mechanism for domain-specific terms is therefore needed to support data analysis through voice interaction. To this end, we propose a voice interactive data analysis system based on natural language processing.

Disclosure of Invention

The present invention aims to provide a voice interactive data analysis system based on natural language processing, so as to solve the problems described in the background above.

In order to achieve the above purpose, the present invention provides the following technical solution: the voice interactive data analysis system based on natural language processing comprises a universal language recognition layer, wherein the universal language recognition layer comprises a universal model for natural language processing and is used for carrying out natural language understanding on text obtained from voice recognition, and the processing of the universal model comprises parsing sentence structure, recognizing keywords and understanding context;

The system comprises an additional language layer for converting incoming text into technical terms, wherein the additional language layer comprises modeling and matching functions for the technical terms;

The system comprises a conversion recognition layer, wherein the conversion recognition layer comprises a conversion text judgment model, the conversion text judgment model is used for judging whether text output by the universal language recognition layer needs to be passed to the additional language layer for secondary conversion, text that undergoes secondary conversion yields a technical term that replaces it, and the conversion recognition layer replaces the text before conversion with the converted technical term to obtain the final parsed text of the voice recognition.

Preferably, the additional corpus layer comprises a dictionary tree, a character matcher and a tree constructor; the dictionary tree adopts a hash table data structure to store a state machine of the corpus, each node in the dictionary tree comprises a private two-dimensional letter array and a target text, and the letter array is used for marking the matching path of the target text;

The character matcher comprises an ASCII code operator, which is used for quickly locating the index in the letter array and finding the complete path to the target text;

The tree constructor provides functions for adding and deleting corpus entries and is used for constructing and debugging a dictionary tree state machine that meets expectations.

Preferably, the converted text judgment model comprises a feature extraction module, a data marking module and a model training module, wherein the feature extraction module extracts features by taking texts output by the universal model as keywords, the data marking module converts the extracted features into numerical vectors convenient for classification, the model training module uses a logistic regression algorithm to construct a classification model, and the classification model takes the numerical vectors converted by the data marking module as input and takes classification results as output.

Preferably, the classification model is constructed using a logistic regression classification algorithm, the class labels are abstracted as 0 and 1, and the mapping is realized by the log-odds (logistic) function:

$$y = \frac{1}{1 + e^{-z}}, \qquad z = w^{T}x + b$$

Wherein x represents the numerical vector converted by the data marking module, z represents any real number, y represents the result of the logistic regression, w represents the weight vector, b represents the offset, and T denotes transposition, which turns w from a column vector into a row vector. The log-odds function compresses z, which ranges from negative infinity to positive infinity, into the interval (0, 1), and the result y = 0.5 is taken as the demarcation point: when y ≥ 0.5, the result of the logistic regression is judged a positive example, the class label is 1, and the text is considered to need to be passed to the additional language layer for secondary conversion; when y < 0.5, the result is judged a negative example, the class label is 0, and the text is considered not to need secondary conversion.

Preferably, the keyword features extracted by the feature extraction module comprise word frequency, word length, part of speech, and the morphological changes of the preceding and following words.

Preferably, the training data is divided into a training set for training the classification model, a verification set for adjusting parameters of the classification model, and a test set for evaluating performance of the classification model.

Preferably, the system comprises a query feedback layer, wherein the query feedback layer is used for mapping the text finally parsed by the conversion recognition layer to a corresponding data query or analysis task and feeding the query and analysis results back to the user, and the feedback modes comprise voice broadcasting, graphical interface display and text output.

Preferably, the system comprises an exception handling layer for handling input instructions that the system cannot understand and for providing error handling and feedback mechanisms.

Preferably, the system comprises a privacy protection layer, wherein the privacy protection layer is used for encrypting the user's voice input data and the output results of data queries, ensuring the security of the data during transmission and storage.

Preferably, the measures taken by the privacy protection layer further comprise identity authentication and authorization and data desensitization, and user access to the system is controlled based on a token authentication mechanism.

Compared with the prior art, the invention has the beneficial effects that:

1. By combining the additional language layer and the conversion recognition layer with a traditional universal natural language processing model, the invention achieves high-accuracy recognition of domain-specific terms during voice interaction, broadens the application scenarios of voice interaction, and improves the user experience.

2. Because the additional language layer is highly maintainable, the voice recognition result can be iterated continuously, and the recognition accuracy improves with each iteration. On the premise of ensuring high accuracy, the application cost of voice interaction in specific fields is reduced: once the conversion recognition layer has obtained its trained recognition model, the model can be reused in each specific field, achieving the goal of training once and reusing in many places.

Drawings

FIG. 1 is a schematic diagram of the overall structure of the present invention;

FIG. 2 is a diagram of a dictionary tree after adding the term "distribution network";

FIG. 3 is a diagram of the dictionary tree after adding the terms "distribution network" and "distribution board";

FIG. 4 is a diagram of a dictionary tree with all example terms added;

FIG. 5 is a diagram of the storage structure of the additional language layer.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Referring to FIGS. 1-5, the present invention provides a technical solution: the voice interactive data analysis system based on natural language processing comprises a universal language recognition layer, wherein the universal language recognition layer comprises a universal model for natural language processing and is used for carrying out natural language understanding on text obtained from voice recognition, and the processing of the universal model comprises parsing sentence structure, recognizing keywords and understanding context. The system comprises an additional language layer for converting incoming text into technical terms, wherein the additional language layer comprises modeling and matching functions for the technical terms. The system comprises a conversion recognition layer, wherein the conversion recognition layer comprises a conversion text judgment model, the conversion text judgment model is used for judging whether text output by the universal language recognition layer needs to be passed to the additional language layer for secondary conversion, text that undergoes secondary conversion yields a technical term that replaces it, and the conversion recognition layer replaces the text before conversion with the converted technical term to obtain the final parsed text of the voice recognition.

The system uses the additional language layer and the conversion recognition layer to strengthen the recognition accuracy of the universal language recognition layer on technical terms. The purpose of the conversion recognition layer is to distinguish whether a text needs conversion and to replace the text before conversion with the converted text, obtaining the final parsed text of the voice recognition.

The conversion recognition layer comprises a conversion text judgment model, which is essentially a text classification model and can be reused in multiple places. The additional language layer is a set of technical terms whose corpus content differs between professional fields. It is relatively cheap to build, because it only needs to provide a certain number of words to be compared against during voice recognition conversion, and it is maintainable, because its corpus content can be added, deleted and modified in real time. By combining the universal language recognition layer, the additional language layer and the conversion recognition layer, the invention can meet the accuracy requirements of voice recognition in a specific field with low reuse cost and high portability.
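The interplay of the three layers can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function `final_parsed_text`, the idea that the recognized text arrives as a list of segments, and the `LEXICON` stand-in are all hypothetical. The judgment model and the additional language layer are passed in as plain callables.

```python
from typing import Callable, List, Optional


def final_parsed_text(
    segments: List[str],
    needs_conversion: Callable[[str], bool],
    to_technical_term: Callable[[str], Optional[str]],
) -> str:
    """Replace every segment flagged for secondary conversion with its technical term."""
    parts = []
    for segment in segments:
        # Only segments flagged by the judgment model go to the additional layer.
        term = to_technical_term(segment) if needs_conversion(segment) else None
        # Keep the original text when no technical term is found.
        parts.append(term if term is not None else segment)
    return " ".join(parts)


# Hypothetical stand-ins for the judgment model and the additional language layer.
LEXICON = {"distribution net work": "distribution network"}
result = final_parsed_text(
    ["query", "distribution net work", "load"],
    needs_conversion=lambda s: s in LEXICON,
    to_technical_term=LEXICON.get,
)
print(result)
```

Passing the two layers in as callables mirrors the reuse argument of the text: the same replacement logic can be paired with a different term set for each professional field.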

Embodiment one:

The additional corpus layer comprises a dictionary tree, a character matcher and a tree constructor, wherein the dictionary tree adopts a hash table data structure to store a state machine of the corpus, each node in the dictionary tree comprises a private two-dimensional letter array and a target text, and the letter array is used for marking the matching path of the target text. The dictionary tree is organized by the pinyin letters of the technical terms, and the last pinyin letter of each technical term's Chinese characters is taken as a leaf node. The letter array has length 26 and covers the 26 lowercase letters a to z: the position with subscript 0 stores a pointer to child node a, the position with subscript 1 stores a pointer to child node b, and so on, up to the position with subscript 25, which stores a pointer to child node z. If the child node for a character does not exist, null is stored at the corresponding subscript.
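A minimal sketch of such a node and of term insertion is given below. It simplifies the description by using a single 26-slot list per node, and the names `TrieNode` and `insert` as well as the pinyin spellings are assumptions made for illustration, since the text gives the terms only in translation.

```python
from typing import List, Optional

ALPHABET_SIZE = 26  # lowercase pinyin letters a..z


class TrieNode:
    def __init__(self) -> None:
        # children[i] points to the child for letter chr(ord("a") + i), or None.
        self.children: List[Optional["TrieNode"]] = [None] * ALPHABET_SIZE
        # Target text stored on the node that ends a term's pinyin path.
        self.target_text: Optional[str] = None


def insert(root: TrieNode, pinyin: str, target_text: str) -> None:
    """Add one term, creating nodes along its pinyin path as needed."""
    node = root
    for ch in pinyin:
        idx = ord(ch) - ord("a")
        if not 0 <= idx < ALPHABET_SIZE:
            raise ValueError(f"expected a lowercase letter a-z, got {ch!r}")
        if node.children[idx] is None:
            node.children[idx] = TrieNode()
        node = node.children[idx]
    node.target_text = target_text


# Assumed pinyin spellings, for illustration only.
root = TrieNode()
insert(root, "peidianwang", "distribution network")
insert(root, "peidianban", "distribution board")
```

The two terms share the path for their common pinyin prefix and branch only where the spellings diverge, which is the structural change the figures describe when a second term is added.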

Referring to FIGS. 2-4, taking the field of power generation as an example, assume that the technical terms are: "distribution network", "distribution board", "substation", "ring main unit", "ring network circuit", "transformer". The technical term "distribution network" is first added to the dictionary tree, and the resulting dictionary tree structure is shown in FIG. 2; the technical term "distribution board" is then added, and the resulting structure is shown in FIG. 3. By analogy, adding the technical terms "substation", "ring main unit", "ring network circuit" and "transformer" yields the dictionary tree structure shown in FIG. 4.

The character matcher comprises an ASCII code operator, which is used for quickly locating the index in the letter array and finding the complete path to the target text. When a text is converted, the pinyin of the text to be converted is obtained first, and each pinyin letter is treated as a character; subtracting the ASCII code of "a" from the ASCII code of the character gives the subscript of the pointer to the matching child node. For example, the ASCII code of "d" minus the ASCII code of "a" is 3, so the pointer to child node "d" is stored at subscript 3 of the array. The pinyin characters are matched one by one in sequence, and if every pinyin character of the text to be converted finds a corresponding node in the dictionary tree, the target text of the last matched tree node is taken as the replacement text.
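The matching procedure can be sketched as below, assuming a simplified node with one 26-slot list and hypothetical names (`Node`, `add`, `match`); the pinyin spelling is likewise an assumption. A path that ends on a node without a target text yields no replacement.

```python
from typing import List, Optional


class Node:
    def __init__(self) -> None:
        self.children: List[Optional["Node"]] = [None] * 26
        self.target_text: Optional[str] = None


def add(root: Node, pinyin: str, target_text: str) -> None:
    node = root
    for ch in pinyin:
        idx = ord(ch) - ord("a")
        if node.children[idx] is None:
            node.children[idx] = Node()
        node = node.children[idx]
    node.target_text = target_text


def match(root: Node, pinyin: str) -> Optional[str]:
    """Follow the pinyin letters one by one; return the target text or None."""
    node = root
    for ch in pinyin:
        idx = ord(ch) - ord("a")  # e.g. ord("d") - ord("a") == 3
        if not 0 <= idx < 26:
            return None  # not a lowercase pinyin letter
        node = node.children[idx]
        if node is None:
            return None  # no matching child: the text is not a known term
    return node.target_text


root = Node()
add(root, "peidianwang", "distribution network")  # assumed pinyin spelling
replacement = match(root, "peidianwang")
```

Each step is a constant-time array access, so the cost of one lookup grows with the length of the pinyin string rather than with the number of terms stored.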

The tree constructor provides functions for adding and deleting corpus entries and is used for constructing and debugging a dictionary tree state machine that meets expectations. This gives the additional corpus layer high maintainability: by fine-tuning the dictionary tree, the voice recognition result can be iterated continuously, and the recognition accuracy improves with each iteration.
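One way to sketch the add and delete functions is shown below. The class name `TreeConstructor`, its methods, the pruning of emptied branches on deletion, and the pinyin spellings are assumptions for illustration; the text does not specify how deletion is carried out.

```python
from typing import List, Optional


class Node:
    def __init__(self) -> None:
        self.children: List[Optional["Node"]] = [None] * 26
        self.target_text: Optional[str] = None


def _index(ch: str) -> int:
    idx = ord(ch) - ord("a")
    if not 0 <= idx < 26:
        raise ValueError(f"expected a lowercase letter a-z, got {ch!r}")
    return idx


class TreeConstructor:
    def __init__(self) -> None:
        self.root = Node()

    def add(self, pinyin: str, target_text: str) -> None:
        node = self.root
        for ch in pinyin:
            idx = _index(ch)
            if node.children[idx] is None:
                node.children[idx] = Node()
            node = node.children[idx]
        node.target_text = target_text

    def remove(self, pinyin: str) -> bool:
        """Delete a term; return False if it was not stored."""
        path = [self.root]  # path[k] is the node reached after k letters
        for ch in pinyin:
            child = path[-1].children[_index(ch)]
            if child is None:
                return False
            path.append(child)
        if path[-1].target_text is None:
            return False
        path[-1].target_text = None
        # Prune nodes that no longer lead to any term, from the end upward.
        for depth in range(len(pinyin), 0, -1):
            node = path[depth]
            if node.target_text is not None or any(c is not None for c in node.children):
                break
            path[depth - 1].children[_index(pinyin[depth - 1])] = None
        return True

    def lookup(self, pinyin: str) -> Optional[str]:
        node = self.root
        for ch in pinyin:
            node = node.children[_index(ch)]
            if node is None:
                return None
        return node.target_text


builder = TreeConstructor()
builder.add("peidianwang", "distribution network")  # assumed pinyin spellings
builder.add("peidianban", "distribution board")
builder.remove("peidianwang")
```

Removing one term leaves the shared prefix intact as long as another term still uses it, so entries can be corrected individually without rebuilding the tree.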

Embodiment two:

The conversion text judgment model comprises a feature extraction module, a data marking module and a model training module. The feature extraction module extracts features by taking the text output by the universal model as keywords, the data marking module converts the extracted features into numerical vectors convenient for classification, and the model training module uses a logistic regression algorithm to construct a classification model that takes the numerical vectors converted by the data marking module as input and the classification results as output. The class labels are abstracted as 0 and 1, and the mapping is realized by the log-odds (logistic) function:

$$y = \frac{1}{1 + e^{-z}}, \qquad z = w^{T}x + b$$

Wherein x represents the numerical vector converted by the data marking module, z represents any real number, y represents the result of the logistic regression, w represents the weight vector, b represents the offset, and T denotes transposition, which turns w from a column vector into a row vector. The log-odds function compresses z, which ranges from negative infinity to positive infinity, into the interval (0, 1), and the result y = 0.5 is taken as the demarcation point: when y ≥ 0.5, the result of the logistic regression is judged a positive example, the class label is 1, and the text is considered to need to be passed to the additional language layer for secondary conversion; when y < 0.5, the result is judged a negative example, the class label is 0, and the text is considered not to need secondary conversion. Regarding the result y of the log-odds function as the probability that sample x is a positive example, 1 - y is the probability that it is a negative example, and the ratio of the two, $\frac{y}{1-y}$, represents the relative likelihood of the sample being a positive example. This is expressed as a conditional probability distribution over the class label:

$$P(y = 1 \mid x) = \frac{e^{w^{T}x + b}}{1 + e^{w^{T}x + b}}, \qquad P(y = 0 \mid x) = \frac{1}{1 + e^{w^{T}x + b}}$$

where x represents the numerical vector converted by the data marking module, $P(y = 1 \mid x)$ represents the probability that the sample is judged a positive example, and $P(y = 0 \mid x)$ represents the probability that it is judged a negative example. When the model is trained, the weight vector w and the offset b are learned by minimizing a loss function (such as the cross-entropy loss) and are updated with a gradient descent optimization algorithm. The logistic regression model applies maximum likelihood estimation to the given training data set to determine the model parameters: for a given data set $\{(x_i, y_i)\}_{i=1}^{m}$, logistic regression maximizes the probability that each sample belongs to its true label.
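The training procedure described above can be sketched in plain Python as follows. This is an illustrative sketch, not the model of the invention: the toy feature vectors, the learning rate, the number of epochs and the function names are all assumptions, and batch gradient descent on the mean cross-entropy loss is used as one common choice.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Log-odds (logistic) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score(x: Sequence[float], w: Sequence[float], b: float) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)  # y = sigmoid(w^T x + b)


def cross_entropy(xs, ys, w, b) -> float:
    """Mean cross-entropy loss over the data set."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = score(x, w, b)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)


def train(xs, ys, lr: float = 0.5, epochs: int = 1000) -> Tuple[List[float], float]:
    """Learn w and b by batch gradient descent on the cross-entropy loss."""
    w, b, m = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = score(x, w, b) - y  # derivative of the loss with respect to z
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(x, w, b) -> int:
    """Class label 1 (needs secondary conversion) when y >= 0.5, else 0."""
    return 1 if score(x, w, b) >= 0.5 else 0


# Toy, made-up feature vectors and labels, for illustration only.
X = [[0.0, 0.2], [0.1, 0.0], [0.9, 1.0], [1.0, 0.8]]
Y = [0, 0, 1, 1]
w, b = train(X, Y)
```

Minimizing the cross-entropy loss is equivalent to maximizing the likelihood of the true labels, which is why the description can speak of both interchangeably.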

Embodiment three:

The keyword features extracted by the feature extraction module comprise word frequency, word length, part of speech, and the morphological changes of the preceding and following words. The extracted features and their corresponding labels are used to train the model. The training data is divided into a training set, a verification set and a test set: the training set is used for training the classification model, the verification set is used for adjusting the parameters of the classification model, and the test set is used for evaluating the performance of the classification model.
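A possible shape for the feature vector and the data split is sketched below. Everything specific here is an assumption: the tag set `POS_TAGS`, the part-of-speech tag being supplied by an upstream tagger, the 70/15/15 split ratio, and the function names. The morphological features of neighbouring words are omitted for brevity.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

POS_TAGS = ["noun", "verb", "other"]  # hypothetical tag set


def to_vector(word: str, pos: str, counts: Counter, total: int) -> List[float]:
    """Encode word frequency, word length and part of speech as numbers."""
    frequency = counts[word] / total if total else 0.0
    one_hot = [1.0 if pos == tag else 0.0 for tag in POS_TAGS]
    return [frequency, float(len(word))] + one_hot


def split(data: Sequence, ratios=(0.7, 0.15, 0.15), seed: int = 0) -> Tuple[list, list, list]:
    """Shuffle and divide the data into training, verification and test sets."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = round(len(items) * ratios[0])
    n_val = round(len(items) * ratios[1])
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


counts = Counter(["load", "network", "load"])
vector = to_vector("load", "noun", counts, total=3)
train_set, val_set, test_set = split(range(20))
```

Keeping the test set apart from the set used for parameter adjustment is what makes the final performance estimate independent of the tuning choices.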

The query feedback layer is used for mapping the text finally analyzed by the conversion recognition layer to a corresponding data query or analysis task, executing corresponding data query and analysis operation based on the query intention and parameters of the user, retrieving data from a database or other data sources, executing statistical analysis, generating reports and other tasks, feeding back query analysis results to the user, and the feedback modes comprise voice broadcasting, graphical interface display and text output.

The user can send out further inquiry, request explanation, parameter adjustment and the like through voice, and the system correspondingly adjusts and responds according to the feedback of the user.

The exception handling layer is used for handling input instructions which cannot be understood by the system and providing error handling and feedback mechanisms. The system supports detection of voice recognition errors, intention recognition errors, data query failures and the like, supports log recording and monitoring functions, records abnormal events and error information in time, and is convenient for system performance analysis and problem investigation. In the event of an anomaly, the system directs the user to take the correct action, for example, to provide an explicit indication or to ask the user if assistance or further explanation is required.
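The behaviour described above might be organized as in the following sketch. The exception classes, the logger name, the response dictionary and the prompt wording are all hypothetical; only Python's standard `logging` module is used.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("voice_analysis")  # hypothetical logger name


class RecognitionError(Exception):
    """The speech could not be recognized."""


class IntentError(Exception):
    """The recognized text could not be mapped to a query intent."""


class QueryError(Exception):
    """The data query failed."""


def handle_request(text: str, run_query: Callable[[str], object]) -> Dict[str, object]:
    """Run a query and turn known failures into guidance for the user."""
    try:
        return {"ok": True, "result": run_query(text)}
    except (RecognitionError, IntentError) as exc:
        logger.warning("could not understand input %r: %s", text, exc)
        return {"ok": False, "prompt": "Sorry, I did not understand. Could you rephrase?"}
    except QueryError as exc:
        logger.error("query failed for input %r: %s", text, exc)
        return {"ok": False, "prompt": "The query failed. Do you need further help?"}


def failing_query(text: str) -> object:
    raise IntentError("no matching analysis task")


response = handle_request("show me the thing", failing_query)
```

Logging each failure with its input gives the record of abnormal events that the text asks for, while the returned prompt carries the guidance back to the user.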

The privacy protection layer is used for encrypting the data of the voice input of the user and the output result of the data query, and ensuring the safety of the data in the transmission and storage processes. The measures taken by the privacy protection layer also comprise identity authentication and authorization, and the user access of the system is controlled based on a token authentication mechanism. For sensitive data of the user, such as personal identity information, financial data, etc., encryption processing should be performed during storage and transmission to ensure that the data is not obtained by unauthorized visitors. For the situation that the original data does not need to be reserved, a data desensitization method can be adopted, and the privacy of a user is protected by carrying out confusion, generalization or deletion and the like on the data.
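Two of these measures, data desensitization and token-based access control, can be sketched with the Python standard library. The function names, the masking and generalization rules, and the HMAC-signed token format are assumptions chosen for illustration; the text does not specify a token scheme, and encryption of stored or transmitted data is not shown here.

```python
import hashlib
import hmac


def mask(value: str, keep: int = 4) -> str:
    """Obfuscate all but the last `keep` characters of a sensitive value."""
    if len(value) <= keep:
        return "*" * len(value)
    return "*" * (len(value) - keep) + value[-keep:]


def generalize_age(age: int, width: int = 10) -> str:
    """Generalize an exact age into a range such as "30-39"."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def sign_token(user_id: str, secret: bytes) -> str:
    """Issue a token of the form "<user_id>.<signature>" (hypothetical format)."""
    signature = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{user_id}.{signature}"


def verify_token(token: str, secret: bytes) -> bool:
    """Accept the token only if its signature matches the user id."""
    user_id, _, signature = token.rpartition(".")
    expected = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))


SECRET = b"demo-secret"  # placeholder; a real key must be kept out of source code
token = sign_token("user-1", SECRET)
```

Masking and generalization are irreversible, which suits the case named in the text where the original data does not need to be retained.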

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. The voice interactive data analysis system based on natural language processing is characterized in that:

The system comprises a universal language recognition layer, wherein the universal language recognition layer comprises a universal model for natural language processing and is used for carrying out natural language understanding on text obtained from voice recognition, and the processing of the universal language recognition layer comprises parsing sentence structure, recognizing keywords and understanding context;

The conversion recognition layer comprises a conversion text judgment model, wherein the conversion text judgment model is used for judging whether a text output by the universal language recognition layer needs to be transmitted into an additional language layer for secondary conversion, the text needing secondary conversion can obtain a technical term for replacing the text after conversion, and the conversion recognition layer replaces the text before conversion by the technical term after conversion to obtain an analytic text finally recognized by voice;

The additional corpus layer comprises a dictionary tree, a character matcher and a tree constructor, wherein the dictionary tree adopts a hash table data structure to store a state machine of corpus, each node in the dictionary tree comprises a private two-dimensional letter array and a target text, and the letter array is used for marking a matching path of the target text;

The tree constructor provides a corpus adding and deleting function and is used for constructing and debugging a dictionary tree state machine which accords with expectations;

The conversion text judgment model comprises a feature extraction module, a data marking module and a model training module, wherein the feature extraction module extracts features by taking texts output by the general model as keywords, the data marking module converts the extracted features into numerical vectors convenient for classification, the model training module uses a logistic regression algorithm to construct a classification model, and the classification model takes the numerical vectors converted by the data marking module as input and takes classification results as output.

2. The natural language processing based voice interactive data analysis system according to claim 1, wherein: the classification model is constructed using a logistic regression classification algorithm, the class labels are abstracted as 0 and 1, and the mapping is realized by the log-odds (logistic) function:

$$y = \frac{1}{1 + e^{-z}}, \qquad z = w^{T}x + b$$

Wherein x represents the numerical vector converted by the data marking module, z represents any real number, y represents the result of the logistic regression, w represents the weight vector, b represents the offset, and T denotes transposition, which turns w from a column vector into a row vector; the log-odds function compresses the output range from negative infinity to positive infinity into (0, 1), and the result y = 0.5 is taken as the demarcation point; when y ≥ 0.5, the result of the logistic regression is judged a positive example, the class label is 1, and the text is considered to need to be passed to the additional language layer for secondary conversion; when y < 0.5, the result of the logistic regression is judged a negative example, the class label is 0, and the text is considered not to need to be passed to the additional language layer for secondary conversion.

3. The natural language processing based voice interactive data analysis system according to claim 2, wherein: the keyword features extracted by the feature extraction module comprise word frequency, word length, part of speech, and the morphological changes of the preceding and following words.

4. The natural language processing based voice interactive data analysis system according to claim 3, wherein: the training data is divided into a training set, a verification set and a test set, wherein the training set is used for training the classification model, the verification set is used for adjusting parameters of the classification model, and the test set is used for evaluating performance of the classification model.

5. The natural language processing based voice interactive data analysis system according to claim 4, wherein: the system comprises a query feedback layer, wherein the query feedback layer is used for mapping the text finally parsed by the conversion recognition layer to a corresponding data query or analysis task and feeding the query and analysis results back to the user, and the feedback modes comprise voice broadcasting, graphical interface display and text output.

6. The natural language processing based voice interactive data analysis system according to claim 5, wherein: the system comprises an exception handling layer, wherein the exception handling layer is used for handling input instructions which cannot be understood by a system and providing an error handling and feedback mechanism.

7. The natural language processing based voice interactive data analysis system according to claim 6, wherein: the system comprises a privacy protection layer, wherein the privacy protection layer is used for encrypting the user's voice input data and the output results of data queries, ensuring the security of the data during transmission and storage.

8. The natural language processing based voice interactive data analysis system according to claim 7, wherein: the measures taken by the privacy protection layer also comprise identity authentication, authorization and data desensitization, and the user access of the system is controlled based on a token authentication mechanism.