CN116189681B - Intelligent voice interaction system and method - Google Patents

Intelligent voice interaction system and method

Info

Publication number: CN116189681B
Application number: CN202310486481.0A
Authority: CN (China)
Prior art keywords: sound signal, digital sound signal, user, tone, question
Other languages: Chinese (zh)
Other versions: CN116189681A
Inventors: 李广鹏, 周林娜
Assignee (current and original): Beijing Crystal Digital Technology Co., Ltd.
Legal status: Active (granted)

Events:
    • Application filed by Beijing Crystal Digital Technology Co., Ltd., claiming priority to CN202310486481.0A
    • Publication of CN116189681A
    • Application granted; publication of CN116189681B


Classifications

    All classifications fall under G PHYSICS; G10 MUSICAL INSTRUMENTS; ACOUSTICS; G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING:
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L 17/06 Speaker identification or verification: decision making techniques; pattern matching strategies
    • G10L 17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • G10L 2015/225 Feedback of the input speech

Abstract

The invention discloses an intelligent voice interaction system and method in the field of voice interaction, comprising a data acquisition module, a data processing module, a data analysis module, a data center, an execution module and a control center.

Description

Intelligent voice interaction system and method
Technical Field
The invention relates to the technical field of intelligent voice control, in particular to an intelligent voice interaction system and method.
Background
Speech is the most common way for humans to communicate, and it is also the way humans would most like to communicate with computers. Voice communication with computers has therefore become a research hotspot. With the development of technology, intelligent voice systems are increasingly applied across industries; in the exhibition setting, an intelligent voice guide is a device that gives voice-broadcast explanations of indoor exhibits so that visitors can understand them in depth.
An intelligent voice guide has a human-computer interaction function: it can record speech within a certain range, analyze its semantics and communicate. However, current intelligent voice guides lack accurate speech recognition capability in complex environments and are easily disturbed by outside interference, which leads to unclear speech and interfering sounds. This is especially true for voice guides in the exhibition field: because their working environment is noisy and varied, their speech input is very easily disturbed, which degrades the voice interaction function.
In addition, in the special scene of an exhibition, an intelligent voice guide has difficulty recognizing different users from their voice characteristics and cannot provide personalized communication services, so the interaction experience of users at exhibitions is poor.
Disclosure of Invention
In order to solve the above-mentioned shortcomings in the background art, the present invention aims to provide an intelligent voice interaction system and method.
The aim of the invention is achieved by the following technical scheme. In a first aspect, the invention provides an intelligent voice interaction system comprising a data acquisition module, a data processing module, a data analysis module, a data center, an execution module and a control center; the data center comprises a timbre database, a noise database, a general question-answer library and a user question-answer library.

The data acquisition module: collects analog sound signals and sends them to the data processing module for data processing.

The data processing module: converts the analog sound signal into a digital sound signal by analog-to-digital conversion and extracts features from the converted digital sound signal to obtain its characteristic parameters, which comprise the decibel, speed, pitch and timbre of the digital sound signal; it marks these characteristic parameters and sends them to the data analysis module for analysis.

The data analysis module: calculates a first determination parameter from the decibel, speed and pitch of the digital sound signal, sets a standard determination parameter, takes the first derivative of each, and obtains the determination difference value as the absolute difference of the two first derivatives. The determination difference value is compared with a preset difference threshold: if it is greater than or equal to the threshold, the digital sound signal of the collected sound is judged not to meet the control standard and is recorded in the noise database; if it is smaller than the threshold, the digital sound signal is judged to meet the control standard, the control center filters out the signals recorded in the noise database, and the timbre of the filtered digital sound signal is analyzed.

The timbre of the digital sound signal is then matched against the user timbres in the user timbre parameter set stored in the timbre database. If the match succeeds, the user's natural language is parsed with NLP from the digital sound signal, the control center traverses the historical question-answer records of the user question-answer library, information differences are eliminated according to the correlation between the NLP parsing result and the content of those records, a final language processing result is obtained, answer content is generated from it, and the execution module executes the interaction instruction. If the match fails, the user's natural language is parsed with NLP, the control center accesses the general question-answer library and answers from its data, the execution module executes the interaction instruction, a historical question-answer record is created for this user in the user question-answer library, and the question-answer content is entered into it.
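For orientation, the module layout named above can be pictured as a handful of plain data holders. The following sketch only fixes vocabulary for the later examples and is an illustrative assumption, not a structure prescribed by the patent.

```python
# A minimal sketch of the data center and its four stores (assumed layout).
from dataclasses import dataclass, field

@dataclass
class DataCenter:
    timbre_db: dict = field(default_factory=dict)   # user timbre parameter set Y_sbp
    noise_db: list = field(default_factory=list)    # rejected digital sound signals
    general_qa: object = None                       # general question-answer library
    user_qa: dict = field(default_factory=dict)     # per-user historical QA records
```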
Preferably, the data processing performed by the data processing module includes the following steps: convert the analog sound signal into a digital sound signal by analog-to-digital conversion; extract features from the converted digital sound signal to obtain its characteristic parameters, namely the decibel, speed, pitch and timbre of the digital sound signal; mark the characteristic parameters, denoting the decibel of the digital sound signal as F_by, its speed as S_dy, its pitch as G_dy and its timbre as Y_sy, where y is the acquisition index, y = 1, 2, 3, ..., n, and n is the total number of acquisitions; and send the decibel F_by, speed S_dy, pitch G_dy and timbre Y_sy of the digital sound signal to the data analysis module for data analysis.
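The following sketch illustrates how the four marked characteristic parameters might be computed from a digital sound signal. It is a minimal illustration, not the patent's implementation: the frame length, the pitch-search range and the 16-band spectral envelope used as a timbre descriptor are all assumptions.

```python
# A minimal sketch of extracting F_by, S_dy, G_dy, Y_sy from a digital signal.
import numpy as np

def extract_features(x: np.ndarray, sr: int = 16000) -> dict:
    """Return decibel F_b, speed S_d, pitch G_d and timbre Y_s for signal x."""
    # F_b: loudness as dBFS of the RMS amplitude
    rms = np.sqrt(np.mean(x ** 2)) + 1e-12
    f_b = 20.0 * np.log10(rms)

    # S_d: speaking-rate proxy = fraction of 25 ms frames that carry energy
    frame = int(0.025 * sr)
    frames = x[: len(x) // frame * frame].reshape(-1, frame)
    energy = np.sqrt(np.mean(frames ** 2, axis=1))
    s_d = float(np.mean(energy > 0.1 * energy.max()))

    # G_d: pitch estimate via autocorrelation over an assumed 60-400 Hz range
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = sr // 400, sr // 60
    g_d = sr / (lo + int(np.argmax(ac[lo:hi])))

    # Y_s: timbre descriptor = normalised 16-band spectral envelope
    spec = np.abs(np.fft.rfft(x))
    bands = np.array_split(spec, 16)
    y_s = np.array([b.mean() for b in bands])
    y_s = y_s / (np.linalg.norm(y_s) + 1e-12)
    return {"F_b": f_b, "S_d": s_d, "G_d": g_d, "Y_s": y_s}
```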
Preferably, the data analysis performed by the data analysis module includes the following steps. A first determination parameter P_dy is calculated using a preset formula that combines the decibel F_by, speed S_dy and pitch G_dy with F_b0, the standard sound decibel parameter, S_d0, the standard sound speed parameter, and G_d0, the standard sound pitch parameter, where α is the sound decibel influence parameter, β is the sound speed influence parameter, γ is the sound pitch influence parameter, and φ is a preset proportionality coefficient. From the calculated first determination parameter P_dy, its first derivative P_dy1 is obtained; a standard determination parameter P_db is set and differentiated to obtain its first derivative P_db1. The determination difference value Cz = |P_dy1 - P_db1| is then calculated and compared with the preset difference threshold Cz_0. If Cz is greater than or equal to Cz_0, the digital sound signal of the collected sound does not meet the control standard, and the noise database records the digital sound signal. If Cz is smaller than Cz_0, the collected sound meets the control standard; the control center filters out the digital sound signals recorded in the noise database and analyzes the timbre of the filtered digital sound signal. The user timbre parameter set Y_sbp stored in the timbre database is obtained through the data acquisition unit in the data analysis module, and the timbre Y_sy of the digital sound signal is matched against the user timbre parameters in Y_sbp. If the timbre Y_sy matches successfully, the user's natural language is parsed with NLP from the digital sound signal, the control center traverses the historical question-answer records of the user question-answer library, information differences are eliminated according to the correlation between the NLP parsing result and the content of those records, a final language processing result is obtained, answer content is generated from it for the interaction, and the execution module executes the interaction instruction. If the timbre Y_sy fails to match, the user's natural language is parsed with NLP from the digital sound signal, the control center accesses the general question-answer library and calls its data to answer, the execution module executes the interaction instruction, a historical question-answer record is created for this user in the user question-answer library, and the question-answer content is entered into it.
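A minimal sketch of this noise-gating decision follows. The exact expression for P_dy is not reproduced in the text above, so the weighted-deviation form used here, along with the numeric standards, influence parameters and threshold, is an assumption made only for illustration.

```python
# A minimal sketch of the P_dy / Cz gating logic (assumed formula and values).
import numpy as np

F_B0, S_D0, G_D0 = -20.0, 0.6, 180.0          # standard decibel / speed / pitch
ALPHA, BETA, GAMMA, PHI = 0.5, 0.3, 0.2, 1.0  # influence params + scale (assumed)
CZ0 = 0.15                                    # preset difference threshold

def first_determination(f_b: float, s_d: float, g_d: float) -> float:
    """Assumed form of P_dy: scaled, weighted deviations from the standards."""
    return PHI * (ALPHA * abs(f_b - F_B0) / abs(F_B0)
                  + BETA * abs(s_d - S_D0) / S_D0
                  + GAMMA * abs(g_d - G_D0) / G_D0)

def gate(p_dy_series, p_db_series) -> np.ndarray:
    """Compare first derivatives of the measured and standard parameter series."""
    p_dy1 = np.gradient(np.asarray(p_dy_series))  # first derivative of P_dy
    p_db1 = np.gradient(np.asarray(p_db_series))  # first derivative of P_db
    cz = np.abs(p_dy1 - p_db1)                    # determination difference Cz
    # True = meets the control standard; False = route to the noise database
    return cz < CZ0
```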
Preferably, the user timbre parameter set Y_sbp = {Y_sb1, Y_sb2, Y_sb3, ..., Y_sbt}, where p is the user number and t is the total number of users.
Preferably, the user timbre parameter set Y_sbp is acquired as follows: the voice information of a user is recorded through the data acquisition terminal in the control center, the voice information comprising the sound decibel, sound speed and sound pitch; the sound information is combined with a timbre mapping model to obtain and store the user timbre parameters, and all acquired user timbre parameters are integrated to form the user timbre parameter set; the timbre mapping model is trained on an artificial intelligence model.
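A minimal enrollment sketch under these definitions might look as follows; the timbre_model argument stands in for the trained timbre mapping model described next and is a hypothetical interface.

```python
# A minimal sketch of building the user timbre parameter set Y_sbp.
import numpy as np

user_timbre_set: dict = {}   # p -> user timbre parameter vector Y_sbp

def enroll_user(p: int, decibel: float, speed: float, pitch: float,
                timbre_model) -> None:
    """Record a user's sound information and store the mapped timbre params."""
    sound_info = np.array([decibel, speed, pitch], dtype=np.float32)
    user_timbre_set[p] = timbre_model(sound_info)
```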
Preferably, the timbre mapping model is trained on the artificial intelligence model as follows: standard training data, comprising sound information and user timbre parameters, are integrated and acquired through a server; the artificial intelligence model is trained on the standard training data to obtain and store the timbre mapping model. The artificial intelligence model comprises a deep convolutional neural network model and an RBF neural network model.
Preferably, the data acquisition module is configured to acquire the analog sound signal by using a sound pickup.
Preferably, the sound pickup is an analog sound pickup, and is composed of a microphone and an audio amplifying circuit.
In a second aspect, the invention also provides an intelligent voice interaction method, which includes the following steps:

obtaining an analog sound signal and performing analog-to-digital conversion on it to obtain a digital sound signal;

extracting features from the digital sound signal to obtain its characteristic parameters and marking them;

calculating a first determination parameter from the marked characteristic parameters, setting a standard determination parameter, taking the first derivative of each, and computing the determination difference value as the absolute difference of the two first derivatives;

comparing the determination difference value with a set difference threshold: if it is greater than or equal to the threshold, the digital sound signal of the collected sound is judged not to meet the control standard and is recorded in the noise database; if it is smaller than the threshold, the digital sound signal is judged to meet the control standard, the control center filters out the signals recorded in the noise database, and the timbre of the filtered digital sound signal is analyzed;

matching the timbre of the digital sound signal against the user timbres in the user timbre parameter set stored in the timbre database: if the match succeeds, the user's natural language is parsed with NLP, the control center traverses the historical question-answer records of the user question-answer library, information differences are eliminated according to the correlation between the parsing result and the content of those records, a final language processing result is obtained, answer content is generated from it for the interaction, and the execution module executes the interaction instruction; if the match fails, the user's natural language is parsed with NLP, the control center accesses the general question-answer library and calls its data to answer, the execution module executes the interaction instruction, a historical question-answer record is created for this user in the user question-answer library, and the question-answer content is entered into it.
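Put together, the method steps could be orchestrated as in the sketch below. It reuses the hypothetical helpers from the earlier sketches (extract_features, first_determination, gate) and the route_query helper sketched later in the detailed description; analog_to_digital and transcribe are likewise hypothetical placeholders, not functions named by the patent.

```python
# A minimal end-to-end sketch of the method steps (assumed helper functions).
def interact_once(analog_frames, sr, state):
    x = analog_to_digital(analog_frames)             # step 1: A/D conversion
    feats = extract_features(x, sr)                  # step 2: mark F_b, S_d, G_d, Y_s
    p_dy = first_determination(feats["F_b"], feats["S_d"], feats["G_d"])
    state.p_dy_history.append(p_dy)
    # steps 3-4: Cz test (assumes at least two samples in each history)
    ok = bool(gate(state.p_dy_history, state.p_db_history)[-1])
    if not ok:
        state.noise_db.append(x)                     # record noise, no reply
        return None
    # steps 5-6: timbre match, then answer from user or general QA library
    return route_query(feats["Y_s"], transcribe(x), state.user_timbre_set,
                       state.user_qa, state.general_qa)
```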
The invention has the following beneficial effects. In use, the intelligent voice interaction system collects analog sound signals, converts them to digital sound signals by analog-to-digital conversion, extracts features from the converted signals to obtain their characteristic parameters, and marks those parameters. A first determination parameter is calculated from the marked parameters, a standard determination parameter is set, the first derivative of each is taken, and the determination difference value is computed as the absolute difference of the two first derivatives. The difference is compared with the set threshold: at or above the threshold, the digital sound signal of the collected sound is judged not to meet the control standard and is recorded in the noise database; below the threshold, it is judged to meet the control standard, the control center filters out the signals recorded in the noise database, and the timbre of the filtered digital sound signal is analyzed. The timbre is then matched against the user timbres in the user timbre parameter set stored in the timbre database: on a successful match, the user's natural language is parsed with NLP, the control center traverses the historical question-answer records of the user question-answer library, information differences are eliminated via the correlation between the parsing result and those records, answer content is generated from the final language processing result for the interaction, and the execution module executes the interaction instruction; on a failed match, the user's natural language is parsed with NLP, the control center accesses the general question-answer library and calls its data to answer, the execution module executes the interaction instruction, and a historical question-answer record is created for the user, with the question-answer content entered into the user question-answer library.
The invention enables the intelligent voice device to distinguish effective speech from noisy environmental sound; when effective speech is detected, other environmental interference sounds are shielded, improving speech recognition accuracy. The invention can also identify the speaking user from the timbre-database comparison result and eliminate information differences via the correlation with the historical question-answer records in the user database, which avoids the poor interaction experience caused by unclear speech recognition; if no question-answer history exists, a user record is created, the natural language is parsed with NLP, and the general library is accessed for the answer.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to those skilled in the art that other drawings can be obtained according to these drawings without inventive effort.
Fig. 1 is a system architecture diagram of an intelligent voice interaction system according to an embodiment of the present invention.
Fig. 2 is a flowchart of an intelligent voice interaction method according to a second embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
The intelligent voice interaction system shown in fig. 1 comprises a data acquisition module, a data processing module, a data analysis module, a data center, an execution module and a control center, wherein the data center comprises a timbre database, a noise database, a general question-answer library and a user question-answer library.
And a data acquisition module: acquiring analog sound signals and sending the acquired analog sound signals to the data processing module for data processing.
And a data processing module: converting the analog sound signal into a digital sound signal by analog-to-digital conversion, extracting features from the converted digital sound signal to obtain its characteristic parameters, namely the decibel, speed, pitch and timbre of the digital sound signal, marking these characteristic parameters and sending them to the data analysis module for analysis.
And a data analysis module: calculating a first determination parameter from the decibel, speed and pitch of the digital sound signal, setting a standard determination parameter, taking the first derivative of each, and obtaining the determination difference value as the absolute difference of the two first derivatives.
The determination difference value is compared with a preset difference threshold; if it is greater than or equal to the threshold, the digital sound signal of the collected sound is judged not to meet the control standard, and the noise database records the digital sound signal.
If the determination difference value is smaller than the threshold, the digital sound signal of the collected sound is judged to meet the control standard; the control center filters out the digital sound signals recorded in the noise database and analyzes the timbre of the filtered digital sound signal.
The timbre of the digital sound signal is matched against the user timbres in the user timbre parameter set stored in the timbre database. If the match succeeds, the user's natural language is parsed with NLP (Natural Language Processing) from the digital sound signal; the control center traverses the historical question-answer records of the user question-answer library, eliminates information differences according to the correlation between the NLP parsing result and the content of those records, obtains a final language processing result, generates answer content from it for the interaction, and the execution module executes the interaction instruction. If the match fails, the user's natural language is parsed with NLP from the digital sound signal, the control center accesses the general question-answer library and calls its data to answer, the execution module executes the interaction instruction, a historical question-answer record is created for this user in the user question-answer library, and the question-answer content is recorded into it.
In the first embodiment of the invention, during use, analog sound signals are collected and converted into digital sound signals by analog-to-digital conversion; features are extracted from the converted digital sound signals to obtain their characteristic parameters, and the characteristic parameters are marked. A first determination parameter is calculated from the marked characteristic parameters, a standard determination parameter is set, the first derivative of each is taken, and the determination difference value is computed as the absolute difference of the two first derivatives. The difference is compared with the set threshold: if it is greater than or equal to the threshold, the digital sound signal of the collected sound is judged not to meet the control standard, and the noise database records it; if it is smaller than the threshold, the digital sound signal is judged to meet the control standard, and the control center filters out the signals recorded in the noise database and analyzes the timbre of the filtered digital sound signal.
The timbre of the digital sound signal is then matched against the user timbres in the user timbre parameter set stored in the timbre database: if the match succeeds, the user's natural language is parsed with NLP, the control center traverses the historical question-answer records of the user question-answer library, information differences are eliminated via the correlation between the parsing result and those records, a final language processing result is obtained, answer content is generated from it for the interaction, and the execution module executes the interaction instruction; if the match fails, the user's natural language is parsed with NLP, the control center accesses the general question-answer library and calls its data to answer, the execution module executes the interaction instruction, and a historical question-answer record is created for the user, with the question-answer content recorded into the user question-answer library. A sketch of this matching-and-routing step follows.
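The sketch below assumes a cosine-similarity timbre match with an arbitrary threshold and a hypothetical question-answer library interface (general_qa.answer); neither is specified by the patent.

```python
# A minimal sketch of timbre matching and question-answer routing.
import numpy as np

MATCH_THRESHOLD = 0.9   # assumed similarity threshold

def route_query(timbre: np.ndarray, query: str,
                user_timbre_set: dict, user_qa: dict, general_qa) -> str:
    # Find the enrolled user whose timbre vector is most similar
    best_p, best_sim = None, -1.0
    for p, y_sb in user_timbre_set.items():
        sim = float(np.dot(timbre, y_sb) /
                    (np.linalg.norm(timbre) * np.linalg.norm(y_sb) + 1e-12))
        if sim > best_sim:
            best_p, best_sim = p, sim
    if best_sim >= MATCH_THRESHOLD:
        # Match succeeded: disambiguate the parsed query against this user's
        # historical question-answer records, then answer and log.
        history = user_qa.setdefault(best_p, [])
        answer = general_qa.answer(query, context=history)
        history.append((query, answer))
    else:
        # Match failed: answer from the general library and start a new
        # history record for the so-far-unrecognised user.
        answer = general_qa.answer(query)
        user_qa[len(user_qa) + 1] = [(query, answer)]
    return answer
```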
The intelligent voice interaction system provided by the embodiment of the invention enables the intelligent voice device to distinguish effective speech from noisy environmental sound; when effective speech is detected, other environmental interference sounds are shielded, improving speech recognition accuracy. The system can also identify the speaking user from the timbre-database comparison result and eliminate information differences via the correlation with the historical question-answer records in the user database, avoiding the poor interaction experience caused by unclear speech recognition; if no question-answer history exists, a user record is created, the natural language is parsed with NLP, and the general library is accessed for the answer.
It should be further described that, in the first embodiment of the present invention, the data acquisition module acquires the analog sound signal by using a pickup, where the pickup is an analog pickup, and is composed of a microphone and an audio amplifying circuit.
A sound pickup is a sound sensing device that converts an analog audio signal into a digital signal through a digital signal processing system and performs the corresponding digital signal processing. An analog pickup amplifies the sound collected by the microphone with an ordinary analog circuit. Pickups come in three-wire and four-wire variants: in a three-wire pickup, red is generally the power-supply positive, white the audio positive, and black the shared signal and power negative; in a four-wire pickup, red is generally the power positive and white the audio positive, with separate audio-negative and power-negative wires.
After receiving the analog sound signal sent by the data acquisition module, the data processing module processes the data. Specifically, the data processing includes the following steps: convert the analog sound signal into a digital sound signal by analog-to-digital conversion and extract features from the converted digital sound signal to obtain its characteristic parameters, namely the decibel, speed, pitch and timbre of the digital sound signal; mark the characteristic parameters, denoting the decibel of the digital sound signal as F_by, its speed as S_dy, its pitch as G_dy and its timbre as Y_sy, where y is the acquisition index, y = 1, 2, 3, ..., n, and n is the total number of acquisitions.
It should be further explained that, in the intelligent voice interaction system provided in the first embodiment of the present invention, the decibel F_by, speed S_dy, pitch G_dy and timbre Y_sy of the digital sound signal are sent to the data analysis module for data analysis.
Among the characteristic parameters of the digital sound signal, the decibel represents the loudness of the sound; the timbre represents the waveform characteristics that distinguish one sound from another and is used to tell different human voices apart; the pitch represents how high the sound frequency is; and the speed indicates the length of the intervals between utterances.
The decibel F_by, timbre Y_sy, pitch G_dy and speed S_dy of the digital sound signal are then sent to the data analysis module, which performs data analysis after receiving them. Specifically, the analysis proceeds as follows: a first determination parameter P_dy is calculated using a preset formula that combines the decibel F_by, speed S_dy and pitch G_dy with F_b0, the standard sound decibel parameter, S_d0, the standard sound speed parameter, and G_d0, the standard sound pitch parameter, where α is the sound decibel influence parameter, β is the sound speed influence parameter, γ is the sound pitch influence parameter, and φ is a preset proportionality coefficient.

From the calculated first determination parameter P_dy, its first derivative P_dy1 is obtained; a standard determination parameter P_db is set and differentiated to obtain its first derivative P_db1. The determination difference value Cz = |P_dy1 - P_db1| is calculated and compared with the preset difference threshold Cz_0: if Cz is greater than or equal to Cz_0, the digital sound signal of the collected sound does not meet the control standard, and the noise database records it; if Cz is smaller than Cz_0, the collected sound meets the control standard, and the control center filters out the digital sound signals recorded in the noise database and analyzes the timbre of the filtered digital sound signal. The user timbre parameter set Y_sbp stored in the timbre database is obtained through the data acquisition unit in the data analysis module, and the timbre Y_sy of the digital sound signal is matched against the user timbre parameters in Y_sbp: if the timbre Y_sy matches successfully, the user's natural language is parsed with NLP from the digital sound signal, the control center traverses the historical question-answer records of the user question-answer library, information differences are eliminated according to the correlation between the parsing result and the content of those records, a final language processing result is obtained, answer content is generated from it for the interaction, and the execution module executes the interaction instruction; if the timbre Y_sy fails to match, the user's natural language is parsed with NLP, the control center accesses the general question-answer library and calls its data to answer, the execution module executes the interaction instruction, and a historical question-answer record is created for the user, with the question-answer content recorded into the user question-answer library.
It should be noted that the standard sound decibel, pitch and speed parameters are the optimal decibel, pitch and speed values for the control system as a whole, while the sound decibel, pitch and speed influence parameters are the three parameter values that weight the influence of decibel, pitch and speed.
It should be further explained that, in the intelligent voice interaction system provided in the first embodiment of the present invention, the user timbre parameter set Y_sbp = {Y_sb1, Y_sb2, Y_sb3, ..., Y_sbt}, where p is the user number and t is the total number of users.
The user timbre parameter set Y_sbp is acquired as follows: the voice information of the user is recorded through the data acquisition terminal in the control center, the voice information comprising the sound decibel, sound speed and sound pitch.
The sound information is combined with a timbre mapping model to obtain and store the user timbre parameters, and all acquired user timbre parameters are integrated to form the user timbre parameter set; the timbre mapping model is trained on an artificial intelligence model.
It should be further explained that the timbre mapping model is trained on the artificial intelligence model as follows: standard training data, comprising sound information and user timbre parameters, are integrated and acquired through a server; the artificial intelligence model is then trained on the standard training data to obtain and store the timbre mapping model. The artificial intelligence model comprises a deep convolutional neural network model and an RBF neural network model.
It will be appreciated that the range of physical characteristic parameters in the standard training data should be large enough; for example, gender should include male and female, and ages should be evenly distributed over the range of 1 to 120 years.
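A minimal training sketch under these definitions follows. A small fully connected network stands in for the deep convolutional / RBF models named above, and the feature dimensions and hyperparameters are illustrative assumptions only.

```python
# A minimal sketch of training a timbre mapping model on standard training
# data (sound information -> user timbre parameters).
import torch
import torch.nn as nn

def train_timbre_mapping(sound_info: torch.Tensor,      # (N, 3) features
                         timbre_params: torch.Tensor,   # (N, 16) targets
                         epochs: int = 200) -> nn.Module:
    model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                          nn.Linear(64, 64), nn.ReLU(),
                          nn.Linear(64, 16))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(sound_info), timbre_params)  # regression loss
        loss.backward()
        opt.step()
    return model
```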
It should be further noted that the deep convolutional neural network model is a feedforward neural network (Feedforward Neural Network) with a deep structure that includes convolution operations, and it is one of the representative algorithms of deep learning. A convolutional neural network has feature-learning (representation learning) capability and can classify input information in a translation-invariant way according to its hierarchical structure. Convolution is a linear operation in which a set of weights, arranged as a two-dimensional array called a filter, is multiplied with the input. If a filter is tuned to detect a particular feature type in the input, applying that filter repeatedly across the whole input image can reveal the feature anywhere in the image. The structure comprises an input layer: the input layer of a convolutional neural network can process multidimensional data. The input layer of a one-dimensional convolutional network receives a one-dimensional or two-dimensional array, where a one-dimensional array is usually time or spectrum samples and a two-dimensional array may include multiple channels; the input layer of a two-dimensional convolutional network receives a two-dimensional or three-dimensional array; and the input layer of a three-dimensional convolutional network receives a four-dimensional array. Because convolutional networks are widely used in computer vision, many studies assume three-dimensional input data, i.e. two-dimensional pixel grids plus RGB channels, when introducing the structure. As with other neural network algorithms, the input features of a convolutional network require normalization because learning uses gradient descent; specifically, before the learning data are fed into the network, the input data must be normalized along the channel or time/frequency dimension.
Hidden layer: the hidden layers of a convolutional neural network include three common structures, the convolutional layer, the pooling layer and the fully connected layer, and more modern algorithms may contain complex structures such as Inception modules and residual blocks. In the common architecture, the convolutional and pooling layers are specific to convolutional networks. The convolution kernels in a convolutional layer contain weight coefficients, whereas a pooling layer does not, so the literature may not count the pooling layer as a separate layer. Taking LeNet-5 as an example, the usual order in which the three common structures are built into the hidden layers is: input, convolutional layer, pooling layer, fully connected layer, output.
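The LeNet-style ordering just described (input, convolution, pooling, fully connected, output) can be sketched as a small 1-D convolutional network over audio feature frames; the channel counts, kernel sizes and class count below are illustrative assumptions, not values from the patent.

```python
# A minimal 1-D CNN sketch following the input -> conv -> pool -> fc ordering.
import torch
import torch.nn as nn

class SmallAudioCNN(nn.Module):
    def __init__(self, in_channels: int = 1, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 8, kernel_size=5),  # convolutional layer
            nn.ReLU(),
            nn.MaxPool1d(2),                           # pooling layer
            nn.Conv1d(8, 16, kernel_size=5),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.LazyLinear(n_classes)     # fully connected layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)          # (N, 16, L')
        x = torch.flatten(x, 1)       # flatten for the linear output layer
        return self.classifier(x)
```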
The RBF (Radial Basis Function) neural network model, also called the radial basis function neural network model, is a three-layer feedforward network: the first layer is the input layer composed of signal-source nodes; the second layer is the hidden layer, whose number of hidden units is determined by the needs of the problem and whose transfer function is the non-negative nonlinear RBF; and the third layer is the output layer, a linear combination of the hidden-layer neuron outputs. The basic idea of the RBF network is to use RBFs as the basis of the hidden units to construct the hidden-layer space, so that input vectors can be mapped directly into that space without weighted connections. Once the RBF centers are determined, this mapping is determined as well. The mapping from hidden-layer space to output space is linear, i.e. the network output is a linear weighted sum of the hidden-unit outputs, where the weights are the network's adjustable parameters. The function of the hidden layer is to map vectors from a low dimension to a high dimension, so that a problem that is linearly inseparable in the low dimension can become linearly separable in the high dimension; this is essentially the idea of kernel functions. In this way the network's input-to-output mapping is nonlinear while its output is linear in the adjustable parameters, so the network weights can be solved directly from a system of linear equations, which greatly speeds up learning and avoids the problem of local minima.
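The RBF network's defining property, that the linear output weights can be solved directly from a linear system rather than by gradient descent, is easy to show in a short sketch; the choice of centres (a subset of the data) and the width sigma below are illustrative assumptions.

```python
# A minimal RBF network sketch: Gaussian hidden units + least-squares output.
import numpy as np

class RBFNet:
    def __init__(self, centres: np.ndarray, sigma: float = 1.0):
        self.centres, self.sigma = centres, sigma

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Non-negative nonlinear RBF activations, one column per hidden unit
        d2 = ((X[:, None, :] - self.centres[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * self.sigma ** 2))

    def fit(self, X: np.ndarray, y: np.ndarray) -> "RBFNet":
        H = self._hidden(X)
        # Output is a linear combination of hidden outputs, so the weights
        # come straight from a linear least-squares solve, not iteration.
        self.w, *_ = np.linalg.lstsq(H, y, rcond=None)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self._hidden(X) @ self.w
```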
Example 2
The second embodiment of the present invention provides an intelligent voice interaction method, as shown in fig. 2, including the following steps: obtain an analog sound signal and perform analog-to-digital conversion on it to obtain a digital sound signal; extract features from the digital sound signal to obtain its characteristic parameters and mark them; calculate a first determination parameter from the marked characteristic parameters, set a standard determination parameter, take the first derivative of each, and compute the determination difference value as the absolute difference of the two first derivatives; compare the determination difference value with the set difference threshold, judging that the digital sound signal of the collected sound does not meet the control standard if the difference is at or above the threshold, in which case the noise database records the signal, and judging that it meets the control standard if the difference is below the threshold, in which case the control center filters out the digital sound signals recorded in the noise database and analyzes the timbre of the filtered digital sound signal; match the timbre of the digital sound signal against the user timbres in the user timbre parameter set stored in the timbre database: if the match succeeds, parse the user's natural language with NLP, let the control center traverse the historical question-answer records of the user question-answer library, eliminate information differences according to the correlation between the parsing result and the content of those records, obtain a final language processing result, generate answer content from it for the interaction, and let the execution module execute the interaction instruction; if the match fails, parse the user's natural language with NLP, let the control center access the general question-answer library and call its data to answer, let the execution module execute the interaction instruction, and create a historical question-answer record for the user, recording the question-answer content into the user question-answer library.
According to the intelligent voice interaction method provided by the second embodiment of the invention, the intelligent voice device can distinguish effective speech from noisy environmental sound; when effective speech is detected, other environmental interference sounds are shielded, improving speech recognition accuracy. The method can also identify the speaking user from the timbre-database comparison result and eliminate information differences via the correlation with the historical question-answer records in the user database, avoiding the poor interaction experience caused by unclear speech recognition; if no question-answer history exists, a user record is created, the natural language is parsed with NLP, and the general library is accessed for the answer. The method thus effectively recognizes whether a sound is effective speech, identifies the speaking user, keeps a history record, and optimizes the human-computer interaction experience.
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above formulas are all dimensionless formulas used for numerical calculation; they were obtained by acquiring a large amount of data and performing software simulation so as to approximate the actual situation as closely as possible, and the preset parameters and preset thresholds in the formulas are set by those skilled in the art according to the actual situation or obtained by simulation over a large amount of data.
Finally, it should be noted that: the above examples are only specific embodiments of the present invention for illustrating the technical solution of the present invention, but not for limiting the scope of the present invention, and although the present invention has been described in detail with reference to the foregoing examples, it will be understood by those skilled in the art that the present invention is not limited thereto: any person skilled in the art may modify or easily conceive of the technical solution described in the foregoing embodiments, or perform equivalent substitution of some of the technical features, while remaining within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (7)

1. The intelligent voice interaction system is characterized by comprising a data acquisition module, a data processing module, a data analysis module, a data center, an execution module and a control center;
the data center comprises a timbre database, a noise database, a general question-answer library and a user question-answer library;
the data acquisition module is used for: collecting analog sound signals, and sending the collected analog sound signals to the data processing module for data processing;
the data processing module: converting the analog sound signal into a digital sound signal by using analog-to-digital conversion, and extracting features from the converted digital sound signal to obtain characteristic parameters of the digital sound signal, wherein the characteristic parameters of the digital sound signal comprise the decibel, speed, pitch and timbre of the digital sound signal, marking the characteristic parameters of the digital sound signal, and sending them to the data analysis module for analysis;
the data analysis module: calculating a first determination parameter from the decibel, speed and pitch of the digital sound signal by utilizing its characteristic parameters, setting a standard determination parameter, performing first-order derivation on both the first determination parameter and the standard determination parameter, and taking the absolute difference of the two first derivatives to obtain a determination difference value;
comparing the determination difference value with a preset difference threshold, and judging that the digital sound signal of the collected sound does not meet the control standard if the determination difference value is greater than or equal to the difference threshold, the noise database recording the digital sound signal;
if the determination difference value is smaller than the difference threshold, judging that the digital sound signal of the collected sound meets the control standard, the control center filtering out the digital sound signals recorded by the noise database and analyzing the timbre of the filtered digital sound signal;
matching the timbre of the digital sound signal against the user timbres in the user timbre parameter set stored in the timbre database:
if the match succeeds, parsing the user's natural language with NLP from the digital sound signal, the control center traversing the historical question-answer records of the user question-answer library, eliminating information differences according to the correlation between the NLP parsing result and the content of the historical question-answer records of the user question-answer library, obtaining a final language processing result, and generating answer content from the final language processing result for the interaction, the execution module executing the interaction instruction;
if the match fails, parsing the user's natural language with NLP from the digital sound signal, the control center accessing the general question-answer library and calling its data to answer, the execution module executing the interaction instruction, creating a historical question-answer record for the user in the user question-answer library, and entering the question-answer content into the user question-answer library;
the process of the data processing module for data processing comprises the following steps:
converting the analog sound signal into a digital sound signal by using analog-to-digital conversion, and extracting features from the converted digital sound signal to obtain characteristic parameters of the digital sound signal, wherein the characteristic parameters comprise the decibel, speed, pitch and timbre of the digital sound signal;
marking the characteristic parameters of the digital sound signal, denoting the decibel of the digital sound signal as F_by, the speed of the digital sound signal as S_dy, the pitch of the digital sound signal as G_dy and the timbre of the digital sound signal as Y_sy, wherein y is the acquisition index, y = 1, 2, 3, ..., n, and n is the total number of acquisitions;

sending the decibel F_by, speed S_dy, pitch G_dy and timbre Y_sy of the digital sound signal to the data analysis module for data analysis;
the process of the data analysis module for data analysis comprises the following steps:
calculating a first determination parameter P_dy using a preset formula that combines the decibel F_by, speed S_dy and pitch G_dy with F_b0, the standard sound decibel parameter, S_d0, the standard sound speed parameter, and G_d0, the standard sound pitch parameter, wherein α is the sound decibel influence parameter, β is the sound speed influence parameter, γ is the sound pitch influence parameter, and φ is a preset proportionality coefficient;

obtaining, from the calculated first determination parameter P_dy, its first derivative P_dy1, setting a standard determination parameter P_db, and performing first-order derivation on the standard determination parameter P_db to obtain its first derivative P_db1;

calculating the determination difference value as the absolute difference of the first derivative P_dy1 of the first determination parameter and the first derivative P_db1 of the standard determination parameter, Cz = |P_dy1 - P_db1|, and comparing it with the preset difference threshold Cz_0: if Cz is greater than or equal to Cz_0, the digital sound signal of the collected sound does not meet the control standard, and the noise database records the digital sound signal;

if Cz is smaller than Cz_0, the collected sound meets the control standard, and the control center filters out the digital sound signals recorded by the noise database and analyzes the timbre of the filtered digital sound signal;
acquiring a user tone color parameter set Y stored in the tone color database through a data acquisition unit in the data analysis module sbp And the tone Y of the digital sound signal is calculated sy And the tone color parameter set Y of the user sbp Matching the parameters of the user tone color parameters in the digital voice signal, if the tone color Y of the digital voice signal is sy The matching is successful, the user natural language is analyzed by utilizing NLP according to the digital sound signal, the control center traverses the history question-answer record of the user question-answer library, the information difference is eliminated according to the correlation between the result of analyzing the user natural language by utilizing NLP and the content of the history question-answer record of the user question-answer library, the final language processing result is obtained, the answer content is generated according to the final language processing result to interact, and the execution module executes the interaction instruction;
if tone Y of digital sound signal sy And if matching fails, analyzing natural language of the user by utilizing NLP according to the digital sound signal, accessing the general question-answer library by the control center, calling data of the general question-answer library to answer, executing an interactive instruction by the execution module, generating a historical question-answer record of the user question-answer library of the user, and receiving and inputting the question-answer content into the user question-answer library.
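The grant text refers to a formula for the first decision parameter P_dy that is not reproduced here, so the Python sketch below is only an assumed, illustrative form: a weighted absolute deviation of F_by, S_dy, G_dy from the standard parameters F_b0, S_d0, G_d0, weighted by α, β, γ, with the first derivative approximated by a first-order difference over successive collections y. All constants and function names are placeholders, not values from the patent.

```python
import numpy as np

# Assumed standard parameters and influence weights (placeholders,
# not values from the patent).
F_B0, S_D0, G_D0 = 60.0, 4.5, 220.0   # standard decibel, speed, tone
ALPHA, BETA, GAMMA = 0.5, 0.3, 0.2    # influence coefficients

def first_decision_parameter(f_by, s_dy, g_dy):
    """Assumed form of P_dy: weighted deviation from the standard
    sound parameters. The actual patented formula is not reproduced
    in the grant text."""
    return (ALPHA * abs(f_by - F_B0)
            + BETA * abs(s_dy - S_D0)
            + GAMMA * abs(g_dy - G_D0))

def noise_gate(features, p_db1, cz0):
    """Flag each collection y as noise (True) or in-standard speech.

    features : list of (F_by, S_dy, G_dy) tuples, y = 1..n
    p_db1    : first derivative of the standard decision parameter
    cz0      : preset difference threshold Cz_0
    """
    p = np.array([first_decision_parameter(*f) for f in features])
    # First-order difference as a discrete stand-in for the derivative.
    p_dy1 = np.diff(p, prepend=p[0])
    cz = np.abs(p_dy1 - p_db1)
    # Cz >= Cz_0 -> not within the control standard -> record as noise.
    return cz >= cz0

# Example: three collections; the middle one jumps sharply in level.
samples = [(61.0, 4.4, 218.0), (92.0, 4.4, 219.0), (60.5, 4.6, 221.0)]
print(noise_gate(samples, p_db1=0.0, cz0=5.0))  # e.g. [False  True  True]
```

The structural point the sketch preserves is that the gate compares the rate of change of P_dy against that of a standard profile, rather than P_dy itself.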
2. The intelligent voice interaction system according to claim 1, wherein the user timbre parameter set Y_sbp = {Y_sb1, Y_sb2, Y_sb3, ..., Y_sbt}, where p is the user number and t is the total number of users.
3. The intelligent voice interaction system according to claim 2, wherein the user timbre parameter set Y_sbp is acquired as follows:
recording the user's sound information through a data acquisition terminal in the control center, the sound information comprising the sound decibel level, sound speed, and sound tone;
combining the sound information with a timbre mapping model to acquire and store the user timbre parameters, and integrating all acquired user timbre parameters into the user timbre parameter set, wherein the timbre mapping model is trained on the basis of an artificial intelligence model.
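A minimal sketch of this enrollment flow, assuming hypothetical SoundInfo and TimbreMapper types; the recording terminal and storage layer are stubbed out, and the mapper simply echoes the raw features instead of running a trained model.

```python
from dataclasses import dataclass

@dataclass
class SoundInfo:
    decibel: float
    speed: float
    tone: float

class TimbreMapper:
    """Stand-in for the trained timbre mapping model of claim 4."""
    def map(self, info: SoundInfo) -> list[float]:
        # A real model would output a learned timbre embedding;
        # here we just echo the raw features as a vector.
        return [info.decibel, info.speed, info.tone]

def enroll_users(recordings: dict[int, SoundInfo],
                 mapper: TimbreMapper) -> dict[int, list[float]]:
    """Build the user timbre parameter set Y_sb = {Y_sb1, ..., Y_sbt},
    keyed by user number p."""
    return {p: mapper.map(info) for p, info in recordings.items()}

# Two enrolled users with placeholder sound information.
timbre_db = enroll_users(
    {1: SoundInfo(58.0, 4.2, 210.0), 2: SoundInfo(63.0, 5.1, 180.0)},
    TimbreMapper(),
)
```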
4. The intelligent voice interaction system according to claim 3, wherein the timbre mapping model is trained on the basis of the artificial intelligence model as follows:
integrating and acquiring standard training data through a server, the standard training data comprising sound information and user timbre parameters;
training the artificial intelligence model on the standard training data to acquire and store the timbre mapping model, wherein the artificial intelligence model comprises a deep convolutional neural network model and an RBF neural network model.
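Claim 4 only names the model families. As one hedged illustration of the RBF option, the sketch below fits a tiny Gaussian-RBF network by least squares on synthetic (sound information → timbre parameter) pairs; the data, centre count, and kernel width are all assumptions, not the patented training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic standard training data: rows of (decibel, speed, tone);
# the target is a scalar "timbre parameter" (placeholder relationship).
X = rng.uniform([40, 3, 100], [80, 6, 300], size=(200, 3))
y = 0.01 * X[:, 0] + 0.2 * X[:, 1] + 0.002 * X[:, 2]

# RBF layer: Gaussian activations around fixed centres drawn from X.
centres = X[rng.choice(len(X), 20, replace=False)]
width = X.std()  # one shared kernel width, a crude but common choice

def rbf_features(x):
    d2 = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

# Output weights by least squares (the classic RBF training shortcut).
Phi = rbf_features(X)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

pred = rbf_features(X[:5]) @ w
print(np.round(pred - y[:5], 3))  # quick sanity check on the fit
```

Fixing the centres and widths and solving only for the output weights is what makes RBF training a linear least-squares problem, which is why it is often paired with heavier models such as deep CNNs.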
5. The intelligent voice interaction system according to claim 1, wherein the data acquisition module is configured to acquire the analog sound signal using a sound pickup.
6. The intelligent voice interaction system according to claim 5, wherein the sound pickup is an analog sound pickup comprising a microphone and an audio amplifier circuit.
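Claims 5 and 6 describe only the hardware pickup; the patent names no software stack. Purely as an illustration, the sketch below records from the default microphone using the third-party sounddevice library (an assumption), whose host audio interface performs the analog-to-digital conversion that yields the digital sound signal.

```python
import numpy as np
import sounddevice as sd  # third-party; pip install sounddevice

FS = 16_000  # sample rate in Hz, a common choice for speech

def capture(seconds: float = 2.0) -> np.ndarray:
    """Record from the default microphone; the audio interface's ADC
    performs the analog-to-digital conversion for us."""
    pcm = sd.rec(int(seconds * FS), samplerate=FS,
                 channels=1, dtype="int16")
    sd.wait()  # block until the recording is complete
    return pcm.ravel()

signal = capture()
# Crude level estimate from the mean absolute sample amplitude.
db = 20 * np.log10(np.abs(signal.astype(np.float64)).mean() + 1e-9)
```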
7. An intelligent voice interaction method is characterized by comprising the following steps:
obtaining an analog sound signal, and performing analog-to-digital conversion on the analog sound signal to obtain a digital sound signal;
extracting features of the digital sound signal to obtain its characteristic parameters, and marking the characteristic parameters of the digital sound signal;
calculating a first decision parameter from the marked characteristic parameters of the digital sound signal, setting a standard decision parameter, taking the first derivative of each, and calculating the absolute value of the difference between the first derivative of the first decision parameter and the first derivative of the standard decision parameter to obtain a decision difference value;
comparing the decision difference value with a preset difference threshold; if the decision difference value is greater than or equal to the difference threshold, the digital sound signal of the collected sound is judged not to meet the control standard, and a noise database records the digital sound signal;
if the decision difference value is less than the difference threshold, the digital sound signal of the collected sound is judged to meet the control standard; a control center filters out the digital sound signals recorded in the noise database and analyzes the timbre of the filtered digital sound signal;
matching the timbre of the digital sound signal against the user timbres in the user timbre parameter set stored in a timbre database:
if the matching succeeds, the user's natural language is parsed from the digital sound signal using NLP, the control center traverses the historical question-answer records of the user question-answer library, information differences are eliminated according to the correlation between the NLP parsing result and the content of those historical records to obtain a final language processing result, answer content is generated from the final language processing result for the interaction, and an execution module executes the interaction instruction;
if the matching fails, the user's natural language is parsed from the digital sound signal using NLP, the control center accesses a general question-answer library and retrieves its data to answer, the execution module executes the interaction instruction, a historical question-answer record of the user question-answer library is generated for the user, and the question-answer content is received and entered into the user question-answer library;
The process of the data processing module for data processing comprises the following steps:
converting the analog sound signal into a digital sound signal by analog-to-digital conversion, and performing feature extraction on the converted digital sound signal to obtain the characteristic parameters of the digital sound signal, wherein the characteristic parameters comprise the decibel level, speed, tone, and timbre of the digital sound signal;
marking the characteristic parameters of the digital sound signal: the decibel level of the digital sound signal is marked F_by, the speed S_dy, the tone G_dy, and the timbre Y_sy, where y is the collection index, y = 1, 2, 3, ..., n, and n is the total number of collections;
sending the decibel level F_by, the speed S_dy, the tone G_dy, and the timbre Y_sy of the digital sound signal to a data analysis module for data analysis;
the process of the data analysis module for data analysis comprises the following steps:
calculating a first decision parameter P_dy by a preset formula, wherein F_b0 is the standard sound decibel parameter, S_d0 is the standard sound speed parameter, and G_d0 is the standard sound tone parameter; α is the sound decibel influence parameter, β is the sound speed influence parameter, and γ is the sound tone influence parameter, each being a preset proportionality coefficient;
taking the first derivative of the calculated first decision parameter P_dy to obtain P_dy1, setting a standard decision parameter P_db, and taking the first derivative of P_db to obtain P_db1;
calculating the absolute value of the difference between the first derivative P_dy1 of the first decision parameter and the first derivative P_db1 of the standard decision parameter, Cz = |P_dy1 − P_db1|, and comparing Cz with a preset difference threshold Cz_0; if Cz ≥ Cz_0, the digital sound signal of the collected sound does not meet the control standard, and the noise database records the digital sound signal;
if Cz < Cz_0, the collected sound meets the control standard; the control center filters out the digital sound signals recorded in the noise database and analyzes the timbre of the filtered digital sound signal;
acquiring, through a data acquisition unit in the data analysis module, the user timbre parameter set Y_sbp stored in the timbre database, and matching the timbre Y_sy of the digital sound signal against the user timbre parameters in Y_sbp; if the timbre Y_sy of the digital sound signal matches successfully, the user's natural language is parsed from the digital sound signal using NLP, the control center traverses the historical question-answer records of the user question-answer library, information differences are eliminated according to the correlation between the NLP parsing result and the content of those historical records to obtain a final language processing result, answer content is generated from the final language processing result for the interaction, and the execution module executes the interaction instruction;
if the timbre Y_sy of the digital sound signal fails to match, the user's natural language is parsed from the digital sound signal using NLP, the control center accesses the general question-answer library and retrieves its data to answer, the execution module executes the interaction instruction, a historical question-answer record of the user question-answer library is generated for the user, and the question-answer content is received and entered into the user question-answer library (the sketch following this claim illustrates this routing).
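To make the two branches of the method concrete, here is one assumed shape for the answer-routing step, with the NLP parser, timbre matcher, and both question-answer libraries stubbed as plain Python objects; none of these interfaces are specified by the patent.

```python
def route_answer(digital_signal, timbre, timbre_db, user_qa, general_qa,
                 parse_nlp, match):
    """Route a parsed utterance to the personal or general QA path."""
    query = parse_nlp(digital_signal)     # NLP analysis of the utterance
    user_id = match(timbre, timbre_db)    # timbre match against Y_sb
    if user_id is not None:
        # Matched: reconcile the parse against this user's QA history.
        history = user_qa.get(user_id, [])
        answer = refine_with_history(query, history)
    else:
        # No match: fall back to the general question-answer library...
        answer = general_qa.get(query, "Sorry, I don't know that yet.")
        # ...and seed a personal history record for the new interaction.
        user_qa.setdefault("new_user", []).append((query, answer))
    return answer

def refine_with_history(query, history):
    # Placeholder for "eliminating information differences" via the
    # correlation between the parse and prior question-answer records.
    for q, a in reversed(history):
        if q in query or query in q:
            return a
    return f"(answer generated for: {query})"
```

The asymmetry mirrors the claims: a recognised timbre gets an answer refined against that user's own history, while an unrecognised speaker is served from the general library and a personal history is started for future turns.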
CN202310486481.0A 2023-05-04 2023-05-04 Intelligent voice interaction system and method Active CN116189681B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310486481.0A CN116189681B (en) 2023-05-04 2023-05-04 Intelligent voice interaction system and method


Publications (2)

Publication Number Publication Date
CN116189681A (en) 2023-05-30
CN116189681B (en) 2023-09-26

Family

ID=86442665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310486481.0A Active CN116189681B (en) 2023-05-04 2023-05-04 Intelligent voice interaction system and method

Country Status (1)

Country Link
CN (1) CN116189681B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116913277B (en) * 2023-09-06 2023-11-21 Beijing Huilang Times Technology Co., Ltd. Voice interaction service system based on artificial intelligence

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20030040761A (en) * 2001-11-16 2003-05-23 Inventec Corporation System and method that randomly makes question and answer sentences for enhancing user's foreign language speaking and listening abilities
JP2003152860A (en) * 2001-11-08 2003-05-23 Nec Saitama Ltd Voice detection circuit and telephone set
CN1511312A (en) * 2001-04-13 2004-07-07 Dolby Laboratories Licensing Corporation High quality time-scaling and pitch-scaling of audio signals
CN109712628A (en) * 2019-03-15 2019-05-03 Harbin University of Science and Technology Voice denoising method and speech recognition method based on RNN
WO2019174072A1 (en) * 2018-03-12 2019-09-19 Ping An Technology (Shenzhen) Co., Ltd. Intelligent robot based training method and apparatus, computer device and storage medium
CN111400469A (en) * 2020-03-12 2020-07-10 Fayu Technology (Beijing) Co., Ltd. Intelligent generation system and method for voice question answering

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI753576B (en) * 2020-09-21 2022-01-21 Askey Computer Corp. Model constructing method for audio recognition


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Noise-robust algorithm of speech features extraction for automatic speech recognition system; A. N. Yakhnev et al.; 2016 XIX IEEE International Conference on Soft Computing and Measurements (SCM); full text *
Security Control for Multi-Time-Scale CPSs Under DoS Attacks: An Improved Dynamic Event-Triggered Mechanism; L. Ma et al.; IEEE Transactions on Network Science and Engineering; 2022-02-23; full text *
Passive sonar target feature extraction based on linear prediction cepstrum; Liu Geming, Sun Chao, Liu Bing; Applied Acoustics (No. 5); full text *
Application of improved fast independent component analysis in a speech separation system; Chen Guoliang et al.; Computer Applications; full text *
Digital speech signal processing technology and its military applications; Cai Jingping; National Defense Technology (No. 9); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant