CN110600012A - Fuzzy speech semantic recognition method and system for artificial intelligence learning - Google Patents
- Publication number: CN110600012A (application CN201910713034.8A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- G10L15/02 — Speech recognition; feature extraction for speech recognition; selection of recognition unit
- G10L15/063 — Speech recognition; creation of reference templates; training of speech recognition systems
Abstract
The invention provides a fuzzy speech semantic recognition method and system for artificial intelligence learning. Targeting the fuzzy speech present in user-dictated voice commands, the invention reconstructs the fuzzy speech into clear standard speech using a GAN network architecture, and then performs semantic conversion and recognition on the reconstructed standard speech. During GAN training, speech feature matching is used to map the input fuzzy speech to a broader, matched sample collection, and the GAN is trained on that sample collection.
Description
Technical Field
The application relates to the field of artificial intelligence control, in particular to a fuzzy speech semantic recognition method and system for artificial intelligence learning.
Background
As speech recognition and semantic conversion technologies have matured, people increasingly control service facilities with voice commands, and such applications are used more and more widely in smart buildings, smart communities and smart homes.
For example, people can dictate voice commands to control various services in a smart building, smart community or smart home: a user can say "please rise to the 15th floor" to the elevator of a smart building; say "please call XXX room", "please open the door, the door-opening password is xxxxxxxx" or "please lock the door" beside the access control system of a smart community; or say "please turn on the air conditioner" or "please turn on the ceiling light" near the central control panel of a smart home. The service facility collects the voice command signal and, after the necessary enhancement processing, converts and recognizes the voice command into semantic information; a control instruction in machine-code form is then generated from the semantic information, which is in natural-language character form, and the service facility executes the required work according to the control instruction. Compared with manual control via buttons or push-button control panels, voice commands give the user a more convenient experience and greater freedom. In particular, for users who lack the use of both hands or who are blind, or in situations where a control panel cannot be touched because of environmental obstacles or distance, voice control enhances the convenience and accessibility of smart buildings, smart communities and smart homes.
However, the current conversion of a voice command into semantic information, i.e., the recognition of natural-language characters from an acoustic signal, has a high probability of false conversion. Clear speech can be recognized well, but correct semantic conversion is especially difficult for fuzzy speech. Fuzzy speech arises from attenuation of the sound itself during transmission to the service facility, from interference by ambient noise, and from factors such as the user's own unclear pronunciation or accent; such a voice command cannot be directly recognized as correct semantic information, so the service facility cannot be controlled.
In the prior art, semantic recognition of fuzzy speech mainly relies on preprocessing, such as enhancement of the sound signal, combined with confidence evaluation; this cannot effectively solve the problem of accurate semantic recognition from fuzzy speech.
With the development of artificial intelligence, recognition models such as SVMs (support vector machines) and neural networks have been applied to semantic recognition of speech: feature quantities extracted from speech samples are used to train a recognition model, and the feature quantities of the speech to be recognized are then input into the model to obtain semantic information. However, if such a recognition model is used directly for semantic recognition of fuzzy speech, a problem arises: fuzzy speech varies in very rich ways, so its feature quantities are highly diverse, and fuzzy speech samples often lack representativeness; as a result, the artificial intelligence recognition model is insufficiently trained, and the trained model generalizes poorly to other fuzzy speech.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a fuzzy speech semantic recognition method and system for artificial intelligence learning. Targeting the fuzzy speech present in user-dictated voice commands, the invention reconstructs the fuzzy speech into clear standard speech using a GAN network architecture, and then performs semantic conversion and recognition on the reconstructed standard speech. During GAN training, speech feature matching is used to map the input fuzzy speech to a broader, matched sample collection, and the GAN is trained on that sample collection.
The invention provides a fuzzy speech semantic recognition method for artificial intelligence learning, which comprises the following steps:
step 1, acquiring a fuzzy speech signal input by a user, and extracting the high-dimensional characteristic quantity of the fuzzy speech signal;
step 2, determining the sample collection matched with the characteristics of the fuzzy speech signal according to the spectral envelope characteristic quantity of the fuzzy speech signal;
step 3, constructing a reconstruction model with a GAN architecture for reconstructing the fuzzy speech into standard speech, and training the reconstruction model with the matched sample collection;
step 4, constructing a converter for converting the fuzzy speech fundamental frequency into the standard speech fundamental frequency;
step 5, inputting the spectral envelope characteristic quantity of the user's fuzzy speech signal into the trained reconstruction model to obtain the spectral envelope characteristic quantity of the reconstructed standard speech from the generator of the reconstruction model, and inputting the fundamental frequency of the fuzzy speech into the converter to obtain the fundamental frequency of the reconstructed standard speech;
step 6, synthesizing the reconstructed standard speech from the spectral envelope characteristic quantity and fundamental frequency obtained above;
and step 7, recognizing semantic information from the reconstructed standard speech.
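The seven steps above form a pipeline from fuzzy speech to semantic information. The following Python sketch shows only how the components connect; every function name and body is an illustrative placeholder (an assumption, not the patent's implementation):

```python
# Hypothetical end-to-end pipeline for the seven steps above.
# All function bodies are illustrative stand-ins, not the patent's implementation.

def extract_spectral_envelope(signal):      # step 1: MFCC-style features
    return [sum(signal) / len(signal)]      # placeholder feature vector

def match_sample_collection(features, collections):  # step 2
    # pick the collection whose representative feature is closest
    return min(collections, key=lambda c: abs(c["rep"] - features[0]))

def reconstruct_envelope(features, model):  # steps 3 and 5: trained GAN generator
    return [model["gain"] * f for f in features]

def convert_f0(f0, model):                  # steps 4 and 5: fundamental frequency
    return model["gain"] * f0

def synthesize(envelope, f0):               # step 6: e.g. a WORLD-style vocoder
    return {"envelope": envelope, "f0": f0}

def recognize(speech):                      # step 7: semantic recognition
    return "semantic info for f0=%.1f" % speech["f0"]

signal, f0 = [0.2, 0.4, 0.6], 110.0
collections = [{"rep": 0.1}, {"rep": 0.5}]
model = {"gain": 2.0}

feats = extract_spectral_envelope(signal)
coll = match_sample_collection(feats, collections)   # training-data source for the GAN
env = reconstruct_envelope(feats, model)
f0_std = convert_f0(f0, model)
print(recognize(synthesize(env, f0_std)))  # prints "semantic info for f0=220.0"
```

The point of the sketch is the data flow: the sample collection chosen in step 2 is what the GAN of step 3 is trained on, while steps 5 and 6 only consume the trained generator and converter.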
Preferably, a plurality of sample collections are established in step 2; each speech sample comprises a fuzzy speech sample and a standard speech sample, and the similarity of the characteristic quantities of the fuzzy speech samples is within a preset similarity range. The spectral envelope characteristic quantity of the fuzzy speech signal extracted in step 1 is matched against the collection representative characteristic quantity of each sample collection, so as to select the sample collection matched with the spectral envelope characteristic quantity of the fuzzy speech signal.
It is further preferable that, in step 2, a sample collection has n speech samples, and the spectral envelope characteristic quantities of the corresponding fuzzy speech samples are $X_{1s}, X_{2s}, \ldots, X_{ns}$; each spectral envelope characteristic quantity is a d-dimensional feature vector, and together they form the feature quantity matrix of the sample collection, $X_S = \{X_{1s}, X_{2s}, \ldots, X_{ns}\}$. For the r-th of the d dimensions, the mean over the whole feature quantity matrix $X_S$ is computed and expressed as $\bar{x}_{s,r}$. Submatrices of $n_k$ characteristic quantities are then selected from $X_S$, submatrix k being denoted $X_S^k$, so that every $n_k$ feature vectors of $X_S$ form one submatrix, with c submatrices in total, i.e. $k = 1, 2, \ldots, c$. The mean of submatrix k in the r-th dimension is expressed as $\bar{x}_{s,r}^{\,k}$. The inter-class distance of the c submatrices is then calculated:

$$D_b = \sum_{k=1}^{c} \frac{n_k}{n} \sum_{r=1}^{d} \left( \bar{x}_{s,r}^{\,k} - \bar{x}_{s,r} \right)^2$$

and the intra-class distance of each of the c submatrices is calculated:

$$D_w^k = \frac{1}{n_k} \sum_{i=1}^{n_k} \sum_{r=1}^{d} \left( x_{i,s,r}^{\,k} - \bar{x}_{s,r}^{\,k} \right)^2$$

where $x_{i,s,r}^{\,k}$ is the value in dimension r of the i-th feature vector of $X_S^k$.

The inter-class to intra-class ratio is then calculated for each of the c submatrices:

$$\sigma_k = D_b / D_w^k$$

and the submatrix with the highest ratio $\sigma_k$ is determined as the collection representative characteristic quantity of the sample collection.
Preferably, the reconstruction model of the GAN architecture in step 3 comprises a generator G and a discriminator D; the generator reconstructs the spectral envelope characteristic quantity of the standard speech from the spectral envelope characteristic quantity of the fuzzy speech input into it, and the discriminator judges the authenticity of the spectral envelope characteristic quantity reconstructed by the generator.
Preferably, the loss function $I_G(G)$ of the generator G in step 3 is expressed as:

$$I_G(G) = L_{adv}(G) + \lambda_c L_c(G) + \lambda_{id} L_{id}(G)$$

where $L_{adv}(G)$ represents the adversarial loss of the generator G, $L_c(G)$ represents the cycle-consistency loss of the generator G, $\lambda_c$ is the regularization parameter of the cycle-consistency loss, $L_{id}(G)$ represents the feature-mapping loss of the generator G, and $\lambda_{id}$ is the regularization parameter of the feature-mapping loss.
Preferably, the loss function of the discriminator D in step 3 is expressed as:

$$I_D(D) = -\mathbb{E}_{x_s \sim P(x_s)}\left[\log D(x_s)\right] - \mathbb{E}_{x_t \sim P(x_t)}\left[\log\left(1 - D(G(x_t))\right)\right]$$

where $D(x_s)$ represents the discrimination value of the discriminator D for the spectral envelope characteristic quantity $x_s$ of a standard speech sample in the input sample collection, and $\mathbb{E}_{x_s \sim P(x_s)}[\cdot]$ represents the expectation over the probability distribution of the standard speech samples; $D(G(x_t))$ represents the discrimination value of the discriminator D for the spectral envelope characteristic quantity generated by the generator G from the fuzzy speech feature $x_t$, and $\mathbb{E}_{x_t \sim P(x_t)}[\cdot]$ represents the expectation over the probability distribution of the fuzzy speech features $x_t$.
Preferably, the fundamental frequency conversion function constructed in step 4 is:

$$\log f_G = \mu_G + \frac{\sigma_G}{\sigma_t}\left(\log f_t - \mu_t\right)$$

where $\mu_G$ and $\sigma_G$ are the mean and standard deviation, in the logarithmic domain, of the fundamental frequency of the standard speech generated by the generator; $\mu_t$ and $\sigma_t$ are the mean and standard deviation of the fuzzy speech fundamental frequency in the logarithmic domain; $f_t$ is the fundamental frequency of the fuzzy speech; and $f_G$ is the converted standard speech fundamental frequency.
Furthermore, the invention provides a fuzzy speech semantic recognition system for artificial intelligence learning, which comprises:
the fuzzy voice signal characteristic quantity extraction module is used for collecting a fuzzy voice signal input by a user and extracting high-dimensional characteristic quantity of the fuzzy voice signal;
the sample selection matching module is used for determining a sample selection matched with the characteristics of the fuzzy speech signal according to the spectral envelope characteristic quantity of the fuzzy speech signal;
the GAN reconstruction model building and training module is used for building a reconstruction model of a GAN framework for reconstructing the fuzzy speech into the standard speech, and training the reconstruction model by utilizing the sample collection;
the converter construction module is used for constructing a converter for converting the fuzzy voice fundamental frequency into the standard voice fundamental frequency;
the reconstruction conversion module is used for inputting the spectral envelope characteristic quantity of the fuzzy speech signal input by the user into the trained reconstruction model to obtain the spectral envelope characteristic quantity of the reconstructed standard speech output by the generator of the reconstruction model, and for inputting the fundamental frequency of the fuzzy speech into the converter to obtain the fundamental frequency of the reconstructed standard speech;
and the standard voice synthesis module synthesizes and reconstructs the standard voice according to the spectral envelope characteristic quantity and the fundamental frequency of the reconstructed standard voice.
And the semantic information recognition module is used for recognizing the semantic information by utilizing the reconstructed standard voice.
Preferably, the sample collection matching module has a plurality of sample collections; each speech sample comprises a fuzzy speech sample and a standard speech sample, and the similarity of the characteristic quantities of the fuzzy speech samples is within a preset similarity range. Based on the spectral envelope characteristic quantity of the fuzzy speech signal, matching is performed against the collection representative characteristic quantity of each sample collection, so as to select the sample collection matched with the spectral envelope characteristic quantity of the fuzzy speech signal.
It is further preferred that a sample collection in the sample collection matching module has n speech samples, and the spectral envelope characteristic quantities of the corresponding fuzzy speech samples are $X_{1s}, X_{2s}, \ldots, X_{ns}$; each spectral envelope characteristic quantity is a d-dimensional feature vector, and together they form the feature quantity matrix of the sample collection, $X_S = \{X_{1s}, X_{2s}, \ldots, X_{ns}\}$. For the r-th of the d dimensions, the mean over the whole feature quantity matrix $X_S$ is computed and expressed as $\bar{x}_{s,r}$. Submatrices of $n_k$ characteristic quantities are then selected from $X_S$, submatrix k being denoted $X_S^k$, so that every $n_k$ feature vectors of $X_S$ form one submatrix, with c submatrices in total, i.e. $k = 1, 2, \ldots, c$. The mean of submatrix k in the r-th dimension is expressed as $\bar{x}_{s,r}^{\,k}$. The inter-class distance of the c submatrices is then calculated:

$$D_b = \sum_{k=1}^{c} \frac{n_k}{n} \sum_{r=1}^{d} \left( \bar{x}_{s,r}^{\,k} - \bar{x}_{s,r} \right)^2$$

and the intra-class distance of each of the c submatrices is calculated:

$$D_w^k = \frac{1}{n_k} \sum_{i=1}^{n_k} \sum_{r=1}^{d} \left( x_{i,s,r}^{\,k} - \bar{x}_{s,r}^{\,k} \right)^2$$

where $x_{i,s,r}^{\,k}$ is the value in dimension r of the i-th feature vector of $X_S^k$.

The inter-class to intra-class ratio is then calculated for each of the c submatrices:

$$\sigma_k = D_b / D_w^k$$

and the submatrix with the highest ratio $\sigma_k$ is determined as the collection representative characteristic quantity of the sample collection.
Preferably, the GAN architecture reconstruction model constructed by the GAN reconstruction model construction and training module includes: a generator G and a discriminator D; the generator reconstructs the spectral envelope characteristic quantity of the standard voice according to the spectral envelope characteristic quantity of the fuzzy voice input into the generator; the discriminator is used for judging the authenticity of the spectral envelope characteristic quantity reconstructed by the generator.
Preferably, the loss function $I_G(G)$ of the generator G is expressed as:

$$I_G(G) = L_{adv}(G) + \lambda_c L_c(G) + \lambda_{id} L_{id}(G)$$

where $L_{adv}(G)$ represents the adversarial loss of the generator G, $L_c(G)$ represents the cycle-consistency loss of the generator G, $\lambda_c$ is the regularization parameter of the cycle-consistency loss, $L_{id}(G)$ represents the feature-mapping loss of the generator G, and $\lambda_{id}$ is the regularization parameter of the feature-mapping loss.
Preferably, the loss function of the discriminator D is expressed as:

$$I_D(D) = -\mathbb{E}_{x_s \sim P(x_s)}\left[\log D(x_s)\right] - \mathbb{E}_{x_t \sim P(x_t)}\left[\log\left(1 - D(G(x_t))\right)\right]$$

where $D(x_s)$ represents the discrimination value of the discriminator D for the spectral envelope characteristic quantity $x_s$ of a standard speech sample in the input sample collection, and $\mathbb{E}_{x_s \sim P(x_s)}[\cdot]$ represents the expectation over the probability distribution of the standard speech samples; $D(G(x_t))$ represents the discrimination value of the discriminator D for the spectral envelope characteristic quantity generated by the generator G from the fuzzy speech feature $x_t$, and $\mathbb{E}_{x_t \sim P(x_t)}[\cdot]$ represents the expectation over the probability distribution of the fuzzy speech features $x_t$.
Preferably, the fundamental frequency conversion function constructed by the converter construction module is:

$$\log f_G = \mu_G + \frac{\sigma_G}{\sigma_t}\left(\log f_t - \mu_t\right)$$

where $\mu_G$ and $\sigma_G$ are the mean and standard deviation, in the logarithmic domain, of the fundamental frequency of the standard speech generated by the generator; $\mu_t$ and $\sigma_t$ are the mean and standard deviation of the fuzzy speech fundamental frequency in the logarithmic domain; $f_t$ is the fundamental frequency of the fuzzy speech; and $f_G$ is the converted standard speech fundamental frequency.
Therefore, the invention provides a fuzzy speech semantic recognition method and system for artificial intelligence learning. Targeting the fuzzy speech present in user-dictated voice commands, the invention reconstructs the fuzzy speech into clear standard speech using a GAN network architecture, and then performs semantic conversion and recognition on the reconstructed standard speech. Because the input fuzzy speech is mapped, via speech feature matching, to a broader matched sample collection, and the GAN is trained on that collection, the GAN training is sufficient and well adapted to the feature distribution of the current fuzzy speech. This improves the accuracy and reliability of the reconstructed standard speech and significantly raises the accuracy of speech-to-semantic recognition; experimental verification shows that the correct conversion rate can reach more than 95.6%.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flowchart of a fuzzy speech semantic recognition method for artificial intelligence learning according to an embodiment of the present application;
fig. 2 is a structural diagram of a fuzzy speech semantic recognition system for artificial intelligence learning according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
As shown in FIG. 1, the invention provides a fuzzy speech semantic recognition method for artificial intelligence learning, which comprises the following steps:
step 1, acquiring a fuzzy voice signal input by a user, and extracting high-dimensional characteristic quantity of the fuzzy voice signal.
The fuzzy speech semantic recognition method for artificial intelligence learning of the invention can be applied to the voice control functions of service facilities in smart communities, smart buildings and smart homes. A user speaks a voice command to the service facility, which collects the speech signal with components such as a microphone and performs the necessary front-end enhancement processing, such as filtering, noise suppression and time-spectrum estimation, as well as windowing and framing of the speech signal; these belong to the prior art and are not described in detail. If the processed speech signal is clear speech, semantic information is recognized and converted directly; this is not an improvement point of the invention and is not specifically described here. The invention focuses on the recognition processing of the fuzzy speech signal after acquisition and enhancement.
In this step, the high-dimensional characteristic quantity of the fuzzy speech signal is extracted; specifically, the high-dimensional characteristic quantity is the spectral envelope feature of each fuzzy speech signal frame. The extraction process is: perform a short-time FFT on each fuzzy speech signal frame to obtain the spectrum of the frame; pass the spectrum through a mel filter bank to obtain the mel spectrum; then take the logarithm of the mel spectrum and apply the DCT (discrete cosine transform) to obtain the MFCC coefficients; finally, retain 12 to 16 MFCC coefficients as the spectral envelope characteristic quantity X_t of the fuzzy speech signal frame.
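The extraction chain just described (windowed frame, FFT, mel filter bank, logarithm, DCT, truncation) can be sketched in a few lines of numpy. This is a minimal illustration, not the patent's implementation; the sample rate, frame size, filter count and the choice of 13 coefficients are assumptions:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # triangular filters evenly spaced on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for j in range(l, c):
            fb[i - 1, j] = (j - l) / max(c - l, 1)
        for j in range(c, r):
            fb[i - 1, j] = (r - j) / max(r - c, 1)
    return fb

def frame_mfcc(frame, sr=16000, n_mfcc=13, n_filters=26):
    # short-time FFT -> power spectrum of one windowed frame
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    mel_spec = mel_filterbank(n_filters, len(frame), sr) @ spec
    log_mel = np.log(mel_spec + 1e-10)
    # DCT-II of the log-mel energies; keep the first n_mfcc coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi / n_filters * (n + 0.5)[None, :] * np.arange(n_mfcc)[:, None])
    return dct @ log_mel  # spectral envelope characteristic quantity X_t

frame = np.sin(2 * np.pi * 440 * np.arange(512) / 16000)  # one 512-sample frame
x_t = frame_mfcc(frame)
print(x_t.shape)  # (13,)
```

In practice this would run per frame over the windowed, framed signal produced by the front-end processing described above.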
And 2, determining a sample selection set matched with the characteristics of the fuzzy speech signal according to the spectral envelope characteristic quantity of the fuzzy speech signal.
In the subsequent steps, a GAN-based clear speech reconstruction model needs to be trained with a sample set of a certain capacity. Fuzzy speech presents rich diversity; if generic fuzzy speech samples were used, they would often be insufficiently representative and GAN training would be inadequate. The invention therefore establishes a plurality of sample collections, where each sample collection can contain about 1000 segments of speech samples, each speech sample comprises a fuzzy speech sample and a standard speech sample, and the similarity of the characteristic quantities of the fuzzy speech samples lies within a preset similarity range. In this step, the spectral envelope characteristic quantity of the fuzzy speech signal extracted in step 1 is matched against the collection representative characteristic quantity of each sample collection, and the sample collection matched with the spectral envelope characteristic quantity of the fuzzy speech signal is selected.
For a sample collection, assume that there are n segments of speech samples, and the spectral envelope characteristic quantities of the corresponding fuzzy speech samples are $X_{1s}, X_{2s}, \ldots, X_{ns}$; each spectral envelope characteristic quantity is a d-dimensional feature vector, and together they form the feature quantity matrix of the sample collection, $X_S = \{X_{1s}, X_{2s}, \ldots, X_{ns}\}$. For the r-th of the d dimensions, the mean over the whole feature quantity matrix $X_S$ is computed and expressed as $\bar{x}_{s,r}$. Submatrices of $n_k$ characteristic quantities are then selected from $X_S$, submatrix k being denoted $X_S^k$, so that every $n_k$ feature vectors of $X_S$ form one submatrix, with c submatrices in total, i.e. $k = 1, 2, \ldots, c$. The mean of submatrix k in the r-th dimension is expressed as $\bar{x}_{s,r}^{\,k}$. The inter-class distance of the c submatrices is then calculated:

$$D_b = \sum_{k=1}^{c} \frac{n_k}{n} \sum_{r=1}^{d} \left( \bar{x}_{s,r}^{\,k} - \bar{x}_{s,r} \right)^2$$

and the intra-class distance of each of the c submatrices is calculated:

$$D_w^k = \frac{1}{n_k} \sum_{i=1}^{n_k} \sum_{r=1}^{d} \left( x_{i,s,r}^{\,k} - \bar{x}_{s,r}^{\,k} \right)^2$$

where $x_{i,s,r}^{\,k}$ is the value in dimension r of the i-th feature vector of $X_S^k$.

The inter-class to intra-class ratio is then calculated for each of the c submatrices:

$$\sigma_k = D_b / D_w^k$$

and the submatrix with the highest ratio $\sigma_k$ is determined as the collection representative characteristic quantity of the sample collection.
Matching the spectral envelope characteristic quantity of the fuzzy speech signal against the collection representative characteristic quantity of each sample collection means computing the average vector distance between the spectral envelope characteristic quantity of the fuzzy speech signal and the characteristic quantities in the submatrix serving as the collection representative characteristic quantity, and selecting the sample collection with the minimum average vector distance; the sample collection matched with the spectral envelope characteristic quantity of the fuzzy speech signal is thereby selected.
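A numpy sketch of one plausible reading of the selection and matching rules above (the patent's original distance formulas are not reproduced in this text, so the exact expressions used here are assumptions):

```python
import numpy as np

def representative_submatrix(X_s, n_k):
    """Split the n x d feature matrix X_s into submatrices of n_k rows,
    then pick the one with the highest inter/intra-class ratio sigma."""
    n, d = X_s.shape
    global_mean = X_s.mean(axis=0)  # mean per dimension r over the whole matrix
    subs = [X_s[i:i + n_k] for i in range(0, n - n_k + 1, n_k)]
    # inter-class distance over all c submatrices (a single value, assumed form)
    D_b = sum(len(S) / n * np.sum((S.mean(axis=0) - global_mean) ** 2) for S in subs)
    best, best_sigma = None, -np.inf
    for S in subs:
        # intra-class distance of this submatrix (assumed form)
        D_w = np.mean(np.sum((S - S.mean(axis=0)) ** 2, axis=1))
        sigma = D_b / D_w if D_w > 0 else np.inf
        if sigma > best_sigma:
            best, best_sigma = S, sigma
    return best

def match_collection(x_t, representatives):
    """Pick the collection whose representative submatrix has the smallest
    average vector distance to the fuzzy-speech feature x_t."""
    dists = [np.mean(np.linalg.norm(R - x_t, axis=1)) for R in representatives]
    return int(np.argmin(dists))

rng = np.random.default_rng(0)
X_s = rng.normal(size=(12, 4))          # 12 samples, 4-dimensional features
rep = representative_submatrix(X_s, 3)  # submatrices of n_k = 3 rows
print(rep.shape)  # (3, 4)
```

The highest-sigma submatrix is the most compact one relative to the between-group spread, which is one way to read "collection representative characteristic quantity".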
And 3, constructing a reconstruction model of the GAN architecture for reconstructing the fuzzy speech into the standard speech, and training the reconstruction model by utilizing the sample collection.
The reconstruction model of the GAN architecture comprises a generator G and a discriminator D; the generator reconstructs the spectral envelope characteristic quantity of the standard speech from the spectral envelope characteristic quantity of the fuzzy speech input into it, and the discriminator judges the authenticity of the spectral envelope characteristic quantity reconstructed by the generator.
The generator adopts a two-dimensional convolutional neural network composed of an encoding network and a decoding network. The encoding network comprises 5 convolutional layers and the decoding network comprises 5 deconvolution layers; residual (ResNet) connections are established between the encoding and decoding networks, and normalization is applied after each convolutional layer. The discriminator uses a two-dimensional convolutional neural network comprising 5 convolutional layers, with normalization after each convolutional layer.
In the training process, for the fuzzy speech samples in the sample collection, the spectral envelope characteristic quantity X_t of the fuzzy speech samples is input into the generator, and the generator is trained to minimize its loss function; the generator outputs the spectral envelope characteristic quantity of the reconstructed standard speech.
The loss function $I_G(G)$ of the generator G is expressed as:

$$I_G(G) = L_{adv}(G) + \lambda_c L_c(G) + \lambda_{id} L_{id}(G)$$

where $L_{adv}(G)$ represents the adversarial loss of the generator G, $L_c(G)$ represents the cycle-consistency loss of the generator G, $\lambda_c$ is the regularization parameter of the cycle-consistency loss, $L_{id}(G)$ represents the feature-mapping loss of the generator G, and $\lambda_{id}$ is the regularization parameter of the feature-mapping loss.
In the training process, the spectral envelope characteristic quantity of the reconstructed standard speech and the spectral envelope characteristic quantities of the standard speech samples in the sample collection are input into the discriminator, and the discriminator is trained to minimize its loss function.
The loss function of the discriminator D is expressed as:

$$I_D(D) = -\mathbb{E}_{x_s \sim P(x_s)}\left[\log D(x_s)\right] - \mathbb{E}_{x_t \sim P(x_t)}\left[\log\left(1 - D(G(x_t))\right)\right]$$

where $D(x_s)$ represents the discrimination value of the discriminator D for the spectral envelope characteristic quantity $x_s$ of a standard speech sample in the input sample collection, and $\mathbb{E}_{x_s \sim P(x_s)}[\cdot]$ represents the expectation over the probability distribution of the standard speech samples; $D(G(x_t))$ represents the discrimination value of the discriminator D for the spectral envelope characteristic quantity generated by the generator G from the fuzzy speech feature $x_t$, and $\mathbb{E}_{x_t \sim P(x_t)}[\cdot]$ represents the expectation over the probability distribution of the fuzzy speech features $x_t$.
Through the above training, the loss functions of the generator and the discriminator are minimized, and after a preset number of iterations the trained GAN-architecture reconstruction model for reconstructing fuzzy speech into standard speech is obtained.
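The two loss functions can be illustrated numerically. The expressions below follow the reconstructed forms given above; the regularization weights and score values are illustrative assumptions, not values from the patent:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # -E[log D(x_s)] - E[log(1 - D(G(x_t)))], minimized by the discriminator
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

def generator_loss(d_fake, cycle_err, id_err, lam_c=10.0, lam_id=5.0):
    # adversarial term plus regularized cycle-consistency and feature-mapping terms;
    # lam_c and lam_id are assumed regularization parameters
    l_adv = -np.mean(np.log(d_fake))  # the generator wants D(G(x_t)) -> 1
    return l_adv + lam_c * cycle_err + lam_id * id_err

d_real = np.array([0.9, 0.8])  # discriminator scores on standard-speech features
d_fake = np.array([0.2, 0.3])  # discriminator scores on generator reconstructions
print(discriminator_loss(d_real, d_fake))
print(generator_loss(d_fake, cycle_err=0.05, id_err=0.02))
```

Note the opposing objectives: as the generator's reconstructions improve, `d_fake` rises, which lowers the generator loss and raises the discriminator loss, driving the adversarial training toward equilibrium.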
Step 4, constructing a converter for converting the fuzzy speech fundamental frequency into the standard speech fundamental frequency; the fundamental frequency conversion function is:

$$\log f_G = \mu_G + \frac{\sigma_G}{\sigma_t}\left(\log f_t - \mu_t\right)$$

where $\mu_G$ and $\sigma_G$ are the mean and standard deviation, in the logarithmic domain, of the fundamental frequency of the standard speech generated by the generator; $\mu_t$ and $\sigma_t$ are the mean and standard deviation of the fuzzy speech fundamental frequency in the logarithmic domain; $f_t$ is the fundamental frequency of the fuzzy speech; and $f_G$ is the converted standard speech fundamental frequency.
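This log-domain Gaussian normalization can be sketched directly; the statistics below are illustrative values, not measurements from the patent:

```python
import numpy as np

def convert_f0(f_t, mu_t, sigma_t, mu_g, sigma_g):
    """Map a fuzzy-speech fundamental frequency onto the standard-speech
    log-F0 distribution: log f_G = mu_G + (sigma_G / sigma_t) * (log f_t - mu_t)."""
    return np.exp(mu_g + (sigma_g / sigma_t) * (np.log(f_t) - mu_t))

# Illustrative log-domain statistics of the two fundamental-frequency distributions
mu_t, sigma_t = np.log(110.0), 0.20  # fuzzy speech
mu_g, sigma_g = np.log(220.0), 0.25  # standard speech

print(round(float(convert_f0(110.0, mu_t, sigma_t, mu_g, sigma_g)), 1))  # prints 220.0
```

By construction, the mean of the fuzzy distribution maps exactly onto the mean of the standard distribution, and deviations around the mean are rescaled by the ratio of the standard deviations.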
Step 5, inputting the spectral envelope characteristic quantity of the fuzzy speech signal input by the user into the trained reconstruction model to obtain the spectral envelope characteristic quantity of the reconstructed standard speech output by the generator of the reconstruction model, and inputting the fundamental frequency of the fuzzy speech into the converter to obtain the fundamental frequency of the reconstructed standard speech;
step 6, synthesizing and reconstructing the standard voice according to the spectral envelope characteristic quantity and the fundamental frequency of the reconstructed standard voice; specifically, the spectral envelope characteristic quantity and fundamental frequency of the reconstructed standard speech can be substituted into the existing speech synthesizer, such as the WORLD speech synthesizer, to obtain the synthesized reconstructed standard speech
And 7, recognizing semantic information by using the reconstructed standard voice.
Furthermore, as shown in fig. 2, the present invention provides a fuzzy speech semantic recognition system for artificial intelligence learning, comprising:
and the fuzzy voice signal characteristic quantity extraction module is used for collecting the fuzzy voice signal input by the user and extracting the high-dimensional characteristic quantity of the fuzzy voice signal.
The fuzzy speech semantic recognition system for artificial intelligence learning of the invention can be applied to the voice control functions of service facilities in smart communities, smart buildings and smart homes. A user speaks a voice command to the service facility, which collects the speech signal with components such as a microphone and performs the necessary front-end enhancement processing, such as filtering, noise suppression and time-spectrum estimation, as well as windowing and framing of the speech signal; these belong to the prior art and are not described in detail. If the processed speech signal is clear speech, semantic information is recognized and converted directly; this is not an improvement point of the invention and is not specifically described here.
The fuzzy speech signal characteristic quantity extraction module extracts the high-dimensional characteristic quantity of the fuzzy speech signal; specifically, the high-dimensional characteristic quantity is the spectral envelope feature of each fuzzy speech signal frame. The extraction process is: perform a short-time FFT on each fuzzy speech signal frame to obtain the spectrum of the frame; pass the spectrum through a mel filter bank to obtain the mel spectrum; then take the logarithm of the mel spectrum and apply the DCT (discrete cosine transform) to obtain the MFCC coefficients; finally, retain 12 to 16 MFCC coefficients as the spectral envelope characteristic quantity X_t of the fuzzy speech signal frame.
The sample collection matching module is used for determining the sample collection matched with the characteristics of the fuzzy speech signal according to the spectral envelope characteristic quantity of the fuzzy speech signal.
The invention establishes a plurality of sample collections. Each sample collection may contain about 1,000 speech samples, each speech sample comprising a fuzzy speech sample and a standard speech sample, and the similarity of the characteristic quantities of the fuzzy speech samples within a collection lies within a preset similarity range. The sample collections may be stored in a sample library of the sample collection matching module.
For the extracted spectral envelope characteristic quantity of the fuzzy speech signal, the sample collection matching module matches it against the collection representative characteristic quantity of each sample collection, thereby selecting the sample collection matched with the spectral envelope characteristic quantity of the fuzzy speech signal.
For a sample collection with n speech samples, the sample collection matching module takes the spectral envelope characteristic quantity of the fuzzy speech sample corresponding to each speech sample, X_1s, X_2s, ..., X_ns, each a d-dimensional feature vector, and forms the characteristic quantity matrix of the sample collection X_S = {X_1s, X_2s, ..., X_ns}. For the r-th of the d dimensions, the mean over the whole characteristic quantity matrix X_S is denoted x̄_{s,r}. Sub-matrices are then selected from X_S: every n_k feature vectors of X_S form one sub-matrix, denoted X_S^k, giving c sub-matrices in total, i.e. k = 1, 2, ..., c. The mean of sub-matrix k in dimension r is denoted x̄_{s,r}^k. The inter-class distance of the c sub-matrices is then calculated:

D_b = Σ_{r=1}^{d} Σ_{k=1}^{c} (n_k / n) · (x̄_{s,r}^k − x̄_{s,r})²

and the intra-class distance of each of the c sub-matrices is calculated:

D_w = Σ_{r=1}^{d} (1 / n_k) · Σ (x_{s,r}^k − x̄_{s,r}^k)²

where x_{s,r}^k is the value in dimension r of each feature vector in X_S^k.

The inter-class to intra-class ratio is then calculated for each of the c sub-matrices:

σ = D_b / D_w

and the sub-matrix with the highest ratio σ is determined to be the collection representative characteristic quantity of the sample collection.
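The selection of the collection representative characteristic quantity can be sketched as follows. The partition of the feature matrix into sub-matrices (the index lists in `splits`) is supplied by the caller, and the small epsilon guarding the division is an implementation assumption.

```python
import numpy as np

def representative_submatrix(X_s, splits):
    """Pick the sub-matrix with the highest inter-class / intra-class
    distance ratio sigma = D_b / D_w.

    X_s    : (n, d) feature matrix of the collection's fuzzy speech samples
    splits : list of index arrays, one per sub-matrix k (k = 1..c)
    Returns (index of best sub-matrix, its sigma value).
    """
    n, _ = X_s.shape
    grand_mean = X_s.mean(axis=0)              # per-dimension mean over all n samples
    # Inter-class distance D_b over all c sub-matrices (shared by every candidate)
    D_b = sum(len(idx) / n * np.sum((X_s[idx].mean(axis=0) - grand_mean) ** 2)
              for idx in splits)
    best_k, best_sigma = None, -np.inf
    for k, idx in enumerate(splits):
        sub = X_s[idx]
        # Intra-class distance D_w of sub-matrix k (summed over all d dimensions)
        D_w = np.sum((sub - sub.mean(axis=0)) ** 2) / len(idx)
        sigma = D_b / (D_w + 1e-12)            # epsilon avoids division by zero
        if sigma > best_sigma:
            best_k, best_sigma = k, sigma
    return best_k, best_sigma
```

The tighter a sub-matrix clusters around its own mean, the smaller its D_w and the larger its σ, so the most compact, representative group of feature vectors wins.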
The sample collection matching module matches the spectral envelope characteristic quantity of the fuzzy speech signal against the collection representative characteristic quantity of each sample collection: it calculates the average vector distance between the spectral envelope characteristic quantity of the fuzzy speech signal and the characteristic quantities in the sub-matrix serving as the collection representative characteristic quantity, and selects the sample collection with the minimum average vector distance as the one matched with the spectral envelope characteristic quantity of the fuzzy speech signal.
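Matching an input utterance to a sample collection by minimum average vector distance can then be sketched as:

```python
import numpy as np

def match_collection(X_t, representatives):
    """Return the index of the sample collection whose representative
    sub-matrix has the smallest average vector distance to the input.

    X_t             : (m, d) spectral-envelope features of the input frames
    representatives : list of (n_k, d) representative sub-matrices,
                      one per sample collection
    """
    def avg_distance(rep):
        # mean Euclidean distance between every input frame and every
        # representative feature vector
        diffs = X_t[:, None, :] - rep[None, :, :]     # shape (m, n_k, d)
        return np.linalg.norm(diffs, axis=-1).mean()
    return int(np.argmin([avg_distance(rep) for rep in representatives]))
```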
The GAN reconstruction model building and training module is used for building a reconstruction model with a GAN architecture that reconstructs fuzzy speech into standard speech, and for training the reconstruction model with the matched sample collection.
The reconstruction model of the GAN architecture comprises a generator and a discriminator. The generator reconstructs the spectral envelope characteristic quantity of standard speech from the spectral envelope characteristic quantity of the fuzzy speech input to it; the discriminator judges the authenticity of the spectral envelope characteristic quantity reconstructed by the generator.
The generator adopts a two-dimensional convolutional neural network consisting of an encoding network and a decoding network. The encoding network comprises 5 convolutional layers and the decoding network comprises 5 deconvolution layers; ResNet-style residual connections are established between the encoding network and the decoding network, and normalization is applied after each convolutional layer. The discriminator is a two-dimensional convolutional neural network comprising 5 convolutional layers, with normalization after each convolutional layer.
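The text fixes only the layer counts and the residual/normalization structure; kernel sizes, strides, channel widths and the normalization type in the framework-neutral configuration sketch below are assumptions for illustration.

```python
# Illustrative layer configuration for the GAN reconstruction model.
# Only the 5+5 layer counts, the encoder-decoder residual links and the
# per-layer normalization come from the text; everything else is assumed.
GENERATOR = {
    "encoder": [   # 5 two-dimensional convolutional layers, each normalized
        {"conv2d": {"out_channels": c, "kernel": 3, "stride": 2},
         "norm": "instance"}
        for c in (32, 64, 128, 256, 512)
    ],
    "residual_connections": True,   # ResNet-style encoder-to-decoder links
    "decoder": [   # 5 deconvolution (transposed conv) layers, each normalized
        {"deconv2d": {"out_channels": c, "kernel": 3, "stride": 2},
         "norm": "instance"}
        for c in (256, 128, 64, 32, 1)
    ],
}
DISCRIMINATOR = {
    "layers": [    # 5 two-dimensional convolutional layers, each normalized
        {"conv2d": {"out_channels": c, "kernel": 3, "stride": 2},
         "norm": "instance"}
        for c in (32, 64, 128, 256, 1)
    ],
}
```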
In the training process, for the fuzzy speech samples in the sample collection, the spectral envelope characteristic quantity X_t of each fuzzy speech sample is input to the generator, and the generator is trained to minimize its loss function; the generator outputs the spectral envelope characteristic quantity of the reconstructed standard speech.
The loss function L_G(G) of generator G is expressed as:

L_G(G) = L_adv(G) + λ_c · L_c(G) + λ_id · L_id(G)

where L_adv(G) is the adversarial loss of generator G, L_c(G) is the cycle-consistency loss of generator G, λ_c is the regularization parameter of the cycle-consistency loss, L_id(G) is the feature-mapping loss of generator G, and λ_id is the regularization parameter of the feature-mapping loss.
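A minimal numerical sketch of this composite generator loss, assuming L1 norms for the cycle-consistency and feature-mapping terms and illustrative λ values (the text fixes neither choice):

```python
import numpy as np

def generator_loss(d_fake, x_t, x_cyc, x_s, x_id, lam_c=10.0, lam_id=5.0):
    """L_G = L_adv + lam_c * L_c + lam_id * L_id  (lambda values assumed).

    d_fake : discriminator scores in (0, 1) for generated envelopes
    x_t    : input fuzzy-speech envelope features
    x_cyc  : features after a full fuzzy -> standard -> fuzzy cycle
    x_s    : standard-speech envelope features
    x_id   : generator output when fed the standard-speech features x_s
    """
    l_adv = -np.mean(np.log(d_fake + 1e-12))   # push D's score toward "real"
    l_cyc = np.mean(np.abs(x_cyc - x_t))       # cycle-consistency loss (L1)
    l_id = np.mean(np.abs(x_id - x_s))         # feature-mapping/identity loss (L1)
    return l_adv + lam_c * l_cyc + lam_id * l_id
```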
In the training process, the spectral envelope characteristic quantity of the reconstructed standard speech and the spectral envelope characteristic quantity of the standard speech samples in the sample collection are input to the discriminator, and the discriminator is trained to minimize its loss function.
The loss function of discriminator D is expressed as:

L_D(D) = −E_{x_s∼P(x_s)}[log D(x_s)] − E_{x_t∼P(x_t)}[log(1 − D(G(x_t)))]

where D(x_s) is the discrimination value assigned by discriminator D to the spectral envelope characteristic quantity of a standard speech sample from the sample collection, and E_{x_s∼P(x_s)} denotes the expectation over the probability distribution of the standard speech samples; D(G(x_t)) is the discrimination value assigned by discriminator D to the spectral envelope characteristic quantity of the standard speech generated by generator G from the fuzzy speech characteristic quantity x_t, and E_{x_t∼P(x_t)} denotes the expectation over the probability distribution of the fuzzy speech characteristic quantity x_t.
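This discriminator loss can be evaluated numerically as the standard binary cross-entropy over real and generated scores; the epsilon added for numerical stability is an implementation detail:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """L_D = -E[log D(x_s)] - E[log(1 - D(G(x_t)))]  (standard GAN form).

    d_real : D's scores in (0, 1) for real standard-speech envelopes
    d_fake : D's scores in (0, 1) for generator-reconstructed envelopes
    """
    return (-np.mean(np.log(d_real + 1e-12))          # reward real samples
            - np.mean(np.log(1.0 - d_fake + 1e-12)))  # penalize generated ones
```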
Through this training, the loss functions of the generator and the discriminator are minimized, and after a preset number of iterations the trained reconstruction model of the GAN architecture for reconstructing fuzzy speech into standard speech is obtained.
The converter construction module is used for constructing a converter that converts the fuzzy speech fundamental frequency into the standard speech fundamental frequency. The fundamental frequency transfer function is:

log f_G = (σ_G / σ_t) · (log f_t − μ_t) + μ_G

where μ_G and σ_G are the mean and standard deviation, in the log domain, of the fundamental frequency of the standard speech generated by the generator; μ_t and σ_t are the mean and standard deviation, in the log domain, of the fundamental frequency of the fuzzy speech; f_t is the fuzzy speech fundamental frequency; and f_G is the converted standard speech fundamental frequency.
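A sketch of this log-domain linear transfer function, interpreting σ as the log-domain standard deviation:

```python
import numpy as np

def convert_f0(f_t, mu_t, sigma_t, mu_g, sigma_g):
    """Log-domain linear fundamental-frequency conversion:
        log f_G = (sigma_G / sigma_t) * (log f_t - mu_t) + mu_G
    mu/sigma are the log-F0 mean and standard deviation of the fuzzy (t)
    and generated standard (G) speech; f_t is the fuzzy-speech F0 in Hz.
    """
    return np.exp((sigma_g / sigma_t) * (np.log(f_t) - mu_t) + mu_g)
```

For example, an input frame at the fuzzy-speech mean pitch is mapped exactly to the standard-speech mean pitch, and deviations are rescaled by the ratio of the two standard deviations.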
The reconstruction conversion module is used for inputting the spectral envelope characteristic quantity of the fuzzy speech signal input by the user into the trained reconstruction model to obtain the spectral envelope characteristic quantity of the reconstructed standard speech output by the generator of the reconstruction model, and for inputting the fundamental frequency of the fuzzy speech into the converter to obtain the fundamental frequency of the reconstructed standard speech.
The standard speech synthesis module synthesizes the reconstructed standard speech from the spectral envelope characteristic quantity and the fundamental frequency of the reconstructed standard speech.
The semantic information recognition module is used for recognizing semantic information from the reconstructed standard speech.
In summary, the invention provides a fuzzy speech semantic recognition method and system for artificial intelligence learning. For the fuzzy speech present in a user's spoken voice instruction, the invention reconstructs the fuzzy speech into clear standard speech using a GAN network architecture, and then performs the conversion and recognition of semantic information on the basis of the standard speech. By matching speech features, the input fuzzy speech is associated with a sample collection covering a wider range, and the GAN network is trained with that sample collection, so that the training is sufficient and well adapted to the feature distribution of the current fuzzy speech. This improves the accuracy and reliability of reconstructing the standard speech and markedly raises the accuracy of speech-to-semantic recognition; experimental verification shows a correct conversion rate above 95.6%.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.
Claims (10)
1. A fuzzy speech semantic recognition method for artificial intelligence learning comprises the following steps:
step 1, acquiring a fuzzy voice signal input by a user, and extracting high-dimensional characteristic quantity of the fuzzy voice signal;
step 2, determining a sample collection matched with the characteristics of the fuzzy speech signal according to the spectral envelope characteristic quantity of the fuzzy speech signal;
step 3, constructing a reconstruction model of a GAN framework for reconstructing the fuzzy speech into the standard speech, and training the reconstruction model by utilizing the sample collection;
step 4, constructing a converter for converting the fuzzy voice fundamental frequency into the standard voice fundamental frequency;
step 5, inputting the spectral envelope characteristic quantity of the fuzzy speech signal input by the user into the trained reconstruction model to obtain the spectral envelope characteristic quantity of the reconstructed standard speech output by the generator of the reconstruction model, and inputting the fundamental frequency of the fuzzy speech into the converter to obtain the fundamental frequency of the reconstructed standard speech;
step 6, synthesizing the reconstructed standard speech according to the spectral envelope characteristic quantity and the fundamental frequency of the reconstructed standard speech;
step 7, recognizing semantic information by using the reconstructed standard speech.
2. The fuzzy speech semantic recognition method according to claim 1, wherein a plurality of sample collections are established in step 2, each sample collection containing a plurality of speech samples; each speech sample comprises a fuzzy speech sample and a standard speech sample, and the similarity of the characteristic quantities of the fuzzy speech samples is within a preset similarity range; and the spectral envelope characteristic quantity of the fuzzy speech signal extracted in step 1 is matched against the collection representative characteristic quantity of each sample collection, thereby selecting the sample collection matched with the spectral envelope characteristic quantity of the fuzzy speech signal.
3. The fuzzy speech semantic recognition method according to claim 2, wherein in step 2 the sample collection has n speech samples, and the spectral envelope characteristic quantity of the fuzzy speech sample corresponding to each speech sample is X_1s, X_2s, ..., X_ns, each a d-dimensional feature vector, forming the characteristic quantity matrix of the sample collection X_S = {X_1s, X_2s, ..., X_ns}; for the r-th of the d dimensions, the mean over the whole characteristic quantity matrix X_S is denoted x̄_{s,r}; every n_k feature vectors of X_S form one sub-matrix, denoted X_S^k, giving c sub-matrices in total, i.e. k = 1, 2, ..., c; the mean of sub-matrix k in dimension r is denoted x̄_{s,r}^k; the inter-class distance of the c sub-matrices is then calculated:

D_b = Σ_{r=1}^{d} Σ_{k=1}^{c} (n_k / n) · (x̄_{s,r}^k − x̄_{s,r})²

and the intra-class distance of each of the c sub-matrices is calculated:

D_w = Σ_{r=1}^{d} (1 / n_k) · Σ (x_{s,r}^k − x̄_{s,r}^k)²

where x_{s,r}^k is the value in dimension r of each feature vector in X_S^k;

the inter-class to intra-class ratio of each of the c sub-matrices is calculated:

σ = D_b / D_w

and the sub-matrix with the highest ratio σ is determined to be the collection representative characteristic quantity of the sample collection.
4. The fuzzy speech semantic recognition method according to claim 1, wherein the reconstruction model of the GAN architecture in step 3 comprises: a generator G and a discriminator D; the generator reconstructs the spectral envelope characteristic quantity of standard speech from the spectral envelope characteristic quantity of the fuzzy speech input to it; the discriminator judges the authenticity of the spectral envelope characteristic quantity reconstructed by the generator.
5. The fuzzy speech semantic recognition method according to claim 4, wherein the loss function L_G(G) of the generator G in step 3 is expressed as:

L_G(G) = L_adv(G) + λ_c · L_c(G) + λ_id · L_id(G)

where L_adv(G) is the adversarial loss of generator G, L_c(G) is the cycle-consistency loss of generator G, λ_c is the regularization parameter of the cycle-consistency loss, L_id(G) is the feature-mapping loss of generator G, and λ_id is the regularization parameter of the feature-mapping loss.
6. The fuzzy speech semantic recognition method according to claim 4, wherein the loss function of the discriminator D in step 3 is expressed as:

L_D(D) = −E_{x_s∼P(x_s)}[log D(x_s)] − E_{x_t∼P(x_t)}[log(1 − D(G(x_t)))]

where D(x_s) is the discrimination value assigned by discriminator D to the spectral envelope characteristic quantity of a standard speech sample from the sample collection, and E_{x_s∼P(x_s)} denotes the expectation over the probability distribution of the standard speech samples; D(G(x_t)) is the discrimination value assigned by discriminator D to the spectral envelope characteristic quantity of the standard speech generated by generator G from the fuzzy speech characteristic quantity x_t, and E_{x_t∼P(x_t)} denotes the expectation over the probability distribution of the fuzzy speech characteristic quantity x_t.
7. The fuzzy speech semantic recognition method according to claim 1, wherein the fundamental frequency transfer function constructed in step 4 is:

log f_G = (σ_G / σ_t) · (log f_t − μ_t) + μ_G

where μ_G and σ_G are the mean and standard deviation, in the log domain, of the fundamental frequency of the standard speech generated by the generator; μ_t and σ_t are the mean and standard deviation, in the log domain, of the fundamental frequency of the fuzzy speech; f_t is the fuzzy speech fundamental frequency; and f_G is the converted standard speech fundamental frequency.
8. A fuzzy speech semantic recognition system for artificial intelligence learning, comprising:
the fuzzy voice signal characteristic quantity extraction module is used for collecting a fuzzy voice signal input by a user and extracting high-dimensional characteristic quantity of the fuzzy voice signal;
the sample collection matching module is used for determining the sample collection matched with the characteristics of the fuzzy speech signal according to the spectral envelope characteristic quantity of the fuzzy speech signal;
the GAN reconstruction model building and training module is used for building a reconstruction model of a GAN framework for reconstructing the fuzzy speech into the standard speech, and training the reconstruction model by utilizing the sample collection;
the converter construction module is used for constructing a converter for converting the fuzzy voice fundamental frequency into the standard voice fundamental frequency;
the reconstruction conversion module is used for inputting the spectral envelope characteristic quantity of the fuzzy speech signal input by the user into the trained reconstruction model to obtain the spectral envelope characteristic quantity of the reconstructed standard speech output by the generator of the reconstruction model, and for inputting the fundamental frequency of the fuzzy speech into the converter to obtain the fundamental frequency of the reconstructed standard speech;
the standard speech synthesis module is used for synthesizing the reconstructed standard speech according to the spectral envelope characteristic quantity and the fundamental frequency of the reconstructed standard speech; and
the semantic information recognition module is used for recognizing semantic information by using the reconstructed standard speech.
9. The system according to claim 8, wherein the sample collection matching module has a plurality of sample collections, each containing a plurality of speech samples; each speech sample comprises a fuzzy speech sample and a standard speech sample, and the similarity of the characteristic quantities of the fuzzy speech samples is within a preset similarity range; and the spectral envelope characteristic quantity of the fuzzy speech signal is matched against the collection representative characteristic quantity of each sample collection, thereby selecting the sample collection matched with the spectral envelope characteristic quantity of the fuzzy speech signal.
10. The system according to claim 9, wherein the sample collection in the sample collection matching module has n speech samples, and the spectral envelope characteristic quantity of the fuzzy speech sample corresponding to each speech sample is X_1s, X_2s, ..., X_ns, each a d-dimensional feature vector, forming the characteristic quantity matrix of the sample collection X_S = {X_1s, X_2s, ..., X_ns}; for the r-th of the d dimensions, the mean over the whole characteristic quantity matrix X_S is denoted x̄_{s,r}; every n_k feature vectors of X_S form one sub-matrix, denoted X_S^k, giving c sub-matrices in total, i.e. k = 1, 2, ..., c; the mean of sub-matrix k in dimension r is denoted x̄_{s,r}^k; the inter-class distance of the c sub-matrices is then calculated:

D_b = Σ_{r=1}^{d} Σ_{k=1}^{c} (n_k / n) · (x̄_{s,r}^k − x̄_{s,r})²

and the intra-class distance of each of the c sub-matrices is calculated:

D_w = Σ_{r=1}^{d} (1 / n_k) · Σ (x_{s,r}^k − x̄_{s,r}^k)²

where x_{s,r}^k is the value in dimension r of each feature vector in X_S^k;

the inter-class to intra-class ratio of each of the c sub-matrices is calculated:

σ = D_b / D_w

and the sub-matrix with the highest ratio σ is determined to be the collection representative characteristic quantity of the sample collection.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910713034.8A CN110600012B (en) | 2019-08-02 | 2019-08-02 | Fuzzy speech semantic recognition method and system for artificial intelligence learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110600012A true CN110600012A (en) | 2019-12-20 |
CN110600012B CN110600012B (en) | 2020-12-04 |
Family
ID=68853447
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910713034.8A Active CN110600012B (en) | 2019-08-02 | 2019-08-02 | Fuzzy speech semantic recognition method and system for artificial intelligence learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110600012B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113053360A (en) * | 2021-03-09 | 2021-06-29 | 南京师范大学 | High-precision software recognition method based on voice |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090271002A1 (en) * | 2008-04-29 | 2009-10-29 | David Asofsky | System and Method for Remotely Controlling Electronic Devices |
CN102982809A (en) * | 2012-12-11 | 2013-03-20 | 中国科学技术大学 | Conversion method for sound of speaker |
US20130191129A1 (en) * | 2012-01-19 | 2013-07-25 | International Business Machines Corporation | Information Processing Device, Large Vocabulary Continuous Speech Recognition Method, and Program |
CN106448684A (en) * | 2016-11-16 | 2017-02-22 | 北京大学深圳研究生院 | Deep-belief-network-characteristic-vector-based channel-robust voiceprint recognition system |
CN107945805A (en) * | 2017-12-19 | 2018-04-20 | 程海波 | A kind of intelligent across language voice identification method for transformation |
CN108766409A (en) * | 2018-05-25 | 2018-11-06 | 中国传媒大学 | A kind of opera synthetic method, device and computer readable storage medium |
US20190013012A1 (en) * | 2017-07-04 | 2019-01-10 | Minds Lab., Inc. | System and method for learning sentences |
CN109326283A (en) * | 2018-11-23 | 2019-02-12 | 南京邮电大学 | Multi-to-multi phonetics transfer method under non-parallel text condition based on text decoder |
CN109599091A (en) * | 2019-01-14 | 2019-04-09 | 南京邮电大学 | Multi-to-multi voice conversion method based on STARWGAN-GP and x vector |
CN110060691A (en) * | 2019-04-16 | 2019-07-26 | 南京邮电大学 | Multi-to-multi phonetics transfer method based on i vector sum VARSGAN |
Non-Patent Citations (2)
Title |
---|
Fan Zizhu: "Research on New Feature Extraction Algorithms", 31 December 2016 *
Han Zhiyan: "Research on Multimodal Emotion Recognition for Speech and Facial Expression Signals", 31 January 2017, Northeastern University Press *
Also Published As
Publication number | Publication date |
---|---|
CN110600012B (en) | 2020-12-04 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| TA01 | Transfer of patent application right | Effective date of registration: 20200914; Address after: 200232 floor 18, building 2, No. 277, Longlan Road, Xuhui District, Shanghai; Applicant after: LIGHT CONTROLS TESILIAN (SHANGHAI) INFORMATION TECHNOLOGY Co.,Ltd.; Address before: 100027 West Tower 11 floor, Kai Hao building, 8 Xinyuan South Road, Chaoyang District, Beijing; Applicant before: Terminus(Beijing) Technology Co.,Ltd.; Applicant before: LIGHT CONTROLS TESILIAN (SHANGHAI) INFORMATION TECHNOLOGY Co.,Ltd. |
| GR01 | Patent grant | |