CN117423338B

CN117423338B - Digital human interaction dialogue method and system

Info

Publication number: CN117423338B
Application number: CN202311732973.XA
Authority: CN
Inventors: 赵策; 张玥; 雷媛媛; 孙岩; 潘亮亮; 刘岩
Original assignee: Zhuo Shi Future Tianjin Technology Co ltd
Current assignee: Zhuo Shi Future Tianjin Technology Co ltd
Priority date: 2023-12-18
Filing date: 2023-12-18
Publication date: 2024-03-08
Anticipated expiration: 2043-12-18
Also published as: CN117423338A

Abstract

The invention relates to the technical field of data processing, and in particular to a digital human interaction dialogue method and system. The method comprises the following steps:
- acquiring audio data;
- performing iterative self-organizing clustering on all data points to obtain a first clustering result;
- acquiring the neighborhood data points and reference data points of each data point, and acquiring the local significance degree of each data point;
- acquiring first data points, and acquiring the neighborhood distribution curves of a data point and a first data point;
- acquiring the to-be-calculated matching degree between the data point and the first data point, and from it the data points to be calculated;
- acquiring the overall significance degree of each data point;
- updating the cluster centers according to the overall significance degrees, and performing iterative self-organizing clustering with the updated cluster centers;
- obtaining the abnormality degree of each data point from the final iterative self-organizing clustering result, obtaining denoised audio data according to the abnormality degrees, and performing semantic recognition on the denoised audio data, thereby achieving accurate semantic recognition.

Description

Digital human interaction dialogue method and system

Technical Field

The invention relates to the technical field of data processing, in particular to a digital human interaction dialogue method and system.

Background

In digital human interaction dialogue, accurately acquiring the user's real-time audio data is an important precondition for semantic conversion of the audio and for intelligent interactive dialogue. However, user audio collected in real time usually contains environmental noise. This forms mixed audio data that hampers accurate processing, so the user audio data generally need to be denoised.

During denoising, the denoised audio data are obtained according to how strongly each data point of the audio data is affected by noise. Iterative self-organizing clustering (ISODATA) is an unsupervised clustering method and is suitable for quantifying the noise degree of each data point of audio data.

However, the user's voice and the noise in the collected mixed audio have different temporal distribution characteristics. Identical phonemes of the user's voice produce similar fluctuation patterns in the audio data, so under the influence of its phonemes the user's voice waveform shows a certain intra-segment aggregation. Noise, being random, instead increases local waveform disorder. When the audio data are clustered with the iterative self-organizing clustering algorithm, abnormal data points caused by noise often lead to wrong clustering and distort the correction of subsequent cluster centers. This makes it difficult to extract the real user voice data.

Disclosure of Invention

To solve these problems, the invention provides a digital human interaction dialogue method and system.

The digital human interaction dialogue method and system of the invention adopt the following technical scheme:

One embodiment of the invention provides a digital human interaction dialogue method comprising the following steps:

Acquiring audio data, and constructing a two-dimensional sample space from the acquired audio data, in which the abscissa is the moment and the ordinate is the amplitude value of the audio data;

Performing iterative self-organizing clustering on all data points and taking the first clustering result of the clustering process, which comprises a plurality of clusters;

Marking any one data point as the target data point; presetting a data point neighborhood range and acquiring the neighborhood data points of the target data point within this range; in the first clustering result, marking the other data points in the cluster containing the target data point as the reference data points of the target data point;

Obtaining the local significance degree of the target data point from the amplitude differences between the target data point and its reference data points, and from the amplitude distribution of the target data point and its neighborhood data points;

Recording any one data point in a cluster other than the cluster containing the target data point as a first data point, and acquiring the neighborhood distribution curves of the target data point and the first data point;

Obtaining the to-be-calculated matching degree between the target data point and the first data point from the difference between their local significance degrees and the difference between their neighborhood distribution curves; acquiring the set of data points to be calculated of the target data point according to these matching degrees;

Acquiring the overall significance degree of the target data point from two kinds of differences: the clustering differences between the target data point and the data points to be calculated in its set, and the differences between their local significance degrees;

Updating the cluster centers of the first clustering result according to the overall significance degrees of the data points, performing the next iteration of self-organizing clustering with the updated cluster centers, and continuing in the same manner to obtain the final iterative self-organizing clustering result;

Obtaining the abnormality degree of each data point from the final iterative self-organizing clustering result, obtaining the denoised audio data according to the abnormality degrees, and performing semantic recognition on the audio data.

Further, obtaining the local significance degree of the target data point comprises the following specific steps. The calculation uses the amplitude differences between the target data point and its reference data points, and the amplitude distribution of the target data point and its neighborhood data points.

Denote the target data point as the $i$-th data point. Its local significance degree $P_i$ is calculated from the following quantities:

- $A_i$: the amplitude value of the $i$-th data point;
- $\bar{A}_i$: the mean amplitude value of all reference data points of the $i$-th data point;
- $A_{i,j}$: the amplitude value of the $j$-th neighborhood data point of the $i$-th data point;
- $A_{\max}$: the maximum amplitude value over all data points;
- $\mathrm{Var}(\cdot)$: the variance function;
- $\max(\cdot)$: the maximum function.

Further, acquiring the neighborhood distribution curves of the target data point and the first data point comprises the following specific steps:

Connecting all neighborhood data points of the target data point in order from left to right, each to the adjacent one, to obtain the neighborhood distribution curve of the target data point;

Connecting all neighborhood data points of the first data point in order from left to right, each to the adjacent one, to obtain the neighborhood distribution curve of the first data point.

Further, obtaining the to-be-calculated matching degree between the target data point and the first data point comprises the following specific steps. The calculation uses the difference between their local significance degrees and the difference between their neighborhood distribution curves.

Performing dynamic time warping (DTW) matching between the neighborhood distribution curve of the target data point and that of the first data point, to obtain the DTW distance between the two curves;

Denote the target data point as $x_{a,i}$, the $i$-th data point of the $a$-th cluster, and the first data point as $x_{b,j}$, the $j$-th data point of the $b$-th cluster. The to-be-calculated matching degree $M_{(a,i),(b,j)}$ of $x_{b,j}$ with respect to $x_{a,i}$ is calculated from the following quantities:

- $L_{a,i}$: the neighborhood distribution curve of the $i$-th data point of the $a$-th cluster;
- $L_{b,j}$: the neighborhood distribution curve of the $j$-th data point of the $b$-th cluster;
- $\mathrm{DTW}(L_{a,i}, L_{b,j})$: the DTW distance between the two curves;
- $P_{a,i}$: the local significance degree of the $i$-th data point of the $a$-th cluster;
- $P_{b,j}$: the local significance degree of the $j$-th data point of the $b$-th cluster;
- $\exp(\cdot)$: the exponential function with the natural constant as its base.

Further, acquiring the set of data points to be calculated of the target data point according to the to-be-calculated matching degrees comprises the following specific steps:

Acquiring the to-be-calculated matching degrees between the target data point and all other first data points, recording them as a matching degree set, and normalizing the matching degrees in this set;

Presetting a to-be-calculated matching degree threshold. For any first data point, if the normalized to-be-calculated matching degree between the target data point and this first data point is greater than the threshold, the first data point is taken as a data point to be calculated of the target data point. All data points to be calculated of the target data point form its set of data points to be calculated.

Further, acquiring the overall significance degree of the target data point comprises the following specific steps. The calculation uses the clustering differences between the target data point and the data points to be calculated in its set, together with the differences between their local significance degrees.

Marking the cluster containing each data point to be calculated in the set as a cluster to be calculated;

For any data point to be calculated in the set of the target data point, acquiring the normalized mutual information (NMI) value between the cluster containing that data point and the cluster containing the target data point;

Acquiring the overall significance degree of the target data point according to these normalized mutual information values.

Further, acquiring the overall significance degree of the target data point comprises the following specific steps:

Denote the target data point as $x_{a,i}$, the $i$-th data point of the $a$-th cluster. Its overall significance degree $S_{a,i}$ is calculated from the following quantities:

- $K$: the number of clusters to be calculated;
- $\mathrm{NMI}_{a,k}$: the normalized mutual information value between the cluster containing $x_{a,i}$ and the $k$-th cluster to be calculated;
- $n_k$: the number of data points to be calculated in the $k$-th cluster to be calculated;
- $P_{k,t}$: the local significance degree of the $t$-th data point to be calculated in the $k$-th cluster to be calculated;
- $\bar{P}_{k,t}$: the mean local significance degree of all reference data points of that data point to be calculated;
- $\sigma^2_{k,t}$: the variance of the local significance degrees of that data point to be calculated together with all of its reference data points;
- $P_{a,i}$: the local significance degree of $x_{a,i}$;
- $\bar{P}_{a,i}$: the mean local significance degree of all reference data points of $x_{a,i}$;
- $\sigma^2_{a,i}$: the variance of the local significance degrees of $x_{a,i}$ together with all of its reference data points;
- $\exp(\cdot)$: the exponential function with the natural constant as its base.

Further, updating the cluster centers of the first clustering result of the iterative self-organizing clustering according to the overall significance degrees of the data points comprises the following specific steps:

Presetting an overall significance degree threshold. For any data point in any cluster of the first clustering result, if its overall significance degree is less than the threshold, the data point participates in updating the center of its cluster, yielding the new cluster center. Otherwise, it is excluded from the update.
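The center-update rule can be sketched as follows. The sketch assumes that a cluster center is the coordinate-wise mean of its participating (time, amplitude) points, as in ordinary ISODATA or k-means updates. The threshold value is not given here. Keeping the previous center when no point falls below the threshold is a hypothetical choice, since the text does not cover that case. The function and parameter names are illustrative.

```python
def update_center(points, overall_sig, threshold, old_center):
    """Recompute a cluster center using only members with low overall significance.

    points: list of (time, amplitude) data points in the cluster
    overall_sig: overall significance degree of each point, same order as points
    threshold: preset overall significance degree threshold
    old_center: center from the previous iteration, returned if no point qualifies
    """
    kept = [p for p, s in zip(points, overall_sig) if s < threshold]
    if not kept:  # assumption: fall back to the previous center
        return old_center
    n = len(kept)
    return (sum(p[0] for p in kept) / n, sum(p[1] for p in kept) / n)
```

In this sketch, a point with a large overall significance degree, such as the third point with significance 0.9 and a threshold of 0.5, simply does not pull the center toward itself.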

Further, obtaining the abnormality degree of each data point from the final iterative self-organizing clustering result, obtaining the denoised audio data according to the abnormality degrees, and performing semantic recognition on the audio data comprise the following specific steps:

Recording any data point in the final iterative self-organizing clustering result as an object data point, acquiring the center of the cluster containing it, and computing the Euclidean distance between the object data point and that cluster center;

Obtaining, for all data points in the final iterative self-organizing clustering result, the Euclidean distance to the center of their own cluster, and applying linear normalization to these distances. The normalized distance of each data point is taken as its abnormality degree;

Presetting an abnormality degree threshold; if the abnormality degree of the object data point is greater than this threshold, marking the object data point as an abnormal data point;

For each abnormal data point, removing it and using the mean amplitude value of all of its neighborhood data points as the new amplitude value at the corresponding moment. This yields new audio data, which are taken as the denoised audio data;
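The abnormality and repair steps above can be sketched in Python as follows. The sketch assumes that "linear normalization" means min-max scaling. It also assumes that the neighborhood of an abnormal point is the same window of up to n points on each side used when acquiring neighborhood data points. No threshold value is given in this part of the text. The function and parameter names are hypothetical.

```python
import math


def denoise(points, centers, labels, threshold, n):
    """Replace abnormal data points by the mean amplitude of their neighborhood.

    points: list of (time, amplitude) data points in time order
    centers: final cluster centers as (time, amplitude) tuples
    labels: cluster index of each data point in the final clustering result
    threshold: preset abnormality degree threshold
    n: neighborhood range (points taken on each side)
    """
    # Euclidean distance of each point to the center of its own cluster
    dists = [math.dist(p, centers[c]) for p, c in zip(points, labels)]
    lo, hi = min(dists), max(dists)
    span = hi - lo
    # Assumption: min-max scaling is the "linear normalization"
    abnormality = [(d - lo) / span if span > 0 else 0.0 for d in dists]
    amps = [a for _, a in points]
    cleaned = list(amps)
    for i, degree in enumerate(abnormality):
        if degree > threshold:
            # Neighbors within the window, excluding the abnormal point itself
            nb = [amps[j] for j in range(max(0, i - n), min(len(amps), i + n + 1)) if j != i]
            if nb:
                cleaned[i] = sum(nb) / len(nb)
    return cleaned, abnormality
```

Because the moment and the amplitude have different units, the Euclidean distance depends on how the two axes are scaled; the text does not specify any rescaling.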

Inputting the denoised audio data into a Transformer model to perform semantic recognition of the user's audio data.

The invention also provides a digital human interaction dialogue system. The system comprises a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the steps of any of the above digital human interaction dialogue methods.

The beneficial effects of the technical scheme of the invention are as follows. The invention acquires the audio data of the interaction dialogue between a user and a digital human, and obtains denoised audio data with an iterative self-organizing clustering algorithm. It first acquires the neighborhood data points and reference data points of each data point. It then obtains the local significance degree from the distribution difference between the target data point and its reference data points, combined with the distribution characteristics between the data point and its neighborhood data points. Next, it selects the data points to be calculated according to the local significance degrees and the neighborhood distribution curves, and obtains the overall significance degree of each data point. Finally, it denoises the audio data and performs semantic recognition on them.

When the abnormality degree of each data point is quantified from a conventional iterative self-organizing clustering result to remove noise, abnormal data points caused by noise are also used as a basis for updating the cluster centers. This introduces large errors into the abnormality degrees. The invention overcomes this defect, enables accurate denoising of the user's audio data, and improves the accuracy of semantic conversion and intelligent interaction dialogue in the subsequent digital human interaction dialogue system.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flow chart of the steps of a digital human interactive dialogue method of the present invention.

Detailed Description

In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following detailed description refers to the specific implementation, structure, characteristics and effects of a digital human interaction dialogue method and system according to the invention with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following specifically describes a specific scheme of a digital human interaction dialogue method and a system provided by the invention with reference to the accompanying drawings.

Referring to Fig. 1, which shows a flowchart of the steps of a digital human interaction dialogue method according to an embodiment of the invention, the method comprises the following steps:

S001, collecting audio data in the digital human interaction dialogue system.

It should be noted that this embodiment acquires the audio data of the interaction between a user and a digital human and analyzes each data point of the audio data. It uses the iterative self-organizing clustering algorithm to quantify the noise degree of each data point, and obtains denoised audio data for the subsequent semantic conversion and intelligent interaction of the digital human interaction dialogue system. Therefore, the voice of the user interacting with the digital human is sampled and converted into an electrical signal to obtain the user's audio data.

Specifically, the collected user voice is converted into an electrical signal by a sensor in the audio data acquisition module of the digital human interaction dialogue system. The sampling frequency is 44.1 kHz, the rate commonly used for CD audio quality. The acquisition spans multiple sound frames, each lasting 50 ms. These are empirical values in this embodiment and may be set by the implementer according to the specific implementation. Because the collected user voice contains environmental noise, the collected user audio data are mixed audio data.
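As a quick check of the numbers in this embodiment, the Python sketch below computes how many amplitude samples one 50 ms frame holds at 44.1 kHz. It also maps a sample sequence onto the (moment, amplitude) data points of the two-dimensional sample space used from S002 onward. The function names are hypothetical. Expressing the moment in seconds is an assumption; the text only says the abscissa is the moment.

```python
SAMPLE_RATE_HZ = 44_100  # sampling frequency stated in the embodiment
FRAME_MS = 50            # length of one sound frame stated in the embodiment


def samples_per_frame(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of amplitude samples that fall inside one sound frame."""
    return sample_rate_hz * frame_ms // 1000


def to_sample_space(amplitudes, sample_rate_hz=SAMPLE_RATE_HZ):
    """Map each sample to a (moment in seconds, amplitude) data point."""
    return [(n / sample_rate_hz, a) for n, a in enumerate(amplitudes)]


print(samples_per_frame())  # 2205
```

At 44.1 kHz, a 50 ms frame therefore holds 2205 data points of the sample space.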

S002, acquiring the neighborhood data points and reference data points of each data point. The local significance degree of the target data point is obtained from the distribution difference between it and its reference data points, combined with the distribution characteristics between it and its neighborhood data points.

It should be noted that, during the digital human interaction dialogue, the user's audio data are collected and the noise degree of each data point is quantified. Iterative self-organizing clustering, an unsupervised clustering method, is suitable for this quantification. Each data point of the audio data is clustered, and the Euclidean distance between each data point and the center of its cluster in the clustering result quantifies its abnormality degree, thereby characterizing its noise degree.

However, in each iteration the cluster centers are updated from the data points in the clusters. If a cluster contains several abnormal data points caused by noise, its center drifts in the next iteration and the clustering result is wrong. In the final result, data points of the audio data are then assigned to the wrong clusters, large errors arise in their abnormality degrees, and accurate denoising of the user's audio data becomes difficult.

It should be further noted that, in the user's voice, audio data generated by the same phoneme have very similar audio features. Influenced by the phonemes, the audio waveform shows a certain intra-segment aggregation. In each clustering pass of iterative self-organizing clustering, the distribution between each data point of a cluster and the other data points of that cluster in the previous clustering result is analyzed. Each cluster represents the similar features of the same phoneme in the audio data. The distribution between data points within a cluster represents the local significance degree of those data points. A larger local significance degree reflects a more abnormal distribution, i.e., a higher probability that the data point is affected by noise.

Noise has temporal distribution characteristics different from those of the user's voice, and its dynamic range is relatively small compared with the user's voice; in other words, its influence on the audio data is limited. Analyzing the local significance degree within a cluster is therefore not sufficient. The distribution relationship with data points in other clusters must also be considered, to obtain the overall significance degree of each data point. The cluster centers are then updated according to the overall significance degrees, and data points with a large overall significance degree do not participate in updating the cluster center.

Specifically, a two-dimensional sample space is constructed from the collected mixed audio data. The amplitude value at each moment of the mixed audio data becomes one data point, with the moment as the abscissa and the amplitude value as the ordinate; this gives the coordinates of every data point in the sample space. For any data point, recorded as the target data point, a data point neighborhood range $n$ is preset:
- the data points at the $n$ moments immediately before the moment of the target data point are its front neighborhood data points;
- the data points at the $n$ moments immediately after are its back neighborhood data points;
- together, the front and back neighborhood data points form the neighborhood data points of the target data point.

If fewer than $n$ data points exist in the front or back neighborhood of the target data point, the actual number of data points is used in the subsequent calculations.
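A minimal sketch of the neighborhood rule follows, assuming data points are indexed by their sampling order. Up to n points are taken on each side, and near the ends of the signal only the points that actually exist are used, as the text specifies. The function name and the index-based representation are illustrative assumptions.

```python
def neighborhood_indices(i, total, n):
    """Indices of the front and back neighborhood data points of data point i.

    i: index of the target data point
    total: number of data points in the signal
    n: preset neighborhood range (points on each side)
    Near the signal boundaries, fewer than n points may exist on one side;
    only the existing ones are returned.
    """
    front = list(range(max(0, i - n), i))           # up to n preceding moments
    back = list(range(i + 1, min(total, i + n + 1)))  # up to n following moments
    return front + back


print(neighborhood_indices(5, 10, 2))  # [3, 4, 6, 7]
```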

Further, iterative self-organizing clustering is performed on all data points, and the first clustering result of the clustering process, comprising a plurality of clusters, is taken. In the first clustering result, the other data points in the cluster containing the target data point are recorded as its reference data points. The local significance degree of the target data point is obtained from the distribution difference between the target data point and its reference data points, combined with the distribution characteristics between the target data point and its neighborhood data points. Denoting the target data point as the $i$-th data point, its local significance degree $P_i$ is calculated from the following quantities:

- $A_i$: the amplitude value of the $i$-th data point;
- $\bar{A}_i$: the mean amplitude value of all reference data points of the $i$-th data point;
- $A_{i,j}$: the amplitude value of the $j$-th neighborhood data point of the $i$-th data point;
- $A_{\max}$: the maximum amplitude value over all data points;
- $\max(\cdot)$: the maximum function;
- $\mathrm{Var}(|A_i - A_{i,j}|)$: the variance of the absolute amplitude differences between the $i$-th data point and its neighborhood data points;
- $\mathrm{Var}(\cdot)$: the variance function.

The term built from the difference between $A_i$ and $\bar{A}_i$ represents the distribution difference between the target data point and its reference data points, i.e., the amplitude significance of the target data point within its cluster. It serves as the reference value of the local significance degree: the larger the amplitude difference between the target data point and its reference data points, the larger the local significance degree.

$\mathrm{Var}(|A_i - A_{i,j}|)$ represents the temporal distribution characteristics of the $i$-th data point in the audio data. The more dispersed the amplitude differences between the target data point and its neighborhood data points, the larger the fluctuation of the target data point within its local neighborhood, and the more strongly it is affected by noise. In that case its amplitude is enhanced or weakened to different degrees relative to those of its neighborhood data points, so the reference value of the local significance degree is amplified accordingly, i.e., the target data point is more significant. A larger local significance degree indicates a greater likelihood that the target data point is affected by noise.
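This text does not reproduce the exact formula for the local significance degree, so the Python sketch below only illustrates the described ingredients under stated assumptions:
- the gap between the point's amplitude and the mean amplitude of its reference data points is scaled by the maximum amplitude (the largest magnitude, an assumption for signed audio);
- the population variance of the absolute amplitude differences to the neighborhood data points is computed;
- as a purely hypothetical choice, the reference value is amplified by a factor of (1 + variance).

It is not the patent's formula, and the function name is illustrative.

```python
from statistics import mean, pvariance


def local_significance(amps, i, cluster_members, neighbors):
    """Hypothetical local significance degree of data point i.

    amps: amplitude value of every data point, in time order
    cluster_members: indices of the data points in i's cluster (first clustering result)
    neighbors: indices of i's neighborhood data points
    """
    a_i = amps[i]
    # Assumption: use the largest magnitude so the scale is positive for signed audio.
    a_max = max(abs(a) for a in amps)
    refs = [j for j in cluster_members if j != i]  # reference data points
    ref_mean = mean(amps[j] for j in refs) if refs else a_i
    # Reference value: gap to the cluster's other points, scaled by the maximum amplitude.
    reference = abs(a_i - ref_mean) / a_max if a_max > 0 else 0.0
    # Dispersion of |A_i - A_ij| over the neighborhood (population variance).
    diffs = [abs(a_i - amps[j]) for j in neighbors]
    spread = pvariance(diffs) if len(diffs) > 1 else 0.0
    # Hypothetical combination: amplify the reference value by (1 + spread).
    return reference * (1.0 + spread)
```

With amplitudes `[0.0, 0.1, 0.9, 0.1, 0.0]` in a single cluster, the spike at index 2 scores far higher than the quiet point at index 1. This matches the intent that points affected by noise stand out.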

S003, obtaining the to-be-calculated matching degrees between data points from their local significance degrees and the distribution curves of their neighborhood data points, and obtaining the data points to be calculated of each data point. The overall significance degree of each data point is then acquired from two kinds of differences: the differences between its cluster and the clusters of its data points to be calculated, and the differences between their local significance degrees.

It should be noted that the local significance degree of the target data point characterizes how its distribution differs from that of its reference data points, which belong to the same phoneme. However, the influence of noise on audio data points is random within a local range, and the dynamic range of noise is small overall. By contrast, the user's voice has a large dynamic range over which its volume and intensity change, so audio of the same phoneme can fall into different clusters. Judging the influence of noise only by comparing the target data point with the reference data points in its own cluster is therefore insufficient. On the basis of its local significance degree, the distribution relationship between the target data point and data points in other clusters must also be considered.

Since the neighborhood sound information of the same phoneme of the user's voice is similar, the data points to be calculated of the target data point can be found in other clusters. The overall significance degree of the target data point, which characterizes its significance across the whole audio data, is obtained from its distribution relationship with these data points to be calculated.

Specifically, any one data point in the clusters other than the cluster where the target data point is located is recorded as a first data point, a reference data point and a neighborhood data point of the first data point are acquired, and the local significance degree of the first data point is acquired. Acquiring all neighborhood data points of the target data point, sequentially connecting all neighborhood data points of the target data point with adjacent neighborhood data points according to the sequence from left to right to obtain a neighborhood distribution curve of the target data point; similarly, obtaining a neighborhood distribution curve of the first data point; dtw matching is performed on the neighborhood distribution curve of the target data point and the neighborhood distribution curve of the first data point, so as to obtain dtw distance between the neighborhood distribution curve of the target data point and the neighborhood distribution curve of the first data point, wherein dtw matching is a known technology and is not repeated in the embodiment. According to the difference of the local significance degree of the target data point and the first data point and the difference between the neighborhood distribution curves, obtaining the matching degree of the target data point and the data point to be calculated of the first data point, wherein the target data point is recorded as the first data pointNo. 4 of clustering>Data points>The first data point is +.>No. 4 of clustering>Data points>Then->Is->Is +.>The calculation method of (1) is as follows:

wherein,indicate->No. 4 of clustering>A neighborhood distribution curve of data points; />Indicate->No. 4 of clustering>Personal dataA neighborhood distribution curve of points; />Representing neighborhood distribution curve +.>And neighborhood distribution curve->Dtw distance therebetween; />Indicate->No. 4 of clustering>Local significance level of data points; />Indicate->No. 4 of clustering>Local significance level of data points; />An exponential function based on natural constant is shown, and it is to be noted that +.>The model is only used for representing that the result output by the negative correlation and constraint model is in the interval of [0,1 ], and other models with the same purpose can be replaced in specific implementation, and the embodiment is only used for +_>The model is described as an example, without specific limitation, wherein +.>Representing the input of the model. It should be noted that, in the formula and mathematical model used in the present embodiment, there may be a case where the denominator is 0, but the present embodiment is easy to understand for the sake of description, so the case where the denominator is 0 is not handled; in the implementation, the molecular denominator is added by one at the same time, so that the non-implementation condition that the denominator is 0 can be avoided. 
Here, $\exp(-\mathrm{DTW}(C_{a,i}, C_{b,j}))$ represents the similarity of the neighborhood distributions of the target data point and the first data point. The greater this similarity, the more similar the neighborhood sound information of the two data points, and the higher the matching degree of the first data point as a data point to be calculated of the target data point. The difference between $S_{a,i}$ and $S_{b,j}$ represents the difference in local significance degree between the two data points. The smaller this difference, the more similar the influence of noise on the two data points, and the higher the matching degree of the first data point as a data point to be calculated.
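The neighborhood curves, DTW matching and matching degree described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the symmetric window `half_width` used to define the neighborhood, the helper names, and in particular the way `matching_degree` combines the two terms are hypothetical, because the formula itself is not preserved in this text. The assumed combination only follows the stated behaviour (higher curve similarity and smaller significance difference give a higher score) and the add-one convention for denominators.

```python
import numpy as np

def neighborhood_curve(amplitudes, idx, half_width=3):
    """HYPOTHETICAL neighborhood: up to half_width samples on each side of idx
    (the center sample excluded), kept in left-to-right time order."""
    n = len(amplitudes)
    left = range(max(0, idx - half_width), idx)
    right = range(idx + 1, min(n, idx + half_width + 1))
    return np.array([amplitudes[k] for k in (*left, *right)], dtype=float)

def dtw_distance(c1, c2):
    """Classic dynamic-programming DTW with absolute-difference local cost."""
    n, m = len(c1), len(c2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for p in range(1, n + 1):
        for q in range(1, m + 1):
            cost = abs(c1[p - 1] - c2[q - 1])
            # best of insertion, deletion and match steps
            acc[p, q] = cost + min(acc[p - 1, q], acc[p, q - 1], acc[p - 1, q - 1])
    return float(acc[n, m])

def matching_degree(curve_target, curve_first, s_target, s_first):
    """ASSUMED form: (exp(-DTW) + 1) / (|S_target - S_first| + 1)."""
    similarity = np.exp(-dtw_distance(curve_target, curve_first))
    return float((similarity + 1.0) / (abs(s_target - s_first) + 1.0))
```

Because this assumed score is not bounded by 1, it is only comparable across first data points after the normalization step that follows.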

Further, the matching degrees between the target data point and all first data points are obtained and recorded as the matching-degree set of data points to be calculated. This set is normalized, and the matching-degree threshold for data points to be calculated is preset to 0.58. For any first data point, if the normalized matching degree between the target data point and that first data point is greater than the threshold, the first data point is taken as a data point to be calculated of the target data point. It should be noted that a specific normalization function is adopted in this embodiment; both the normalization function and the matching-degree threshold may be determined by the implementer according to the specific implementation. All data points to be calculated of the target data point form its set of data points to be calculated, and the clusters containing the data points in this set are recorded as clusters to be calculated.
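The selection step above can be sketched as below. The 0.58 threshold comes from the text; min-max normalization is an assumption, since the source does not preserve the name of the normalization function, and the function name `select_points_to_calculate` is hypothetical.

```python
import numpy as np

def select_points_to_calculate(match_degrees, cluster_ids, threshold=0.58):
    """match_degrees[k]: matching degree between the target point and the k-th
    first data point; cluster_ids[k]: cluster containing that first data point.
    Min-max normalization is an ASSUMPTION (the source does not name it)."""
    m = np.asarray(match_degrees, dtype=float)
    span = m.max() - m.min()
    normalized = (m - m.min()) / span if span > 0 else np.zeros_like(m)
    # indices of first data points that become data points to be calculated
    selected = np.flatnonzero(normalized > threshold)
    # clusters that contain at least one selected point
    clusters_to_calculate = sorted({int(cluster_ids[k]) for k in selected})
    return selected.tolist(), clusters_to_calculate
```

For example, raw matching degrees `[0.1, 0.9, 0.5, 0.7]` normalize to `[0, 1, 0.5, 0.75]`, so only the second and fourth first data points pass the 0.58 threshold.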

Further, for any data point to be calculated in the set of data points to be calculated of the target data point, the NMI (Normalized Mutual Information) value between the cluster containing that data point and the cluster containing the target data point is obtained as a similarity measure between the two clusters. The overall significance degree of the target data point is then obtained from the differences in local significance degree between the target data point and the data points in its set of data points to be calculated. Recording the target data point as the $i$-th data point of the $a$-th cluster, its overall significance degree $Z_{a,i}$ is calculated as follows:

where $K$ denotes the number of clusters to be calculated; $\mathrm{NMI}_k$ denotes the normalized mutual information value between the cluster containing the $i$-th data point of the $a$-th cluster and the $k$-th cluster to be calculated; $n_k$ denotes the number of data points to be calculated in the $k$-th cluster to be calculated; $S_{k,t}$ denotes the local significance degree of the $t$-th data point to be calculated in the $k$-th cluster to be calculated; $\mu_{k,t}$ denotes the mean of the local significance degrees of all reference data points of that data point to be calculated; $\sigma^2_{k,t}$ denotes the variance of the local significance degrees of that data point to be calculated and all of its reference data points; $S_{a,i}$ denotes the local significance degree of the $i$-th data point of the $a$-th cluster; $\mu_{a,i}$ denotes the mean of the local significance degrees of all reference data points of that data point; $\sigma^2_{a,i}$ denotes the variance of the local significance degrees of that data point and all of its reference data points; and $\exp(\cdot)$ denotes the exponential function with base $e$. Note that the $\exp(-x)$ model is used only to express a negative correlation and to constrain the model output to the interval $(0, 1]$; other models with the same purpose may be substituted in a specific implementation, and this embodiment uses it merely as an example, without specific limitation, where $x$ denotes the input of the model. The term comparing $(S_{a,i}, \mu_{a,i}, \sigma^2_{a,i})$ with $(S_{k,t}, \mu_{k,t}, \sigma^2_{k,t})$ represents the difference in distribution characteristics between the target data point and the $t$-th data point to be calculated of the $k$-th cluster to be calculated. The larger the difference between the distribution characteristics of the target data point within its own cluster and those of the data point to be calculated within its cluster, the weaker the connection between the two points and the more strongly the target data point is affected by noise. These distribution-characteristic differences are weighted-averaged over all clusters to be calculated: the larger the NMI value between the cluster of the target data point and a cluster to be calculated, the more similar the two clusters, and the larger the weight of that cluster to be calculated in the weighted average.
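The NMI-weighted averaging described above might be sketched as follows. Because the formula itself is not preserved in this text, two pieces are explicitly assumed: the distribution characteristic of a point is taken to be its standardized local significance (S - mu) / sigma, and the weighted average is mapped through 1 - exp(-x) so that larger differences (stronger noise influence) give larger scores in [0, 1). NMI values are passed in as inputs rather than computed, and the function names are hypothetical.

```python
import numpy as np

def distribution_feature(s, mu, var, eps=1e-12):
    """ASSUMED distribution characteristic: standardized local significance,
    with eps guarding against a zero variance."""
    return (s - mu) / np.sqrt(var + eps)

def overall_significance(target, clusters, eps=1e-12):
    """target: (S, mu, var) of the target data point.
    clusters: list of (nmi, points), where points is a list of (S, mu, var)
    for the data points to be calculated in one cluster to be calculated."""
    z_target = distribution_feature(*target, eps=eps)
    weights, diffs = [], []
    for nmi, points in clusters:
        # mean feature difference between the target and this cluster's points
        d = np.mean([abs(z_target - distribution_feature(*p, eps=eps)) for p in points])
        weights.append(nmi)
        diffs.append(d)
    weights = np.asarray(weights, dtype=float)
    if weights.sum() <= 0:
        return 0.0  # hypothetical fallback when every NMI weight is zero
    weighted = float(np.dot(weights, diffs) / weights.sum())  # NMI-weighted average
    return 1.0 - float(np.exp(-weighted))  # ASSUMED monotone mapping into [0, 1)
```

Weighting by NMI means a cluster to be calculated that closely resembles the target's own cluster dominates the score, which is the behaviour the text describes.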

S004, obtaining denoised audio data according to the overall significance degrees of the data points, and performing semantic recognition.

Specifically, the overall significance degrees of all data points in the first clustering result are obtained, and an overall significance threshold of 0.68 is preset. In the first cluster-center update of the iterative self-organizing clustering, for any data point in any cluster of the first clustering result, the data point participates in the center update of its cluster only if its overall significance degree is smaller than the threshold. This yields new cluster centers, with which the next round of iterative self-organizing clustering is performed. The cluster-center update is a standard step of the iterative self-organizing clustering (ISODATA) algorithm and is a known technique, so it is not repeated in this embodiment. Subsequent iterations are carried out in the same way, giving the final clustering result.
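The modified center update can be sketched as below, assuming the usual ISODATA mean-of-members update in which only points with overall significance below 0.68 contribute. Keeping the previous center for a cluster that has no qualifying member is a hypothetical choice; the source does not specify this case.

```python
import numpy as np

def update_centers(points, labels, old_centers, overall_sig, threshold=0.68):
    """Recompute each cluster center as the mean of its members whose overall
    significance is below the threshold. A cluster with no qualifying member
    keeps its old center (HYPOTHETICAL fallback, not from the source)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    overall_sig = np.asarray(overall_sig, dtype=float)
    new_centers = np.array(old_centers, dtype=float, copy=True)
    for c in range(len(new_centers)):
        # members of cluster c that are not strongly affected by noise
        mask = (labels == c) & (overall_sig < threshold)
        if mask.any():
            new_centers[c] = points[mask].mean(axis=0)
    return new_centers
```

Excluding high-significance points keeps noise-dominated samples from dragging the centers toward them before the next clustering round.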

Further, any data point in the final iterative self-organizing clustering result is recorded as an object data point, the cluster center of the cluster containing it is obtained, and the Euclidean distance between the object data point and that cluster center is computed. The Euclidean distances between all data points in the final clustering result and the centers of their respective clusters are obtained and linearly normalized, and the normalized value is taken as the abnormality degree of each data point. An abnormality degree threshold of 0.58 is preset; if the abnormality degree of an object data point is greater than the threshold, the object data point is marked as an abnormal data point. Each abnormal data point is removed, and the mean amplitude of all of its neighborhood data points is used as the new amplitude at the corresponding time, yielding new audio data, which is taken as the denoised audio data. The denoised audio data is then input into a Transformer model to perform semantic recognition of the user audio data.
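A sketch of the abnormality scoring and amplitude repair described above. It assumes that each audio sample has a feature vector that was clustered, that "linear normalization" means min-max scaling, and that the neighborhood is a symmetric window of `half_width` samples (a hypothetical parameter; the neighborhood is defined in an earlier step not shown here). The Transformer-based semantic recognition stage is not sketched.

```python
import numpy as np

def denoise(amplitudes, features, labels, centers, threshold=0.58, half_width=2):
    """amplitudes: 1-D signal; features[k]: clustered feature vector of sample k;
    labels[k]: final cluster of sample k; centers[c]: center of cluster c."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    features = np.asarray(features, dtype=float).reshape(len(amplitudes), -1)
    centers = np.asarray(centers, dtype=float).reshape(-1, features.shape[1])
    # Euclidean distance of each sample to the center of its own cluster
    dist = np.linalg.norm(features - centers[np.asarray(labels)], axis=1)
    span = dist.max() - dist.min()
    abnormality = (dist - dist.min()) / span if span > 0 else np.zeros_like(dist)
    out = amplitudes.copy()
    n = len(amplitudes)
    for k in np.flatnonzero(abnormality > threshold):
        # HYPOTHETICAL neighborhood: window around k, excluding k itself
        nbrs = [q for q in range(max(0, k - half_width), min(n, k + half_width + 1)) if q != k]
        if nbrs:  # a lone sample has no neighbors to average
            out[k] = amplitudes[nbrs].mean()
    return out, abnormality
```

For a spike such as `[0, 0, 10, 0, 0]` clustered into one cluster with center 2.0, only the spike exceeds the threshold and is replaced by the mean of its neighbors, 0.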

The invention also provides a digital human interaction dialogue system which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the steps S001-S004 are realized when the processor executes the computer program.

The above description is only of the preferred embodiments of the present invention and is not intended to limit the invention, but any modifications, equivalent substitutions, improvements, etc. within the principles of the present invention should be included in the scope of the present invention.

Claims

1. A digital human interactive dialogue method, characterized in that the method comprises the following steps:

recording any one data point in the clusters except the cluster where the target data point is located as a first data point; acquiring a neighborhood distribution curve of a target data point and a neighborhood distribution curve of a first data point;

obtaining, according to the difference in local significance degree between the target data point and the first data point and the difference between their neighborhood distribution curves, the matching degree of the first data point as a data point to be calculated of the target data point, which comprises the following specific steps:

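recording the target data point as the $i$-th data point $x_{a,i}$ of the $a$-th cluster and the first data point as the $j$-th data point $x_{b,j}$ of the $b$-th cluster, the matching degree $P_{(a,i),(b,j)}$ of $x_{b,j}$ as a data point to be calculated of $x_{a,i}$ is calculated as follows: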

where $C_{a,i}$ denotes the neighborhood distribution curve of the $i$-th data point of the $a$-th cluster; $C_{b,j}$ denotes the neighborhood distribution curve of the $j$-th data point of the $b$-th cluster; $\mathrm{DTW}(C_{a,i}, C_{b,j})$ denotes the DTW distance between the neighborhood distribution curves $C_{a,i}$ and $C_{b,j}$; $S_{a,i}$ denotes the local significance degree of the $i$-th data point of the $a$-th cluster; $S_{b,j}$ denotes the local significance degree of the $j$-th data point of the $b$-th cluster; $\exp(\cdot)$ denotes the exponential function with base $e$;

obtaining the overall significance degree of the target data point according to the differences between the clusters containing the target data point and the data points in its set of data points to be calculated, and the differences in their local significance degrees, which comprises the following specific steps:

acquiring the overall significance degree of the target data point according to the normalized mutual information value between the cluster where the data point to be calculated is located and the cluster where the target data point is located;

the method for acquiring the overall significance degree of the target data point comprises the following specific steps:

recording the target data point as the $i$-th data point of the $a$-th cluster, the overall significance degree $Z_{a,i}$ of the target data point is calculated as follows:

where $K$ denotes the number of clusters to be calculated; $\mathrm{NMI}_k$ denotes the normalized mutual information value between the cluster containing the $i$-th data point of the $a$-th cluster and the $k$-th cluster to be calculated; $n_k$ denotes the number of data points to be calculated in the $k$-th cluster to be calculated; $S_{k,t}$ denotes the local significance degree of the $t$-th data point to be calculated in the $k$-th cluster to be calculated; $\mu_{k,t}$ denotes the mean of the local significance degrees of all reference data points of that data point to be calculated; $\sigma^2_{k,t}$ denotes the variance of the local significance degrees of that data point to be calculated and all of its reference data points; $S_{a,i}$ denotes the local significance degree of the $i$-th data point of the $a$-th cluster; $\mu_{a,i}$ denotes the mean of the local significance degrees of all reference data points of that data point; $\sigma^2_{a,i}$ denotes the variance of the local significance degrees of that data point and all of its reference data points; $\exp(\cdot)$ denotes the exponential function with base $e$;

2. The method of claim 1, wherein obtaining the local significance degree of the target data point according to the amplitude differences between the target data point and its reference data points and the amplitude distribution of the target data point and its neighborhood data points comprises the following specific steps:

recording the target data point as the $i$-th data point $x_i$, the local significance degree $S_i$ of $x_i$ is calculated as follows:

3. The method for digital human interactive dialogue according to claim 1, wherein the steps of obtaining the neighborhood distribution curve of the target data point and the neighborhood distribution curve of the first data point comprise the following specific steps:

4. The method for digital human interactive dialogue according to claim 1, wherein the step of obtaining the set of data points to be calculated of the target data point according to the matching degree of the data points to be calculated comprises the following specific steps:

5. The method for digital human interactive dialogue according to claim 1, wherein updating the cluster center of the first clustering result of iterative self-organizing clusters according to the overall significance level of the target data points comprises the following specific steps:

6. The method for digital human interactive dialogue according to claim 1, wherein the obtaining the abnormal degree of the data point according to the final iterative self-organizing clustering result, obtaining the denoised audio data according to the abnormal degree of the data point, and performing semantic recognition of the audio data comprises the following specific steps:

7. A digital human interactive dialog system comprising a memory, a processor and a computer program stored in the memory and running on the processor, characterized in that the processor implements the steps of a digital human interactive dialog method as claimed in any of claims 1-6 when the computer program is executed by the processor.