CN116129926A - Natural language interaction information processing method for intelligent equipment - Google Patents

Natural language interaction information processing method for intelligent equipment Download PDF

Info

Publication number
CN116129926A
CN116129926A CN202310422056.5A CN202310422056A CN116129926A CN 116129926 A CN116129926 A CN 116129926A CN 202310422056 A CN202310422056 A CN 202310422056A CN 116129926 A CN116129926 A CN 116129926A
Authority
CN
China
Prior art keywords
range
frequency
voice signal
value
interaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310422056.5A
Other languages
Chinese (zh)
Other versions
CN116129926B (en
Inventor
林皓
王留芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing VRV Software Corp Ltd
Original Assignee
Beijing VRV Software Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing VRV Software Corp Ltd filed Critical Beijing VRV Software Corp Ltd
Priority to CN202310422056.5A priority Critical patent/CN116129926B/en
Publication of CN116129926A publication Critical patent/CN116129926A/en
Application granted granted Critical
Publication of CN116129926B publication Critical patent/CN116129926B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The invention relates to the technical field of voice enhancement, in particular to a natural language interaction information processing method of intelligent equipment, which comprises the following steps: obtaining a spectrogram of a voice signal in the interaction process of intelligent equipment, segmenting the voice signal to obtain a time period range, performing super-pixel segmentation on the spectrogram of the voice signal to obtain a frequency range corresponding to a super-pixel region, analyzing the frequency range under each time period range to obtain a fitting reference weight value, performing self-adaptive same-frequency curve fitting according to the fitting reference weight value to obtain a same-frequency curve, performing inverse Fourier transform on the same-frequency curve to obtain an expanded interaction voice signal, and decomposing and denoising the obtained expanded interaction voice signal. The invention avoids the end effect problem in the traditional EMD algorithm, so that the obtained decomposition result is more accurate, and the denoising enhancement effect on the voice is improved.

Description

Natural language interaction information processing method for intelligent equipment
Technical Field
The invention relates to the technical field of voice enhancement, in particular to a natural language interaction information processing method of intelligent equipment.
Background
Natural language interactive information processing is a very important part in the field of artificial intelligence, and the main purpose of the natural language interactive information processing is to solve the man-machine conversation problem and enable intelligent equipment to understand content expressed by human beings. In the natural language interactive information processing process, the processing process needs multiple discipline theoretical knowledge and belongs to the cross disciplines. The natural language processing of the intelligent equipment still cannot meet the 'natural' interaction requirement at present, and one main reason is that the audio information environment in the interaction process is chaotic and is easy to be interfered by noise, so that the voice recognition rate is lower, and further, the intelligent equipment is caused to recognize by mistake and further cause to understand the content expressed by human in the natural language interaction process;
Because the audio information is easily interfered by noise, the recognized voice information often needs to be subjected to denoising processing in the preprocessing process of the natural language interaction information processing process. The EMD algorithm is an algorithm suitable for nonlinear and non-stationary signal processing, and is also applied to the processing process of audio information by decomposing a complex signal into a plurality of connotation modal components (IMF components) representing different frequencies and processing the different connotation modal components to achieve the purpose of denoising. In the decomposition process, the EMD algorithm decomposes with local extremum points, so that edge local extremum points appear at the end points, and then the end point effect is caused, so that the subsequent denoising effect is poor due to poor decomposition effect. The most common solution of the end effect is to expand the end information through data fitting, and the spectrogram is the information representation of the audio information, and has better information representation capability compared with the acquired sound wave image, so that the end effect problem in the EMD algorithm is solved by combining the audio information expression capability in the spectrogram and the data fitting algorithm, and the acquired decomposition result is more accurate, and the denoising effect is improved.
Disclosure of Invention
The invention provides a natural language interaction information processing method of intelligent equipment, which aims to solve the existing problems.
The intelligent device natural language interaction information processing method adopts the following technical scheme:
the invention provides a natural language interaction information processing method of intelligent equipment, which comprises the following steps:
obtaining a basic value according to the spectrogram of the historical voice signal, and obtaining a super-pixel region according to the spectrogram of the interactive voice signal;
executing the segmentation operation of the interactive voice signal to obtain a segmentation interval, comprising:
segmenting the interactive voice signal according to a preset interactive time period range to obtain a plurality of interactive time period ranges; obtaining the frequency range contained in each interaction time period range from the range contained in the maximum value and the minimum value of the frequency in the super pixel region, obtaining self-correlation according to the energy value difference corresponding to the adjacent frequency in the same frequency range at the same time point, and obtaining adjacent correlation according to the energy value difference corresponding to the same frequency range contained in the two interaction time period ranges respectively; the self-correlation and the adjacent correlation are subjected to weight fusion by utilizing the difference between the base value and the range size of the time period range to obtain the selection degree; under a plurality of preset time period ranges, a first interaction time period range with the largest selection degree is obtained and is recorded as a segmentation interval;
Intercepting the interactive voice signal in the first segmented section, and repeatedly executing the segmentation operation on the rest of the intercepted interactive voice signals until the interactive voice signal cannot be segmented any more, so as to obtain a plurality of segmented sections of the interactive voice signal;
the frequency range obtained by equally dividing is recorded as a sub-frequency range, a local range is obtained according to the minimum value between the frequency number in the sub-frequency range in all the segmented intervals and the interval length of all the segmented intervals, an energy curve is built in the local range according to the energy value difference of adjacent time points, and a fitting reference weight value is obtained according to the ratio of the number of the maximum value points of the energy curve to the number of the energy curve and the energy value average value in the local range;
performing curve fitting by using the energy value corresponding to the maximum fitting reference weight value to obtain a same-frequency curve, and transforming and expanding the same-frequency curve to obtain an expanded interactive voice signal;
and decomposing and denoising the expanded interactive voice signals to realize denoising enhancement on the interactive voice signals.
Further, the obtaining a basic value according to the spectrogram of the historical voice signal includes the following specific steps:
acquiring a spectrogram of the historical voice signal, and acquiring a time segment segmentation point according to the spectrogram, so as to obtain the first speech signal in the historical voice signal
Figure SMS_1
The time points are the preference degrees of the time segment segmentation points
Figure SMS_2
The acquisition method of (1) comprises the following steps:
Figure SMS_3
in the method, in the process of the invention,
Figure SMS_4
representing the first of the historical speech signals
Figure SMS_5
The maximum frequency at each time point, x represents any frequency in any frequency interval,
Figure SMS_6
represent the first
Figure SMS_7
The corresponding energy value at frequency x for each point in time,
Figure SMS_8
represent the first
Figure SMS_9
Corresponding energy value of each time point under the frequency x, S (a) represents the first place on the spectrogram
Figure SMS_10
Frequency intervals corresponding to the time points;
the degree of preference corresponding to any one time point
Figure SMS_11
When the time point is larger than a preset preference degree threshold value, the time point is taken as a time segment segmentation point, the obtained multiple time segment segmentation points are utilized to segment the historical voice signal, the time segment range of the multiple segmented historical voice signals is obtained and is recorded as a historical time segment range, the range size average value of the multiple historical time segment ranges is obtained and is recorded as a basic value。
Further, the method for obtaining the super-pixel region according to the spectrogram of the interactive voice signal comprises the following steps:
and segmenting the spectrogram of the interactive voice signal by using a super-pixel segmentation algorithm, and uniformly distributing a preset number of initial seed points in the spectrogram contained in any time range to obtain a plurality of super-pixel areas.
Further, the self-correlation is obtained by the following method:
Figure SMS_12
in the method, in the process of the invention,
Figure SMS_14
representing the self-correlation of the interactive voice signal within the u-th interactive time period,
Figure SMS_19
representing the number of frequency ranges contained within the u-th interaction period range,
Figure SMS_22
representing the ith interaction period in the range of the ith interaction period
Figure SMS_15
The frequency range of the frequency band is set,
Figure SMS_17
representing the ith interaction period in the range of the ith interaction period
Figure SMS_21
The number of frequencies of the frequency ranges;
Figure SMS_23
right endpoint lower than the right endpoint representing the ith interaction period range
Figure SMS_13
The energy value corresponding to the jth frequency in the frequency range,
Figure SMS_16
right endpoint lower-th of the right-hand side ith interaction period range representing the interaction period range
Figure SMS_18
The energy value corresponding to the j+1th frequency in the frequency range;
Figure SMS_20
an exponential function based on a natural constant is represented.
Further, the adjacent correlation is obtained by the following method:
Figure SMS_24
in the method, in the process of the invention,
Figure SMS_25
representing the adjacent correlation of the interactive voice signal within the u-th interactive period,
Figure SMS_28
representing the number of frequency ranges contained within the u-th interaction period range,
Figure SMS_31
representing the ith interaction period in the range of the ith interaction period
Figure SMS_27
The frequency range of the frequency band is set,
Figure SMS_29
representing the ith interaction period in the range of the ith interaction period
Figure SMS_32
The number of frequencies of the frequency ranges;
Figure SMS_34
right endpoint lower than the right endpoint representing the ith interaction period range
Figure SMS_26
The jth frequency pair in the frequency rangeThe amount of energy to be used,
Figure SMS_30
the left end point of the range representing the (u+1) th interaction period is lower than the (u+1) th interaction period
Figure SMS_33
The energy value corresponding to the j-th frequency in the frequency range.
Further, the selection degree is obtained by the following steps:
the method comprises the steps of presetting initial range size of a time range, taking preset fixed step length as an increment value of successive iteration of the initial range size, taking a time range corresponding to the time range after the initial range size is iteratively increased as a plurality of preset time range, obtaining self-correlation and adjacent correlation of interactive voice signal selection degree in the plurality of preset time range, and carrying out weight fusion on the self-correlation and the adjacent correlation by utilizing a basic value to obtain the selection degree, wherein the selection degree is as follows:
Figure SMS_35
in the method, in the process of the invention,
Figure SMS_36
the range size representing the range of the u-th interaction period,
Figure SMS_37
the base value is represented by a value of,
Figure SMS_38
representing the self-correlation of the interactive voice signal within the u-th interactive time period,
Figure SMS_39
representing the adjacent correlation of the interactive voice signal within the u-th interactive period,
Figure SMS_40
a range size representing a range of the ith interaction period is
Figure SMS_41
The degree of selection;
Figure SMS_42
an exponential function based on a natural constant is represented.
Further, the energy curve is obtained by the following steps:
starting from the point at the lower left corner of the local range, taking the abscissa as the direction, acquiring the point which has the smallest energy value difference with the starting point and the nearest frequency and is smaller than the preset energy value difference threshold value, if the condition is not met, not connecting, starting to connect with the last point at the lower left corner again, and the like, acquiring the connection sequence, and acquiring the energy curve according to the connection sequence and the energy value of each point.
Further, the fitting reference weight value is obtained by the following steps:
acquiring a peak point, namely a maximum point, of an energy curve of a local range where any point is located; equally dividing each frequency range into a plurality of sub-frequency ranges, analyzing each range as the same frequency, and obtaining the first frequency in any segment interval
Figure SMS_43
The first of the sub-frequency ranges
Figure SMS_44
Fitting reference weight values of the individual points:
Figure SMS_45
in the method, in the process of the invention,
Figure SMS_48
represent the first
Figure SMS_52
The first of the sub-frequency ranges
Figure SMS_55
The number of different time points corresponding to the maximum value points of the points in the local range,
Figure SMS_49
represent the first
Figure SMS_51
The first of the sub-frequency ranges
Figure SMS_54
The number of curves of the energy curve in the local range of the individual points,
Figure SMS_57
represent the first
Figure SMS_46
The first of the sub-frequency ranges
Figure SMS_50
The energy mean value in the local range of the individual points,
Figure SMS_53
represent the first
Figure SMS_56
The first of the sub-frequency ranges
Figure SMS_47
Fitting of individual points refers to the weight values.
Further, the implementation of denoising enhancement to the interactive voice signal specifically includes the following steps:
calculating a fitting reference weight value of each point in a spectrogram of the interactive voice signal, selecting an energy value of the maximum fitting reference weight value on each time point in a sub-frequency range to perform same-frequency curve fitting, obtaining fitted same-frequency curves for the energy values of the maximum fitting reference weight values on each time point in all the sub-frequency ranges, expanding the same-frequency curves obtained by fitting to the outside of the initial time point, further obtaining same-frequency curves under different frequencies after expansion, and performing inverse Fourier transform on the obtained expanded same-frequency curves to obtain expanded interactive voice signals;
EMD decomposition is carried out on the obtained expanded interactive voice signals to obtain a plurality of voice IMF components, and each voice IMF component is subjected to denoising by adopting a wavelet threshold denoising algorithm, so that denoising enhancement on the interactive voice signals is realized, and the interactive voice signals after denoising enhancement are obtained.
The technical scheme of the invention has the beneficial effects that: and the acquired audio information is expanded by adopting the same-frequency curve fitting and combining with the distribution characteristics in the spectrogram of the audio information, so that IMF components decomposed by an EMD algorithm are more accurate. The method comprises the steps of combining the distribution characteristics of audio information in a spectrogram, dividing the audio information into a plurality of time period ranges in the same-frequency curve fitting process, determining a sub-frequency range in each time period range, calculating fitting reference weight values of each point according to the regularity characteristics of energy value distribution in each sub-frequency range, further carrying out same-frequency curve fitting, expanding the audio information according to the obtained fitting curve, avoiding the end point effect problem in the traditional EMD algorithm, and enabling the obtained decomposition result to improve the voice denoising enhancement effect more accurately.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of steps of a natural language interaction information processing method of an intelligent device.
Detailed Description
In order to further explain the technical means and effects adopted by the invention to achieve the preset aim, the following is a detailed description of the specific implementation, structure, characteristics and effects of the data management method for the secure operation and maintenance system according to the invention, namely the intelligent device natural language interaction information processing method. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the intelligent device natural language interaction information processing method provided by the invention with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of steps of a method for processing natural language interaction information of an intelligent device according to an embodiment of the present invention is shown, where the method includes the following steps:
step S001, acquiring a history voice signal and an interaction voice signal in a man-machine interaction process by using a voice sensor in the intelligent equipment and acquiring a corresponding spectrogram.
The method comprises the steps that a voice sensor in the intelligent equipment is used for collecting and acquiring historical voice signals and interactive voice signals in the interaction process of a user and the intelligent equipment, the collected historical voice signals and the interactive voice signals are collectively called as voice signals, the model of the voice sensor is not set in the embodiment, and the meaning modal components of a plurality of voice signals can be obtained through EMD decomposition after adaptive same-frequency curve fitting in the follow-up step of the interactive voice signals collected by the voice sensor and are recorded as voice IMF components.
Converting the collected historical voice signals and the collected interactive voice signals into corresponding spectrograms: because the power spectrum of the voice signal decreases along with the increase of the frequency, most of the energy of the voice is concentrated in the low frequency part, so that the signal to noise ratio of the high frequency part is very low, a first-order high-pass filter is generally used for improving the signal to noise ratio of the signal in the high frequency part, after pre-emphasis is carried out on the voice, then framing and windowing operation is carried out, short-time Fourier transform processing is carried out on the voice signal with the set frame length and sampling frequency, so that a spectrogram is generated, wherein the specific process for generating the spectrogram is a known technology, and is not repeated in the embodiment. The sampling frequency is set to be 4kH, the frame length is set to be 20ms (namely, the corresponding narrow-band spectrogram), the window function is a hamming window function, and the empirical reference value is given according to the specific implementation situation of an implementer. The horizontal axis of the spectrogram is time, the vertical axis is frequency, and the pixel value of each point in the spectrogram is an energy value, in this embodiment, a linear normalization function is adopted to normalize all the energy values, and the normalized values are multiplied by 255, i.e. the energy values are in the range of 0-255.
Step S002, obtaining self-adaptive segmentation points according to the distribution characteristics of the spectrogram, and carrying out segmentation processing on the interactive voice signals to obtain the interactive voice signals with different time range sizes.
Because the collected voice signals are easily interfered by noise, each voice IMF component decomposed by the EMD algorithm is subjected to denoising processing in the traditional method. In the processing process, the EMD decomposition algorithm is decomposed by using local extreme points, so that edge local extreme points appear at the end points, and then the end point effect is caused, so that the subsequent denoising effect is poor due to poor decomposition effect. Therefore, the present embodiment uses the same-frequency curve fitting (Frequency Warping) algorithm to expand the voice signal, so as to avoid the end benefit.
In addition, the spectrogram is a characteristic feature of the voice signal on the frequency distribution, so that the corresponding distribution feature of the voice signal of the spectrogram can be combined to perform the same-frequency curve fitting, and in the same-frequency curve fitting process, a plurality of same-frequency curves exist, so that how to acquire a precise same-frequency curve for representing the voice signal feature determines the fitting effect, namely the solving effect of the end point effect in the subsequent EMD decomposition process.
Because the voice signal has intermittent characteristics (namely, a word exists or a word corresponds to some voice signals under the noiseless distribution of the voice signal), the voice signal is required to be divided into a plurality of time segment ranges in the same-frequency curve fitting process, and the same-frequency curve fitting is carried out on each time segment range.
After the voice signal is converted into the spectrogram, the intermittent characteristic of the voice signal and the same spectrogram signal of the signal represented by the local signal are required to be analyzed for representation when the same-frequency curve fitting is carried out, so that the spectrogram converted from the voice signal is required to be divided into a plurality of time segment ranges, and the voice signal in each time segment range is subjected to respective same-frequency curve fitting, and the length of each divided time segment range is acquired according to the distribution characteristic of the spectrogram because the spectrogram characteristics of the voice signal are similar in the same time segment.
In the use process of the intelligent device, the speaking modes or habits of each person often have different speaking habits, and each corresponding person has a speaking habit which is unique to the person, so that the speaking habit of the user can be quantified through the historical voice signal (the voice signal which is not existed or is affected by noise to a small extent can be recorded in a quiet environment as the historical voice signal, and thus the noise is not influenced in the default historical voice signal), and the basic value of the time period range is quantified according to the speaking habit, and the specific process is as follows:
Analyzing the spectrogram obtained by converting the historical voice signals, analyzing the spectrogram frequency in any time range (namely analyzing the spectrogram frequency in a single time vertical to the transverse axis), wherein the voice signals in the same time range are characterized by obvious differences between two adjacent sides of a straight line characterized by a certain time point vertical to the transverse axis, so that the voice signals are divided into a plurality of time ranges by acquiring time segment segmentation points;
the abscissa of each pixel point on the spectrogram represents a time point, the ordinate represents a frequency, the gray value represents an energy value, and the spectrogram can be known
Figure SMS_58
A time point corresponds to a frequency interval, denoted S (a), any one of the frequencies in the interval being denoted x,
Figure SMS_59
S(a)。
then the first of the historical voice signals
Figure SMS_60
The time points are the preference degrees of the time segment segmentation points
Figure SMS_61
The acquisition method of (1) comprises the following steps:
Figure SMS_62
in the method, in the process of the invention,
Figure SMS_63
representing the first of the historical speech signals
Figure SMS_64
Maximum frequency of each time point, S (a) represents the first on the spectrogram
Figure SMS_65
A frequency interval corresponding to each time point, x represents an arbitrary frequency in the frequency interval,
Figure SMS_66
represent the first
Figure SMS_67
The corresponding energy value at frequency x for each point in time,
Figure SMS_68
represent the first
Figure SMS_69
Corresponding energy values of the time points at the frequency x;
Wherein, if the first
Figure SMS_70
Adjacent time points at each time point (i.e
Figure SMS_71
And
Figure SMS_72
the larger the average energy value difference of the time points) under different frequencies, the larger the energy value difference of the spectrograms of the adjacent time points of the time points is, and the corresponding adjacent time points are characterized by different audio information.
Calculating the preference degree for all time points of the historical data, performing linear normalization processing on the preference degree of all time points, and setting a preference degree threshold value
Figure SMS_73
(depending on the implementation of the implementation, the embodiment is given as an empirical reference value), if the preference degree of a certain time point is greater than the preference degree threshold value, the time point is indicated to be a time segment segmentation point for dividing the historical voice signal into a plurality of time segment ranges, and a plurality of time segment segmentation points are obtained through threshold value judgment.
Segmenting the historical voice signals according to the obtained time segment segmentation points of the historical voice signals to obtain time segment ranges of a plurality of historical voice signals, namely the historical time segment ranges for short, obtaining the average value of the lengths of all the time segment ranges (namely the time length of the interval corresponding to the two adjacent time segment segmentation points), and recording the average value as a basic value
Figure SMS_74
In addition, the spectrogram of the interactive voice signal is segmented by using the superpixel segmentation algorithm, and the obtained frequency range of the superpixel region is a range formed by the minimum value and the maximum value of the ordinate, wherein the number of initial seed points of the superpixel segmentation algorithm is set to 15, the spectrogram contained in any time range is evenly distributed, a plurality of superpixel regions are obtained, the frequency in each superpixel region is similar, and each superpixel region is formed by the corresponding maximum frequency and minimum frequency, namely, one superpixel region corresponds to one frequency range, it is required to be noted that the time range is obtained by segmenting the time corresponding to the abscissa, the corresponding time range contains all the vertical axis information, namely, the frequency in the range, and the superpixel segmentation is a known technology, which is not repeated in the embodiment.
When dividing the interactive voice signal, taking an initial value W=5 as the initial range size of the interactive time period range of the interactive voice signal, taking the time period range of the interactive voice signal as the interactive time period range for short, setting a value 2 as the step size of the interactive time period range for carrying out iterative increment, taking the time point of the initial part (namely the leftmost side) of the interactive voice signal as the beginning, dividing the interactive voice signal by taking the initial range size W=5 of the interactive time period range to obtain the divided initial interactive time period range, carrying out iterative adjustment on the initial interactive time period range by utilizing the following step (2) and step (3), obtaining a first interactive time period range after the adjustment is completed, recording the first interactive time period range as a segmented section, intercepting the corresponding interactive voice signal of the first segmented section, taking the time point of the leftmost side of the intercepted residual voice signal as the beginning again, and so on to obtain all interactive time period ranges and range sizes of the interactive voice signal;
Step (2), when the initial range size W of the time period range of the interactive voice signal is increased in successive iterations with step size 2, adjusting the iteration process of the initial range size W:
it should be noted that, the energy values and frequencies of the voice signals in the same time period in the spectrogram should be similar, the energy values in one time period are similar and concentrated in the spectrogram, the adjacent time periods have larger difference in voice signals in the adjacent time period because of the difference and the non-concentration of the energy values, but in the segmentation process of the spectrogram of the currently acquired audio information, the voice signals in the voice interaction process of the user and the intelligent device are segmented according to the obtained segmentation points of the time period, so that larger errors can occur, and the embodiment iteratively adjusts the size of the time period range according to the self-correlation of the interactive voice signals in the same time period and the adjacent correlation in the adjacent time period to obtain the selection degree of the size of the time period range obtained by segmenting the acquired interactive voice signals so as to realize better segmentation processing of the interactive voice signals;
In addition, the time range of each super pixel area on the abscissa is obtained, all the super pixel areas, which are intersected with the u-th interaction time period range, of the time range corresponding to the super pixel area are obtained, and the frequency ranges of all the intersected super pixel areas are recorded as the frequency ranges contained in the u-th interaction time period range; in this embodiment, the rightmost corresponding time point of the u-th interaction time period range of the horizontal axis (i.e., the time point on the leftmost side of the time axis is 1 and gradually increases to the right) in the spectrogram of the interaction speech signal is marked as the right end point of the u-th interaction time period range, and the leftmost corresponding time point of the u-th interaction time period range is marked as the left end point of the u-th interaction time period range; there is also a portion where intersections exist with the plurality of super pixel regions at one time point, and the frequency ranges corresponding to the portions where intersections exist with the plurality of super pixel regions at one time point are recorded as the left end point or the right lower end point of the interaction time period range
Figure SMS_75
A frequency range.
The self-correlation of the interactive voice signal in the interactive time period range is as follows:
Figure SMS_76
in the method, in the process of the invention,
Figure SMS_79
representing the self-correlation of the interactive voice signal within the u-th interactive time period,
Figure SMS_82
Representing the number of frequency ranges contained within the u-th interaction period range,
Figure SMS_84
representing the ith frequency range within the ith interaction period range,
Figure SMS_78
representing the ith interaction period in the range of the ith interaction period
Figure SMS_81
The number of frequencies of the frequency ranges;
Figure SMS_85
right endpoint lower than the right endpoint representing the ith interaction period range
Figure SMS_86
The energy value corresponding to the jth frequency in the frequency range,
Figure SMS_77
right endpoint lower than the right endpoint representing the ith interaction period range
Figure SMS_80
The energy value corresponding to the j+1th frequency in the frequency range;
Figure SMS_83
an exponential function that is based on a natural constant;
because the voice information has different expressive ability at different frequencies, there are different energy values expressed in a plurality of frequency ranges, so when analyzing the degree of self-correlation, it is necessary to analyze the difference value between adjacent frequencies according to each frequency range, if the difference value between adjacent frequencies in the same frequency range is smaller, it indicates that the degree of self-correlation of the interactive voice signal is larger in the range of the interactive time period.
Adjacent correlation of interactive voice signals within an interactive time period:
Figure SMS_87
in the method, in the process of the invention,
Figure SMS_89
representing the adjacent correlation of the interactive voice signal within the u-th interactive period,
Figure SMS_92
representing the number of frequency ranges contained within the u-th interaction period range,
Figure SMS_96
Representing the ith interaction period in the range of the ith interaction period
Figure SMS_90
The frequency range of the frequency band is set,
Figure SMS_93
representing the ith interaction period in the range of the ith interaction period
Figure SMS_95
The number of frequencies of the frequency ranges;
Figure SMS_97
right endpoint lower than the right endpoint representing the ith interaction period range
Figure SMS_88
The energy value corresponding to the jth frequency in the frequency range,
Figure SMS_91
the left end point of the range representing the (u+1) th interaction period is lower than the (u+1) th interaction period
Figure SMS_94
The energy value corresponding to the j-th frequency in the frequency range;
according to the above analysis, after the analysis time period ranges are divided in the adjacent frequency ranges, the difference of the energy values between the same frequencies between the adjacent time points indicates that the adjacent correlation degree of the interactive voice signal corresponding to the interactive time period range is smaller if the difference of the energy values between the same frequencies between the adjacent time points is larger.
Step (3), when the initial range size W of the range of the ith interaction time period is increased by taking the value size 2 as an increasing step length, obtaining a selection degree once every time the range size is increased to judge whether the corresponding range size is selected asThe final range size of the u-th interaction period range is iteratively increased to be greater than or equal to the initial range size W of the u-th interaction period range
Figure SMS_98
Figure SMS_99
Representing the base value), stopping the increase; the degree of selection
Figure SMS_100
Figure SMS_101
In the method, in the process of the invention,
Figure SMS_102
the range size representing the range of the u-th interaction period,
Figure SMS_103
the base value is represented by a value of,
Figure SMS_104
representing the self-correlation of the interactive voice signal within the u-th interactive time period,
Figure SMS_105
representing the adjacent correlation of the interactive voice signal within the u-th interactive period,
Figure SMS_106
a range size representing a range of the ith interaction period is
Figure SMS_107
The degree of selection to be made is that,
Figure SMS_108
an exponential function based on a natural constant is represented.
According to the analysis of the content, the degree of selection
Figure SMS_109
After the initial range size representing the u-th interaction time period range is iteratively increased, the corresponding range size may be selected as a degree of range size of the final interaction time period range
Figure SMS_110
The larger the interactive voice signal is, the larger the self-correlation degree of the corresponding interactive voice signal is in the interactive time period range, and the smaller the adjacent correlation degree is, the complete voice signal characteristics are contained in the interactive time period range, and the distinction between the interactive voice signal and other adjacent voice signals is obvious.
If the range size of the u-th interaction time period range and the basic value obtained according to the historical voice signal
Figure SMS_111
The smaller the difference value is, the corresponding interactive voice signal in the interactive time period accords with the speaking habit of the user, so that the degree of self-correlation of the interactive voice signal in the u-th interactive time period is needed to be considered, the degree of adjacent correlation is prevented from being considered too much, and the influence of noise is avoided too much; the larger the difference value between the size of the u-th interaction time period range and the basic value is, the more the difference value indicates that the time period of the interaction voice signal in the current time period range does not accord with the speaking habit of the user, so that the adjacent correlation degree of the corresponding interaction voice signal in the current interaction time period range needs to be considered, and the situation that the difference or the lack of the complete voice information characteristic is contained due to the fact that the divided interaction time period range is too large and too small is avoided, so that the segmentation effect is poor is avoided.
When the interactive voice signal is segmented by utilizing the initial range size which is increased continuously and iteratively, the termination condition of the iteration is that
Figure SMS_112
Wherein
Figure SMS_113
Representing the base value when
Figure SMS_114
When the iteration increase is stopped, the iteration increase number is counted as A times, and it should be noted that when the initial range size cannot be iteratively increased to be similar to that of
Figure SMS_115
When the interactive voice signal is stored, the part of the interactive voice signal is reserved for subsequent calculation; and calculating the corresponding selection degree of the initial range size W after each iteration increase, namely, the range size obtained by each iteration increase corresponds to one selection degree, obtaining the corresponding selection degree of A in the iteration increase A times, obtaining the largest selection degree of A selection degrees, taking the range size corresponding to the largest selection degree as the final range size of the u-th interaction time range for dividing the interaction voice signal, and recording the final u-th interaction time range as a segmentation interval, so as to obtain the u-th segmentation interval and interval size obtained by dividing the interaction voice signal, and only keeping the first segmentation interval and interval size when u=1.
And (4) intercepting the interactive voice signal corresponding to the first segmentation interval, dividing the intercepted residual interactive voice signal again, taking the leftmost time point of the intercepted residual interactive voice signal as a starting point, and repeating the steps similar to the operations of the step (1), the step (2) and the step (3) until the acquired interactive voice signal is divided into a plurality of interactive voice signals of different segmentation intervals, so as to realize the self-adaptive division of the interactive voice signal.
So far, according to the distribution characteristics of the spectrogram, the self-adaptive interaction time period range is obtained, and the self-adaptive interaction time period range is utilized to segment the interaction voice signals, so that the interaction voice signals of a plurality of different segmentation intervals are obtained.
And step S003, performing same-frequency curve fitting on the interactive voice signals in each segmented interval, performing interactive voice signal expansion according to the fitting result, and performing EMD decomposition to obtain a plurality of voice IMF components.
And calculating the adaptively divided interactive voice signals according to the steps, wherein the interactive voice signals in each segment interval represent the characteristics of the same voice signal. In each time period, the frequency of the voice signal has very remarkable energy value change in all frequencies, and in any time period, the energy value in a local range of the frequency has obvious regularity characteristic at a certain time point, for example, the peak point of the energy at different frequencies appears at the same time point, and a regular peak characteristic is presented, so that the point which has remarkable energy in the same-frequency curve fitting process and has stronger regularity is larger in the fitting process, and the fitting reference weight value is larger.
And utilizing the interactive voice signals to carry out self-adaptive division on the segmented intervals, wherein each segmented interval represents the signal characteristics of the same voice, and carrying out the same-frequency curve fitting in a single time period. According to the obtained frequency range (super-pixel division obtaining frequency range) in each segment interval, the frequency range is equally divided into 30 sub-frequency ranges (which can be given as experience reference values according to the specific implementation situation of an implementer), and each range is used as the same frequency for analysis.
Acquiring the first of any segmented intervals
Figure SMS_116
The first of the sub-frequency ranges
Figure SMS_117
Fitting reference weight values of the individual points:
in the first place
Figure SMS_118
The first of the sub-frequency ranges
Figure SMS_119
The local range is built by taking the points as the central points, wherein the local range can be obtained according to the range sizes of the sub-frequency range and the time period, and the size of the local range is set
Figure SMS_120
The specific acquisition method comprises the following steps:
Figure SMS_121
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_123
the representation is rounded down and up,
Figure SMS_128
represent the first
Figure SMS_130
The first of the sub-frequency ranges
Figure SMS_125
The number of frequencies in the sub-frequency range in which the point is located,
Figure SMS_127
represent the first
Figure SMS_131
The first of the sub-frequency ranges
Figure SMS_133
The range size of the time period range in which the individual points are located,
Figure SMS_122
represent the first
Figure SMS_126
The first of the sub-frequency ranges
Figure SMS_129
The size of the local range of the individual points;
Figure SMS_132
The representation takes the minimum value of the value,
Figure SMS_124
indicating that the local range in odd-numbered sizes is obtained from the rounded-down values.
Within the local rangeConstructing the energy curve with the same value, wherein the point at the left lower corner of the local range is taken as a starting point, the point closest to the minimum frequency of the energy value difference of the starting point and smaller than the energy value difference threshold (energy value difference threshold) is obtained by taking the abscissa as the direction (namely, the point connected with a certain frequency at the next moment)
Figure SMS_134
) If the condition is not met, connection is not carried out, connection is started at another point (the last point at the lower left corner is started), and the connection sequence is obtained, and an energy curve is obtained according to the connection sequence and the energy value of each point; after the calculation of the point with the smallest time point in the local range (the leftmost point in the range) is completed, the energy curve is constructed for the starting point of the point which is not connected subsequently. The peak point, i.e. the maximum point, of each energy curve is obtained in all the energy curves.
The corresponding first segment of the interactive voice signal in the section of the ith segment of the spectrogram
Figure SMS_135
The first of the sub-frequency ranges
Figure SMS_136
Fitting reference weight values of individual points
Figure SMS_137
The acquisition method of (1) comprises the following steps:
Figure SMS_138
In the method, in the process of the invention,
Figure SMS_141
represent the first
Figure SMS_144
The first of the sub-frequency ranges
Figure SMS_148
Different corresponding times of peak points of points in local rangeThe number of the intermediate points is equal to the number of the intermediate points,
Figure SMS_142
represent the first
Figure SMS_145
The first of the sub-frequency ranges
Figure SMS_147
The number of curves of the energy curve in the local range of the individual points,
Figure SMS_150
represent the first
Figure SMS_139
The first of the sub-frequency ranges
Figure SMS_143
The energy mean value in the local range of the individual points,
Figure SMS_146
represent the first
Figure SMS_149
The first of the sub-frequency ranges
Figure SMS_140
Fitting of individual points refers to the weight values.
The energy average value of all the points in the local range of the point is taken as the basic value of the fitting reference weight value, wherein the larger the energy average value is, the more the characteristic signals of the voices contained in the local range of the point are indicated, the more important the corresponding point is, namely, the fitting reference weight value is larger, but the energy distribution regularity in the local range of the point is used for representing the credibility degree due to the influence of noise, and if the energy distribution signal is irregular (namely, the position with the largest corresponding equivalent energy value does not appear on the same time point) in the local range of the point, the irregular energy distribution signal in the local range of the corresponding point is indicated, and the lower the credibility degree of the point is indicated, and the smaller the corresponding fitting reference weight value is indicated.
Calculating fitting reference weight values of all points in all sub-frequency ranges, selecting energy values of the maximum fitting reference weight values on all time points in the sub-frequency ranges to perform same-frequency curve fitting, performing similar operation, obtaining fitted same-frequency curves in all the sub-frequency ranges, expanding the fitted same-frequency curves out of the initial time points (it is to be noted that when expanding, curve expression equations of the same-frequency curves obtained by known fitting are input into independent variables-1, -2, …, A, and corresponding energy values are output, so that expansion out of the initial time points is achieved, wherein A represents the degree of expansion, A= -5 is preset according to experience, the same-frequency curves of the expanded time points in different frequencies are obtained, and inverse Fourier transformation is performed on the obtained expanded same-frequency curves to obtain expanded interactive voice signals. The same frequency curve fitting and the inverse fourier transform are known techniques, and are not described in detail in this embodiment.
EMD decomposition is performed on the obtained expanded interactive voice signal to obtain a plurality of voice IMF components for subsequent denoising, wherein the EMD decomposition is a known technique, and is not repeated in the embodiment.
And performing same-frequency curve fitting on the interactive voice signals in each segmented interval, performing signal expansion according to the result of the same-frequency curve fitting to obtain expanded interactive voice signals, and performing EMD decomposition to obtain a plurality of voice IMF components.
Step S004, denoising each voice IMF component, and reconstructing the denoised voice IMF component to obtain denoised interactive voice signals.
According to the voice IMF components of the expanded interactive voice signals obtained through calculation in the steps, denoising each voice IMF component by adopting a wavelet threshold denoising algorithm, wherein the number of layers of wavelet decomposition is set to be 5, a threshold function of the wavelet threshold denoising algorithm adopts an existing soft threshold function, wherein wavelet threshold denoising is a known technology, and details are omitted in the embodiment. And reconstructing each voice IMF component after wavelet threshold denoising (reconstructing in an EMD algorithm) to obtain the interactive voice signal after denoising and enhancing.
Finally, it should be noted that, the energy value in this embodiment is the gray value of each point (i.e. pixel point) in the spectrogram corresponding to the voice signal.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (9)

1. The natural language interaction information processing method of the intelligent equipment is characterized by comprising the following steps of:
Obtaining a basic value according to the spectrogram of the historical voice signal, and obtaining a super-pixel region according to the spectrogram of the interactive voice signal;
executing the segmentation operation of the interactive voice signal to obtain a segmentation interval, comprising:
segmenting the interactive voice signal according to a preset interactive time period range to obtain a plurality of interactive time period ranges; obtaining the frequency range contained in each interaction time period range from the range contained in the maximum value and the minimum value of the frequency in the super pixel region, obtaining self-correlation according to the energy value difference corresponding to the adjacent frequency in the same frequency range at the same time point, and obtaining adjacent correlation according to the energy value difference corresponding to the same frequency range contained in the two interaction time period ranges respectively; the self-correlation and the adjacent correlation are subjected to weight fusion by utilizing the difference between the base value and the range size of the time period range to obtain the selection degree; under a plurality of preset time period ranges, a first interaction time period range with the largest selection degree is obtained and is recorded as a segmentation interval;
intercepting the interactive voice signal in the first segmented section, and repeatedly executing the segmentation operation on the rest of the intercepted interactive voice signals until the interactive voice signal cannot be segmented any more, so as to obtain a plurality of segmented sections of the interactive voice signal;
The frequency range obtained by equally dividing is recorded as a sub-frequency range, a local range is obtained according to the minimum value between the frequency number in the sub-frequency range in all the segmented intervals and the interval length of all the segmented intervals, an energy curve is built in the local range according to the energy value difference of adjacent time points, and a fitting reference weight value is obtained according to the ratio of the number of the maximum value points of the energy curve to the number of the energy curve and the energy value average value in the local range;
performing curve fitting by using the energy value corresponding to the maximum fitting reference weight value to obtain a same-frequency curve, and transforming and expanding the same-frequency curve to obtain an expanded interactive voice signal;
and decomposing and denoising the expanded interactive voice signals to realize denoising enhancement on the interactive voice signals.
2. The method for processing natural language interaction information of intelligent equipment according to claim 1, wherein the obtaining a basic value according to a spectrogram of a historical voice signal comprises the following specific steps:
acquiring a spectrogram of the historical voice signal, and acquiring a time segment segmentation point according to the spectrogram, so as to obtain the first speech signal in the historical voice signal
Figure QLYQS_1
The time points are the preference degree +. >
Figure QLYQS_2
The acquisition method of (1) comprises the following steps:
Figure QLYQS_3
in the method, in the process of the invention,
Figure QLYQS_4
representing the +.>
Figure QLYQS_5
Maximum frequency of each time point, x represents any frequency in any frequency interval, +.>
Figure QLYQS_6
Indicate->
Figure QLYQS_7
Energy values corresponding to the time points at frequency x, < >>
Figure QLYQS_8
Indicate->
Figure QLYQS_9
The corresponding energy value of the time points at the frequency x, S (a) represents +.>
Figure QLYQS_10
Frequency intervals corresponding to the time points;
the degree of preference corresponding to any one time point
Figure QLYQS_11
When the time point is larger than a preset preference degree threshold value, the time point is used as a time period segmentation point, the obtained multiple time period segmentation points are utilized to carry out segmentation processing on the historical voice signals, the time period range of the multiple segmented historical voice signals is obtained and is recorded as a historical time period range, and the range size average value of the multiple historical time period ranges is obtained and is recorded as a basic value.
3. The method for processing natural language interaction information of intelligent equipment according to claim 1, wherein the method for obtaining the super-pixel region according to the spectrogram of the interaction voice signal is as follows:
and segmenting the spectrogram of the interactive voice signal by using a super-pixel segmentation algorithm, and uniformly distributing a preset number of initial seed points in the spectrogram contained in any time range to obtain a plurality of super-pixel areas.
4. The method for processing natural language interaction information of intelligent equipment according to claim 1, wherein the self-correlation is obtained by the following steps:
Figure QLYQS_12
in the method, in the process of the invention,
Figure QLYQS_15
representing the self-correlation of the interactive voice signal within the u-th interactive time period,/for the interactive voice signal>
Figure QLYQS_17
Representing the number of frequency ranges contained in the u-th interaction period range,/for the number of frequency ranges contained in the u-th interaction period range>
Figure QLYQS_21
Representing the +.>
Figure QLYQS_14
The frequency range of the frequency band is set,
Figure QLYQS_18
representing the +.>
Figure QLYQS_20
The number of frequencies of the frequency ranges; />
Figure QLYQS_23
The right side end point representing the u-th interaction period range is down +.>
Figure QLYQS_13
Energy value corresponding to the jth frequency in the frequency range,/->
Figure QLYQS_16
Right end point of the right side ith interaction period range representing the right side ith interaction period range of the interaction period range +.>
Figure QLYQS_19
The j+1th frequency in the frequency range corresponds toAn energy value; />
Figure QLYQS_22
An exponential function based on a natural constant is represented.
5. The method for processing natural language interaction information of intelligent equipment according to claim 1, wherein the method for acquiring the adjacent correlation is as follows:
Figure QLYQS_24
in the method, in the process of the invention,
Figure QLYQS_26
representing the adjacent correlation of the interactive voice signal within the u-th interactive time period,/for the interactive voice signal>
Figure QLYQS_30
Representing the number of frequency ranges contained in the u-th interaction period range,/for the number of frequency ranges contained in the u-th interaction period range >
Figure QLYQS_33
Representing the +.>
Figure QLYQS_27
The frequency range of the frequency band is set,
Figure QLYQS_29
representing the +.>
Figure QLYQS_32
The number of frequencies of the frequency ranges; />
Figure QLYQS_34
The right side end point representing the u-th interaction period range is down +.>
Figure QLYQS_25
Energy value corresponding to the jth frequency in the frequency range,/->
Figure QLYQS_28
Left end point of the range representing the (u+1) th interaction period is lower than the (n)>
Figure QLYQS_31
The energy value corresponding to the j-th frequency in the frequency range.
6. The method for processing natural language interaction information of intelligent equipment according to claim 1, wherein the selection degree is obtained by the following steps:
the method comprises the steps of presetting initial range size of a time range, taking preset fixed step length as an increment value of successive iteration of the initial range size, taking a time range corresponding to the time range after the initial range size is iteratively increased as a plurality of preset time range, obtaining self-correlation and adjacent correlation of interactive voice signal selection degree in the plurality of preset time range, and carrying out weight fusion on the self-correlation and the adjacent correlation by utilizing a basic value to obtain the selection degree, wherein the selection degree is as follows:
Figure QLYQS_35
in the method, in the process of the invention,
Figure QLYQS_36
a range size indicating a range of the u-th interaction period,/->
Figure QLYQS_37
Represents a base value- >
Figure QLYQS_38
Representing the self-correlation of the interactive voice signal within the u-th interactive time period,/for the interactive voice signal>
Figure QLYQS_39
Representing the adjacent correlation of the interactive voice signal within the u-th interactive time period,/for the interactive voice signal>
Figure QLYQS_40
The range size representing the range of the u-th interaction period is +.>
Figure QLYQS_41
The degree of selection; />
Figure QLYQS_42
An exponential function based on a natural constant is represented.
7. The method for processing natural language interaction information of intelligent equipment according to claim 1, wherein the energy curve is obtained by the following steps:
starting from the point at the lower left corner of the local range, taking the abscissa as the direction, acquiring the point which has the smallest energy value difference with the starting point and the nearest frequency and is smaller than the preset energy value difference threshold value, if the condition is not met, not connecting, starting to connect with the last point at the lower left corner again, and the like, acquiring the connection sequence, and acquiring the energy curve according to the connection sequence and the energy value of each point.
8. The method for processing natural language interaction information of intelligent equipment according to claim 1, wherein the fitting reference weight value is obtained by the following steps:
acquiring a peak point, namely a maximum point, of an energy curve of a local range where any point is located; equally dividing each frequency range into a plurality of sub-frequency ranges, analyzing each range as the same frequency, and obtaining the first frequency in any segment interval
Figure QLYQS_43
The +.>
Figure QLYQS_44
Fitting reference weight values of individual points:
Figure QLYQS_45
In the method, in the process of the invention,
Figure QLYQS_49
indicate->
Figure QLYQS_52
The +.>
Figure QLYQS_55
The number of different time points corresponding to the maximum value points of the individual points in their local range,/->
Figure QLYQS_47
Indicate->
Figure QLYQS_51
The +.>
Figure QLYQS_54
The number of curves of the energy curve in the local area of the individual points, +.>
Figure QLYQS_57
Indicate->
Figure QLYQS_46
The +.>
Figure QLYQS_50
Energy mean value in local range of individual points, < >>
Figure QLYQS_53
Indicate->
Figure QLYQS_56
The +.>
Figure QLYQS_48
Fitting of individual points refers to the weight values.
9. The method for processing natural language interaction information of intelligent equipment according to claim 1, wherein the implementation of denoising enhancement on the interaction voice signal comprises the following steps:
calculating a fitting reference weight value of each point in a spectrogram of the interactive voice signal, selecting an energy value of the maximum fitting reference weight value on each time point in a sub-frequency range to perform same-frequency curve fitting, obtaining fitted same-frequency curves for the energy values of the maximum fitting reference weight values on each time point in all the sub-frequency ranges, expanding the same-frequency curves obtained by fitting to the outside of the initial time point, further obtaining same-frequency curves under different frequencies after expansion, and performing inverse Fourier transform on the obtained expanded same-frequency curves to obtain expanded interactive voice signals;
EMD decomposition is carried out on the obtained expanded interactive voice signals to obtain a plurality of voice IMF components, and each voice IMF component is subjected to denoising by adopting a wavelet threshold denoising algorithm, so that denoising enhancement on the interactive voice signals is realized, and the interactive voice signals after denoising enhancement are obtained.
CN202310422056.5A 2023-04-19 2023-04-19 Natural language interaction information processing method for intelligent equipment Active CN116129926B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310422056.5A CN116129926B (en) 2023-04-19 2023-04-19 Natural language interaction information processing method for intelligent equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310422056.5A CN116129926B (en) 2023-04-19 2023-04-19 Natural language interaction information processing method for intelligent equipment

Publications (2)

Publication Number Publication Date
CN116129926A true CN116129926A (en) 2023-05-16
CN116129926B CN116129926B (en) 2023-06-09

Family

ID=86303158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310422056.5A Active CN116129926B (en) 2023-04-19 2023-04-19 Natural language interaction information processing method for intelligent equipment

Country Status (1)

Country Link
CN (1) CN116129926B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116935880A (en) * 2023-09-19 2023-10-24 深圳市一合文化数字科技有限公司 Integrated machine man-machine interaction system and method based on artificial intelligence
CN117037834A (en) * 2023-10-08 2023-11-10 广州市艾索技术有限公司 Conference voice data intelligent acquisition method and system
CN117373471A (en) * 2023-12-05 2024-01-09 鸿福泰电子科技(深圳)有限公司 Audio data optimization noise reduction method and system
CN117711419A (en) * 2024-02-05 2024-03-15 卓世智星(成都)科技有限公司 Intelligent data cleaning method for data center
CN117711419B (en) * 2024-02-05 2024-04-26 卓世智星(成都)科技有限公司 Intelligent data cleaning method for data center

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050192805A1 (en) * 2004-02-26 2005-09-01 Hirokazu Kudoh Voice analysis device, voice analysis method and voice analysis program
WO2017144007A1 (en) * 2016-02-25 2017-08-31 深圳创维数字技术有限公司 Method and system for audio recognition based on empirical mode decomposition
CN111754991A (en) * 2020-06-28 2020-10-09 汪秀英 Method and system for realizing distributed intelligent interaction by adopting natural language
WO2022005615A1 (en) * 2020-06-30 2022-01-06 Microsoft Technology Licensing, Llc Speech enhancement
CN114974253A (en) * 2022-05-20 2022-08-30 北京北信源软件股份有限公司 Natural language interpretation method and device based on character image and storage medium
CN115273876A (en) * 2022-07-28 2022-11-01 天津中科听芯科技有限公司 Voice data enhancement method, system and device for AI voice communication
WO2023024725A1 (en) * 2021-08-23 2023-03-02 Oppo广东移动通信有限公司 Audio control method and apparatus, and terminal and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050192805A1 (en) * 2004-02-26 2005-09-01 Hirokazu Kudoh Voice analysis device, voice analysis method and voice analysis program
WO2017144007A1 (en) * 2016-02-25 2017-08-31 深圳创维数字技术有限公司 Method and system for audio recognition based on empirical mode decomposition
CN111754991A (en) * 2020-06-28 2020-10-09 汪秀英 Method and system for realizing distributed intelligent interaction by adopting natural language
WO2022005615A1 (en) * 2020-06-30 2022-01-06 Microsoft Technology Licensing, Llc Speech enhancement
WO2023024725A1 (en) * 2021-08-23 2023-03-02 Oppo广东移动通信有限公司 Audio control method and apparatus, and terminal and storage medium
CN114974253A (en) * 2022-05-20 2022-08-30 北京北信源软件股份有限公司 Natural language interpretation method and device based on character image and storage medium
CN115273876A (en) * 2022-07-28 2022-11-01 天津中科听芯科技有限公司 Voice data enhancement method, system and device for AI voice communication

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王留芳 等: "智能语音技术在蓄电池充电系统中的应用", 《微计算机信息》, no. 4 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116935880A (en) * 2023-09-19 2023-10-24 深圳市一合文化数字科技有限公司 Integrated machine man-machine interaction system and method based on artificial intelligence
CN116935880B (en) * 2023-09-19 2023-11-21 深圳市一合文化数字科技有限公司 Integrated machine man-machine interaction system and method based on artificial intelligence
CN117037834A (en) * 2023-10-08 2023-11-10 广州市艾索技术有限公司 Conference voice data intelligent acquisition method and system
CN117037834B (en) * 2023-10-08 2023-12-19 广州市艾索技术有限公司 Conference voice data intelligent acquisition method and system
CN117373471A (en) * 2023-12-05 2024-01-09 鸿福泰电子科技(深圳)有限公司 Audio data optimization noise reduction method and system
CN117373471B (en) * 2023-12-05 2024-02-27 鸿福泰电子科技(深圳)有限公司 Audio data optimization noise reduction method and system
CN117711419A (en) * 2024-02-05 2024-03-15 卓世智星(成都)科技有限公司 Intelligent data cleaning method for data center
CN117711419B (en) * 2024-02-05 2024-04-26 卓世智星(成都)科技有限公司 Intelligent data cleaning method for data center

Also Published As

Publication number Publication date
CN116129926B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
CN116129926B (en) Natural language interaction information processing method for intelligent equipment
CN110491407B (en) Voice noise reduction method and device, electronic equipment and storage medium
CN107845389B (en) Speech enhancement method based on multi-resolution auditory cepstrum coefficient and deep convolutional neural network
CN110867181B (en) Multi-target speech enhancement method based on SCNN and TCNN joint estimation
CN110600050B (en) Microphone array voice enhancement method and system based on deep neural network
CN108447495B (en) Deep learning voice enhancement method based on comprehensive feature set
US6862558B2 (en) Empirical mode decomposition for analyzing acoustical signals
US6691090B1 (en) Speech recognition system including dimensionality reduction of baseband frequency signals
WO2018223727A1 (en) Voiceprint recognition method, apparatus and device, and medium
US6721698B1 (en) Speech recognition from overlapping frequency bands with output data reduction
WO2020224226A1 (en) Voice enhancement method based on voice processing and related device
CN110111769B (en) Electronic cochlea control method and device, readable storage medium and electronic cochlea
CN107785028A (en) Voice de-noising method and device based on signal autocorrelation
US20230317056A1 (en) Audio generator and methods for generating an audio signal and training an audio generator
CN113744749B (en) Speech enhancement method and system based on psychoacoustic domain weighting loss function
CN111261182A (en) Wind noise suppression method and system suitable for cochlear implant
US6701291B2 (en) Automatic speech recognition with psychoacoustically-based feature extraction, using easily-tunable single-shape filters along logarithmic-frequency axis
Do et al. Speech source separation using variational autoencoder and bandpass filter
CN111681649B (en) Speech recognition method, interaction system and achievement management system comprising system
CN110197657B (en) Dynamic sound feature extraction method based on cosine similarity
CN111968651A (en) WT (WT) -based voiceprint recognition method and system
CN116013344A (en) Speech enhancement method under multiple noise environments
Hamid et al. Single channel speech enhancement using adaptive soft-thresholding with bivariate EMD
CN114882898A (en) Multi-channel speech signal enhancement method and apparatus, computer device and storage medium
Radha et al. Enhancing speech quality using artificial bandwidth expansion with deep shallow convolution neural network framework

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant