CN114818769A - Man-machine symbiosis based optimization method and device for sound signal processing - Google Patents


Info

Publication number
CN114818769A
Authority
CN
China
Prior art keywords
sound signal
processing
sound
signal
visual information
Prior art date
Legal status
Pending
Application number
CN202210142501.8A
Other languages
Chinese (zh)
Inventor
陶霖密
刘政
姚雪
陶妍
谢宇超
倪正
Current Assignee
Tsinghua University
CSSC Systems Engineering Research Institute
Original Assignee
Tsinghua University
CSSC Systems Engineering Research Institute
Priority date
Filing date
Publication date
Application filed by Tsinghua University and CSSC Systems Engineering Research Institute
Priority claimed from CN202210142501.8A
Publication of CN114818769A
Legal status: Pending

Classifications

    • G — PHYSICS
        • G06 — COMPUTING; CALCULATING OR COUNTING
            • G06F — ELECTRIC DIGITAL DATA PROCESSING
                • G06F 2218/00 — Aspects of pattern recognition specially adapted for signal processing
                    • G06F 2218/08 — Feature extraction
    • G — PHYSICS
        • G06 — COMPUTING; CALCULATING OR COUNTING
            • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 20/00 — Machine learning

Abstract

An embodiment of the invention provides a man-machine-symbiosis-based optimization method and device for sound signal processing. The method comprises: acquiring a first sound signal, performing sound signal processing on it to obtain a second sound signal, and playing the second sound signal; generating corresponding visual information from the second sound signal and displaying the visual information; acquiring the adjustment operation that an operator performs on the second sound signal in response to the visual information and the second sound signal; and updating the processing strategy of the sound signal processing according to the adjustment operation.

Description

Man-machine symbiosis based optimization method and device for sound signal processing
Technical Field
The invention relates to the technical field of sound processing, and in particular to a method and a device for optimizing sound signal processing based on man-machine symbiosis.
Background
With the development of technology, there is a trend toward processing original sound signals to make them better suited to the human auditory system and human listening habits, or to make it easier for humans to obtain useful information from the processed sound signals. However, existing sound processing technology usually requires a technician to configure processing algorithms and parameters based on experience, and the configured algorithms and parameters achieve a good processing effect only within a certain range of applications. That is, existing signal processing is limited in its scope of application and depends on particular individuals.
Disclosure of Invention
In order to solve the above technical problems, the present invention provides an optimization method for sound signal processing, including:
acquiring a first sound signal, performing sound signal processing based on the first sound signal, acquiring a second sound signal, and playing the second sound signal;
generating corresponding visual information according to the second sound signal, and displaying the visual information;
acquiring an adjusting operation performed by an operator for the second sound signal in response to the visual information and the second sound signal;
and updating the processing strategy of the sound signal processing according to the adjusting operation.
Preferably, the obtaining of the first sound signal comprises obtaining the original sound signals of several channels.
Preferably, the sound signal processing based on the first sound signal to obtain a second sound signal includes:
fusing the original sound signals of the channels to obtain an intermediate signal of at least a single channel;
and carrying out first signal processing on the intermediate signal to obtain a second sound signal.
Preferably, the visual information includes:
and the dynamic relative energy spectrogram or dynamic tone color distribution diagram corresponding to the second sound signal.
Preferably, the visual information further comprises:
a tonal variability curve corresponding to the second sound signal, and/or a dynamic rhythm variability curve.
Preferably, updating the processing strategy of the sound signal processing according to the adjustment operation includes:
updating a processing strategy of the sound signal processing according to the adjustment operation based on machine learning.
Preferably, updating the processing strategy of the sound signal processing according to the adjustment operation based on machine learning includes:
acquiring a third sound signal corresponding to the first sound signal at the next moment; inputting the third signal into a machine learning model to obtain a fourth signal;
determining a first difference between the fourth signal and the expected signal according to the tuning parameters corresponding to the adjusting operation;
the parameters of the machine learning model are updated with the aim that the first difference tends to become smaller.
In a second aspect, an apparatus for optimizing sound signal processing is provided, including:
the sound signal processing unit is configured to acquire a first sound signal, perform sound signal processing based on the first sound signal, acquire a second sound signal, and play the second sound signal;
a visual information processing unit configured to generate corresponding visual information according to the second sound signal and display the visual information;
an adjustment information acquisition unit configured to acquire an adjustment operation performed by an operator for a second sound signal in response to the visual information and the second sound signal;
a processing policy updating unit configured to update a processing policy of the sound signal processing according to the adjustment operation.
In a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
In a fourth aspect, a computing device is provided, comprising a memory and a processor, wherein the memory stores executable code, and the processor executes the executable code to implement the method of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of a method for optimizing sound signal processing according to an embodiment of the present invention;
fig. 2 is a flowchart of an optimization method for sound signal processing according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a visual enhancement signal provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a visual enhancement signal provided in accordance with another embodiment of the present invention;
FIG. 5 is a schematic view of a visual enhancement signal provided by another embodiment of the present invention;
FIG. 6 is a schematic diagram of a sound adjustment apparatus provided in an embodiment of the present invention;
fig. 7 is a structural diagram of an optimization apparatus for processing an audio signal according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Sound signal processing is one of the signal processing techniques. After long practice, a large number of classical algorithms exist in the field of sound signal processing, capable of performing various kinds of processing on sound signals. However, these algorithms typically have adjustable parameters of their own. Faced with a specific application scene, appropriate parameter settings yield good results, while wrong parameters prevent the algorithm from playing its due role. In current sound processing schemes, the operating parameters of the various processing algorithms usually need to be set based on manual experience, or determined for a scene through experiments. This approach has the following problems: on the one hand, the labor cost of a technician is high; on the other hand, the technical demands on the implementer are high, and the result depends too much on the implementer's experience and skill.
In order to solve the above technical problem, an embodiment of the present invention provides an optimization method and apparatus for sound signal processing. The principle of the method is explained first below.
Humans perceive sound signals through the ears. However, in situations such as extremely weak signals, low signal-to-noise ratios, or for auditory handicapped people, the human ear often does not perceive and understand auditory information well. For this situation, the current solution is to acquire and enhance the signal by sensors, e.g. by hearing aids, sound signal amplifiers, etc., to help people perceive and understand the sound. However, simply amplifying the signal does not enhance human perception and understanding of the sound signal in situations of very low signal-to-noise ratio.
Visual-auditory fusion is a basic function of human cognition, and visual assistance can greatly enhance human perception of auditory signals. For example, in daily communication, a face-to-face communication can be better heard and understood than a telephone communication because the visual channel can facilitate the acquisition and understanding of auditory signals according to visual information such as lip movement and expression of a speaker. Based on this observation, embodiments of the present invention propose methods to enhance human auditory cognition by processing and visualizing sound signals. The essence of the method is that the auditory signals are processed and analyzed intelligently by a machine, and the processed auditory signals are presented by images, graphs and the like by adopting a visualization method, so that real-time visual feedback is provided for a listener, and the listener can better understand auditory contents through fusion with visual cognition. And the strategy parameters of signal processing can be updated based on the feedback of listeners to the presented auditory and visual signals, so that a better signal processing strategy can be automatically learned. Further, as a result of investigation by the inventors, there is no technique for enhancing auditory perception by visual-auditory fusion and feedback in the prior art.
Fig. 1 is a schematic diagram of an optimization method for sound signal processing according to an embodiment of the present invention. The sound collection sensor 2, the sound processing device 3, the sound adjusting device 4, the visual presentation device 5, and the sound playing device 6 may form a system implementing the optimization method for sound signal processing. Specifically, the sound signal 1 is collected by the sound collection sensor 2; the sound processing device 3 performs sound signal processing on it and visualizes the processing result. The visualization result is displayed to the listener by the visual presentation device 5, while the processed sound signal is played to the listener through the sound playing device 6. The listener 7 can combine the visual result seen with the sound heard from the sound playing device 6 to enhance understanding of the sound. The listener 7 can also adjust parameters of the sound adjusting device 4 to condition the sound heard, so that the sound played by the sound playing device 6 becomes easier to perceive and understand. The sound processing device 3 obtains the parameters set on the sound adjusting device 4 and, based on them, adjusts the parameters of its sound signal processing algorithms through machine learning, making the processed sound information more suitable for human comprehension. In one example, the collected sound signal 1 may be a low signal-to-noise-ratio sound mixed from multiple sound sources. In one example, the visual presentation device may be a graphical display device. In one example, depending on the displayed result, the listener can also select either the processed sound signal or the raw unprocessed signal for adjustment by the sound adjusting device 4.
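The closed loop of Fig. 1 (collect, process, visualize/play, adjust, update) can be sketched as follows. This is a minimal illustration only: the per-band gain model, the function names, and the update rule are assumptions of this sketch, not the patent's actual algorithms.

```python
import numpy as np

def process(signal, gains):
    """Illustrative processing strategy: per-band gains applied in the
    frequency domain (a stand-in for the patent's function family Phi)."""
    spectrum = np.fft.rfft(signal)
    bands = np.array_split(np.arange(spectrum.size), gains.size)
    for idx, g in zip(bands, gains):
        spectrum[idx] *= g
    return np.fft.irfft(spectrum, n=signal.size)

def visualize(signal, n_bands=4):
    """Stand-in for the visual information: relative energy per band."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    energy = np.array([b.sum() for b in np.array_split(spectrum, n_bands)])
    return energy / energy.sum()

def update_strategy(gains, operator_gains, lr=0.5):
    """Move the processing strategy toward the operator's adjustment."""
    return gains + lr * (operator_gains - gains)

# One pass through the loop of Fig. 1 with a synthetic single-channel input.
rng = np.random.default_rng(0)
first = rng.standard_normal(1024)                 # first sound signal (step 21)
gains = np.ones(4)                                # initial processing strategy
second = process(first, gains)                    # second sound signal (step 21)
visual = visualize(second)                        # visual information (step 22)
operator_gains = np.array([1.0, 0.5, 0.5, 1.0])   # adjustment operation (step 23)
gains = update_strategy(gains, operator_gains)    # strategy update (step 24)
```

Each pass nudges the strategy toward the operator's preference, which is the qualitative behavior the embodiment describes.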
Fig. 2 is a flowchart of an optimization method for processing an audio signal according to an embodiment of the present invention. As shown in fig. 2, the process of the method at least comprises the following steps:
step 21, acquiring a first sound signal, performing sound signal processing based on the first sound signal, acquiring a second sound signal, and playing the second sound signal. In one embodiment, the original sound signals of several channels may be acquired. In another embodiment, the original sound signals of several channels may be fused to obtain an intermediate signal of at least a single channel; and carrying out first signal processing on the intermediate signal to obtain a second sound signal.
Specifically, for example, in one embodiment there may be multiple sensors for collecting sound, denoted M (M1, M2, …, Mm), where m represents the number of sensors. At time t, the information acquired by these sensors may be represented as:
S_M(t) = {S_M1(t), S_M2(t), …, S_Mm(t)}    (1)
where S_M(t) is the set of sound signals collected at time t, and S_M1(t), S_M2(t), …, S_Mm(t) are the sound signals collected by the respective sensors M1, M2, …, Mm.
In different embodiments, the type, number, performance of sensors may vary, for example, from scenario to scenario or from task to task. In a specific embodiment, the type, number and performance of the sensors are generally relatively stable, that is, within a period of time T, the signals obtained by the sensors can be expressed as:
S_M(t) → Σ_M(T)    (2)
since the sensors of modern equipment, regardless of their type or number, are far more complex and powerful than the human ear, the amount of information they acquire is far larger than the information Σ_H that the human ear can acquire. For any T this can be expressed as:
Σ_M(T) >> Σ_H    (3)
it can be seen from formula (3) that the information obtained by the machine is directly presented to a person in some form without processing, and the person can only select one or two channels to listen. The amount of information acquired by a sensor is much greater than the capacity or ability of a person to acquire and process information. The large amount of information also burdens human auditory cognition, resulting in cognitive fatigue that reduces the perception and understanding of auditory events in the environment. Thus, in one embodiment, the function Φ may be processed 1 The signals acquired by the sensors are processed so that
Φ_1(S_M(t)) → Σ_M1(T)    (4)
and
Σ_M1(T) << Σ_M(T)    (5)
i.e. the amount of information in the processing result Σ_M1(T) is far smaller than that of the originally acquired signal S_M(t).
Nevertheless, the result of this processing may still be beyond the cognitive abilities of the person, or remain chaotic, which can be expressed as:
Σ_M1(T) >> Σ_H    (6)
thus, in one embodiment, a family of cognitive processing functions Φ(Φ_1, Φ_2, …, Φ_n), comprising signal processing algorithms, may be utilized, such that
Φ(S_M(t)) = {Φ_1(S_M(t)), Φ_2(S_M(t)), …, Φ_n(S_M(t))} → Σ_Mn(T)    (7)
and
Σ_Mn(T) ≤ Σ_H    (8)
In other words, after the sound signal S_M(t) acquired by the sensors passes through the function family Φ(Φ_1, Φ_2, …, Φ_n), the main information contained in the signal has been extracted, and the amount of information is within the acceptable range of the human auditory cognitive system.
And step 22, generating corresponding visual information according to the second sound signal, and displaying the visual information.
As described above, step 21 obtains a sound signal within the acceptable range of human audio-visual cognition; corresponding visual information may be generated from this sound signal and displayed to the listener.
In particular, in one embodiment, the processed signal Σ_Mn(T) may be presented by a visualization function Γ (e.g. through the graphical interface of the graphical display device 5 in fig. 1), so that human visual perception gains a perception of the sound signal, which can be expressed as:
S_H(t, Γ(Φ(S_M(t)))) → Σ_H    (9)
where S_H is the visually presented information. As can be seen from equation (9), in the visual-auditory fusion system, human visual perception of sound is realized by processing and visualizing the sound signals: after acquisition by the sensors, the processing function family Φ and the visualization function Γ determine what the human actually perceives. In this sense, under a given sensor hardware configuration, the processing function family Φ and the visualization function Γ determine the human visual perception of the sound signal. Obtaining a better processing function family Φ through machine learning therefore determines how efficiently a human can acquire sound information with a smaller cognitive load in the whole visual-auditory fusion system. In the process of human-machine symbiosis, the person obtains better cognition of the auditory signal through the visual and auditory signals presented by the learning system.
In different embodiments, different specific visual information may be generated from the second sound signal. In one embodiment, a dynamic relative energy spectrogram (e.g., the dynamic relative energy spectrogram 8 shown in fig. 3) or a dynamic timbre distribution diagram (e.g., the dynamic timbre distribution diagram 14 shown in fig. 5) corresponding to the second sound signal may be displayed. In another embodiment, a tonality variability curve (the tonality variability curve 9 shown in fig. 3) and/or a dynamic rhythm variability curve (the dynamic rhythm variability curve 13 shown in fig. 4) corresponding to the second sound signal may also be displayed.
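A dynamic relative energy spectrogram like item 8 in fig. 3 could be computed roughly as follows. The frame size, hop, and fixed equal-width channels are assumptions of this sketch; the patent derives the channel count and bandwidths by machine learning.

```python
import numpy as np

def relative_energy_spectrogram(signal, frame=256, hop=128, n_channels=16):
    """Per-frame energy in n_channels frequency bands, normalized so each
    frame sums to 1 (hence "relative"). Window, hop, and the fixed band
    layout are assumptions of this sketch."""
    window = np.hanning(frame)
    rows = []
    for start in range(0, len(signal) - frame + 1, hop):
        spec = np.abs(np.fft.rfft(signal[start:start + frame] * window)) ** 2
        bands = np.array([b.sum() for b in np.array_split(spec, n_channels)])
        total = bands.sum()
        rows.append(bands / total if total > 0 else bands)
    return np.array(rows)   # shape (n_frames, n_channels); render as an image

rng = np.random.default_rng(1)
sig = (np.sin(2 * np.pi * 440 * np.arange(4096) / 16000)
       + 0.1 * rng.standard_normal(4096))
S = relative_energy_spectrogram(sig)
```

Rendering `S` with time on the abscissa and channel on the ordinate, using hue or gray scale for the values, gives a display of the kind fig. 3 describes.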
And step 23, acquiring an adjusting operation performed by the operator for the second sound signal in response to the visual information and the second sound signal.
The operator, i.e. the listener, hears the sound signal played in step 21 and sees the visual information displayed in step 22, and adjusts the sound signal based on a combined analysis of the visually perceived signal and the audibly heard signal. In this step, the adjustment operation performed by the operator on the sound signal is received. In one embodiment, the operator may adjust the sound signal through a sound adjusting device such as the device 4 shown in fig. 1. In different specific embodiments, specific parameters of different types of sound signals may be adjusted through different specific types of sound adjusting devices, which this specification does not limit. Fig. 6 is a schematic diagram of a sound adjusting apparatus according to an embodiment of the present invention. As shown in fig. 6, the sound adjusting apparatus used in this embodiment presents a visualized dynamic spectrum 10 and a spectral energy parameter 12 of the sound signal, and further has an adjusting knob 11, through which the operator can suppress the energy of a sound frequency selected by the operator.
And 24, updating the processing strategy of the sound signal processing according to the adjusting operation.
According to one embodiment, the processing strategy of the sound signal processing may be updated according to the adjustment operation based on machine learning. In a specific embodiment, a third sound signal corresponding to the first sound signal at the next time may be obtained; inputting the third signal into a machine learning model to obtain a fourth signal; determining a first difference between the fourth signal and the expected signal according to the tuning parameters corresponding to the adjusting operation; the parameters of the machine learning model are updated with the aim that the first difference tends to become smaller.
In particular, in the context of machine learning, assume that the cognitive processing function family Φ of the signal is learned from an original classical algorithm or initial model Φ on a training data set D, which can be expressed as:
D: Φ → Φ(Φ_1, Φ_2, …, Φ_n)    (10)
where Φ_1, Φ_2, …, Φ_n represent n cognitive processing functions. Since the data set D is usually acquired some time in the past, it reflects the external environment of that past period, while the current actual environment is constantly changing. Therefore, in one embodiment, a new training data set can be constructed from data obtained in the current environment, so that the newly learned signal cognitive processing function family Φ has a better processing effect.
Specifically, from the machine learning perspective, the sound signals S_M(t) acquired by the sensors at times t_1, t_2, t_3, … can be denoted D_1, D_2, D_3, …. When the visual-auditory fusion system is in operation, the signal obtained at the current time t_1 is D_1. Since the system already carries a pre-trained cognitive processing function family Φ, processing D_1 with Φ yields a result that can be expressed as:
Φ(D_1) = {Φ_1(D_1), Φ_2(D_1), …, Φ_n(D_1)} → Σ_D1    (11)
That is, the main information in the sound signal D_1 obtained by the sound sensors has been extracted by the cognitive information processing. On this basis, the visualization system presents this information through the visualization function Γ:
Γ(Σ_D1) → Σ_H
so that the human visual system can perceive the sound signal.
It can be considered that the above process is the machine's processing and presentation of sound information, so that human vision can "see" the sound. On this basis, the person also hears the sound signal and adjusts it based on a comprehensive analysis of the visually perceived and audibly heard signals, for example by parameter adjustment on a sound adjusting device. The adjusted parameter values can then serve as labels for the information processing result Φ(D_1). The difference between the sound a human wishes to hear or cognitively judges to be better (the desired sound signal) and the sound signal actually heard can be expressed as:
D_1^Θ = Θ(Φ(D_1))    (12)
where Θ is a data annotation function based on the adjusting parameters, and D_1^Θ is the labeled data. Based on the labeling result, through machine learning the signal cognitive processing function family Φ can learn a processing algorithm that is better, or more suitable for the current sound signal, which can be expressed as:
D_1: Φ(Φ_1, Φ_2, …, Φ_n) → Φ^1(Φ_1, Φ_2, …, Φ^1_i, …, Φ^1_j, …, Φ_n)    (13)
where Φ^1_i, …, Φ^1_j represent the processing functions updated after learning. Part of the functions in the cognitive processing function family have their parameters adjusted through deep learning on the new data set D_1, improving their sound-signal-processing performance and making them better adapted to the current sound signal. Since new data sets can be acquired continuously, over time more and more functions can be improved through progressive learning to adapt to the sound signals acquired in the current environment. That is, the cognitive processing function family of the sound signal keeps improving as human feedback and training data accumulate in the symbiotic relationship; the performance of sound processing and visualization improves as time evolves. In essence, the parameters of sound signal processing are improved through machine learning, raising the performance of the current signal processing.
The process is further illustrated by the following specific examples.
The first embodiment is as follows:
First, the information acquired by the sound sensors at time t is fused into a single signal by an intelligent algorithm f_x to improve its signal-to-noise ratio, which can be expressed as:
S_MX(t) = f_x(S_M1(t), S_M2(t), …, S_Mm(t))    (14)
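The patent names f_x only as an "intelligent algorithm". As a hedged stand-in, the sketch below fuses the channels by SNR-weighted averaging, which raises the SNR over any single channel when the sensor noise is independent across channels.

```python
import numpy as np

def f_x(channels):
    """Assumed fusion: SNR-weighted average of the m sensor channels.
    Differencing each channel emphasizes its noise, giving a crude
    per-channel SNR proxy used as the fusion weight."""
    channels = np.asarray(channels)                # shape (m, n_samples)
    noise = np.diff(channels, axis=1)
    snr = channels.var(axis=1) / (noise.var(axis=1) + 1e-12)
    w = snr / snr.sum()
    return w @ channels                            # fused signal S_MX(t)

rng = np.random.default_rng(3)
clean = np.sin(2 * np.pi * 5 * np.arange(1000) / 1000)   # slow reference tone
sensors = [clean + 0.8 * rng.standard_normal(1000) for _ in range(6)]
fused = f_x(sensors)

def residual(x):
    # Residual noise power relative to the clean tone.
    return np.mean((x - clean) ** 2)
```

Because the weights sum to one, the common signal component is preserved exactly while the independent noise is averaged down, so `residual(fused)` is smaller than that of every individual sensor.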
Then, a machine-learning-based processing algorithm Φ_X1 can perform a dynamic relative energy spectrum calculation on the fused signal S_MX(t), which can be expressed as:
Σ_MX1(t) = Φ_X1(S_MX(t))    (15)
On this basis, the calculation result Σ_MX1(t) is presented through the visualization function Γ, so that human visual perception gains a perception of the sound signal, which can be expressed as:
Σ_HX(t) = Γ(Σ_MX1(t))    (16)
Specifically, in one example, the visual information presented may be a dynamic relative energy spectrogram of the signal, such as the dynamic relative energy spectrogram 8 shown in fig. 3, where the abscissa is scrolling time, the ordinate is the energy channel, and hue and gray scale represent the magnitude of the energy. In one example, the number of energy spectrum channels and the bandwidth of each channel in the dynamic relative energy spectrogram can be derived based on machine learning.
In one example, a machine-learning-based processing algorithm Φ_X2 may also perform dynamic tonal variability calculations on the fused signal S_MX(t), computing local tonal variation information in the high-noise signal, such that:
Σ_MX2(t) = Φ_X2(S_MX(t))    (17)
On this basis, the information Σ_MX2(t) is superimposed on the dynamic energy spectrogram through the visualization function Γ, namely:
Σ_HX(t) = Γ(Σ_MX1(t), Σ_MX2(t))    (18)
A tonality variability curve is thus presented under the dynamic spectrum of the signal, such as the tonality variability curve 9 shown in fig. 3, where the abscissa is scrolling time, the ordinate is the rate of tonal change, and the numbers indicate the magnitude of the tonal change. In one example, the intervals and bandwidths of the tonality analysis may be derived based on machine learning.
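One plausible (assumed) realization of a tonality variability curve: track the dominant FFT frequency per frame and plot its frame-to-frame change rate. The tonality measure and its parameters here are illustrative; the patent learns the tonality intervals and bandwidths.

```python
import numpy as np

def tonality_variability(signal, sr, frame=512, hop=256):
    """Dominant FFT frequency per frame, then its frame-to-frame change
    rate in Hz per second. The measure and its parameters are assumptions
    of this sketch."""
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    dominant = []
    for start in range(0, len(signal) - frame + 1, hop):
        spec = np.abs(np.fft.rfft(signal[start:start + frame] * window))
        dominant.append(freqs[np.argmax(spec)])
    return np.abs(np.diff(np.array(dominant))) * sr / hop

sr = 8000
t = np.arange(2 * sr) / sr
chirp = np.sin(2 * np.pi * (200 + 100 * t) * t)   # tone gliding upward
curve = tonality_variability(chirp, sr)           # one value per frame step
```

For the upward-gliding tone the curve is nonzero wherever the dominant frequency steps to the next FFT bin, giving a display of tonal change rate over scrolling time as fig. 3 describes.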
The above process is the machine's processing and presentation of sound information, so that human vision can see sound characteristics extracted by the algorithms, such as the dynamic relative energy spectrum and the tonality variability curve. On this basis, the person also hears the sound signal and, based on a comprehensive analysis of the visually seen dynamic relative energy spectrum 8, the tonality variability curve 9, and the audibly heard signal, can, as in the example shown in fig. 4, adjust the spectral energy parameter 12 of the sound signal through the adjusting knob 11 to suppress the energy of selected frequencies.
If, after adjusting the parameters, the person can better perceive the useful information in the sound signal, then based on the adjusted parameters a deep learning algorithm can modify the signal cognitive processing function Φ_X2 into an algorithm Φ'_X2 with a better processing effect, or one better adapted to the current sound signal, which can be expressed as:
Φ_X2 → Φ'_X2    (19)
That is, the processing parameters are adjusted through deep learning, improving the performance of the processing function on the sound signal.
The sound signal is then processed by equations (15) to (18) to obtain new display information Σ'_HX(t). The cognitive processing functions of the sound signal improve with human operation, i.e. the performance of sound processing and visualization improves as time evolves. Fusion of visual and auditory information is thereby realized, the auditory cognitive load on people is reduced, and people can better perceive and understand sound signals in complex, highly dynamic environments.
Example two:
In this embodiment, building on embodiment one, a machine-learning-based processing algorithm Φ_X3 performs dynamic rhythm variability calculations on the fused signal S_MX(t) to compute local rhythm variations in the high-noise signal, which can be expressed as:
Σ_MX3(t) = Φ_X3(S_MX(t))    (20)
On this basis, the information Σ_MX3(t) is superimposed on the dynamic energy spectrogram through the visualization function Γ, which can be represented as:
Σ_HX(t) = Γ(Σ_MX1(t), Σ_MX2(t), Σ_MX3(t))    (21)
A rhythm variability curve 13 is thus presented under the dynamic energy spectrum of the signal, as shown in fig. 4, where the abscissa is scrolling time and the ordinate is the rate and magnitude of the rhythm change. Both the intervals and the bandwidths of the rhythm analysis are derived automatically through machine learning.
Also, similarly to embodiment one, the signal cognitive processing function Φ_X3 may be adjusted according to the tuning parameters and, through deep transfer learning and similar algorithms, become a better algorithm, or one more adapted to the current sound signal, which will not be described in detail here.
Example three:
in this embodiment, based on the second embodiment, a processing algorithm Φ based on machine learning is adopted X4 Combining a classical algorithm method to the fused signal S MX (t) performing dynamic timbre variability calculations, calculating a local timbre distribution in the noisy signal such that:
Σ_MX4(t) = Φ_X4(S_MX(t))    (22)
On this basis, the display information Σ_MX4(t) can replace the dynamic energy spectrogram Σ_MX1(t) through the visualization function Γ, namely:
Σ_HX(t) = Γ(Σ_MX2(t), Σ_MX3(t), Σ_MX4(t))    (23)
Thus, a dynamic timbre map 14 and the rhythm and tonal variation curves of the signal are presented, as shown in Fig. 6, where the abscissa is the scrolling time and the ordinate is the rate and magnitude of the tonal and rhythmic variation. In one example, the interval and bandwidth of rhythm and tone can be obtained through the machine learning described above, and can also be improved through learning based on parameter adjustment.
In summary, the method has the following advantages. On one hand, by capturing a person's adjustment behavior and using the person's adjusted parameters as labels for the signal processing results, machine learning enables the sound signal cognitive processing function to obtain, through deep learning, a processing algorithm that is better or more suitable for the sound acquired in the current environment. On the other hand, sound information that people cannot readily perceive is converted into easily understood visual information, and key features in the sensor data are revealed, so that people can obtain auditory information through natural visual perception. This realizes the fusion of visual and auditory information, reduces the auditory cognitive load, and improves people's ability to perceive and recognize low signal-to-noise-ratio sound information in complex environments.
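The human-in-the-loop update described above (the operator's tuned parameter serving as the label, as also recited in claim 7) can be sketched with a deliberately minimal model. The linear-gain processing model, mean-squared "first difference", and learning rate are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

class HumanInLoopTuner:
    """Minimal sketch of the human-machine symbiosis loop: each operator
    adjustment yields an expected signal, and the processing parameter is
    nudged so the model output drifts toward that expectation."""

    def __init__(self, gain=1.0, lr=0.1):
        self.gain = gain   # the learned processing parameter
        self.lr = lr

    def process(self, x):
        return self.gain * x                      # "fourth signal"

    def update(self, x, human_gain):
        """One learning step: shrink the first difference between the model
        output and the operator-adjusted (expected) signal; returns the
        difference before the update."""
        y = self.process(x)
        target = human_gain * x                   # expected signal from tuning
        grad = np.mean(2 * (y - target) * x)      # d(MSE)/d(gain)
        self.gain -= self.lr * grad
        return float(np.mean((y - target) ** 2))  # "first difference"
```

Repeated updates make the first difference tend to become smaller, which is the convergence criterion the claims describe; a real system would replace the scalar gain with a deep model updated the same way.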
According to an embodiment of yet another aspect, an apparatus for optimizing sound signal processing is provided. Fig. 7 is a block diagram of an apparatus for optimizing sound signal processing according to an embodiment of the present invention, and as shown in fig. 7, the apparatus 700 includes:
a sound signal processing unit 71 configured to acquire a first sound signal, perform sound signal processing based on the first sound signal, acquire a second sound signal, and play the second sound signal;
a visual information processing unit 72 configured to generate corresponding visual information according to the second sound signal and display the visual information;
an adjustment information acquisition unit 73 configured to acquire an adjustment operation performed by an operator for the second sound signal in response to the visual information and the second sound signal;
a processing strategy updating unit 74 configured to update a processing strategy of the sound signal processing according to the adjustment operation.
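The four units of apparatus 700 can be wired into a single human-in-the-loop step, as sketched below. The callables and the scalar stand-in for "visual information" are illustrative placeholders, not the patent's implementation:

```python
import numpy as np

class OptimizationPipeline:
    """Illustrative wiring of the four units of apparatus 700 in Fig. 7."""

    def __init__(self, process, visualize, update_policy):
        self.process = process              # unit 71: first -> second sound signal
        self.visualize = visualize          # unit 72: second signal -> visual info
        self.update_policy = update_policy  # unit 74: apply tuning to the policy

    def step(self, first_signal, get_operator_tuning):
        second = self.process(first_signal)        # playback would happen here
        visual = self.visualize(second)            # display would happen here
        tuning = get_operator_tuning(visual, second)   # unit 73: operator input
        self.update_policy(tuning)
        return second, visual
```

For example, with a gain-based processing strategy, each step plays and displays the processed signal, collects the operator's tuning, and updates the strategy before the next signal arrives.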
According to an embodiment of yet another aspect, there is also provided a computer-readable medium having a computer program stored thereon which, when executed in a computer, causes the computer to perform the method described above.
According to an embodiment of still another aspect, there is also provided a computing device including a memory and a processor, the memory storing executable code, and the processor implementing the method when executing the executable code.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method of optimizing sound signal processing, comprising:
acquiring a first sound signal, performing sound signal processing based on the first sound signal, acquiring a second sound signal, and playing the second sound signal;
generating corresponding visual information according to the second sound signal, and displaying the visual information;
acquiring an adjusting operation performed by an operator for the second sound signal in response to the visual information and the second sound signal;
and updating the processing strategy of the sound signal processing according to the adjusting operation.
2. The method of claim 1, wherein acquiring a first sound signal comprises acquiring a number of channels of original sound signals.
3. The method of claim 2, wherein performing sound signal processing based on the first sound signal to obtain a second sound signal comprises:
fusing the original sound signals of the channels to obtain an intermediate signal of at least a single channel;
and carrying out first signal processing on the intermediate signal to obtain a second sound signal.
4. The method of claim 1, wherein the visual information comprises:
and the dynamic relative energy spectrogram or dynamic tone color distribution diagram corresponding to the second sound signal.
5. The method of claim 4, wherein the visual information further comprises:
a tonal variability curve corresponding to the second sound signal, and/or a dynamic rhythm variability curve.
6. The method of claim 1, wherein updating the processing strategy of the sound signal processing in accordance with the adjustment operation comprises:
updating a processing strategy of the sound signal processing according to the adjustment operation based on machine learning.
7. The method of claim 6, wherein updating the processing strategy of the sound signal processing in accordance with the adjustment operation based on machine learning comprises:
acquiring a third sound signal corresponding to the first sound signal at the next moment; inputting the third signal into a machine learning model to obtain a fourth signal;
determining a first difference between the fourth signal and the expected signal according to the tuning parameters corresponding to the adjusting operation;
the parameters of the machine learning model are updated with the aim that the first difference tends to become smaller.
8. An apparatus for optimizing sound signal processing, comprising:
the sound signal processing unit is configured to acquire a first sound signal, perform sound signal processing based on the first sound signal, acquire a second sound signal, and play the second sound signal;
a visual information processing unit configured to generate corresponding visual information according to the second sound signal and display the visual information;
an adjustment information acquisition unit configured to acquire an adjustment operation performed by an operator for a second sound signal in response to the visual information and the second sound signal;
a processing policy updating unit configured to update a processing policy of the sound signal processing according to the adjustment operation.
9. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-7.
10. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-7.
CN202210142501.8A 2022-02-16 2022-02-16 Man-machine symbiosis based optimization method and device for sound signal processing Pending CN114818769A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210142501.8A CN114818769A (en) 2022-02-16 2022-02-16 Man-machine symbiosis based optimization method and device for sound signal processing


Publications (1)

Publication Number Publication Date
CN114818769A true CN114818769A (en) 2022-07-29

Family

ID=82527298

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination