CN111568384A - Voice noise reduction method and device in medical scanning and computer equipment - Google Patents


Info

Publication number
CN111568384A
Authority
CN
China
Prior art keywords
data
scanning
noise
neural network
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010474082.9A
Other languages
Chinese (zh)
Inventor
史宇航
毛苏杭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai United Imaging Healthcare Co Ltd
Original Assignee
Shanghai United Imaging Healthcare Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai United Imaging Healthcare Co Ltd filed Critical Shanghai United Imaging Healthcare Co Ltd
Priority to CN202010474082.9A priority Critical patent/CN111568384A/en
Publication of CN111568384A publication Critical patent/CN111568384A/en
Pending legal-status Critical Current

Classifications

    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7203Signal processing specially adapted for physiological signals or for diagnostic purposes for noise prevention, reduction or removal
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235Details of waveform analysis
    • A61B5/7264Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B6/00Apparatus or devices for radiation diagnosis; Apparatus or devices for radiation diagnosis combined with radiation therapy equipment
    • A61B6/52Devices using data or image processing specially adapted for radiation diagnosis
    • A61B6/5258Devices using data or image processing specially adapted for radiation diagnosis involving detection or reduction of artifacts or noise

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Veterinary Medicine (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Surgery (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Physiology (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • High Energy & Nuclear Physics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Optics & Photonics (AREA)
  • Radiology & Medical Imaging (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)

Abstract

The application relates to a voice noise reduction method and device in medical scanning, a computer device, and a readable storage medium. The method comprises: acquiring raw sound data from the scanning room and the operating room of an image scanning device, the raw sound data comprising talkback voice data and first noise data generated by the image scanning device under a current scanning protocol; acquiring the current scanning protocol of the image scanning device; inputting the raw sound data and the scanning protocol into a fully trained deep learning neural network; and acquiring target talkback voice data output by the fully trained deep learning neural network, the target talkback voice data being obtained by the network performing noise reduction on the raw sound data. The method can effectively filter the noise generated by the image scanning device during medical scanning, thereby optimizing the scanning flow and improving the scanning speed.

Description

Voice noise reduction method and device in medical scanning and computer equipment
Technical Field
The present application relates to the field of medical technology, and in particular, to a voice noise reduction method and apparatus in medical scanning, and a computer device.
Background
In medical imaging examination, the patient and the technician need to communicate through a voice intercom to complete the scanning activity. Effective voice intercommunication can improve the scanning speed and optimize the scanning flow and results. However, communication between the technician and the patient is often subject to acoustic disturbances, such as instrument operating sounds, background noise, and intercom echoes. To remove these disturbances, the captured sound is typically optimized with conventional speech noise reduction algorithms.
Conventional noise reduction algorithms, however, are effective only against regular, steady noise. During medical scanning, most of the environmental noise comes from random noise sources, so conventional algorithms do poorly at suppressing the noise produced by an operating image scanning device.
No effective solution has yet been proposed for this problem in the related art of poorly suppressed noise from operating image scanning equipment.
Disclosure of Invention
The application provides a voice noise reduction method, a voice noise reduction device, computer equipment and a computer readable storage medium in medical scanning, so as to at least solve the problem that the optimization effect of noise generated in medical scanning is poor in the related art.
In a first aspect, an embodiment of the present application provides a method for reducing noise in a medical scan, where the method includes:
acquiring raw sound data from the scanning room and the operating room of an image scanning device; the raw sound data comprises talkback voice data and first noise data generated by the image scanning device based on a current scanning protocol;
acquiring a current scanning protocol of the image scanning equipment;
inputting the original sound data and the scanning protocol into a deep learning neural network with complete training;
acquiring target talkback voice data output by the well-trained deep learning neural network; and the target talkback voice data is obtained by carrying out noise reduction processing on the original voice data based on the well-trained deep learning neural network.
In some of these embodiments, before inputting the raw sound data and the scanning protocol to a well-trained deep learning neural network, the method further comprises:
constructing an initial deep learning neural network;
acquiring noise-free voice data;
acquiring a scanning protocol of image scanning equipment in a scanning process and second noise data generated under the scanning protocol;
obtaining a training sample according to the noiseless voice data, the scanning protocol and the second noise data;
and inputting the training sample into the initial deep learning neural network, updating the parameters of the initial deep learning neural network through error back propagation until the error is converged, and obtaining the deep learning neural network with complete training.
In some embodiments, the obtaining training samples from the noiseless voice data, the scan protocol, and the second noisy data comprises:
synthesizing the second noise data and the adjusted noise-free voice data to obtain synthesized data;
and obtaining the training sample according to the synthetic data and the scanning protocol.
In some embodiments, before the synthesizing the second noise data and the adjusted noiseless speech data, the method further comprises: adjusting a volume and/or a speed of the noiseless voice data.
In some of these embodiments, the obtaining noise-free speech data comprises:
and in the stage of not starting the image scanning equipment, acquiring the voice data subjected to denoising treatment in the scanning room as the noise-free voice data.
In some embodiments, the acquiring of the scanning protocol of the image scanning device during the scanning process and the second noise data generated under that scanning protocol includes:
and acquiring sound data generated by different types of image scanning equipment under different scanning protocols through a noise acquisition device, and taking the sound data as second noise data.
In some of these embodiments, the error employed in training the initial deep learning neural network comprises: the error between the actually acquired noise-free voice data and the voice data obtained after the initial deep learning neural network performs noise reduction on the training samples.
In a second aspect, an embodiment of the present application provides an apparatus for reducing noise in a medical scan, where the apparatus includes:
the first acquisition module is used for acquiring raw sound data from the scanning room and the operating room of the image scanning device; the raw sound data comprises talkback voice data and first noise data generated by the image scanning device based on a current scanning protocol;
the second acquisition module is used for acquiring the current scanning protocol of the image scanning equipment;
the data input module is used for inputting the original sound data and the scanning protocol into a deep learning neural network with complete training;
the third acquisition module is used for acquiring target talkback voice data output by the well-trained deep learning neural network; and the target talkback voice data is obtained by carrying out noise reduction processing on the original voice data based on the well-trained deep learning neural network.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor, when executing the computer program, implements the voice noise reduction method according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the speech noise reduction method according to the first aspect.
Compared with the prior art, the voice noise reduction method in medical scanning provided by the embodiment of the application learns noise data generated by the image scanning equipment under different scanning protocols by utilizing the deep learning algorithm, so that noise parts can be accurately filtered, clean and noiseless voice is output, a patient and a doctor can communicate more effectively in the scanning process, the scanning flow is optimized, and the scanning speed is increased.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a method for speech noise reduction in a medical scan according to an embodiment;
FIG. 2 is a schematic diagram of noise under different MRI sequences, according to an embodiment;
FIG. 3a is a spectrum diagram of gradient noise according to an embodiment;
FIG. 3b is a spectrum diagram of cold head noise according to an embodiment;
FIG. 4 is a schematic structural diagram of a convolutional neural network according to an embodiment;
FIG. 5 is a diagram illustrating a preferred structure of a convolutional neural network according to an embodiment;
FIG. 6 is a flowchart of obtaining training samples from the noise-free voice data, the scanning protocol, and the second noise data, according to another embodiment;
FIGS. 7a to 7c are time-domain waveforms of the noise-free data, the second noise data, and the synthesized data, according to an embodiment;
FIGS. 8a to 8c are spectrograms of the noise-free data, the second noise data, and the synthesized data, according to another embodiment;
FIG. 9 is a block diagram of a voice noise reduction apparatus in a medical scan according to an embodiment;
FIG. 10 is a block diagram illustrating an exemplary embodiment of a voice noise reduction apparatus for medical scanning;
fig. 11 is a hardware configuration diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The present application is directed to the use of the terms "including," "comprising," "having," and any variations thereof, which are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.
The present embodiment is preferably applied to a medical image scanning apparatus, such as a Positron Emission Tomography (PET) apparatus, a Magnetic Resonance Imaging (MRI) apparatus, and the like. In the present embodiment, the invention will be described and illustrated by taking the example of the magnetic resonance imaging apparatus acquiring a magnetic resonance image.
Fig. 1 is a flowchart of a voice denoising method in a medical scan according to an embodiment, as shown in fig. 1, including steps 110 to 140; wherein:
step 110, acquiring original sound data between scans and operations of an image scanning device; the original sound data comprises talkback voice data and first noise data generated by the image scanning equipment based on the current scanning protocol.
During a medical scan, the patient is in the scan room and the technician controls the scan process in the procedure room. During scanning, the patient and the technician need to communicate through voice intercommunication to complete the scanning activity, but the raw sound data can be affected by the operating sound of the image scanning equipment and other environmental sounds, so that the collected raw sound data needs to be subjected to noise reduction processing to improve the scanning process.
In the scanning process, original sound data in a scanning room and an operating room of the image scanning equipment can be collected in real time, wherein the original sound data comprises talkback voice data and first noise data generated by the image scanning equipment based on a current scanning protocol. The talkback voice data is the actual conversation content between the patient and the technician, and the first noise data is the operation sound data of the image scanning equipment. It will be appreciated that the raw sound data may also include background noise and noise generated by the sound collection device, such as echoes of a microphone.
Step 120, acquiring a current scanning protocol of the image scanning device.
The noise source and the generated noise are different under different scanning modes aiming at different image scanning equipment. For example, the noise results vary from one magnetic resonance scan sequence to another, including a variety of sequences such as the T1W-IR sequence, inversion recovery sequence, gradient echo sequence, fast spin echo sequence FSE, and so forth. Different magnetic resonance scanning sequences are adopted for scanning, and the noise generated by the image scanning equipment is very different. As can be seen from fig. 2, when the MRI sequences 1 to 4 are respectively used for image scanning, the spectral distribution of noise generated by the image scanning device is very different. In addition, in magnetic resonance, the noise includes noise at the time of cold head operation and noise at the time of gradient operation, fig. 3a is a spectrogram of gradient noise, and fig. 3b is a spectrogram of cold head noise, as is apparent from fig. 3a and 3b, the frequency spectrum distribution of gradient noise and cold head noise is very different, and therefore, the corresponding noise data is also very different. In addition, there are a plurality of types of gradient noise data. By acquiring the current scanning protocol of the image scanning device, the noise data generated by the image scanning device can be correspondingly determined.
Step 130, inputting the original sound data and the scanning protocol into the fully trained deep learning neural network.
After the original sound data and the scanning protocol are input into the deep learning neural network, the network can look up the noise data corresponding to the scanning protocol and then perform noise reduction on the original sound data according to that noise data. Specifically, the found noise data may be subtracted from the original sound data.
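A minimal, hypothetical sketch of this subtraction idea in the power-spectrum domain (the patent's network learns the mapping end to end; a fixed rule like this, and the function name, are illustration only):

```python
def spectral_subtract(noisy_power, noise_power, floor=1e-10):
    """Subtract an estimated noise power spectrum from the noisy power
    spectrum, clamping at a small floor so no bin goes negative."""
    return [max(s - n, floor) for s, n in zip(noisy_power, noise_power)]
```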
It can be understood that, when the deep learning model is trained, the deep learning model is trained to learn the noise generated by the image scanning device corresponding to each scanning protocol, so that adaptive noise reduction processing can be performed according to the specific scanning mode of the current image scanning device, and the noise reduction precision and the noise reduction efficiency are improved.
Step 140, obtaining target talkback voice data output by the well-trained deep learning neural network; and the target talkback voice data is obtained by carrying out noise reduction processing on the original voice data based on the well-trained deep learning neural network.
In the prior art, a high-order filter is usually used to adaptively filter speech to complete noise reduction. However, this approach is effective only against regular, steady noise; because most of the noise generated by an image scanning device during medical scanning comes from random noise sources, the traditional voice noise reduction method performs poorly in this setting. In contrast, the scheme provided by the above steps introduces a deep learning neural network that learns the noise data generated by the medical device under each scanning protocol, so the noise component of the original sound data can be accurately filtered out and clean, noise-free speech output. The above steps provided by this embodiment therefore solve the problem in the related art of poor speech noise reduction in medical scanning.
The neural network in this embodiment may be any artificial neural network capable of implementing a deep learning algorithm. Among artificial neural networks, the Convolutional Neural Network (CNN) is a class of feedforward neural networks that involve convolution computations and have a deep structure, and is one of the representative algorithms of deep learning. Deep learning neural networks have memory, share parameters, and are Turing complete, so they can learn the nonlinear characteristics of noise efficiently, and they have been successfully applied to data-detection-related tasks. The inventors found in the course of research that, during medical scanning, most environmental noise comes from random noise sources and is typically nonlinear; therefore, using a deep learning neural network to denoise the raw sound data, as in this embodiment, achieves good results.
In some embodiments, after the raw sound data from the scanning room and the operating room of the image scanning device is acquired, the raw sound data may be preprocessed to remove outliers. The preprocessing includes, but is not limited to, smoothing the raw sound data, for example with a (2n+1)-point simple moving average filter, a weighted moving average filter, a smoothing-function filter, or a one-dimensional median filter. Preprocessing the raw sound data reduces the influence of outliers on the noise reduction result and improves efficiency during training.
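The (2n+1)-point simple moving average mentioned above can be sketched as follows (illustrative only; the function name and the shorter-window edge handling are assumptions, not from the patent):

```python
def moving_average(signal, n=2):
    """(2n+1)-point simple moving average; edge samples are averaged
    over the shorter window that is actually available."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - n):min(len(signal), i + n + 1)]
        out.append(sum(window) / len(window))
    return out
```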
In some embodiments, before inputting the raw sound data into the well-trained deep learning neural network, the method for speech noise reduction in medical scanning further comprises:
constructing an initial deep learning neural network;
acquiring noiseless voice data and second noise data generated by image scanning equipment in a scanning process;
obtaining a training sample according to the second noise data and the noiseless voice data;
and inputting the training sample into the initial deep learning neural network, updating the parameters of the initial deep learning neural network through error back propagation until the error is converged, and obtaining the deep learning neural network with complete training.
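The "update parameters by error backpropagation until the error converges" step can be illustrated with a toy one-weight gradient descent (purely a sketch of the convergence criterion, not the patent's CNN training; all names and hyperparameters are assumptions):

```python
def train_until_converged(xs, ys, lr=0.1, tol=1e-10, max_steps=10000):
    """Gradient descent on a single weight w so that w * x approximates
    y under a squared-error loss, stopping when the error converges."""
    w, prev_loss = 0.0, float("inf")
    for _ in range(max_steps):
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        if abs(prev_loss - loss) < tol:  # error has converged
            break
        # analytic gradient of the mean squared error w.r.t. w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
        prev_loss = loss
    return w
```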
In this embodiment, the neural network architecture may be implemented using TensorFlow; alternative frameworks include Caffe, PyTorch, and the like. The structure of the neural network employed in this embodiment is described below taking a Convolutional Neural Network (CNN) as an example. In other embodiments, a Recurrent Neural Network (RNN) may also be used; this embodiment is not limited in this respect.
Fig. 4 is a schematic structural diagram of a convolutional neural network according to an embodiment of the present invention. As shown in fig. 4, the convolutional neural network includes: an input layer, a convolutional layer, a normalization layer, a pooling layer, a fully connected layer, a loss layer, and an output layer.
The input layer receives the data — in this embodiment, the training sample data. A training sample comprises voice data obtained by synthesizing collected noise data with noise-free voice data, together with the scanning protocol the image scanning device was running when the noise data was collected. The input may take three forms: (1) waveform data of the voice data plus the scanning protocol; (2) spectral data of the voice data plus the scanning protocol; (3) waveform data, spectral data, and the scanning protocol together. The waveform data uses a frame length of 512 data points with a frame shift of 256 data points (the overlap between two consecutive frames); since human speech is continuous and correlated from frame to frame, frame shifting better approximates actual speech. The spectral data is obtained by applying a Fourier transform to the time-domain samples of each frame on the basis of the waveform data, computing the power spectrum, and taking the logarithm to obtain 257-dimensional log power spectrum features. All extracted log power spectrum features are standardized to zero mean and unit variance, which helps gradient descent find the minimum faster. The scanning protocol comprises any information related to the operating command of the image scanning device; in magnetic resonance, for example, it may be the scan sequence or the scan sequence type.
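The feature pipeline described above (512-sample frames, 256-sample frame shift, power spectrum of each frame, logarithm, zero-mean/unit-variance standardization) can be sketched as follows; the naive DFT stands in for the Fourier transform, and all function names are assumptions:

```python
import cmath
import math

FRAME_LEN = 512    # samples per frame, per the description
FRAME_SHIFT = 256  # 50% overlap between consecutive frames

def frames(signal):
    """Split a signal into overlapping FRAME_LEN-sample frames."""
    return [signal[i:i + FRAME_LEN]
            for i in range(0, len(signal) - FRAME_LEN + 1, FRAME_SHIFT)]

def log_power_spectrum(frame, eps=1e-12):
    """257-dim log power spectrum of one 512-sample frame.
    Naive DFT for clarity; a real pipeline would use an FFT."""
    n = len(frame)
    feats = []
    for k in range(n // 2 + 1):  # bins 0..256 -> 257 dimensions
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        feats.append(math.log(abs(acc) ** 2 + eps))
    return feats

def normalize(features):
    """Standardize the feature set to zero mean and unit variance."""
    flat = [v for f in features for v in f]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat)) or 1.0
    return [[(v - mean) / std for v in f] for f in features]
```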
Convolutional layers are used to extract different features of the input data, with lower convolutional layers possibly extracting only some low-level features, and with more layers of networks being able to iteratively extract more complex features from the low-level features.
The normalization layer forcibly pulls the input distribution — which, as it is mapped through the nonlinearity, gradually drifts toward the saturated extremes of the activation's range — back to a standard normal distribution with mean 0 and variance 1, so that the input to the nonlinear transformation falls in a region where the function is sensitive to its input, avoiding the vanishing gradient problem.
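A minimal forward-pass sketch of the normalization this layer performs (mean 0, variance 1, with the usual learnable scale and shift of batch normalization; the gamma/beta defaults and function name are assumptions):

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch to mean 0 / variance 1, then apply the
    learnable scale (gamma) and shift (beta) parameters."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta
            for x in batch]
```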
The pooling layer down-samples the data and learns to classify multi-scale data features, improving the model's classification discriminability, introducing nonlinearity, reducing the number of model parameters, and mitigating overfitting.
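The down-sampling the pooling layer performs can be sketched as one-dimensional max pooling (the window size and function name are assumed for illustration):

```python
def max_pool_1d(features, size=2):
    """Down-sample a feature sequence by keeping the maximum of each
    non-overlapping window of `size` elements."""
    return [max(features[i:i + size])
            for i in range(0, len(features) - size + 1, size)]
```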
The fully connected layer performs refitting at the tail of the CNN, reducing the loss of feature information.
The loss layer receives two inputs: the optimized speech data output by the CNN, and the actually collected noise-free speech data. The loss layer performs a series of operations on these two inputs to obtain the loss function of the current network. The goal of deep learning is to find the weights in weight space that minimize this loss function. The loss function is computed during forward propagation and is also the starting point of backpropagation; it is built from a ground-truth value and a predicted value, and a correct loss function drives the predicted value ever closer to the ground truth — when the two are equal, the loss is minimal. The loss function employed in this embodiment is preferably a softmax function, a cross-entropy loss function, or a squared-error loss function.
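The squared-error option named above can be sketched as a plain mean squared error between the CNN's denoised output and the actually collected noise-free speech (the function name is an assumption):

```python
def squared_error_loss(predicted, target):
    """Mean squared error between the network's denoised output and
    the collected noise-free speech (the ground truth)."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)
```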
The output layer is used for outputting the voice waveform data or the frequency spectrum data after noise reduction corresponding to the input.
Experiments show that the three-layer CNN shown in fig. 5, adopted in this embodiment, strikes a balance between the representational capacity of the network and the computational cost of training it. In this embodiment, the normalization layer is preferably a batch normalization layer: relative to a local response normalization layer, batch normalization improves gradient flow through the network and permits a larger learning rate, thereby speeding up training.
In some embodiments, said deriving training samples from said second noisy data and said noiseless speech data comprises steps 610 and 620; wherein:
step 610, synthesizing the second noise data and the adjusted noise-free voice data to obtain synthesized data;
and step 620, obtaining the training sample according to the synthetic data and the scanning protocol.
In some embodiments, before performing synthesis processing on the second noise data and the adjusted noiseless speech data, the method further includes:
adjusting a volume and/or a speed of the noiseless voice data. In particular, the adjustment may be made by a speech adjustment algorithm. The voice adjusting algorithm can be preset, and after the noise-free voice data are obtained, the voice adjusting algorithm is directly called to adjust the volume and/or the speed of the noise-free voice data.
In this embodiment, to increase the robustness of the deep learning model, the volume and speed of the noise-free voice data may be adjusted before it is synthesized with the second noise data. The volume may be raised or lowered, and the speed changed — for example, playback at 1.5x or 0.8x speed — to simulate speech at different speeds and volumes. In practice the adjustment is not limited to volume and speed; the timbre of the voice can also be adjusted. Since the material of the sound source and the speaker's manner of speaking largely determine timbre, adjusting it better simulates regional dialects and particular conversational styles, further increasing the robustness of the deep learning model.
It is understood that the volume and/or the speed of the second noise data may likewise be adjusted, so as to simulate the noise generated by image scanning devices of different models and in different operating states.
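The adjustments above can be sketched in NumPy (illustrative only; a clinical implementation would likely use a pitch-preserving time-stretch rather than the plain resampling shown here):

```python
import numpy as np

def adjust_volume(wave, gain):
    """Scale amplitude: gain > 1 louder, gain < 1 quieter."""
    return wave * gain

def adjust_speed(wave, factor):
    """Resample by linear interpolation: factor 1.5 plays ~1.5x faster
    (fewer samples), 0.8 plays slower (more samples). Note this also
    shifts pitch, which is acceptable for simple data augmentation."""
    n_out = int(round(len(wave) / factor))
    old_t = np.arange(len(wave))
    new_t = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(new_t, old_t, wave)

# One second of a 440 Hz tone at 16 kHz as a stand-in for speech.
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
fast = adjust_speed(tone, 1.5)   # shorter signal: faster playback
slow = adjust_speed(tone, 0.8)   # longer signal: slower playback
loud = adjust_volume(tone, 2.0)  # doubled amplitude
```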
The synthesis of the second noise data and the adjusted noiseless voice data may specifically be: mixing the adjusted noiseless voice data and the second noise data by addition or weighted addition; the mixing may also be performed on analog signals. This yields synthesized data containing both the noise and the talkback voice data, and the algorithm is simple and easy to implement. Of course, other speech synthesis algorithms may be used instead; this embodiment is not limited in this respect.
In addition, the synthesis may be performed on the time-domain waveforms of the speech data and the noise data, as shown in FIG. 7. FIG. 7a is the time-domain waveform of the collected noiseless voice data (specifically, a human voice), FIG. 7b is the time-domain waveform of the collected second noise data (for example, noise generated during MRI operation), and FIG. 7c is the time-domain waveform of the synthesized data obtained by synthesizing the two.
Alternatively, a Fourier transform may be applied to the voice data and the noise data to convert their time-domain waveforms into frequency spectra, and the synthesis may then be performed on the spectra, as shown in FIG. 8. FIG. 8a shows the spectrum of the collected noiseless voice data (specifically, a human voice), FIG. 8b shows the spectrum of the collected second noise data (for example, noise generated during MRI operation), and FIG. 8c shows the spectrum of the synthesized data obtained by synthesizing the two.
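The frequency-domain route can be sketched as follows (illustrative NumPy only; the signals are synthetic stand-ins for the collected voice and scanner noise). By linearity of the Fourier transform, adding the two spectra and inverting gives the same waveform as time-domain addition:

```python
import numpy as np

fs = 16000
t = np.arange(fs) / fs
speech = 0.5 * np.sin(2 * np.pi * 300 * t)                   # stand-in for voice
noise = 0.1 * np.random.default_rng(1).standard_normal(fs)   # stand-in for scanner noise

# Frequency-domain synthesis: transform, add spectra, invert.
spec_sum = np.fft.rfft(speech) + np.fft.rfft(noise)
mixed_freq = np.fft.irfft(spec_sum, n=fs)

# Time-domain synthesis for comparison.
mixed_time = speech + noise
# The two routes agree up to floating-point error (FFT linearity).
```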
The synthesized speech may be mixed at a set of fixed signal-to-noise ratios, for example five values such as -5 dB, 0 dB, 5 dB, 10 dB and 15 dB; the specific values of the signal-to-noise ratio are not limited in this embodiment.
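Mixing at a fixed signal-to-noise ratio can be sketched as follows (an illustrative NumPy helper; the function name and test signals are assumptions, not part of this disclosure):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the mixture has the requested SNR in dB,
    then add it to the speech."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Required noise power for the target SNR, then the matching gain.
    target_noise_power = p_speech / (10 ** (snr_db / 10))
    noise_scaled = noise * np.sqrt(target_noise_power / p_noise)
    return speech + noise_scaled

rng = np.random.default_rng(2)
speech = np.sin(2 * np.pi * 200 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)
for snr in (-5, 0, 5, 10, 15):   # the five fixed SNRs of the example
    mixture = mix_at_snr(speech, noise, snr)
```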
In some of these embodiments, the obtaining noise-free speech data comprises:
and at the stage of not starting the image scanning equipment, acquiring the voice data subjected to denoising treatment in the scanning room as the noise-free voice data.
In clinical scanning there is a characteristic dialogue pattern between patient and technician, and patients and technicians may also speak different regional dialects. This embodiment therefore collects clean, noise-free conversations of different people in the scanning room when no scan is running as the noiseless training data, which improves the robustness of the deep learning model and makes the training data more representative.
In some of these embodiments, the noiseless voice data may come from a public speech database, such as the TIMIT corpus. TIMIT is sampled at 16 kHz and contains 6300 sentences: 630 speakers from eight major dialect regions each read 10 given sentences, all manually segmented and labeled at the phone level, so that speech data in different scenarios can be simulated.
In some of these embodiments, said obtaining second noise data comprises: acquiring, through a noise acquisition device, sound data generated by different types of image scanning device under different scanning protocols, and taking the sound data as the second noise data; the noise acquisition device (for example, a microphone) is arranged at one end of the scanning bore along its axial direction.
The types of image scanning device include CT, MRI, PET/CT and the like. Different types of device generate different sound during operation, and the same type of device generates different sound under different scanning protocols. Acquiring, as the second noise data, the sound generated by different device types under different protocols therefore allows noise in a variety of scenarios to be simulated, improving the robustness of the deep learning model.
The noise acquisition device may be a microphone. To ensure that the speech optimization carries over to practical use, the microphone should be of a similar type to the one actually used clinically; magnetic resonance, for example, requires a non-magnetic (MR-compatible) microphone, whose hardware may differ from ordinary commercial microphones. Different microphones yield different sound quality, so data collected with the clinically used microphone is closer to the real situation.
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.
In one embodiment, as shown in fig. 9, there is provided a voice noise reduction apparatus in medical scanning, including: a first obtaining module 910, a second obtaining module 920, a data input module 930, and a third obtaining module 940, wherein:
a first obtaining module 910, configured to obtain original sound data of the scanning room and the operator room of an image scanning device; the original sound data comprises talkback voice data and first noise data generated by the image scanning device based on a current scanning protocol;
a second obtaining module 920, configured to obtain a current scanning protocol of the image scanning apparatus;
a data input module 930, configured to input the original sound data and the scanning protocol into a well-trained deep learning neural network;
a third obtaining module 940, configured to obtain target talkback voice data output by the well-trained deep learning neural network, the target talkback voice data being obtained by performing noise reduction processing on the original sound data with the well-trained deep learning neural network.
The voice noise reduction apparatus in medical scanning provided by this embodiment comprises the first obtaining module 910, the second obtaining module 920, the data input module 930 and the third obtaining module 940. The first obtaining module 910 obtains original sound data of the scanning room and the operator room of the image scanning device, the original sound data comprising talkback voice data and first noise data generated by the device under the current scanning protocol; the second obtaining module 920 obtains the current scanning protocol of the image scanning device; the data input module 930 inputs the original sound data and the scanning protocol into the well-trained deep learning neural network; and the third obtaining module 940 obtains the target talkback voice data output by the network, obtained by noise-reducing the original sound data. The apparatus thereby addresses the problem of unsatisfactory voice noise reduction in medical scanning, improves the noise reduction effect, optimizes the scanning process, and increases the scanning speed.
In some embodiments, the voice noise reduction apparatus in medical scanning further comprises a model training module, configured to: construct an initial deep learning neural network; obtain noiseless voice data; acquire a scanning protocol of the image scanning device during scanning and second noise data generated under that protocol; obtain training samples according to the noiseless voice data, the scanning protocol and the second noise data; and input the training samples into the initial deep learning neural network, updating its parameters through error back-propagation until the error converges, to obtain the well-trained deep learning neural network.
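The training procedure of the model training module can be sketched as follows (greatly simplified and purely illustrative: a single linear layer stands in for the deep learning neural network, and the data, learning rate, and convergence threshold are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
clean = rng.standard_normal((256, 8))                 # stand-in for clean speech features
noisy = clean + 0.3 * rng.standard_normal((256, 8))   # synthesized noisy input

W = rng.standard_normal((8, 8)) * 0.1                 # the "network": one linear map
lr, prev_loss = 0.01, np.inf
for step in range(5000):
    pred = noisy @ W                          # forward pass: denoised estimate
    err = pred - clean
    loss = np.mean(err ** 2)                  # MSE between denoised and clean
    grad = noisy.T @ err * (2 / err.size)     # back-propagated gradient of the MSE
    W -= lr * grad                            # parameter update
    if abs(prev_loss - loss) < 1e-9:          # the error has converged
        break
    prev_loss = loss
```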
In some embodiments, the model training module is further configured to: synthesizing the second noise data and the adjusted noise-free voice data to obtain synthesized data; and obtaining the training sample according to the synthetic data and the scanning protocol.
In some embodiments, the model training module is further configured to: adjusting a volume and/or a speed of the noiseless voice data.
In some embodiments, the model training module is further configured to: and in the stage of not starting the image scanning equipment, acquiring the voice data subjected to denoising treatment in the scanning room as the noise-free voice data.
In some embodiments, the model training module is further configured to: and acquiring sound data generated by different types of image scanning equipment under different scanning protocols through a noise acquisition device, and taking the sound data as second noise data.
In some of these embodiments, the error employed in training the initial deep learning neural network comprises: the error between the actually collected noiseless voice data and the voice data obtained after the initial deep learning neural network performs noise reduction on the training samples.
In a practical use scenario, the voice noise reduction apparatus is shown in FIG. 10. The first obtaining module is a microphone arranged at one end of the scanning bore, used to collect original sound data during scanning. The microphone feeds the collected original sound data into the deep learning model, which performs noise reduction on it and outputs the optimized voice data.
The second obtaining module is specifically a scanning protocol obtaining module, and is configured to obtain a current scanning protocol of the image scanning device in real time during a scanning process of the image scanning device.
For the specific definition of the speech noise reduction apparatus, reference may be made to the above definition of the speech noise reduction method, which is not described herein again. The modules in the voice noise reduction device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In addition, the speech noise reduction method of the embodiment described in conjunction with fig. 1 may be implemented by a computer device. Fig. 11 is a hardware configuration diagram of a computer device according to an embodiment of the present application.
The computer device may comprise a processor 101 and a memory 102 storing computer program instructions.
Specifically, the processor 101 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
Memory 102 may include, among other things, mass storage for data or instructions. By way of example and not limitation, memory 102 may include a hard disk drive (HDD), a floppy disk drive, a solid-state drive (SSD), flash memory, an optical disk, a magneto-optical disk, tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 102 may include removable or non-removable (or fixed) media, where appropriate, and may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 102 is non-volatile memory. In particular embodiments, memory 102 includes read-only memory (ROM) and random-access memory (RAM). The ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate. The RAM may be static RAM (SRAM) or dynamic RAM (DRAM), where the DRAM may be fast page mode DRAM (FPM DRAM), extended data out DRAM (EDO DRAM), synchronous DRAM (SDRAM), and the like.
The memory 102 may be used to store or cache various data files that need to be processed and/or communicated, as well as computer program instructions executed by the processor 101.
The processor 101 may read and execute the computer program instructions stored in the memory 102 to implement any of the voice noise reduction methods in the above embodiments.
In some of these embodiments, the computer device may also include a communication interface 103 and bus 100. As shown in fig. 11, the processor 101, the memory 102, and the communication interface 103 are connected via a bus 100 to complete communication therebetween.
The communication interface 103 is used for implementing communication between the modules, apparatuses, units and/or devices in the embodiments of the present application. The communication interface 103 may also carry out data communication with other components, such as external devices, image/data acquisition devices, databases, external storage, and image/data processing workstations.
Bus 100 includes hardware, software, or both, coupling the components of the computer device to each other. Bus 100 includes, but is not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example and not limitation, bus 100 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Extended (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. Bus 100 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
The computer device may execute the voice noise reduction method in the embodiment of the present application based on the obtained program instruction, thereby implementing the voice noise reduction method described in conjunction with fig. 1.
In addition, in combination with the voice noise reduction method in the foregoing embodiment, the embodiment of the present application may provide a computer-readable storage medium to implement. The computer readable storage medium having stored thereon computer program instructions; the computer program instructions, when executed by a processor, implement any of the speech noise reduction methods of the above embodiments.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several implementations of the present application; their description is specific and detailed, but should not be construed as limiting the scope of the claims. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method of speech noise reduction in a medical scan, the method comprising:
acquiring original sound data of the scanning room and the operator room of an image scanning device; the original sound data comprises talkback voice data and first noise data generated by the image scanning device based on a current scanning protocol;
acquiring a current scanning protocol of the image scanning equipment;
inputting the original sound data and the scanning protocol into a deep learning neural network with complete training;
acquiring target talkback voice data output by the well-trained deep learning neural network; and the target talkback voice data is obtained by carrying out noise reduction processing on the original sound data based on the well-trained deep learning neural network.
2. The method of claim 1, wherein prior to inputting the raw sound data and the scanning protocol to a well-trained deep learning neural network, the method further comprises:
constructing an initial deep learning neural network;
acquiring noise-free voice data;
acquiring a scanning protocol of image scanning equipment in a scanning process and second noise data generated under the scanning protocol;
obtaining a training sample according to the noiseless voice data, the scanning protocol and the second noise data;
and inputting the training sample into the initial deep learning neural network, updating the parameters of the initial deep learning neural network through error back propagation until the error is converged, and obtaining the deep learning neural network with complete training.
3. The method of claim 2, wherein obtaining training samples based on the noiseless voice data, the scanning protocol, and the second noisy data comprises:
synthesizing the second noise data and the adjusted noise-free voice data to obtain synthesized data;
and obtaining the training sample according to the synthetic data and the scanning protocol.
4. The method of claim 3, wherein prior to the synthesizing the second noisy data and the adjusted noiseless speech data, the method further comprises:
adjusting a volume and/or a speed of the noiseless voice data.
5. The method of claim 2, wherein the obtaining noise-free speech data comprises:
and in the stage of not starting the image scanning equipment, acquiring the voice data subjected to denoising treatment in the scanning room as the noise-free voice data.
6. The method of claim 2, wherein acquiring the scan protocol of the image scanning device during the scanning process and the second noise data generated under the scan protocol comprises:
and acquiring sound data generated by different types of image scanning equipment under different scanning protocols through a noise acquisition device, and taking the sound data as second noise data.
7. The method of claim 2, wherein training the error employed in the initial deep learning neural network comprises: and errors of the voice data obtained after noise reduction processing is carried out on the training samples by the actually acquired noiseless voice data and the initial deep learning neural network.
8. An apparatus for speech noise reduction in a medical scan, the apparatus comprising:
the first acquisition module is used for acquiring original sound data of the scanning room and the operator room of the image scanning device; the original sound data comprises talkback voice data and first noise data generated by the image scanning device based on a current scanning protocol;
the second acquisition module is used for acquiring the current scanning protocol of the image scanning equipment;
the data input module is used for inputting the original sound data and the scanning protocol into a deep learning neural network with complete training;
the third acquisition module is used for acquiring target talkback voice data output by the well-trained deep learning neural network; and the target talkback voice data is obtained by carrying out noise reduction processing on the original voice data based on the well-trained deep learning neural network.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010474082.9A 2020-05-29 2020-05-29 Voice noise reduction method and device in medical scanning and computer equipment Pending CN111568384A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010474082.9A CN111568384A (en) 2020-05-29 2020-05-29 Voice noise reduction method and device in medical scanning and computer equipment

Publications (1)

Publication Number Publication Date
CN111568384A true CN111568384A (en) 2020-08-25

Family

ID=72127341


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112053421A (en) * 2020-10-14 2020-12-08 腾讯科技(深圳)有限公司 Signal noise reduction processing method, device, equipment and storage medium
CN112863533A (en) * 2020-12-29 2021-05-28 深圳市联影高端医疗装备创新研究院 Method, apparatus, device and medium for acquiring voice signal in medical imaging device
CN116862789A (en) * 2023-06-29 2023-10-10 广州沙艾生物科技有限公司 PET-MR image correction method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104519447A (en) * 2013-10-08 2015-04-15 三星电子株式会社 Apparatus and method of reducing noise and audio playing apparatus with non-magnet speaker
CN105489215A (en) * 2015-11-18 2016-04-13 珠海格力电器股份有限公司 Noise source identification method and system
CN108831500A (en) * 2018-05-29 2018-11-16 平安科技(深圳)有限公司 Sound enhancement method, device, computer equipment and storage medium
CN108944749A (en) * 2017-05-19 2018-12-07 比亚迪股份有限公司 Vehicle denoising device and method
CN110222843A (en) * 2019-07-01 2019-09-10 上海交通大学 A kind of Noise Prediction System and method
CN110610715A (en) * 2019-07-29 2019-12-24 西安工程大学 Noise reduction method based on CNN-DNN hybrid neural network
CN110613300A (en) * 2018-06-19 2019-12-27 佛山市顺德区美的电热电器制造有限公司 Active noise reduction method and device and cooking equipment
CN112614504A (en) * 2020-12-22 2021-04-06 平安科技(深圳)有限公司 Single sound channel voice noise reduction method, system, equipment and readable storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 201807 Shanghai City, north of the city of Jiading District Road No. 2258
Applicant after: Shanghai Lianying Medical Technology Co., Ltd
Address before: 201807 Shanghai City, north of the city of Jiading District Road No. 2258
Applicant before: SHANGHAI UNITED IMAGING HEALTHCARE Co.,Ltd.