CN116246604A - Method and apparatus for personalized sound masking in a vehicle - Google Patents


Info

Publication number
CN116246604A
Authority
CN
China
Prior art keywords
sound
masking
personalized
controller
occupant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211143018.8A
Other languages
Chinese (zh)
Inventor
李明玉
俞正根
赵文焕
李康德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hyundai Motor Co
Kia Corp
Original Assignee
Hyundai Motor Co
Kia Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hyundai Motor Co and Kia Corp
Publication of CN116246604A
Legal status: Pending

Classifications

    • G10K11/175 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; masking sound
    • G10K11/1754 Speech masking
    • G10K2210/1282 Active noise control applications: automobiles
    • G10K2210/3023 Active noise control, computational means: estimation of noise, e.g. on error signals
    • G10K2210/3027 Active noise control, computational means: feedforward
    • G10L17/02 Speaker identification or verification: preprocessing operations, e.g. segment selection; pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; feature selection or extraction
    • G10L17/18 Speaker identification or verification: artificial neural networks; connectionist approaches
    • G10L17/26 Speaker identification or verification: recognition of special voice characteristics, e.g. for use in lie detectors; recognition of animal voices
    • G10L25/15 Speech or voice analysis: extracted parameters being formant information
    • G10L25/18 Speech or voice analysis: extracted parameters being spectral information of each sub-band
    • G10L25/30 Speech or voice analysis using neural networks
    • G06N3/045 Neural networks: combinations of networks
    • H04R1/025 Arrangements for fixing loudspeaker transducers, e.g. in a box, furniture
    • H04R2499/13 Acoustic transducers and sound field adaptation in vehicles
    • H04R3/00 Circuits for transducers, loudspeakers or microphones

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Fittings On The Vehicle Exterior For Carrying Loads, And Devices For Holding Or Mounting Articles (AREA)

Abstract

The invention relates to a method and apparatus for personalized sound masking in a vehicle. The present invention provides a computer-implemented method for sound masking, comprising: identifying one of a plurality of categories based on a frequency characteristic of a voice of a first occupant in the vehicle; acquiring reference voice data corresponding to the identified category; generating a personalized masking sound for the first occupant by synthesizing the reference voice data with white noise; and controlling output of the personalized masking sound.

Description

Method and apparatus for personalized sound masking in a vehicle
Cross Reference to Related Applications
The present application claims priority to and the benefit of Korean Patent Application No. 10-2021-0174408, filed on December 8, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to a method and apparatus for sound masking in a vehicle.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Vehicles now provide various complex functions, such as navigation, telephone, audio, and radio functions.
The functions demanded by the driver and the occupants of a vehicle are becoming more complex as well. For example, among the occupants of a vehicle, the driver may want to receive voice guidance from the navigation system while another occupant wants to listen to music. In addition, when making a call through the vehicle's Bluetooth connection, the driver may not want the other occupants to hear the call, in order to protect privacy.
In response to these demands, sound masking technology has been actively studied. Sound masking is a technique that makes surrounding noise less perceptible to an occupant by generating a masking sound, i.e., an artificial sound such as white noise.
However, cognitive or psychological fatigue may accumulate when a person is continuously exposed to masking sounds. Furthermore, outputting a uniform masking sound is of limited effectiveness against the variety of surrounding noises.
Disclosure of Invention
According to at least one aspect, the present invention provides a computer-implemented method for sound masking. The method comprises: identifying, by a controller, one of a plurality of categories based on a frequency characteristic of a voice of a first occupant in the vehicle; acquiring, by the controller, reference voice data corresponding to the identified category; generating, by the controller, a personalized masking sound for the first occupant by synthesizing the reference voice data with white noise; and controlling, by the controller, output of the personalized masking sound.
According to at least another aspect, the present invention provides a sound masking device. The sound masking device includes: a microphone configured to receive a voice of a first occupant in the vehicle; at least one speaker disposed in the vehicle; and a controller configured to identify one of a plurality of categories based on a frequency characteristic of the voice, acquire reference voice data corresponding to the identified category, generate a personalized masking sound for the first occupant by synthesizing the reference voice data with white noise, and control the at least one speaker to output the personalized masking sound.
Drawings
Fig. 1 is a schematic diagram showing a plurality of speakers and seats in a vehicle interior according to one embodiment of the present invention.
Fig. 2 is a block diagram of a sound control system according to one embodiment of the present invention.
Fig. 3 is a schematic diagram for explaining a sound masking method according to an embodiment of the present invention.
Fig. 4A is a schematic diagram for explaining category identification using a classification model according to an embodiment of the present invention.
Fig. 4B is a schematic diagram for explaining a process of generating a personalized masking sound according to one embodiment of the present invention.
Fig. 5 is a schematic view showing a personalized masking sound according to an embodiment of the present invention.
Fig. 6 is a flowchart illustrating a sound masking method according to an embodiment of the present invention.
Detailed Description
Hereinafter, some embodiments of the present invention are described in detail with reference to the exemplary drawings. Regarding the reference numerals of the components in the various figures, note that like reference numerals denote like components even when they appear in different figures. In addition, detailed descriptions of well-known configurations or functions are omitted where they might obscure the subject matter of the present invention.
Furthermore, terms such as "first," "second," "i," "ii," "a," and "b" may be used to describe components of the present invention. These terms serve only to distinguish one component from another; the nature, order, or sequence of the corresponding components is not limited by them. In this specification, when a unit "comprises" or "includes" a certain component, this means that other components may further be included, not excluded, unless explicitly stated otherwise.
The individual components or methods of the apparatus according to the invention may be implemented as hardware, as software, or as a combination of the two. Further, the functions of the respective components may be implemented as software, and a microprocessor may execute the software functions corresponding to the respective components.
The present invention provides a method and apparatus that classify the voice of an occupant in a vehicle according to its frequency characteristics and generate and output a masking sound personalized to that voice, thereby reducing the cognitive or psychological fatigue that sound masking may cause other occupants.
The present invention further provides a method and apparatus that classify the voice of an occupant according to its frequency characteristics and generate and output a masking sound personalized to that voice, thereby improving sound masking performance.
The present invention further provides a method and apparatus that accurately classify the voice of an occupant using an artificial neural network as the voice is input in real time, thereby preventing an inaccurate masking sound from being output.
Fig. 1 is a schematic diagram showing a plurality of speakers and seats in a vehicle interior according to one embodiment of the present invention.
Referring to fig. 1, the vehicle 10 includes a plurality of seats 101, 102, 103, and 104, and a plurality of speakers 111, 112, 113, 114, 115, and 116.
In fig. 1, the vehicle 10 is shown as a passenger car according to one embodiment. In other embodiments of the present invention, the vehicle 10 may take various forms, such as a bus, a van, or a train.
In fig. 1, the positions and the number of the plurality of speakers 111, 112, 113, 114, 115, and 116 and the plurality of seats 101, 102, 103, and 104 correspond to one embodiment. The plurality of speakers 111, 112, 113, 114, 115, and 116 and the plurality of seats 101, 102, 103, and 104 may be installed at any position inside the vehicle 10, and the number thereof is not limited.
In fig. 1, the plurality of speakers 111, 112, 113, 114, 115, and 116 are devices that output sound, and they are provided in the vehicle 10. Specifically, the first speaker 111 is provided at the door of the first seat 101. The second speaker 112 is provided at the door of the second seat 102. The third speaker 113 is provided at the door of the third seat 103. The fourth speaker 114 is provided at the door of the fourth seat 104. The fifth speaker 115 is provided at a front position of the vehicle 10. The sixth speaker 116 is provided at a rear position of the vehicle 10.
The plurality of seats 101, 102, 103, and 104 are seats on which occupants are seated, respectively. In addition to the plurality of speakers 111, 112, 113, 114, 115, and 116 shown in fig. 1, a headrest speaker or a backrest speaker may be further included in the headrest or backrest of the plurality of seats 101, 102, 103, and 104, respectively. The headrest speaker or the backrest speaker may output sound to a specific occupant more intensively than the plurality of speakers 111, 112, 113, 114, 115, and 116. The occupant can effectively listen to the sound output from the headrest speaker or the backrest speaker.
The sound control apparatus for controlling sound in the vehicle 10 generates control signals of sound output through the plurality of speakers 111, 112, 113, 114, 115, and 116. For example, the sound control apparatus may generate a music play signal, a video play signal, a voice call signal, a navigation guidance signal, and various warning signals. The plurality of speakers 111, 112, 113, 114, 115, and 116 may output sound inside the vehicle 10 based on the sound control signals. The sound control apparatus corresponds to a sound masking apparatus according to one embodiment of the present invention.
The plurality of speakers 111, 112, 113, 114, 115, and 116 may shape the sound field inside the vehicle 10 by generating constructive or destructive interference between sound signals in the low frequency band and sound signals in the mid-high frequency band. That is, depending on the number and arrangement of the speakers, sound may be delivered only to certain areas of the interior of the vehicle 10.
Fig. 2 is a block diagram of a sound control system according to one embodiment of the present invention.
Referring to fig. 2, the vehicle 20 includes at least one of an input unit 200, a communication unit 210, a microphone 220, a speaker 230, a filter 240, a storage unit 250, and a controller 260. The vehicle 20 may further include an amplifier for controlling the speaker 230. The various components may be devices or logic mounted to the vehicle 20.
The input unit 200 receives input from an occupant in the vehicle 20. The input unit 200 may receive an input such as voice or touch of an occupant. For example, the input unit 200 may receive an input command for sound masking initiation from an occupant.
The communication unit 210 communicates with devices other than the vehicle 20. The communication unit 210 may communicate with a terminal of an occupant or an infrastructure surrounding the vehicle 20.
The microphone 220 receives the voice of the occupant in the vehicle 20. Microphone 220 may further receive sound in vehicle 20. The microphone 220 may itself filter the received sound in order to distinguish the occupant's voice from other sounds.
The speaker 230 is a component provided in the vehicle 20 to output sound. The speaker 230 outputs sound based on the sound signal generated by the controller 260. For example, the speaker 230 may output a masking sound based on the masking sound signal of the controller 260.
The speakers 230 may be classified into tweeters, midrange speakers (squawkers), woofers, and subwoofers according to frequency band. The tweeter outputs the high audio band, the midrange speaker outputs the middle audio band, the woofer outputs the low audio band, and the subwoofer outputs the lowest audio band.
The speakers 230 may also be classified into general, vibration, and film types. A general speaker is a speaker of the conventional form. A vibration speaker generates the vibrations of the bass band. A film speaker has a thin-film shape and outputs sound through vibration of the film; owing to its small size, it is mainly used in narrow spaces.
The speakers 230 may include at least one of a door speaker 231, a roof speaker 232, a headrest speaker 233, a backrest speaker 234, a front speaker 235, and a rear speaker 236.
The door speaker 231, the headrest speaker 233, the backrest speaker 234, the front speaker 235, and the rear speaker 236 are the same as those described with reference to fig. 1.
The roof speaker 232 is a speaker provided at the roof of the vehicle 20 so as to face the inside of the vehicle 20.
The roof speakers 232 may be arranged linearly, and the linear arrangement may have various angles with respect to the front of the vehicle 20. Further, the roof speaker 232 may be provided in an arrangement of two intersecting straight lines.
The roof speaker 232 may be a film speaker.
Through the roof speakers 232, the vehicle 20 may provide more audio channels than a conventional vehicle.
The filter 240 may filter the sound signal output by each of the plurality of speakers 230 using a predetermined algorithm.
The filter 240 may be implemented as an algorithm in the form of a transfer function. By canceling or passing specific frequency bands of the sound signals generated by the controller 260, the filter 240 can restrict the sound output from the plurality of speakers 230 to a specific region, or cancel that output in other regions.
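For illustration only, such transfer-function filtering can be sketched as a digital band-pass filter in Python. The Butterworth design, the band edges, the filter order, and the 48 kHz sample rate below are assumptions chosen for the example; the present disclosure does not specify them.

```python
# Illustrative sketch only: a band-pass transfer function in the spirit of
# filter 240. The Butterworth design, band edges, order, and sample rate
# are assumptions, not values taken from this disclosure.
import numpy as np
from scipy.signal import butter, lfilter

def make_bandpass(low_hz: float, high_hz: float, fs: int, order: int = 4):
    """Design a band-pass filter, returned as transfer-function coefficients."""
    nyq = fs / 2.0
    return butter(order, [low_hz / nyq, high_hz / nyq], btype="band")

fs = 48_000                                  # assumed audio sample rate
b, a = make_bandpass(300.0, 3_400.0, fs)     # pass roughly the speech band
sound_signal = np.random.randn(fs)           # stand-in for a controller signal
region_signal = lfilter(b, a, sound_signal)  # band-limited signal to forward
```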
The storage unit 250 stores instructions, programs, and data related to a sound masking method according to one embodiment of the present invention.
The controller 260 determines at least one speaker among the plurality of speakers 230 of the vehicle 20 to output sound, and generates a sound signal for controlling the sound output by the at least one speaker. Here, the sound signal may include at least one of a control signal for a sound pressure level and a control signal for a frequency band.
The controller 260 generates a personalized masking sound of an occupant in the vehicle 20 based on the voice of the occupant, and outputs the masking sound through the speaker 230.
Specifically, the controller 260 recognizes one of a plurality of categories based on the voice of the occupant received by the microphone 220. That is, the controller 260 identifies the category corresponding to the occupant's voice based on the frequency characteristics of that voice.
Here, the plurality of categories are categories classified based on different frequency characteristics. For example, the plurality of categories may include an adult male category, an adult female category, an elderly male category, an elderly female category, a child male category, a child female category, a young male category, and a young female category. More generally, the plurality of categories may be any set of categories that can be distinguished by frequency characteristics.
According to one embodiment of the present invention, the controller 260 may convert the occupant's voice into a time-frequency representation and identify the category corresponding to that representation as the one category.
Here, the time-frequency representation is a spectrogram or a mel spectrogram.
A spectrogram is a graph or image with a time axis and a frequency axis, in which the amplitude at each time and frequency is represented by color. The spectrogram may be generated from the time-domain speech signal by algorithms such as the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), the Short-Time Fourier Transform (STFT), and the Fast Fourier Transform (FFT).
A person recognizes variations in low-frequency sound better than variations in high-frequency sound. A mel spectrogram is the Fourier transform result mapped onto the mel scale, a frequency scale that reflects this property of human hearing.
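As a concrete illustration of this conversion, the following Python sketch computes a mel spectrogram with librosa. The file name and all parameter values (sample rate, FFT size, hop length, number of mel bands) are assumptions for the example, not values given in this disclosure.

```python
# Illustrative sketch only: speech signal -> mel spectrogram. The file name
# and all parameter values are assumptions, not taken from this disclosure.
import numpy as np
import librosa

y, sr = librosa.load("occupant_speech.wav", sr=16_000)  # hypothetical recording
mel = librosa.feature.melspectrogram(                   # STFT power mapped to the mel scale
    y=y, sr=sr, n_fft=512, hop_length=128, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)          # log compression, closer to hearing
print(log_mel.shape)  # (n_mels, n_frames): the 2-D image fed to the classifier
```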
According to one embodiment of the invention, the controller 260 may utilize a classification model to identify the category corresponding to the time-frequency representation of the occupant's voice, as described in detail with reference to fig. 4A.
Thereafter, the controller 260 acquires reference voice data corresponding to the recognized category.
The reference voice data represents voice data according to the frequency characteristics of a category. Specifically, the reference voice data is obtained by recording the voices of persons corresponding to the respective categories and storing the recordings as time-domain voice signals or as time-frequency representations. For example, a specific sentence or sound uttered by an adult male belonging to the adult male category may be stored as the reference voice data.
The controller 260 generates a personalized masking sound of the first occupant by synthesizing white noise with the reference voice data.
White noise is noise with a flat, constant spectrum and no specific auditory pattern; its amplitude is constant across frequency.
White noise is only an example, however, and the controller 260 may use noise of a different color instead. For example, the controller 260 may synthesize other noise suitable for sound masking, such as pink noise or brown noise, with the reference voice data.
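The synthesis step can be illustrated with a short Python sketch. The additive mixing, the noise gain of 0.5, and the peak normalization are assumptions for the example; the disclosure states only that the reference voice data is synthesized with white (or other-colored) noise.

```python
# Illustrative sketch only: mix reference voice data with white noise to form
# a personalized masking sound. The additive mixing, noise gain, and peak
# normalization are assumptions, not steps mandated by this disclosure.
import numpy as np

def personalized_masking_sound(reference_voice: np.ndarray,
                               noise_gain: float = 0.5,
                               rng=None) -> np.ndarray:
    rng = np.random.default_rng() if rng is None else rng
    white = rng.normal(0.0, 1.0, size=reference_voice.shape)  # flat-spectrum noise
    masker = reference_voice + noise_gain * white             # synthesis step
    return masker / np.max(np.abs(masker))                    # normalize to avoid clipping
```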
The controller 260 controls the speaker 230 to output a personalized masking sound. To improve the sound masking performance, the controller 260 may output a personalized masking sound using a speaker provided near another occupant who needs sound masking. For example, the controller 260 may output the personalized masking sound using a headrest speaker or a backrest speaker provided to the seat of another occupant.
The controller 260 may concentrate the output on the speakers 230 disposed near the head of the other occupant. Further, the controller 260 may control the speakers 230 such that the personalized masking sound is output toward the position of the other occupant's head while being suppressed in regions other than that position. That is, the controller 260 may deliver the personalized masking sound to the other occupant in a beamforming manner.
In particular, the controller 260 may generate control signals that cause constructive interference of the sound signal at the position of the other occupant's head and destructive interference elsewhere. Through such constructive and destructive interference due to phase differences, the controller 260 may generate control signals that confine the sound output from the plurality of speakers 230 to only one region in the vehicle 20. In this way, only the other occupant hears the personalized masking sound.
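For illustration, the phase relationships needed for such constructive interference can be produced with delay-and-sum steering, sketched below in Python. The speaker coordinates, head position, and sample rate are assumptions for the example and do not represent the cabin layout of fig. 1.

```python
# Illustrative sketch only: delay-and-sum steering so the masker adds in phase
# at one head position. Speaker coordinates, head position, and sample rate
# are assumptions, not the cabin geometry of fig. 1.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air, roughly at room temperature

def steering_delays(speaker_xyz: np.ndarray, target_xyz: np.ndarray) -> np.ndarray:
    """Per-speaker delays (s) so all wavefronts reach the target together."""
    dists = np.linalg.norm(speaker_xyz - target_xyz, axis=1)
    return (dists.max() - dists) / SPEED_OF_SOUND  # delay nearer speakers more

speakers = np.array([[0.0, 0.5, 1.0], [0.0, -0.5, 1.0],
                     [1.2, 0.5, 1.0], [1.2, -0.5, 1.0]])  # assumed positions (m)
head = np.array([1.0, -0.4, 1.1])                         # target head position (m)
delay_samples = np.round(steering_delays(speakers, head) * 48_000).astype(int)
```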
Owing to the personalized masking sound, the other occupant cannot recognize the voice of the speaking occupant. That is, the occupant's voice is masked by the personalized masking sound.
Since the frequency characteristics of the occupant's voice are reflected in the personalized masking sound, it masks that voice better than plain white noise. Accordingly, the cognitive or psychological fatigue that the masking sound causes in the occupant who hears it can be reduced.
Meanwhile, the controller 260 according to an embodiment of the present invention may adjust the amplitude of the personalized masking sound based on the amplitude of the voice of the occupant. For example, the controller 260 may adjust the amplitude of the personalized masking sound such that the amplitude of the personalized masking sound becomes smaller than the amplitude of the occupant's voice. Since the sound masking performance is improved when the amplitude of the masking sound is smaller than that of the voice, the controller 260 can improve the sound masking performance by adjusting the amplitude of the personalized masking sound. In addition, the smaller amplitude of the personalized masking sound may reduce cognitive or psychological fatigue of another occupant hearing the personalized masking sound due to the masking sound.
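A one-function Python sketch of this amplitude rule follows. The RMS measure and the 0.8 target ratio are assumptions for the example, since the disclosure requires only that the masker stay quieter than the voice.

```python
# Illustrative sketch only: keep the masker's RMS level below the voice's.
# The RMS measure and the 0.8 target ratio are assumptions.
import numpy as np

def level_masker(masker: np.ndarray, voice: np.ndarray,
                 ratio: float = 0.8) -> np.ndarray:
    def rms(x: np.ndarray) -> float:
        return float(np.sqrt(np.mean(np.square(x))))
    # Scale so the masker's RMS is `ratio` times the voice's RMS.
    return masker * (ratio * rms(voice) / (rms(masker) + 1e-12))
```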
Fig. 3 is a schematic diagram for explaining a sound masking method according to an embodiment of the present invention.
Referring to fig. 3, a first occupant 300 and a second occupant 310 are shown. The first occupant 300 is depicted as an adult female.
The microphone 220 receives the voice of the first occupant 300.
The controller 260 recognizes one of a plurality of categories based on the voice of the first occupant 300. Specifically, the controller 260 converts the voice of the first occupant 300 into a time-frequency representation and identifies, among the plurality of categories, the adult female category corresponding to that representation.
The controller 260 acquires pre-stored reference voice data corresponding to the identified adult female category. Here, the reference voice data is data obtained by recording a specific sentence or sound uttered by an adult female in advance.
The controller 260 generates a personalized masking sound for the voice of the first occupant 300 by synthesizing white noise with the reference voice data.
The controller 260 outputs the personalized masking sound to the second occupant 310 through the speaker 230. In this case, the controller 260 may output the personalized masking sound using a speaker provided near the second occupant 310.
Since the second occupant 310 hears the personalized masking sound generated for the voice of the first occupant 300, the second occupant 310 cannot recognize that voice and feels less of the cognitive or psychological fatigue that sound masking can cause.
Fig. 4A is a schematic diagram for explaining category identification using a classification model according to an embodiment of the present invention.
Referring to fig. 4A, a speech signal 400, a time-frequency representation 410, a classification model 420, and an adult female category 434 are shown.
According to one embodiment of the invention, the sound masking device may utilize the classification model 420 to identify a class corresponding to the time-frequency representation 410 of the occupant's speech signal 400.
Hereinafter, the training and structure of the classification model 420 are described.
The classification model 420 is trained to classify the categories of time-frequency representations of training speech data. Specifically, the classification model 420 receives a time-frequency representation of training speech data as input. The classification model 420 extracts features of the input time-frequency representation. Based on the extracted features, the classification model 420 calculates probability values of the input time-frequency representation belonging to the respective categories. The classification model 420 identifies the category of the training speech data based on the probability values. The classification model 420 is trained by adjusting the weights or biases in the model based on a comparison between its classification result and the correct answer.
Since the time-frequency representation input to classification model 420 is a two-dimensional image (i.e., a mel-frequency spectrogram), classification model 420 may include a convolutional neural network. The convolutional neural network may include at least one convolutional layer 422 and a classifier 424. The convolution layer 422 is the component that extracts features from an input image. Classifier 424 is a component that classifies an input image using features extracted by convolutional layer 422. The classifier may calculate probability values of the input image belonging to the respective categories among the plurality of categories, and may identify the category of the input image based on the probability values.
The classification model 420 need not include a convolutional neural network; it may instead include various other artificial neural networks, such as recurrent neural networks or long short-term memory (LSTM) networks. In that case, the input of the classification model 420 also varies according to the type of the neural network.
Meanwhile, the classification model 420 may be trained by supervised learning. Supervised learning refers to training an artificial neural network with learning data to which labels are given, a label being the correct answer or result the artificial neural network should infer when the corresponding learning data is input to it.
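A compact PyTorch sketch of such a classifier is given below. The layer sizes, the eight-category output, and the input shape are assumptions for the example; the disclosure describes only a convolution layer 422 followed by a classifier 424.

```python
# Illustrative sketch only: a convolutional classifier over mel spectrograms
# in the spirit of fig. 4A. Layer sizes, the 8 output categories, and the
# input shape are assumptions, not the disclosure's trained model.
import torch
import torch.nn as nn

class VoiceCategoryClassifier(nn.Module):
    def __init__(self, n_categories: int = 8):
        super().__init__()
        self.features = nn.Sequential(                 # convolution layer 422
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.classifier = nn.Sequential(               # classifier 424
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, n_categories),       # per-category logits
        )

    def forward(self, mel_image: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(mel_image))

model = VoiceCategoryClassifier()
logits = model(torch.randn(1, 1, 64, 128))   # (batch, channel, n_mels, frames)
category_index = int(logits.argmax(dim=1))   # index of the identified category
# Supervised training would minimize nn.CrossEntropyLoss against the labels.
```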
Referring back to fig. 4A, to identify the category of the speech signal 400 of the occupant in the vehicle, the sound masking device converts the speech signal 400 into the time-frequency representation 410. Various Fourier transforms may be used for this conversion. Here, the time-frequency representation 410 is shown as a mel spectrogram.
The sound masking device inputs the time-frequency representation 410 to the classification model 420. The classification model 420 extracts frequency characteristics from the time-frequency representation 410 and outputs the category, among the plurality of categories, to which the time-frequency representation 410 belongs according to those characteristics. Based on the output of the classification model 420, the sound masking device may identify that the time-frequency representation 410 has the frequency characteristics of the adult female category 434. That is, the sound masking device recognizes that the voice signal 400 is the voice of an adult female.
The sound masking device can accurately classify the class of the speech signal 400 using the classification model 420 that has been trained through deep learning. In addition, the sound masking device may classify the category of the voice signal 400 received in real time.
Fig. 4B is a schematic diagram for explaining a process of generating a personalized masking sound according to one embodiment of the present invention.
Referring to fig. 4B, a category group 430, an adult male category 432, an adult female category 434, an elderly male category 436, a child female category 438, reference voice data 444, white noise 450, and personalized masking sound 460 are shown.
The reference speech data 444, white noise 450 and personalized masking sound 460 are represented as graphs having a frequency axis and an amplitude axis, but this is an example. Each graph may be represented as a signal having a time domain or as a time-frequency representation.
As shown in fig. 4A, the sound masking device recognizes that the speech signal 400 belongs to the adult female category 434.
The sound masking device obtains pre-stored reference voice data 444 corresponding to the identified adult female category 434.
Here, the sound masking device may prepare in advance reference voice data corresponding to each category in the category group 430. For example, the sound masking device may record an adult female speaking a specific sentence or sound and store the recording in advance as the reference voice data 444. The reference voice data 444 contains the speaker's voice; the less surrounding noise it contains, the better the masking performance.
The sound masking device generates a personalized masking sound 460 by synthesizing the reference voice data 444 with the white noise 450.
Unlike the white noise 450, the personalized masking sound 460 has the frequency characteristics of the reference voice data 444 of the adult female category 434. Accordingly, an occupant who hears the personalized masking sound 460 cannot recognize the voice signal 400, and the fatigue that the personalized masking sound 460 causes that occupant is reduced.
Fig. 5 is a schematic view showing a personalized masking sound according to an embodiment of the present invention.
Referring to fig. 5, there are shown first reference voice data 502, second reference voice data 504, white noise 510, a first personalized masking sound 512, and a second personalized masking sound 514.
In fig. 5, the first reference voice data 502 and the second reference voice data 504 are data corresponding to different categories.
The sound masking device generates a first personalized masking sound 512 by synthesizing the white noise 510 with the first reference voice data 502, and generates and outputs a second personalized masking sound 514 by synthesizing the white noise 510 with the second reference voice data 504.
In this way, by outputting a personalized masking sound according to the voice of the occupant, the sound masking device can improve the masking performance and reduce the fatigue of the listener.
Meanwhile, in the case where there are a plurality of occupants in the vehicle, the sound masking device according to another embodiment of the present invention may generate and output personalized masking sounds for the respective occupants.
Specifically, when two occupants in the vehicle speak, their voice signals can be recognized as different categories. The first reference voice data 502 and the second reference voice data 504 may correspond to the categories identified from the two occupants' voice signals.
The sound masking device outputs both the first personalized masking sound 512 and the second personalized masking sound 514 to other occupants than the two occupants. The occupants hearing the two personalized masking sounds cannot recognize the voices of the two occupants.
Fig. 6 is a flowchart illustrating a sound masking method according to an embodiment of the present invention.
Referring to fig. 6, the sound masking device receives the voice of the first occupant in the vehicle (S600).
The sound masking device identifies one of a plurality of categories based on the frequency characteristic of the voice (S602).
Here, the plurality of categories are classified based on different frequency characteristics.
According to one embodiment of the invention, the sound masking device may identify the category based on a time-frequency representation of the speech. The time-frequency representation is a spectrogram or mel-frequency spectrogram. Specifically, the sound masking device acquires a time-frequency representation of speech. The sound masking device may identify a category of the plurality of categories that corresponds to the time-frequency representation.
In this case, the sound masking device may identify the category using a classification model including an artificial neural network. The sound masking device inputs the time-frequency representation into a classification model trained to classify the categories of time-frequency representations of training speech data. The sound masking device may identify the category corresponding to the time-frequency representation of the voice based on the output of the classification model. The classification model may include a convolutional neural network as the artificial neural network.
The sound masking device acquires reference voice data corresponding to the recognized category (S604).
The sound masking device generates a personalized masking sound of the first occupant by synthesizing the reference voice data with white noise (S606).
The sound masking device outputs a personalized masking sound (S608).
According to an embodiment of the present invention, the sound masking device may adjust the amplitude of the personalized masking sound based on the amplitude of the speech.
According to one embodiment of the present invention, the sound masking device may control at least one speaker so that the personalized masking sound is output toward the second occupant in a concentrated manner. To this end, the sound masking device may output the personalized masking sound through a speaker close to the second occupant. Further, the sound masking device may use beamforming so that the personalized masking sound reaches only the second occupant.
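Putting the flowchart together, the following Python sketch strings steps S600 to S608 into one function. The zero-crossing "classifier" is a crude placeholder for the trained CNN of fig. 4A, and the reference-voice store, threshold, noise gain, and level ratio are all assumptions for the example.

```python
# Illustrative end-to-end sketch of S600-S608. The zero-crossing classifier is
# a crude placeholder for the trained CNN of fig. 4A; the reference store,
# threshold, noise gain, and level ratio are all assumptions.
import numpy as np

CATEGORIES = ("adult_male", "adult_female")

def classify(voice: np.ndarray) -> str:
    """Placeholder classifier: zero-crossing rate as a rough pitch proxy (S602)."""
    zcr = float(np.mean(np.abs(np.diff(np.sign(voice)))) / 2.0)
    return CATEGORIES[1] if zcr > 0.05 else CATEGORIES[0]

def mask_voice(voice: np.ndarray, reference_store: dict, rng=None) -> np.ndarray:
    rng = np.random.default_rng() if rng is None else rng
    reference = reference_store[classify(voice)]                 # S604: fetch data
    masker = reference + 0.5 * rng.normal(size=reference.shape)  # S606: add white noise
    rms = lambda x: float(np.sqrt(np.mean(np.square(x))))
    return masker * 0.8 * rms(voice) / (rms(masker) + 1e-12)     # level before S608

rng = np.random.default_rng(0)
store = {c: rng.normal(size=16_000) for c in CATEGORIES}  # hypothetical recordings
masker = mask_voice(rng.normal(size=16_000), store)       # signal ready for output
```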
As described above, according to one embodiment of the present invention, by classifying the voices of occupants in a vehicle according to frequency characteristics, and generating and outputting personalized masking sounds for the voices of the occupants, it is possible to reduce cognitive or psychological fatigue caused by sound masking felt by other occupants.
According to another embodiment of the present invention, by classifying the voices of occupants in a vehicle according to frequency characteristics and generating and outputting a personalized masking sound for the voices of the occupants, the performance of sound masking can be improved.
According to another embodiment of the present invention, when the voice of the occupant is input in real time, by accurately classifying the voice of the occupant using the artificial neural network, it is possible to prevent an inaccurate masking sound from being output.
Various implementations of the systems and techniques described here can include digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include one or more computer programs executable on a programmable system. The programmable system includes at least one programmable processor (which may be a special-purpose or general-purpose processor) connected to receive data and instructions from, and to send data and instructions to, a storage system, at least one input device, and at least one output device. A computer program (also referred to as a program, software, software application, or code) contains instructions for a programmable processor and is stored in a computer-readable recording medium.
The computer-readable recording medium includes all types of recording devices in which data readable by a computer system is stored. The computer-readable recording medium may include nonvolatile or non-transitory media, for example, ROM, CD-ROM, magnetic tape, floppy disks, memory cards, hard disks, magneto-optical disks, and storage devices, and may also include transitory media such as data transmission media. Furthermore, the computer-readable recording medium can be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.
Although the flowcharts/timing diagrams of this specification describe the processes as being performed sequentially, this is merely an illustration of the technical idea of one embodiment of the present invention. Since one of ordinary skill in the art may, without departing from the essential characteristics of the invention, change the order depicted in a flowchart/timing diagram or perform one or more of its steps in parallel, the flowcharts/timing diagrams are not limited to the depicted order.
Although embodiments of the present invention have been described for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention. Accordingly, embodiments of the present invention have been described for brevity and clarity. The scope of the technical idea of the present embodiment is not limited by the description. Thus, it will be appreciated by those of ordinary skill in the art that the scope of the present invention should not be limited by the embodiments explicitly described above, but by the claims and their equivalents.

Claims (15)

1. A computer-implemented method for sound masking, the method comprising:
identifying, by a controller, one of a plurality of categories based on a frequency characteristic of a voice of a first occupant in the vehicle;
acquiring, by the controller, reference voice data corresponding to the identified category;
generating, by the controller, a personalized masking sound for the first occupant by synthesizing the reference voice data with white noise; and
controlling, by the controller, output of the personalized masking sound.
2. The method of claim 1, wherein the plurality of categories are classified based on different frequency characteristics.
3. The method of claim 1, wherein identifying one of a plurality of categories comprises:
acquiring, by the controller, a time-frequency representation of the voice; and
identifying, by the controller, a category of the plurality of categories corresponding to the time-frequency representation as the one category.
4. The method of claim 3, wherein identifying a category corresponding to the time-frequency representation comprises:
inputting, by the controller, the time-frequency representation into a classification model trained to classify categories of time-frequency representations of training speech data; and
identifying, by the controller, the category corresponding to the time-frequency representation of the voice based on the output of the classification model.
5. The method of claim 3, wherein the time-frequency representation is a spectrogram or a mel spectrogram.
6. The method of claim 4, wherein the classification model comprises a convolutional neural network.
7. The method of claim 1, wherein controlling the output of the personalized masking sound comprises:
the amplitude of the personalized masking sound is adjusted by the controller based on the amplitude of the speech.
8. The method of claim 1, wherein controlling the output of the personalized masking sound comprises:
at least one speaker is controlled by the controller such that a personalized masking sound is output to the second occupant.
9. A sound masking device, comprising:
a microphone configured to receive a voice of a first occupant in the vehicle;
at least one speaker disposed in the vehicle; and
a controller configured to: identify one of a plurality of categories based on a frequency characteristic of the voice, acquire reference voice data corresponding to the identified category, generate a personalized masking sound for the first occupant by synthesizing the reference voice data with white noise, and control the at least one speaker to output the personalized masking sound.
10. The sound masking device of claim 9, wherein the plurality of categories are classified based on different frequency characteristics.
11. The sound masking device of claim 9, wherein the controller is further configured to:
acquire a time-frequency representation of the voice; and
identify a category of the plurality of categories corresponding to the time-frequency representation as the one category.
12. The sound masking device of claim 11, wherein the controller is further configured to:
input the time-frequency representation into a classification model trained to classify categories of time-frequency representations of training speech data; and
identify the category corresponding to the time-frequency representation of the voice based on the output of the classification model.
13. The sound masking device of claim 12, wherein the classification model comprises a convolutional neural network.
14. The sound masking device of claim 9, wherein the controller is configured to adjust the amplitude of the personalized masking sound based on the amplitude of the voice.
15. The sound masking device of claim 9, wherein the controller is configured to control the at least one speaker to output the personalized masking sound to a second occupant.
CN202211143018.8A (priority date 2021-12-08, filing date 2022-09-20): Method and apparatus for personalized sound masking in a vehicle. Status: Pending. Publication: CN116246604A.

Applications Claiming Priority (2)

Application Number: KR1020210174408A (priority claim KR10-2021-0174408)
Priority Date: 2021-12-08; Filing Date: 2021-12-08
Title: Method and Device for Customized Sound Masking in Vehicle

Publications (1)

Publication Number Publication Date
CN116246604A (published 2023-06-09)

Family

ID=86607917

Family Applications (1)

Application Number: CN202211143018.8A
Priority Date: 2021-12-08; Filing Date: 2022-09-20
Title: Method and apparatus for personalized sound masking in a vehicle

Country Status (3)

Country Link
US (1) US12002442B2 (en)
KR (1) KR20230086096A (en)
CN (1) CN116246604A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117746828B * 2024-02-20 2024-04-30 Huaqiao University Noise masking control method, device, equipment and medium for open office

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005343401A (en) * 2004-06-07 2005-12-15 Nissan Motor Co Ltd Noise masking device
KR100643310B1 (en) * 2005-08-24 2006-11-10 삼성전자주식회사 Method and apparatus for disturbing voice data using disturbing signal which has similar formant with the voice signal
US9469247B2 (en) * 2013-11-21 2016-10-18 Harman International Industries, Incorporated Using external sounds to alert vehicle occupants of external events and mask in-car conversations
GB2553571B (en) * 2016-09-12 2020-03-04 Jaguar Land Rover Ltd Apparatus and method for privacy enhancement
JP6837214B2 (en) * 2016-12-09 2021-03-03 パナソニックIpマネジメント株式会社 Noise masking device, vehicle, and noise masking method
US10373626B2 (en) * 2017-03-15 2019-08-06 Guardian Glass, LLC Speech privacy system and/or associated method
JP6887139B2 (en) * 2017-03-29 2021-06-16 パナソニックIpマネジメント株式会社 Sound processing equipment, sound processing methods, and programs
DE102017213241A1 (en) * 2017-08-01 2019-02-07 Bayerische Motoren Werke Aktiengesellschaft Method, device, mobile user device, computer program for controlling an audio system of a vehicle
JP6982828B2 (en) * 2017-11-02 2021-12-17 パナソニックIpマネジメント株式会社 Noise masking device, vehicle, and noise masking method
JP6957362B2 (en) * 2018-01-09 2021-11-02 フォルシアクラリオン・エレクトロニクス株式会社 Privacy protection system
CN108806707B (en) * 2018-06-11 2020-05-12 百度在线网络技术(北京)有限公司 Voice processing method, device, equipment and storage medium
JP2020203643A (en) * 2019-06-19 2020-12-24 株式会社デンソーテン Controller for active noise canceler, method for controlling active noise canceler, and program
US10629182B1 (en) * 2019-06-24 2020-04-21 Blackberry Limited Adaptive noise masking method and system
EP4011099A1 (en) * 2019-08-06 2022-06-15 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. System and method for assisting selective hearing
CN111883166B (en) * 2020-07-17 2024-05-10 北京百度网讯科技有限公司 Voice signal processing method, device, equipment and storage medium
KR20220016423A (en) * 2020-07-31 2022-02-09 현대자동차주식회사 Vehicle and method for controlling thereof
CN112562664A (en) * 2020-11-27 2021-03-26 上海仙塔智能科技有限公司 Sound adjusting method, system, vehicle and computer storage medium
CN113707156B (en) * 2021-08-06 2024-04-05 武汉科技大学 Vehicle-mounted voice recognition method and system

Also Published As

Publication number Publication date
US20230178061A1 (en) 2023-06-08
US12002442B2 (en) 2024-06-04
KR20230086096A (en) 2023-06-15

Similar Documents

Publication Publication Date Title
US20220159403A1 (en) System and method for assisting selective hearing
CN110024030B (en) Context aware hearing optimization engine
US9508335B2 (en) Active noise control and customized audio system
KR101285391B1 (en) Apparatus and method for merging acoustic object informations
US7761292B2 (en) Method and apparatus for disturbing the radiated voice signal by attenuation and masking
US10825353B2 (en) Device for enhancement of language processing in autism spectrum disorders through modifying the auditory stream including an acoustic stimulus to reduce an acoustic detail characteristic while preserving a lexicality of the acoustics stimulus
US20110071822A1 (en) Selective audio/sound aspects
CN110520323B (en) Method, device, mobile user equipment and computing unit for controlling audio system
GB2521175A (en) Spatial audio processing apparatus
US20170125038A1 (en) Transfer function to generate lombard speech from neutral speech
US9761223B2 (en) Acoustic impulse response simulation
US12002442B2 (en) Method and device for personalized sound masking in vehicle
JP2023536270A (en) Systems and Methods for Headphone Equalization and Room Adaptation for Binaural Playback in Augmented Reality
CN110545504A (en) Personal hearing device, external sound processing device and related computer program product
US20240144937A1 (en) Estimating identifiers of one or more entities
US11877133B2 (en) Audio output using multiple different transducers
JPWO2020016927A1 (en) Sound field control device and sound field control method
JP6995254B2 (en) Sound field control device and sound field control method
CN115039419A (en) Information processing apparatus, information processing method, information processing program, and information processing system
US20220262389A1 (en) Method and apparatus for improving speech intelligibility in a room
Zhu et al. Feasibility of vocal emotion conversion on modulation spectrogram for simulated cochlear implants
Schmidt et al. Evaluation of in-car communication systems
US20240233741A9 (en) Controlling local rendering of remote environmental audio
US20240135944A1 (en) Controlling local rendering of remote environmental audio
Lee et al. A diagonal‐steering‐based binaural beamforming algorithm incorporating a diagonal speech localizer for persons with bilateral hearing impairment

Legal Events

Date Code Title Description
PB01 Publication