US20220223145A1 - Speech filtering for masks - Google Patents
Speech filtering for masks
- Publication number
- US20220223145A1 (application US17/145,431)
- Authority
- US
- United States
- Prior art keywords
- computer
- mask
- sound
- occupant
- type
- Legal status: Granted (the status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06V10/82 — Image or video recognition using pattern recognition or machine learning using neural networks
- G06F3/165 — Management of the audio stream, e.g. setting of volume, audio stream path
- G06F3/167 — Audio in a user interface, e.g. using voice commands for navigating, audio feedback
- G06K9/00832
- G06N3/045 — Neural network architectures: combinations of networks
- G06N3/08 — Neural network learning methods
- G06V20/59 — Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
- G06V20/593 — Recognising seat occupancy
- G10L15/07 — Speech recognition: adaptation to the speaker
- G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26 — Speech to text systems
- G10L2015/223 — Execution procedure of a spoken command
Definitions
- Many modern vehicles include voice-recognition systems. Such a system includes a microphone.
- The system converts spoken words detected by the microphone into text or another form to which a command can be matched. Recognized commands can include adjusting climate controls, selecting media to play, etc.
- FIG. 1 is a top view of an example vehicle with a passenger cabin exposed for illustration.
- FIG. 2 is a block diagram of a system of the vehicle.
- FIG. 3 is a process flow diagram of an example process for filtering speech of an occupant of the vehicle wearing a mask.
- FIG. 4 is a plot of sound pressure versus frequency for speech while wearing a mask for a plurality of masks.
- A computer includes a processor and a memory storing instructions executable by the processor to receive sensor data of an occupant of a vehicle, identify a type of mask worn by the occupant based on the sensor data, select a sound filter according to the type of mask from a plurality of sound filters stored in the memory, receive sound data, apply the selected sound filter to the sound data, and perform an operation using the filtered sound data.
- The sensor data may be image data showing the occupant.
- The operation may be identifying a voice command to activate a feature.
- The operation may be transmitting the filtered sound data in a telephone call.
- The operation may be outputting the filtered sound data by a speaker of the vehicle.
- The instructions may include instructions to perform the operation using the sound data unfiltered upon determining that the occupant is not wearing a mask.
- The instructions may include instructions to select a generic sound filter from the plurality of sound filters upon identifying the type of mask as an unknown type.
- The instructions may include instructions to transmit an update to a remote server upon identifying the type of mask as the unknown type.
- The update may include image data of the mask.
- The instructions may include instructions to identify the type of mask worn by the occupant based on an input by the occupant.
- The instructions may include instructions to override the identification based on the sensor data with the identification based on the input upon receiving the input.
- The instructions may include instructions to prompt the occupant to provide the input upon determining that the occupant is wearing a mask.
- The instructions may include instructions to prompt the occupant to provide the input upon determining either that the occupant is wearing a mask whose type is identified with a confidence score below a confidence threshold or that the type of the mask is an unknown type.
- The instructions may include instructions to transmit an update to a remote server in response to the input indicating that the type of the mask is not among the types of masks stored in the memory.
- The instructions may include instructions to choose the occupant for which to identify the type of mask from a plurality of occupants based on volumes of sound data from respective microphones.
- The instructions may include instructions to choose the occupant for which to identify the type of mask from a plurality of occupants based on the occupant being in a predesignated region of the image data.
- Each sound filter may adjust a volume of the sound data by an amount that varies depending on frequency.
- Each sound filter increases the volume of the sound data at one or more frequencies.
- The instructions may include instructions to receive an update from a remote server changing the plurality of sound filters stored in the memory.
- A method includes receiving sensor data of an occupant of a vehicle, identifying a type of mask worn by the occupant based on the sensor data, selecting a sound filter according to the type of mask from a plurality of sound filters stored in memory, receiving sound data, applying the selected sound filter to the sound data, and performing an operation using the filtered sound data.
- A computer 100 includes a processor and a memory storing instructions executable by the processor to receive sensor data of an occupant of a vehicle 102, identify a type of mask worn by the occupant based on the sensor data, select a sound filter according to the type of mask from a plurality of sound filters stored in the memory, receive sound data, apply the selected sound filter to the sound data, and perform an operation using the filtered sound data.
- The computer 100 can be used to boost the clarity of speech from an occupant wearing a mask by selecting the type of mask and thereby applying the filter most appropriate to equalize the speech.
- The choice of filter permits the frequencies muffled by that particular mask to be amplified.
- The filtered sound data can thus reliably be used to perform operations such as a voice command to activate a feature of the vehicle 102, a transmission in a telephone call, or broadcasting as a telecom to a speaker 114 elsewhere in the vehicle 102.
- The voice command can be reliably recognized, the telephone call can be reliably understood by the person at the other end from the occupant, and the telecom message can be reliably understood by the other occupant of the vehicle 102.
- The vehicle 102 may be any suitable type of automobile, e.g., a passenger or commercial automobile such as a sedan, a coupe, a truck, a sport utility, a crossover, a van, a minivan, a taxi, a bus, etc.
- The vehicle 102 may be autonomous.
- The vehicle 102 may be autonomously operated such that the vehicle 102 may be driven without constant attention from a driver, i.e., the vehicle 102 may be self-driving without human input.
- The vehicle 102 includes a passenger cabin 104 to house occupants of the vehicle 102.
- The passenger cabin 104 includes one or more front seats 106 disposed at a front of the passenger cabin 104 and one or more back seats 106 disposed behind the front seats 106.
- The passenger cabin 104 may also include third-row seats 106 (not shown) at a rear of the passenger cabin 104.
- The vehicle 102 includes at least one camera 108.
- The camera 108 can detect electromagnetic radiation in some range of wavelengths.
- The camera 108 may detect visible light, infrared radiation, ultraviolet light, or some range of wavelengths including visible, infrared, and/or ultraviolet light.
- The camera 108 can be a thermal imaging camera.
- The camera 108 is positioned so that a field of view of the camera 108 encompasses at least one of the seats 106, e.g., the driver seat 106, or the front and back seats 106.
- The camera 108 can be positioned on an instrument panel 118 or rear-view mirror and oriented rearward relative to the passenger cabin 104.
- The vehicle 102 includes at least one microphone 110, e.g., a first microphone 110a and a second microphone 110b.
- The microphones 110 are transducers that convert sound into an electrical signal.
- The microphones 110 can be any suitable type of microphones for detecting speech by occupants of the vehicle 102, e.g., dynamic, condenser, contact, etc.
- The microphones 110 can be arranged at respective locations or positions in the passenger cabin 104 to collectively detect speech from occupants in different seats 106.
- The first microphone 110a can be positioned in the instrument panel 118.
- The second microphone 110b can be positioned between the front seats 106 and oriented to pick up sound from the back seats 106.
- A user interface 112 presents information to and receives information from an occupant of the vehicle 102.
- The user interface 112 may be located, e.g., on the instrument panel 118 in the passenger cabin 104, or wherever it may be readily seen by the occupant.
- The user interface 112 may include dials, digital readouts, screens, speakers 114, and so on for providing information to the occupant, e.g., human-machine interface (HMI) elements such as are known.
- The user interface 112 may include buttons, knobs, keypads, the microphones 110, and so on for receiving information from the occupant.
- The speakers 114 are electroacoustic transducers that convert an electrical signal into sound.
- The speakers 114 can be any suitable type for producing sound audible to the occupants, e.g., dynamic.
- The speakers 114 can be arranged at respective locations or positions in the passenger cabin 104 to collectively produce sound for occupants in respective seats 106.
- The computer 100 is a microprocessor-based computing device, e.g., a generic computing device including a processor and a memory, an electronic controller or the like, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), etc.
- The computer 100 can thus include a processor, a memory, etc.
- The memory of the computer 100 can include media for storing instructions executable by the processor as well as for electronically storing data and/or databases, and/or the computer 100 can include structures such as the foregoing by which programming is provided.
- The computer 100 can be multiple computers coupled together.
- The computer 100 may transmit and receive data through a communications network 116 such as a controller area network (CAN) bus, Ethernet, WiFi®, Local Interconnect Network (LIN), onboard diagnostics connector (OBD-II), and/or by any other wired or wireless communications network.
- The computer 100 may be communicatively coupled to the camera 108, the microphones 110, the user interface 112, the speakers 114, a transceiver 118, and other components via the communications network 116.
- The transceiver 118 may be connected to the communications network.
- The transceiver 118 may be adapted to transmit signals wirelessly through any suitable wireless communication protocol, such as cellular, Bluetooth®, Bluetooth® Low Energy (BLE), ultra-wideband (UWB), WiFi, IEEE 802.11a/b/g, other RF (radio frequency) communications, etc.
- The transceiver 118 may be adapted to communicate with a remote server 120, that is, a server distinct and spaced from the vehicle 102.
- The remote server 120 may be located outside the vehicle 102.
- The remote server 120 may be associated with another vehicle (e.g., V2V communications), an infrastructure component (e.g., V2I communications via Dedicated Short-Range Communications (DSRC) or the like), an emergency responder, a mobile device associated with the owner of the vehicle 102, etc.
- The transceiver 118 may be one device or may include a separate transmitter and receiver.
- The computer 100 stores a plurality of sound filters in memory.
- Masks often have a small effect on volume when the frequency is 500 Hz or less and muffle sounds more considerably at 1000 Hz and higher to an extent that depends on the type of mask.
- One of the sound filters stored in memory is associated with the unknown type of mask, and that sound filter can be a generic sound filter, e.g., an average of the other sound filters stored in memory.
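The relationship between the stored per-mask filters and the generic fallback can be sketched as follows. This is an illustrative model only: the patent does not specify a data format, and the mask names and gain values below are hypothetical.

```python
# Illustrative sketch only: each sound filter is modeled as a gain (in dB)
# per frequency band, boosting the bands a given mask attenuates.
# All gain values are hypothetical.
FILTERS_DB = {
    "3M 1860":    {250: 0.0, 500: 0.5, 1000: 3.0, 2000: 5.0, 4000: 7.0},
    "3M 1870":    {250: 0.0, 500: 0.0, 1000: 2.0, 2000: 4.0, 4000: 6.0},
    "Scott Xcel": {250: 0.0, 500: 1.0, 1000: 4.0, 2000: 6.0, 4000: 8.0},
}

def generic_filter(filters):
    """Build the fallback filter for masks of unknown type by averaging
    the stored filters band by band."""
    bands = sorted(next(iter(filters.values())))
    return {b: sum(f[b] for f in filters.values()) / len(filters) for b in bands}
```

Consistent with the description, the hypothetical gains are near zero at 500 Hz and below and grow at 1000 Hz and above.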
- The sound filters stored in memory can be updated from the remote server 120, e.g., with an over-the-air (OTA) update via the transceiver 118.
- An update can add a new sound filter for a new type of mask for which a sound filter is not already stored by the computer 100.
- The update can change one or more of the sound filters already stored by the computer 100.
- The sound filters stored by the computer 100 can be updated as new types of masks are introduced, materials of existing masks change, etc.
- The update can occur periodically or on demand.
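A minimal sketch of merging such an update into the stored filters, assuming filters are keyed by mask type; the function name and data shapes are hypothetical:

```python
def apply_ota_update(stored_filters, update):
    """Merge an over-the-air update into the stored filters: new mask
    types are added, and entries present in the update replace the
    previously stored versions."""
    merged = dict(stored_filters)
    merged.update(update)
    return merged

# hypothetical example: one changed filter and one new mask type
stored = {"3M 1860": {1000: 3.0}, "3M 1870": {1000: 2.0}}
update = {"3M 1870": {1000: 2.5}, "NewMask X": {1000: 4.0}}
merged = apply_ota_update(stored, update)
```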
- FIG. 3 is a process flow diagram illustrating an exemplary process 300 for filtering speech of an occupant of the vehicle 102 wearing a mask.
- The memory of the computer 100 stores executable instructions for performing the steps of the process 300, and/or programming can be implemented in structures such as mentioned above.
- As a general overview of the process 300, the computer 100 receives data from the camera 108 and the microphones 110, detects a mask worn by an occupant based on the data, and identifies the type of the mask. If the occupant is wearing a mask of a type identified with a confidence score above a confidence threshold, the computer 100 selects a sound filter corresponding to the type of mask.
- Otherwise, the computer 100 prompts input from the occupant about the type of mask and selects a sound filter corresponding to the type of mask either inputted by the occupant or identified by the computer 100.
- The computer 100 applies the selected sound filter to sound data and performs an operation using the filtered sound data. If there are no masks, the computer 100 performs the operation based on the unfiltered sound data.
- The process 300 begins in a block 305, in which the computer 100 receives sensor data of at least one occupant of the vehicle 102, e.g., image data from the camera 108 showing the occupants and/or sound data from the microphones 110 of speech by the occupants.
- The computer 100 detects a mask worn by one of the occupants. If a plurality of occupants are in the passenger cabin 104, the computer 100 chooses one of the occupants. For example, the computer 100 can choose the occupant based on the occupant being in a predesignated region of the image data, e.g., corresponding to an occupant sitting in a particular seat 106 such as an operator seat 106, and then detect the mask worn by that occupant. This can permit the computer 100 to detect a mask worn by the operator of the vehicle 102.
- Alternatively, the computer 100 can choose the occupant based on volumes of sound data from the respective microphones 110, e.g., based on the microphone 110 with the highest volume, and then detect the mask worn by the occupant closest to that microphone 110.
- This can permit the computer 100 to detect a mask worn by an occupant most likely to be speaking for performing the operation, e.g., an occupant sitting in the back seat 106 when the volume from the second microphone 110b is greater than from the first microphone 110a.
- The computer 100 can identify the mask or unmasked face using conventional image-recognition techniques, e.g., a convolutional neural network programmed to accept images as input and output an identified mask presence or absence.
- The image data from the camera 108 can be used as the input.
- For training, the convolutional neural network can use images of occupants of vehicles wearing and not wearing masks produced by cameras situated in the same location as the camera 108.
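The volume comparison can be sketched as picking the microphone with the greatest RMS level; variable names and sample values below are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square level of a frame of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def loudest_microphone(frames_by_mic):
    """Return the name of the microphone with the loudest frame; the
    occupant closest to that microphone is taken to be the speaker."""
    return max(frames_by_mic, key=lambda mic: rms(frames_by_mic[mic]))

# hypothetical audio frames from the two cabin microphones
frames = {
    "110a": [0.05, -0.04, 0.06, -0.05],  # quiet front microphone
    "110b": [0.40, -0.35, 0.38, -0.42],  # louder rear microphone
}
```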
- A convolutional neural network includes a series of layers, with each layer using the previous layer as input. Each layer contains a plurality of neurons that receive as input data generated by a subset of the neurons of the previous layers and generate output that is sent to neurons in the next layer.
- Types of layers include convolutional layers, which compute a dot product of a weight and a small region of input data; pool layers, which perform a down-sampling operation along spatial dimensions; and fully connected layers, which generate outputs based on the output of all neurons of the previous layer.
- The final layer of the convolutional neural network generates a confidence score for mask and for unmasked face, and the final output is whichever of mask or unmasked face has the highest confidence score.
- A "confidence score" is a measure of a probability that the identification is correct.
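One common way to turn a final layer's raw outputs into confidence scores is a softmax; this sketch assumes that convention, which the description does not mandate, and the logit values are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw final-layer outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, labels):
    """Return the label with the highest confidence score and that score."""
    probs = softmax(logits)
    i = max(range(len(probs)), key=probs.__getitem__)
    return labels[i], probs[i]

# hypothetical final-layer outputs for three classes
label, confidence = classify([2.0, 0.5, 0.1], ["mask", "unmasked face", "unknown"])
```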
- The identification of an occupant face as masked or unmasked can be performed for respective occupants in the passenger cabin 104.
- The computer 100 may detect masks worn by multiple occupants.
- The computer 100 identifies the types of masks worn by the occupants.
- The computer 100 can execute a convolutional neural network as described above for each detected mask using the image data, and the output is the type of mask with the highest confidence score for each occupant.
- The convolutional neural network can operate on the image data of the mask, or alternatively on image data of a logo on the mask.
- The types of masks can be specified by manufacturer and model, e.g., 3M 1860, 3M 1870, Kimberly-Clark 49214, Scott Xcel, etc.
- One of the possible types of masks is an unknown type, i.e., a mask that is none of the masks stored in memory.
- A single convolutional neural network can be executed for the blocks 310 and 315, and the output for each occupant is one of the types of masks, the unknown type, or unmasked face, whichever has the highest confidence score.
- The computer 100 may identify types of masks (or unmasked face) worn by multiple occupants. If the identification of the type of mask is the unknown type, the computer 100 transmits an update to the remote server 120 via the transceiver 118.
- The update can include the image data showing the mask of unknown type.
- The computer 100 determines whether the occupant is wearing a mask, i.e., whether the output of the convolutional neural network(s) is mask and/or a type of mask for the occupant, and the computer 100 determines whether the confidence score of the type of mask is above a confidence threshold.
- The confidence threshold can be chosen to indicate a high likelihood that the type of mask is correctly identified.
- Upon determining that the occupant is wearing a mask and that the confidence score for the type of mask is above the confidence threshold, the process 300 proceeds to a block 335. Upon determining that the occupant is wearing a mask whose type is identified with a confidence score below the confidence threshold or is of the unknown type, the process 300 proceeds to a block 325. Upon determining that the occupant is not wearing a mask, the process 300 proceeds to a block 355.
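The branching just described can be condensed into a small decision function; the block numbers come from the process 300, while the threshold value is hypothetical.

```python
def next_block(wearing_mask, mask_confidence, threshold=0.8):
    """Mirror the decision in the process 300: no mask skips filtering
    (block 355); a confidently identified mask type selects its filter
    (block 335); otherwise the occupant is prompted for the mask type
    (block 325). The 0.8 threshold is a hypothetical value."""
    if not wearing_mask:
        return 355
    if mask_confidence > threshold:
        return 335
    return 325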
- The computer 100 prompts the occupant to provide an input through the user interface 112 specifying a type of mask that the occupant is wearing.
- The user interface 112 can present a list of types of masks for the occupant to choose from.
- The list can be a default list stored in memory.
- Alternatively, the list can include the types of masks with the highest confidence scores as determined in the block 315, or the user interface 112 can display a single type of mask with the highest confidence score and ask the occupant to confirm that the type of mask is correct.
- The list can include an option, e.g., "other," for indicating that the type of the mask is not among the types of masks stored by the computer 100.
- Selecting this option can be treated as though the occupant selected that the type of the mask is the unknown type.
- In that case, the computer 100 can transmit an update to the remote server 120 via the transceiver 118, if the computer 100 did not already do so in the block 315.
- The update can include the image data showing the mask of unknown type.
- The computer 100 determines whether the occupant inputted a type of mask in response to the prompt in the block 325.
- The occupant provides the input by selecting the type of mask from the list, and the occupant can fail to provide the input by selecting an option declining to provide a type of mask, e.g., an option labeled "Choose mask automatically," or by failing to select a type of mask within a time threshold.
- The time threshold can be chosen to provide the occupant sufficient time to respond to the prompt. If the occupant did not select a type of mask, the process 300 proceeds to a block 335. If the occupant selected a type of mask, the process 300 proceeds to a block 340.
- The computer 100 selects a sound filter according to the type of mask identified in the block 315 from the plurality of the sound filters stored in memory. Selecting from the plurality of sound filters can provide a sound filter that most accurately adjusts the sound data to the baseline level.
- The computer 100 can select multiple sound filters, each associated with one of the identified types of masks.
- The computer 100 can combine the sound filters together, e.g., by simple averaging or by weighting.
- The sound filters can be weighted based on locations of the occupants wearing the masks relative to one of the microphones 110 generating sound data, e.g., based on volumes of the sound data from the respective microphones 110. If the first microphone 110a is generating sound data with greater volume than the second microphone 110b, then the sound filters are weighted according to relative distances of the masks of each type from the chosen microphone 110a.
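The weighted combination might look like the following sketch; the per-band filter format and all numbers are hypothetical, and the weights could be derived from relative microphone volumes or occupant distances as described above.

```python
def combine_filters(filters, weights):
    """Weighted band-by-band average of several selected sound filters.
    `filters` and `weights` are parallel lists; a weight might reflect a
    masked occupant's proximity to the chosen microphone."""
    total = sum(weights)
    bands = sorted(filters[0])
    return {
        b: sum(w * f[b] for f, w in zip(filters, weights)) / total
        for b in bands
    }

# two hypothetical filters, the first weighted twice as heavily
combined = combine_filters(
    [{1000: 3.0, 2000: 6.0}, {1000: 0.0, 2000: 3.0}],
    [2.0, 1.0],
)
```

With equal weights this reduces to the simple averaging mentioned in the description.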
- The process 300 proceeds to a block 345.
- The computer 100 identifies the type of mask based on the input by the occupant and selects the sound filter from memory associated with the identified type of mask. In other words, the computer 100 overrides the identification based on the image data or sound data with the identification based on the input upon receiving the input, by executing the block 340 instead of the block 335.
- The process 300 proceeds to a block 345.
- In the block 345, the computer 100 receives sound data from the microphones 110.
- The sound data can include speech by the occupants.
- Next, in a block 350, the computer 100 applies the selected sound filter or the combination of the selected sound filters to the sound data.
- After the block 350, the process 300 proceeds to a block 360.
- In the block 355, the computer 100 receives sound data from the microphones 110.
- The sound data can include speech by the occupants.
- After the block 355, the process 300 proceeds to the block 360.
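Applying the filter can be sketched as adding the per-band gain to the signal's band levels; a real implementation would filter the waveform itself (e.g., with an equalizer), and all values here are hypothetical.

```python
def apply_sound_filter(band_levels_db, filter_gains_db):
    """Add the selected filter's gain (in dB) to each measured band
    level, restoring the high-frequency content the mask attenuated."""
    return {
        band: level + filter_gains_db.get(band, 0.0)
        for band, level in band_levels_db.items()
    }

# hypothetical speech levels muffled above 1000 Hz, and a filter that
# boosts those bands
muffled = {500: 60.0, 1000: 52.0, 2000: 45.0}
gains = {500: 0.0, 1000: 3.0, 2000: 5.0}
filtered = apply_sound_filter(muffled, gains)
```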
- In the block 360, the computer 100 performs an operation using the sound data, either the filtered sound data from the block 350 or the unfiltered sound data from the block 355.
- The operation can be identifying a voice command to activate a feature, e.g., converting the sound data into text such as "Call Pizza Place," "Play Podcast," "Decrease Temperature," etc. (or into equivalent data identifying the command).
- Using the filtered sound data can help the computer 100 to accurately identify the voice command.
- The operation can be transmitting the sound data in a telephone call.
- For example, a mobile phone can be paired with the user interface 112 and used to place a telephone call. Using the filtered sound data can make it easier for the recipient of the call to understand what the occupant is saying.
- The operation can be outputting the filtered sound data by one or more of the speakers 114.
- Sound data originating from the first microphone 110a can be used and outputted by the speaker 114 at a rear of the passenger cabin 104; in other words, the first microphone 110a and the speaker 114 form a telecom.
- Using the filtered sound data can make it easier for an occupant in the back seat 106 to understand what the occupant in the front seat 106 is saying than directly hearing the occupant speaking while muffled by the mask.
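Matching the recognized text to a command can be sketched as a lookup table; the command phrases echo the examples above, while the normalization step and the action descriptions are hypothetical.

```python
# hypothetical command table mapping normalized transcripts to actions
COMMANDS = {
    "call pizza place": "place phone call",
    "play podcast": "start podcast playback",
    "decrease temperature": "lower climate setpoint",
}

def match_command(transcript):
    """Normalize the recognized text and look it up in the command
    table; returns None when no command matches."""
    return COMMANDS.get(transcript.strip().lower())
```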
- After the block 360, the process 300 ends.
- Computer-executable instructions may be compiled or interpreted from computer programs created using a variety of programming languages and/or technologies, including, without limitation, and either alone or in combination, Java™, C, C++, Visual Basic, JavaScript, Perl, HTML, etc.
- A processor (e.g., a microprocessor) receives instructions, e.g., from a memory, a computer-readable medium, etc., and executes these instructions, thereby performing one or more processes, including one or more of the processes described herein.
- Such instructions and other data may be stored and transmitted using a variety of computer readable media.
- A file in a networked device is generally a collection of data stored on a computer-readable medium, such as a storage medium, a random-access memory, etc.
- A computer-readable medium includes any medium that participates in providing data (e.g., instructions), which may be read by a computer. Such a medium may take many forms, including, but not limited to, nonvolatile media, volatile media, etc.
- Nonvolatile media include, for example, optical or magnetic disks and other persistent memory.
- Volatile media include dynamic random-access memory (DRAM), which typically constitutes a main memory.
- Computer readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.
Abstract
A computer includes a processor and a memory storing instructions executable by the processor to receive sensor data of an occupant of a vehicle, identify a type of mask worn by the occupant based on the sensor data, select a sound filter according to the type of mask from a plurality of sound filters stored in the memory, receive sound data, apply the selected sound filter to the sound data, and perform an operation using the filtered sound data.
Description
- Many modern vehicles include voice-recognition systems. Such a system includes a microphone. The system converts spoken words detected by the microphone into text or another form to which a command can be matched. Recognized commands can include adjusting climate controls, selecting media to play, etc.
- FIG. 1 is a top view of an example vehicle with a passenger cabin exposed for illustration.
- FIG. 2 is a block diagram of a system of the vehicle.
- FIG. 3 is a process flow diagram of an example process for filtering speech of an occupant of the vehicle wearing a mask.
- FIG. 4 is a plot of sound pressure versus frequency for speech while wearing a mask, for a plurality of masks.
- A computer includes a processor and a memory storing instructions executable by the processor to receive sensor data of an occupant of a vehicle, identify a type of mask worn by the occupant based on the sensor data, select a sound filter according to the type of mask from a plurality of sound filters stored in the memory, receive sound data, apply the selected sound filter to the sound data, and perform an operation using the filtered sound data.
- The sensor data may be image data showing the occupant.
- The operation may be identifying a voice command to activate a feature.
- The operation may be transmitting the filtered sound data in a telephone call.
- The operation may be outputting the filtered sound data by a speaker of the vehicle.
- The instructions may include instructions to perform the operation using the sound data unfiltered upon determining that the occupant is not wearing a mask.
- The instructions may include instructions to select a generic sound filter from the plurality of sound filters upon identifying the type of mask as an unknown type. The instructions may include instructions to transmit an update to a remote server upon identifying the type of mask as the unknown type. The update may include image data of the mask.
- The instructions may include instructions to identify the type of mask worn by the occupant based on an input by the occupant. The instructions may include instructions to override the identification based on the sensor data with the identification based on the input upon receiving the input.
- The instructions may include instructions to prompt the occupant to provide the input upon determining that the occupant is wearing a mask.
- The instructions may include instructions to prompt the occupant to provide the input upon determining either that the occupant is wearing a mask of a type identified with a confidence score below a confidence threshold or that the type of the mask is an unknown type.
- The instructions may include instructions to transmit an update to a remote server in response to the input indicating that the type of the mask is not among the types of masks stored in the memory.
- The instructions may include instructions to choose the occupant for which to identify the type of mask from a plurality of occupants based on volumes of sound data from respective microphones.
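The microphone-volume heuristic above can be sketched as follows; the function name, the microphone identifiers, and the use of RMS amplitude as the volume measure are illustrative assumptions, since the disclosure does not specify a particular measure:

```python
import math

def loudest_microphone(mic_samples):
    """Return the id of the microphone whose samples have the highest RMS
    amplitude; the occupant nearest that microphone would then be chosen
    for mask identification."""
    def rms(samples):
        return math.sqrt(sum(s * s for s in samples) / len(samples))
    return max(mic_samples, key=lambda mic: rms(mic_samples[mic]))

# The louder rear-microphone signal selects the rear microphone.
front = [0.1, -0.1, 0.1, -0.1]
rear = [0.5, -0.5, 0.5, -0.5]
chosen = loudest_microphone({"110a": front, "110b": rear})  # → "110b"
```

Any comparable loudness measure (peak amplitude, A-weighted level) could be substituted without changing the selection logic.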
- The instructions may include instructions to choose the occupant for which to identify the type of mask from a plurality of occupants based on the occupant being in a predesignated region of the image data.
- Each sound filter may adjust a volume of the sound data by an amount that varies depending on frequency. Each sound filter may increase the volume of the sound data at at least one frequency.
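A frequency-dependent volume adjustment of this kind can be sketched with an FFT-based gain. The filter curve below (0 dB at or below 500 Hz, +6 dB at or above 1000 Hz) is an illustrative placeholder, not measured mask data:

```python
import numpy as np

# Illustrative filter curve: dB boost per frequency (Hz). Masks muffle
# little at or below 500 Hz and more at 1000 Hz and above, so the boost
# is 0 dB up to 500 Hz and 6 dB from 1000 Hz upward.
FILTER_FREQS_HZ = np.array([0.0, 500.0, 1000.0, 4000.0])
FILTER_BOOST_DB = np.array([0.0, 0.0, 6.0, 6.0])

def apply_sound_filter(samples, rate_hz):
    """Boost each frequency bin of a mono signal by the filter's dB value."""
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate_hz)
    # Interpolate the dB curve onto the FFT bin frequencies, convert to gain.
    gain = 10.0 ** (np.interp(freqs, FILTER_FREQS_HZ, FILTER_BOOST_DB) / 20.0)
    return np.fft.irfft(spectrum * gain, n=len(samples))

# A 2 kHz tone muffled to amplitude 0.5 comes back boosted by ~6 dB
# (roughly doubled), while content at or below 500 Hz passes unchanged.
rate = 8000
t = np.arange(800) / rate
muffled = 0.5 * np.sin(2 * np.pi * 2000.0 * t)
boosted = apply_sound_filter(muffled, rate)
```

A production implementation would more likely use a fixed FIR or parametric EQ filter designed offline from the measured mask response, but the per-bin gain above captures the same frequency-dependent volume adjustment.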
- The instructions may include instructions to receive an update from a remote server changing the plurality of sound filters stored in the memory.
- A method includes receiving sensor data of an occupant of a vehicle, identifying a type of mask worn by the occupant based on the sensor data, selecting a sound filter according to the type of mask from a plurality of sound filters stored in memory, receiving sound data, applying the selected sound filter to the sound data, and performing an operation using the filtered sound data.
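The filter-selection step of this method can be sketched as below; the mask types, band names, and dB values form a hypothetical table, since the disclosure does not publish its filter values:

```python
# Hypothetical filter table mapping mask type to a per-band dB boost.
SOUND_FILTERS = {
    "surgical": {"below_500_hz": 0.0, "above_1000_hz": 4.0},
    "n95": {"below_500_hz": 1.0, "above_1000_hz": 7.0},
}

# Generic filter for unknown mask types: an average of the stored filters,
# as described for the unknown-type case.
GENERIC_FILTER = {
    band: sum(f[band] for f in SOUND_FILTERS.values()) / len(SOUND_FILTERS)
    for band in ("below_500_hz", "above_1000_hz")
}

def select_sound_filter(mask_type):
    """Return the filter for a known mask type, the generic filter for an
    unrecognized type, or None when the occupant is not wearing a mask."""
    if mask_type is None:
        return None  # no mask: the sound data is used unfiltered
    return SOUND_FILTERS.get(mask_type, GENERIC_FILTER)
```

An unrecognized type falls back to the generic filter, and `None` (no mask detected) leaves the sound data unfiltered, matching the unfiltered path of the method.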
- With reference to the Figures, a computer 100 includes a processor and a memory storing instructions executable by the processor to receive sensor data of an occupant of a vehicle 102, identify a type of mask worn by the occupant based on the sensor data, select a sound filter according to the type of mask from a plurality of sound filters stored in the memory, receive sound data, apply the selected sound filter to the sound data, and perform an operation using the filtered sound data. - The
computer 100 can be used to boost the clarity of speech from an occupant wearing a mask by identifying the type of mask and thereby applying the filter most appropriate to equalize the speech. The choice of filter permits the frequencies muffled by that particular mask to be amplified. The filtered sound data can thus reliably be used to perform operations such as a voice command to activate a feature of the vehicle 102, a transmission in a telephone call, or broadcasting as a telecom to a speaker 114 elsewhere in the vehicle 102. The voice command can be reliably recognized, the telephone call can be reliably understood by the person at the other end from the occupant, and the telecom message can be reliably understood by the other occupant of the vehicle 102. - With reference to
FIG. 1, the vehicle 102 may be any suitable type of automobile, e.g., a passenger or commercial automobile such as a sedan, a coupe, a truck, a sport utility, a crossover, a van, a minivan, a taxi, a bus, etc. The vehicle 102, for example, may be autonomous. In other words, the vehicle 102 may be autonomously operated such that the vehicle 102 may be driven without constant attention from a driver, i.e., the vehicle 102 may be self-driving without human input. - The
vehicle 102 includes a passenger cabin 104 to house occupants of the vehicle 102. The passenger cabin 104 includes one or more front seats 106 disposed at a front of the passenger cabin 104 and one or more back seats 106 disposed behind the front seats 106. The passenger cabin 104 may also include third-row seats 106 (not shown) at a rear of the passenger cabin 104. - The
vehicle 102 includes at least one camera 108. The camera 108 can detect electromagnetic radiation in some range of wavelengths. For example, the camera 108 may detect visible light, infrared radiation, ultraviolet light, or some range of wavelengths including visible, infrared, and/or ultraviolet light. For example, the camera 108 can be a thermal imaging camera. - The
camera 108 is positioned so that a field of view of the camera 108 encompasses at least one of the seats 106, e.g., the driver seat 106, or the front and back seats 106. For example, the camera 108 can be positioned on an instrument panel 118 or rear-view mirror and oriented rearward relative to the passenger cabin 104. - The
vehicle 102 includes at least one microphone 110, e.g., a first microphone 110a and a second microphone 110b. The microphones 110 are transducers that convert sound into an electrical signal. The microphones 110 can be any suitable type of microphone for detecting speech by occupants of the vehicle 102, e.g., dynamic, condenser, contact, etc. - The
microphones 110 can be arranged at respective locations or positions in the passenger cabin 104 to collectively detect speech from occupants in different seats 106. For example, the first microphone 110a can be positioned in the instrument panel 118, and the second microphone 110b can be positioned between the front seats 106 and oriented to pick up sound from the back seats 106. - A
user interface 112 presents information to and receives information from an occupant of the vehicle 102. The user interface 112 may be located, e.g., on the instrument panel 118 in the passenger cabin 104, or wherever it may be readily seen by the occupant. The user interface 112 may include dials, digital readouts, screens, speakers 114, and so on for providing information to the occupant, e.g., human-machine interface (HMI) elements such as are known. The user interface 112 may include buttons, knobs, keypads, the microphones 110, and so on for receiving information from the occupant. - The
speakers 114 are electroacoustic transducers that convert an electrical signal into sound. The speakers 114 can be any suitable type for producing sound audible to the occupants, e.g., dynamic. The speakers 114 can be arranged at respective locations or positions in the passenger cabin 104 to collectively produce sound for occupants in respective seats 106. - With reference to
FIG. 2, the computer 100 is a microprocessor-based computing device, e.g., a generic computing device including a processor and a memory, an electronic controller or the like, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), etc. The computer 100 can thus include a processor, a memory, etc. The memory of the computer 100 can include media for storing instructions executable by the processor as well as for electronically storing data and/or databases, and/or the computer 100 can include structures such as the foregoing by which programming is provided. The computer 100 can be multiple computers coupled together. - The
computer 100 may transmit and receive data through a communications network 116 such as a controller area network (CAN) bus, Ethernet, WiFi®, Local Interconnect Network (LIN), an onboard diagnostics connector (OBD-II), and/or any other wired or wireless communications network. The computer 100 may be communicatively coupled to the camera 108, the microphones 110, the user interface 112, the speakers 114, a transceiver 118, and other components via the communications network 116. - The
transceiver 118 may be connected to the communications network. The transceiver 118 may be adapted to transmit signals wirelessly through any suitable wireless communication protocol, such as cellular, Bluetooth®, Bluetooth® Low Energy (BLE), ultra-wideband (UWB), WiFi, IEEE 802.11a/b/g, other RF (radio frequency) communications, etc. The transceiver 118 may be adapted to communicate with a remote server 120, that is, a server distinct and spaced from the vehicle 102. The remote server 120 may be located outside the vehicle 102. For example, the remote server 120 may be associated with another vehicle (e.g., V2V communications), an infrastructure component (e.g., V2I communications via Dedicated Short-Range Communications (DSRC) or the like), an emergency responder, a mobile device associated with the owner of the vehicle 102, etc. The transceiver 118 may be one device or may include a separate transmitter and receiver. - With reference to
FIG. 4, the computer 100 stores a plurality of sound filters in memory. Each sound filter specifies how much to adjust a sound pressure, i.e., volume, of sound data according to frequency, e.g., each sound filter provides sound pressure as a mathematical function of frequency, SP=F(f), in which SP is sound pressure, F is the sound filter, and f is frequency. The sound filter Fi can be a difference of a baseline sound pressure SPbase and a sound pressure for a type of mask SPi, i.e., Fi(f)=SPbase(f)−SPi(f), in which i is an index of the type of mask. Masks often have a small effect on volume when the frequency is 500 Hz or less and muffle sounds more considerably at 1000 Hz and higher, to an extent that depends on the type of mask. One of the sound filters stored in memory is associated with the unknown type of mask, and that sound filter can be a generic sound filter, e.g., an average of the other sound filters stored in memory. - The sound filters stored in memory can be updated from the remote server 120, e.g., an over-the-air (OTA) update via the
transceiver 118. An update can add a new sound filter for a new type of mask for which a sound filter is not already stored by the computer 100. Alternatively or additionally, the update can change one or more of the sound filters already stored by the computer 100. Thus, the sound filters stored by the computer 100 can be updated as new types of masks are introduced, materials of existing masks change, etc. The update can occur periodically or on demand. -
FIG. 3 is a process flow diagram illustrating an exemplary process 300 for filtering speech of an occupant of the vehicle 102 wearing a mask. The memory of the computer 100 stores executable instructions for performing the steps of the process 300, and/or programming can be implemented in structures such as mentioned above. As a general overview of the process 300, the computer 100 receives data from the camera 108 and the microphones 110, detects a mask worn by an occupant based on the data, and identifies the type of the mask. If the occupant is wearing a mask of a type identified with a confidence score above a confidence threshold, the computer 100 selects a sound filter corresponding to the type of mask. If the occupant is wearing a mask of a type identified with a confidence score below the confidence threshold, the computer 100 prompts input from the occupant about the type of mask and selects a sound filter corresponding to the type of mask either inputted by the occupant or identified by the computer 100. The computer 100 applies the selected sound filter to sound data and performs an operation using the filtered sound data. If there are no masks, the computer 100 performs the operation based on the unfiltered sound data. - The
process 300 begins in a block 305, in which the computer 100 receives sensor data of at least one occupant of the vehicle 102, e.g., image data from the camera 108 showing the occupants and/or sound data from the microphones 110 of speech by the occupants. - Next, in a
block 310, the computer 100 detects a mask worn by one of the occupants. If a plurality of occupants are in the passenger cabin 104, the computer 100 chooses one of the occupants. For example, the computer 100 can choose the occupant based on the occupant being in a predesignated region of the image data, e.g., corresponding to an occupant sitting in a particular seat 106 such as an operator seat 106, and then detect the mask worn by that occupant. This can permit the computer 100 to detect a mask worn by the operator of the vehicle 102. For another example, the computer 100 can choose the occupant based on volumes of sound data from the respective microphones 110, e.g., based on the microphone 110 with the highest volume, and then detect the mask worn by the occupant closest to that microphone 110. This can permit the computer 100 to detect a mask worn by an occupant most likely to be speaking for performing the operation, e.g., an occupant sitting in the back seat 106 when the volume from the microphone 110b is greater than from the microphone 110a. The computer 100 can identify the mask or unmasked face using conventional image-recognition techniques, e.g., a convolutional neural network programmed to accept images as input and output an identified mask presence or absence. The image data from the camera 108 can be used as the input. The convolutional neural network can be trained using images of occupants of vehicles wearing and not wearing masks produced by cameras situated in the same location as the camera 108. A convolutional neural network includes a series of layers, with each layer using the previous layer as input. Each layer contains a plurality of neurons that receive as input data generated by a subset of the neurons of the previous layers and generate output that is sent to neurons in the next layer.
Types of layers include convolutional layers, which compute a dot product of a weight and a small region of input data; pool layers, which perform a down-sampling operation along spatial dimensions; and fully connected layers, which generate outputs based on the output of all neurons of the previous layer. The final layer of the convolutional neural network generates a confidence score for mask and for unmasked face, and the final output is whichever of mask or unmasked face has the highest confidence score. For the purposes of this disclosure, a “confidence score” is a measure of the probability that the identification is correct. The identification of an occupant's face as masked or unmasked can be performed for respective occupants in the passenger cabin 104. Alternatively or additionally, the computer 100 may detect masks worn by multiple occupants. - Next, in a
block 315, the computer 100 identifies the types of masks worn by the occupants. The computer 100 can execute a convolutional neural network as described above for each detected mask using the image data, and the output is the type of mask with the highest confidence score for each occupant. The convolutional neural network can operate on the image data of the mask or, alternatively, on image data of a logo on the mask. The types of masks can be specified by, e.g., manufacturer and model, e.g., 3M 1870, Kimberly-Clark 49214, Scott Xcel, etc. One of the possible types of masks is an unknown type, i.e., a mask that is none of the masks stored in memory. Alternatively, a single convolutional neural network can be executed for the blocks 310 and 315. Alternatively or additionally, the computer 100 may identify types of masks (or unmasked face) worn by multiple occupants. If the identification of the type of mask is the unknown type, the computer 100 transmits an update to the remote server 120 via the transceiver 118. The update can include the image data showing the mask of unknown type. - Next, in a
decision block 320, the computer 100 determines whether the occupant is wearing a mask, i.e., whether the output of the convolutional neural network(s) is a mask and/or a type of mask for the occupant, and the computer 100 determines whether the confidence score of the type of mask is above a confidence threshold. The confidence threshold can be chosen to indicate a high likelihood that the type of mask is correctly identified. Upon determining that the occupant is wearing a mask and that the confidence score for the type of mask is below the confidence threshold (or that the identified type of mask is the unknown type), the process 300 proceeds to a block 325. Upon determining that the occupant is wearing a mask and that the confidence score for the type of mask is above the confidence threshold, the process 300 proceeds to a block 335. Upon determining that the occupant is not wearing a mask, the process 300 proceeds to a block 355. - In the
block 325, the computer 100 prompts the occupant to provide an input through the user interface 112 specifying the type of mask that the occupant is wearing. For example, the user interface 112 can present a list of types of masks for the occupant to choose from. The list can be a default list stored in memory. Alternatively, the list can include the types of masks with the highest confidence scores as determined in the block 315, or the user interface 112 can display a single type of mask with the highest confidence score and ask the occupant to confirm that the type of mask is correct. The list can include an option, e.g., “other,” for indicating that the type of the mask is not among the types of masks stored by the computer 100. Selecting this option can be treated as though the occupant selected that the type of the mask is the unknown type. When this option is selected, the computer 100 can transmit an update to the remote server 120 via the transceiver 118, if the computer 100 did not already do so in the block 315. The update can include the image data showing the mask of unknown type. - Next, in a
decision block 330, the computer 100 determines whether the occupant inputted a type of mask in response to the prompt in the block 325. The occupant provides the input by selecting a type of mask from the list, and the occupant can fail to provide the input by selecting an option declining to provide a type of mask, e.g., an option labeled “Choose mask automatically,” or by failing to select a type of mask within a time threshold. The time threshold can be chosen to provide the occupant sufficient time to respond to the prompt. If the occupant did not select a type of mask, the process 300 proceeds to a block 335. If the occupant selected a type of mask, the process 300 proceeds to a block 340. - In the
block 335, the computer 100 selects a sound filter according to the type of mask identified in the block 315 from the plurality of sound filters stored in memory. Selecting from the plurality of sound filters can provide the sound filter that most accurately adjusts the sound data to the baseline level. - Alternatively, when the
computer 100 has identified multiple types of masks, the computer 100 can select multiple sound filters, each associated with one of the identified types of masks. The computer 100 can combine the sound filters together, e.g., by simple averaging or by weighting. The sound filters can be weighted based on locations of the occupants wearing the masks relative to one of the microphones 110 generating sound data, e.g., based on volumes of the sound data from the respective microphones 110. If the first microphone 110a is generating sound data with greater volume than the second microphone 110b, then the sound filters are weighted according to relative distances of the masks of each type from the chosen microphone 110a. For example, if a mask of type 1 is a distance d1 from the chosen microphone 110a and a mask of type 2 is a distance d2 from the chosen microphone 110a, then the weights can be w1=d1/(d1+d2) and w2=d2/(d1+d2), and the combined sound filter can be Fcombo(f)=w1*F1(f)+w2*F2(f). After the block 335, the process 300 proceeds to a block 345. - In the
block 340, the computer 100 identifies the type of mask based on the input by the occupant and selects the sound filter from memory associated with the identified type of mask. In other words, the computer 100 overrides the identification based on the image data or sound data with the identification based on the input upon receiving the input, by executing the block 340 instead of the block 335. After the block 340, the process 300 proceeds to the block 345. - In the
block 345, the computer 100 receives sound data from the microphones 110. The sound data can include speech by the occupants. - Next, in a
block 350, the computer 100 applies the selected sound filter or the combination of the selected sound filters to the sound data. The sound filter adjusts a volume of the sound data by an amount that varies depending on the frequency. For example, for each frequency f of the sound data, the sound filter adjusts the sound pressure, i.e., adjusts the volume, by the value of the sound filter for that frequency, e.g., SPfilt(f)=F(f)+SPunfilt(f). For example, the sound filter can adjust the volume only slightly when the frequency is 500 Hz or less and increase the volume more considerably at 1000 Hz and higher, to an extent that depends on the type of mask. After the block 350, the process 300 proceeds to a block 360. - In the
block 355, i.e., after not detecting any masks, the computer 100 receives sound data from the microphones 110. The sound data can include speech by the occupants. After the block 355, the process 300 proceeds to the block 360. - In the
block 360, the computer 100 performs an operation using the sound data, either the filtered sound data from the block 350 or the unfiltered sound data from the block 355. For example, the operation can be identifying a voice command to activate a feature, e.g., converting the sound data into text such as “Call Pizza Place,” “Play Podcast,” “Decrease Temperature,” etc. (or into equivalent data identifying the command). Using the filtered sound data can help the computer 100 to accurately identify the voice command. For another example, the operation can be transmitting the sound data in a telephone call. A mobile phone can be paired with the user interface 112 and used to place a telephone call. Using the filtered sound data can make it easy for the recipient of the call to understand what the occupant is saying. For another example, the operation can be outputting the filtered sound data by one or more of the speakers 114. Sound data originating from the first microphone 110 can be used and outputted by the speaker 114 at a rear of the passenger cabin 104; in other words, the first microphone 110 and the speaker 114 form a telecom. Using the filtered sound data can make it easier for an occupant in the back seat 106 to understand what the occupant in the front seat 106 is saying than directly hearing the occupant speaking while muffled by the mask. After the block 360, the process 300 ends. - Computer executable instructions may be compiled or interpreted from computer programs created using a variety of programming languages and/or technologies, including, without limitation, and either alone or in combination, Java™, C, C++, Visual Basic, Java Script, Perl, HTML, etc. In general, a processor (e.g., a microprocessor) receives instructions, e.g., from a memory, a
computer-readable medium, etc., and executes these instructions, thereby performing one or more processes, including one or more of the processes described herein. Such instructions and other data may be stored and transmitted using a variety of computer readable media. A file in a networked device is generally a collection of data stored on a computer readable medium, such as a storage medium, a random-access memory, etc. A computer readable medium includes any medium that participates in providing data (e.g., instructions), which may be read by a computer. Such a medium may take many forms, including, but not limited to, nonvolatile media, volatile media, etc. Nonvolatile media include, for example, optical or magnetic disks and other persistent memory. Volatile media include dynamic random-access memory (DRAM), which typically constitutes a main memory. Common forms of computer readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read. - The disclosure has been described in an illustrative manner, and it is to be understood that the terminology which has been used is intended to be in the nature of words of description rather than of limitation. Use of “in response to” and “upon determining” indicates a causal relationship, not merely a temporal relationship. The adjectives “first” and “second” are used throughout this document as identifiers and are not intended to signify importance, order, or quantity. Many modifications and variations of the present disclosure are possible in light of the above teachings, and the disclosure may be practiced otherwise than as specifically described.
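As a worked example of the distance-weighted filter combination described above for the block 335 (Fcombo(f)=w1*F1(f)+w2*F2(f) with w1=d1/(d1+d2) and w2=d2/(d1+d2)), the sketch below uses hypothetical flat filter curves; a real filter would vary with frequency:

```python
def combine_filters(filter_1, filter_2, d1, d2):
    """Combine two per-frequency dB filters F1 and F2 into
    Fcombo(f) = w1*F1(f) + w2*F2(f), with the weights computed from the
    masks' distances to the chosen microphone, per the formulas above."""
    w1 = d1 / (d1 + d2)
    w2 = d2 / (d1 + d2)
    return lambda f: w1 * filter_1(f) + w2 * filter_2(f)

# Two illustrative flat filters: +4 dB and +8 dB at every frequency.
f1 = lambda f: 4.0
f2 = lambda f: 8.0
combo = combine_filters(f1, f2, d1=1.0, d2=3.0)
print(combo(2000))  # → 7.0 (weights 0.25 and 0.75)
```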
Claims (20)
1. A computer comprising a processor and a memory storing instructions executable by the processor to:
receive sensor data of an occupant of a vehicle;
identify a type of mask worn by the occupant based on the sensor data;
select a sound filter according to the type of mask from a plurality of sound filters stored in the memory;
receive sound data;
apply the selected sound filter to the sound data; and
perform an operation using the filtered sound data.
2. The computer of claim 1, wherein the sensor data is image data showing the occupant.
3. The computer of claim 1, wherein the operation is identifying a voice command to activate a feature.
4. The computer of claim 1, wherein the operation is transmitting the filtered sound data in a telephone call.
5. The computer of claim 1, wherein the operation is outputting the filtered sound data by a speaker of the vehicle.
6. The computer of claim 1, wherein the instructions include instructions to perform the operation using the sound data unfiltered upon determining that the occupant is not wearing a mask.
7. The computer of claim 1, wherein the instructions include instructions to select a generic sound filter from the plurality of sound filters upon identifying the type of mask as an unknown type.
8. The computer of claim 7, wherein the instructions include instructions to transmit an update to a remote server upon identifying the type of mask as the unknown type.
9. The computer of claim 8, wherein the update includes image data of the mask.
10. The computer of claim 1, wherein the instructions include instructions to identify the type of mask worn by the occupant based on an input by the occupant.
11. The computer of claim 10, wherein the instructions include instructions to override the identification based on the sensor data with the identification based on the input upon receiving the input.
12. The computer of claim 10, wherein the instructions include instructions to prompt the occupant to provide the input upon determining that the occupant is wearing a mask.
13. The computer of claim 10, wherein the instructions include instructions to prompt the occupant to provide the input upon determining that one of the occupant is wearing a mask with a type identified with a confidence score below a confidence threshold or the type of the mask is an unknown type.
14. The computer of claim 10, wherein the instructions include instructions to transmit an update to a remote server in response to the input indicating that the type of the mask is not among the types of masks stored in the memory.
15. The computer of claim 1, wherein the instructions include instructions to choose the occupant for which to identify the type of mask from a plurality of occupants based on volumes of sound data from respective microphones.
16. The computer of claim 1, wherein the instructions include instructions to choose the occupant for which to identify the type of mask from a plurality of occupants based on the occupant being in a predesignated region of the image data.
17. The computer of claim 1, wherein each sound filter adjusts a volume of the sound data by an amount that varies depending on frequency.
18. The computer of claim 17, wherein each sound filter increases the volume of the sound data at at least one frequency.
20. A method comprising:
receiving sensor data of an occupant of a vehicle;
identifying a type of mask worn by the occupant based on the sensor data;
selecting a sound filter according to the type of mask from a plurality of sound filters stored in memory;
receiving sound data;
applying the selected sound filter to the sound data; and
performing an operation using the filtered sound data.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/145,431 US11404061B1 (en) | 2021-01-11 | 2021-01-11 | Speech filtering for masks |
CN202210029150.XA CN114764322A (en) | 2021-01-11 | 2022-01-11 | Speech filtering for masks |
DE102022100538.0A DE102022100538A1 (en) | 2021-01-11 | 2022-01-11 | VOICE FILTERING FOR MASKS |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/145,431 US11404061B1 (en) | 2021-01-11 | 2021-01-11 | Speech filtering for masks |
Publications (2)
Publication Number | Publication Date |
---|---|
US20220223145A1 true US20220223145A1 (en) | 2022-07-14 |
US11404061B1 US11404061B1 (en) | 2022-08-02 |
Family
ID=82116700
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/145,431 Active 2041-04-22 US11404061B1 (en) | 2021-01-11 | 2021-01-11 | Speech filtering for masks |
Country Status (3)
Country | Link |
---|---|
US (1) | US11404061B1 (en) |
CN (1) | CN114764322A (en) |
DE (1) | DE102022100538A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220406327A1 (en) * | 2021-06-19 | 2022-12-22 | Kyndryl, Inc. | Diarisation augmented reality aide |
US12033656B2 (en) * | 2021-06-19 | 2024-07-09 | Kyndryl, Inc. | Diarisation augmented reality aide |
Citations (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20010012373A1 (en) * | 2000-02-09 | 2001-08-09 | Siemens Aktiengesellschaft | Garment-worn microphone, and communication system and method employing such a microphone for voice control of devices |
US20010021905A1 (en) * | 1996-02-06 | 2001-09-13 | The Regents Of The University Of California | System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech |
US20030027600A1 (en) * | 2001-05-09 | 2003-02-06 | Leonid Krasny | Microphone antenna array using voice activity detection |
US20030177007A1 (en) * | 2002-03-15 | 2003-09-18 | Kabushiki Kaisha Toshiba | Noise suppression apparatus and method for speech recognition, and speech recognition apparatus and method |
US20030223622A1 (en) * | 2002-05-31 | 2003-12-04 | Eastman Kodak Company | Method and system for enhancing portrait images |
US20040167776A1 (en) * | 2003-02-26 | 2004-08-26 | Eun-Kyoung Go | Apparatus and method for shaping the speech signal in consideration of its energy distribution characteristics |
US20040181399A1 (en) * | 2003-03-15 | 2004-09-16 | Mindspeed Technologies, Inc. | Signal decomposition of voiced speech for CELP speech coding |
US20040254793A1 (en) * | 2003-06-12 | 2004-12-16 | Cormac Herley | System and method for providing an audio challenge to distinguish a human from a computer |
US20060009970A1 (en) * | 2004-06-30 | 2006-01-12 | Harton Sara M | Method for detecting and attenuating inhalation noise in a communication system |
US20060009971A1 (en) * | 2004-06-30 | 2006-01-12 | Kushner William M | Method and apparatus for characterizing inhalation noise and calculating parameters based on the characterization |
US20070163588A1 (en) * | 2005-11-08 | 2007-07-19 | Jack Hebrank | Respirators for Delivering Clean Air to an Individual User |
US7254535B2 (en) * | 2004-06-30 | 2007-08-07 | Motorola, Inc. | Method and apparatus for equalizing a speech signal generated within a pressurized air delivery system |
US20100110489A1 (en) * | 2008-11-05 | 2010-05-06 | Yoshimichi Kanda | Image forming apparatus, method of controlling the same based on speech recognition, and computer program product |
US20110036347A1 (en) * | 2009-08-14 | 2011-02-17 | Scott Technologies, Inc | Air purifying respirator having inhalation and exhalation ducts to reduce rate of pathogen transmission |
US20120008002A1 (en) * | 2010-07-07 | 2012-01-12 | Tessera Technologies Ireland Limited | Real-Time Video Frame Pre-Processing Hardware |
US20120166188A1 (en) * | 2010-12-28 | 2012-06-28 | International Business Machines Corporation | Selective noise filtering on voice communications |
US20120191447A1 (en) * | 2011-01-24 | 2012-07-26 | Continental Automotive Systems, Inc. | Method and apparatus for masking wind noise |
US20150012270A1 (en) * | 2013-07-02 | 2015-01-08 | Family Systems, Ltd. | Systems and methods for improving audio conferencing services |
US20150221299A1 (en) * | 2014-02-04 | 2015-08-06 | Avaya, Inc. | Speech analytics with adaptive filtering |
US20180253590A1 (en) * | 2015-03-20 | 2018-09-06 | Inspirata, Inc. | Systems, methods, and apparatuses for digital histopathological imaging for prescreened detection of cancer and other abnormalities |
US10140089B1 (en) * | 2017-08-09 | 2018-11-27 | 2236008 Ontario Inc. | Synthetic speech for in vehicle communication |
US20180369616A1 (en) * | 2015-12-07 | 2018-12-27 | Christopher Dobbing | Respirator mask management system |
US20190121532A1 (en) * | 2017-10-23 | 2019-04-25 | Google Llc | Method and System for Generating Transcripts of Patient-Healthcare Provider Conversations |
US20200109869A1 (en) * | 2017-06-19 | 2020-04-09 | Oy Lifa Air Ltd. | Electrical filter structure |
US20210004982A1 (en) * | 2019-07-02 | 2021-01-07 | Boohma Technologies Llc | Digital Image Processing System for Object Location and Facing |
US20210117649A1 (en) * | 2020-12-26 | 2021-04-22 | David Gonzalez Aguirre | Systems and methods for privacy-preserving facemask-compliance-level measurement |
US20210343400A1 (en) * | 2020-01-24 | 2021-11-04 | Overjet, Inc. | Systems and Methods for Integrity Analysis of Clinical Data |
US20210368881A1 (en) * | 2020-05-29 | 2021-12-02 | Dallas/Fort Worth International Airport Board | Respirator mask and method for manufacturing |
US20220012894A1 (en) * | 2020-07-08 | 2022-01-13 | Nec Corporation Of America | Image analysis for detecting mask compliance |
US20220139388A1 (en) * | 2020-10-30 | 2022-05-05 | Google Llc | Voice Filtering Other Speakers From Calls And Audio Messages |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009265722A (en) | 2008-04-22 | 2009-11-12 | Calsonic Kansei Corp | Face direction sensing device |
CN111444869A (en) | 2020-03-31 | 2020-07-24 | 高新兴科技集团股份有限公司 | Method and device for identifying wearing state of mask and computer equipment |
2021
- 2021-01-11 US: application US17/145,431, published as US11404061B1, status Active
2022
- 2022-01-11 CN: application CN202210029150.XA, published as CN114764322A, status Pending
- 2022-01-11 DE: application DE102022100538.0A, published as DE102022100538A1, status Pending
Also Published As
Publication number | Publication date |
---|---|
US11404061B1 (en) | 2022-08-02 |
DE102022100538A1 (en) | 2022-07-14 |
CN114764322A (en) | 2022-07-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107396249B (en) | System for providing occupant-specific acoustic functions in a transportation vehicle | |
US9251694B2 (en) | Vehicle system passive notification using remote device | |
US20170327082A1 (en) | End-to-end accommodation functionality for passengers of fully autonomous shared or taxi-service vehicles | |
CN106878956B (en) | Determining vehicle user location after a collision event | |
CN105835804B (en) | For monitoring the method and apparatus that vehicle back occupant takes a seat region | |
WO2018099677A1 (en) | Improvements relating to hearing assistance in vehicles | |
CN103761462A (en) | Method for personalizing driving information by identifying vocal print | |
DE102016109814A1 (en) | Discrete emergency reaction | |
DE102013208506B4 (en) | Hierarchical recognition of vehicle drivers and selection activation of vehicle settings based on the recognition | |
CN106611602A (en) | Vehicle sound collection apparatus and sound collection method | |
US11044566B2 (en) | Vehicle external speaker system | |
CN103733647A (en) | Automatic sound adaptation for an automobile | |
US10155523B2 (en) | Adaptive occupancy conversational awareness system | |
US20190219413A1 (en) | Personalized roadway congestion notification | |
US10708700B1 (en) | Vehicle external speaker system | |
US11096613B2 (en) | Systems and methods for reducing anxiety in an occupant of a vehicle | |
US20170339529A1 (en) | Method and apparatus for vehicle occupant location detection | |
CN114194128A (en) | Vehicle volume control method, vehicle, and storage medium | |
US10504516B2 (en) | Voice control for emergency vehicle | |
US11404061B1 (en) | Speech filtering for masks | |
CN114387963A (en) | Vehicle and control method thereof | |
JP2016194804A (en) | Person identifying apparatus and program | |
US11355136B1 (en) | Speech filtering in a vehicle | |
US20230088122A1 (en) | Didactic videos for vehicle operation | |
US11787290B2 (en) | Projection on a vehicle window |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: FORD GLOBAL TECHNOLOGIES, LLC, MICHIGAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:AMMAN, SCOTT ANDREW;NEUBECKER, CYNTHIA M.;WHEELER, JOSHUA;AND OTHERS;SIGNING DATES FROM 20201203 TO 20210104;REEL/FRAME:054870/0679 |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |