WO2022230111A1 - Voice communication device and call voice processing method - Google Patents
- Publication number
- WO2022230111A1 (application PCT/JP2021/016990)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- mask
- communication device
- caller
- call
- microphone
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
Definitions
- The present invention relates to a communication device and a call voice processing method, and more particularly to a communication device and a call voice processing method that reduce the difficulty of hearing a voice call even when the speaker is wearing a mask.
- Non-Patent Document 1 introduces a hearing aid that corrects difficulty in hearing conversations caused by masks.
- the hearing aid boosts the high frequency range of 2 kHz to 5 kHz that is attenuated by the mask to compensate for difficulty in hearing.
- However, the mask mode has to be activated by the user, either when the user judges that hearing is difficult or after confirming that the other party is wearing a mask.
- In a call, the other party is generally far away, and it is difficult to tell whether the difficulty in hearing is due to a mask or to a poor communication environment.
- Alternatively, the listener of a call has to confirm each time whether or not the speaker is wearing a mask. In practice, it is therefore difficult for the person listening to the call to take action to alleviate the difficulty of hearing caused by the mask.
- Patent Document 1 does not mention correcting deterioration of speech quality due to masks in the first place. Therefore, with Patent Document 1 and Non-Patent Document 1, setting a call voice quality mode suited to mask wearing in a call remains a problem.
- The present invention was made to solve the above problems, and aims to correct the deterioration of call quality caused by talking while wearing a mask without requiring the person listening to the call to be aware of it.
- One example is a communication device using telecommunications, comprising: a microphone; a communicator for transmitting and receiving call voice; a booster for boosting the high-frequency level of the call voice;
- a caller state collection device for collecting caller state information used to detect whether a caller is wearing a mask; and a processor connected to each of the microphone, the communicator, the booster, and the caller state collection device.
- The processor executes: a mask detection step of analyzing the caller state information to detect whether or not the caller is wearing a mask; a boost control step of controlling the booster according to the detection result;
- and a transmission step of transmitting, from the communicator, the call voice after boost control processing by the booster.
- FIG. 1 is a hardware configuration diagram of a communication device (smartphone).
- FIG. 2 is a functional block diagram of a program stored in the smartphone.
- FIG. 7 is a flowchart showing a call voice processing method according to the first embodiment.
- FIG. 8 is a configuration diagram of the call system according to the second embodiment.
- FIG. 9 is a flowchart showing a call voice processing method according to the second embodiment.
- FIG. 10 is a flowchart showing a call voice processing method according to the third embodiment.
- FIG. 11 is an explanatory diagram of the classifying processing of the speech analysis processing.
- FIG. 12 is a flowchart showing the flow of processing in which image analysis processing and voice analysis processing are used together for mask detection processing.
- FIG. 13 is an external view of an HMD, which is one form of a communication device.
- FIG. 14 is a hardware configuration diagram of the HMD.
- FIG. 15 is a functional block diagram of a program stored in the HMD.
- A further figure explains a call scene using the HMD.
- The first embodiment is described using the smartphone 1 as an example of a communication device.
- A camera is used as the caller state collection device for collecting caller state information to detect whether or not a caller is wearing a mask, and the caller state information is captured data obtained by imaging with the camera.
- A communication device using telecommunication is used. Here, "using telecommunication" means a form of communication that converts call voice into an electric signal and transmits and receives it. The form of transmission and reception may be of any type, such as wired communication, wireless communication, packet communication, or mobile communication via an IP network.
- In mobile communication via an IP network, it is possible to perform a video call in which call data consisting of video and call voice are transmitted simultaneously, and such a video call is also included in the call in the present embodiment.
- Figure 1 is a hardware configuration diagram of a smartphone.
- The smartphone 1 includes a processor 30, a storage 40, a GPS receiver 51, a geomagnetic sensor 52, an acceleration sensor 53, a gyro sensor 54, a LAN communication device 61, a mobile communication device 62, a short-range wireless communication device 63, a display 71, an in-camera 72, an out-camera 73, a microphone 81, a speaker 82, a touch sensor 91, an operation key 92, and a booster 95.
- the processor 30 is a microprocessor unit that controls the entire smartphone 1 according to a predetermined operating program, and includes, for example, a CPU and an MPU (Micro Processor Unit).
- the system bus 31 is a data communication path for transmitting and receiving various commands and data between the processor 30 and each component block within the smartphone 1 .
- The storage 40 includes a ROM 41 that stores programs for controlling the operation of the smartphone 1, a non-volatile memory 42 that stores various data such as operation setting values, detection values from sensors, objects including content, and library information downloaded from a library, and a rewritable RAM 43 used as a work area and the like in various program operations.
- a flash ROM may be used as the non-volatile memory 42, or another memory medium may be used.
- the storage 40 needs to retain stored information even when power is not supplied to the smartphone 1 from the outside. Therefore, instead of the nonvolatile memory 42, devices such as semiconductor element memories such as SSDs (Solid State Drives), magnetic disk drives such as HDDs (Hard Disc Drives), and the like may be used.
- the storage 40 can store operation programs downloaded from the network and various data created by the operation programs. In addition, it is possible to store captured data such as moving images and still images captured using the imaging function of the in-camera 72 and the out-camera 73 .
- the smartphone 1 includes a GPS (Global Positioning System) receiver 51 , a geomagnetic sensor 52 , an acceleration sensor 53 and a gyro sensor 54 . These sensors make it possible to detect the position, tilt, direction, movement, etc. of the smartphone 1 . Also, the smartphone 1 may further include other sensors such as an illuminance sensor, an altitude sensor, a proximity sensor, and the like.
- the LAN communication device 61 is connected to a wide area network via an access point or the like, and transmits and receives data to and from an external server on the wide area network.
- the connection with an access point or the like may be made through a wireless communication connection such as Wi-Fi (registered trademark).
- The mobile communication device 62 performs telephone communication (calls) and data transmission/reception through wireless communication with a mobile phone base station or the like of a mobile phone communication network. Communication with mobile phone base stations and the like may use 4G (4th Generation) or 5G mobile communication, the W-CDMA (Wideband Code Division Multiple Access) (registered trademark) method, the GSM (Global System for Mobile communications) method, the LTE (Long Term Evolution) method, or any other communication method.
- a LAN communication device 61 and a mobile communication device 62 perform voice communication by mobile communication and connect to a wide area network by wireless LAN.
- the short-range wireless communication device 63 exchanges information with external Bluetooth (registered trademark) devices and external NFC-compatible devices through Bluetooth (registered trademark) communication, NFC standard communication, and the like.
- the LAN communication device 61, the mobile communication device 62, and the short-range wireless communication device 63 each have an encoding circuit, a decoding circuit, an antenna, and the like.
- another communication device such as an infrared communication device may be provided.
- The display 71 is, for example, a display device such as a backlit liquid crystal display or a self-luminous organic EL display, and displays captured image data and the like.
- the in-camera 72 is provided on the same surface as the surface of the smartphone 1 on which the display 71 is provided.
- the in-camera 72 captures an image of a caller (who is both a speaker and a listener) using the smartphone 1 and generates facial image data of the caller.
- The smartphone 1 executes face recognition processing based on this face image data. The face image data is also used for so-called Face ID, which permits the use of the smartphone 1.
- the out-camera 73 is provided on the back surface of the smartphone 1 . It is also used when capturing an image of a landscape or the like.
- Each of the in-camera 72 and the out-camera 73 is a camera that uses an electronic device such as a CCD (Charge Coupled Device) or a CMOS (Complementary Metal Oxide Semiconductor) sensor to convert the light input from the lens into an electrical signal, thereby inputting image information of surroundings and objects.
- the microphone 81 converts sound in the real space, user's voice, etc. into voice information and inputs it.
- the speaker 82 outputs voice information and the like necessary for the user.
- Earphones and headphones can also be connected, and can of course be chosen as appropriate for the application.
- the touch sensor 91 is stacked on the display screen of the display 71 .
- the operation key 92 is configured by arranging button switches and the like.
- the touch sensor 91 and the operation key 92 are examples of operation input devices for inputting operation instructions to the smartphone 1, and may be other operation input devices.
- the smart phone 1 may be operated using a separate portable terminal device connected by wired communication or wireless communication using the LAN communication device 61 or the short-range wireless communication device 63 .
- the image captured by the in-camera 72 may be analyzed, and the smartphone 1 may be operated by actions such as gestures.
- Fig. 2 is a functional block diagram of the program stored in the smartphone.
- the non-volatile memory 42 stores a basic operation program 421 and a call program 422 as processing programs.
- the nonvolatile memory 42 further includes a data storage area 430 for various data used when executing each program.
- the processor 30 reads each program stored in the nonvolatile memory 42 and refers to data stored in the data storage area 430 as necessary.
- When the call processing unit 4222 detects a press signal of the call application icon 22 (see FIG. 4) or an incoming call signal from another call device, the processor 30 performs the functions of the mask detection unit 4220.
- The mask detection unit 4220 detects the presence or absence of a mask by image analysis processing. To do so, it activates the in-camera 72, acquires image data, and performs image analysis processing on the image data to carry out mask detection processing.
- The mask detection unit 4220 outputs the mask presence/absence detection result to the boost control unit 4221.
- The boost control unit 4221 outputs a boost control signal to the booster 95 according to the mask presence/absence detection result.
- The booster 95 performs high-frequency boost processing on the call voice acquired from the microphone 81, and the result is output via the call processing unit 4222 and transmitted from the communication device.
- Voice analysis may also be performed on the call voice collected by the microphone 81 to detect the presence or absence of a mask. Details of the movement of each signal and data shown in FIG. 2 will be clarified as needed in the following description.
- The processor 30 loads the basic operation program 421 and the call program 422 into the RAM 43 and executes them.
- Various data stored in the data storage area 430 are mainly data necessary for executing the call program 422 .
- These data include a caller face image in which the caller's face is captured (referred to as "registered face data 431"), a mask database (D/B) 432, registered voice data 433, and class data 434, each of which is needed depending on the call voice processing method.
- the nonvolatile memory 42 may be a single memory medium as shown, or may be composed of a plurality of memory media.
- FIG. 3 is a configuration diagram of the call system according to the first embodiment.
- Caller 2 and caller 2A are both speakers and listeners.
- a call will be made using mobile communication, but Internet protocol (also referred to as IP protocol) may be used in a wireless LAN.
- the smartphone 1 or the smartphone 1A may be the calling device according to the present invention, or both the smartphone 1 and the smartphone 1A may be the calling devices according to the present invention.
- Reference numerals 3 and 3A are masks.
- Reference numerals 4 and 4A are mobile communication base stations.
- Reference numeral 5 is a mobile communication network.
- Reference numerals 6 and 6A are wireless LAN access points.
- Reference numerals 60 and 60A are wireless LAN signals.
- Reference numeral 7 is a wide area network.
- Mobile communication base stations 4 , 4 A are main components of mobile communication network 5 and also components of wide area network 7 .
- Caller 2 uses smartphone 1 to, for example, dial up and start a call.
- a signal for a call is transmitted to the mobile communication base station 4, and transmitted to the smartphone 1A held by the caller 2A via the mobile communication network 5 and the mobile communication base station 4A.
- When the caller 2A accepts the incoming call using the smartphone 1A, a call is established between the caller 2 and the caller 2A.
- smartphone 1 and smartphone 1A are also connected to wide area network 7.
- There are two types of routes: a route via the mobile communication base stations 4 and 4A, and a route using the access points 6 and 6A with the wireless LAN signals 60 and 60A.
- FIG. 4 is a diagram showing the home screen 20 of the smartphone 1.
- The call port 83 is connected to the microphone 81 and the speaker 82. By pressing the operation key 92, the home screen 20 is displayed.
- On the home screen 20, the time, location information, weather information, and a search window are arranged at the top, but these are not essential.
- What is important on the home screen 20 is an icon button group 21 for starting an application program. By pressing the icon button corresponding to the application program, the home screen 20 is shifted to the screen of the application program.
- the icon button group 21 includes a call application icon 22 that activates the call program. Caller 2, 2A pushes the button to start the call program.
- FIG. 5 is a diagram showing the dial screen 20a displayed by the call program.
- a numeric keypad 23 and a dial start button 24 are displayed, and the caller uses the numeric keypad 23 to input the number of the other party to call, and presses the dial start button 24 to call the other party.
- a telephone directory may be displayed and a calling party may be selected from the telephone directory.
- FIG. 6 shows the display screen 20b displayed when there is an incoming call.
- Information on the calling party (for example, the calling party's name, telephone number, or information indicating non-notification), a call acceptance button 25, and a call rejection button are displayed.
- The dial screen 20a in FIG. 5 and the display screen 20b in FIG. 6, shown when a call is received, correspond to the initial screen of the call program.
- the caller performs an action of looking at the display screen.
- the in-camera 72 is activated to image the face of the caller.
- This processing is realized as follows: when the processor 30 detects the pressing signal of the call application icon 22 and activates the call program 422, the call program 422 outputs an imaging instruction signal to the in-camera 72.
- FIG. 7 is a flow chart showing the call voice processing method according to the first embodiment. Each step of the call program 422 is executed by the processor 30 of the smartphone 1 described with FIG. 1.
- The home screen 20 is displayed on the smartphone 1 (S10). If the caller presses the call application icon 22 (see FIG. 4) (S11: YES), or if a call is received (see FIG. 6) (S12: YES), the processor 30 activates the call program 422. If NO in both steps S11 and S12, the process waits in step S10.
- the call program 422 executes the call process (S13-S17) and the image analysis process (S21-S24) in multi-process.
- the mask detection unit 4220 activates the in-camera 72 (S21) and acquires the camera image (S22).
- the mask detection unit 4220 performs subject detection processing on the camera image and performs face recognition processing (S23).
- the mask detection unit 4220 determines whether the face image includes a mask (S24: corresponds to the mask detection step).
- If it is determined that a mask is present (S24: YES), a boost control signal indicating execution of boost is output to the boost control unit 4221. If it is determined that there is no mask (S24: NO), a boost control signal indicating no boost (non-execution of boost) is output to the boost control unit 4221.
- the call processing unit 4222 sends a call request to the dialed number (S13), and when the other party gives permission, the call starts (S15).
- the call processing unit 4222 responds to the incoming call request from the communication partner (S14), and starts the call (S15).
- the booster 95 performs boost control processing according to the boost control signal (S16: corresponds to the boost control step).
- The booster 95 uses the boost control signal output in step S24 to turn the boost of the high frequency band, specifically 2 kHz to 5 kHz, ON or OFF.
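The ON/OFF boost of the 2 kHz to 5 kHz band can be illustrated with a minimal sketch. It is not the booster 95 itself: the function name `boost_band`, the 6 dB gain, and the frame-wise DFT approach are assumptions made for illustration, and a real device would more likely use a filter running on streaming audio.

```python
import cmath
import math


def _dft(values, sign):
    """Naive O(n^2) DFT; sign=-1 is the forward transform, sign=+1 the unscaled inverse."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def boost_band(samples, sample_rate, enabled=True,
               low_hz=2000.0, high_hz=5000.0, gain_db=6.0):
    """Return one audio frame with the [low_hz, high_hz] band amplified.

    `enabled` plays the role of the boost control signal (ON/OFF).
    The 6 dB default gain is an assumed placeholder, not a value from the source.
    """
    if not enabled or not samples:
        return list(samples)
    n = len(samples)
    gain = 10 ** (gain_db / 20)  # convert dB to a linear amplitude factor
    spectrum = _dft(samples, -1)
    for k in range(n):
        # Bin k and its mirror bin n-k represent the same physical frequency.
        freq = min(k, n - k) * sample_rate / n
        if low_hz <= freq <= high_hz:
            spectrum[k] *= gain
    # Scaling mirror bins equally keeps the inverse transform real-valued.
    return [value.real / n for value in _dft(spectrum, +1)]
```

With a 16 kHz sample rate and a 64-sample frame, a 3 kHz tone comes out amplified by the gain factor while a 500 Hz tone passes through unchanged.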
- the smartphone 1 transmits the call voice that has undergone boost control in step S16 to the mobile communication network 5 or the wide area network 7 (corresponding to a transmission step).
- The call processing unit 4222 determines whether the call has ended (S17). If YES, it ends the call and returns to the home screen (S10); if NO, it continues the call.
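The way the image analysis process and the call process run side by side and meet at the boost control signal can be sketched as follows. This is a simplified illustration under stated assumptions: a thread stands in for the multi-process execution described above, and `BoostSignal`, `detect_mask`, `boost`, and `send` are hypothetical names, not parts of the call program 422.

```python
import threading


class BoostSignal:
    """Boost control signal shared between mask detection and the call loop."""

    def __init__(self):
        self._lock = threading.Lock()
        self._boost = False  # no boost until a mask has been detected

    def update(self, mask_detected):
        with self._lock:
            self._boost = bool(mask_detected)

    def boost_enabled(self):
        with self._lock:
            return self._boost


def image_analysis(camera_frames, detect_mask, signal):
    """S21-S24: analyze each camera frame and update the boost control signal."""
    for frame in camera_frames:
        signal.update(detect_mask(frame))


def call_process(voice_frames, signal, boost, send):
    """S15-S17: boost each voice frame if the signal says so, then transmit it."""
    for voice in voice_frames:
        send(boost(voice) if signal.boost_enabled() else voice)


def start_image_analysis(camera_frames, detect_mask, signal):
    """Run the image analysis in the background so the call is not blocked."""
    worker = threading.Thread(
        target=image_analysis, args=(camera_frames, detect_mask, signal), daemon=True
    )
    worker.start()
    return worker
```

Keeping the shared state down to a single flag behind a lock means the call loop never waits on the camera; it simply reads the latest detection result for each voice frame.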
- The image analysis process S20 is performed at the start of the call program to perform mask detection, but it may also be continued during the call, as indicated by the dashed line returning from S23 to S22 in the flowchart of FIG. 7. By continuously executing this process, a change in state such as putting on or taking off a mask during a call is detected. Furthermore, in the face recognition processing of S23, the first recognized face image may be stored as the registered face data 431 in the data storage area 430, and the registered face data 431 may be referred to. Performing mask detection only on a face image that matches the registered face image prevents an erroneous mask detection result from being obtained when the face of a person other than the caller is captured.
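The idea of accepting a mask detection result only when the captured face matches the registered face can be sketched as below. The feature vectors, the cosine-similarity comparison, and the 0.8 threshold are all assumptions chosen for illustration; the source does not say how the face images are compared.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def update_mask_state(previous, face_features, registered_features,
                      mask_detected, threshold=0.8):
    """Return the new mask state, ignoring faces that do not match the registered caller.

    previous            -- mask state held so far
    face_features       -- hypothetical feature vector of the face in the current frame
    registered_features -- hypothetical feature vector of the registered face data
    mask_detected       -- mask detection result for the current frame
    """
    if cosine_similarity(face_features, registered_features) < threshold:
        return previous  # someone else is in view: keep the earlier result
    return mask_detected
```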
- Although the booster 95 described above is realized by a hardware circuit, the processor 30 may instead perform the boost as software processing according to a program stored in the storage 40.
- According to the call device and call voice processing method of the first embodiment, whether or not the speaker is wearing a mask is detected and the high-frequency boost is set automatically in a call by telecommunication. Difficulty of hearing during calls can thus be reduced without the callers having to be aware of whether a mask is being worn.
- FIG. 8 is a configuration diagram of a call system according to the second embodiment.
- the call service server 8 is connected to the wide area network 7 in the second embodiment.
- the call service server 8 has a mask database 8a.
- a call is made between the caller 2 and the caller 2A, and the smartphone 1 (or the smartphone 1A as well) uses the mask database 8a of the call service server 8.
- the mask database 8a is created by the call service server 8 collecting information on masks in the wide area network 7.
- the mask information includes a mask image, material data, etc.
- Using the mask image extracted from the face image of the caller captured by the camera of the smartphone 1 as a search key, mask data with a similar mask shape and texture is selected, and the amount of sound attenuation when passing through the mask is estimated.
- The mask database 8a may be stored in the call service server 8 and referred to there. Alternatively, it may be held in the smartphone 1 (see mask database 432 in FIG. 2).
- FIG. 9 is a flowchart showing a call voice processing method according to the second embodiment.
- When the mask detection unit 4220 determines that the face image includes a mask (S24: YES), it refers to the mask database 432 or 8a and searches for information such as the mask material (S25).
- Boost amounts at multiple levels are provided, corresponding to multiple mask materials.
- The search result, together with the no-mask case (S24: NO, no boost), becomes the boost control signal.
- When the search yields no matching mask information, the default boost amount is used.
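The mapping from the search result to a boost control signal can be sketched as a simple lookup. The material names and decibel values below are hypothetical placeholders, not data from the mask database, and the fallback to a default on a failed search is an assumption about when the default boost amount applies.

```python
# Hypothetical boost amounts (dB) per mask material; real values would come
# from the mask database and are not given in the source.
BOOST_DB_BY_MATERIAL = {
    "nonwoven": 4.0,
    "cloth": 6.0,
    "urethane": 3.0,
}
DEFAULT_BOOST_DB = 5.0  # assumed default used when the material is unknown


def boost_control_signal(mask_detected, material=None):
    """Return the boost amount in dB for the current caller state.

    mask_detected -- result of step S24
    material      -- result of the database search in S25, or None if nothing matched
    """
    if not mask_detected:
        return 0.0  # S24: NO, no boost
    return BOOST_DB_BY_MATERIAL.get(material, DEFAULT_BOOST_DB)
```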
- FIG. 10 is a flow chart showing a call voice processing method according to the third embodiment. Each step of the processing flowchart is executed by the processor 30 of the smartphone 1 .
- The same numbers (S10 to S17) are assigned to steps that perform the same processing as in the flowchart of the first embodiment. S30 is the speech analysis process consisting of S31 to S37.
- the speech analysis process S30 may be a multi-process that is executed in parallel with each step of the call (S15, S16).
- the speech analysis process S30 is started at the timing immediately after the caller starts the call (S15) (connected to the other party of the call by means of telecommunication).
- FIG. 11 is an explanatory diagram showing classifying processing of speech analysis processing.
- the high-low level ratio data are classified into several classes.
- the data in FIG. 11 in which the level ratio and the class are associated is called level ratio class data.
- the audio attenuation amount is further associated with the level ratio class data. Therefore, when the level ratio of the call voice is determined and the class to which it belongs is determined, the voice attenuation associated with that class is determined. Therefore, the step of determining the level ratio class corresponds to the speech attenuation amount estimation step.
- In FIG. 11, the vertical axis is the high/low level ratio and the horizontal axis is the frequency of occurrence. The figure plots how often each calculated high/low level ratio occurred. From the plotted data group, the boundaries and representative values of classes 0 to 3 are determined. Class 0 is unmasked data with the smallest high/low level ratio, and class 3 is the class with the largest high/low level ratio. The ratios of the representative value of class 0 and the representative values of classes 1 to 3 are boost amounts 1 to 3, respectively. Classification may be based on an unsupervised learning process that clusters the data using machine learning. In that case, the data is classified into a plurality of classes corresponding to the material of the mask.
- The high/low level ratio varies due to individual differences and how tightly the mask is worn. Therefore, if the high/low level ratio itself were mapped directly to the amount of boost, the amount of boost could become unstable and the difficulty of hearing speech might not be improved. Classification reduces the effects of individual differences and of how tightly the mask is worn.
- the mask detection unit 4220 calculates the amount of boost according to the class and obtains the boost control signal.
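The classification and the class-to-boost mapping can be sketched as follows. The representative values are made-up numbers, assigning a ratio to the nearest representative value is an assumed stand-in for the class boundaries, and the boost amount is taken as the class representative divided by the class 0 representative, which is one reading of the ratio described above.

```python
# Hypothetical representative high/low level ratios for classes 0 to 3
# (class 0 = no mask). Real values would be derived from collected call voice.
REPRESENTATIVES = [1.0, 1.5, 2.0, 3.0]


def classify(level_ratio, representatives=REPRESENTATIVES):
    """Return the index of the class whose representative value is nearest."""
    return min(
        range(len(representatives)),
        key=lambda k: abs(level_ratio - representatives[k]),
    )


def boost_amount(class_index, representatives=REPRESENTATIVES):
    """Boost factor for a class: its representative value relative to class 0."""
    return representatives[class_index] / representatives[0]


def boost_for_ratio(level_ratio, representatives=REPRESENTATIVES):
    """Classify a measured level ratio and return (class, boost factor)."""
    k = classify(level_ratio, representatives)
    return k, boost_amount(k, representatives)
```

Because the boost depends only on the class, two measurements of 1.45 and 1.6 both map to class 1 and receive the same boost, which is the stabilizing effect the classification is meant to provide.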
- The boost control signal is sent to the boost control step of S16. Besides reflecting a class change in the boost control signal immediately, the change may be reflected over time to reduce frequent changes in the amount of boost.
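One way to avoid frequent changes in the amount of boost is to accept a new class only after it has been observed several times in a row. The sketch below assumes that debounce approach; the class name `ClassSmoother` and the hold count of 3 are illustrative choices, not taken from the source.

```python
class ClassSmoother:
    """Switch to a new class only after `hold` consecutive identical observations."""

    def __init__(self, hold=3, initial=0):
        self.hold = hold
        self.current = initial   # class currently used for the boost control signal
        self._candidate = None   # class that may replace the current one
        self._count = 0          # consecutive observations of the candidate

    def update(self, observed):
        """Feed one observed class and return the class to use for boosting."""
        if observed == self.current:
            # The observation agrees with the current class: drop any candidate.
            self._candidate, self._count = None, 0
            return self.current
        if observed == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = observed, 1
        if self._count >= self.hold:
            self.current = observed
            self._candidate, self._count = None, 0
        return self.current
```

A single stray observation therefore leaves the boost untouched, while a sustained change such as putting on a mask is picked up after a short delay.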
- The mask detection unit 4220 updates the class data 434 in the data storage area 430. Voice is acquired not only during one call but over multiple calls. Across a plurality of calls, speech in a plurality of forms, such as without a mask and with a mask, is acquired and used as class data. At this time, the registered voice data 433 may be referenced to determine whose voice it is, and the class data 434 may be handled separately for each voice.
- When the mask detection unit 4220 uses voice analysis processing to detect the presence or absence of a mask, it is not necessary to capture the caller's face with the in-camera 72. Therefore, automatic mask detection can be performed even when, for example, the caller is driving a car with the communication device placed in a storage box, wearing a headset with a microphone and earphones that is paired with the communication device by proximity communication.
- FIG. 12 is a flow chart showing the flow of processing that uses both image analysis processing and audio analysis processing for mask detection processing.
- The image analysis process (S20) is executed when the call program starts, and the voice analysis process (S30) is executed during the call after the start of the call (S15).
- As a result, a mask detection process is achieved that combines the immediacy of the image analysis process (S20) with the responsiveness to changes of the voice analysis process (S30).
- By using voice analysis processing, mask detection is possible even in situations where the in-camera 72 is not effective. It is also possible to respond to changes such as putting on or removing a mask during a call. Furthermore, by using it together with image analysis processing, both immediacy and responsiveness to changes can be achieved.
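The combined use of the two analyses can be sketched as a small generator: the image analysis result gives the initial state at the start of the call, and each later voice analysis result overrides it. Treating `None` as "no voice-based determination yet" is an assumption added for illustration.

```python
def mask_states(image_result, voice_results):
    """Yield the mask state over the course of a call.

    image_result  -- mask detection result from image analysis (S20) at call start
    voice_results -- iterable of results from voice analysis (S30) during the call;
                     None means no determination was made for that interval
    """
    state = image_result
    yield state  # immediately available when the call starts
    for result in voice_results:
        if result is not None:
            state = result  # follow changes such as removing the mask mid-call
        yield state
```

For example, a caller who starts the call masked and takes the mask off partway through would produce a sequence that begins `True` and switches to `False` once the voice analysis reports the change.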
- A fourth embodiment will be described with reference to FIGS. 13 to 15.
- In the fourth embodiment, a head-mounted display (also referred to as an HMD) is used as the communication device.
- The microphone 107 provided in the HMD 100 is used as the caller state collection device, and the call voice collected by the microphone 107 is used as the caller state information.
- Fig. 13 is an external view of the HMD 100, which is one form of a communication device.
- HMD 100 comprises a frame housing including left temple 130A, right temple 130B, front frame 130C, and nose pads 130D.
- the front frame 130C includes a camera 101, a distance measuring sensor 102, a left projector 104A, a right projector 104B, and a screen 104C.
- Left temple 130A is provided with left speaker 106A and microphone 107 .
- a controller 103 is provided on the right temple 130B.
- a caller wears the HMD 100 on his or her face using the frame housing. It should be noted that the placement location of each component may be different from that shown in FIG. 13 .
- The left projector 104A, the right projector 104B, and the screen 104C form an image display unit, but a transflective screen or a non-transmissive display may be used instead.
- With the transflective screen, the user sees the background in front through the screen; with the non-transmissive display, a camera image capturing the background in front is displayed on the display so that the user can check it.
- The camera 101 is attached so as to capture an image of the background in front of the user's line of sight, and the distance measuring sensor 102 measures the distance to that background.
- FIG. 14 is a hardware configuration diagram of the HMD 100.
- The HMD 100 includes a camera 101, a distance measuring sensor 102, a left projector 104A, a right projector 104B, a transflective screen 104C, operation keys 105, a left speaker 106A, a right speaker 106B, a microphone 107, a storage 140 built into the controller 103, a GPS receiver 151, a geomagnetic sensor 152, an acceleration sensor 153, a gyro sensor 154, a LAN communication device 161, a mobile communication device 162, and a short-range wireless communication device 163.
- The storage 140 includes a ROM 141, a nonvolatile memory 142, and a RAM 143.
- The controller 103 takes in the camera image captured by the camera 101 and the distance image measured by the distance measuring sensor 102 and supplies them to the storage 140 and the processor 113 inside the controller 103. The controller 103 also incorporates the GPS receiver 151, a sensor group such as the gyro sensor 154, and a communication unit. In addition, the controller 103 generates the images to be projected by the left projector 104A and the right projector 104B and the sounds to be output from the left speaker 106A and the right speaker 106B.
- FIG. 15 is a block diagram showing programs and data stored in the nonvolatile memory 142 of the HMD 100.
- The nonvolatile memory 142 includes a basic operation program 521, an HMD program 522, a call program 523, and a data storage area 524.
- The processor 113 loads the basic operation program 521, the HMD program 522, and the call program 523 into the RAM 143 and executes them.
- The data storage area 524 stores data necessary for executing the basic operation program 521, the HMD program 522, and the call program 523.
- The call program 523 includes a mask detection unit 4220, a boost control unit 4221, and a call processing unit 4222, as in the first embodiment.
- The processor 113 executes the function of the mask detection unit 4220 when the call processing unit 4222 detects a signal indicating that the call application icon 22 has been pressed or an incoming call signal.
- The mask detection unit 4220 analyzes the call voice from the microphone 107 and outputs the mask detection result to the boost control unit 4221.
- The boost control unit 4221 outputs a boost control signal to the booster 95 according to the mask presence/absence detection result.
- The booster 95 performs high-frequency boost processing on the call voice acquired from the microphone 107, and the call processing unit 4222 outputs the result to the communicator for transmission.
- FIG. 16 is a diagram for explaining a call scene using the HMD 100.
- The caller 2 wears the HMD 100 and images the background in front with the camera 101 of the HMD 100. The distance measuring sensor 102 also measures the distance to the background in front. These data are sent to the HMD management server 9 connected to the wide area network 7.
- The manager 191 shares these data through the information communication device 192. If the background in front is, for example, a facility maintenance site, the manager 191 gives voice and image instructions to the caller 2.
- The image is a three-dimensional AR (Augmented Reality) object, displayed three-dimensionally and superimposed so that it touches the facilities in the background in front of the HMD 100.
- The caller 2 responds to the instructions of the manager 191.
- One form of response is voice. Questions about the instructions are also asked by voice.
- The voice of the caller 2 is carried over a wireless LAN by voice communication using the IP protocol.
- The voice communication always remains connected, without a calling procedure such as dialing.
- When the call program is executed, the mask detection processing and the boost control according to its result are executed.
- Voice analysis processing can be applied as the mask detection processing, but image analysis processing may also be applied by adding to the HMD 100 an in-camera that captures the caller's face.
- Alternatively, the HMD 100 may be paired with a device such as a smartphone, and the camera of that device may be used.
- An HMD can thus be applied as a communication device.
- The present invention is not limited to the embodiments described with reference to FIGS. 1 to 16, and part of the configuration of one embodiment can be replaced with that of another embodiment. It is also possible to add the configuration of another embodiment to the configuration of one embodiment. All of these belong to the scope of the present invention. Numerical values, messages, and the like appearing in the text and drawings are only examples, and using different ones does not impair the effect of the present invention.
- Communication information terminals may be used not only by the owner of the terminal but also by acquaintances.
- When the caller is identified by face recognition and voice recognition during the call, the class data to be applied is prepared for each of multiple people, and the voice correction best suited to the caller is performed, a further improved call can be realized.
- The smartphone 1 and the HMD 100 are used as communication devices, but devices to which the present invention can be applied are not limited to the smartphone 1 and the HMD 100.
- The call voice processing method may be applied to a web conference system in which a camera and a microphone are connected to a personal computer.
- The call program 422 is installed in the personal computer in advance. The caller positioned in front of the microphone is then imaged by the camera, whether or not the caller is wearing a mask is detected based on the captured image data, and boost control similar to that described above is executed according to the detection result.
- When the caller participates in a web conference with the camera turned off, the processor of the personal computer may perform voice analysis processing on the voice data collected by the microphone to detect whether a mask is worn, and may perform boost control accordingly.
- Participants can join with the camera turned on or off as they wish, and the mask detection may be executed by switching between image analysis processing and voice analysis processing accordingly.
- The present invention can be applied to any device in which at least one of a camera and a microphone is connected with a processor having a call function, a storage that stores the call program 422, and a controller that includes a booster.
- The in-camera 72 is used as the camera for imaging the caller, but in a web conference system the camera for imaging the caller may be a web camera detachably connected to the personal computer or a built-in camera formed integrally with the personal computer.
- The form of the camera that images the caller is selected as appropriate for the device to which the present invention is applied.
- Some or all of the functions of the invention may be implemented in hardware, for example by designing them as an integrated circuit. They may also be implemented in software, with a microprocessor unit, CPU, or the like interpreting and executing an operating program. The scope of software implementation is not limited, and hardware and software may be used together.
- HMD management server; 20: Home screen; 20a: Dial screen; 20b: Display screen; 21: Icon button group; 22: Call application icon; 23: Numeric keypad; 24: Dial start button; 25: Call acceptance button; 26: Call rejection button; 27: Information; 30: Processor; 31: System bus; 40: Storage; 41: ROM; 42: Nonvolatile memory; 43: RAM; 51: GPS receiver; 52: Geomagnetic sensor; 53: Acceleration sensor; 54: Gyro sensor; 60, 60A: Wireless LAN signal; 61: LAN communication device; 62: Mobile communication device; 63: Short-range wireless communication device; 71: Display; 72: In-camera; 73: Out-camera; 81: Microphone; 82: Speaker; 83: Call port; 91: Touch sensor; 92: Operation key; 95: Booster; 100: HMD; 101: Camera; 102: Distance measuring sensor; 103: Controller
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Telephone Function (AREA)
Abstract
The present invention provides a voice communication device that uses telecommunications. The voice communication device includes a microphone, a communicator that transmits and receives a call voice, a booster that boosts a high frequency level in a call voice range, a caller state collection device that collects caller state information to detect whether a caller is wearing a mask, and a processor. The processor: analyzes the caller state information to detect whether the caller is wearing a mask; outputs, to the booster, a boost control signal for controlling the high frequency boost of the call voice collected by the microphone according to the result of detecting whether the caller is wearing a mask; and causes the communicator to transmit the call voice on which the boost control process has been performed.
Description
本発明は、通話装置及び通話音声処理方法に係り、通話する人がマスクをつけた状態であっても聞きづらさを改善する通話装置及び通話音声処理方法に関する。
The present invention relates to a communication device and a call voice processing method, and more particularly to a communication device and a call voice processing method that reduce the difficulty of hearing a call even when the person speaking is wearing a mask.
電気通信手段を用いた会話(以下、通話と記す)において、古くから通話品質の向上に関する提案が見られる。例えば、特許文献1には「通話者が騒音下にいる場合の音声マスキングを補正する騒音モード、通常通話時において相手音声が聞きづらい場合に使用する音質モードの2つの音質変更モードを用意する。メモリに記憶されたデータを読み出し、このデータで演算処理装置によりデジタルイコライザ、ノイズサプレッサ、ボリューム、エコーキャンセラ、側音追加回路を制御することにより、上述騒音モード又は音質モードに設定可能とする。(要約抜粋)」との記載がある。
In conversations using telecommunication means (hereinafter referred to as calls), proposals for improving call quality have long been made. For example, Patent Document 1 states: "Two sound quality change modes are provided: a noise mode that corrects for voice masking when the caller is in a noisy environment, and a sound quality mode used when the other party's voice is hard to hear during a normal call. Data stored in a memory is read out, and an arithmetic processing unit uses this data to control a digital equalizer, noise suppressor, volume, echo canceller, and sidetone addition circuit, so that the noise mode or the sound quality mode can be set. (Excerpt from abstract)"
また近年、花粉症や黄砂の影響の軽減、更にはCOVID-19のようなウィルス感染予防等のために、日常でマスクを着用する機会が増えている。日常生活でのマスク着用の課題として、発声がマスクによりさえぎられ、会話や通話が聞き取り難くなるということが指摘されている。
Also, in recent years, there are more and more opportunities to wear masks on a daily basis in order to reduce the effects of hay fever and yellow sand, as well as prevent viral infections such as COVID-19. It has been pointed out that wearing a mask in daily life makes it difficult to hear conversations and calls because the mask blocks vocalizations.
非特許文献1には、マスクに起因する会話の聞き取り難さを補正する補聴器が紹介されている。該補聴器では、マスクモードにおいて、マスクで減衰する2kHz~5kHzという音声の高域をブーストして、聞き難さを補正する。
Non-Patent Document 1 introduces a hearing aid that corrects difficulty in hearing conversations caused by masks. In the mask mode, the hearing aid boosts the high frequency range of 2 kHz to 5 kHz that is attenuated by the mask to compensate for difficulty in hearing.
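As a rough illustration of what boosting the 2 kHz to 5 kHz range can look like in signal-processing terms, the following minimal sketch builds a single peaking-EQ biquad centred in that band and checks its gain. The filter type, the 16 kHz sample rate, the 6 dB gain, and Q = 1 are assumptions chosen for illustration; the hearing aid of Non-Patent Document 1 and the booster of the present invention may be realized quite differently.

```python
import cmath
import math


def peaking_biquad(fs_hz, f0_hz, gain_db, q):
    """Return normalized (b, a) coefficients of a peaking-EQ biquad.

    Uses the widely published "Audio EQ Cookbook" peaking formulas.
    """
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp]
    # Normalize so that a[0] == 1.
    return [x / a[0] for x in b], [x / a[0] for x in a]


def gain_db_at(b, a, fs_hz, f_hz):
    """Magnitude response of the biquad at f_hz, in dB."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))


FS_HZ = 16_000                    # assumed sample rate
F0_HZ = math.sqrt(2_000 * 5_000)  # geometric centre of the 2 kHz to 5 kHz band
B, A = peaking_biquad(FS_HZ, F0_HZ, gain_db=6.0, q=1.0)

for f in (200.0, 2_000.0, F0_HZ, 5_000.0):
    print(f"{f:7.0f} Hz: {gain_db_at(B, A, FS_HZ, f):+.2f} dB")
```

A peaking filter leaves low frequencies essentially untouched while lifting the band around its centre, which is why it is used here as a stand-in for a high-frequency boost.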
非特許文献1では、ユーザが聞き取り難いという判断を行った時、あるいは話し相手がマスクを着用していることを確認した時に、マスクモードを活性化させればよい。しかしながら通話では、通話する人は一般に遠距離に居て、聞き難さの原因がマスクにあるのか、あるいは劣悪な通信環境にあるのかといった原因を区別しづらい。通話の聞き手が通話の話し手に対して、マスク着用の有無をその都度確認するのも、スムーズな通話の観点から好ましくない。したがって、通話の聞く者が、マスクによる聞きづらさを緩和する行動を的確に行うことは事実上困難であるという実情がある。
In Non-Patent Document 1, the user need only activate the mask mode when judging that speech is hard to hear or on confirming that the other party is wearing a mask. In a call, however, the parties are generally far apart, and it is hard to tell whether the difficulty in hearing is caused by a mask or by a poor communication environment. Having the listener ask the speaker each time whether he or she is wearing a mask is also undesirable from the viewpoint of smooth conversation. In practice, therefore, it is difficult for the listener of a call to take appropriate action to alleviate the difficulty of hearing caused by a mask.
また特許文献1では、そもそもマスクによる通話品質の劣化を補正することについては言及されていない。したがって、特許文献1及び非特許文献1では、通話におけるマスクによる通話音声の品質モードの設定に課題がある。
Patent Document 1, moreover, makes no mention of correcting the deterioration of call quality caused by masks in the first place. Therefore, with Patent Document 1 and Non-Patent Document 1, a problem remains in setting a call voice quality mode that accounts for masks in calls.
本発明は、上述課題を解決するためになされたものであり、マスクを着用したまま通話することによる通話品質の劣化の補正を、通話を聞く者に意識させることなく行うことを目的とする。
The present invention was made to solve the above-mentioned problems, and aims to correct the deterioration of call quality caused by talking while wearing a mask without making the person listening to the call aware of it.
上述課題を解決するために、本発明は特許請求の範囲に記載の構成を備える。その一例をあげるなれば、電気通信を用いた通話装置であって、前記通話装置は、マイクと、通話音声の送受信を行う通信器と、通話音域帯における高域レベルをブーストするブースターと、通話者のマスク着用の有無を検出するため通話者状態情報を収集する通話者状態収集デバイスと、前記マイク、前記通信器、前記ブースター、及び前記通話者状態収集デバイスと、のそれぞれに接続されたプロセッサと、を備え、前記プロセッサは、前記通話者状態情報を解析して前記通話者のマスク着用の有無を検出するマスク検出ステップと、前記通話者のマスク着用の有無の検出結果に応じて、前記マイクが集音した通話音声の高域ブーストを制御するブースト制御信号を前記ブースターに出力するブースト制御ステップと、前記ブースターによるブースト制御処理後の通話音声を、前記通信器から送信させる送信ステップと、を実行することを特徴とする。
In order to solve the above problems, the present invention has the configuration described in the claims. To give one example, a communication device using telecommunication comprises: a microphone; a communicator that transmits and receives call voice; a booster that boosts the high frequency level in the call voice range; a caller state collection device that collects caller state information for detecting whether a caller is wearing a mask; and a processor connected to each of the microphone, the communicator, the booster, and the caller state collection device. The processor executes: a mask detection step of analyzing the caller state information to detect whether the caller is wearing a mask; a boost control step of outputting to the booster, according to the result of detecting whether the caller is wearing a mask, a boost control signal for controlling the high frequency boost of the call voice collected by the microphone; and a transmission step of causing the communicator to transmit the call voice after the boost control processing by the booster.
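The three steps executed by the processor can be pictured as a short pipeline. The following is a minimal sketch under stated assumptions, not the claimed implementation: the names `process_call_audio`, `BoostControlSignal`, and the stub functions are hypothetical, the caller state information is modelled as a plain dictionary, and the stub booster merely scales samples so that the data flow is visible.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass(frozen=True)
class BoostControlSignal:
    """Hypothetical control signal sent from the processor to the booster."""
    enable_high_boost: bool


def process_call_audio(
    caller_state_info: Any,
    mic_samples: Sequence[float],
    detect_mask: Callable[[Any], bool],
    booster: Callable[[Sequence[float], BoostControlSignal], List[float]],
    send: Callable[[List[float]], None],
) -> BoostControlSignal:
    # Mask detection step: analyze the caller state information.
    mask_worn = detect_mask(caller_state_info)
    # Boost control step: tell the booster how to treat the microphone audio.
    signal = BoostControlSignal(enable_high_boost=mask_worn)
    processed = booster(mic_samples, signal)
    # Transmission step: hand the processed call voice to the communicator.
    send(processed)
    return signal


# Stubs standing in for the real detector, booster, and communicator.
def stub_detect(info: Any) -> bool:
    return bool(info["mask"])


def stub_booster(samples: Sequence[float], signal: BoostControlSignal) -> List[float]:
    gain = 2.0 if signal.enable_high_boost else 1.0  # placeholder, not a real filter
    return [gain * s for s in samples]


sent: List[List[float]] = []
result = process_call_audio({"mask": True}, [0.5, -0.25], stub_detect, stub_booster, sent.append)
print(result, sent)
```

Passing the detector, booster, and sender as callables keeps the sketch independent of whether the caller state information comes from a camera or a microphone.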
本発明によれば、マスクを着用したまま通話することによる通話品質の劣化の補正を、通話を聞く者に意識させることなく行うことができる。なお、上述した以外の発明の目的、構成、効果については以下の実施形態で明らかにされる。
According to the present invention, it is possible to correct deterioration of call quality caused by talking while wearing a mask without making the person listening to the call aware of it. Objects, configurations, and effects of the invention other than those described above will be clarified in the following embodiments.
以下、図面を参照しながら本発明の実施形態について説明する。以下の説明において全図を通じて同一の構成、ステップには同一の符号を付し、重複説明を省略する。
Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the following description, the same reference numerals are given to the same configurations and steps throughout the drawings, and repeated descriptions will be omitted.
[第一実施形態]
[First embodiment]
図1から図7を参照して、第一実施形態について説明する。第一実施形態は、通話装置として、スマートフォン1を例に挙げて説明する。また通話者のマスク着用の有無を検出するため通話者状態情報を収集する通話者状態収集デバイスとしてカメラを用い、通話者状態情報はカメラが撮像して得た撮像データを用いる実施形態である。全ての実施形態において、電気通信を用いた通話装置を用いるが、「電気通信を用いた」とは通話音声を電気信号に変換して送受信する通信形態を意味しており、送受信の形態は有線通信、無線通信、パケット通信、IPネットワークを経由したモバイル通信等、種別を問わない。なお、IPネットワークを経由したモバイル通信では、映像と通話音声からなる通話データとを同送するビデオ電話が行えるが、ビデオ電話も本実施形態でいう通話に含む。
A first embodiment will be described with reference to FIGS. 1 to 7. The first embodiment is described taking a smartphone 1 as an example of the communication device. In this embodiment, a camera is used as the caller state collection device that collects caller state information for detecting whether a caller is wearing a mask, and the caller state information is image data captured by the camera. All of the embodiments use a communication device using telecommunication; here, "using telecommunication" means a form of communication in which call voice is converted into an electric signal and transmitted and received, and the form of transmission and reception may be of any type, such as wired communication, wireless communication, packet communication, or mobile communication via an IP network. Mobile communication via an IP network also allows video calls, in which video is sent together with call data consisting of call voice, and video calls are included in the calls referred to in this embodiment.
図1は、スマートフォンのハードウェア構成図である。
Figure 1 is a hardware configuration diagram of a smartphone.
スマートフォン1は、プロセッサ30、ストレージ40、GPS受信器51、地磁気センサ52、加速度センサ53、ジャイロセンサ54、LAN通信器61、モバイル通信器62、近距離無線通信器63、ディスプレイ71、インカメラ72、アウトカメラ73、マイク81、スピーカ82、タッチセンサ91、操作キー92、及びブースター95を含み、各構成要素がそれぞれシステムバス31を介して互いに通信接続されて構成される。
The smartphone 1 includes a processor 30, a storage 40, a GPS receiver 51, a geomagnetic sensor 52, an acceleration sensor 53, a gyro sensor 54, a LAN communication device 61, a mobile communication device 62, a short-range wireless communication device 63, a display 71, an in-camera 72, an out-camera 73, a microphone 81, a speaker 82, a touch sensor 91, operation keys 92, and a booster 95, and these components are communicatively connected to one another via a system bus 31.
プロセッサ30は、所定の動作プログラムにしたがってスマートフォン1全体を制御するマイクロプロセッサユニットであり、例えばCPUやMPU(Micro Processor Unit)がある。
The processor 30 is a microprocessor unit that controls the entire smartphone 1 according to a predetermined operating program; examples include a CPU and an MPU (Micro Processor Unit).
システムバス31は、プロセッサ30とスマートフォン1内の各構成ブロックとの間で各種コマンドやデータなどの送受信を行うためのデータ通信路である。
The system bus 31 is a data communication path for transmitting and receiving various commands and data between the processor 30 and each component block within the smartphone 1 .
ストレージ40は、スマートフォン1の動作を制御するためのプログラムなどを記憶するROM41、動作設定値や各センサからの検出値やコンテンツを含むオブジェクトやライブラリからダウンロードしたライブラリ情報などの各種データを記憶する不揮発性メモリ42、各種プログラム動作で使用するワークエリアなどの書き替え可能なRAM43を含む。不揮発性メモリ42として、フラッシュROMを用いてもよいし、その他のメモリ媒体であってもよい。
The storage 40 includes a ROM 41 that stores programs and the like for controlling the operation of the smartphone 1; a nonvolatile memory 42 that stores various data such as operation setting values, detection values from the sensors, objects including content, and library information downloaded from libraries; and a rewritable RAM 43 used as a work area and the like for various program operations. The nonvolatile memory 42 may be a flash ROM or another memory medium.
ストレージ40は、スマートフォン1に外部から電源が供給されていない状態であっても記憶している情報を保持する必要がある。そこで、不揮発性メモリ42に代わり、SSD(Solid State Drive)などの半導体素子メモリ、HDD(Hard Disc Drive)などの磁気ディスクドライブ、等のデバイスが用いられてもよい。
The storage 40 needs to retain stored information even when power is not supplied to the smartphone 1 from the outside. Therefore, in place of the nonvolatile memory 42, a device such as a semiconductor memory, for example an SSD (Solid State Drive), or a magnetic disk drive, for example an HDD (Hard Disc Drive), may be used.
ストレージ40は、ネットワーク上からダウンロードした動作プログラムや動作プログラムで作成した各種データ等を記憶可能である。また、インカメラ72やアウトカメラ73による撮像機能を使用して撮像した動画や静止画等の撮像データを記憶可能である。
The storage 40 can store operation programs downloaded from the network and various data created by the operation programs. In addition, it is possible to store captured data such as moving images and still images captured using the imaging function of the in-camera 72 and the out-camera 73 .
スマートフォン1は、GPS(Global Positioning System)受信器51、地磁気センサ52、加速度センサ53、ジャイロセンサ54を備える。これらのセンサにより、スマートフォン1の位置、傾き、方角、動き、等を検出することが可能となる。また、スマートフォン1が、照度センサ、高度センサ、近接センサ等、他のセンサを更に備えていても良い。
The smartphone 1 includes a GPS (Global Positioning System) receiver 51 , a geomagnetic sensor 52 , an acceleration sensor 53 and a gyro sensor 54 . These sensors make it possible to detect the position, tilt, direction, movement, etc. of the smartphone 1 . Also, the smartphone 1 may further include other sensors such as an illuminance sensor, an altitude sensor, a proximity sensor, and the like.
LAN通信器61は、アクセスポイント等を介して広域ネットワークと接続され、広域ネットワーク上の外部サーバとデータの送受信を行う。アクセスポイント等との接続は、Wi-Fi(登録商標)等の無線通信接続で行われても良い。
The LAN communication device 61 is connected to a wide area network via an access point or the like, and transmits and receives data to and from an external server on the wide area network. The connection with an access point or the like may be made through a wireless communication connection such as Wi-Fi (registered trademark).
モバイル通信器62は、移動体電話通信網の携帯電話基地局等との無線通信により、電話通信(通話)及びデータの送受信を行う。携帯電話基地局等との通信は4G(Generation)、5Gのモバイル通信、W-CDMA(Wideband Code Division Multiple Access)(登録商標)方式やGSM(Global System for Mobile communications)方式、LTE(Long Term Evolution)方式、或いはその他の通信方式によって行われても良い。LAN通信器61、モバイル通信器62は、モバイル通信で音声通話を行い、無線LANで広域ネットワークに接続する。
The mobile communication device 62 performs telephone communication (calls) and data transmission and reception through wireless communication with a mobile phone base station or the like of a mobile telephone network. Communication with the mobile phone base station or the like may use 4G (4th Generation) or 5G mobile communication, the W-CDMA (Wideband Code Division Multiple Access) (registered trademark) system, the GSM (Global System for Mobile communications) system, the LTE (Long Term Evolution) system, or another communication system. With the LAN communication device 61 and the mobile communication device 62, voice calls are made by mobile communication and the wide area network is reached by wireless LAN.
近距離無線通信器63は、Bluetooth(登録商標)通信や、NFC規格通信などにより、外部のBluetooth(登録商標)機器や、外部のNFC対応機器に対して、情報のやり取りを行う。
The short-range wireless communication device 63 exchanges information with external Bluetooth (registered trademark) devices and external NFC-compatible devices through Bluetooth (registered trademark) communication, NFC standard communication, and the like.
LAN通信器61、モバイル通信器62、近距離無線通信器63は、それぞれ符号化回路や復号回路やアンテナ等を備える。また、上述通信器の他、更に赤外線通信器など、他の通信器を備えていても良い。
The LAN communication device 61, the mobile communication device 62, and the short-range wireless communication device 63 each have an encoding circuit, a decoding circuit, an antenna, and the like. In addition to the communication device described above, another communication device such as an infrared communication device may be provided.
ディスプレイ71は、例えばバックライト型液晶ディスプレイや、自己発光型である有機ELディスプレイなどの表示デバイスであり、広域ネットワークを介して入手した画像データや映像データ、あるいはインカメラ72やアウトカメラ73で撮像した撮像データを表示する。
The display 71 is a display device such as a backlit liquid crystal display or a self-luminous organic EL display, and displays image data and video data obtained via the wide area network, or image data captured by the in-camera 72 or the out-camera 73.
インカメラ72は、スマートフォン1におけるディスプレイ71が備えられた面と同一の表面に設けられる。インカメラ72は、スマートフォン1を利用する通話者(話し手にもなり、聞き手にもなる)を撮像し、通話者の顔画像データを生成する。スマートフォン1は、この顔画像データを基に顔認識処理を実行し、スマートフォン1の利用を許可する、いわゆるFace IDにも利用される。
The in-camera 72 is provided on the same surface of the smartphone 1 as the display 71. The in-camera 72 captures an image of the caller using the smartphone 1 (who is both a speaker and a listener) and generates face image data of the caller. The smartphone 1 executes face recognition processing based on this face image data; the in-camera 72 is thus also used for so-called Face ID, which authorizes use of the smartphone 1.
アウトカメラ73は、スマートフォン1における裏面に設けられる。そして風景等を撮像する際に用いられる。
The out-camera 73 is provided on the back surface of the smartphone 1 and is used when imaging scenery and the like.
インカメラ72、アウトカメラ73のそれぞれは、CCD(Charge Coupled Device)やCMOS(Complementary Metal Oxide Semiconductor)センサ等の電子デバイスを用いてレンズから入力した光を電気信号に変換することにより、周囲や対象物の画像情報を入力するカメラである。
Each of the in-camera 72 and the out-camera 73 uses an electronic device such as a CCD (Charge Coupled Device) or a CMOS (Complementary Metal Oxide Semiconductor) sensor to convert the light input from the lens into an electrical signal, thereby capturing images of surroundings and objects. It is a camera that inputs image information of an object.
マイク81は、実空間の音やユーザの音声などを音声情報に変換して入力する。
The microphone 81 converts sound in the real space, user's voice, etc. into voice information and inputs it.
スピーカ82は、ユーザに必要な音声情報等を出力する。勿論、イヤホンや、ヘッドホンも接続可能であり、用途により使い分けることができることは言うまでもない。
The speaker 82 outputs voice information and the like needed by the user. Earphones and headphones can of course also be connected and chosen according to the application.
タッチセンサ91は、ディスプレイ71の表示画面上に積層して配置される。
The touch sensor 91 is stacked on the display screen of the display 71 .
操作キー92は、ボタンスイッチ等を並べて構成される。
The operation key 92 is configured by arranging button switches and the like.
タッチセンサ91及び操作キー92は、スマートフォン1に対する操作指示の入力を行う操作入力装置の一例であり、その他の操作入力装置でも良い。また、LAN通信器61や近距離無線通信器63を利用し、有線通信又は無線通信により接続された別体の携帯端末機器を用いてスマートフォン1の操作を行っても良い。
The touch sensor 91 and the operation key 92 are examples of operation input devices for inputting operation instructions to the smartphone 1, and other operation input devices may be used. Alternatively, the smartphone 1 may be operated from a separate portable terminal device connected by wired or wireless communication using the LAN communication device 61 or the short-range wireless communication device 63.
また、インカメラ72の撮像映像を解析し、ジェスチャなどの動作で、スマートフォン1の操作を行っても良い。
Alternatively, the image captured by the in-camera 72 may be analyzed, and the smartphone 1 may be operated by actions such as gestures.
図2は、スマートフォンに格納されるプログラムの機能ブロック図である。
Fig. 2 is a functional block diagram of the program stored in the smartphone.
不揮発性メモリ42は、処理プログラムとして、基本動作プログラム421、通話プログラム422を格納している。不揮発性メモリ42は更に各プログラムの実行に際して用いられる各種データがデータ格納領域430を含む。プロセッサ30は、不揮発性メモリ42に格納される各プログラムを読み出し、必要に応じてデータ格納領域430に格納されたデータを参照する。
The nonvolatile memory 42 stores a basic operation program 421 and a call program 422 as processing programs. The nonvolatile memory 42 further includes a data storage area 430 that holds various data used when executing each program. The processor 30 reads each program stored in the nonvolatile memory 42 and refers to data stored in the data storage area 430 as necessary.
図2を参照して概略を説明するならば、プロセッサ30は、通話処理部4222が通話アプリアイコン22(図4参照)の押し下げ信号、又は他の通話装置からの着信信号を検出すると、マスク検出部4220の機能を実行する。マスク検出部4220は、第一実施形態では画像解析処理によりマスクの有無を検出するので、インカメラ72を起動し、撮像データを取得、その撮像データを基に画像解析処理を行い、マスク検出処理を行う。
To outline the operation with reference to FIG. 2, the processor 30 executes the function of the mask detection unit 4220 when the call processing unit 4222 detects a signal indicating that the call application icon 22 (see FIG. 4) has been pressed or an incoming call signal from another communication device. In the first embodiment, the mask detection unit 4220 detects the presence or absence of a mask by image analysis processing; it therefore activates the in-camera 72, acquires image data, and performs image analysis processing on that image data to carry out the mask detection processing.
マスク検出部4220は、マスク有無検出結果をブースト制御部4221に出力する。ブースト制御部4221は、マスク有無検出結果に応じたブースト制御信号をブースター95に出力する。ブースター95はマイク81から取得した通話音声に高域ブースト処理を行い通話処理部4222から通信器に出力、送信する。
The mask detection unit 4220 outputs the mask presence/absence detection result to the boost control unit 4221. The boost control unit 4221 outputs a boost control signal to the booster 95 according to the mask presence/absence detection result. The booster 95 performs high-frequency boost processing on the call voice acquired from the microphone 81, and the call processing unit 4222 outputs the result to the communicator for transmission.
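The hand-off from the boost control unit 4221 to the booster 95 can be sketched as follows. This is an illustrative assumption rather than the disclosed circuit: the control signal is modelled as a small record with a hypothetical `emphasis` strength, and the booster is approximated by a first-order high-frequency emphasis, y[n] = x[n] + k(x[n] - x[n-1]), which leaves slowly varying signals unchanged and amplifies rapidly varying ones.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class BoostControlSignal:
    enabled: bool
    emphasis: float  # hypothetical strength k; 0.0 means no boost


def boost_control(mask_detected: bool, emphasis_when_masked: float = 0.5) -> BoostControlSignal:
    """Map the mask detection result to a control signal (assumed mapping)."""
    return BoostControlSignal(mask_detected, emphasis_when_masked if mask_detected else 0.0)


def booster(samples: Sequence[float], signal: BoostControlSignal) -> List[float]:
    """Apply first-order high-frequency emphasis when the signal enables it."""
    if not signal.enabled or not samples:
        return list(samples)
    out: List[float] = []
    prev = samples[0]  # start from the first sample to avoid a start-up transient
    for x in samples:
        out.append(x + signal.emphasis * (x - prev))
        prev = x
    return out


fast = [1.0, -1.0, 1.0, -1.0]  # alternates every sample (highest frequency)
slow = [1.0, 1.0, 1.0]         # constant (lowest frequency)
print(booster(fast, boost_control(True)))   # rapid changes are amplified
print(booster(slow, boost_control(True)))   # constant signal is unchanged
print(booster(fast, boost_control(False)))  # no mask: audio passes through
```

Keeping the decision (`boost_control`) separate from the audio processing (`booster`) mirrors the division between the boost control unit and the booster in the description.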
第三実施形態では、マイク81が集音した通話音声を音声解析してマスク有無検出を行うので、その場合においては、マスク検出部4220はインカメラ72ではなく、マイク81から通話音声を得てマスク有無検出を行う。図2に示す各信号やデータの動きの詳細は、以下の説明で随時明らかにする。
In the third embodiment, the presence or absence of a mask is detected by voice analysis of the call voice collected by the microphone 81; in that case, the mask detection unit 4220 obtains the call voice from the microphone 81, rather than image data from the in-camera 72, and detects the presence or absence of a mask. Details of the flow of each signal and data item shown in FIG. 2 will be clarified as needed in the following description.
モバイル通信ネットワーク5(図3参照)は、基本動作プログラム421、通話プログラム422をRAM43に展開して実行する。
The mobile communication network 5 (see FIG. 3) deploys the basic operation program 421 and the call program 422 in the RAM 43 and executes them.
データ格納領域430に格納される各種データは、主に通話プログラム422を実行するのに必要なデータである。例えば、通話者の顔が撮像された通話者顔画像(「登録顔データ431」という)、マスクデータベース(D/B)432、登録音声データ433、クラスデータ434があり、通話音声処理方法によって必要に応じて格納する。不揮発性メモリ42は、図示した様にひとつのメモリ媒体であっても良く、複数のメモリ媒体で構成しても良い。
The various data stored in the data storage area 430 are mainly data necessary for executing the call program 422. Examples include a caller face image in which the caller's face is captured (referred to as "registered face data 431"), a mask database (D/B) 432, registered voice data 433, and class data 434, which are stored as needed depending on the call voice processing method. The nonvolatile memory 42 may be a single memory medium as illustrated, or may be composed of a plurality of memory media.
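The contents of the data storage area 430 can be pictured as a record whose fields are filled only when the chosen call voice processing method needs them. The sketch below is illustrative only: the class name, the field types (raw bytes, a list, a dictionary), and the `stored_items` helper are assumptions, since the description does not specify how each item is represented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataStorageArea430:
    """Illustrative model of the data storage area 430; all types are assumed."""
    registered_face_data: Optional[bytes] = None              # 431
    mask_database: List[bytes] = field(default_factory=list)  # 432
    registered_voice_data: Optional[bytes] = None             # 433
    class_data: Dict[str, float] = field(default_factory=dict)  # 434

    def stored_items(self) -> List[str]:
        """List which items are currently stored."""
        names: List[str] = []
        if self.registered_face_data is not None:
            names.append("registered_face_data_431")
        if self.mask_database:
            names.append("mask_database_432")
        if self.registered_voice_data is not None:
            names.append("registered_voice_data_433")
        if self.class_data:
            names.append("class_data_434")
        return names


# Example: only the face image has been registered so far.
example_area = DataStorageArea430(registered_face_data=b"face-image-bytes")
print(example_area.stored_items())
```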
図3は、第一実施形態に係る通話システムの構成図である。通話者2と通話者2Aは、話し手にもなるし、聞き手にもなる。通話は、モバイル通信を利用して行う場合を説明するが、無線LANでインターネットプロトコル(IPプロトコルとも記す)を利用してもよい。また図3では、スマートフォン1とスマートフォン1Aのいずれかが本発明に係る通話装置であってもよく、スマートフォン1及びスマートフォン1Aの両方が本発明に係る通話装置であってもよい。
FIG. 3 is a configuration diagram of the call system according to the first embodiment. The caller 2 and the caller 2A each act as both speaker and listener. The description assumes that calls are made using mobile communication, but the Internet Protocol (also written as IP protocol) may be used over a wireless LAN. In FIG. 3, either one of the smartphone 1 and the smartphone 1A may be the communication device according to the present invention, or both may be.
更に図3にて、符号3、3Aはマスク、符号4、4Aはモバイル通信基地局、符号5はモバイル通信ネットワーク、符号6、6Aは無線LANのアクセスポイント、符号60、60Aは無線LAN信号、7は広域ネットワークである。モバイル通信基地局4、4Aはモバイル通信ネットワーク5の主要な構成要素であるとともに、広域ネットワーク7の構成要素でもある。
Further, in FIG. 3, reference numerals 3 and 3A denote masks, 4 and 4A mobile communication base stations, 5 a mobile communication network, 6 and 6A wireless LAN access points, 60 and 60A wireless LAN signals, and 7 a wide area network. The mobile communication base stations 4 and 4A are main components of the mobile communication network 5 and are also components of the wide area network 7.
通話者2は、スマートフォン1を用いて例えばダイヤルアップし、通話を開始する。通話のための信号は、モバイル通信基地局4に送信され、モバイル通信ネットワーク5、モバイル通信基地局4Aを経て、通話者2Aが保持するスマートフォン1Aに送信される。通話者2Aがスマートフォン1Aで着信許諾すると、通話者2と通話者2A間で通話が行われる。
The caller 2 uses the smartphone 1 to, for example, dial a number and start a call. The signal for the call is transmitted to the mobile communication base station 4 and, via the mobile communication network 5 and the mobile communication base station 4A, to the smartphone 1A held by the caller 2A. When the caller 2A accepts the incoming call on the smartphone 1A, a call takes place between the caller 2 and the caller 2A.
一方、スマートフォン1とスマートフォン1Aは、広域ネットワーク7にも繋がっている。その経路には2種類あり、モバイル通信基地局4、4Aを介する経路、無線LAN信号60、60Aでアクセスポイント6、6Aを利用する経路である。
On the other hand, smartphone 1 and smartphone 1A are also connected to wide area network 7. There are two types of routes, a route via mobile communication base stations 4 and 4A and a route using access points 6 and 6A with wireless LAN signals 60 and 60A.
FIGS. 4 to 6 show display screens of the smartphone 1. For the sake of explanation, the smartphone 1 operated by caller 2 in FIG. 3 is used as an example below, but the description of the smartphone 1 also applies to the smartphone 1A.
FIG. 4 is a diagram showing the home screen 20 of the smartphone 1. The call port 83 leads to the microphone 81 and the speaker 82. Pressing the operation key 92 displays the home screen 20.
On the home screen 20, the time, location information, weather information, and a search window are arranged at the top, but these are not essential. What matters on the home screen 20 is the icon button group 21 for starting application programs. Pressing the icon button corresponding to an application program switches the display from the home screen 20 to the screen of that application program.
The icon button group 21 includes a call application icon 22 that starts the call program. Caller 2 or 2A presses this icon to start the call program.
FIG. 5 is a diagram showing the dial screen 20a displayed by the call program. A numeric keypad 23 and a dial start button 24 are displayed; the caller enters the other party's number with the numeric keypad 23 and presses the dial start button 24 to call the other party. Alternatively, the other party may be called by displaying a telephone directory and selecting the other party from it.
The call program may also be started in a way other than the caller pressing the call application icon 22. FIG. 6 shows such a case: the display screen 20b shown when a call comes in.
The display screen 20b shown on an incoming call displays information 27 identifying the other party (for example, the other party's name, telephone number, or an indication that the number is withheld), a call acceptance button 25, and a call rejection button 26. When called by the other party, the caller checks the other party's name and presses the call acceptance button 25 to start the call.
The dial screen 20a of FIG. 5 and the incoming-call display screen 20b of FIG. 6 correspond to initial screens of the call program. On these initial screens, the caller looks at the display screen. At this time, the in-camera 72 is activated to capture an image of the caller's face.
This processing is realized as follows: when the processor 30 starts the call program 422 in response to detecting the press signal of the call application icon 22, the call program 422 outputs an imaging instruction signal for the in-camera 72.
FIG. 7 is a flowchart showing the call voice processing method according to the first embodiment. As described with reference to FIG. 1, each step of the call program 422 is executed by the processor 30 of the smartphone 1.
At the start of the process, the home screen 20 is displayed on the smartphone 1 (S10). If the caller presses the call application icon 22 (see FIG. 4) (S11: YES), or if an incoming call is received (see FIG. 6) (S12: YES), the processor 30 starts the call program 422. If both steps S11 and S12 are NO, the process waits in step S10.
If S11 or S12 is YES, the call program 422 executes the call process (S13 to S17) and the image analysis process (S21 to S24) in parallel as multiple processes.
In the image analysis process (S20), the mask detection unit 4220 activates the in-camera 72 (S21) and acquires the camera image (S22). The mask detection unit 4220 performs subject detection processing on the camera image and performs face recognition processing (S23). The mask detection unit 4220 determines whether the face image includes a mask (S24: corresponds to the mask detection step).
When the mask detection unit 4220 determines that a mask is present (S24: YES), it outputs a boost control signal indicating boost execution to the boost control unit 4221. When it determines that no mask is present (S24: NO), it outputs a boost control signal indicating no boost (boost not executed) to the boost control unit 4221.
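As a rough illustration of how the image analysis process can run alongside the call process and hand over a boost control signal, the following is a minimal sketch. The names `start_image_analysis`, `process_frame`, and the `detect_mask` callback are hypothetical; `detect_mask` stands in for the in-camera capture and face and mask recognition of S21 to S24, whose implementation the text does not specify.

```python
import threading
from typing import Callable, Tuple


def start_image_analysis(
    detect_mask: Callable[[], bool],
) -> Tuple[threading.Thread, threading.Event]:
    """Run mask detection (S21 to S24) in a background thread.

    The returned event plays the role of the boost control signal:
    set means "execute boost", cleared means "no boost".
    """
    boost_on = threading.Event()

    def worker() -> None:
        if detect_mask():   # S24: YES -> boost
            boost_on.set()
        else:               # S24: NO -> no boost
            boost_on.clear()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, boost_on


def process_frame(
    frame: float,
    boost_on: threading.Event,
    boost: Callable[[float], float],
) -> float:
    """Call-process side (S16): boost the frame only while the signal is set."""
    return boost(frame) if boost_on.is_set() else frame
```

Until the detection thread has finished, frames simply pass through unboosted, which is one possible way to let the call start without waiting for the camera.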
In the call process, the call processing unit 4222 sends a call request to the dialed number (S13), and when the other party gives permission, the call starts (S15). Alternatively, the call processing unit 4222 responds to the incoming call request from the communication partner (S14), and starts the call (S15).
The booster 95 performs boost control processing according to the boost control signal (S16: corresponds to the boost control step). Using the boost control signal output in step S24, the booster 95 turns ON or OFF the boost applied to the high range of the call voice band, specifically 2 kHz to 5 kHz. The smartphone 1 transmits the call voice that has undergone boost control in step S16 to the mobile communication network 5 or the wide area network 7 (corresponds to the transmission step).
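The ON/OFF boost of the 2 kHz to 5 kHz band could be realized in software in many ways; the sketch below uses a single biquad peaking filter in the Audio EQ Cookbook form. The 6 dB gain, the choice of filter type, and the function names `peaking_coeffs` and `boost_high_band` are assumptions for illustration; the text only specifies the band and the ON/OFF control.

```python
import math
from typing import List, Sequence, Tuple


def peaking_coeffs(
    fs: float, f_lo: float = 2000.0, f_hi: float = 5000.0, gain_db: float = 6.0
) -> Tuple[List[float], List[float]]:
    """Biquad peaking-EQ coefficients centred on the band [f_lo, f_hi]."""
    f0 = math.sqrt(f_lo * f_hi)      # geometric centre of the band
    q = f0 / (f_hi - f_lo)           # Q derived from the band width
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp]
    # Normalise so that a[0] == 1.
    return [x / a[0] for x in b], [x / a[0] for x in a]


def boost_high_band(
    samples: Sequence[float], fs: float, boost_on: bool, gain_db: float = 6.0
) -> List[float]:
    """Apply the high-band boost when the boost control signal is ON."""
    if not boost_on:
        return list(samples)
    b, a = peaking_coeffs(fs, gain_db=gain_db)
    out: List[float] = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        # Direct form I difference equation.
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

With these assumed settings, a tone near 3 kHz is raised by roughly the configured gain while tones in the low hundreds of hertz are left almost unchanged.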
The call processing unit 4222 determines whether the call has ended (S17). If YES, it disconnects the call and returns to the home screen (S10); if NO, it continues the call.
Note that the image analysis process S20 is performed at the start of the call program to carry out mask detection, but it may also be continued during the call, as indicated by the dashed line returning from S23 to S22 in FIG. 8. Continuous execution makes it possible to detect a change of state during the call, such as putting on or taking off a mask. Furthermore, in the face recognition processing of S23, the first recognized face image may be stored as registered face data 431 in the data storage area 430 and referred to thereafter. Performing mask detection only on a face image that matches the registered face image prevents an erroneous mask detection result when the face of someone other than the caller is captured.
Although the booster 95 described above is realized by a hardware circuit, the processing may instead be performed in software by the processor 30 according to a program stored in the storage 40.
As described above, according to the call device and call voice processing method of the first embodiment, in a call by telecommunication, whether the speaker is wearing a mask is detected and the high-frequency boost is set automatically, so that difficulty in hearing during the call is reduced without the listener having to be concerned about whether the speaker is wearing a mask.
[Second embodiment]
A second embodiment will be described with reference to FIGS. 8 and 9. FIG. 8 is a configuration diagram of a call system according to the second embodiment.
As shown in FIG. 8, in the second embodiment a call service server 8 is connected to the wide area network 7. The call service server 8 has a mask database 8a. A call takes place between caller 2 and caller 2A, and the smartphone 1 (and possibly the smartphone 1A as well) uses the mask database 8a of the call service server 8.
The mask database 8a is created by the call service server 8, which collects information on masks available on the wide area network 7. The mask information includes mask images, material data, and the like. Using the mask portion of the caller's face image captured by the camera of the smartphone 1 as a search key, mask data with a similar mask shape and texture is selected. The amount of voice attenuation through the mask is then estimated from information such as the material of the selected mask.
The mask database 8a may be stored in the call service server 8 and referred to there. Alternatively, as shown in FIG. 2, it may be downloaded to the smartphone 1 in advance and stored in the data storage area 430 of the FROM 42 (see the mask database 432 in FIG. 2).
FIG. 9 is a flowchart showing a call voice processing method according to the second embodiment.
In the image analysis process S20 of the flowchart shown in FIG. 9, after determining that the face image includes a mask (S24: YES), the mask detection unit 4220 refers to the mask database 432 or 8a and searches for information such as the mask material (S25).
Each of the multiple mask materials is associated with a boost amount of a corresponding level. The search result, like the no-mask result (S24: NO, no boost), serves as the boost control signal. If there is no matching material in the mask database 432 or 8a, or if the mask database 8a cannot be accessed, a default boost amount is used.
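The lookup with a default fallback described above might look like the following minimal sketch. The material names, the decibel values, and the names `MASK_DB`, `DEFAULT_BOOST_DB`, and `boost_for_mask` are all hypothetical placeholders; the text does not give concrete materials or boost amounts.

```python
from typing import Mapping, Optional

# Hypothetical mask database: material -> boost amount in dB (illustrative values).
MASK_DB: Mapping[str, float] = {"nonwoven": 4.0, "cloth": 6.0, "urethane": 3.0}
DEFAULT_BOOST_DB = 5.0  # assumed default boost amount


def boost_for_mask(
    mask_detected: bool,
    material: Optional[str],
    db: Optional[Mapping[str, float]] = MASK_DB,
) -> float:
    """Return the boost amount that forms the boost control signal.

    db=None models the case where the mask database cannot be accessed.
    """
    if not mask_detected:            # S24: NO -> no boost
        return 0.0
    if db is None or material not in db:
        return DEFAULT_BOOST_DB      # no matching material, or database unreachable
    return db[material]              # S25: boost amount for the matched material
```

For example, `boost_for_mask(True, "cloth")` yields the table entry, whereas an unknown material or an unreachable database falls back to the default.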
As described above, the call device and call voice processing method of the second embodiment provide the same effects as the first embodiment and, in addition, allow a boost amount corresponding to the mask material to be set, so that the call becomes easier for the listener to hear.
[Third embodiment]
A third embodiment will be described with reference to FIGS. 10 to 12. The third embodiment uses a microphone as the caller state collection device that collects caller state information for detecting whether the caller is wearing a mask, and uses, as the caller state information, the call voice data collected by the microphone (the caller's voice as picked up by the microphone). FIG. 10 is a flowchart showing a call voice processing method according to the third embodiment. Each step of the flowchart is executed by the processor 30 of the smartphone 1.
In the flowchart of FIG. 10, steps that perform the same processing as in the flowchart of FIG. 7 are given the same numbers (S10 to S17). S30 is the voice analysis process consisting of S31 to S37. The voice analysis process S30 may run as a separate process in parallel with the call steps (S15, S16).
The voice analysis process S30 starts immediately after the caller starts the call (S15), that is, once the caller is connected to the other party by telecommunication. The mask detection unit 4220 acquires the voice during the call (S31) and extracts vowels such as "a, i, u, e, o" (S32). It then extracts consonants such as those in "ka, ta, sa, fu, su" (S33). The level ratio between vowels and consonants is obtained in the high-low level ratio (= low-band level / high-band level) calculation step (S34), and the class of the high-low level ratio is determined (S35).
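To make the high-low level ratio of S34 concrete, here is a simplified sketch that measures band levels of a single audio frame with a naive DFT. It does not perform the vowel and consonant extraction of S32 and S33; instead it assumes fixed bands (100 Hz to 1 kHz as the low band, 2 kHz to 5 kHz as the high band), which are illustrative choices and not values from the text. The function names are likewise hypothetical.

```python
import cmath
import math
from typing import Sequence, Tuple


def band_level(frame: Sequence[float], fs: float, f_lo: float, f_hi: float) -> float:
    """Level of the frame within [f_lo, f_hi], from a naive DFT over those bins."""
    n = len(frame)
    k_lo = math.ceil(f_lo * n / fs)
    k_hi = min(math.floor(f_hi * n / fs), n // 2)
    power = 0.0
    for k in range(k_lo, k_hi + 1):
        x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(frame))
        power += abs(x_k) ** 2
    return math.sqrt(power)


def high_low_level_ratio(
    frame: Sequence[float],
    fs: float,
    low_band: Tuple[float, float] = (100.0, 1000.0),    # assumed low band
    high_band: Tuple[float, float] = (2000.0, 5000.0),  # assumed high band
) -> float:
    """High-low level ratio = low-band level / high-band level (S34)."""
    low = band_level(frame, fs, *low_band)
    high = band_level(frame, fs, *high_band)
    return low / max(high, 1e-12)  # guard against a silent high band
```

A mask that attenuates the high band lowers the denominator, so the ratio grows as the attenuation increases, which is the property the class decision of S35 relies on.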
FIG. 11 is an explanatory diagram showing the classification step of the voice analysis processing. In this step, the high-low level ratio data are classified into several classes. The data in FIG. 11 that relate level ratios to classes are called level ratio class data. In FIG. 11, a voice attenuation amount is further associated with the level ratio class data. Therefore, once the level ratio of the call voice has been obtained and its class determined, the voice attenuation amount associated with that class is obtained. The step of determining the level ratio class thus corresponds to the voice attenuation amount estimation step.
In FIG. 11, the vertical axis is the high-low level ratio and the horizontal axis is the frequency of occurrence. The plot shows how often each calculated high-low level ratio value occurred. From the plotted data, the boundaries and representative values of classes 0 to 3 are determined. Class 0 is the no-mask data, with the smallest high-low level ratio, and class 3 is the class with the largest high-low level ratio. The ratios of the representative values of classes 1 to 3 to the representative value of class 0 give boost amounts 1 to 3, respectively. The classification may be based on unsupervised learning that clusters the data by machine learning. In that case, the data are classified into a plurality of classes corresponding to, for example, the mask material. The high-low level ratio varies with individual differences, how closely the mask fits, and similar factors. Therefore, if the high-low level ratio itself were mapped directly to the boost amount, the boost amount could become unstable and the call might not become easier to hear. Classification reduces the influence of individual differences, mask fit, and the like.
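The classification and the derivation of the boost amount from class representatives can be sketched as follows. Placing class boundaries at the midpoints between adjacent representative values is an assumption made for this example (the text leaves open how boundaries are determined), and the function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def class_boundaries(representatives: Sequence[float]) -> List[float]:
    """Boundaries between adjacent classes, taken here as midpoints (assumption)."""
    return [(a + b) / 2.0 for a, b in zip(representatives, representatives[1:])]


def classify(ratio: float, representatives: Sequence[float]) -> int:
    """Return the class index (0 = no mask) for a high-low level ratio."""
    return bisect_right(class_boundaries(representatives), ratio)


def boost_amount(cls: int, representatives: Sequence[float]) -> float:
    """Boost as the ratio of the class representative to the class 0 representative."""
    return representatives[cls] / representatives[0]
```

With illustrative representatives `[2.0, 3.0, 4.0, 6.0]`, a measured ratio of 3.2 falls into class 1 and yields a boost amount of 1.5, and any ratio inside the same class yields the same boost, which is the stabilizing effect described above.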
The mask detection unit 4220 calculates the boost amount according to the class and obtains the boost control signal, which it sends to the boost control step S16. Instead of reflecting a class change in the boost control signal immediately, the change may be reflected with its evolution over time taken into account, which reduces frequent changes in the boost amount.
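One simple way to take the evolution over time into account is to adopt a new class only after it has been observed several times in a row. The sketch below shows this idea; the class name `ClassSmoother` and the default hold count of 3 are assumptions, and other smoothing schemes would serve equally well.

```python
class ClassSmoother:
    """Adopt a new class only after it has been observed `hold` times in a row."""

    def __init__(self, initial: int = 0, hold: int = 3) -> None:
        self.current = initial     # class currently driving the boost amount
        self.hold = hold           # consecutive observations needed to switch
        self._candidate = initial  # class that is trying to replace `current`
        self._count = 0            # consecutive observations of the candidate

    def update(self, observed: int) -> int:
        """Feed one observed class and return the smoothed class."""
        if observed == self.current:
            self._candidate, self._count = observed, 0
            return self.current
        if observed == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = observed, 1
        if self._count >= self.hold:
            self.current, self._count = observed, 0
        return self.current
```

A short burst of a different class is ignored, while a sustained change (for example, the caller putting on a mask) is followed after `hold` observations.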
The mask detection unit 4220 updates the class data 434 in the data storage area 430 of FIG. 2. Voice is acquired not only during a single call but over multiple calls. Over multiple calls, voice is acquired in several conditions, such as without a mask and with a mask, and is used as class data. At this time, the registered voice data 433 may be referred to in order to determine whose voice it is, and class data 434 may be maintained separately for each voice.
According to this embodiment, when the mask detection unit 4220 uses voice analysis processing to detect the presence or absence of a mask, there is no need to image the caller's face with the in-camera 72. Automatic mask detection is therefore possible even when, for example, the caller is driving a car, the call device is placed in a storage box, and the caller wears a headset with a microphone and earphones that is paired with the call device by short-range wireless communication.
FIG. 12 is a flow chart showing the flow of processing that uses both image analysis processing and audio analysis processing for mask detection processing.
When the call voice processing method is started (S11: YES or S12: YES), the image analysis process (S20) is executed, and during the call after the call has started (S15), the voice analysis process (S30) is executed. As a result, the mask detection processing combines the immediacy of the image analysis process (S20) with the ability of the voice analysis process (S30) to follow changes.
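The combined use of the two processes can be pictured as an initial state from image analysis that is subsequently updated by voice analysis. The following is a minimal sketch under that reading; the function name `mask_states` and the use of `None` to mean "no result available" are assumptions for illustration.

```python
from typing import Iterable, Iterator, Optional


def mask_states(
    image_result: Optional[bool],
    voice_results: Iterable[Optional[bool]],
) -> Iterator[bool]:
    """Yield the mask state over time.

    The first value comes from the image analysis process (S20) at start-up;
    later values follow the voice analysis process (S30) during the call.
    None means that no result was available (treated as "keep previous state",
    or as "no mask" for the initial image result).
    """
    state = bool(image_result)
    yield state
    for result in voice_results:
        if result is not None:   # update only when voice analysis produced a result
            state = result
        yield state
```

For instance, `list(mask_states(True, [None, True, False]))` gives `[True, True, True, False]`: the camera result applies immediately, and the mask removal is picked up later from the voice.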
As described above, according to the third embodiment, mask detection processing is possible even in situations where the in-camera 72 is not effective by using voice analysis processing. It is also possible to respond to changes such as wearing and removing a mask during a call. Furthermore, by using it together with image analysis processing, immediacy and responsiveness to changes can be satisfied.
[Fourth embodiment]
A fourth embodiment will be described with reference to FIGS. 13 to 15. In the fourth embodiment, a head-mounted display (also referred to as HMD) is used as the call device. The fourth embodiment uses the microphone 107 provided in the HMD 100 as the caller state collection device and uses the call voice collected by the microphone 107 as the caller state information.
FIG. 13 is an external view of the HMD 100, which is one form of call device. The HMD 100 has a frame housing that includes a left temple 130A, a right temple 130B, a front frame 130C, and a nose pad 130D. The front frame 130C carries a camera 101, a distance measuring sensor 102, a left projector 104A, a right projector 104B, and a screen 104C. The left temple 130A carries a left speaker 106A and a microphone 107. The right temple 130B carries a controller 103. The caller wears the HMD 100 on his or her face by means of the frame housing. The components may be placed at locations different from those shown in FIG. 13.
The left projector 104A, right projector 104B, and screen 104C form an image display unit, but a semi-transmissive screen or a non-transmissive display may be used instead. With a semi-transmissive screen, the user views the scene ahead through the screen; with a non-transmissive display, the user views it as a camera image of the scene ahead shown on the display.
The camera 101 is attached to capture an image of the background in front of the user's line of sight, and the distance sensor 102 measures the distance to the background.
FIG. 14 is a hardware configuration diagram of the HMD 100. The HMD 100 includes a camera 101, a distance measuring sensor 102, a left projector 104A, a right projector 104B, a semi-transmissive screen 104C, operation keys 105, a left speaker 106A, a right speaker 106B, and a microphone 107, as well as a storage 140, a GPS receiver 151, a geomagnetic sensor 152, an acceleration sensor 153, a gyro sensor 154, a LAN communication device 161, a mobile communication device 162, and a short-range wireless communication device 163 that are built into the controller 103. These components are connected to one another by a bus 108.
The storage 140 includes a ROM 141, a nonvolatile memory 142, and a RAM 143.
The controller 103 captures the camera image captured by the camera 101 and the distance image measured by the distance sensor 102 and supplies them to the storage 140 and processor 113 inside the controller 103 . Further, the controller 103 incorporates a GPS receiver 151, a sensor group such as a gyro sensor, and a communication unit. Further, the controller 103 creates images to be projected by the left projector 104A and right projector 104B and sounds to be output to the left speaker 106A and right speaker 106B.
FIG. 15 is a block diagram showing the programs and data stored in the nonvolatile memory 142 of the HMD 100. The nonvolatile memory 142 contains a basic operation program 521, an HMD program 522, a call program 523, and a data storage area 524. The processor 113 loads the basic operation program 521, the HMD program 522, and the call program 523 into the RAM 143 and executes them. The data storage area 524 stores the data needed to execute the basic operation program 521, the HMD program 522, and the call program 523. As in the first embodiment, the call program 523 includes a mask detection unit 4220, a boost control unit 4221, and a call processing unit 4222.
When the call processing unit 4222 detects a press signal of the call application icon 22 or an incoming call signal, the processor 113 executes the function of the mask detection unit 4220. The mask detection unit 4220 performs voice analysis on the call voice from the microphone 107 and outputs the mask presence/absence detection result to the boost control unit 4221. The boost control unit 4221 outputs a boost control signal corresponding to the detection result to the booster 95. The booster 95 applies high-frequency boost processing to the call voice acquired from the microphone 107, and the result is output from the call processing unit 4222 to the communication device and transmitted.
FIG. 16 is a diagram illustrating a call scene using the HMD 100. Caller 2 wears the HMD 100 and captures the scene ahead with the camera 101 of the HMD 100. The distance measuring sensor 102 measures the distance to the scene ahead. These data are sent to the HMD management server 9 connected to the wide area network 7.
The manager 191 shares these data on the information communication device 192. If the scene ahead is, for example, an equipment maintenance site, the manager 191 conveys instructions to caller 2 by voice and image.
The image is a three-dimensional AR (Augmented Reality) object and is superimposed three-dimensionally so as to adjoin the equipment in the scene in front of the HMD 100.
Caller 2 responds to the instructions of the manager 191. One form of response is voice, and questions about the instructions are also asked by voice. The voice of caller 2 is carried over a wireless LAN by voice communication using the IP protocol. This voice communication has no particular calling procedure such as dialing and keeps the connection open at all times. As long as the connection is open, the call program runs, and mask detection processing and boost control according to its result are executed. Voice analysis processing can of course be used for mask detection, but image analysis processing may also be used by adding to the HMD 100 an in-camera that captures the caller's face. Alternatively, the HMD 100 may be paired with a device such as a smartphone and the camera of that device may be used.
As described above, according to the fourth embodiment, an HMD can be applied as a communication device.
The present invention is not limited to the embodiments described with reference to FIGS. 1 to 16 above; part of the configuration of one embodiment can be replaced with that of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. All of these fall within the scope of the present invention. Furthermore, the numerical values, messages, and the like appearing in the text and drawings are only examples, and using different ones does not impair the effect of the present invention.
In addition, a communication information terminal is not always used only by its owner; acquaintances and others may also use it. In that case, the caller can be identified by face authentication or voice authentication during the call, class data can be prepared for each of several individuals, and the voice correction best suited to the caller can be applied, resulting in a further improved call.
Also, in the above description, the smartphone 1 and the HMD 100 are used as communication devices, but devices to which the present invention can be applied are not limited to the smartphone 1 and the HMD 100.
For example, the call voice processing method may be applied to a web conference system in which a camera and microphone are connected to a personal computer. In this case, the call program 422 is installed in the personal computer in advance. Then, the caller positioned in front of the microphone is imaged by a camera, whether or not the caller is wearing a mask is detected based on the captured image data, and boost control similar to that described above is executed according to the detection result.
Also, when a participant joins a Web conference with the camera turned OFF, the processor of the personal computer may perform voice analysis processing on the voice data collected by the microphone to detect whether a mask is worn, and perform boost control according to the result. In a Web conference, a participant may join with the camera either ON or OFF as desired, so the call program 422 may switch the mask detection method between image analysis processing and voice analysis processing based on the camera ON/OFF signal.
In this way, the present invention can be applied to any device in which at least one of a camera and a microphone is connected to a processor with a call function, a storage that stores the call program 422, and a controller that includes a booster.
In addition, while the smartphone 1 uses the in-camera 72 to image the caller, a web conference system may use a web camera detachably connected to the personal computer or a built-in camera formed integrally with the personal computer. The form of the camera that images the caller is selected as appropriate for the device to which the present invention is applied.
Some or all of the functions of the invention may be implemented in hardware, for example by designing them as an integrated circuit. They may also be implemented in software, with a microprocessor unit, CPU, or the like interpreting and executing an operating program. The scope of the software implementation is not limited, and hardware and software may be used together.
1, 1A: Smartphone
2, 2A: Caller
3: Mask
4, 4A: Mobile communication base station
5: Mobile communication network
6, 6A: Access point
7: Wide area network
8: Call service server
8a: Mask database
9: HMD management server
20: Home screen
20a: Dial screen
20b: Display screen
21: Icon button group
22: Call application icon
23: Numeric keypad
24: Dial start button
25: Call acceptance button
26: Call rejection button
27: Information
30: Processor
31: System bus
40: Storage
41: ROM
42: Non-volatile memory
43: RAM
51: GPS receiver
52: Geomagnetic sensor
53: Acceleration sensor
54: Gyro sensor
60, 60A: Wireless LAN signal
61: LAN communication device
62: Mobile communication device
63: Short-range wireless communication device
71: Display
72: In-camera
73: Out-camera
81: Microphone
82: Speaker
83: Call port
91: Touch sensor
92: Operation key
95: Booster
100: HMD
101: Camera
102: Ranging sensor
103: Controller
104A: Left projector
104B: Right projector
104C: Screen
105: Operation key
106A: Left speaker
106B: Right speaker
107: Microphone
108: Bus
113: Processor
130A: Left temple
130B: Right temple
130C: Front frame
130D: Nose pad
140: Storage
141: ROM
142: Non-volatile memory
143: RAM
151: GPS receiver
152: Geomagnetic sensor
153: Acceleration sensor
154: Gyro sensor
161: LAN communication device
162: Mobile communication device
163: Short-range wireless communication device
191: Manager
192: Information communication device
421: Basic operation program
422: Call program
430: Data storage area
431: Registered face data
432: Mask database
433: Registered voice data
434: Class data
521: Basic operation program
522: HMD program
523: Call program
524: Data storage area
4220: Mask detection unit
4221: Boost control unit
4222: Call processing unit
Claims (10)
1. A communication device using telecommunication, comprising:
a microphone; a communicator that transmits and receives call voice; a booster that boosts a high-frequency level in a call voice band; a caller state collection device that collects caller state information for detecting whether a caller is wearing a mask; and a processor connected to each of the microphone, the communicator, the booster, and the caller state collection device,
wherein the processor executes:
a mask detection step of analyzing the caller state information to detect whether the caller is wearing a mask;
a boost control step of outputting to the booster a boost control signal that controls a high-frequency boost of the call voice collected by the microphone, according to the detection result of whether the caller is wearing a mask; and
a transmission step of causing the communicator to transmit the call voice after boost control processing by the booster.
2. The communication device according to claim 1,
wherein the caller state collection device is a camera, and the caller state information is imaging data captured by the camera, and
in the mask detection step, the processor executes face recognition processing on the imaging data, recognizes a caller face image in which the face of the caller is captured, and detects whether the mask is worn based on whether a mask image, in which a mask is captured, can be detected in the caller face image.
3. The communication device according to claim 2,
wherein the processor further executes a sound attenuation amount estimation step of referring to a mask database that associates each mask type with the sound attenuation amount when that mask is worn, and estimating the sound attenuation amount corresponding to the mask image, and
in the boost control step, the processor determines a boost amount based on the estimated sound attenuation amount and outputs the boost control signal.
4. The communication device according to claim 2,
wherein the processor starts mask detection processing in the mask detection step when the communication device transmits a call request from the communicator, or when the communicator receives a call request from another communication device and the communication device starts answering.
5. The communication device according to claim 4,
wherein, in the mask detection step, the processor registers the caller face image obtained when the mask detection processing was started, generates new imaging data with the camera while the call continues, and, upon recognizing a new face image by executing face recognition processing on the new imaging data, determines whether the new face image matches the registered caller face image,
if they match, the processor detects again whether the caller is wearing a mask based on the new face image, and
if they do not match, the processor continues to use the detection result of whether the caller is wearing a mask based on the caller face image.
6. The communication device according to claim 1,
wherein the caller state collection device is the microphone, and the caller state information is call voice collected by the microphone, and
in the mask detection step, the processor detects a low-frequency level and the high-frequency level in the call voice band of the call voice collected by the microphone, calculates a level ratio between the low-frequency level and the high-frequency level, and detects whether the mask is worn based on the level ratio.
7. The communication device according to claim 6,
wherein, in the mask detection step, the processor refers to level ratio class data in which level ratios are classified in advance, and determines the class to which the level ratio of the call voice collected by the microphone belongs, and
in the boost control step, the processor determines a boost amount according to the determined class and outputs the boost control signal.
8. The communication device according to claim 2,
wherein the microphone is further used as the caller state collection device,
call voice collected by the microphone is further used as the caller state information, and
in the mask detection step, when the camera is activated, the processor executes face recognition processing on the imaging data, recognizes a face image in which the face of the caller is captured, and detects whether the mask is worn based on whether a mask image, in which a mask is captured, can be detected in the face image, and when the camera is activated, the processor detects a low-frequency level and the high-frequency level of the call voice collected by the microphone, calculates a level ratio between the low-frequency level and the high-frequency level, and detects whether the mask is worn based on the level ratio.
9. The communication device according to claim 1,
wherein the communicator transmits and receives call data including the call voice via an IP network.
10. A call voice processing method executed by a processor provided in a communication device using telecommunication,
the communication device comprising: a microphone; a communicator that transmits and receives call voice; a booster that boosts a high-frequency level in a call voice band; a caller state collection device that collects caller state information for detecting whether a caller is wearing a mask; and the processor, which is connected to each of the microphone, the communicator, the booster, and the caller state collection device,
the method comprising a step in which the processor analyzes the caller state information to detect whether the caller is wearing a mask, outputs to the booster a boost control signal that controls a high-frequency boost of the call voice collected by the microphone according to the detection result of whether the mask is worn, and causes the communicator to transmit the call voice after boost control processing by the booster.
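The audio-based detection recited in claims 6 and 7 can be illustrated with a short sketch that measures low-band and high-band levels, forms their ratio, and maps the ratio to a boost class. Several details are assumptions made for the example and are not stated in the claims: the 8000 Hz sample rate, the 1000 Hz band split, the direction of the ratio (high over low, in dB), and every entry of the `CLASSES` table. A plain DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def band_levels(samples, sample_rate, split_hz):
    """Return (low_level, high_level) as summed DFT power below/above split_hz."""
    n = len(samples)
    low = high = 0.0
    for k in range(1, n // 2):  # positive-frequency bins, DC excluded
        freq = k * sample_rate / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        power = abs(coeff) ** 2
        if freq < split_hz:
            low += power
        else:
            high += power
    return low, high


def level_ratio_db(samples, sample_rate=8000, split_hz=1000.0):
    """High-band to low-band level ratio in dB (direction is an assumption)."""
    low, high = band_levels(samples, sample_rate, split_hz)
    eps = 1e-12  # avoids log of zero on silent input
    return 10.0 * math.log10((high + eps) / (low + eps))


# Hypothetical level-ratio classes: (lower bound of ratio in dB, boost in dB).
CLASSES = [(-6.0, 0.0), (-12.0, 3.0), (-math.inf, 6.0)]


def boost_for_ratio(ratio_db):
    """Return the boost of the first class whose lower bound the ratio meets."""
    for lower_bound, boost_db in CLASSES:
        if ratio_db >= lower_bound:
            return boost_db
    return CLASSES[-1][1]
```

A frame whose high band has dropped relative to its low band falls into a class with a larger boost, while a balanced frame maps to a boost of 0 dB, that is, no mask is assumed.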
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2023516954A JPWO2022230111A1 (en) | 2021-04-28 | 2021-04-28 | |
PCT/JP2021/016990 WO2022230111A1 (en) | 2021-04-28 | 2021-04-28 | Voice communication device and call voice processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/016990 WO2022230111A1 (en) | 2021-04-28 | 2021-04-28 | Voice communication device and call voice processing method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022230111A1 true WO2022230111A1 (en) | 2022-11-03 |
Family
ID=83848101
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/016990 WO2022230111A1 (en) | 2021-04-28 | 2021-04-28 | Voice communication device and call voice processing method |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPWO2022230111A1 (en) |
WO (1) | WO2022230111A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009265722A (en) * | 2008-04-22 | 2009-11-12 | Calsonic Kansei Corp | Face direction sensing device |
JP2015135358A (en) * | 2014-01-16 | 2015-07-27 | シャープ株式会社 | Voice input device and telephone set |
JP2016189135A (en) * | 2015-03-30 | 2016-11-04 | パイオニア株式会社 | Recognition device, recognition method, and recognition program |
-
2021
- 2021-04-28 JP JP2023516954A patent/JPWO2022230111A1/ja active Pending
- 2021-04-28 WO PCT/JP2021/016990 patent/WO2022230111A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
JPWO2022230111A1 (en) | 2022-11-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21939272 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2023516954 Country of ref document: JP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21939272 Country of ref document: EP Kind code of ref document: A1 |