WO2023134328A1 - Electronic device control method and apparatus, and electronic device - Google Patents


Info

Publication number
WO2023134328A1
Authority
WO
WIPO (PCT)
Prior art keywords
space
sound field
coefficient
size
electronic device
Prior art date
Application number
PCT/CN2022/136611
Other languages
French (fr)
Chinese (zh)
Inventor
Sun Chen (孙晨)
Lyu Shuailin (吕帅林)
Zhou Xiaopeng (周小鹏)
Li Wei (李伟)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023134328A1 publication Critical patent/WO2023134328A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/04Circuits for transducers, loudspeakers or microphones for correcting frequency response
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10KSOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K15/00Acoustics not otherwise provided for
    • G10K15/08Arrangements for producing a reverberation or echo sound
    • G10K15/12Arrangements for producing a reverberation or echo sound using electronic time-delay networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/20Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular to an electronic equipment control method, device and electronic equipment.
  • KWS: wake-up word detection (keyword spotting)
  • ASR: automatic speech recognition
  • EQ: equalization
  • DRC: dynamic range compression
  • the speech recognition models and audio playback parameters in current electronic devices are mainly obtained through laboratory scene simulation and acoustic environment simulation. To suit the typical home scenario, a model or parameter set with good generalization is selected and uniformly deployed to the device side to satisfy most users. However, each user's actual home differs in spatial size, overall layout, and decoration materials, which produces differences in the sound field environment. Because of these differences, the performance of the unified model may degrade in various environments, harming the user experience.
  • the present application provides an electronic device control method, apparatus, electronic device, computer storage medium, and computer program product, which can make unified models in the electronic device, such as the speech recognition model and audio playback parameters, adapt to different sound field environments, preventing these unified models from degrading in performance across varied environments and improving the user experience.
  • the present application provides a method for controlling an electronic device, the method comprising: acquiring a first image of a first space where the electronic device is located through a camera, and acquiring a first sound in the first space through a microphone; determining, according to the first image, the spatial parameters of the first space, and determining, according to the first sound, the sound parameters corresponding to the first space, where the spatial parameters include a first size of the first space and the material types of objects in the first space, and the sound parameters include a first reverberation coefficient characterizing the amount of reverberation in the first space; determining, according to the spatial parameters and the sound parameters, the sound field environment parameters, which include at least one of a target reverberation coefficient, a target absorption coefficient, and a target size of the first space, where the target absorption coefficient characterizes the absorption coefficient corresponding to the materials of the objects in the first space; and controlling the electronic device according to the sound field environment parameters.
  • the results of visual and acoustic parameter estimation are mutually verified, so that the obtained sound field environment parameters are more reliable, which provides a solid basis for the subsequent control of the electronic device and for maximizing the user experience. For example, it can effectively improve the speech recognition service, reduce the impact of the sound field environment on audio playback, improve the wake-up rate of the electronic device and the ASR recognition rate, and significantly improve the listening experience.
  • the sound field environment parameter is the target reverberation coefficient
  • determining the sound field environment parameter according to the spatial parameter and the sound parameter specifically includes: when the confidence of the first reverberation coefficient is greater than a first reverberation value, determining the target reverberation coefficient to be the first reverberation coefficient; when the confidence of the first reverberation coefficient is less than or equal to the first reverberation value and greater than a second reverberation value, obtaining a second reverberation coefficient according to the first size of the first space and the material types of the objects in the first space, and obtaining the target reverberation coefficient according to the first reverberation coefficient and the second reverberation coefficient; and when the confidence of the first reverberation coefficient is less than or equal to the second reverberation value, obtaining the target reverberation coefficient according to the first reverberation coefficient, the second reverberation coefficient, and the confidence of the first reverberation coefficient.
  • the sound field environment parameter is a target absorption coefficient
  • determining the sound field environment parameter according to the spatial parameter and the sound parameter specifically includes: when the confidence of the first absorption coefficient is greater than a first absorption value, determining the target absorption coefficient to be the first absorption coefficient, where the first absorption coefficient is obtained according to the material types of the objects in the first space; when the confidence of the first absorption coefficient is less than or equal to the first absorption value and greater than a second absorption value, obtaining a second absorption coefficient according to the first size of the first space and the first reverberation coefficient, and obtaining the target absorption coefficient according to the first absorption coefficient and the second absorption coefficient; and when the confidence of the first absorption coefficient is less than or equal to the second absorption value, obtaining the target absorption coefficient according to the first absorption coefficient, the second absorption coefficient, and the confidence of the first absorption coefficient.
  • the sound field environment parameter is the target size of the first space
  • determining the sound field environment parameter according to the spatial parameter and the sound parameter specifically includes: when the confidence of the first size of the first space is greater than a first size value, determining the target size to be the first size, where the first size is obtained according to the material types of the objects in the first space; when the confidence of the first size is less than or equal to the first size value and greater than a second size value, obtaining a second size according to the first reverberation coefficient and the material types of the objects in the first space, and obtaining the target size according to the first size and the second size; and when the confidence of the first size is less than or equal to the second size value, obtaining the target size according to the first size, the second size, and the confidence of the first size.
  • controlling the electronic device according to the sound field environment parameters specifically includes: determining, according to the sound field environment parameters, a target speech recognition model that matches them; and updating the speech recognition model in the electronic device to the target speech recognition model.
  • the electronic device can adaptively optimize the speech recognition model according to the sound field environment parameters of the current environment and use a speech recognition model that matches the current sound field environment for speech recognition, adapting the speech recognition function to the user's actual environment. This avoids degradation of the model's recognition performance caused by differences in the sound field environment, guarantees a good speech recognition experience, and improves the user experience.
  • controlling the electronic device according to the sound field environment parameters specifically includes: modeling the sound field environment where the electronic device is located according to the sound field environment parameters to obtain a space model of the first space; performing sound field simulation on the space model to obtain a first frequency response curve corresponding to a target position in the first space; determining, based on the sound field environment parameters, a second frequency response curve matching those parameters from a preset ideal acoustic frequency response library; and fitting the first frequency response curve to the second frequency response curve.
  • the target position may be a position where the loudness, sense of space, strength, and clarity of the sound are optimal in the current sound field environment.
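Fitting the first frequency response curve to the second one amounts, per frequency band, to applying a correction gain. The sketch below illustrates this idea only; the band layout and the gain limits are assumptions for illustration and are not values from the patent:

```python
def eq_correction(measured_db, target_db, max_boost=6.0, max_cut=-12.0):
    """Per-band EQ gains (dB) that map a measured (simulated) frequency
    response onto a target (ideal) response.

    measured_db, target_db: response levels in dB, one value per band.
    Gains are limited to [max_cut, max_boost], a common practical
    safeguard against excessive boost (limits here are illustrative).
    """
    gains = []
    for m, t in zip(measured_db, target_db):
        g = t - m                       # gain needed to reach the target
        gains.append(max(max_cut, min(max_boost, g)))
    return gains
```

Applying these gains to the playback chain moves the response at the target position toward the ideal curve from the preset library.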
  • controlling the electronic device according to the sound field environment parameter specifically includes: using the sound field environment parameter as an input of an enhancement algorithm for processing voice data in the electronic device.
  • the voice signal during the user's call is adaptively enhanced through the enhancement algorithm according to the input sound field environment parameters, so as to improve the call quality and user experience.
  • the present application provides an electronic device control apparatus, including: at least one memory for storing programs; and at least one processor for executing the programs stored in the memory, where, when the programs stored in the memory are executed, the processor is configured to perform the method provided in the first aspect.
  • the present application provides an electronic device, which includes at least one memory for storing programs and at least one processor for executing the programs stored in the memory, where, when the programs stored in the memory are executed, the processor is configured to perform the method provided in the first aspect.
  • the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program runs on an electronic device, the electronic device executes the method as provided in the first aspect.
  • the present application provides a computer program product, which, when the computer program product is run on an electronic device, causes the electronic device to execute the method as provided in the first aspect.
  • FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a hardware structure of an electronic device provided in an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a method for controlling an electronic device provided in an embodiment of the present application
  • FIG. 4 is a schematic diagram of steps for controlling electronic equipment according to sound field environmental parameters provided by an embodiment of the present application
  • FIG. 5 is another schematic diagram of steps for controlling electronic equipment according to sound field environmental parameters provided by the embodiment of the present application.
  • FIG. 6 is a schematic diagram of a hardware structure of an electronic equipment control device provided by an embodiment of the present application.
  • the terms "first", "second", and the like in the specification and claims herein are used to distinguish different objects, rather than to describe a specific order of objects.
  • for example, the first response message and the second response message are used to distinguish different response messages, rather than to describe a specific order of the response messages.
  • words such as "exemplary" or "for example" are used as examples, illustrations, or explanations. Any embodiment or design scheme described as "exemplary" or "for example" in the embodiments of the present application shall not be interpreted as more preferred or more advantageous than other embodiments or design schemes. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete manner.
  • "multiple" means two or more; for example, multiple processing units refers to two or more processing units, and multiple components refers to two or more components.
  • Fig. 1 shows a schematic diagram of an application scenario.
  • an electronic device 100 is provided in a room 200 , and a camera 110 , a microphone 120 and a speaker 130 may be provided on the electronic device 100 , but not limited thereto.
  • the electronic device 100 can recognize and respond to sounds in the room 200, and can also play sounds, and so on.
  • this application scene can be understood as an indoor scene.
  • the electronic device 100 may be, but not limited to, a smart TV.
  • the smart TV referred to in the embodiments of the present application may be a TV capable of interacting with mobile devices such as smart phones and tablet computers, or another electronic device with a large screen. For example, the user interface of a smart phone can be transmitted wirelessly and presented on the smart TV, and the user's operations on the smart TV can also affect the smart phone.
  • the electronic device 100 shown in FIG. 1 can also be replaced with other electronic devices, and the replaced solution is still within the protection scope of the present application.
  • the electronic device 100 may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, and/or a smart home device; the embodiments of the present application do not specifically limit the type of the electronic device 100.
  • PDA: personal digital assistant
  • AR: augmented reality
  • VR: virtual reality
  • AI: artificial intelligence
  • FIG. 2 shows a schematic structural diagram of the electronic device 100 .
  • the electronic device 100 may include: a camera 110 , a microphone 120 , a speaker 130 , a processor 140 , a memory 150 , a transceiver unit 160 and a display screen 170 .
  • the camera 110 is used to capture still images or videos.
  • the object generates an optical image through the lens and projects it to the photosensitive element.
  • the photosensitive element can be a charge coupled device (charge coupled device, CCD) or a complementary metal-oxide-semiconductor (complementary metal-oxide-semiconductor, CMOS) phototransistor.
  • CCD charge coupled device
  • CMOS complementary metal-oxide-semiconductor
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the processor 140 for processing, so as to obtain image signals in standard RGB, YUV and other formats.
  • the electronic device 100 may include 1 or N cameras 110 , where N is a positive integer greater than 1. Exemplarily, the camera 110 may be used to collect images of the environment where the electronic device 100 is located. In some embodiments, the camera 110 and the electronic device 100 can be set separately or integrated together.
  • the microphone 120, also called a "mic", is used to convert sound signals into electrical signals.
  • the electronic device 100 may be provided with at least one microphone 120 .
  • the electronic device 100 may be provided with two microphones 120, which may also implement a noise reduction function in addition to collecting sound signals.
  • the electronic device 100 can also be provided with three, four or more microphones 120 to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions, etc.
  • the microphone 120 may be used to collect sound signals in the environment, such as the sound emitted by the user.
  • the microphone 120 and the electronic device 100 can be set separately or integrated together.
  • the speaker 130, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 may play sound or the like through the speaker 130 .
  • the speaker 130 and the electronic device 100 can be set separately or integrated together.
  • Processor 140 may be a general purpose processor or a special purpose processor.
  • the processor 140 may include a central processing unit (central processing unit, CPU) and/or a baseband processor.
  • the baseband processor can be used to process communication data
  • the CPU can be used to implement corresponding control and processing functions, execute software programs, and process data of the software programs.
  • a program (or an instruction or code) may be stored in the memory 150, and the program may be executed by the processor 140, so that the processor 140 executes the method described in this solution.
  • data may also be stored in the memory 150 .
  • the processor 140 may also read data stored in the memory 150 (for example, a wake-up word detection model, a speech recognition model, equalization parameters, dynamic range control parameters, and transmission channel delay parameters corresponding to each microphone); this data can be stored at the same memory address as the program or at a different memory address.
  • the processor 140 and the memory 150 can be set separately, or can be integrated together, for example, integrated on a single board or a system on chip (system on chip, SOC).
  • the electronic device 100 may further include a transceiver unit 160 .
  • the transceiving unit 160 may implement input (reception) and output (transmission) of signals.
  • the transceiving unit 160 may include a transceiver or a radio frequency chip.
  • the transceiver unit 160 may also include a communication interface.
  • the electronic device 100 can communicate with a server (not shown in the figure) through the transceiver unit 160, so as to obtain required data from the server, such as a speech recognition model and the like.
  • the electronic device 100 may further include a display screen 170 .
  • the display screen 170 can be used to display images, videos and the like.
  • the display screen 170 may include a display panel.
  • the display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light-emitting diodes (QLED), etc.
  • the electronic device 100 may include 1 or N display screens, where N is a positive integer greater than 1.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown in the figure, or combine certain components, or separate certain components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • FIG. 3 shows a schematic flowchart of a method for controlling an electronic device.
  • the electronic device involved in FIG. 3 may be the electronic device 100 described above. It can be understood that the method can be executed by any apparatus, device, platform, or device cluster that has computing and processing capabilities; for example, it may be executed by the electronic device 100 shown in FIG. 2, or by a device such as a server. For ease of description, execution by the electronic device is taken as an example below. As shown in FIG. 3, the electronic device control method may include the following steps:
  • the electronic device may obtain a first image of the first space where the electronic device is located through a camera matched with the electronic device.
  • when the method is executed by a device such as a server, the electronic device may send the first image to that device after obtaining it.
  • the electronic device may acquire the first sound in the first space through a microphone matched with it.
  • the first sound may be a sound made by a user.
  • before S301 and/or S302, the user can send an instruction to the electronic device to optimize the sound field environment parameters; after the electronic device receives the instruction, it can start its matching camera and/or microphone to obtain the first image and/or the first sound.
  • after the electronic device activates its matching microphone, it can prompt the user to make a sound (for example, by a voice, image, or text prompt), so that the microphone can collect the sound made by the user.
  • when the method is executed by a device such as a server, the electronic device may send the first sound to that device after acquiring it.
  • the first image may be input to a pre-trained neural network model related to image processing to obtain spatial parameters of the first space.
  • the space parameters may include: a first size of the first space and a material type of objects in the first space.
  • the first size may be the size of the first space (such as volume, etc.).
  • there may be one or more first images.
  • the first sound may be input into a pre-trained neural network model related to sound processing, so as to obtain sound parameters corresponding to the first space.
  • the sound parameter may include a first reverberation coefficient used to characterize the magnitude of reverberation in the first space.
  • the first reverberation coefficient may be T60, that is, the time required for the sound in the sound field to decay by 60 dB.
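The patent obtains T60 from a neural network; purely as an illustration of what the quantity means, the classical approach fits a straight line to a measured level decay curve and extrapolates to a 60 dB drop. A minimal sketch under that assumption:

```python
def estimate_t60(decay_db, sample_rate):
    """Estimate T60 (time for a 60 dB decay) from a sampled decay curve.

    decay_db: sound levels in dB, one per sample, assumed to decay
              roughly linearly over the fitted region (free decay).
    sample_rate: samples per second.
    Returns the extrapolated time, in seconds, for the level to fall 60 dB.
    """
    n = len(decay_db)
    times = [i / sample_rate for i in range(n)]
    # Least-squares slope of level vs. time, in dB per second.
    mean_t = sum(times) / n
    mean_l = sum(decay_db) / n
    num = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, decay_db))
    den = sum((t - mean_t) ** 2 for t in times)
    slope = num / den  # negative for a decaying signal
    return -60.0 / slope
```

For a decay of 100 dB per second, the function extrapolates a T60 of 0.6 s.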
  • the sound field environment parameter may be determined according to the space parameter and the sound parameter.
  • the sound field environment parameters may include a target reverberation coefficient, a target absorption coefficient, and a target size of the first space.
  • the sound field environment parameters may also include equalization EQ parameters.
  • the determination of the target reverberation coefficient, the target absorption coefficient, and the target size of the first space according to the space parameter and the sound parameter will be described respectively below.
  • the first reverberation coefficient may be used as the target reverberation coefficient.
  • the confidence degree of the first reverberation coefficient may be output together by the neural network model that outputs the first reverberation coefficient.
  • the first reverberation value may be 0.9.
  • the second reverberation coefficient can be calculated from the first size of the first space and the material types of the objects in the first space, and the target reverberation coefficient is then obtained from the first reverberation coefficient and the second reverberation coefficient.
  • the second reverberation value may be 0.6.
  • the formula for calculating the second reverberation coefficient may be Sabine's reverberation formula:
  • RT = 0.161 × V / A (Formula 1)
  • where RT is the reverberation coefficient, V is the size (volume) of the first space, and A is the total sound absorption of the first space, obtained from S, the average value of the absorption coefficients of the materials in the first space.
  • after the material type is obtained, the absorption coefficient of a material in the first space can be obtained by querying a table that maps material types to their absorption coefficients.
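The table lookup combined with "Formula 1" can be sketched as follows, assuming Formula 1 is Sabine's relation RT = 0.161·V/A; the material names and absorption values below are illustrative placeholders, not values from the patent:

```python
# Illustrative absorption coefficients (placeholders; real tables vary
# by material and frequency band).
ABSORPTION = {"concrete": 0.02, "wood": 0.10, "carpet": 0.30, "curtain": 0.40}

def sabine_rt(volume_m3, surfaces):
    """Second reverberation coefficient from room geometry and materials.

    volume_m3: size (volume) of the space in cubic metres.
    surfaces:  list of (area_m2, material_name) pairs.
    Uses Sabine's formula RT = 0.161 * V / A, where A is the total
    absorption, sum of area * absorption coefficient over all surfaces.
    """
    total_absorption = sum(area * ABSORPTION[mat] for area, mat in surfaces)
    return 0.161 * volume_m3 / total_absorption
```

For a 60 m³ room with 50 m² of concrete and 20 m² of carpet, the total absorption is 7.0 and the predicted reverberation time is about 1.38 s.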
  • the target reverberation coefficient is obtained from the first reverberation coefficient and the second reverberation coefficient; specifically, the average of the first reverberation coefficient and the second reverberation coefficient may be used as the target reverberation coefficient.
  • the target reverberation coefficient may also be obtained from the first reverberation coefficient, the second reverberation coefficient, and the confidence of the first reverberation coefficient.
  • the formula for obtaining the target reverberation coefficient may be:
  • RT_target = (m/2) × RT₁ + (1 − m/2) × RT₂ (Formula 2)
  • where RT_target is the target reverberation coefficient, RT₁ is the first reverberation coefficient, RT₂ is the second reverberation coefficient, and m is the confidence of the first reverberation coefficient (i.e., of RT₁).
  • the first absorption coefficient may be used as the target absorption coefficient.
  • the confidence degree of the first absorption coefficient may be output together by the neural network model that outputs the first absorption coefficient.
  • the first absorption value may be 0.8.
  • the first absorption coefficient may be an average value of absorption coefficients corresponding to materials of objects in the first space.
  • the target absorption coefficient may be used to characterize the absorption coefficient corresponding to the material of the objects (all objects or objects collected by the camera) in the first space.
  • the second absorption coefficient can be calculated from the first reverberation coefficient and the size of the first space, and the target absorption coefficient is then obtained from the first absorption coefficient and the second absorption coefficient.
  • the second absorption value may be 0.5.
  • the second absorption coefficient may be obtained by substituting the first reverberation coefficient and the size of the first space into the above "Formula 1".
  • the target absorption coefficient is obtained from the first absorption coefficient and the second absorption coefficient; specifically, the average of the first absorption coefficient and the second absorption coefficient may be used as the target absorption coefficient.
  • the target absorption coefficient can also be obtained from the first absorption coefficient, the second absorption coefficient, and the confidence of the first absorption coefficient.
  • by analogy with Formula 2, the formula for obtaining the target absorption coefficient may be:
  • Ab_target = (n/2) × Ab₁ + (1 − n/2) × Ab₂
  • where Ab_target is the target absorption coefficient, Ab₁ is the first absorption coefficient, Ab₂ is the second absorption coefficient, and n is the confidence of the first absorption coefficient (i.e., of Ab₁).
  • the first size may be used as the target size.
  • the confidence degree of the first size may be output together by the neural network model outputting the first size.
  • the first size value may be 0.8.
  • the second size can be calculated from the first reverberation coefficient and the first absorption coefficient, and the target size is then obtained from the first size and the second size.
  • the second size value may be 0.5.
  • the second size may be obtained by substituting the first reverberation coefficient and the first absorption coefficient into the above "Formula 1".
  • the target size is obtained from the first size and the second size; specifically, the average of the first size and the second size may be used as the target size.
  • the target size can also be obtained from the first size, the second size, and the confidence of the first size.
  • the formula for obtaining the target size may be:
  • V_target = (p/2) × V₁ + (1 − p/2) × V₂ (Formula 3)
  • where V_target is the target size, V₁ is the first size, V₂ is the second size, and p is the confidence of the first size (i.e., of V₁).
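Both the "second absorption coefficient" (from the reverberation coefficient and the space size) and the "second size" (from the reverberation coefficient and the absorption) follow from rearranging the same relation. A sketch, again assuming Formula 1 is Sabine's RT = 0.161·V/A:

```python
SABINE_K = 0.161  # Sabine's constant for SI units (assumed Formula 1)

def absorption_from_rt(volume_m3, rt_s):
    """Total absorption A implied by RT = K * V / A,
    used to derive the second absorption coefficient."""
    return SABINE_K * volume_m3 / rt_s

def volume_from_rt(rt_s, total_absorption):
    """Space size (volume) V implied by RT = K * V / A,
    used to derive the second size."""
    return rt_s * total_absorption / SABINE_K
```

The two functions are exact inverses of each other through the same relation, which is what lets each modality's estimate cross-check the other.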
  • S306 may be executed.
  • the electronic device can be controlled according to the sound field environment parameter, so that the electronic device can better adapt to the current sound field environment (ie, the current space).
  • a speech recognition model matching the sound field environment parameters can be acquired according to those parameters. Specifically, as shown in FIG. 4, the following steps are included:
  • the electronic device sends a first message to a server, where the first message includes a sound field environment parameter, and the first message is used to request acquisition of a speech recognition model matching the sound field environment parameter.
  • the server determines a target speech recognition model matching the sound field environment parameters according to the sound field environment parameters.
  • speech recognition models corresponding to different sound field environment parameters may be preset in the server. After the server acquires the sound field environment parameters sent by the electronic device, the server can determine the target speech recognition model from its preset speech recognition models based on the sound field environment parameters.
  • the weight value of each sub-parameter in the sound field environment parameters can be preset; the server then calculates the matching degree between the sound field environment parameters it acquires and each set of pre-stored sound field environment parameters, and finally selects one set.
  • the speech recognition model corresponding to the selected sound field environment parameters is used as the target speech recognition model.
  • the matching degree can be calculated by the following "Formula 4", which is:
  • f is the matching degree
  • RT_end is the reverberation coefficient in the sound field environment parameters that the server obtains from the electronic device
  • RT_cloud is the reverberation coefficient in the sound field environment parameters preset in the server
  • V_end is the size of the space in the sound field environment parameters that the server obtains from the electronic device
  • V_cloud is the size of the space in the sound field environment parameters preset in the server
  • Ab_end is the absorption coefficient in the sound field environment parameters that the server obtains from the electronic device
  • Ab_cloud is the absorption coefficient in the sound field environment parameters preset in the server
  • EQ_end is the value of the EQ parameter in the sound field environment parameters that the server obtains from the electronic device
  • EQ_cloud is the value of the EQ parameter in the sound field environment parameters preset in the server
  • α, β, γ, δ and ε are preset weight values, respectively.
  • Each parameter in the formula can be selected according to actual conditions, and is not limited here.
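As an illustration of how the server might rank its preset parameter sets, the sketch below assumes the matching degree f is a weighted sum of absolute differences between device-reported and preset values (a plausible reading of Formula 4; the exact form, the dictionary keys and the function names are assumptions):

```python
def matching_degree(dev, cloud, w):
    """Weighted distance between device-reported and server-preset sound
    field parameters; smaller means a closer match. Keys: 'rt'
    (reverberation coefficient), 'v' (space size), 'ab' (absorption
    coefficient), 'eq' (EQ parameter)."""
    return sum(w[k] * abs(dev[k] - cloud[k]) for k in ("rt", "v", "ab", "eq"))

def pick_model(dev, presets, w):
    """Select the preset entry whose parameters best match the device's;
    its speech recognition model becomes the target model."""
    return min(presets, key=lambda entry: matching_degree(dev, entry["params"], w))
```

With this form, the preset whose reverberation, size, absorption and EQ values lie closest to the device's report (after weighting) wins, mirroring the selection step described above.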
  • the server sends a second message to the electronic device, where the second message includes the target speech recognition model.
  • the electronic device uses the target speech recognition model to perform speech recognition.
  • S401 to S404 may also be referred to as: determining a target speech recognition model matching the sound field environment parameters according to the sound field environment parameters; updating the speech recognition model in the electronic device to the target speech recognition model.
  • when performing speech recognition, the electronic device can adaptively optimize the speech recognition model according to the sound field environment parameters of the current environment, and use a speech recognition model that matches the current sound field environment. This adapts the speech recognition function to the user's actual use environment, avoids degradation of model recognition performance caused by differences in the sound field environment, guarantees a good speech recognition service experience, and improves the user experience.
  • the sound field distribution map of the environment where the electronic device is located can be calculated from the sound field environment parameters, and the audio playback parameters can be adaptively adjusted according to the sound field distribution map in combination with artificial intelligence search algorithms, so that the user's listening effect is optimal. Specifically, as shown in Figure 5, the following steps are included:
  • space modeling may be carried out, but is not limited to, through preset sound field modeling methods (such as the open-source pyroom library), using the size of the space and the absorption coefficient of each object in the space included in the sound field environment parameters, so as to complete the modeling of the current sound field environment and obtain the space model of the first space where the electronic device is located.
  • sound field simulation may be performed in the space model by using sound field simulation technology, so as to obtain the first frequency response curve corresponding to the target position.
  • the target position may be a position where the loudness, sense of space, strength, and clarity of the sound are optimal in the current sound field environment.
  • a second frequency response curve matching the sound field environment parameters may be determined from a preset ideal acoustic frequency response library.
  • the matching degree between the obtained sound field environment parameters and the sound field environment parameters corresponding to each frequency response curve in the ideal acoustic frequency response library may be determined through the aforementioned "Formula 4".
  • the first frequency response curve may be compared with the second frequency response curve, and the difference between the two may then be used to adjust the EQ, DRC, delay parameters of the transmission channels corresponding to each microphone, and so on, so that the first frequency response curve is fitted to the second frequency response curve. In this way, the loudness, sense of space, strength, and clarity of the sound heard by the user at the target position are optimal, and the listening effect is the best.
  • the audio playing effect can be adjusted adaptively, so that the user's listening effect can be optimized and the user experience can be improved.
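As a minimal illustration of the fitting step, the per-band EQ correction can be taken as the gap between the simulated response at the target position and the ideal response (a sketch only; real tuning, as the text notes, also adjusts DRC and per-microphone channel delays):

```python
def eq_gains(first_response_db, second_response_db):
    """Per-band EQ corrections (dB) that fit the first frequency response
    curve (simulated at the target position) to the second (ideal) curve:
    the correction in each band is simply the dB gap between the two."""
    return [ideal - simulated
            for simulated, ideal in zip(first_response_db, second_response_db)]
```

Applying these gains to the playback chain lifts bands where the simulated response falls below the ideal curve and attenuates bands where it overshoots.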
  • when an electronic device is used to make a call, after the sound field environment parameters are obtained, they can be used as the input of the enhancement algorithm for processing voice data in the electronic device; the enhancement algorithm then adaptively enhances the user's voice signal during the call according to the input sound field environment parameters, so as to improve the call quality and the user experience.
  • the results of visual and acoustic parameter estimation are mutually verified, so that the reliability of the acquired sound field environment parameters is higher. This provides a solid foundation for the subsequent control of the electronic device and can greatly improve the user experience: for example, it can effectively improve the voice recognition service, reduce the impact of the sound field environment on the audio playback effect, improve the wake-up rate of the electronic device and the recognition rate of ASR, and noticeably improve the listening effect.
  • the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the steps in the foregoing embodiments may be selectively executed according to actual conditions, may be partially executed, or may be completely executed, which is not limited here.
  • FIG. 6 is a schematic structural diagram of an electronic equipment control device provided by an embodiment of the present application.
  • an electronic equipment control device 600 includes one or more processors 601 and an interface circuit 602 .
  • the electronic device control apparatus 600 may also include a bus 603. Among them:
  • the processor 601 may be an integrated circuit chip and has signal processing capability. In the implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 601 or instructions in the form of software.
  • the above-mentioned processor 601 can be a general-purpose processor, a neural network processor (Neural Network Processing Unit, NPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the interface circuit 602 can be used for sending or receiving data, instructions or information.
  • the processor 601 can process the data, instructions or other information received by the interface circuit 602, and can send the processed information out through the interface circuit 602.
  • the electronic device control apparatus 600 further includes a memory, which may include a read-only memory and a random access memory, and provides operation instructions and data to the processor.
  • a portion of the memory may also include non-volatile random access memory (NVRAM).
  • the memory may be coupled with the processor 601 .
  • the memory stores executable software modules or data structures
  • the processor 601 can execute corresponding operations by calling operation instructions stored in the memory (the operation instructions can be stored in the operating system).
  • the interface circuit 602 may be used to output an execution result of the processor 601 .
  • the corresponding functions of the processor 601 and the interface circuit 602 can be realized by hardware design, software design, or a combination of software and hardware, which is not limited here.
  • the electronic device control apparatus 600 may be applied in the electronic device 100 shown in FIG. 2, but is not limited thereto.
  • processor in the embodiment of the present application may be a central processing unit (central processing unit, CPU), and may also be other general processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), field programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, transistor logic devices, hardware components or any combination thereof.
  • a general-purpose processor can be a microprocessor, or any conventional processor.
  • the method steps in the embodiments of the present application may be implemented by means of hardware, or may be implemented by means of a processor executing software instructions.
  • the software instructions can be composed of corresponding software modules, and the software modules can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be a component of the processor.
  • the processor and storage medium can be located in the ASIC.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in or transmitted via a computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, hard disk, or magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (SSD)), etc.


Abstract

An electronic device control method, relating to the technical field of AI. The method comprises: obtaining, by means of a camera, a first image of a first space where an electronic device is located, and obtaining, by means of a microphone, a first sound in the first space; determining spatial parameters of the first space according to the first image, and determining sound parameters corresponding to the first space according to the first sound; determining sound field environment parameters according to the spatial parameters and the sound parameters, wherein the sound field environment parameters comprise at least one of a target reverberation coefficient, a target absorption coefficient and a target size of the first space, and the target absorption coefficient is used for representing an absorption coefficient corresponding to the material of an object in the first space; and controlling the electronic device according to the sound field environment parameters. Therefore, unified models in the electronic device, such as a speech recognition model and audio playback parameters, can adapt to different sound field environments, and performance degradation of the unified models in various environments can be prevented.

Description

Electronic device control method and apparatus, and electronic device
This application claims priority to the Chinese patent application filed with the State Intellectual Property Office of China on January 14, 2022, with application number 202210042081.6 and the application title "Electronic device control method and apparatus, and electronic device", the entire content of which is incorporated into this application by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to an electronic device control method and apparatus, and an electronic device.
Background Art
Electronic devices such as smart screens and smart speakers are rapidly entering thousands of households, and people can use these devices to watch TV programs, listen to music, and so on. To make these devices more convenient to use, some commonly used speech recognition models, audio playback acoustic parameters and the like are usually preset in the device, such as a keyword spotting (KWS) model, an automatic speech recognition (ASR) model, equalization (EQ) parameters, dynamic range compression (DRC) parameters, and delay parameters of the transmission channel corresponding to each pickup (such as a microphone).
The speech recognition models, audio playback parameters and the like in current electronic devices are mainly obtained through debugging by means of laboratory scene simulation, acoustic environment simulation and the like. To suit general home scenarios, this approach selects models or parameters with good generalization and deploys them uniformly on the device side, so as to satisfy most users. However, each user's actual home environment differs in space size, overall layout and decoration materials, which leads to differences in the sound field environment. Because of these differences, a unified model may suffer performance degradation in various environments, affecting the user experience.
Summary of the Invention
The present application provides an electronic device control method and apparatus, an electronic device, a computer storage medium and a computer program product, which enable unified models in the electronic device, such as the speech recognition model and the audio playback parameters, to adapt to different sound field environments, avoiding performance degradation of these unified models in various environments and improving the user experience.
In a first aspect, the present application provides an electronic device control method, including: acquiring, through a camera, a first image of a first space where the electronic device is located, and acquiring, through a microphone, a first sound in the first space; determining spatial parameters of the first space according to the first image, and determining sound parameters corresponding to the first space according to the first sound, where the spatial parameters include a first size of the first space and the material types of objects in the first space, and the sound parameters include a first reverberation coefficient representing the amount of reverberation in the first space; determining sound field environment parameters according to the spatial parameters and the sound parameters, where the sound field environment parameters include at least one of a target reverberation coefficient, a target absorption coefficient and a target size of the first space, and the target absorption coefficient represents the absorption coefficient corresponding to the materials of the objects in the first space; and controlling the electronic device according to the sound field environment parameters.
In this way, by combining the visual and acoustic modalities, the results of visual and acoustic parameter estimation (that is, the spatial parameters and the sound parameters) verify each other, so that the acquired sound field environment parameters are more reliable. This lays a solid foundation for the subsequent control of the electronic device and can greatly improve the user experience: for example, it can effectively improve the voice recognition service, reduce the impact of the sound field environment on the audio playback effect, improve the wake-up rate of the electronic device and the recognition rate of ASR, and noticeably improve the listening effect.
In a possible implementation, the sound field environment parameter is the target reverberation coefficient, and determining the sound field environment parameter according to the spatial parameters and the sound parameters specifically includes: when the confidence degree of the first reverberation coefficient is greater than a first reverberation value, determining the target reverberation coefficient to be the first reverberation coefficient; when the confidence degree of the first reverberation coefficient is less than or equal to the first reverberation value and greater than a second reverberation value, obtaining a second reverberation coefficient according to the first size of the first space and the material types of the objects in the first space, and obtaining the target reverberation coefficient according to the first reverberation coefficient and the second reverberation coefficient; and when the confidence degree of the first reverberation coefficient is less than or equal to the second reverberation value, obtaining the target reverberation coefficient according to the first reverberation coefficient, the second reverberation coefficient, and the confidence degree of the first reverberation coefficient.
In a possible implementation, the sound field environment parameter is the target absorption coefficient, and determining the sound field environment parameter according to the spatial parameters and the sound parameters specifically includes: when the confidence degree of a first absorption coefficient is greater than a first absorption value, determining the target absorption coefficient to be the first absorption coefficient, where the first absorption coefficient is obtained according to the material types of the objects in the first space; when the confidence degree of the first absorption coefficient is less than or equal to the first absorption value and greater than a second absorption value, obtaining a second absorption coefficient according to the first size of the first space and the first reverberation coefficient, and obtaining the target absorption coefficient according to the first absorption coefficient and the second absorption coefficient; and when the confidence degree of the first absorption coefficient is less than or equal to the second absorption value, obtaining the target absorption coefficient according to the first absorption coefficient, the second absorption coefficient, and the confidence degree of the first absorption coefficient.
In a possible implementation, the sound field environment parameter is the target size of the first space, and determining the sound field environment parameter according to the spatial parameters and the sound parameters specifically includes: when the confidence degree of the first size of the first space is greater than a first size value, determining the target size to be the first size, where the first size is obtained according to the material types of the objects in the first space; when the confidence degree of the first size is less than or equal to the first size value and greater than a second size value, obtaining a second size according to the first reverberation coefficient and the material types of the objects in the first space, and obtaining the target size according to the first size and the second size; and when the confidence degree of the first size is less than or equal to the second size value, obtaining the target size according to the first size, the second size, and the confidence degree of the first size.
In a possible implementation, controlling the electronic device according to the sound field environment parameters specifically includes: determining, according to the sound field environment parameters, a target speech recognition model that matches the sound field environment parameters; and updating the speech recognition model in the electronic device to the target speech recognition model. In this way, when performing speech recognition, the electronic device can adaptively optimize the speech recognition model according to the sound field environment parameters of the current environment and use a speech recognition model that matches the current sound field environment, which adapts the speech recognition function to the user's actual use environment, avoids degradation of model recognition performance caused by differences in the sound field environment, guarantees a good speech recognition service experience, and improves the user experience.
In a possible implementation, controlling the electronic device according to the sound field environment parameters specifically includes: modeling the sound field environment where the electronic device is located according to the sound field environment parameters to obtain a space model of the first space; performing sound field simulation based on the space model to obtain a first frequency response curve corresponding to a target position in the first space; determining, based on the sound field environment parameters, a second frequency response curve matching the sound field environment parameters from a preset ideal acoustic frequency response library; and fitting the first frequency response curve to the second frequency response curve. In this way, when the electronic device plays sound, the audio playback parameters can be adjusted adaptively, so that the user's listening effect is optimal and the user experience is improved. Exemplarily, the target position may be a position where the loudness, sense of space, strength and clarity of the sound are all optimal in the current sound field environment.
In a possible implementation, controlling the electronic device according to the sound field environment parameters specifically includes: using the sound field environment parameters as the input of an enhancement algorithm for processing voice data in the electronic device. In this way, when the user makes a voice call through the electronic device, the enhancement algorithm adaptively enhances the user's voice signal during the call according to the input sound field environment parameters, so as to improve the call quality and the user experience.
In a second aspect, the present application provides an electronic device control apparatus, including: at least one memory for storing a program; and at least one processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to perform the method provided in the first aspect.
In a third aspect, the present application provides an electronic device, including at least one memory for storing a program and at least one processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to perform the method provided in the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when run on an electronic device, causes the electronic device to perform the method provided in the first aspect.
In a fifth aspect, the present application provides a computer program product which, when run on an electronic device, causes the electronic device to perform the method provided in the first aspect.
It can be understood that, for the beneficial effects of the second to fifth aspects, reference may be made to the relevant description in the first aspect, which is not repeated here.
Brief Description of the Drawings
The following briefly introduces the drawings used in the description of the embodiments or the prior art.
FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of an electronic device control method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of steps for controlling an electronic device according to sound field environment parameters provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of another set of steps for controlling an electronic device according to sound field environment parameters provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of the hardware structure of an electronic device control apparatus provided by an embodiment of the present application.
具体实施方式Detailed ways
本文中术语“和/或”，是一种描述关联对象的关联关系，表示可以存在三种关系，例如，A和/或B，可以表示：单独存在A，同时存在A和B，单独存在B这三种情况。本文中符号“/”表示关联对象是或者的关系，例如A/B表示A或者B。The term "and/or" in this document describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. The symbol "/" in this document indicates an "or" relationship between the associated objects; for example, A/B means A or B.
本文中的说明书和权利要求书中的术语“第一”和“第二”等是用于区别不同的对象,而不是用于描述对象的特定顺序。例如,第一响应消息和第二响应消息等是用于区别不同的响应消息,而不是用于描述响应消息的特定顺序。The terms "first" and "second" and the like in the specification and claims herein are used to distinguish different objects, rather than to describe a specific order of objects. For example, the first response message and the second response message are used to distinguish different response messages, rather than describing a specific order of the response messages.
在本申请实施例中，“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。确切而言，使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念。In the embodiments of the present application, words such as "exemplary" or "for example" are used to indicate an example, an instance, or an illustration. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application shall not be interpreted as more preferred or advantageous than other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete manner.
在本申请实施例的描述中，除非另有说明，“多个”的含义是指两个或者两个以上，例如，多个处理单元是指两个或者两个以上的处理单元等；多个元件是指两个或者两个以上的元件等。In the description of the embodiments of the present application, unless otherwise specified, "multiple" means two or more; for example, multiple processing units refer to two or more processing units, and multiple elements refer to two or more elements.
示例性的，图1示出了一种应用场景的示意图。如图1所示，在房间200中设置有电子设备100，在电子设备100上可以但不限于设置有摄像头110、麦克风120和扬声器130。电子设备100可以在房间200中进行声音识别并响应，也可以播放声音，等等。示例性的，该应用场景可以理解为是室内场景。其中，电子设备100可以但不限于为智能电视，本申请实施例中所指的智能电视可以是能与移动设备例如智能手机、平板电脑等进行交互的电视或其他具有大屏的电子设备，例如智能手机中的用户界面可以通过无线方式传输并在智能电视中呈现，用户在智能电视中的操作也可以影响智能手机。Exemplarily, Fig. 1 shows a schematic diagram of an application scenario. As shown in FIG. 1, an electronic device 100 is provided in a room 200, and a camera 110, a microphone 120 and a speaker 130 may be provided on the electronic device 100, but not limited thereto. The electronic device 100 can recognize and respond to sounds in the room 200, can also play sounds, and so on. Exemplarily, this application scenario can be understood as an indoor scene. The electronic device 100 may be, but is not limited to, a smart TV. The smart TV referred to in the embodiments of the present application may be a TV capable of interacting with mobile devices such as smart phones and tablet computers, or another electronic device with a large screen; for example, the user interface of a smart phone can be transmitted wirelessly and presented on the smart TV, and the user's operations on the smart TV can also affect the smart phone.
在一些实施例中,图1中所示的电子设备100也可以替换为其他的电子设备,替换后的方案仍在本申请的保护范围内。示例性的,电子设备100可以为手机、平板电脑、桌面型计算机、膝上型计算机、手持计算机、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本,以及蜂窝电话、个人数字助理(personal digital assistant,PDA)、增强现实(augmented reality,AR)设备、虚拟现实(virtual reality,VR)设备、人工智能(artificial intelligence,AI)设备、可穿戴式设备和/或智能家居设备,本申请实施例对该电子设备100的具体类型不作特殊限制。In some embodiments, the electronic device 100 shown in FIG. 1 can also be replaced with other electronic devices, and the replaced solution is still within the protection scope of the present application. Exemplarily, the electronic device 100 can be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, and a cell phone, a personal Personal digital assistant (PDA), augmented reality (AR) device, virtual reality (VR) device, artificial intelligence (AI) device, wearable device and/or smart home device , the embodiment of the present application does not specifically limit the specific type of the electronic device 100 .
示例性的,图2示出了电子设备100的结构示意图。如图2所示,该电子设备100可以包括:摄像头110、麦克风120、扬声器130、处理器140、存储器150、收发单元160和显示屏170。Exemplarily, FIG. 2 shows a schematic structural diagram of the electronic device 100 . As shown in FIG. 2 , the electronic device 100 may include: a camera 110 , a microphone 120 , a speaker 130 , a processor 140 , a memory 150 , a transceiver unit 160 and a display screen 170 .
其中，摄像头110用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号，之后将电信号传递给处理器140加工处理，以得到标准的RGB,YUV等格式的图像信号。在一些实施例中，电子设备100可以包括1个或N个摄像头110，N为大于1的正整数。示例性的，摄像头110可以用于采集电子设备100所处的环境中的图像。在一些实施例中，摄像头110和电子设备100可以单独设置，也可以集成在一起。Wherein, the camera 110 is used to capture still images or videos. The object generates an optical image through the lens and projects it onto the photosensitive element. The photosensitive element can be a charge coupled device (charge coupled device, CCD) or a complementary metal-oxide-semiconductor (complementary metal-oxide-semiconductor, CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the processor 140 for processing, so as to obtain image signals in standard RGB, YUV and other formats. In some embodiments, the electronic device 100 may include 1 or N cameras 110, where N is a positive integer greater than 1. Exemplarily, the camera 110 may be used to collect images of the environment where the electronic device 100 is located. In some embodiments, the camera 110 and the electronic device 100 can be set separately or integrated together.
麦克风120,也称“话筒”,“传声器”,用于将声音信号转换为电信号。电子设备100可以设置至少一个麦克风120。在另一些实施例中,电子设备100可以设置两个麦克风120,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风120,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。示例性的,麦克风120可以用于采集环境中的声音信号,比如用户发出的声音等。在一些实施例中,麦克风120和电子设备100可以单独设置,也可以集成在一起。The microphone 120, also called "microphone" or "microphone", is used to convert sound signals into electrical signals. The electronic device 100 may be provided with at least one microphone 120 . In other embodiments, the electronic device 100 may be provided with two microphones 120, which may also implement a noise reduction function in addition to collecting sound signals. In some other embodiments, the electronic device 100 can also be provided with three, four or more microphones 120 to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions, etc. Exemplarily, the microphone 120 may be used to collect sound signals in the environment, such as the sound emitted by the user. In some embodiments, the microphone 120 and the electronic device 100 can be set separately or integrated together.
扬声器130,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器130播放声音等。在一些实施例中,扬声器130和电子设备100可以单独设置,也可以集成在一起。The speaker 130, also called "horn", is used to convert audio electrical signals into sound signals. The electronic device 100 may play sound or the like through the speaker 130 . In some embodiments, the speaker 130 and the electronic device 100 can be set separately or integrated together.
处理器140可以是通用处理器或者专用处理器。例如,处理器140可以包括中央处理器(central processing unit,CPU)和/或基带处理器。其中,基带处理器可以用于处理通信数据,CPU可以用于实现相应的控制和处理功能,执行软件程序,处理软件程序的数据。Processor 140 may be a general purpose processor or a special purpose processor. For example, the processor 140 may include a central processing unit (central processing unit, CPU) and/or a baseband processor. Wherein, the baseband processor can be used to process communication data, and the CPU can be used to implement corresponding control and processing functions, execute software programs, and process data of the software programs.
存储器150上可以存有程序(也可以是指令或者代码)，程序可被处理器140运行，使得处理器140执行本方案中描述的方法。可选地，存储器150中还可以存储有数据。可选地，处理器140还可以读取存储器150中存储的数据(例如，唤醒词检测模型、语音识别模型、均衡参数、动态范围控制参数、各个麦克风对应的传输通道时延参数等)，该数据可以与程序存储在相同的存储地址，该数据也可以与程序存储在不同的存储地址。本方案中，处理器140和存储器150可以单独设置，也可以集成在一起，例如，集成在单板或者系统级芯片(system on chip,SOC)上。A program (which may also be instructions or code) may be stored in the memory 150, and the program may be executed by the processor 140, so that the processor 140 performs the method described in this solution. Optionally, data may also be stored in the memory 150. Optionally, the processor 140 may also read the data stored in the memory 150 (for example, a wake-up word detection model, a speech recognition model, equalization parameters, dynamic range control parameters, transmission channel delay parameters corresponding to each microphone, etc.); the data may be stored at the same memory address as the program, or at a different memory address from the program. In this solution, the processor 140 and the memory 150 can be set separately, or can be integrated together, for example, integrated on a single board or a system on chip (system on chip, SOC).
在一些实施例中，电子设备100上还可以包括收发单元160。收发单元160可以实现信号的输入(接收)和输出(发送)。例如，收发单元160可以包括收发器或射频芯片。收发单元160还可以包括通信接口。示例性的，电子设备100可以通过收发单元160与服务器(图中未示出)通信，以从服务器处获取到所需的数据，比如语音识别模型等。In some embodiments, the electronic device 100 may further include a transceiver unit 160. The transceiver unit 160 may implement input (reception) and output (transmission) of signals. For example, the transceiver unit 160 may include a transceiver or a radio frequency chip. The transceiver unit 160 may also include a communication interface. Exemplarily, the electronic device 100 can communicate with a server (not shown in the figure) through the transceiver unit 160, so as to obtain required data from the server, such as a speech recognition model and the like.
在一些实施例中，电子设备100上还可以包括显示屏170。该显示屏170可以用于显示图像，视频等。该显示屏170可以包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD)，有机发光二极管(organic light-emitting diode,OLED)，有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED)，柔性发光二极管(flex light-emitting diode,FLED)，Miniled，MicroLed，Micro-oLed，量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中，电子设备100可以包括1个或N个显示屏，N为大于1的正整数。In some embodiments, the electronic device 100 may further include a display screen 170. The display screen 170 can be used to display images, videos and the like. The display screen 170 may include a display panel. The display panel can be a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light emitting diode, AMOLED), a flexible light-emitting diode (flex light-emitting diode, FLED), Miniled, MicroLed, Micro-oLed, quantum dot light emitting diodes (quantum dot light emitting diodes, QLED), etc. In some embodiments, the electronic device 100 may include 1 or N display screens, where N is a positive integer greater than 1.
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。It can be understood that, the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100 . In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown in the figure, or combine certain components, or separate certain components, or arrange different components. The illustrated components can be realized in hardware, software or a combination of software and hardware.
接下来基于上文所描述的内容,对本申请提供的一种电子设备控制方法进行介绍。Next, based on the content described above, an electronic device control method provided by the present application will be introduced.
示例性的，图3示出了一种电子设备控制方法的流程示意图。图3中所涉及的电子设备可以为上文所描述的电子设备100。可以理解，该方法可以通过任何具有计算、处理能力的装置、设备、平台、设备集群来执行。例如可以由图2中所示的电子设备100执行，也可以由服务器等设备执行。为便于描述，下面以电子设备执行为例进行说明，如图3所示，该电子设备控制方法可以包括以下步骤：Exemplarily, FIG. 3 shows a schematic flowchart of a method for controlling an electronic device. The electronic device involved in FIG. 3 may be the electronic device 100 described above. It can be understood that the method may be executed by any apparatus, device, platform, or device cluster that has computing and processing capabilities. For example, it may be executed by the electronic device 100 shown in FIG. 2, or by a device such as a server. For ease of description, execution by the electronic device is taken as an example below. As shown in FIG. 3, the electronic device control method may include the following steps:
S301、通过摄像头获取电子设备所处第一空间的第一图像。S301. Acquire a first image of a first space where the electronic device is located by using a camera.
具体地,电子设备可以通过与其配套的摄像头获取到电子设备所处的第一空间的第一图像。Specifically, the electronic device may obtain a first image of the first space where the electronic device is located through a camera matched with the electronic device.
在一些实施例中,当该方法由服务器等设备执行时,电子设备获取到第一图像后,可以将该第一图像发送至服务器等设备。In some embodiments, when the method is executed by a device such as a server, after the electronic device obtains the first image, the first image may be sent to the device such as the server.
S302、通过麦克风获取第一空间中的第一声音。S302. Acquire a first sound in the first space through a microphone.
具体地,电子设备可以通过与其配套的麦克风获取到第一空间中的第一声音。示例性的,第一声音可以为用户发出的声音。Specifically, the electronic device may acquire the first sound in the first space through a microphone matched with it. Exemplarily, the first sound may be a sound made by a user.
在一些实施例中，在S301和/或S302之前，用户可以向电子设备下发进行声场环境参数优化的指令，电子设备获取到该指令后，可以启动与其配套的摄像头和/或麦克风，以获取到第一图像和/或第一声音。示例性的，电子设备在启动与其配套的麦克风后，可以提示用户发出声音，比如语音提示、图像提示、文字提示等等，以使得麦克风可以采集到用户发出的声音。In some embodiments, before S301 and/or S302, the user may issue an instruction to the electronic device to optimize the sound field environment parameters; after obtaining the instruction, the electronic device may start its matching camera and/or microphone to obtain the first image and/or the first sound. Exemplarily, after starting its matching microphone, the electronic device may prompt the user to make a sound, for example by a voice prompt, an image prompt, a text prompt, etc., so that the microphone can collect the sound made by the user.
在一些实施例中,当该方法由服务器等设备执行时,电子设备获取到第一声音后,可以将该第一声音发送至服务器等设备。In some embodiments, when the method is executed by a device such as a server, after the electronic device acquires the first sound, the first sound may be sent to the device such as the server.
S303、根据第一图像,确定第一空间的空间参数。S303. Determine spatial parameters of the first space according to the first image.
具体地,获取到第一图像后,可以将该第一图像输入至预先训练的与图像处理相关的神经网络模型,以得到第一空间的空间参数。示例性的,空间参数可以包括:第一空间的第一大小和第一空间内的物体的材料类型。示例性的,第一大小可以为第一空间的尺寸(比如:体积等)的大小。示例性的,第一图像可以为一个,也可以为多个。Specifically, after the first image is acquired, the first image may be input to a pre-trained neural network model related to image processing to obtain spatial parameters of the first space. Exemplarily, the space parameters may include: a first size of the first space and a material type of objects in the first space. Exemplarily, the first size may be the size of the first space (such as volume, etc.). Exemplarily, there may be one or more first images.
S304、根据第一声音,确定第一空间对应的声音参数。S304. Determine sound parameters corresponding to the first space according to the first sound.
具体地，获取到第一声音后，可以将该第一声音输入至预先训练的与声音处理相关的神经网络模型，以得到第一空间对应的声音参数。示例性的，声音参数可以包括用于表征第一空间中混响大小的第一混响系数。示例性的，第一混响系数可以为T60，即声音在声场中衰减60dB所需的时间。Specifically, after the first sound is acquired, the first sound may be input into a pre-trained neural network model related to sound processing, so as to obtain sound parameters corresponding to the first space. Exemplarily, the sound parameters may include a first reverberation coefficient used to characterize the magnitude of reverberation in the first space. Exemplarily, the first reverberation coefficient may be T60, that is, the time required for the sound to decay by 60 dB in the sound field.
S305、根据空间参数和声音参数,确定声场环境参数。S305. Determine a sound field environment parameter according to the space parameter and the sound parameter.
具体地,获取到空间参数和声音参数后,可以根据空间参数和声音参数,确定出声场环境参数。示例性的,声场环境参数可以包括目标混响系数、目标吸收系数和第一空间的目标大小。在一些实施例中,声场环境参数中还可以包括均衡EQ参数。Specifically, after the space parameter and the sound parameter are acquired, the sound field environment parameter may be determined according to the space parameter and the sound parameter. Exemplarily, the sound field environment parameters may include a target reverberation coefficient, a target absorption coefficient, and a target size of the first space. In some embodiments, the sound field environment parameters may also include equalization EQ parameters.
在一些实施例中,下面分别对根据空间参数和声音参数,确定目标混响系数、目标吸收系数和第一空间的目标大小进行说明。In some embodiments, the determination of the target reverberation coefficient, the target absorption coefficient, and the target size of the first space according to the space parameter and the sound parameter will be described respectively below.
a)目标混响系数a) Target reverberation coefficient
若第一混响系数的置信度大于第一混响值,可以将该第一混响系数作为目标混响系数。其中,第一混响系数的置信度可以由输出第一混响系数的神经网络模型一并输出。示例性的,第一混响值可以为0.9。If the confidence degree of the first reverberation coefficient is greater than the first reverberation value, the first reverberation coefficient may be used as the target reverberation coefficient. Wherein, the confidence degree of the first reverberation coefficient may be output together by the neural network model that outputs the first reverberation coefficient. Exemplarily, the first reverberation value may be 0.9.
若第一混响系数的置信度小于或等于第一混响值，且大于第二混响值，可以先由第一空间的第一大小和第一空间内的物体的材料类型计算得到第二混响系数，然后，再由第一混响系数和第二混响系数，得到目标混响系数。示例性的，第二混响值可以为0.6。示例性的，计算第二混响系数的公式可以为：If the confidence of the first reverberation coefficient is less than or equal to the first reverberation value and greater than the second reverberation value, a second reverberation coefficient may first be calculated from the first size of the first space and the material types of the objects in the first space, and the target reverberation coefficient may then be obtained from the first reverberation coefficient and the second reverberation coefficient. Exemplarily, the second reverberation value may be 0.6. Exemplarily, the formula for calculating the second reverberation coefficient may be:
RT≈0.161×V÷S       (公式1) RT≈0.161×V÷S (Formula 1)
其中,RT为混响系数,V为第一空间的大小,S为第一空间内各个材料的吸收系数的平均值。对于第一空间内材料的吸收系数,可以在得到材料的类型后,查询材料类型与材料的吸收系数之间的关系表得到。Wherein, RT is the reverberation coefficient, V is the size of the first space, and S is the average value of the absorption coefficient of each material in the first space. The absorption coefficient of the material in the first space can be obtained by querying the relationship table between the material type and the absorption coefficient of the material after obtaining the type of the material.
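Exemplarily, the calculation of Formula 1 can be sketched in Python as follows. The material-to-absorption-coefficient lookup table and all numeric values are illustrative assumptions for this sketch, not values specified in this application:

```python
# Illustrative material -> absorption-coefficient lookup table (assumed values,
# standing in for the relationship table between material type and absorption).
ABSORPTION_TABLE = {"curtain": 0.35, "wood": 0.10, "glass": 0.05}

def estimate_reverberation(volume, material_types, table=ABSORPTION_TABLE):
    """Second reverberation coefficient per Formula 1: RT ≈ 0.161 × V ÷ S,
    where S is the average absorption coefficient of the detected materials."""
    coeffs = [table[m] for m in material_types]
    s = sum(coeffs) / len(coeffs)
    return 0.161 * volume / s
```

For example, a 10 m³ space whose detected materials are all wood (assumed coefficient 0.10) would yield RT ≈ 16.1 under this formula.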
示例性的,由第一混响系数和第二混响系数得到目标混响系数,具体可以为将第一混响系数和第二混响系数的平均值作为目标混响系数。Exemplarily, the target reverberation coefficient is obtained from the first reverberation coefficient and the second reverberation coefficient, specifically, an average value of the first reverberation coefficient and the second reverberation coefficient may be used as the target reverberation coefficient.
若第一混响系数的置信度小于或等于第二混响值,可以由第一混响系数和第二混响系数,得到目标混响系数。示例性的,得到目标混响系数的公式可以为:If the confidence degree of the first reverberation coefficient is less than or equal to the second reverberation value, the target reverberation coefficient may be obtained from the first reverberation coefficient and the second reverberation coefficient. Exemplarily, the formula for obtaining the target reverberation coefficient may be:
RT =(m/2)×RT 1+(1-m/2)×RT 2     (公式2) RT mesh =(m/2)×RT 1 +(1-m/2)×RT 2 (Formula 2)
其中,RT 为目标混响系数,RT 1为第一混响系数,RT 2为第二混响系数,m为第一混响系数(即RT 1)的置信度。 Wherein, RTme is the target reverberation coefficient, RT 1 is the first reverberation coefficient, RT 2 is the second reverberation coefficient, and m is the confidence degree of the first reverberation coefficient (ie RT 1 ).
b)目标吸收系数b) Target Absorption Coefficient
若第一吸收系数的置信度大于第一吸收值,可以将该第一吸收系数作为目标吸收系数。其中,第一吸收系数的置信度可以由输出第一吸收系数的神经网络模型一并输出。示例性的,第一吸收值可以为0.8。示例性的,第一吸收系数可以为第一空间内各个物体的材料对应的吸收系数的平均值。示例性的,目标吸收系数可以用于表征第一空间内的物体(所有物体或者摄像头采集到的物体)的材料对应的吸收系数。If the confidence of the first absorption coefficient is greater than the first absorption value, the first absorption coefficient may be used as the target absorption coefficient. Wherein, the confidence degree of the first absorption coefficient may be output together by the neural network model that outputs the first absorption coefficient. Exemplarily, the first absorption value may be 0.8. Exemplarily, the first absorption coefficient may be an average value of absorption coefficients corresponding to materials of objects in the first space. Exemplarily, the target absorption coefficient may be used to characterize the absorption coefficient corresponding to the material of the objects (all objects or objects collected by the camera) in the first space.
若第一吸收系数的置信度小于或等于第一吸收值，且大于第二吸收值，可以先由第一混响系数和第一空间的大小计算得到第二吸收系数，然后，再由第一吸收系数和第二吸收系数，得到目标吸收系数。示例性的，第二吸收值可以为0.5。示例性的，可以通过上述“公式1”对第一混响系数和第一空间的大小进行计算，以得到第二吸收系数。If the confidence of the first absorption coefficient is less than or equal to the first absorption value and greater than the second absorption value, a second absorption coefficient may first be calculated from the first reverberation coefficient and the size of the first space, and the target absorption coefficient may then be obtained from the first absorption coefficient and the second absorption coefficient. Exemplarily, the second absorption value may be 0.5. Exemplarily, the first reverberation coefficient and the size of the first space may be calculated by the above "Formula 1" to obtain the second absorption coefficient.
示例性的,由第一吸收系数和第二吸收系数得到目标吸收系数,具体可以为将第一吸收系数和第二吸收系数的平均值作为目标吸收系数。Exemplarily, the target absorption coefficient is obtained from the first absorption coefficient and the second absorption coefficient, specifically, the average value of the first absorption coefficient and the second absorption coefficient may be used as the target absorption coefficient.
若第一吸收系数的置信度小于或等于第二吸收值,可以由第一吸收系数和第二吸收系数,得到目标吸收系数。示例性的,得到目标吸收系数的公式可以为:If the confidence level of the first absorption coefficient is less than or equal to the second absorption value, the target absorption coefficient can be obtained from the first absorption coefficient and the second absorption coefficient. Exemplarily, the formula for obtaining the target absorption coefficient may be:
Ab =(n/2)×Ab 1+(1-n/2)Ab 2     (公式3) Ab mesh =(n/2)×Ab 1 +(1-n/2)Ab 2 (Formula 3)
其中,Ab 为目标吸收系数,Ab 1为第一吸收系数,Ab 2为第二吸收系数,n为第一吸收系数(即Ab 1)的置信度。 Wherein, Ab mesh is the target absorption coefficient, Ab 1 is the first absorption coefficient, Ab 2 is the second absorption coefficient, and n is the confidence degree of the first absorption coefficient (ie Ab 1 ).
c)第一空间的目标大小c) The target size of the first space
若第一空间的第一大小的置信度大于第一尺寸值,可以将该第一大小作为目标大小。其中,第一大小的置信度可以由输出第一大小的神经网络模型一并输出。示例性的,第一尺寸值可以为0.8。If the confidence of the first size of the first space is greater than the first size value, the first size may be used as the target size. Wherein, the confidence degree of the first size may be output together by the neural network model outputting the first size. Exemplarily, the first size value may be 0.8.
若第一大小的置信度小于或等于第一尺寸值，且大于第二尺寸值，可以先由第一混响系数和第一吸收系数计算得到第二大小，然后，再由第一大小和第二大小，得到目标大小。示例性的，第二尺寸值可以为0.5。示例性的，可以通过上述“公式1”对第一混响系数和第一吸收系数进行计算，以得到第二大小。If the confidence of the first size is less than or equal to the first size value and greater than the second size value, a second size may first be calculated from the first reverberation coefficient and the first absorption coefficient, and the target size may then be obtained from the first size and the second size. Exemplarily, the second size value may be 0.5. Exemplarily, the first reverberation coefficient and the first absorption coefficient may be calculated by the above "Formula 1" to obtain the second size.
示例性的,由第一大小和第二大小得到目标大小,具体可以为将第一大小和第二大小的平均值作为目标大小。Exemplarily, the target size is obtained from the first size and the second size, specifically, the average value of the first size and the second size may be used as the target size.
若第一大小的置信度小于或等于第二尺寸值,可以由第一大小和第二大小,得到目标大小。示例性的,得到目标大小的公式可以为:If the confidence of the first size is less than or equal to the value of the second size, the target size can be obtained from the first size and the second size. Exemplarily, the formula for obtaining the target size may be:
V =(p/2)×AV 1+(1-p/2)×V 2     (公式3) Vmesh = (p/2)×AV 1 +(1-p/2)×V 2 (Formula 3)
其中,V 为目标大小,V 1为第一大小,V 2为第二大小,p为第一大小(即V 1)的置信度。 Wherein, Vmesh is the target size, V 1 is the first size, V 2 is the second size, and p is the confidence of the first size (ie V 1 ).
这样通过上述对由视觉获取到的空间参数和由声学获取到的声音参数进行一致性校验,以提升获取到的声场环境参数的准确度。In this way, the accuracy of the acquired sound field environment parameters is improved through the above-mentioned consistency check of the spatial parameters acquired by vision and the sound parameters acquired by acoustics.
在确定出声场环境参数后,可以执行S306。After the environmental parameters of the sound field are determined, S306 may be executed.
S306、根据声场环境参数,对电子设备进行控制。S306. Control the electronic device according to the sound field environment parameters.
具体地,在确定出声场环境参数,即可以根据该声场环境参数对电子设备进行控制,从而使得电子设备能够更好的适应当前的声场环境(即当前的空间)。Specifically, when the sound field environment parameter is determined, the electronic device can be controlled according to the sound field environment parameter, so that the electronic device can better adapt to the current sound field environment (ie, the current space).
作为一种可能的实现方式,当电子设备中设置有语音识别模型时,可以由声场环境参数获取到与该声场环境参数相匹配的语音识别模型。具体地,如图4所示,包括以下步骤:As a possible implementation manner, when the voice recognition model is set in the electronic device, the voice recognition model matching the sound field environment parameter can be acquired from the sound field environment parameter. Specifically, as shown in Figure 4, the following steps are included:
S401、电子设备向服务器发送第一消息,第一消息中包括声场环境参数,第一消息用于请求获取与该声场环境参数相匹配的语音识别模型。S401. The electronic device sends a first message to a server, where the first message includes a sound field environment parameter, and the first message is used to request acquisition of a speech recognition model matching the sound field environment parameter.
S402、服务器根据声场环境参数,确定出与该声场环境参数相匹配的目标语音识别模型。S402. The server determines a target speech recognition model matching the sound field environment parameters according to the sound field environment parameters.
具体地,在服务器中可以预置有不同声场环境参数对应的语音识别模型。当服务器获取到电子设备发送的声场环境参数后,服务器可以由该声场环境参数从其预置的语音识别模型中确定出目标语音识别模型。Specifically, speech recognition models corresponding to different sound field environment parameters may be preset in the server. After the server acquires the sound field environment parameters sent by the electronic device, the server can determine the target speech recognition model from its preset speech recognition models based on the sound field environment parameters.
示例性的，可以预先设定声场环境参数中每个子参数的权重值，然后，再计算服务器获取到的声场环境参数与其预先存储的各个声场环境参数之间的匹配度，最后选取匹配度最高的一个声场环境参数对应的语音识别模型作为目标语音识别模型。其中，可以通过以下“公式4”计算匹配度，该公式为：Exemplarily, a weight value may be preset for each sub-parameter of the sound field environment parameters; then, the matching degree between the sound field environment parameters acquired by the server and each set of sound field environment parameters pre-stored in the server is calculated; finally, the speech recognition model corresponding to the sound field environment parameters with the highest matching degree is selected as the target speech recognition model. The matching degree can be calculated by the following "Formula 4":
f=|RT -RT |×α+|V -V |×β+|Ab -Ab |×γ+|EQ -EQ |×δ+ε   (公 式4) f=|RT terminal -RT cloud |×α+|V terminal -V cloud |×β+|Ab terminal -Ab cloud |×γ+|EQ terminal -EQ cloud |×δ+ε (Formula 4)
其中，f为匹配度，RT端为服务器获取的电子设备发送的声场环境参数中的混响系数，RT云为服务器中预置的声场环境参数中的混响系数，V端为服务器获取的电子设备发送的声场环境参数中的空间的大小，V云为服务器中预置的声场环境参数中的空间的大小，Ab端为服务器获取的电子设备发送的声场环境参数中的吸收系数，Ab云为服务器中预置的声场环境参数中的吸收系数，EQ端为服务器获取的电子设备发送的声场环境参数中的EQ参数的值，EQ云为服务器中预置的声场环境参数中的EQ参数的值，α、β、γ、δ、ε分别为预先设置的权重值。该公式中的各个参数可以根据实际情况选取，此处不做限定。Wherein, f is the matching degree; RT端 is the reverberation coefficient in the sound field environment parameters sent by the electronic device and acquired by the server; RT云 is the reverberation coefficient in the sound field environment parameters preset in the server; V端 is the space size in the sound field environment parameters sent by the electronic device and acquired by the server; V云 is the space size in the sound field environment parameters preset in the server; Ab端 is the absorption coefficient in the sound field environment parameters sent by the electronic device and acquired by the server; Ab云 is the absorption coefficient in the sound field environment parameters preset in the server; EQ端 is the value of the EQ parameter in the sound field environment parameters sent by the electronic device and acquired by the server; EQ云 is the value of the EQ parameter in the sound field environment parameters preset in the server; and α, β, γ, δ, ε are preset weight values. Each parameter in the formula can be selected according to the actual situation, which is not limited here.
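Exemplarily, the model selection based on Formula 4 can be sketched as follows. The weight values and candidate parameter sets are illustrative assumptions; and since f aggregates absolute parameter differences, the sketch treats the candidate with the smallest f as the closest match:

```python
WEIGHTS = {"RT": 1.0, "V": 0.01, "Ab": 1.0, "EQ": 0.1}   # assumed α, β, γ, δ
EPSILON = 0.0                                            # assumed ε

def matching_degree(device_params, cloud_params, weights=WEIGHTS, eps=EPSILON):
    """Formula 4: f = Σ |device − cloud| × weight + ε over RT, V, Ab, EQ."""
    return sum(abs(device_params[k] - cloud_params[k]) * w
               for k, w in weights.items()) + eps

def select_model(device_params, candidates):
    """Pick the preset (model_id, params) pair whose parameters are closest."""
    return min(candidates, key=lambda c: matching_degree(device_params, c[1]))[0]
```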
S403、服务器向电子设备发送第二消息,第二消息中包括目标语音识别模型。S403. The server sends a second message to the electronic device, where the second message includes the target speech recognition model.
S404、电子设备利用目标语音识别模型进行语音识别。S404. The electronic device uses the target speech recognition model to perform speech recognition.
在一些实施例中，S401至S404也可以称之为：根据声场环境参数，确定与声场环境参数相匹配的目标语音识别模型；将电子设备中的语音识别模型更新为目标语音识别模型。In some embodiments, S401 to S404 may also be described as: determining, according to the sound field environment parameters, a target speech recognition model matching the sound field environment parameters; and updating the speech recognition model in the electronic device to the target speech recognition model.
这样，电子设备即可以在进行语音识别时，根据当前的环境中的声场环境参数自适应优化语音识别模型，以及使用与当前的声场环境相匹配的语音识别模型进行语音识别，实现了语音识别功能对用户实际使用环境的自适应，避免了由于声场环境差异导致模型识别性能退化的情况，为良好的语音识别服务体验提供了保障，改善用户的使用体验。In this way, when performing speech recognition, the electronic device can adaptively optimize the speech recognition model according to the sound field environment parameters of the current environment and perform speech recognition using a model matched to the current sound field environment. This realizes the adaptation of the speech recognition function to the user's actual use environment, avoids the degradation of recognition performance caused by differences in the sound field environment, guarantees a good speech recognition service experience, and improves the user experience.
作为另一种可能的实现方式，当电子设备在播放声音时，可以由声场环境参数计算电子设备所在环境的声场分布图，根据声场分布图并结合人工智能搜索算法对音频播放效果进行自适应调参，使得用户的听音效果达到最佳。具体地，如图5所示，包括以下步骤：As another possible implementation, when the electronic device is playing sound, a sound field distribution map of the environment where the electronic device is located can be calculated from the sound field environment parameters, and the audio playback parameters can be adaptively tuned according to the sound field distribution map in combination with an artificial intelligence search algorithm, so that the user's listening experience is optimal. Specifically, as shown in Figure 5, the following steps are included:
S501、根据声场环境参数,对当前的声场环境进行建模,以得到电子设备所处的第一空间的空间模型。S501. Model the current sound field environment according to the sound field environment parameters, so as to obtain a space model of the first space where the electronic device is located.
具体地，在建模时，可以但不限于通过预置声场建模方式(比如开源pyroom库等)，以及声场环境参数中所包括的空间的大小和空间内各个物体的吸收系数进行空间建模，从而完成对当前声场环境的建模，从而得到电子设备所处的第一空间的空间模型。Specifically, the modeling can be performed, but is not limited to, by a preset sound field modeling method (such as the open source pyroom library), using the space size and the absorption coefficients of the objects in the space included in the sound field environment parameters, thereby completing the modeling of the current sound field environment and obtaining the space model of the first space where the electronic device is located.
S502、基于得到的空间模型进行声场模拟,得到目标位置处对应的第一频响曲线。S502. Perform sound field simulation based on the obtained spatial model, to obtain a first frequency response curve corresponding to the target position.
具体地,得到第一空间的空间模型后,可以利用声场模拟技术在空间模型中进行声场模拟,以得到目标位置处对应的第一频响曲线。示例性的,目标位置可以为在当前的声场环境下声音的响度、空间感、力度、清晰度均最优的位置。Specifically, after the space model of the first space is obtained, sound field simulation may be performed in the space model by using sound field simulation technology, so as to obtain the first frequency response curve corresponding to the target position. Exemplarily, the target position may be a position where the loudness, sense of space, strength, and clarity of the sound are optimal in the current sound field environment.
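The frequency response curve at the target position can be derived from the impulse response that the sound field simulation produces at that position. The sketch below is a hypothetical stand-in (the patent does not specify the simulation code): it uses a synthetic exponentially decaying noise burst in place of a simulated room impulse response and takes the magnitude of its Fourier transform.

```python
import numpy as np

# Sketch (assumed, not the patent's simulation): the frequency response
# at a position is the magnitude spectrum of the room impulse response
# (RIR) obtained there from the space model.

def frequency_response(rir, fs, n_fft=1024):
    """Return (freqs_hz, magnitude_db) for an impulse response."""
    spectrum = np.fft.rfft(rir, n=n_fft)
    mag_db = 20 * np.log10(np.abs(spectrum) + 1e-12)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, mag_db

# Stand-in RIR: exponentially decaying noise, which a real RIR
# resembles after the direct path and early reflections.
fs = 16000
rng = np.random.default_rng(0)
t = np.arange(4096) / fs
rir = rng.standard_normal(4096) * np.exp(-t / 0.2)
freqs, curve = frequency_response(rir, fs)
```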
S503、基于得到的声场环境参数,从预置的理想声学频响库中确定出与该声场环境参数相匹配的第二频响曲线。S503. Based on the obtained sound field environment parameters, determine a second frequency response curve matching the sound field environment parameters from a preset ideal acoustic frequency response library.
具体地,可以基于得到的声场环境参数,从预置的理想声学频响库确定出与该声场环境参数相匹配的第二频响曲线。示例性的,可以但不限于通过前述的“公式4”确定得到的声场环境参数与理想声学频响库中各个频响曲线对应的声场环境参数之间的匹配度。Specifically, based on the obtained sound field environment parameters, a second frequency response curve matching the sound field environment parameters may be determined from a preset ideal acoustic frequency response library. Exemplarily, but not limited to, the matching degree between the obtained sound field environment parameters and the sound field environment parameters corresponding to each frequency response curve in the ideal acoustic frequency response library may be determined through the aforementioned "Formula 4".
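The matching step can be pictured as a nearest-neighbour lookup over the preset library. The actual matching degree is computed by "Formula 4" earlier in the description, which is not reproduced here; the weighted squared distance, the library entries, and the parameter ordering below are illustrative assumptions only.

```python
import numpy as np

# Generic stand-in for the matching step: score each library entry by a
# weighted distance between its sound field environment parameters and
# the measured ones, and pick the closest entry.

# Hypothetical library: parameter vector (rt60_s, absorption, volume_m3)
# keyed by the identifier of a stored ideal frequency response curve.
library = {
    'small_damped_room':  np.array([0.3, 0.35, 30.0]),
    'medium_living_room': np.array([0.6, 0.20, 60.0]),
    'large_hall':         np.array([1.8, 0.10, 400.0]),
}

def best_match(params, library, weights=(1.0, 1.0, 0.01)):
    """Return the library key whose parameters are closest to `params`."""
    w = np.asarray(weights)

    def distance(entry):
        return float(np.sum(w * (entry - params) ** 2))

    return min(library, key=lambda name: distance(library[name]))

measured = np.array([0.63, 0.22, 58.0])
match = best_match(measured, library)
```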
S504、将第一频响曲线拟合为第二频响曲线。S504. Fit the first frequency response curve to the second frequency response curve.
具体地，可以比较第一频响曲线和第二频响曲线之间的差异，然后再利用两者之间的差异，通过调整EQ、DRC、各个麦克风对应的传输通道的时延参数等，从而将第一频响曲线拟合为第二频响曲线，进而使得用户在目标位置处听到的声音的响度、空间感、力度、清晰度最优，听音效果最佳。Specifically, the difference between the first frequency response curve and the second frequency response curve can be compared, and this difference can then be used to adjust the EQ, the DRC, the delay parameters of the transmission channels corresponding to the microphones, and the like, so as to fit the first frequency response curve to the second frequency response curve. In this way, the loudness, sense of space, dynamics, and clarity of the sound heard by the user at the target position are optimal, and the listening experience is the best.
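A minimal sketch of the curve-fitting idea, under the assumption that the adjustment is a per-band EQ gain (the patent also mentions DRC and channel delays, which are not modeled here): the gain for each band is the difference between the target (second) curve and the measured (first) curve, clipped to a safe range.

```python
import numpy as np

# Illustrative sketch: fit the measured curve to the target curve by
# applying the per-band difference as an EQ gain (dB domain).

def eq_gains(measured_db, target_db, max_gain_db=12.0):
    """Per-band EQ gains that move `measured_db` toward `target_db`,
    clipped so no band is boosted or cut by more than max_gain_db."""
    gains = np.asarray(target_db) - np.asarray(measured_db)
    return np.clip(gains, -max_gain_db, max_gain_db)

measured = np.array([-3.0, 0.0, 2.0, -6.0])   # dB per band (hypothetical)
target   = np.array([ 0.0, 0.0, 0.0,  0.0])   # flat target curve
gains = eq_gains(measured, target)
corrected = measured + gains
```

When no band exceeds the clip limit, the corrected curve equals the target exactly; clipping trades exact fitting for protection against extreme boosts.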
这样,当电子设备在播放声音时,即可以对音频播放效果进行自适应调参,从而使得用户的听音效果达到最佳,提升用户体验。In this way, when the electronic device is playing sound, the audio playing effect can be adjusted adaptively, so that the user's listening effect can be optimized and the user experience can be improved.
作为又一种可能的实现方式，当使用电子设备进行通话时，在获取声场环境参数后，可以将该声场环境参数作为电子设备中对语音数据进行处理的增强算法的输入，通过增强算法根据输入的声场环境参数对用户通话时语音信号进行自适应增强，以改善通话质量，提升用户体验。As yet another possible implementation, when the electronic device is used to make a call, after the sound field environment parameters are obtained, they can be used as the input of the enhancement algorithm that processes voice data in the electronic device. Based on the input sound field environment parameters, the enhancement algorithm adaptively enhances the user's voice signal during the call, so as to improve the call quality and the user experience.
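The coupling between the sound field environment parameters and the enhancement algorithm can be sketched as a configuration mapping. The passage does not specify the enhancement algorithm, so the parameter names, keys, and scaling constants below are assumptions: the idea is simply that a more reverberant or larger space drives the enhancement stage toward stronger processing.

```python
# Hypothetical sketch: derive enhancement-stage settings from the sound
# field environment parameters (keys and scaling constants are assumed).

def enhancement_config(sound_field_params):
    """sound_field_params: dict with 'reverb' (e.g. RT60 in seconds)
    and 'room_volume' (m^3). Returns processing strengths in [0, 1]."""
    reverb = sound_field_params['reverb']
    volume = sound_field_params['room_volume']
    # More reverberant spaces -> stronger dereverberation (capped at 1.0).
    derev_strength = min(1.0, reverb / 1.5)
    # Larger spaces -> more aggressive noise suppression (capped at 1.0).
    ns_strength = min(1.0, volume / 200.0)
    return {'dereverb': derev_strength, 'noise_suppression': ns_strength}

cfg = enhancement_config({'reverb': 0.6, 'room_volume': 60.0})
```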
由此，通过视觉和声学多模态结合的方式，相互校验视觉和声学参数估计的结果(即空间参数和声音参数)，使得获取到的声场环境参数的可靠性更高，为后续对电子设备进行控制提供了坚实的基础，从而可以较大程度提升用户体验。比如：可以有效提升语音识别服务、减小音频播放效果受声场环境的影响，提升电子设备的唤醒率和ASR的识别率，以及明显改善听音效果。Therefore, by combining the visual and acoustic modalities, the results of the visual and acoustic parameter estimation (that is, the space parameters and the sound parameters) are mutually verified, so that the obtained sound field environment parameters are more reliable. This provides a solid foundation for the subsequent control of the electronic device and can thus greatly improve the user experience. For example, it can effectively improve the speech recognition service, reduce the influence of the sound field environment on the audio playback effect, increase the wake-up rate of the electronic device and the recognition rate of ASR, and noticeably improve the listening experience.
可以理解的是，上述实施例中各步骤的序号的大小并不意味着执行顺序的先后，各过程的执行顺序应以其功能和内在逻辑确定，而不应对本申请实施例的实施过程构成任何限定。此外，在一些可能的实现方式中，上述实施例中的各步骤可以根据实际情况选择性执行，可以部分执行，也可以全部执行，此处不做限定。It can be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application. In addition, in some possible implementations, the steps in the foregoing embodiments may be selectively executed according to the actual situation; some or all of them may be executed, which is not limited here.
基于上述实施例中描述的方法，本申请实施例还提供了一种电子设备控制装置。请参阅图6，图6为本申请实施例提供的一种电子设备控制装置的结构示意图。如图6所示，电子设备控制装置600包括一个或多个处理器601以及接口电路602。可选的，电子设备控制装置600还可以包含总线603。其中：Based on the methods described in the foregoing embodiments, the embodiments of the present application further provide an electronic device control apparatus. Please refer to FIG. 6, which is a schematic structural diagram of an electronic device control apparatus provided by an embodiment of the present application. As shown in FIG. 6, the electronic device control apparatus 600 includes one or more processors 601 and an interface circuit 602. Optionally, the electronic device control apparatus 600 may further include a bus 603. Wherein:
处理器601可能是一种集成电路芯片，具有信号的处理能力。在实现过程中，上述方法的各步骤可以通过处理器601中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器601可以是通用处理器、神经网络处理器(Neural Network Processing Unit，NPU)、数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)或者其它可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件，可以实现或者执行本申请实施例中公开的各方法、步骤。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。The processor 601 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method may be completed by an integrated logic circuit of hardware in the processor 601 or by instructions in the form of software. The above processor 601 may be a general-purpose processor, a neural network processing unit (NPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods and steps disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
接口电路602可以用于数据、指令或者信息的发送或者接收,处理器601可以利用接口电路602接收的数据、指令或者其它信息,进行加工,可以将加工完成信息通过接口电路602发送出去。The interface circuit 602 can be used for sending or receiving data, instructions or information. The processor 601 can process the data, instructions or other information received by the interface circuit 602 , and can send the processing completion information through the interface circuit 602 .
可选的,电子设备控制装置600还包括存储器,存储器可以包括只读存储器和随机存取存储器,并向处理器提供操作指令和数据。存储器的一部分还可以包括非易失性随机存取存储器(NVRAM)。其中,该存储器可以与处理器601耦合。Optionally, the electronic device control apparatus 600 further includes a memory, which may include a read-only memory and a random access memory, and provides operation instructions and data to the processor. A portion of the memory may also include non-volatile random access memory (NVRAM). Wherein, the memory may be coupled with the processor 601 .
可选的,存储器存储了可执行软件模块或者数据结构,处理器601可以通过调用存储器存储的操作指令(该操作指令可存储在操作系统中),执行相应的操作。Optionally, the memory stores executable software modules or data structures, and the processor 601 can execute corresponding operations by calling operation instructions stored in the memory (the operation instructions can be stored in the operating system).
可选的,接口电路602可用于输出处理器601的执行结果。Optionally, the interface circuit 602 may be used to output an execution result of the processor 601 .
需要说明的，处理器601、接口电路602各自对应的功能既可以通过硬件设计实现，也可以通过软件设计来实现，还可以通过软硬件结合的方式来实现，这里不作限制。示例性的，电子设备控制装置600可以但不限于应用在图2中所示的电子设备100中。It should be noted that the functions corresponding to the processor 601 and the interface circuit 602 can each be realized by hardware design, by software design, or by a combination of software and hardware, which is not limited here. Exemplarily, the electronic device control apparatus 600 may be, but is not limited to being, applied in the electronic device 100 shown in FIG. 2.
应理解,上述方法实施例的各步骤可以通过处理器中的硬件形式的逻辑电路或者软件形式的指令完成。It should be understood that each step in the foregoing method embodiments may be implemented by logic circuits in the form of hardware or instructions in the form of software in the processor.
可以理解的是，本申请的实施例中的处理器可以是中央处理单元(central processing unit，CPU)，还可以是其他通用处理器、数字信号处理器(digital signal processor，DSP)、专用集成电路(application specific integrated circuit，ASIC)、现场可编程门阵列(field programmable gate array，FPGA)或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。通用处理器可以是微处理器，也可以是任何常规的处理器。It can be understood that the processor in the embodiments of the present application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. A general-purpose processor may be a microprocessor or any conventional processor.
本申请的实施例中的方法步骤可以通过硬件的方式来实现，也可以由处理器执行软件指令的方式来实现。软件指令可以由相应的软件模块组成，软件模块可以被存放于随机存取存储器(random access memory，RAM)、闪存、只读存储器(read-only memory，ROM)、可编程只读存储器(programmable ROM，PROM)、可擦除可编程只读存储器(erasable PROM，EPROM)、电可擦除可编程只读存储器(electrically EPROM，EEPROM)、寄存器、硬盘、移动硬盘、CD-ROM或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质耦合至处理器，从而使处理器能够从该存储介质读取信息，且可向该存储介质写入信息。当然，存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。The method steps in the embodiments of the present application may be implemented by hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, and the software modules may be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor, such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be a component of the processor. The processor and the storage medium may be located in an ASIC.
在上述实施例中，可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时，可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时，全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中，或者通过所述计算机可读存储介质进行传输。所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如，软盘、硬盘、磁带)、光介质(例如，DVD)、或者半导体介质(例如固态硬盘(solid state disk，SSD))等。The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted via the computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or in a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center integrating one or more available media. The available media may be magnetic media (for example, floppy disks, hard disks, or magnetic tapes), optical media (for example, DVDs), or semiconductor media (for example, solid state disks (SSDs)).
可以理解的是,在本申请的实施例中涉及的各种数字编号仅为描述方便进行的区分,并不用来限制本申请的实施例的范围。It can be understood that the various numbers involved in the embodiments of the present application are only for convenience of description, and are not used to limit the scope of the embodiments of the present application.

Claims (11)

  1. 一种电子设备控制方法,其特征在于,所述方法包括:A control method for electronic equipment, characterized in that the method comprises:
    通过摄像头获取电子设备所处第一空间的第一图像,以及通过麦克风获取所述第一空间中的第一声音;acquiring a first image of the first space where the electronic device is located through a camera, and acquiring a first sound in the first space through a microphone;
    根据所述第一图像,确定所述第一空间的空间参数,以及根据所述第一声音,确定所述第一空间对应的声音参数,所述空间参数包括所述第一空间的第一大小和所述第一空间内的物体的材料类型,所述声音参数包括用于表征所述第一空间中混响大小的第一混响系数;Determine a spatial parameter of the first space according to the first image, and determine a sound parameter corresponding to the first space according to the first sound, where the spatial parameter includes a first size of the first space and the material type of the object in the first space, the sound parameters include a first reverberation coefficient for characterizing the magnitude of reverberation in the first space;
    根据所述空间参数和所述声音参数，确定所述声场环境参数，所述声场环境参数包括目标混响系数、目标吸收系数和所述第一空间的目标大小中的至少一种，所述目标吸收系数用于表征所述第一空间内的物体的材料对应的吸收系数；Determining the sound field environment parameter according to the space parameter and the sound parameter, where the sound field environment parameter includes at least one of a target reverberation coefficient, a target absorption coefficient, and a target size of the first space, and the target absorption coefficient is used to characterize the absorption coefficient corresponding to the material of the object in the first space;
    根据所述声场环境参数,对所述电子设备进行控制。The electronic equipment is controlled according to the sound field environment parameters.
  2. 根据权利要求1所述的方法,其特征在于,所述声场环境参数为目标混响系数,所述根据所述空间参数和所述声音参数,确定所述声场环境参数,具体包括:The method according to claim 1, wherein the sound field environment parameter is a target reverberation coefficient, and determining the sound field environment parameter according to the space parameter and the sound parameter specifically includes:
    当所述第一混响系数的置信度大于第一混响值时,确定所述目标混响系数为所述第一混响系数;When the confidence degree of the first reverberation coefficient is greater than a first reverberation value, determining that the target reverberation coefficient is the first reverberation coefficient;
    当所述第一混响系数的置信度小于或等于所述第一混响值，且大于第二混响值时，根据所述第一空间的第一大小和所述第一空间内的物体的材料类型，得到第二混响系数，以及根据所述第一混响系数和所述第二混响系数，得到所述目标混响系数；When the confidence degree of the first reverberation coefficient is less than or equal to the first reverberation value and greater than a second reverberation value, obtaining a second reverberation coefficient according to the first size of the first space and the material type of the object in the first space, and obtaining the target reverberation coefficient according to the first reverberation coefficient and the second reverberation coefficient;
    当所述第一混响系数的置信度小于或等于所述第二混响值时，根据所述第一混响系数、所述第二混响系数和所述第一混响系数的置信度，得到所述目标混响系数。When the confidence degree of the first reverberation coefficient is less than or equal to the second reverberation value, obtaining the target reverberation coefficient according to the first reverberation coefficient, the second reverberation coefficient, and the confidence degree of the first reverberation coefficient.
  3. 根据权利要求1或2所述的方法,其特征在于,所述声场环境参数为目标吸收系数,所述根据所述空间参数和所述声音参数,确定所述声场环境参数,具体包括:The method according to claim 1 or 2, wherein the sound field environment parameter is a target absorption coefficient, and the determination of the sound field environment parameter according to the space parameter and the sound parameter specifically includes:
    当第一吸收系数的置信度大于第一吸收值时,确定所述目标吸收系数为所述第一吸收系数,其中,所述第一吸收系数根据所述第一空间内的物体的材料类型得到;When the confidence level of the first absorption coefficient is greater than the first absorption value, determine the target absorption coefficient as the first absorption coefficient, wherein the first absorption coefficient is obtained according to the material type of the object in the first space ;
    当所述第一吸收系数的置信度小于或等于所述第一吸收值，且大于第二吸收值时，根据所述第一空间的第一大小和所述第一混响系数，得到第二吸收系数，以及根据所述第一吸收系数和所述第二吸收系数，得到所述目标吸收系数；When the confidence degree of the first absorption coefficient is less than or equal to the first absorption value and greater than a second absorption value, obtaining a second absorption coefficient according to the first size of the first space and the first reverberation coefficient, and obtaining the target absorption coefficient according to the first absorption coefficient and the second absorption coefficient;
    当所述第一吸收系数的置信度小于或等于所述第二吸收值时，根据所述第一吸收系数、所述第二吸收系数和所述第一吸收系数的置信度，得到所述目标吸收系数。When the confidence degree of the first absorption coefficient is less than or equal to the second absorption value, obtaining the target absorption coefficient according to the first absorption coefficient, the second absorption coefficient, and the confidence degree of the first absorption coefficient.
  4. 根据权利要求1-3任一所述的方法，其特征在于，所述声场环境参数为所述第一空间的目标大小，所述根据所述空间参数和所述声音参数，确定所述声场环境参数，具体包括：The method according to any one of claims 1-3, wherein the sound field environment parameter is the target size of the first space, and the determining the sound field environment parameter according to the space parameter and the sound parameter specifically includes:
    当所述第一空间的第一大小的置信度大于第一尺寸值时，确定所述目标大小为所述第一大小，其中，所述第一大小根据所述第一空间内的物体的材料类型得到；When the confidence degree of the first size of the first space is greater than a first size value, determining that the target size is the first size, wherein the first size is obtained according to the material type of the object in the first space;
    当所述第一大小的置信度小于或等于所述第一尺寸值，且大于第二尺寸值时，根据所述第一混响系数和所述第一空间内的物体的材料类型，得到第二大小，以及根据所述第一大小和所述第二大小，得到所述目标大小；When the confidence degree of the first size is less than or equal to the first size value and greater than a second size value, obtaining a second size according to the first reverberation coefficient and the material type of the object in the first space, and obtaining the target size according to the first size and the second size;
    当所述第一大小的置信度小于或等于所述第二尺寸值时,根据所述第一大小、所述第二大小和所述第一大小的置信度,得到所述目标大小。When the confidence degree of the first size is less than or equal to the second size value, the target size is obtained according to the first size, the second size, and the confidence degree of the first size.
  5. 根据权利要求1-4任一所述的方法，其特征在于，所述根据所述声场环境参数，对所述电子设备进行控制，具体包括：The method according to any one of claims 1-4, wherein the controlling the electronic device according to the sound field environment parameters specifically includes:
    根据所述声场环境参数,确定与所述声场环境参数相匹配的目标语音识别模型;Determining a target speech recognition model matching the sound field environment parameters according to the sound field environment parameters;
    将所述电子设备中的语音识别模型更新为所述目标语音识别模型。updating the speech recognition model in the electronic device to the target speech recognition model.
  6. 根据权利要求1-5任一所述的方法,其特征在于,所述根据所述声场环境参数,对所述电子设备进行控制,具体包括:The method according to any one of claims 1-5, wherein the controlling the electronic device according to the sound field environment parameters specifically includes:
    根据所述声场环境参数,对所述电子设备所处的声场环境进行建模,得到所述第一空间的空间模型;Modeling the sound field environment where the electronic device is located according to the sound field environment parameters to obtain a space model of the first space;
    基于所述空间模型进行声场模拟,得到位于所述第一空间中目标位置处对应的第一频响曲线;performing sound field simulation based on the space model to obtain a first frequency response curve corresponding to a target position in the first space;
    基于所述声场环境参数,从预置的理想声学频响库中确定出与所述声场环境参数相匹配的第二频响曲线;Based on the sound field environment parameters, determining a second frequency response curve matching the sound field environment parameters from a preset ideal acoustic frequency response library;
    将所述第一频响曲线拟合为所述第二频响曲线。Fitting the first frequency response curve to the second frequency response curve.
  7. 根据权利要求1-6任一所述的方法,其特征在于,所述根据所述声场环境参数,对所述电子设备进行控制,具体包括:The method according to any one of claims 1-6, wherein the controlling the electronic device according to the sound field environment parameters specifically includes:
    将所述声场环境参数作为所述电子设备中对语音数据进行处理的增强算法的输入。The sound field environment parameter is used as an input of an enhancement algorithm for processing voice data in the electronic device.
  8. 一种电子设备控制装置,其特征在于,包括:A control device for electronic equipment, characterized in that it includes:
    至少一个存储器,用于存储程序;at least one memory for storing programs;
    至少一个处理器,用于执行存储器存储的程序,当存储器存储的程序被执行时,处理器用于执行如权利要求1-7中任一所述的方法。At least one processor is used to execute the program stored in the memory, and when the program stored in the memory is executed, the processor is used to execute the method according to any one of claims 1-7.
  9. 一种电子设备,其特征在于,包括:An electronic device, characterized in that it comprises:
    至少一个存储器,用于存储程序;at least one memory for storing programs;
    至少一个处理器,用于执行存储器存储的程序,当存储器存储的程序被执行时,处理器用于执行如权利要求1-7中任一所述的方法。At least one processor is used to execute the program stored in the memory, and when the program stored in the memory is executed, the processor is used to execute the method according to any one of claims 1-7.
  10. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,当所述计算机程序在电子设备上运行时,使得所述电子设备执行如权利要求1-7任一所述的方法。A computer-readable storage medium, the computer-readable storage medium stores a computer program, and when the computer program runs on an electronic device, the electronic device executes the method according to any one of claims 1-7 .
  11. 一种计算机程序产品,其特征在于,当所述计算机程序产品在电子设备上运行时,使得所述电子设备执行如权利要求1-7任一所述的方法。A computer program product, characterized in that, when the computer program product is run on an electronic device, the electronic device is made to execute the method according to any one of claims 1-7.
PCT/CN2022/136611 2022-01-14 2022-12-05 Electronic device control method and apparatus, and electronic device WO2023134328A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210042081.6A CN116489572A (en) 2022-01-14 2022-01-14 Electronic equipment control method and device and electronic equipment
CN202210042081.6 2022-01-14

Publications (1)

Publication Number Publication Date
WO2023134328A1 true WO2023134328A1 (en) 2023-07-20

Family

ID=87221880

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/136611 WO2023134328A1 (en) 2022-01-14 2022-12-05 Electronic device control method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN116489572A (en)
WO (1) WO2023134328A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN205754811U (en) * 2016-05-12 2016-11-30 惠州Tcl移动通信有限公司 Mobile terminal and audio frequency processing system thereof
CN109686380A (en) * 2019-02-18 2019-04-26 广州视源电子科技股份有限公司 Processing method, device and the electronic equipment of voice signal
US20190394567A1 (en) * 2018-06-22 2019-12-26 EVA Automation, Inc. Dynamically Adapting Sound Based on Background Sound
CN111766303A (en) * 2020-09-03 2020-10-13 深圳市声扬科技有限公司 Voice acquisition method, device, equipment and medium based on acoustic environment evaluation
US10897570B1 (en) * 2019-01-28 2021-01-19 Facebook Technologies, Llc Room acoustic matching using sensors on headset
US20210058731A1 (en) * 2018-05-11 2021-02-25 Clepseadra, Inc. Acoustic program, acoustic device, and acoustic system
CN113597777A (en) * 2019-05-15 2021-11-02 苹果公司 Audio processing

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013058728A1 (en) * 2011-10-17 2013-04-25 Nuance Communications, Inc. Speech signal enhancement using visual information
CN111863005A (en) * 2019-04-28 2020-10-30 北京地平线机器人技术研发有限公司 Sound signal acquisition method and device, storage medium and electronic equipment


Also Published As

Publication number Publication date
CN116489572A (en) 2023-07-25


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22919961

Country of ref document: EP

Kind code of ref document: A1