WO2023134328A1 - Electronic device control method and apparatus, and electronic device - Google Patents


Info

Publication number
WO2023134328A1
Authority
WO
WIPO (PCT)
Prior art keywords
space
sound field
coefficient
size
electronic device
Prior art date
Application number
PCT/CN2022/136611
Other languages
French (fr)
Chinese (zh)
Inventor
Sun Chen (孙晨)
Lyu Shuailin (吕帅林)
Zhou Xiaopeng (周小鹏)
Li Wei (李伟)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023134328A1 publication Critical patent/WO2023134328A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/04Circuits for transducers, loudspeakers or microphones for correcting frequency response
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10KSOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K15/00Acoustics not otherwise provided for
    • G10K15/08Arrangements for producing a reverberation or echo sound
    • G10K15/12Arrangements for producing a reverberation or echo sound using electronic time-delay networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/20Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular to an electronic equipment control method, device and electronic equipment.
  • KWS: wake-up word detection (keyword spotting)
  • ASR: automatic speech recognition
  • EQ: equalization
  • DRC: dynamic range compression
  • the speech recognition models and audio playback parameters in current electronic devices are mainly obtained through laboratory scene simulation and acoustic environment simulation. To suit the typical home scenario, a model or parameter set with good generalization is selected and uniformly deployed to the device side to satisfy most users. However, each user's actual home differs in spatial size, overall layout, and decoration materials, which produces differences in the sound field environment. Because of these differences, the performance of the unified model may degrade in various environments, harming the user experience.
  • the present application provides an electronic device control method, apparatus, electronic device, computer storage medium, and computer program product, which can make unified models in the electronic device, such as the speech recognition model and audio playback parameters, adapt to different sound field environments, preventing these unified models from degrading in performance across varied environments and improving the user experience.
  • the present application provides a method for controlling an electronic device, the method comprising: acquiring a first image of a first space where the electronic device is located through a camera, and acquiring a first sound in the first space through a microphone; determining, according to the first image, the spatial parameters of the first space, and determining, according to the first sound, the sound parameters corresponding to the first space, where the spatial parameters include a first size of the first space and the material types of objects in the first space, and the sound parameters include a first reverberation coefficient characterizing the amount of reverberation in the first space; determining, according to the spatial parameters and the sound parameters, the sound field environment parameters, which include at least one of a target reverberation coefficient, a target absorption coefficient, and a target size of the first space, where the target absorption coefficient characterizes the absorption coefficient corresponding to the materials of the objects in the first space; and controlling the electronic device according to the sound field environment parameters.
  • the results of visual and acoustic parameter estimation are mutually verified, so that the obtained sound field environment parameters are more reliable, which provides a solid basis for the subsequent control of the electronic device and for maximizing the user experience. For example, it can effectively improve the speech recognition service, reduce the impact of the sound field environment on audio playback, improve the wake-up rate of the electronic device and the ASR recognition rate, and significantly improve the listening experience.
  • the sound field environment parameter is the target reverberation coefficient
  • determining the sound field environment parameter according to the spatial parameter and the sound parameter specifically includes: when the confidence of the first reverberation coefficient is greater than a first reverberation value, determining the target reverberation coefficient to be the first reverberation coefficient; when the confidence of the first reverberation coefficient is less than or equal to the first reverberation value and greater than a second reverberation value, obtaining a second reverberation coefficient according to the first size of the first space and the material types of the objects in the first space, and obtaining the target reverberation coefficient according to the first reverberation coefficient and the second reverberation coefficient; and when the confidence of the first reverberation coefficient is less than or equal to the second reverberation value, obtaining the target reverberation coefficient according to the first reverberation coefficient, the second reverberation coefficient, and the confidence of the first reverberation coefficient.
  • the sound field environment parameter is a target absorption coefficient
  • determining the sound field environment parameter according to the spatial parameter and the sound parameter specifically includes: when the confidence of the first absorption coefficient is greater than a first absorption value, determining the target absorption coefficient to be the first absorption coefficient, where the first absorption coefficient is obtained according to the material types of the objects in the first space; when the confidence of the first absorption coefficient is less than or equal to the first absorption value and greater than a second absorption value, obtaining a second absorption coefficient according to the first size of the first space and the first reverberation coefficient, and obtaining the target absorption coefficient according to the first absorption coefficient and the second absorption coefficient; and when the confidence of the first absorption coefficient is less than or equal to the second absorption value, obtaining the target absorption coefficient according to the first absorption coefficient, the second absorption coefficient, and the confidence of the first absorption coefficient.
  • the sound field environment parameter is the target size of the first space
  • determining the sound field environment parameter according to the spatial parameter and the sound parameter specifically includes: when the confidence of the first size of the first space is greater than a first size value, determining the target size to be the first size, where the first size is obtained according to the material types of the objects in the first space; when the confidence of the first size is less than or equal to the first size value and greater than a second size value, obtaining a second size according to the first reverberation coefficient and the material types of the objects in the first space, and obtaining the target size according to the first size and the second size; and when the confidence of the first size is less than or equal to the second size value, obtaining the target size according to the first size, the second size, and the confidence of the first size.
  • controlling the electronic device according to the sound field environment parameters specifically includes: determining, according to the sound field environment parameters, a target speech recognition model that matches them; and updating the speech recognition model in the electronic device to the target speech recognition model.
  • the electronic device can adaptively optimize the speech recognition model according to the sound field environment parameters of the current environment and use a speech recognition model that matches the current sound field environment for speech recognition, adapting the speech recognition function to the user's actual environment. This avoids degradation of the model's recognition performance caused by differences in the sound field environment, guarantees a good speech recognition experience, and improves the user experience.
  • controlling the electronic device according to the sound field environment parameters specifically includes: modeling the sound field environment where the electronic device is located according to the sound field environment parameters to obtain a space model of the first space; performing sound field simulation on the space model to obtain a first frequency response curve corresponding to a target position in the first space; determining, based on the sound field environment parameters, a second frequency response curve matching those parameters from a preset ideal acoustic frequency response library; and fitting the first frequency response curve to the second frequency response curve.
  • the target position may be a position where the loudness, sense of space, strength, and clarity of the sound are optimal in the current sound field environment.
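Fitting the first frequency response curve to the second one amounts, per frequency band, to applying a correction gain. The sketch below illustrates this idea only; the band layout and the gain limits are assumptions for illustration and are not values from the patent:

```python
def eq_correction(measured_db, target_db, max_boost=6.0, max_cut=-12.0):
    """Per-band EQ gains (dB) that map a measured (simulated) frequency
    response onto a target (ideal) response.

    measured_db, target_db: response levels in dB, one value per band.
    Gains are limited to [max_cut, max_boost], a common practical
    safeguard against excessive boost (limits here are illustrative).
    """
    gains = []
    for m, t in zip(measured_db, target_db):
        g = t - m                       # gain needed to reach the target
        gains.append(max(max_cut, min(max_boost, g)))
    return gains
```

Applying these gains to the playback chain moves the response at the target position toward the ideal curve from the preset library.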
  • controlling the electronic device according to the sound field environment parameter specifically includes: using the sound field environment parameter as an input of an enhancement algorithm for processing voice data in the electronic device.
  • the voice signal during the user's call is adaptively enhanced through the enhancement algorithm according to the input sound field environment parameters, so as to improve the call quality and user experience.
  • the present application provides an electronic device control apparatus, including: at least one memory for storing programs; and at least one processor for executing the programs stored in the memory, where, when the programs stored in the memory are executed, the processor is configured to perform the method provided in the first aspect.
  • the present application provides an electronic device, which includes at least one memory for storing programs and at least one processor for executing the programs stored in the memory, where, when the programs stored in the memory are executed, the processor is configured to perform the method provided in the first aspect.
  • the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program runs on an electronic device, the electronic device executes the method as provided in the first aspect.
  • the present application provides a computer program product, which, when the computer program product is run on an electronic device, causes the electronic device to execute the method as provided in the first aspect.
  • FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a hardware structure of an electronic device provided in an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a method for controlling an electronic device provided in an embodiment of the present application
  • FIG. 4 is a schematic diagram of steps for controlling electronic equipment according to sound field environmental parameters provided by an embodiment of the present application
  • FIG. 5 is another schematic diagram of steps for controlling electronic equipment according to sound field environmental parameters provided by the embodiment of the present application.
  • FIG. 6 is a schematic diagram of a hardware structure of an electronic equipment control device provided by an embodiment of the present application.
  • the terms "first", "second", and the like in the specification and claims herein are used to distinguish different objects, rather than to describe a specific order of objects.
  • for example, the first response message and the second response message are used to distinguish different response messages, rather than to describe a specific order of the response messages.
  • words such as "exemplary" or "for example" are used as examples, illustrations, or explanations. Any embodiment or design scheme described as "exemplary" or "for example" in the embodiments of the present application shall not be interpreted as more preferred or more advantageous than other embodiments or design schemes. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete manner.
  • "multiple" means two or more; for example, multiple processing units refers to two or more processing units, and multiple components refers to two or more components.
  • Fig. 1 shows a schematic diagram of an application scenario.
  • an electronic device 100 is provided in a room 200 , and a camera 110 , a microphone 120 and a speaker 130 may be provided on the electronic device 100 , but not limited thereto.
  • the electronic device 100 can recognize and respond to sounds in the room 200, and can also play sounds, and so on.
  • this application scene can be understood as an indoor scene.
  • the electronic device 100 may be, but not limited to, a smart TV.
  • the smart TV referred to in the embodiments of the present application may be a TV capable of interacting with mobile devices such as smart phones and tablet computers, or another electronic device with a large screen. For example, the user interface of a smart phone can be transmitted wirelessly and presented on the smart TV, and the user's operations on the smart TV can also affect the smart phone.
  • the electronic device 100 shown in FIG. 1 can also be replaced with other electronic devices, and the replaced solution is still within the protection scope of the present application.
  • the electronic device 100 may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, and/or a smart home device; the embodiments of the present application do not specifically limit the type of the electronic device 100.
  • PDA: personal digital assistant
  • AR: augmented reality
  • VR: virtual reality
  • AI: artificial intelligence
  • FIG. 2 shows a schematic structural diagram of the electronic device 100 .
  • the electronic device 100 may include: a camera 110 , a microphone 120 , a speaker 130 , a processor 140 , a memory 150 , a transceiver unit 160 and a display screen 170 .
  • the camera 110 is used to capture still images or videos.
  • the object generates an optical image through the lens and projects it to the photosensitive element.
  • the photosensitive element can be a charge coupled device (charge coupled device, CCD) or a complementary metal-oxide-semiconductor (complementary metal-oxide-semiconductor, CMOS) phototransistor.
  • CCD charge coupled device
  • CMOS complementary metal-oxide-semiconductor
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the processor 140 for processing, so as to obtain image signals in standard RGB, YUV and other formats.
  • the electronic device 100 may include 1 or N cameras 110 , where N is a positive integer greater than 1. Exemplarily, the camera 110 may be used to collect images of the environment where the electronic device 100 is located. In some embodiments, the camera 110 and the electronic device 100 can be set separately or integrated together.
  • the microphone 120, also called a "mic", is used to convert sound signals into electrical signals.
  • the electronic device 100 may be provided with at least one microphone 120 .
  • the electronic device 100 may be provided with two microphones 120, which may also implement a noise reduction function in addition to collecting sound signals.
  • the electronic device 100 can also be provided with three, four or more microphones 120 to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions, etc.
  • the microphone 120 may be used to collect sound signals in the environment, such as the sound emitted by the user.
  • the microphone 120 and the electronic device 100 can be set separately or integrated together.
  • the speaker 130, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 may play sound or the like through the speaker 130 .
  • the speaker 130 and the electronic device 100 can be set separately or integrated together.
  • Processor 140 may be a general purpose processor or a special purpose processor.
  • the processor 140 may include a central processing unit (central processing unit, CPU) and/or a baseband processor.
  • the baseband processor can be used to process communication data
  • the CPU can be used to implement corresponding control and processing functions, execute software programs, and process data of the software programs.
  • a program (or an instruction or code) may be stored in the memory 150, and the program may be executed by the processor 140, so that the processor 140 executes the method described in this solution.
  • data may also be stored in the memory 150 .
  • the processor 140 may also read data stored in the memory 150 (for example, a wake-up word detection model, a speech recognition model, equalization parameters, dynamic range control parameters, and transmission channel delay parameters corresponding to each microphone); this data can be stored at the same memory address as the program or at a different memory address.
  • the processor 140 and the memory 150 can be set separately, or can be integrated together, for example, integrated on a single board or a system on chip (system on chip, SOC).
  • the electronic device 100 may further include a transceiver unit 160 .
  • the transceiving unit 160 may implement input (reception) and output (transmission) of signals.
  • the transceiving unit 160 may include a transceiver or a radio frequency chip.
  • the transceiver unit 160 may also include a communication interface.
  • the electronic device 100 can communicate with a server (not shown in the figure) through the transceiver unit 160, so as to obtain required data from the server, such as a speech recognition model and the like.
  • the electronic device 100 may further include a display screen 170 .
  • the display screen 170 can be used to display images, videos and the like.
  • the display screen 170 may include a display panel.
  • the display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light-emitting diodes (QLED), etc.
  • the electronic device 100 may include 1 or N display screens, where N is a positive integer greater than 1.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown in the figure, or combine certain components, or separate certain components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • FIG. 3 shows a schematic flowchart of a method for controlling an electronic device.
  • the electronic device involved in FIG. 3 may be the electronic device 100 described above. It can be understood that the method can be executed by any apparatus, device, platform, or device cluster that has computing and processing capabilities; for example, it may be executed by the electronic device 100 shown in FIG. 2, or by a device such as a server. For ease of description, execution by the electronic device is taken as an example below. As shown in FIG. 3, the electronic device control method may include the following steps:
  • the electronic device may obtain a first image of the first space where the electronic device is located through a camera matched with the electronic device.
  • when the method is executed by a device such as a server, the electronic device may send the first image to that device after obtaining it.
  • the electronic device may acquire the first sound in the first space through a microphone matched with it.
  • the first sound may be a sound made by a user.
  • before S301 and/or S302, the user can send an instruction to the electronic device to optimize the sound field environment parameters; after the electronic device receives the instruction, it can start its matching camera and/or microphone to obtain the first image and/or the first sound.
  • after the electronic device activates its matching microphone, it can prompt the user to make a sound (for example, by a voice, image, or text prompt), so that the microphone can collect the sound made by the user.
  • when the method is executed by a device such as a server, the electronic device may send the first sound to that device after acquiring it.
  • the first image may be input to a pre-trained neural network model related to image processing to obtain spatial parameters of the first space.
  • the space parameters may include: a first size of the first space and a material type of objects in the first space.
  • the first size may be the size of the first space (such as volume, etc.).
  • there may be one or more first images.
  • the first sound may be input into a pre-trained neural network model related to sound processing, so as to obtain sound parameters corresponding to the first space.
  • the sound parameter may include a first reverberation coefficient used to characterize the magnitude of reverberation in the first space.
  • the first reverberation coefficient may be T60, that is, the time required for the sound in the sound field to decay by 60 dB.
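The patent obtains T60 from a neural network; purely as an illustration of what the quantity means, the classical approach fits a straight line to a measured level decay curve and extrapolates to a 60 dB drop. A minimal sketch under that assumption:

```python
def estimate_t60(decay_db, sample_rate):
    """Estimate T60 (time for a 60 dB decay) from a sampled decay curve.

    decay_db: sound levels in dB, one per sample, assumed to decay
              roughly linearly over the fitted region (free decay).
    sample_rate: samples per second.
    Returns the extrapolated time, in seconds, for the level to fall 60 dB.
    """
    n = len(decay_db)
    times = [i / sample_rate for i in range(n)]
    # Least-squares slope of level vs. time, in dB per second.
    mean_t = sum(times) / n
    mean_l = sum(decay_db) / n
    num = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, decay_db))
    den = sum((t - mean_t) ** 2 for t in times)
    slope = num / den  # negative for a decaying signal
    return -60.0 / slope
```

For a decay of 100 dB per second, the function extrapolates a T60 of 0.6 s.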
  • the sound field environment parameter may be determined according to the space parameter and the sound parameter.
  • the sound field environment parameters may include a target reverberation coefficient, a target absorption coefficient, and a target size of the first space.
  • the sound field environment parameters may also include equalization EQ parameters.
  • the determination of the target reverberation coefficient, the target absorption coefficient, and the target size of the first space according to the space parameter and the sound parameter will be described respectively below.
  • the first reverberation coefficient may be used as the target reverberation coefficient.
  • the confidence degree of the first reverberation coefficient may be output together by the neural network model that outputs the first reverberation coefficient.
  • the first reverberation value may be 0.9.
  • the second reverberation coefficient can be calculated from the first size of the first space and the material types of the objects in the first space, and the target reverberation coefficient is then obtained from the first reverberation coefficient and the second reverberation coefficient.
  • the second reverberation value may be 0.6.
  • the formula for calculating the second reverberation coefficient may be Sabine's reverberation formula:
  • RT = 0.161 × V / A (Formula 1)
  • where RT is the reverberation coefficient, V is the size (volume) of the first space, and A is the total sound absorption of the first space, obtained from S, the average value of the absorption coefficients of the materials in the first space.
  • after the material type is obtained, the absorption coefficient of a material in the first space can be obtained by querying a table that maps material types to their absorption coefficients.
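The table lookup combined with "Formula 1" can be sketched as follows, assuming Formula 1 is Sabine's relation RT = 0.161·V/A; the material names and absorption values below are illustrative placeholders, not values from the patent:

```python
# Illustrative absorption coefficients (placeholders; real tables vary
# by material and frequency band).
ABSORPTION = {"concrete": 0.02, "wood": 0.10, "carpet": 0.30, "curtain": 0.40}

def sabine_rt(volume_m3, surfaces):
    """Second reverberation coefficient from room geometry and materials.

    volume_m3: size (volume) of the space in cubic metres.
    surfaces:  list of (area_m2, material_name) pairs.
    Uses Sabine's formula RT = 0.161 * V / A, where A is the total
    absorption, sum of area * absorption coefficient over all surfaces.
    """
    total_absorption = sum(area * ABSORPTION[mat] for area, mat in surfaces)
    return 0.161 * volume_m3 / total_absorption
```

For a 60 m³ room with 50 m² of concrete and 20 m² of carpet, the total absorption is 7.0 and the predicted reverberation time is about 1.38 s.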
  • the target reverberation coefficient is obtained from the first reverberation coefficient and the second reverberation coefficient; specifically, the average of the first reverberation coefficient and the second reverberation coefficient may be used as the target reverberation coefficient.
  • the target reverberation coefficient may also be obtained from the first reverberation coefficient, the second reverberation coefficient, and the confidence of the first reverberation coefficient.
  • the formula for obtaining the target reverberation coefficient may be:
  • RT_target = (m/2) × RT₁ + (1 − m/2) × RT₂ (Formula 2)
  • where RT_target is the target reverberation coefficient, RT₁ is the first reverberation coefficient, RT₂ is the second reverberation coefficient, and m is the confidence of the first reverberation coefficient (i.e., of RT₁).
  • the first absorption coefficient may be used as the target absorption coefficient.
  • the confidence degree of the first absorption coefficient may be output together by the neural network model that outputs the first absorption coefficient.
  • the first absorption value may be 0.8.
  • the first absorption coefficient may be an average value of absorption coefficients corresponding to materials of objects in the first space.
  • the target absorption coefficient may be used to characterize the absorption coefficient corresponding to the material of the objects (all objects or objects collected by the camera) in the first space.
  • the second absorption coefficient can be calculated from the first reverberation coefficient and the size of the first space, and the target absorption coefficient is then obtained from the first absorption coefficient and the second absorption coefficient.
  • the second absorption value may be 0.5.
  • the second absorption coefficient may be obtained by substituting the first reverberation coefficient and the size of the first space into the above "Formula 1".
  • the target absorption coefficient is obtained from the first absorption coefficient and the second absorption coefficient; specifically, the average of the first absorption coefficient and the second absorption coefficient may be used as the target absorption coefficient.
  • the target absorption coefficient can also be obtained from the first absorption coefficient, the second absorption coefficient, and the confidence of the first absorption coefficient.
  • by analogy with Formula 2, the formula for obtaining the target absorption coefficient may be:
  • Ab_target = (n/2) × Ab₁ + (1 − n/2) × Ab₂
  • where Ab_target is the target absorption coefficient, Ab₁ is the first absorption coefficient, Ab₂ is the second absorption coefficient, and n is the confidence of the first absorption coefficient (i.e., of Ab₁).
  • the first size may be used as the target size.
  • the confidence degree of the first size may be output together by the neural network model outputting the first size.
  • the first size value may be 0.8.
  • the second size can be calculated from the first reverberation coefficient and the first absorption coefficient, and the target size is then obtained from the first size and the second size.
  • the second size value may be 0.5.
  • the second size may be obtained by substituting the first reverberation coefficient and the first absorption coefficient into the above "Formula 1".
  • the target size is obtained from the first size and the second size; specifically, the average of the first size and the second size may be used as the target size.
  • the target size can also be obtained from the first size, the second size, and the confidence of the first size.
  • the formula for obtaining the target size may be:
  • V_target = (p/2) × V₁ + (1 − p/2) × V₂ (Formula 3)
  • where V_target is the target size, V₁ is the first size, V₂ is the second size, and p is the confidence of the first size (i.e., of V₁).
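Both the "second absorption coefficient" (from the reverberation coefficient and the space size) and the "second size" (from the reverberation coefficient and the absorption) follow from rearranging the same relation. A sketch, again assuming Formula 1 is Sabine's RT = 0.161·V/A:

```python
SABINE_K = 0.161  # Sabine's constant for SI units (assumed Formula 1)

def absorption_from_rt(volume_m3, rt_s):
    """Total absorption A implied by RT = K * V / A,
    used to derive the second absorption coefficient."""
    return SABINE_K * volume_m3 / rt_s

def volume_from_rt(rt_s, total_absorption):
    """Space size (volume) V implied by RT = K * V / A,
    used to derive the second size."""
    return rt_s * total_absorption / SABINE_K
```

The two functions are exact inverses of each other through the same relation, which is what lets each modality's estimate cross-check the other.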
  • S306 may be executed.
  • the electronic device can be controlled according to the sound field environment parameter, so that the electronic device can better adapt to the current sound field environment (ie, the current space).
  • a speech recognition model matching the sound field environment parameters can be acquired according to those parameters. Specifically, as shown in FIG. 4, the following steps are included:
  • the electronic device sends a first message to a server, where the first message includes a sound field environment parameter, and the first message is used to request acquisition of a speech recognition model matching the sound field environment parameter.
  • the server determines a target speech recognition model matching the sound field environment parameters according to the sound field environment parameters.
  • speech recognition models corresponding to different sound field environment parameters may be preset in the server. After the server acquires the sound field environment parameters sent by the electronic device, the server can determine the target speech recognition model from its preset speech recognition models based on the sound field environment parameters.
  • the weight value of each sub-parameter in the sound field environment parameters can be preset; the server then calculates the matching degree between the sound field environment parameters it acquires and each set of pre-stored sound field environment parameters, and finally selects one set.
  • the speech recognition model corresponding to the selected sound field environment parameters is used as the target speech recognition model.
  • the matching degree can be calculated by the following "Formula 4", which is:
  • f is the matching degree
  • RT_end is the reverberation coefficient in the sound field environment parameters that the server obtains from the electronic device
  • RT_cloud is the reverberation coefficient in the sound field environment parameters preset in the server
  • V_end is the size of the space in the sound field environment parameters that the server obtains from the electronic device
  • V_cloud is the size of the space in the sound field environment parameters preset in the server
  • Ab_end is the absorption coefficient in the sound field environment parameters that the server obtains from the electronic device
  • Ab_cloud is the absorption coefficient in the sound field environment parameters preset in the server
  • EQ_end is the value of the EQ parameter in the sound field environment parameters that the server obtains from the electronic device
  • EQ_cloud is the value of the EQ parameter in the sound field environment parameters preset in the server
  • α, β, γ, δ and ε are preset weight values, respectively.
  • Each parameter in the formula can be selected according to actual conditions, and is not limited here.
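As an illustration of how the server might rank its preset parameter sets, the sketch below assumes the matching degree f is a weighted sum of absolute differences between device-reported and preset values (a plausible reading of Formula 4; the exact form, the dictionary keys and the function names are assumptions):

```python
def matching_degree(dev, cloud, w):
    """Weighted distance between device-reported and server-preset sound
    field parameters; smaller means a closer match. Keys: 'rt'
    (reverberation coefficient), 'v' (space size), 'ab' (absorption
    coefficient), 'eq' (EQ parameter)."""
    return sum(w[k] * abs(dev[k] - cloud[k]) for k in ("rt", "v", "ab", "eq"))

def pick_model(dev, presets, w):
    """Select the preset entry whose parameters best match the device's;
    its speech recognition model becomes the target model."""
    return min(presets, key=lambda entry: matching_degree(dev, entry["params"], w))
```

With this form, the preset whose reverberation, size, absorption and EQ values lie closest to the device's report (after weighting) wins, mirroring the selection step described above.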
  • the server sends a second message to the electronic device, where the second message includes the target speech recognition model.
  • the electronic device uses the target speech recognition model to perform speech recognition.
  • S401 to S404 may also be referred to as: determining a target speech recognition model matching the sound field environment parameters according to the sound field environment parameters; updating the speech recognition model in the electronic device to the target speech recognition model.
  • when performing speech recognition, the electronic device can adaptively optimize the speech recognition model according to the sound field environment parameters of the current environment, and use a speech recognition model that matches the current sound field environment. This adapts the speech recognition function to the user's actual use environment, avoids degradation of model recognition performance caused by differences in the sound field environment, guarantees a good speech recognition service experience, and improves the user experience.
  • the sound field distribution map of the environment where the electronic device is located can be calculated from the sound field environment parameters, and the audio playback parameters can be adaptively adjusted according to the sound field distribution map in combination with artificial intelligence search algorithms, so that the user's listening effect is optimal. Specifically, as shown in Figure 5, the following steps are included:
  • space modeling may be carried out, but is not limited to, through preset sound field modeling methods (such as the open-source pyroom library), using the size of the space and the absorption coefficient of each object in the space included in the sound field environment parameters, so as to complete the modeling of the current sound field environment and obtain the space model of the first space where the electronic device is located.
  • sound field simulation may be performed in the space model by using sound field simulation technology, so as to obtain the first frequency response curve corresponding to the target position.
  • the target position may be a position where the loudness, sense of space, strength, and clarity of the sound are optimal in the current sound field environment.
  • a second frequency response curve matching the sound field environment parameters may be determined from a preset ideal acoustic frequency response library.
  • the matching degree between the obtained sound field environment parameters and the sound field environment parameters corresponding to each frequency response curve in the ideal acoustic frequency response library may be determined through the aforementioned "Formula 4".
  • the first frequency response curve may be compared with the second frequency response curve, and the difference between the two may then be used to adjust the EQ, DRC, delay parameters of the transmission channels corresponding to each microphone, and so on, so that the first frequency response curve is fitted to the second frequency response curve. In this way, the loudness, sense of space, strength, and clarity of the sound heard by the user at the target position are optimal, and the listening effect is the best.
  • the audio playing effect can be adjusted adaptively, so that the user's listening effect can be optimized and the user experience can be improved.
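As a minimal illustration of the fitting step, the per-band EQ correction can be taken as the gap between the simulated response at the target position and the ideal response (a sketch only; real tuning, as the text notes, also adjusts DRC and per-microphone channel delays):

```python
def eq_gains(first_response_db, second_response_db):
    """Per-band EQ corrections (dB) that fit the first frequency response
    curve (simulated at the target position) to the second (ideal) curve:
    the correction in each band is simply the dB gap between the two."""
    return [ideal - simulated
            for simulated, ideal in zip(first_response_db, second_response_db)]
```

Applying these gains to the playback chain lifts bands where the simulated response falls below the ideal curve and attenuates bands where it overshoots.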
  • when an electronic device is used to make a call, after the sound field environment parameters are obtained, they can be used as the input of the enhancement algorithm for processing voice data in the electronic device; the enhancement algorithm then adaptively enhances the user's voice signal during the call according to the input sound field environment parameters, so as to improve the call quality and the user experience.
  • the results of visual and acoustic parameter estimation are mutually verified, so that the reliability of the acquired sound field environment parameters is higher. This provides a solid foundation for the subsequent control of the electronic device and can greatly improve the user experience: for example, it can effectively improve the voice recognition service, reduce the impact of the sound field environment on the audio playback effect, improve the wake-up rate of the electronic device and the recognition rate of ASR, and noticeably improve the listening effect.
  • the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the steps in the foregoing embodiments may be selectively executed according to actual conditions, may be partially executed, or may be completely executed, which is not limited here.
  • FIG. 6 is a schematic structural diagram of an electronic equipment control device provided by an embodiment of the present application.
  • an electronic equipment control device 600 includes one or more processors 601 and an interface circuit 602 .
  • the electronic device control apparatus 600 may also include a bus 603. Among them:
  • the processor 601 may be an integrated circuit chip and has signal processing capability. In the implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 601 or instructions in the form of software.
  • the above-mentioned processor 601 can be a general-purpose processor, a neural network processor (Neural Network Processing Unit, NPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the interface circuit 602 can be used for sending or receiving data, instructions or information.
  • the processor 601 can process the data, instructions or other information received by the interface circuit 602, and can send the processed information out through the interface circuit 602.
  • the electronic device control apparatus 600 further includes a memory, which may include a read-only memory and a random access memory, and provides operation instructions and data to the processor.
  • a portion of the memory may also include non-volatile random access memory (NVRAM).
  • the memory may be coupled with the processor 601 .
  • the memory stores executable software modules or data structures
  • the processor 601 can execute corresponding operations by calling operation instructions stored in the memory (the operation instructions can be stored in the operating system).
  • the interface circuit 602 may be used to output an execution result of the processor 601 .
  • the corresponding functions of the processor 601 and the interface circuit 602 can be realized by hardware design, software design, or a combination of software and hardware, which is not limited here.
  • the electronic device control apparatus 600 may be applied in the electronic device 100 shown in FIG. 2, but is not limited thereto.
  • processor in the embodiment of the present application may be a central processing unit (central processing unit, CPU), and may also be other general processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), field programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, transistor logic devices, hardware components or any combination thereof.
  • a general-purpose processor can be a microprocessor, or any conventional processor.
  • the method steps in the embodiments of the present application may be implemented by means of hardware, or may be implemented by means of a processor executing software instructions.
  • the software instructions can be composed of corresponding software modules, and the software modules can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be a component of the processor.
  • the processor and storage medium can be located in the ASIC.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in or transmitted via a computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, hard disk, or magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (SSD)), etc.


Abstract

An electronic device control method, relating to the technical field of AI. The method comprises: obtaining, by means of a camera, a first image of a first space where an electronic device is located, and obtaining, by means of a microphone, a first sound in the first space; determining spatial parameters of the first space according to the first image, and determining sound parameters corresponding to the first space according to the first sound; determining sound field environment parameters according to the spatial parameters and the sound parameters, wherein the sound field environment parameters comprise at least one of a target reverberation coefficient, a target absorption coefficient and a target size of the first space, and the target absorption coefficient is used for representing an absorption coefficient corresponding to the material of an object in the first space; and controlling the electronic device according to the sound field environment parameters. Therefore, unified models in the electronic device, such as a speech recognition model and audio playback parameters, can adapt to different sound field environments, and performance degradation of the unified models in various environments can be prevented.

Description

Electronic device control method and apparatus, and electronic device
This application claims priority to the Chinese patent application filed with the State Intellectual Property Office of China on January 14, 2022, with application number 202210042081.6 and the application title "Electronic device control method and apparatus, and electronic device", the entire content of which is incorporated into this application by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to an electronic device control method and apparatus, and an electronic device.
Background Art
Electronic devices such as smart screens and smart speakers are rapidly entering thousands of households, and people can use these devices to watch TV programs, listen to music, and so on. To make these devices more convenient to use, some commonly used speech recognition models, audio playback acoustic parameters and the like are usually preset in the device, such as a keyword spotting (KWS) model, an automatic speech recognition (ASR) model, equalization (EQ) parameters, dynamic range compression (DRC) parameters, and delay parameters of the transmission channel corresponding to each pickup (such as a microphone).
The speech recognition models, audio playback parameters and the like in current electronic devices are mainly obtained through debugging by means of laboratory scene simulation, acoustic environment simulation and the like. To suit general home scenarios, this approach selects models or parameters with good generalization and deploys them uniformly on the device side, so as to satisfy most users. However, each user's actual home environment differs in space size, overall layout and decoration materials, which leads to differences in the sound field environment. Because of these differences, a unified model may suffer performance degradation in various environments, affecting the user experience.
Summary of the Invention
The present application provides an electronic device control method and apparatus, an electronic device, a computer storage medium and a computer program product, which enable unified models in the electronic device, such as the speech recognition model and the audio playback parameters, to adapt to different sound field environments, avoiding performance degradation of these unified models in various environments and improving the user experience.
In a first aspect, the present application provides an electronic device control method, including: acquiring, through a camera, a first image of a first space where the electronic device is located, and acquiring, through a microphone, a first sound in the first space; determining spatial parameters of the first space according to the first image, and determining sound parameters corresponding to the first space according to the first sound, where the spatial parameters include a first size of the first space and the material types of objects in the first space, and the sound parameters include a first reverberation coefficient representing the amount of reverberation in the first space; determining sound field environment parameters according to the spatial parameters and the sound parameters, where the sound field environment parameters include at least one of a target reverberation coefficient, a target absorption coefficient and a target size of the first space, and the target absorption coefficient represents the absorption coefficient corresponding to the materials of the objects in the first space; and controlling the electronic device according to the sound field environment parameters.
In this way, by combining the visual and acoustic modalities, the results of visual and acoustic parameter estimation (that is, the spatial parameters and the sound parameters) verify each other, so that the acquired sound field environment parameters are more reliable. This lays a solid foundation for the subsequent control of the electronic device and can greatly improve the user experience: for example, it can effectively improve the voice recognition service, reduce the impact of the sound field environment on the audio playback effect, improve the wake-up rate of the electronic device and the recognition rate of ASR, and noticeably improve the listening effect.
In a possible implementation, the sound field environment parameter is the target reverberation coefficient, and determining the sound field environment parameter according to the spatial parameters and the sound parameters specifically includes: when the confidence degree of the first reverberation coefficient is greater than a first reverberation value, determining the target reverberation coefficient to be the first reverberation coefficient; when the confidence degree of the first reverberation coefficient is less than or equal to the first reverberation value and greater than a second reverberation value, obtaining a second reverberation coefficient according to the first size of the first space and the material types of the objects in the first space, and obtaining the target reverberation coefficient according to the first reverberation coefficient and the second reverberation coefficient; and when the confidence degree of the first reverberation coefficient is less than or equal to the second reverberation value, obtaining the target reverberation coefficient according to the first reverberation coefficient, the second reverberation coefficient, and the confidence degree of the first reverberation coefficient.
In a possible implementation, the sound field environment parameter is the target absorption coefficient, and determining the sound field environment parameter according to the spatial parameters and the sound parameters specifically includes: when the confidence degree of a first absorption coefficient is greater than a first absorption value, determining the target absorption coefficient to be the first absorption coefficient, where the first absorption coefficient is obtained according to the material types of the objects in the first space; when the confidence degree of the first absorption coefficient is less than or equal to the first absorption value and greater than a second absorption value, obtaining a second absorption coefficient according to the first size of the first space and the first reverberation coefficient, and obtaining the target absorption coefficient according to the first absorption coefficient and the second absorption coefficient; and when the confidence degree of the first absorption coefficient is less than or equal to the second absorption value, obtaining the target absorption coefficient according to the first absorption coefficient, the second absorption coefficient, and the confidence degree of the first absorption coefficient.
In a possible implementation, the sound field environment parameter is the target size of the first space, and determining the sound field environment parameter according to the spatial parameters and the sound parameters specifically includes: when the confidence degree of the first size of the first space is greater than a first size value, determining the target size to be the first size, where the first size is obtained according to the material types of the objects in the first space; when the confidence degree of the first size is less than or equal to the first size value and greater than a second size value, obtaining a second size according to the first reverberation coefficient and the material types of the objects in the first space, and obtaining the target size according to the first size and the second size; and when the confidence degree of the first size is less than or equal to the second size value, obtaining the target size according to the first size, the second size, and the confidence degree of the first size.
In a possible implementation, controlling the electronic device according to the sound field environment parameters specifically includes: determining, according to the sound field environment parameters, a target speech recognition model that matches the sound field environment parameters; and updating the speech recognition model in the electronic device to the target speech recognition model. In this way, when performing speech recognition, the electronic device can adaptively optimize the speech recognition model according to the sound field environment parameters of the current environment and use a speech recognition model that matches the current sound field environment, which adapts the speech recognition function to the user's actual use environment, avoids degradation of model recognition performance caused by differences in the sound field environment, guarantees a good speech recognition service experience, and improves the user experience.
In a possible implementation, controlling the electronic device according to the sound field environment parameters specifically includes: modeling the sound field environment where the electronic device is located according to the sound field environment parameters to obtain a space model of the first space; performing sound field simulation based on the space model to obtain a first frequency response curve corresponding to a target position in the first space; determining, based on the sound field environment parameters, a second frequency response curve matching the sound field environment parameters from a preset ideal acoustic frequency response library; and fitting the first frequency response curve to the second frequency response curve. In this way, when the electronic device plays sound, the audio playback parameters can be adjusted adaptively, so that the user's listening effect is optimal and the user experience is improved. Exemplarily, the target position may be a position where the loudness, sense of space, strength and clarity of the sound are all optimal in the current sound field environment.
In a possible implementation, controlling the electronic device according to the sound field environment parameters specifically includes: using the sound field environment parameters as the input of an enhancement algorithm for processing voice data in the electronic device. In this way, when the user makes a voice call through the electronic device, the enhancement algorithm adaptively enhances the user's voice signal during the call according to the input sound field environment parameters, so as to improve the call quality and the user experience.
In a second aspect, the present application provides an electronic device control apparatus, including: at least one memory for storing a program; and at least one processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to perform the method provided in the first aspect.
In a third aspect, the present application provides an electronic device, including at least one memory for storing a program and at least one processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to perform the method provided in the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when run on an electronic device, causes the electronic device to perform the method provided in the first aspect.
In a fifth aspect, the present application provides a computer program product which, when run on an electronic device, causes the electronic device to perform the method provided in the first aspect.
It can be understood that, for the beneficial effects of the second to fifth aspects, reference may be made to the relevant description in the first aspect, which is not repeated here.
Brief Description of the Drawings
The following briefly introduces the drawings used in the description of the embodiments or the prior art.
FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of an electronic device control method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of steps for controlling an electronic device according to sound field environment parameters provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of another set of steps for controlling an electronic device according to sound field environment parameters provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of the hardware structure of an electronic device control apparatus provided by an embodiment of the present application.
具体实施方式Detailed ways
本文中术语“和/或”，是一种描述关联对象的关联关系，表示可以存在三种关系，例如，A和/或B，可以表示：单独存在A，同时存在A和B，单独存在B这三种情况。本文中符号“/”表示关联对象是或者的关系，例如A/B表示A或者B。The term "and/or" in this document describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. The symbol "/" in this document indicates an "or" relationship between the associated objects; for example, A/B means A or B.
本文中的说明书和权利要求书中的术语“第一”和“第二”等是用于区别不同的对象,而不是用于描述对象的特定顺序。例如,第一响应消息和第二响应消息等是用于区别不同的响应消息,而不是用于描述响应消息的特定顺序。The terms "first" and "second" and the like in the specification and claims herein are used to distinguish different objects, rather than to describe a specific order of objects. For example, the first response message and the second response message are used to distinguish different response messages, rather than describing a specific order of the response messages.
在本申请实施例中，“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。确切而言，使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念。In the embodiments of the present application, words such as "exemplary" or "for example" are used to indicate an example, an instance, or an illustration. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application shall not be interpreted as more preferred or advantageous than other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete manner.
在本申请实施例的描述中，除非另有说明，“多个”的含义是指两个或者两个以上，例如，多个处理单元是指两个或者两个以上的处理单元等；多个元件是指两个或者两个以上的元件等。In the description of the embodiments of the present application, unless otherwise specified, "multiple" means two or more; for example, multiple processing units refer to two or more processing units, and multiple elements refer to two or more elements.
示例性的，图1示出了一种应用场景的示意图。如图1所示，在房间200中设置有电子设备100，在电子设备100上可以但不限于设置有摄像头110、麦克风120和扬声器130。电子设备100可以在房间200中进行声音识别并响应，也可以播放声音，等等。示例性的，该应用场景可以理解为是室内场景。其中，电子设备100可以但不限于为智能电视，本申请实施例中所指的智能电视可以是能与移动设备例如智能手机、平板电脑等进行交互的电视或其他具有大屏的电子设备，例如智能手机中的用户界面可以通过无线方式传输并在智能电视中呈现，用户在智能电视中的操作也可以影响智能手机。Exemplarily, Fig. 1 shows a schematic diagram of an application scenario. As shown in FIG. 1, an electronic device 100 is provided in a room 200, and a camera 110, a microphone 120 and a speaker 130 may be provided on the electronic device 100, but not limited thereto. The electronic device 100 can recognize and respond to sounds in the room 200, can also play sounds, and so on. Exemplarily, this application scenario can be understood as an indoor scene. The electronic device 100 may be, but is not limited to, a smart TV. The smart TV referred to in the embodiments of the present application may be a TV capable of interacting with mobile devices such as smart phones and tablet computers, or another electronic device with a large screen; for example, the user interface of a smart phone can be transmitted wirelessly and presented on the smart TV, and the user's operations on the smart TV can also affect the smart phone.
在一些实施例中,图1中所示的电子设备100也可以替换为其他的电子设备,替换后的方案仍在本申请的保护范围内。示例性的,电子设备100可以为手机、平板电脑、桌面型计算机、膝上型计算机、手持计算机、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本,以及蜂窝电话、个人数字助理(personal digital assistant,PDA)、增强现实(augmented reality,AR)设备、虚拟现实(virtual reality,VR)设备、人工智能(artificial intelligence,AI)设备、可穿戴式设备和/或智能家居设备,本申请实施例对该电子设备100的具体类型不作特殊限制。In some embodiments, the electronic device 100 shown in FIG. 1 can also be replaced with other electronic devices, and the replaced solution is still within the protection scope of the present application. Exemplarily, the electronic device 100 can be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, and a cell phone, a personal Personal digital assistant (PDA), augmented reality (AR) device, virtual reality (VR) device, artificial intelligence (AI) device, wearable device and/or smart home device , the embodiment of the present application does not specifically limit the specific type of the electronic device 100 .
示例性的,图2示出了电子设备100的结构示意图。如图2所示,该电子设备100可以包括:摄像头110、麦克风120、扬声器130、处理器140、存储器150、收发单元160和显示屏170。Exemplarily, FIG. 2 shows a schematic structural diagram of the electronic device 100 . As shown in FIG. 2 , the electronic device 100 may include: a camera 110 , a microphone 120 , a speaker 130 , a processor 140 , a memory 150 , a transceiver unit 160 and a display screen 170 .
其中，摄像头110用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号，之后将电信号传递给处理器140加工处理，以得到标准的RGB,YUV等格式的图像信号。在一些实施例中，电子设备100可以包括1个或N个摄像头110，N为大于1的正整数。示例性的，摄像头110可以用于采集电子设备100所处的环境中的图像。在一些实施例中，摄像头110和电子设备100可以单独设置，也可以集成在一起。Wherein, the camera 110 is used to capture still images or videos. The object generates an optical image through the lens and projects it onto the photosensitive element. The photosensitive element can be a charge coupled device (charge coupled device, CCD) or a complementary metal-oxide-semiconductor (complementary metal-oxide-semiconductor, CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the processor 140 for processing, so as to obtain image signals in standard RGB, YUV and other formats. In some embodiments, the electronic device 100 may include 1 or N cameras 110, where N is a positive integer greater than 1. Exemplarily, the camera 110 may be used to collect images of the environment where the electronic device 100 is located. In some embodiments, the camera 110 and the electronic device 100 can be set separately or integrated together.
麦克风120,也称“话筒”,“传声器”,用于将声音信号转换为电信号。电子设备100可以设置至少一个麦克风120。在另一些实施例中,电子设备100可以设置两个麦克风120,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风120,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。示例性的,麦克风120可以用于采集环境中的声音信号,比如用户发出的声音等。在一些实施例中,麦克风120和电子设备100可以单独设置,也可以集成在一起。The microphone 120, also called "microphone" or "microphone", is used to convert sound signals into electrical signals. The electronic device 100 may be provided with at least one microphone 120 . In other embodiments, the electronic device 100 may be provided with two microphones 120, which may also implement a noise reduction function in addition to collecting sound signals. In some other embodiments, the electronic device 100 can also be provided with three, four or more microphones 120 to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions, etc. Exemplarily, the microphone 120 may be used to collect sound signals in the environment, such as the sound emitted by the user. In some embodiments, the microphone 120 and the electronic device 100 can be set separately or integrated together.
扬声器130,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器130播放声音等。在一些实施例中,扬声器130和电子设备100可以单独设置,也可以集成在一起。The speaker 130, also called "horn", is used to convert audio electrical signals into sound signals. The electronic device 100 may play sound or the like through the speaker 130 . In some embodiments, the speaker 130 and the electronic device 100 can be set separately or integrated together.
处理器140可以是通用处理器或者专用处理器。例如,处理器140可以包括中央处理器(central processing unit,CPU)和/或基带处理器。其中,基带处理器可以用于处理通信数据,CPU可以用于实现相应的控制和处理功能,执行软件程序,处理软件程序的数据。Processor 140 may be a general purpose processor or a special purpose processor. For example, the processor 140 may include a central processing unit (central processing unit, CPU) and/or a baseband processor. Wherein, the baseband processor can be used to process communication data, and the CPU can be used to implement corresponding control and processing functions, execute software programs, and process data of the software programs.
存储器150上可以存有程序(也可以是指令或者代码)，程序可被处理器140运行，使得处理器140执行本方案中描述的方法。可选地，存储器150中还可以存储有数据。可选地，处理器140还可以读取存储器150中存储的数据(例如，唤醒词检测模型、语音识别模型、均衡参数、动态范围控制参数、各个麦克风对应的传输通道时延参数等)，该数据可以与程序存储在相同的存储地址，该数据也可以与程序存储在不同的存储地址。本方案中，处理器140和存储器150可以单独设置，也可以集成在一起，例如，集成在单板或者系统级芯片(system on chip,SOC)上。A program (which may also be instructions or code) may be stored in the memory 150, and the program may be executed by the processor 140, so that the processor 140 performs the method described in this solution. Optionally, data may also be stored in the memory 150. Optionally, the processor 140 may also read the data stored in the memory 150 (for example, a wake-up word detection model, a speech recognition model, equalization parameters, dynamic range control parameters, transmission channel delay parameters corresponding to each microphone, etc.); the data may be stored at the same memory address as the program, or at a different memory address from the program. In this solution, the processor 140 and the memory 150 can be set separately, or can be integrated together, for example, integrated on a single board or a system on chip (system on chip, SOC).
在一些实施例中，电子设备100上还可以包括收发单元160。收发单元160可以实现信号的输入(接收)和输出(发送)。例如，收发单元160可以包括收发器或射频芯片。收发单元160还可以包括通信接口。示例性的，电子设备100可以通过收发单元160与服务器(图中未示出)通信，以从服务器处获取到所需的数据，比如语音识别模型等。In some embodiments, the electronic device 100 may further include a transceiver unit 160. The transceiver unit 160 may implement input (reception) and output (transmission) of signals. For example, the transceiver unit 160 may include a transceiver or a radio frequency chip. The transceiver unit 160 may also include a communication interface. Exemplarily, the electronic device 100 can communicate with a server (not shown in the figure) through the transceiver unit 160, so as to obtain required data from the server, such as a speech recognition model and the like.
在一些实施例中，电子设备100上还可以包括显示屏170。该显示屏170可以用于显示图像，视频等。该显示屏170可以包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD)，有机发光二极管(organic light-emitting diode,OLED)，有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED)，柔性发光二极管(flex light-emitting diode,FLED)，Miniled，MicroLed，Micro-oLed，量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中，电子设备100可以包括1个或N个显示屏，N为大于1的正整数。In some embodiments, the electronic device 100 may further include a display screen 170. The display screen 170 can be used to display images, videos and the like. The display screen 170 may include a display panel. The display panel can be a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light emitting diode, AMOLED), a flexible light-emitting diode (flex light-emitting diode, FLED), Miniled, MicroLed, Micro-oLed, quantum dot light emitting diodes (quantum dot light emitting diodes, QLED), etc. In some embodiments, the electronic device 100 may include 1 or N display screens, where N is a positive integer greater than 1.
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。It can be understood that, the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100 . In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown in the figure, or combine certain components, or separate certain components, or arrange different components. The illustrated components can be realized in hardware, software or a combination of software and hardware.
接下来基于上文所描述的内容,对本申请提供的一种电子设备控制方法进行介绍。Next, based on the content described above, an electronic device control method provided by the present application will be introduced.
示例性的，图3示出了一种电子设备控制方法的流程示意图。图3中所涉及的电子设备可以为上文所描述的电子设备100。可以理解，该方法可以通过任何具有计算、处理能力的装置、设备、平台、设备集群来执行。例如可以由图2中所示的电子设备100执行，也可以由服务器等设备执行。为便于描述，下面以电子设备执行为例进行说明，如图3所示，该电子设备控制方法可以包括以下步骤：Exemplarily, FIG. 3 shows a schematic flowchart of a method for controlling an electronic device. The electronic device involved in FIG. 3 may be the electronic device 100 described above. It can be understood that the method may be executed by any apparatus, device, platform, or device cluster that has computing and processing capabilities. For example, it may be executed by the electronic device 100 shown in FIG. 2, or by a device such as a server. For ease of description, execution by the electronic device is taken as an example below. As shown in FIG. 3, the electronic device control method may include the following steps:
S301、通过摄像头获取电子设备所处第一空间的第一图像。S301. Acquire a first image of a first space where the electronic device is located by using a camera.
具体地,电子设备可以通过与其配套的摄像头获取到电子设备所处的第一空间的第一图像。Specifically, the electronic device may obtain a first image of the first space where the electronic device is located through a camera matched with the electronic device.
在一些实施例中,当该方法由服务器等设备执行时,电子设备获取到第一图像后,可以将该第一图像发送至服务器等设备。In some embodiments, when the method is executed by a device such as a server, after the electronic device obtains the first image, the first image may be sent to the device such as the server.
S302、通过麦克风获取第一空间中的第一声音。S302. Acquire a first sound in the first space through a microphone.
具体地,电子设备可以通过与其配套的麦克风获取到第一空间中的第一声音。示例性的,第一声音可以为用户发出的声音。Specifically, the electronic device may acquire the first sound in the first space through a microphone matched with it. Exemplarily, the first sound may be a sound made by a user.
在一些实施例中，在S301和/或S302之前，用户可以向电子设备下发进行声场环境参数优化的指令，电子设备获取到该指令后，可以启动与其配套的摄像头和/或麦克风，以获取到第一图像和/或第一声音。示例性的，电子设备在启动与其配套的麦克风后，可以提示用户发出声音，比如语音提示、图像提示、文字提示等等，以使得麦克风可以采集到用户发出的声音。In some embodiments, before S301 and/or S302, the user may issue an instruction to the electronic device to optimize the sound field environment parameters; after obtaining the instruction, the electronic device may start its matching camera and/or microphone to obtain the first image and/or the first sound. Exemplarily, after starting its matching microphone, the electronic device may prompt the user to make a sound, for example by a voice prompt, an image prompt, a text prompt, etc., so that the microphone can collect the sound made by the user.
在一些实施例中,当该方法由服务器等设备执行时,电子设备获取到第一声音后,可以将该第一声音发送至服务器等设备。In some embodiments, when the method is executed by a device such as a server, after the electronic device acquires the first sound, the first sound may be sent to the device such as the server.
S303、根据第一图像,确定第一空间的空间参数。S303. Determine spatial parameters of the first space according to the first image.
具体地,获取到第一图像后,可以将该第一图像输入至预先训练的与图像处理相关的神经网络模型,以得到第一空间的空间参数。示例性的,空间参数可以包括:第一空间的第一大小和第一空间内的物体的材料类型。示例性的,第一大小可以为第一空间的尺寸(比如:体积等)的大小。示例性的,第一图像可以为一个,也可以为多个。Specifically, after the first image is acquired, the first image may be input to a pre-trained neural network model related to image processing to obtain spatial parameters of the first space. Exemplarily, the space parameters may include: a first size of the first space and a material type of objects in the first space. Exemplarily, the first size may be the size of the first space (such as volume, etc.). Exemplarily, there may be one or more first images.
S304、根据第一声音,确定第一空间对应的声音参数。S304. Determine sound parameters corresponding to the first space according to the first sound.
具体地，获取到第一声音后，可以将该第一声音输入至预先训练的与声音处理相关的神经网络模型，以得到第一空间对应的声音参数。示例性的，声音参数可以包括用于表征第一空间中混响大小的第一混响系数。示例性的，第一混响系数可以为T60，即声音在声场中衰减60dB所需的时间。Specifically, after the first sound is acquired, the first sound may be input into a pre-trained neural network model related to sound processing, so as to obtain sound parameters corresponding to the first space. Exemplarily, the sound parameters may include a first reverberation coefficient used to characterize the magnitude of reverberation in the first space. Exemplarily, the first reverberation coefficient may be T60, that is, the time required for the sound to decay by 60 dB in the sound field.
S305、根据空间参数和声音参数,确定声场环境参数。S305. Determine a sound field environment parameter according to the space parameter and the sound parameter.
具体地,获取到空间参数和声音参数后,可以根据空间参数和声音参数,确定出声场环境参数。示例性的,声场环境参数可以包括目标混响系数、目标吸收系数和第一空间的目标大小。在一些实施例中,声场环境参数中还可以包括均衡EQ参数。Specifically, after the space parameter and the sound parameter are acquired, the sound field environment parameter may be determined according to the space parameter and the sound parameter. Exemplarily, the sound field environment parameters may include a target reverberation coefficient, a target absorption coefficient, and a target size of the first space. In some embodiments, the sound field environment parameters may also include equalization EQ parameters.
在一些实施例中,下面分别对根据空间参数和声音参数,确定目标混响系数、目标吸收系数和第一空间的目标大小进行说明。In some embodiments, the determination of the target reverberation coefficient, the target absorption coefficient, and the target size of the first space according to the space parameter and the sound parameter will be described respectively below.
a)目标混响系数a) Target reverberation coefficient
若第一混响系数的置信度大于第一混响值,可以将该第一混响系数作为目标混响系数。其中,第一混响系数的置信度可以由输出第一混响系数的神经网络模型一并输出。示例性的,第一混响值可以为0.9。If the confidence degree of the first reverberation coefficient is greater than the first reverberation value, the first reverberation coefficient may be used as the target reverberation coefficient. Wherein, the confidence degree of the first reverberation coefficient may be output together by the neural network model that outputs the first reverberation coefficient. Exemplarily, the first reverberation value may be 0.9.
若第一混响系数的置信度小于或等于第一混响值，且大于第二混响值，可以先由第一空间的第一大小和第一空间内的物体的材料类型计算得到第二混响系数，然后，再由第一混响系数和第二混响系数，得到目标混响系数。示例性的，第二混响值可以为0.6。示例性的，计算第二混响系数的公式可以为：If the confidence of the first reverberation coefficient is less than or equal to the first reverberation value and greater than the second reverberation value, a second reverberation coefficient may first be calculated from the first size of the first space and the material types of the objects in the first space, and the target reverberation coefficient may then be obtained from the first reverberation coefficient and the second reverberation coefficient. Exemplarily, the second reverberation value may be 0.6. Exemplarily, the formula for calculating the second reverberation coefficient may be:
RT≈0.161×V÷S       (公式1) RT≈0.161×V÷S (Formula 1)
其中,RT为混响系数,V为第一空间的大小,S为第一空间内各个材料的吸收系数的平均值。对于第一空间内材料的吸收系数,可以在得到材料的类型后,查询材料类型与材料的吸收系数之间的关系表得到。Wherein, RT is the reverberation coefficient, V is the size of the first space, and S is the average value of the absorption coefficient of each material in the first space. The absorption coefficient of the material in the first space can be obtained by querying the relationship table between the material type and the absorption coefficient of the material after obtaining the type of the material.
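Exemplarily, the calculation of Formula 1 can be sketched in Python as follows. The material-to-absorption-coefficient lookup table and all numeric values are illustrative assumptions for this sketch, not values specified in this application:

```python
# Illustrative material -> absorption-coefficient lookup table (assumed values,
# standing in for the relationship table between material type and absorption).
ABSORPTION_TABLE = {"curtain": 0.35, "wood": 0.10, "glass": 0.05}

def estimate_reverberation(volume, material_types, table=ABSORPTION_TABLE):
    """Second reverberation coefficient per Formula 1: RT ≈ 0.161 × V ÷ S,
    where S is the average absorption coefficient of the detected materials."""
    coeffs = [table[m] for m in material_types]
    s = sum(coeffs) / len(coeffs)
    return 0.161 * volume / s
```

For example, a 10 m³ space whose detected materials are all wood (assumed coefficient 0.10) would yield RT ≈ 16.1 under this formula.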
示例性的,由第一混响系数和第二混响系数得到目标混响系数,具体可以为将第一混响系数和第二混响系数的平均值作为目标混响系数。Exemplarily, the target reverberation coefficient is obtained from the first reverberation coefficient and the second reverberation coefficient, specifically, an average value of the first reverberation coefficient and the second reverberation coefficient may be used as the target reverberation coefficient.
若第一混响系数的置信度小于或等于第二混响值,可以由第一混响系数和第二混响系数,得到目标混响系数。示例性的,得到目标混响系数的公式可以为:If the confidence degree of the first reverberation coefficient is less than or equal to the second reverberation value, the target reverberation coefficient may be obtained from the first reverberation coefficient and the second reverberation coefficient. Exemplarily, the formula for obtaining the target reverberation coefficient may be:
RT =(m/2)×RT 1+(1-m/2)×RT 2     (公式2) RT mesh =(m/2)×RT 1 +(1-m/2)×RT 2 (Formula 2)
其中,RT 为目标混响系数,RT 1为第一混响系数,RT 2为第二混响系数,m为第一混响系数(即RT 1)的置信度。 Wherein, RTme is the target reverberation coefficient, RT 1 is the first reverberation coefficient, RT 2 is the second reverberation coefficient, and m is the confidence degree of the first reverberation coefficient (ie RT 1 ).
b)目标吸收系数b) Target Absorption Coefficient
若第一吸收系数的置信度大于第一吸收值,可以将该第一吸收系数作为目标吸收系数。其中,第一吸收系数的置信度可以由输出第一吸收系数的神经网络模型一并输出。示例性的,第一吸收值可以为0.8。示例性的,第一吸收系数可以为第一空间内各个物体的材料对应的吸收系数的平均值。示例性的,目标吸收系数可以用于表征第一空间内的物体(所有物体或者摄像头采集到的物体)的材料对应的吸收系数。If the confidence of the first absorption coefficient is greater than the first absorption value, the first absorption coefficient may be used as the target absorption coefficient. Wherein, the confidence degree of the first absorption coefficient may be output together by the neural network model that outputs the first absorption coefficient. Exemplarily, the first absorption value may be 0.8. Exemplarily, the first absorption coefficient may be an average value of absorption coefficients corresponding to materials of objects in the first space. Exemplarily, the target absorption coefficient may be used to characterize the absorption coefficient corresponding to the material of the objects (all objects or objects collected by the camera) in the first space.
若第一吸收系数的置信度小于或等于第一吸收值，且大于第二吸收值，可以先由第一混响系数和第一空间的大小计算得到第二吸收系数，然后，再由第一吸收系数和第二吸收系数，得到目标吸收系数。示例性的，第二吸收值可以为0.5。示例性的，可以通过上述“公式1”对第一混响系数和第一空间的大小进行计算，以得到第二吸收系数。If the confidence of the first absorption coefficient is less than or equal to the first absorption value and greater than the second absorption value, a second absorption coefficient may first be calculated from the first reverberation coefficient and the size of the first space, and the target absorption coefficient may then be obtained from the first absorption coefficient and the second absorption coefficient. Exemplarily, the second absorption value may be 0.5. Exemplarily, the first reverberation coefficient and the size of the first space may be calculated by the above "Formula 1" to obtain the second absorption coefficient.
示例性的,由第一吸收系数和第二吸收系数得到目标吸收系数,具体可以为将第一吸收系数和第二吸收系数的平均值作为目标吸收系数。Exemplarily, the target absorption coefficient is obtained from the first absorption coefficient and the second absorption coefficient, specifically, the average value of the first absorption coefficient and the second absorption coefficient may be used as the target absorption coefficient.
若第一吸收系数的置信度小于或等于第二吸收值,可以由第一吸收系数和第二吸收系数,得到目标吸收系数。示例性的,得到目标吸收系数的公式可以为:If the confidence level of the first absorption coefficient is less than or equal to the second absorption value, the target absorption coefficient can be obtained from the first absorption coefficient and the second absorption coefficient. Exemplarily, the formula for obtaining the target absorption coefficient may be:
Ab =(n/2)×Ab 1+(1-n/2)Ab 2     (公式3) Ab mesh =(n/2)×Ab 1 +(1-n/2)Ab 2 (Formula 3)
其中,Ab 为目标吸收系数,Ab 1为第一吸收系数,Ab 2为第二吸收系数,n为第一吸收系数(即Ab 1)的置信度。 Wherein, Ab mesh is the target absorption coefficient, Ab 1 is the first absorption coefficient, Ab 2 is the second absorption coefficient, and n is the confidence degree of the first absorption coefficient (ie Ab 1 ).
c)第一空间的目标大小c) The target size of the first space
若第一空间的第一大小的置信度大于第一尺寸值,可以将该第一大小作为目标大小。其中,第一大小的置信度可以由输出第一大小的神经网络模型一并输出。示例性的,第一尺寸值可以为0.8。If the confidence of the first size of the first space is greater than the first size value, the first size may be used as the target size. Wherein, the confidence degree of the first size may be output together by the neural network model outputting the first size. Exemplarily, the first size value may be 0.8.
若第一大小的置信度小于或等于第一尺寸值，且大于第二尺寸值，可以先由第一混响系数和第一吸收系数计算得到第二大小，然后，再由第一大小和第二大小，得到目标大小。示例性的，第二尺寸值可以为0.5。示例性的，可以通过上述“公式1”对第一混响系数和第一吸收系数进行计算，以得到第二大小。If the confidence of the first size is less than or equal to the first size value and greater than the second size value, a second size may first be calculated from the first reverberation coefficient and the first absorption coefficient, and the target size may then be obtained from the first size and the second size. Exemplarily, the second size value may be 0.5. Exemplarily, the first reverberation coefficient and the first absorption coefficient may be calculated by the above "Formula 1" to obtain the second size.
示例性的,由第一大小和第二大小得到目标大小,具体可以为将第一大小和第二大小的平均值作为目标大小。Exemplarily, the target size is obtained from the first size and the second size, specifically, the average value of the first size and the second size may be used as the target size.
若第一大小的置信度小于或等于第二尺寸值,可以由第一大小和第二大小,得到目标大小。示例性的,得到目标大小的公式可以为:If the confidence of the first size is less than or equal to the value of the second size, the target size can be obtained from the first size and the second size. Exemplarily, the formula for obtaining the target size may be:
V =(p/2)×AV 1+(1-p/2)×V 2     (公式3) Vmesh = (p/2)×AV 1 +(1-p/2)×V 2 (Formula 3)
其中,V 为目标大小,V 1为第一大小,V 2为第二大小,p为第一大小(即V 1)的置信度。 Wherein, Vmesh is the target size, V 1 is the first size, V 2 is the second size, and p is the confidence of the first size (ie V 1 ).
这样通过上述对由视觉获取到的空间参数和由声学获取到的声音参数进行一致性校验,以提升获取到的声场环境参数的准确度。In this way, the accuracy of the acquired sound field environment parameters is improved through the above-mentioned consistency check of the spatial parameters acquired by vision and the sound parameters acquired by acoustics.
在确定出声场环境参数后,可以执行S306。After the environmental parameters of the sound field are determined, S306 may be executed.
S306、根据声场环境参数,对电子设备进行控制。S306. Control the electronic device according to the sound field environment parameters.
具体地,在确定出声场环境参数,即可以根据该声场环境参数对电子设备进行控制,从而使得电子设备能够更好的适应当前的声场环境(即当前的空间)。Specifically, when the sound field environment parameter is determined, the electronic device can be controlled according to the sound field environment parameter, so that the electronic device can better adapt to the current sound field environment (ie, the current space).
作为一种可能的实现方式,当电子设备中设置有语音识别模型时,可以由声场环境参数获取到与该声场环境参数相匹配的语音识别模型。具体地,如图4所示,包括以下步骤:As a possible implementation manner, when the voice recognition model is set in the electronic device, the voice recognition model matching the sound field environment parameter can be acquired from the sound field environment parameter. Specifically, as shown in Figure 4, the following steps are included:
S401、电子设备向服务器发送第一消息,第一消息中包括声场环境参数,第一消息用于请求获取与该声场环境参数相匹配的语音识别模型。S401. The electronic device sends a first message to a server, where the first message includes a sound field environment parameter, and the first message is used to request acquisition of a speech recognition model matching the sound field environment parameter.
S402、服务器根据声场环境参数,确定出与该声场环境参数相匹配的目标语音识别模型。S402. The server determines a target speech recognition model matching the sound field environment parameters according to the sound field environment parameters.
具体地,在服务器中可以预置有不同声场环境参数对应的语音识别模型。当服务器获取到电子设备发送的声场环境参数后,服务器可以由该声场环境参数从其预置的语音识别模型中确定出目标语音识别模型。Specifically, speech recognition models corresponding to different sound field environment parameters may be preset in the server. After the server acquires the sound field environment parameters sent by the electronic device, the server can determine the target speech recognition model from its preset speech recognition models based on the sound field environment parameters.
示例性的，可以预先设定声场环境参数中每个子参数的权重值，然后，再计算服务器获取到的声场环境参数与其预先存储的各个声场环境参数之间的匹配度，最后选取匹配度最高的一个声场环境参数对应的语音识别模型作为目标语音识别模型。其中，可以通过以下“公式4”计算匹配度，该公式为：Exemplarily, a weight value may be preset for each sub-parameter of the sound field environment parameters; then, the matching degree between the sound field environment parameters acquired by the server and each set of sound field environment parameters pre-stored in the server is calculated; finally, the speech recognition model corresponding to the sound field environment parameters with the highest matching degree is selected as the target speech recognition model. The matching degree can be calculated by the following "Formula 4":
f=|RT -RT |×α+|V -V |×β+|Ab -Ab |×γ+|EQ -EQ |×δ+ε   (公 式4) f=|RT terminal -RT cloud |×α+|V terminal -V cloud |×β+|Ab terminal -Ab cloud |×γ+|EQ terminal -EQ cloud |×δ+ε (Formula 4)
其中，f为匹配度，RT端为服务器获取的电子设备发送的声场环境参数中的混响系数，RT云为服务器中预置的声场环境参数中的混响系数，V端为服务器获取的电子设备发送的声场环境参数中的空间的大小，V云为服务器中预置的声场环境参数中的空间的大小，Ab端为服务器获取的电子设备发送的声场环境参数中的吸收系数，Ab云为服务器中预置的声场环境参数中的吸收系数，EQ端为服务器获取的电子设备发送的声场环境参数中的EQ参数的值，EQ云为服务器中预置的声场环境参数中的EQ参数的值，α、β、γ、δ、ε分别为预先设置的权重值。该公式中的各个参数可以根据实际情况选取，此处不做限定。Wherein, f is the matching degree; RT端 is the reverberation coefficient in the sound field environment parameters sent by the electronic device and acquired by the server; RT云 is the reverberation coefficient in the sound field environment parameters preset in the server; V端 is the space size in the sound field environment parameters sent by the electronic device and acquired by the server; V云 is the space size in the sound field environment parameters preset in the server; Ab端 is the absorption coefficient in the sound field environment parameters sent by the electronic device and acquired by the server; Ab云 is the absorption coefficient in the sound field environment parameters preset in the server; EQ端 is the value of the EQ parameter in the sound field environment parameters sent by the electronic device and acquired by the server; EQ云 is the value of the EQ parameter in the sound field environment parameters preset in the server; and α, β, γ, δ, ε are preset weight values. Each parameter in the formula can be selected according to the actual situation, which is not limited here.
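Exemplarily, the model selection based on Formula 4 can be sketched as follows. The weight values and candidate parameter sets are illustrative assumptions; and since f aggregates absolute parameter differences, the sketch treats the candidate with the smallest f as the closest match:

```python
WEIGHTS = {"RT": 1.0, "V": 0.01, "Ab": 1.0, "EQ": 0.1}   # assumed α, β, γ, δ
EPSILON = 0.0                                            # assumed ε

def matching_degree(device_params, cloud_params, weights=WEIGHTS, eps=EPSILON):
    """Formula 4: f = Σ |device − cloud| × weight + ε over RT, V, Ab, EQ."""
    return sum(abs(device_params[k] - cloud_params[k]) * w
               for k, w in weights.items()) + eps

def select_model(device_params, candidates):
    """Pick the preset (model_id, params) pair whose parameters are closest."""
    return min(candidates, key=lambda c: matching_degree(device_params, c[1]))[0]
```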
S403、服务器向电子设备发送第二消息,第二消息中包括目标语音识别模型。S403. The server sends a second message to the electronic device, where the second message includes the target speech recognition model.
S404、电子设备利用目标语音识别模型进行语音识别。S404. The electronic device uses the target speech recognition model to perform speech recognition.
在一些实施例中，S401至S404也可以称之为：根据声场环境参数，确定与声场环境参数相匹配的目标语音识别模型；将电子设备中的语音识别模型更新为目标语音识别模型。In some embodiments, S401 to S404 may also be described as: determining, according to the sound field environment parameters, a target speech recognition model matching the sound field environment parameters; and updating the speech recognition model in the electronic device to the target speech recognition model.
这样，电子设备即可以在进行语音识别时，根据当前的环境中的声场环境参数自适应优化语音识别模型，以及使用与当前的声场环境相匹配的语音识别模型进行语音识别，实现了语音识别功能对用户实际使用环境的自适应，避免了由于声场环境差异导致模型识别性能退化的情况，为良好的语音识别服务体验提供了保障，改善用户的使用体验。In this way, when performing speech recognition, the electronic device can adaptively optimize the speech recognition model according to the sound field environment parameters of the current environment and perform speech recognition using a model matched to the current sound field environment. This realizes the adaptation of the speech recognition function to the user's actual use environment, avoids the degradation of recognition performance caused by differences in the sound field environment, guarantees a good speech recognition service experience, and improves the user experience.
作为另一种可能的实现方式，当电子设备在播放声音时，可以由声场环境参数计算电子设备所在环境的声场分布图，根据声场分布图并结合人工智能搜索算法对音频播放效果进行自适应调参，使得用户的听音效果达到最佳。具体地，如图5所示，包括以下步骤：As another possible implementation, when the electronic device is playing sound, a sound field distribution map of the environment where the electronic device is located can be calculated from the sound field environment parameters, and the audio playback parameters can be adaptively tuned according to the sound field distribution map in combination with an artificial intelligence search algorithm, so that the user's listening experience is optimal. Specifically, as shown in Figure 5, the following steps are included:
S501、根据声场环境参数,对当前的声场环境进行建模,以得到电子设备所处的第一空间的空间模型。S501. Model the current sound field environment according to the sound field environment parameters, so as to obtain a space model of the first space where the electronic device is located.
具体地，在建模时，可以但不限于通过预置声场建模方式(比如开源pyroom库等)，以及声场环境参数中所包括的空间的大小和空间内各个物体的吸收系数进行空间建模，从而完成对当前声场环境的建模，从而得到电子设备所处的第一空间的空间模型。Specifically, the modeling can be performed, but is not limited to, by a preset sound field modeling method (such as the open source pyroom library), using the space size and the absorption coefficients of the objects in the space included in the sound field environment parameters, thereby completing the modeling of the current sound field environment and obtaining the space model of the first space where the electronic device is located.
S502、基于得到的空间模型进行声场模拟,得到目标位置处对应的第一频响曲线。S502. Perform sound field simulation based on the obtained spatial model, to obtain a first frequency response curve corresponding to the target position.
具体地,得到第一空间的空间模型后,可以利用声场模拟技术在空间模型中进行声场模拟,以得到目标位置处对应的第一频响曲线。示例性的,目标位置可以为在当前的声场环境下声音的响度、空间感、力度、清晰度均最优的位置。Specifically, after the space model of the first space is obtained, sound field simulation may be performed in the space model by using sound field simulation technology, so as to obtain the first frequency response curve corresponding to the target position. Exemplarily, the target position may be a position where the loudness, sense of space, strength, and clarity of the sound are optimal in the current sound field environment.
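The frequency response curve at the target position can be derived from the impulse response that the sound field simulation produces at that position. The sketch below is a hypothetical stand-in (the patent does not specify the simulation code): it uses a synthetic exponentially decaying noise burst in place of a simulated room impulse response and takes the magnitude of its Fourier transform.

```python
import numpy as np

# Sketch (assumed, not the patent's simulation): the frequency response
# at a position is the magnitude spectrum of the room impulse response
# (RIR) obtained there from the space model.

def frequency_response(rir, fs, n_fft=1024):
    """Return (freqs_hz, magnitude_db) for an impulse response."""
    spectrum = np.fft.rfft(rir, n=n_fft)
    mag_db = 20 * np.log10(np.abs(spectrum) + 1e-12)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, mag_db

# Stand-in RIR: exponentially decaying noise, which a real RIR
# resembles after the direct path and early reflections.
fs = 16000
rng = np.random.default_rng(0)
t = np.arange(4096) / fs
rir = rng.standard_normal(4096) * np.exp(-t / 0.2)
freqs, curve = frequency_response(rir, fs)
```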
S503、基于得到的声场环境参数,从预置的理想声学频响库中确定出与该声场环境参数相匹配的第二频响曲线。S503. Based on the obtained sound field environment parameters, determine a second frequency response curve matching the sound field environment parameters from a preset ideal acoustic frequency response library.
具体地,可以基于得到的声场环境参数,从预置的理想声学频响库确定出与该声场环境参数相匹配的第二频响曲线。示例性的,可以但不限于通过前述的“公式4”确定得到的声场环境参数与理想声学频响库中各个频响曲线对应的声场环境参数之间的匹配度。Specifically, based on the obtained sound field environment parameters, a second frequency response curve matching the sound field environment parameters may be determined from a preset ideal acoustic frequency response library. Exemplarily, but not limited to, the matching degree between the obtained sound field environment parameters and the sound field environment parameters corresponding to each frequency response curve in the ideal acoustic frequency response library may be determined through the aforementioned "Formula 4".
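The matching step can be pictured as a nearest-neighbour lookup over the preset library. The actual matching degree is computed by "Formula 4" earlier in the description, which is not reproduced here; the weighted squared distance, the library entries, and the parameter ordering below are illustrative assumptions only.

```python
import numpy as np

# Generic stand-in for the matching step: score each library entry by a
# weighted distance between its sound field environment parameters and
# the measured ones, and pick the closest entry.

# Hypothetical library: parameter vector (rt60_s, absorption, volume_m3)
# keyed by the identifier of a stored ideal frequency response curve.
library = {
    'small_damped_room':  np.array([0.3, 0.35, 30.0]),
    'medium_living_room': np.array([0.6, 0.20, 60.0]),
    'large_hall':         np.array([1.8, 0.10, 400.0]),
}

def best_match(params, library, weights=(1.0, 1.0, 0.01)):
    """Return the library key whose parameters are closest to `params`."""
    w = np.asarray(weights)

    def distance(entry):
        return float(np.sum(w * (entry - params) ** 2))

    return min(library, key=lambda name: distance(library[name]))

measured = np.array([0.63, 0.22, 58.0])
match = best_match(measured, library)
```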
S504、将第一频响曲线拟合为第二频响曲线。S504. Fit the first frequency response curve to the second frequency response curve.
具体地，可以比较第一频响曲线和第二频响曲线之间的差异，然后再利用两者之间的差异，通过调整EQ、DRC、各个麦克风对应的传输通道的时延参数等，从而将第一频响曲线拟合为第二频响曲线，进而使得用户在目标位置处听到的声音的响度、空间感、力度、清晰度最优，听音效果最佳。Specifically, the difference between the first frequency response curve and the second frequency response curve can be compared, and this difference can then be used to adjust the EQ, the DRC, the delay parameters of the transmission channels corresponding to the microphones, and the like, so as to fit the first frequency response curve to the second frequency response curve. In this way, the loudness, sense of space, dynamics, and clarity of the sound heard by the user at the target position are optimal, and the listening experience is the best.
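A minimal sketch of the curve-fitting idea, under the assumption that the adjustment is a per-band EQ gain (the patent also mentions DRC and channel delays, which are not modeled here): the gain for each band is the difference between the target (second) curve and the measured (first) curve, clipped to a safe range.

```python
import numpy as np

# Illustrative sketch: fit the measured curve to the target curve by
# applying the per-band difference as an EQ gain (dB domain).

def eq_gains(measured_db, target_db, max_gain_db=12.0):
    """Per-band EQ gains that move `measured_db` toward `target_db`,
    clipped so no band is boosted or cut by more than max_gain_db."""
    gains = np.asarray(target_db) - np.asarray(measured_db)
    return np.clip(gains, -max_gain_db, max_gain_db)

measured = np.array([-3.0, 0.0, 2.0, -6.0])   # dB per band (hypothetical)
target   = np.array([ 0.0, 0.0, 0.0,  0.0])   # flat target curve
gains = eq_gains(measured, target)
corrected = measured + gains
```

When no band exceeds the clip limit, the corrected curve equals the target exactly; clipping trades exact fitting for protection against extreme boosts.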
这样,当电子设备在播放声音时,即可以对音频播放效果进行自适应调参,从而使得用户的听音效果达到最佳,提升用户体验。In this way, when the electronic device is playing sound, the audio playing effect can be adjusted adaptively, so that the user's listening effect can be optimized and the user experience can be improved.
作为又一种可能的实现方式，当使用电子设备进行通话时，在获取声场环境参数后，可以将该声场环境参数作为电子设备中对语音数据进行处理的增强算法的输入，通过增强算法根据输入的声场环境参数对用户通话时语音信号进行自适应增强，以改善通话质量，提升用户体验。As yet another possible implementation, when the electronic device is used to make a call, after the sound field environment parameters are obtained, they can be used as the input of the enhancement algorithm that processes voice data in the electronic device. Based on the input sound field environment parameters, the enhancement algorithm adaptively enhances the user's voice signal during the call, so as to improve the call quality and the user experience.
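The coupling between the sound field environment parameters and the enhancement algorithm can be sketched as a configuration mapping. The passage does not specify the enhancement algorithm, so the parameter names, keys, and scaling constants below are assumptions: the idea is simply that a more reverberant or larger space drives the enhancement stage toward stronger processing.

```python
# Hypothetical sketch: derive enhancement-stage settings from the sound
# field environment parameters (keys and scaling constants are assumed).

def enhancement_config(sound_field_params):
    """sound_field_params: dict with 'reverb' (e.g. RT60 in seconds)
    and 'room_volume' (m^3). Returns processing strengths in [0, 1]."""
    reverb = sound_field_params['reverb']
    volume = sound_field_params['room_volume']
    # More reverberant spaces -> stronger dereverberation (capped at 1.0).
    derev_strength = min(1.0, reverb / 1.5)
    # Larger spaces -> more aggressive noise suppression (capped at 1.0).
    ns_strength = min(1.0, volume / 200.0)
    return {'dereverb': derev_strength, 'noise_suppression': ns_strength}

cfg = enhancement_config({'reverb': 0.6, 'room_volume': 60.0})
```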
由此，通过视觉和声学多模态结合的方式，相互校验视觉和声学参数估计的结果(即空间参数和声音参数)，使得获取到的声场环境参数的可靠性更高，为后续对电子设备进行控制提供了坚实的基础，从而可以较大程度提升用户体验。比如：可以有效提升语音识别服务、减小音频播放效果受声场环境的影响，提升电子设备的唤醒率和ASR的识别率，以及明显改善听音效果。Therefore, by combining the visual and acoustic modalities, the results of the visual and acoustic parameter estimation (that is, the space parameters and the sound parameters) are mutually verified, so that the obtained sound field environment parameters are more reliable. This provides a solid foundation for the subsequent control of the electronic device and can thus greatly improve the user experience. For example, it can effectively improve the speech recognition service, reduce the influence of the sound field environment on the audio playback effect, increase the wake-up rate of the electronic device and the recognition rate of ASR, and noticeably improve the listening experience.
可以理解的是，上述实施例中各步骤的序号的大小并不意味着执行顺序的先后，各过程的执行顺序应以其功能和内在逻辑确定，而不应对本申请实施例的实施过程构成任何限定。此外，在一些可能的实现方式中，上述实施例中的各步骤可以根据实际情况选择性执行，可以部分执行，也可以全部执行，此处不做限定。It can be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application. In addition, in some possible implementations, the steps in the foregoing embodiments may be selectively executed according to the actual situation; some or all of them may be executed, which is not limited here.
基于上述实施例中描述的方法，本申请实施例还提供了一种电子设备控制装置。请参阅图6，图6为本申请实施例提供的一种电子设备控制装置的结构示意图。如图6所示，电子设备控制装置600包括一个或多个处理器601以及接口电路602。可选的，电子设备控制装置600还可以包含总线603。其中：Based on the methods described in the foregoing embodiments, the embodiments of the present application further provide an electronic device control apparatus. Please refer to FIG. 6, which is a schematic structural diagram of an electronic device control apparatus provided by an embodiment of the present application. As shown in FIG. 6, the electronic device control apparatus 600 includes one or more processors 601 and an interface circuit 602. Optionally, the electronic device control apparatus 600 may further include a bus 603. Wherein:
处理器601可能是一种集成电路芯片，具有信号的处理能力。在实现过程中，上述方法的各步骤可以通过处理器601中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器601可以是通用处理器、神经网络处理器(Neural Network Processing Unit，NPU)、数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)或者其它可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件，可以实现或者执行本申请实施例中公开的各方法、步骤。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。The processor 601 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method may be completed by an integrated logic circuit of hardware in the processor 601 or by instructions in the form of software. The above processor 601 may be a general-purpose processor, a neural network processing unit (NPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods and steps disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
接口电路602可以用于数据、指令或者信息的发送或者接收,处理器601可以利用接口电路602接收的数据、指令或者其它信息,进行加工,可以将加工完成信息通过接口电路602发送出去。The interface circuit 602 can be used for sending or receiving data, instructions or information. The processor 601 can process the data, instructions or other information received by the interface circuit 602 , and can send the processing completion information through the interface circuit 602 .
可选的,电子设备控制装置600还包括存储器,存储器可以包括只读存储器和随机存取存储器,并向处理器提供操作指令和数据。存储器的一部分还可以包括非易失性随机存取存储器(NVRAM)。其中,该存储器可以与处理器601耦合。Optionally, the electronic device control apparatus 600 further includes a memory, which may include a read-only memory and a random access memory, and provides operation instructions and data to the processor. A portion of the memory may also include non-volatile random access memory (NVRAM). Wherein, the memory may be coupled with the processor 601 .
可选的,存储器存储了可执行软件模块或者数据结构,处理器601可以通过调用存储器存储的操作指令(该操作指令可存储在操作系统中),执行相应的操作。Optionally, the memory stores executable software modules or data structures, and the processor 601 can execute corresponding operations by calling operation instructions stored in the memory (the operation instructions can be stored in the operating system).
可选的,接口电路602可用于输出处理器601的执行结果。Optionally, the interface circuit 602 may be used to output an execution result of the processor 601 .
需要说明的，处理器601、接口电路602各自对应的功能既可以通过硬件设计实现，也可以通过软件设计来实现，还可以通过软硬件结合的方式来实现，这里不作限制。示例性的，电子设备控制装置600可以但不限于应用在图2中所示的电子设备100中。It should be noted that the functions corresponding to the processor 601 and the interface circuit 602 can each be realized by hardware design, by software design, or by a combination of software and hardware, which is not limited here. Exemplarily, the electronic device control apparatus 600 may be, but is not limited to being, applied in the electronic device 100 shown in FIG. 2.
应理解,上述方法实施例的各步骤可以通过处理器中的硬件形式的逻辑电路或者软件形式的指令完成。It should be understood that each step in the foregoing method embodiments may be implemented by logic circuits in the form of hardware or instructions in the form of software in the processor.
可以理解的是，本申请的实施例中的处理器可以是中央处理单元(central processing unit，CPU)，还可以是其他通用处理器、数字信号处理器(digital signal processor，DSP)、专用集成电路(application specific integrated circuit，ASIC)、现场可编程门阵列(field programmable gate array，FPGA)或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。通用处理器可以是微处理器，也可以是任何常规的处理器。It can be understood that the processor in the embodiments of the present application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. A general-purpose processor may be a microprocessor or any conventional processor.
本申请的实施例中的方法步骤可以通过硬件的方式来实现，也可以由处理器执行软件指令的方式来实现。软件指令可以由相应的软件模块组成，软件模块可以被存放于随机存取存储器(random access memory，RAM)、闪存、只读存储器(read-only memory，ROM)、可编程只读存储器(programmable ROM，PROM)、可擦除可编程只读存储器(erasable PROM，EPROM)、电可擦除可编程只读存储器(electrically EPROM，EEPROM)、寄存器、硬盘、移动硬盘、CD-ROM或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质耦合至处理器，从而使处理器能够从该存储介质读取信息，且可向该存储介质写入信息。当然，存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。The method steps in the embodiments of the present application may be implemented by hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, and the software modules may be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor, such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be a component of the processor. The processor and the storage medium may be located in an ASIC.
在上述实施例中，可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时，可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时，全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中，或者通过所述计算机可读存储介质进行传输。所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如，软盘、硬盘、磁带)、光介质(例如，DVD)、或者半导体介质(例如固态硬盘(solid state disk，SSD))等。The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted via the computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or in a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center integrating one or more available media. The available media may be magnetic media (for example, floppy disks, hard disks, or magnetic tapes), optical media (for example, DVDs), or semiconductor media (for example, solid state disks (SSDs)).
可以理解的是,在本申请的实施例中涉及的各种数字编号仅为描述方便进行的区分,并不用来限制本申请的实施例的范围。It can be understood that the various numbers involved in the embodiments of the present application are only for convenience of description, and are not used to limit the scope of the embodiments of the present application.

Claims (11)

  1. 一种电子设备控制方法,其特征在于,所述方法包括:A control method for electronic equipment, characterized in that the method comprises:
    通过摄像头获取电子设备所处第一空间的第一图像,以及通过麦克风获取所述第一空间中的第一声音;acquiring a first image of the first space where the electronic device is located through a camera, and acquiring a first sound in the first space through a microphone;
    根据所述第一图像,确定所述第一空间的空间参数,以及根据所述第一声音,确定所述第一空间对应的声音参数,所述空间参数包括所述第一空间的第一大小和所述第一空间内的物体的材料类型,所述声音参数包括用于表征所述第一空间中混响大小的第一混响系数;Determine a spatial parameter of the first space according to the first image, and determine a sound parameter corresponding to the first space according to the first sound, where the spatial parameter includes a first size of the first space and the material type of the object in the first space, the sound parameters include a first reverberation coefficient for characterizing the magnitude of reverberation in the first space;
    根据所述空间参数和所述声音参数，确定所述声场环境参数，所述声场环境参数包括目标混响系数、目标吸收系数和所述第一空间的目标大小中的至少一种，所述目标吸收系数用于表征所述第一空间内的物体的材料对应的吸收系数；Determining the sound field environment parameter according to the space parameter and the sound parameter, where the sound field environment parameter includes at least one of a target reverberation coefficient, a target absorption coefficient, and a target size of the first space, and the target absorption coefficient is used to characterize the absorption coefficient corresponding to the material of the object in the first space;
    根据所述声场环境参数,对所述电子设备进行控制。The electronic equipment is controlled according to the sound field environment parameters.
  2. 根据权利要求1所述的方法,其特征在于,所述声场环境参数为目标混响系数,所述根据所述空间参数和所述声音参数,确定所述声场环境参数,具体包括:The method according to claim 1, wherein the sound field environment parameter is a target reverberation coefficient, and determining the sound field environment parameter according to the space parameter and the sound parameter specifically includes:
    当所述第一混响系数的置信度大于第一混响值时,确定所述目标混响系数为所述第一混响系数;When the confidence degree of the first reverberation coefficient is greater than a first reverberation value, determining that the target reverberation coefficient is the first reverberation coefficient;
    当所述第一混响系数的置信度小于或等于所述第一混响值，且大于第二混响值时，根据所述第一空间的第一大小和所述第一空间内的物体的材料类型，得到第二混响系数，以及根据所述第一混响系数和所述第二混响系数，得到所述目标混响系数；When the confidence degree of the first reverberation coefficient is less than or equal to the first reverberation value and greater than a second reverberation value, obtaining a second reverberation coefficient according to the first size of the first space and the material type of the object in the first space, and obtaining the target reverberation coefficient according to the first reverberation coefficient and the second reverberation coefficient;
    当所述第一混响系数的置信度小于或等于所述第二混响值时，根据所述第一混响系数、所述第二混响系数和所述第一混响系数的置信度，得到所述目标混响系数。When the confidence degree of the first reverberation coefficient is less than or equal to the second reverberation value, obtaining the target reverberation coefficient according to the first reverberation coefficient, the second reverberation coefficient, and the confidence degree of the first reverberation coefficient.
  3. 根据权利要求1或2所述的方法,其特征在于,所述声场环境参数为目标吸收系数,所述根据所述空间参数和所述声音参数,确定所述声场环境参数,具体包括:The method according to claim 1 or 2, wherein the sound field environment parameter is a target absorption coefficient, and the determination of the sound field environment parameter according to the space parameter and the sound parameter specifically includes:
    当第一吸收系数的置信度大于第一吸收值时,确定所述目标吸收系数为所述第一吸收系数,其中,所述第一吸收系数根据所述第一空间内的物体的材料类型得到;When the confidence level of the first absorption coefficient is greater than the first absorption value, determine the target absorption coefficient as the first absorption coefficient, wherein the first absorption coefficient is obtained according to the material type of the object in the first space ;
    当所述第一吸收系数的置信度小于或等于所述第一吸收值，且大于第二吸收值时，根据所述第一空间的第一大小和所述第一混响系数，得到第二吸收系数，以及根据所述第一吸收系数和所述第二吸收系数，得到所述目标吸收系数；When the confidence degree of the first absorption coefficient is less than or equal to the first absorption value and greater than a second absorption value, obtaining a second absorption coefficient according to the first size of the first space and the first reverberation coefficient, and obtaining the target absorption coefficient according to the first absorption coefficient and the second absorption coefficient;
    当所述第一吸收系数的置信度小于或等于所述第二吸收值时，根据所述第一吸收系数、所述第二吸收系数和所述第一吸收系数的置信度，得到所述目标吸收系数。When the confidence degree of the first absorption coefficient is less than or equal to the second absorption value, obtaining the target absorption coefficient according to the first absorption coefficient, the second absorption coefficient, and the confidence degree of the first absorption coefficient.
  4. 根据权利要求1-3任一所述的方法，其特征在于，所述声场环境参数为所述第一空间的目标大小，所述根据所述空间参数和所述声音参数，确定所述声场环境参数，具体包括：The method according to any one of claims 1-3, wherein the sound field environment parameter is the target size of the first space, and the determining the sound field environment parameter according to the space parameter and the sound parameter specifically includes:
    当所述第一空间的第一大小的置信度大于第一尺寸值时，确定所述目标大小为所述第一大小，其中，所述第一大小根据所述第一空间内的物体的材料类型得到；When the confidence degree of the first size of the first space is greater than a first size value, determining that the target size is the first size, wherein the first size is obtained according to the material type of the object in the first space;
    当所述第一大小的置信度小于或等于所述第一尺寸值，且大于第二尺寸值时，根据所述第一混响系数和所述第一空间内的物体的材料类型，得到第二大小，以及根据所述第一大小和所述第二大小，得到所述目标大小；When the confidence degree of the first size is less than or equal to the first size value and greater than a second size value, obtaining a second size according to the first reverberation coefficient and the material type of the object in the first space, and obtaining the target size according to the first size and the second size;
    当所述第一大小的置信度小于或等于所述第二尺寸值时,根据所述第一大小、所述第二大小和所述第一大小的置信度,得到所述目标大小。When the confidence degree of the first size is less than or equal to the second size value, the target size is obtained according to the first size, the second size, and the confidence degree of the first size.
  5. 根据权利要求1-4任一所述的方法，其特征在于，所述根据所述声场环境参数，对所述电子设备进行控制，具体包括：The method according to any one of claims 1-4, wherein the controlling the electronic device according to the sound field environment parameters specifically includes:
    根据所述声场环境参数,确定与所述声场环境参数相匹配的目标语音识别模型;Determining a target speech recognition model matching the sound field environment parameters according to the sound field environment parameters;
    将所述电子设备中的语音识别模型更新为所述目标语音识别模型。updating the speech recognition model in the electronic device to the target speech recognition model.
  6. 根据权利要求1-5任一所述的方法,其特征在于,所述根据所述声场环境参数,对所述电子设备进行控制,具体包括:The method according to any one of claims 1-5, wherein the controlling the electronic device according to the sound field environment parameters specifically includes:
    根据所述声场环境参数,对所述电子设备所处的声场环境进行建模,得到所述第一空间的空间模型;Modeling the sound field environment where the electronic device is located according to the sound field environment parameters to obtain a space model of the first space;
    基于所述空间模型进行声场模拟,得到位于所述第一空间中目标位置处对应的第一频响曲线;performing sound field simulation based on the space model to obtain a first frequency response curve corresponding to a target position in the first space;
    基于所述声场环境参数,从预置的理想声学频响库中确定出与所述声场环境参数相匹配的第二频响曲线;Based on the sound field environment parameters, determining a second frequency response curve matching the sound field environment parameters from a preset ideal acoustic frequency response library;
    将所述第一频响曲线拟合为所述第二频响曲线。Fitting the first frequency response curve to the second frequency response curve.
  7. 根据权利要求1-6任一所述的方法,其特征在于,所述根据所述声场环境参数,对所述电子设备进行控制,具体包括:The method according to any one of claims 1-6, wherein the controlling the electronic device according to the sound field environment parameters specifically includes:
    将所述声场环境参数作为所述电子设备中对语音数据进行处理的增强算法的输入。The sound field environment parameter is used as an input of an enhancement algorithm for processing voice data in the electronic device.
  8. 一种电子设备控制装置,其特征在于,包括:A control device for electronic equipment, characterized in that it includes:
    至少一个存储器,用于存储程序;at least one memory for storing programs;
    至少一个处理器,用于执行存储器存储的程序,当存储器存储的程序被执行时,处理器用于执行如权利要求1-7中任一所述的方法。At least one processor is used to execute the program stored in the memory, and when the program stored in the memory is executed, the processor is used to execute the method according to any one of claims 1-7.
  9. 一种电子设备,其特征在于,包括:An electronic device, characterized in that it comprises:
    至少一个存储器,用于存储程序;at least one memory for storing programs;
    至少一个处理器,用于执行存储器存储的程序,当存储器存储的程序被执行时,处理器用于执行如权利要求1-7中任一所述的方法。At least one processor is used to execute the program stored in the memory, and when the program stored in the memory is executed, the processor is used to execute the method according to any one of claims 1-7.
  10. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,当所述计算机程序在电子设备上运行时,使得所述电子设备执行如权利要求1-7任一所述的方法。A computer-readable storage medium, the computer-readable storage medium stores a computer program, and when the computer program runs on an electronic device, the electronic device executes the method according to any one of claims 1-7 .
  11. 一种计算机程序产品,其特征在于,当所述计算机程序产品在电子设备上运行时,使得所述电子设备执行如权利要求1-7任一所述的方法。A computer program product, characterized in that, when the computer program product is run on an electronic device, the electronic device is made to execute the method according to any one of claims 1-7.
PCT/CN2022/136611 2022-01-14 2022-12-05 Electronic device control method and apparatus, and electronic device WO2023134328A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210042081.6A CN116489572A (en) 2022-01-14 2022-01-14 Electronic equipment control method and device and electronic equipment
CN202210042081.6 2022-01-14

Publications (1)

Publication Number Publication Date
WO2023134328A1 true WO2023134328A1 (en) 2023-07-20

Family

ID=87221880

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/136611 WO2023134328A1 (en) 2022-01-14 2022-12-05 Electronic device control method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN116489572A (en)
WO (1) WO2023134328A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN205754811U (en) * 2016-05-12 2016-11-30 惠州Tcl移动通信有限公司 Mobile terminal and audio frequency processing system thereof
CN109686380A (en) * 2019-02-18 2019-04-26 广州视源电子科技股份有限公司 Processing method, device and the electronic equipment of voice signal
US20190394567A1 (en) * 2018-06-22 2019-12-26 EVA Automation, Inc. Dynamically Adapting Sound Based on Background Sound
CN111766303A (en) * 2020-09-03 2020-10-13 深圳市声扬科技有限公司 Voice acquisition method, device, equipment and medium based on acoustic environment evaluation
US10897570B1 (en) * 2019-01-28 2021-01-19 Facebook Technologies, Llc Room acoustic matching using sensors on headset
US20210058731A1 (en) * 2018-05-11 2021-02-25 Clepseadra, Inc. Acoustic program, acoustic device, and acoustic system
CN113597777A (en) * 2019-05-15 2021-11-02 苹果公司 Audio processing

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013058728A1 (en) * 2011-10-17 2013-04-25 Nuance Communications, Inc. Speech signal enhancement using visual information
CN111863005A (en) * 2019-04-28 2020-10-30 北京地平线机器人技术研发有限公司 Sound signal acquisition method and device, storage medium and electronic equipment


Also Published As

Publication number Publication date
CN116489572A (en) 2023-07-25


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22919961

Country of ref document: EP

Kind code of ref document: A1