CN113593558A

CN113593558A - Far-field voice adaptation method, device, equipment and storage medium

Info

Publication number: CN113593558A
Application number: CN202110860879.7A
Authority: CN
Inventors: 方伟 (Fang Wei)
Original assignee: Shenzhen Skyworth RGB Electronics Co Ltd
Current assignee: Shenzhen Skyworth RGB Electronics Co Ltd
Priority date: 2021-07-28
Filing date: 2021-07-28
Publication date: 2021-11-02

Abstract

The invention relates to the technical field of voice recognition and discloses a far-field voice adaptation method, apparatus, device, and storage medium. According to the method, when a far-field voice adaptation scheme switching instruction is detected, a far-field voice configuration item in an initial configuration file is modified to obtain a target configuration file; the target configuration file is parsed to generate an attribute configuration item; and the current far-field voice adaptation scheme is then switched according to the attribute configuration item. The method is simple to implement, allows multiple far-field voice adaptation schemes to be adapted on the same hardware and software platform, and can switch between far-field voice adaptation schemes automatically.

Description

Far-field voice adaptation method, device, equipment and storage medium

Technical Field

The present invention relates to the field of speech recognition technologies, and in particular, to a far-field speech adaptation method, apparatus, device, and storage medium.

Background

With the development of science and technology, far-field speech recognition is used in more and more scenarios, and solutions have already been deployed on household appliances, electronic products, commercial advertising machines, and similar equipment. Far-field voice has been integrated with AI voice search, and AI voice interaction technology is widely used in smart TVs and smart speakers. Far-field voice enables hands-free voice control without a remote controller at a distance of 5 meters: the user speaks a wake word followed by a command to control the smart TV or smart speaker from a distance. This avoids the cumbersome operation of a traditional Bluetooth voice remote, whose voice button must be held down for the whole time a voice command is being issued.

On the other hand, because product platforms are diverse and differ greatly in performance and functionality, far-field speech is adapted in different ways, and the system adaptation and verification of far-field speech are very complicated. Prior-art adaptation schemes fall into two main types: in the first, the bottom layer performs wake-up and front-end processing on the recorded data; in the second, the upper layer acquires the raw data and performs wake-up and front-end processing. Each type has its own advantages and disadvantages, but both must be customized for a particular system and are difficult to migrate. Moreover, because hardware platforms and system versions differ, only one far-field voice adaptation scheme can be launched on a given platform, the adaptation cycle is long, and performance and functional problems take a long time to resolve when they arise.

The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.

Disclosure of Invention

The main object of the present invention is to provide a far-field voice adaptation method, apparatus, device, and storage medium, so as to solve the technical problem in the prior art that a far-field voice adaptation scheme cannot be switched automatically.

In order to achieve the above object, the present invention provides a far-field speech adaptation method, which includes:

when a far-field voice adaptation scheme switching instruction is detected, modifying a far-field voice configuration item in an initial configuration file to obtain a target configuration file;

parsing the target configuration file to generate an attribute configuration item;

and switching the current far-field voice adaptation scheme according to the attribute configuration item.

Optionally, before the step of modifying the far-field speech configuration item in the initial configuration file to obtain the target configuration file when the far-field speech adaptation scheme switching instruction is detected, the method further includes:

acquiring key identification information, and searching a preset mapping relationship for target configuration data corresponding to the key identification information;

and configuring the default configuration file according to the target configuration data to obtain an initial configuration file.

Optionally, the step of modifying the far-field speech configuration item in the initial configuration file to obtain the target configuration file when the far-field speech adaptation scheme switching instruction is detected specifically includes:

when a far-field voice adaptation scheme switching instruction is detected, selecting a target automation script from pre-configured automation scripts according to the far-field voice adaptation scheme switching instruction;

and modifying each far-field voice configuration item in the initial configuration file through the target automation script to obtain a target configuration file.

Optionally, the step of parsing the target configuration file to generate an attribute configuration item specifically includes:

restarting the system service and, once the restart is complete, parsing the target configuration file with a preset program to generate an attribute configuration item.

Optionally, the step of switching the current far-field speech adaptation scheme according to the attribute configuration item specifically includes:

determining the configuration type of the attribute configuration item according to the identification information in the attribute configuration item;

and switching the current far-field voice adaptation scheme according to the configuration type.

Optionally, the step of switching the current far-field speech adaptation scheme according to the configuration type specifically includes:

when the configuration type is the upper-layer application type, switching the current far-field voice adaptation scheme to the upper-layer application adaptation scheme;

and when the configuration type is the bottom-layer service type, switching the current far-field voice adaptation scheme to the bottom-layer service adaptation scheme.

Optionally, after the step of switching the current far-field speech adaptation scheme according to the attribute configuration item, the method further includes:

determining a target far-field voice adaptation scheme according to a switching result;

acquiring adaptation effect data of the target far-field voice adaptation scheme;

and when the adaptation effect data does not meet the preset condition, switching the target far-field voice adaptation scheme.

In addition, in order to achieve the above object, the present invention further provides a far-field speech adaptation apparatus, including:

the configuration modification module is used for modifying a far-field voice configuration item in the initial configuration file when a far-field voice adaptation scheme switching instruction is detected, so as to obtain a target configuration file;

the configuration analysis module is used for parsing the target configuration file to generate an attribute configuration item;

and the scheme switching module is used for switching the current far-field voice adaptation scheme according to the attribute configuration item.

In addition, to achieve the above object, the present invention further provides a far-field speech adaptation device, including: a memory, a processor, and a far-field speech adaptation program stored in the memory and executable on the processor, the far-field speech adaptation program being configured to implement the far-field speech adaptation method described above.

Furthermore, to achieve the above object, the present invention further provides a storage medium having a far-field speech adaptation program stored thereon, wherein the far-field speech adaptation program, when executed by a processor, implements the far-field speech adaptation method as described above.

According to the method, when a far-field voice adaptation scheme switching instruction is detected, a far-field voice configuration item in an initial configuration file is modified to obtain a target configuration file; the target configuration file is parsed to generate an attribute configuration item; and the current far-field voice adaptation scheme is then switched according to the attribute configuration item. In this way, the required target configuration file is obtained by modifying the far-field voice configuration items in the initial configuration file, the target configuration file is re-parsed, and the attribute configuration items generated by parsing are used to switch to the required far-field voice adaptation scheme.

Drawings

FIG. 1 is a schematic structural diagram of a far-field speech adaptation device in a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a far-field speech adaptation method according to a first embodiment of the present invention;

FIG. 3 is a flowchart illustrating a far-field speech adaptation method according to a second embodiment of the present invention;

FIG. 4 is a flowchart illustrating a far-field speech adaptation method according to a third embodiment of the present invention;

FIG. 5 is a block diagram of a far-field speech adaptation apparatus according to a first embodiment of the present invention.

The realization of the objects, the functional features, and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Referring to fig. 1, fig. 1 is a schematic structural diagram of a far-field speech adaptation device in a hardware operating environment according to an embodiment of the present invention.

As shown in fig. 1, the far-field speech adaptation device may include a processor 1001 such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables communication among these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. Optionally, the memory 1005 may also be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the structure shown in fig. 1 does not constitute a limitation on the far-field speech adaptation device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.

As shown in fig. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module, and a far-field speech adaptation program.

In the far-field speech adaptation device shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The processor 1001 and the memory 1005 may be provided in the far-field speech adaptation device, which calls, through the processor 1001, the far-field speech adaptation program stored in the memory 1005 and executes the far-field speech adaptation method provided by the embodiments of the present invention.

An embodiment of the present invention provides a far-field speech adaptation method, and referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of the far-field speech adaptation method according to the present invention.

In this embodiment, the far-field speech adaptation method includes the following steps:

Step S10: When a far-field voice adaptation scheme switching instruction is detected, modify the far-field voice configuration item in the initial configuration file to obtain a target configuration file.

It should be noted that the execution subject of this embodiment may be a far-field speech adaptation device with data processing, network communication, and program execution functions, such as a television, or another device capable of implementing the same or similar functions; this embodiment places no particular limitation on this. In this embodiment and the following embodiments, the far-field speech adaptation method of the present invention is described by taking a television as an example.

It is understood that the far-field speech adaptation scheme switching instruction refers to an instruction to switch the current far-field speech adaptation scheme.

It will be appreciated that a configuration file is a computer file used to configure parameters and initial settings for computer programs. A configuration file contains comments and configuration items, so the configuration can be changed by modifying the content of the configuration items.

In a specific implementation, the far-field speech configuration items may include a built-in microphone item, an algorithm type, a company name, a microphone number, a performance function attribute, a hotword function, a voiceprint function, and the like, each of which is already assigned a corresponding value. This embodiment does not specifically limit either the values or the content of the far-field speech configuration items.
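The source does not specify the syntax of the configuration file. As an illustration only, the sketch below assumes a simple `key=value` text format with `#` comment lines, and uses hypothetical item names standing in for the built-in microphone, algorithm type, company name, microphone number, performance function attribute, hotword, and voiceprint items. It shows that comments are ignored and only the configuration items carry settings.

```python
from __future__ import annotations

# Hypothetical far-field voice configuration file: "#" lines are comments and
# "key=value" lines are configuration items. All names and values are illustrative.
SAMPLE_CONFIG = """\
# far-field voice configuration
builtin_mic=true
algorithm_type=software
company_name=ExampleVendor
mic_number=4
perf_attr=standard
hotword=true
voiceprint=false
"""


def parse_config(text: str) -> dict[str, str]:
    """Return the configuration items of a key=value file, ignoring comments and blank lines."""
    items: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comments carry no configuration
        key, sep, value = line.partition("=")
        if sep:  # skip malformed lines that have no "="
            items[key.strip()] = value.strip()
    return items
```

For example, `parse_config(SAMPLE_CONFIG)["mic_number"]` yields the string `"4"`; converting values to booleans or integers is left to a later parsing stage.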

Further, to obtain the initial configuration file, before step S10 the method further includes: acquiring key identification information and searching a preset mapping relationship for target configuration data corresponding to the key identification information; and configuring a default configuration file according to the target configuration data to obtain the initial configuration file.

It should be noted that the key identification information refers to information that identifies attributes of the current system, for example the model and the chip; each television model and chip has corresponding configuration data. This embodiment does not specifically limit the type of key identification information.

It can be understood that a developer may pre-store, in the preset mapping relationship, the correspondence between key identification information and configuration data, so that the target configuration data corresponding to the current key identification information can be found through the preset mapping relationship.

In a specific implementation, the default configuration file may be a blank configuration file, and the initial configuration file may be obtained by adding the target configuration data to the default configuration file.
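A minimal Python sketch of obtaining the initial configuration file, assuming the key identification information is a (model, chip) pair, the preset mapping is an in-memory dictionary, and the default configuration file is empty or contains only a header. The model names, chip names, item names, and values are hypothetical placeholders.

```python
from __future__ import annotations

# Hypothetical preset mapping: (model, chip) -> far-field configuration data.
PRESET_MAPPING: dict[tuple[str, str], dict[str, str]] = {
    ("TV-MODEL-A", "CHIP-X"): {"builtin_mic": "true", "algorithm_type": "dsp", "mic_number": "2"},
    ("TV-MODEL-B", "CHIP-Y"): {"builtin_mic": "true", "algorithm_type": "software", "mic_number": "4"},
}


def build_initial_config(model: str, chip: str, default_config: str = "") -> str:
    """Look up configuration data by key identification info and add it to the default file."""
    try:
        target_data = PRESET_MAPPING[(model, chip)]
    except KeyError:
        raise LookupError(f"no configuration data for model={model!r}, chip={chip!r}") from None
    # Keep any existing header of the default file; a blank default file contributes nothing.
    lines = [default_config.rstrip("\n")] if default_config.strip() else []
    lines += [f"{key}={value}" for key, value in target_data.items()]
    return "\n".join(lines) + "\n"
```

Raising an error for unknown hardware, rather than silently returning a blank file, is a design choice of this sketch; a product might instead fall back to a generic profile.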

In this embodiment, the initial configuration file is obtained according to the key identification information, so a different configuration file is configured for each type of device. This makes the initial configuration file more accurate and enables the far-field voice adaptation scheme to be switched automatically.

Step S20: Parse the target configuration file to generate an attribute configuration item.

Further, step S20 includes: restarting the system service and, once the restart is complete, parsing the target configuration file with a preset program to generate the attribute configuration item.

It should be noted that the preset program refers to code written in advance by a developer and added to the system, which re-parses the configuration file after the system service is restarted. For a user, the restart can be performed by restarting the television; for a developer, the upper-layer application and the bottom-layer service can be restarted directly.

It can be understood that the attribute configuration items may indicate whether there are multiple audio functions, whether there is a built-in microphone, whether the algorithm is a software algorithm or a DSP algorithm, the company name, the serial number of the microphone, the number of microphone channels, whether voiceprints are supported, whether hotwords are supported, and the like; other attribute configuration items may also be included, which is not particularly limited in this embodiment.

In this embodiment, restarting the system service allows the configuration file to be re-parsed, so the content of the attribute configuration items is more accurate and the far-field voice adaptation scheme can be switched automatically.
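The re-parsing step can be sketched as follows. The `restart_service` callable stands in for whatever mechanism restarts the system service on a real device (hypothetical here), the file is assumed to use a `key=value` format, and the field names of `AttributeConfig` are illustrative names for the attribute configuration items listed above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AttributeConfig:
    """Typed attribute configuration items (field names are illustrative assumptions)."""
    multi_audio: bool
    builtin_mic: bool
    algorithm: str        # "software" or "dsp"
    company_name: str
    mic_serial: str
    mic_channels: int
    voiceprint: bool
    hotword: bool


def _flag(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def reparse_after_restart(config_text: str, restart_service: Callable[[], bool]) -> AttributeConfig:
    """Restart the (hypothetical) system service, then parse the target file into attributes."""
    if not restart_service():
        raise RuntimeError("system service restart did not complete")
    raw = {}
    for line in config_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            raw[key.strip()] = value.strip()
    algorithm = raw.get("algorithm_type", "software").lower()
    if algorithm not in {"software", "dsp"}:
        raise ValueError(f"unknown algorithm type: {algorithm!r}")
    return AttributeConfig(
        multi_audio=_flag(raw.get("multi_audio", "false")),
        builtin_mic=_flag(raw.get("builtin_mic", "false")),
        algorithm=algorithm,
        company_name=raw.get("company_name", ""),
        mic_serial=raw.get("mic_serial", ""),
        mic_channels=int(raw.get("mic_channels", "0")),
        voiceprint=_flag(raw.get("voiceprint", "false")),
        hotword=_flag(raw.get("hotword", "false")),
    )
```

Parsing only after the restart reports completion mirrors the ordering in step S20; the defaults for missing items are assumptions of the sketch.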

Step S30: Switch the current far-field voice adaptation scheme according to the attribute configuration item.

It can be understood that each attribute configuration item is assigned a value, and the current far-field voice adaptation scheme can be switched according to the values of the attribute configuration items.

Further, after step S30, the method further includes: determining a target far-field voice adaptation scheme according to a switching result; acquiring adaptation effect data of the target far-field voice adaptation scheme; and when the adaptation effect data does not meet the preset condition, switching the target far-field voice adaptation scheme.

It can be understood that after the current far-field speech adaptation scheme is switched, the resulting target far-field speech adaptation scheme, i.e., either the upper-layer application adaptation scheme or the bottom-layer service adaptation scheme, can be determined.

It should be understood that the adaptation effect data refers to data describing how well the target far-field speech adaptation scheme fits the current system. The preset condition is a condition set in advance by a developer; for example, it may require that CPU occupancy not exceed a certain value, such as 10% or 15%. This embodiment does not specifically limit the preset condition, which may be set according to the actual situation.

In a specific implementation, when the target far-field speech adaptation scheme is the upper-layer application adaptation scheme, the device switches to the bottom-layer service adaptation scheme if the adaptation effect is poor and does not switch if it is good; when the target far-field speech adaptation scheme is the bottom-layer service adaptation scheme, the device switches to the upper-layer application adaptation scheme if the adaptation effect is poor and does not switch if it is good.

In this embodiment, after the current far-field speech adaptation scheme is switched, the adaptation effect of the new scheme is also evaluated, and when the effect is poor the device can automatically switch to the other far-field speech adaptation scheme, improving the user experience.
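A sketch of the adaptation-effect check, assuming the preset condition is a CPU-occupancy ceiling (the 10% and 15% figures above are only examples) and that only the two schemes described here exist. How CPU occupancy is measured on the device is outside the scope of the sketch; the measured value is simply passed in.

```python
UPPER_LAYER = "upper_layer_application"
BOTTOM_LAYER = "bottom_layer_service"


def check_and_fallback(target_scheme: str, cpu_usage_percent: float,
                       max_cpu_percent: float = 15.0) -> str:
    """Return the scheme to keep: the target if the preset condition holds, else the other one.

    The preset condition is an assumed CPU-occupancy ceiling ("does not exceed").
    """
    if target_scheme not in (UPPER_LAYER, BOTTOM_LAYER):
        raise ValueError(f"unknown scheme: {target_scheme!r}")
    if cpu_usage_percent <= max_cpu_percent:
        return target_scheme  # good adaptation effect: no switch needed
    # Poor adaptation effect: switch to the other far-field voice adaptation scheme.
    return BOTTOM_LAYER if target_scheme == UPPER_LAYER else UPPER_LAYER
```

With only two schemes the fallback is a simple toggle; the note below on other schemes would call for a ranked list of candidates instead.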

In this embodiment, when a far-field speech adaptation scheme switching instruction is detected, a far-field speech configuration item in an initial configuration file is modified to obtain a target configuration file; the target configuration file is then parsed to generate an attribute configuration item, and the current far-field speech adaptation scheme is switched according to the attribute configuration item. In this way, the required target configuration file is obtained by modifying the far-field voice configuration items in the initial configuration file, the target configuration file is re-parsed, and the attribute configuration items generated by parsing are used to switch to the required far-field voice adaptation scheme.

Referring to fig. 3, fig. 3 is a flowchart illustrating a far-field speech adaptation method according to a second embodiment of the present invention.

Based on the first embodiment described above, in this embodiment, step S10 includes:

Step S101: When a far-field voice adaptation scheme switching instruction is detected, select a target automation script from the pre-configured automation scripts according to the switching instruction.

It should be noted that a pre-configured automation script is a script written into the system in advance by the developer that can modify the configuration items in the configuration file.

It can be understood that each far-field speech adaptation scheme has a corresponding automation script, so the target automation script corresponding to the instruction must be selected from the pre-configured automation scripts according to the far-field speech adaptation scheme switching instruction. For example, when the instruction is to switch to the upper-layer application scheme, a script related to the upper-layer application is selected from the pre-configured automation scripts; when the instruction is to switch to the bottom-layer service scheme, a script related to the bottom-layer service is selected.

Step S102: Modify each far-field voice configuration item in the initial configuration file through the target automation script to obtain a target configuration file.

It can be understood that, because this embodiment concerns the far-field speech adaptation scheme, only the configuration items related to far-field speech need to be modified.

In a specific implementation, the target automation script may modify the values of configuration items; for example, the target configuration file may be obtained by modifying the values corresponding to the built-in microphone, algorithm type, company name, microphone number, performance function attribute, hotword function, and voiceprint function items, and other items may also be modified.

In this embodiment, when a far-field voice adaptation scheme switching instruction is detected, a target automation script is selected from the pre-configured automation scripts according to the instruction, and each far-field voice configuration item in the initial configuration file is then modified through the target automation script to obtain a target configuration file. Using the target automation script makes modifying the far-field voice configuration items faster and more convenient and enables the far-field voice adaptation scheme to be switched automatically.
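One way to organize the pre-configured automation scripts is a registry keyed by switching instruction. In the sketch below the instruction strings, the script functions, and the `adapt_mode` item are all hypothetical; a real system might store shell scripts on the device instead of Python functions.

```python
from typing import Callable, Dict

Items = Dict[str, str]
ScriptFn = Callable[[Items], Items]


def upper_layer_script(items: Items) -> Items:
    """Hypothetical script that marks the configuration for the upper-layer application scheme."""
    return {**items, "adapt_mode": "upper"}


def bottom_layer_script(items: Items) -> Items:
    """Hypothetical script that marks the configuration for the bottom-layer service scheme."""
    return {**items, "adapt_mode": "bottom"}


# One pre-configured automation script per far-field voice adaptation scheme.
PRECONFIGURED_SCRIPTS: Dict[str, ScriptFn] = {
    "switch_to_upper_layer": upper_layer_script,
    "switch_to_bottom_layer": bottom_layer_script,
}


def select_target_script(instruction: str) -> ScriptFn:
    """Pick the automation script corresponding to a switching instruction."""
    try:
        return PRECONFIGURED_SCRIPTS[instruction]
    except KeyError:
        raise ValueError(f"no automation script for instruction {instruction!r}") from None
```

Adding a further scheme then only requires registering one more script under a new instruction key.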
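A sketch of a target automation script rewriting item values in a `key=value` configuration file (an assumed format). It changes only items in an illustrative set of far-field keys, consistent with the statement above that only far-field configuration items need to be modified, and leaves comments and unrelated items untouched.

```python
from __future__ import annotations

# Illustrative set of far-field items the script may modify; other items are left alone.
FAR_FIELD_KEYS = {"builtin_mic", "algorithm_type", "company_name", "mic_number",
                  "perf_attr", "hotword", "voiceprint"}


def apply_script(initial_config: str, new_values: dict[str, str]) -> str:
    """Rewrite far-field items of a key=value file with new values; keep everything else."""
    unknown = set(new_values) - FAR_FIELD_KEYS
    if unknown:
        raise KeyError(f"not far-field configuration items: {sorted(unknown)}")
    out = []
    for raw in initial_config.splitlines():
        key, sep, _ = raw.partition("=")
        key = key.strip()
        if sep and not raw.lstrip().startswith("#") and key in new_values:
            out.append(f"{key}={new_values[key]}")  # replace the item's value
        else:
            out.append(raw)  # comments and unrelated items pass through unchanged
    return "\n".join(out) + "\n"
```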

Referring to fig. 4, fig. 4 is a flowchart illustrating a far-field speech adaptation method according to a third embodiment of the present invention.

Based on the above embodiments, in this embodiment, step S30 includes:

Step S301: Determine the configuration type of the attribute configuration item according to the identification information in the attribute configuration item.

It should be noted that the identification information refers to information carried in the attribute configuration item that identifies each configuration type.

In a specific implementation, there are mainly two configuration types: the upper-layer application type, in which the upper layer acquires the raw data and performs wake-up and front-end processing, and the bottom-layer service type, in which the bottom layer performs wake-up and front-end processing on the recorded data. Each type has advantages and disadvantages: for the upper-layer application type, the bottom-layer porting workload is small and the application workload is large, while for the bottom-layer service type the opposite is true.

Step S302: Switch the current far-field voice adaptation scheme according to the configuration type.

Further, step S302 includes: when the configuration type is the upper-layer application type, switching the current far-field voice adaptation scheme to the upper-layer application adaptation scheme; and when the configuration type is the bottom-layer service type, switching the current far-field voice adaptation scheme to the bottom-layer service adaptation scheme.

In a specific implementation, after the far-field voice adaptation scheme is switched, developers can test whether the scheme is complete and can compare, verify, and cross-reference the schemes, which improves the far-field voice adaptation efficiency of the device and allows the product to be brought to market quickly.

In this embodiment, far-field speech adaptation schemes are switched according to the configuration type, and each configuration type has a corresponding scheme, so schemes can be switched more conveniently and automatically.
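The two-way decision of steps S301 and S302 can be sketched with an enumeration. The sketch assumes, hypothetically, that the identification information is carried in an `adapt_mode` attribute whose value is `upper` or `bottom`; the source does not name this field or its values.

```python
from __future__ import annotations

from enum import Enum


class ConfigType(Enum):
    UPPER_LAYER_APPLICATION = "upper"  # upper layer fetches raw data, does wake-up + front-end
    BOTTOM_LAYER_SERVICE = "bottom"    # bottom layer does wake-up + front-end on recorded data


def config_type_of(attributes: dict[str, str], id_key: str = "adapt_mode") -> ConfigType:
    """Determine the configuration type from the identification information in the attributes."""
    marker = attributes.get(id_key)
    try:
        return ConfigType(marker)
    except ValueError:
        raise ValueError(f"unrecognized identification information: {marker!r}") from None


def switch_scheme(attributes: dict[str, str]) -> str:
    """Map the configuration type to the far-field voice adaptation scheme to activate."""
    kind = config_type_of(attributes)
    if kind is ConfigType.UPPER_LAYER_APPLICATION:
        return "upper-layer application adaptation scheme"
    return "bottom-layer service adaptation scheme"
```

Rejecting an unrecognized marker, instead of defaulting to one scheme, keeps a misconfigured file from silently activating the wrong pipeline.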

It is understood that far-field speech adaptation is not limited to these two schemes; for other far-field speech adaptation schemes, the above embodiments of the present invention can likewise adapt two or more far-field speech schemes on the same hardware and software platform and switch between them.

In this embodiment, the configuration type of the attribute configuration item is determined according to the identification information in the attribute configuration item, and the current far-field speech adaptation scheme is then switched according to the configuration type. Because the configuration type is determined from the identification information rather than from the attribute configuration item alone, it can be determined more accurately, and switching the current scheme according to the configuration type allows the far-field voice adaptation scheme to be switched automatically.

Furthermore, an embodiment of the present invention further provides a storage medium, where the storage medium stores a far-field speech adaptation program, and the far-field speech adaptation program, when executed by a processor, implements the far-field speech adaptation method as described above.

Referring to fig. 5, fig. 5 is a block diagram illustrating a first embodiment of a far-field speech adaptation apparatus according to the present invention.

As shown in fig. 5, the far-field speech adaptation apparatus according to the embodiment of the present invention includes:

a configuration modification module 501, configured to modify a far-field speech configuration item in an initial configuration file when a far-field speech adaptation scheme switching instruction is detected, so as to obtain a target configuration file;

a configuration analysis module 502, configured to parse the target configuration file to generate an attribute configuration item;

and a scheme switching module 503, configured to switch the current far-field speech adaptation scheme according to the attribute configuration item.

Based on the above first embodiment of the far-field speech adaptation apparatus of the present invention, a second embodiment of the far-field speech adaptation apparatus of the present invention is provided.

In this embodiment, the far-field speech adaptation apparatus further includes a file obtaining module 500, which is configured to obtain key identification information, search a preset mapping relationship for target configuration data corresponding to the key identification information, and configure the default configuration file according to the target configuration data to obtain an initial configuration file.

Further, the configuration modification module 501 is further configured to, when a far-field speech adaptation scheme switching instruction is detected, select a target automation script from pre-configured automation scripts according to the switching instruction, and to modify each far-field voice configuration item in the initial configuration file through the target automation script to obtain a target configuration file.

Further, the configuration analysis module 502 is further configured to restart a system service and, once the restart is complete, parse the target configuration file through a preset program to generate an attribute configuration item.

Further, the scheme switching module 503 is further configured to determine the configuration type of the attribute configuration item according to the identification information in the attribute configuration item, and to switch the current far-field voice adaptation scheme according to the configuration type.

Further, the scheme switching module 503 is further configured to switch the current far-field speech adaptation scheme to the upper-layer application adaptation scheme when the configuration type is the upper-layer application type, and to switch it to the bottom-layer service adaptation scheme when the configuration type is the bottom-layer service type.

Further, the far-field speech adaptation apparatus further includes an effect switching module 504, which is configured to determine a target far-field speech adaptation scheme according to the switching result, acquire adaptation effect data of the target far-field voice adaptation scheme, and switch the target far-field voice adaptation scheme when the adaptation effect data does not meet the preset condition.
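To show how modules 500 to 504 could fit together, the following Python class gives each module one method. It is a simplified sketch: the configuration is held as a dictionary rather than a file, `parse_config` merely normalizes values instead of re-reading a file after a service restart, and the `adapt_mode` item, the CPU ceiling, and the `measure_cpu` callback are hypothetical.

```python
from typing import Callable, Dict, Optional

Items = Dict[str, str]


class FarFieldVoiceAdapter:
    """Sketch of the apparatus of FIG. 5; method names mirror modules 500-504 (hypothetical)."""

    def __init__(self, mapping: Dict[str, Items], scripts: Dict[str, Callable[[Items], Items]],
                 measure_cpu: Callable[[str], float], max_cpu: float = 15.0) -> None:
        self.mapping = mapping          # preset mapping: key identification info -> config data
        self.scripts = scripts          # pre-configured automation scripts, one per instruction
        self.measure_cpu = measure_cpu  # returns CPU occupancy (%) observed under a scheme
        self.max_cpu = max_cpu          # preset condition (assumed CPU ceiling)
        self.config: Items = {}
        self.scheme: Optional[str] = None

    def obtain_file(self, key_id: str) -> None:                # module 500
        self.config = dict(self.mapping[key_id])

    def modify_config(self, instruction: str) -> Items:        # module 501
        return self.scripts[instruction](self.config)

    @staticmethod
    def parse_config(target: Items) -> Items:                  # module 502
        return {k: v.strip() for k, v in target.items()}

    def switch_scheme(self, attrs: Items) -> str:              # module 503
        mode = attrs.get("adapt_mode")
        if mode not in ("upper", "bottom"):
            raise ValueError(f"unknown configuration type: {mode!r}")
        self.scheme = mode
        return mode

    def check_effect(self) -> str:                             # module 504
        if self.scheme is None:
            raise RuntimeError("no scheme active")
        if self.measure_cpu(self.scheme) > self.max_cpu:       # preset condition not met
            self.scheme = "bottom" if self.scheme == "upper" else "upper"
        return self.scheme

    def handle_instruction(self, instruction: str) -> str:
        """Run modules 501 to 504 in order for one switching instruction."""
        self.config = self.modify_config(instruction)
        self.switch_scheme(self.parse_config(self.config))
        return self.check_effect()
```

In this sketch a failed effect check changes the active scheme but not the stored configuration; whether the configuration file should also be rewritten in that case is left open.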

For other embodiments or specific implementations of the far-field speech adaptation apparatus of the present invention, refer to the above method embodiments; they are not described again here.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases the former is the better implementation. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., a ROM/RAM, a magnetic disk, or an optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A far-field speech adaptation method, comprising:

2. The far-field speech adaptation method according to claim 1, wherein before the step of modifying the far-field speech configuration item in the initial configuration file to obtain the target configuration file when the far-field speech adaptation scheme switching instruction is detected, the method further comprises:

3. The far-field speech adaptation method according to claim 1, wherein the step of modifying the far-field speech configuration item in the initial configuration file to obtain the target configuration file when the far-field speech adaptation scheme switching instruction is detected includes:

4. The far-field speech adaptation method according to claim 1, wherein the step of parsing the target configuration file to generate attribute configuration items specifically comprises:

5. The far-field speech adaptation method according to claim 1, wherein the step of switching the current far-field speech adaptation scheme according to the attribute configuration item specifically comprises:

6. The far-field speech adaptation method according to claim 5, wherein the step of switching the current far-field speech adaptation scheme according to the configuration type specifically comprises:

7. The far-field speech adaptation method according to any one of claims 1 to 6, wherein after the step of switching the current far-field speech adaptation scheme according to the attribute configuration item, the method further comprises:

8. A far-field speech adaptation apparatus, comprising:

9. A far-field speech adaptation device, characterized in that the far-field speech adaptation device comprises: a memory, a processor, and a far-field speech adaptation program stored on the memory and executable on the processor, the far-field speech adaptation program configured to implement the far-field speech adaptation method of any of claims 1-7.

10. A storage medium having stored thereon a far-field speech adaptation program which, when executed by a processor, implements a far-field speech adaptation method according to any one of claims 1 to 7.