CN112350688A - Voice signal clock domain crossing processing method and device and intelligent voice chip

Info

Publication number: CN112350688A
Application number: CN202110015635.9A
Authority: CN
Inventors: 金傲寒; 梁敏学; 余新康
Original assignee: Symboltek Co ltd
Current assignee: Symboltek Co ltd
Priority date: 2021-01-07
Filing date: 2021-01-07
Publication date: 2021-02-09
Anticipated expiration: 2041-01-07
Also published as: CN112350688B

Abstract

The invention discloses a method and a device for processing a voice signal across clock domains, and an intelligent voice chip. The processing method comprises the following steps: processing frequency data of an audio interface clock domain to obtain a first sampling rate; and performing multi-stage frequency reduction processing on the first sampling rate to obtain a third sampling rate equal to a second sampling rate of the audio interface clock domain. The invention can quickly synchronize interface sampling data to the internal clock domain of the audio subsystem, which makes it convenient to reuse various signal processing modules, saves chip area and power consumption, and reduces the difficulty of layout and routing and the design complexity.

Description

Voice signal clock domain crossing processing method and device and intelligent voice chip

Technical Field

The present invention relates to a method for processing a voice signal, and more particularly, to a method and an apparatus for processing a voice signal across clock domains, and an intelligent voice chip.

Background

On an artificial intelligence voice chip, an I2S/TDM interface is generally used for audio sampling, and the sampling clock of such interfaces is generally provided by a crystal oscillator external to the chip. The sampled data must be synchronized to the audio subsystem clock domain. Even if the reference frequency used for sampling nominally matches the audio subsystem clock frequency, a tiny frequency offset or an uncertain phase difference still exists between the two clocks, so the sampled data cannot be filtered directly by a filter in the audio subsystem clock domain.
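To give a sense of scale for the "tiny frequency offset" mentioned above, the following minimal sketch computes how long it takes two nominally equal clocks to drift apart by one full sample period. The function name and the 50 ppm offset are illustrative assumptions, not values from the source.

```python
def seconds_per_sample_slip(sample_rate_hz: float, offset_ppm: float) -> float:
    """Time until two nominally equal sample clocks differ by one sample period.

    offset_ppm is a hypothetical relative frequency offset in parts per million.
    """
    # Mismatch expressed in samples gained or lost per second.
    slip_rate = sample_rate_hz * offset_ppm * 1e-6
    return 1.0 / slip_rate


if __name__ == "__main__":
    for rate_hz in (16_000, 48_000):
        t = seconds_per_sample_slip(rate_hz, offset_ppm=50.0)
        print(f"{rate_hz} Hz, 50 ppm offset: one sample slips roughly every {t:.3f} s")
```

Under these assumed numbers a 16 kHz stream gains or loses a sample roughly every 1.25 s, which is why data cannot simply be clocked from one domain into the other without some form of rate conversion.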

The prior art generally places the filter in the interface clock domain and then synchronizes the filtered data to the audio subsystem clock domain through a standard synchronization processing module. The disadvantage is that the general artificial intelligence voice chip comprises a plurality of voice input interfaces, and the audio filter can only be designed separately in each interface clock domain by the method, which can lead to the increase of chip area and power consumption.

Alternatively, the prior art usually uses a high-end PLL for frequency and phase locking so that the sampled audio data can be synchronized into the audio subsystem. However, high-end PLL IP is expensive to purchase and complex to lay out, which increases both the design workload and the cost.

Disclosure of Invention

(I) Objects of the invention

The invention aims to provide a method and a device for processing a voice signal across clock domains, and an intelligent voice chip, which can quickly synchronize interface sampling data to the internal clock domain of the audio subsystem, make it convenient to reuse various signal processing modules, save chip area and power consumption, and reduce the difficulty of layout and routing and the design complexity.

(II) Technical scheme

In order to solve the above problem, an aspect of the present invention provides a method for processing a voice signal across clock domains, where the method includes: processing frequency data of an audio interface clock domain to obtain a first sampling rate; and carrying out multi-stage frequency reduction processing on the first sampling rate to obtain a third sampling rate which is equal to the second sampling rate of the audio interface clock domain.

Optionally, processing the frequency data of the audio interface clock domain to obtain a first sampling rate, including: converting the frequency data based on the audio interface clock domain into frequency data based on the audio subsystem clock domain; performing oversampling processing on the frequency data of the audio subsystem clock domain to obtain a first sampling rate based on the audio subsystem clock frequency; the second sampling rate is an interface sampling rate of an audio interface clock domain.

Optionally, performing multi-stage frequency reduction processing on the first sampling rate to obtain a third sampling rate equal to the second sampling rate of the audio interface clock domain, including: performing frequency reduction processing on the first sampling rate to obtain a first-stage sampling rate; carrying out frequency reduction processing on the first-stage sampling rate to obtain a second-stage sampling rate; and carrying out frequency reduction processing on the second-stage sampling rate to obtain a third sampling rate.

On the other hand, the invention also provides a voice signal cross-clock domain processing device, which comprises: the asynchronous sampling rate conversion module is used for processing frequency data of an audio interface clock domain to obtain a first sampling rate based on the clock frequency of an audio subsystem; and the filtering module is used for carrying out multi-stage frequency reduction processing on the first sampling rate to obtain a third sampling rate which is equal to the second sampling rate of the audio interface clock domain.

Optionally, the asynchronous sample rate conversion module includes: the conversion unit is used for converting the frequency data based on the clock domain of the audio interface into frequency data based on the clock domain of the audio subsystem; and the oversampling unit is used for performing oversampling processing on the frequency data of the clock domain of the audio subsystem to obtain the first sampling rate. Optionally, the preset frequency is a fixed frequency of the asynchronous sampling rate conversion module.

Optionally, the filtering module includes: the first filtering unit is used for carrying out frequency reduction processing on the first sampling rate to obtain a first-stage sampling rate; the second filtering unit is used for carrying out frequency reduction processing on the first-stage sampling rate to obtain a second-stage sampling rate; and the third filtering unit is used for carrying out frequency reduction processing on the second-stage sampling rate to obtain a third sampling rate.

Optionally, the first filtering unit, the second filtering unit, and the third filtering unit are all low-pass filtering units.

On the other hand, the invention also provides an intelligent voice chip, which comprises: an interface of an audio interface clock domain, an interface of an audio subsystem clock domain, and a processing device for implementing the method; the processing means comprises an asynchronous sample rate converter and a multi-stage filter.

Optionally, an interface of the audio interface clock domain is connected with the asynchronous sample rate converter; the multistage filter comprises a first stage filter, a second stage filter and a third stage filter; the first-stage filter is connected with the asynchronous sampling rate converter, the first-stage filter is connected with the third-stage filter through the second-stage filter, and the third-stage filter is connected with an interface of the clock domain of the audio subsystem.

(III) Advantageous effects

The technical scheme of the invention has the following beneficial technical effects:

According to the invention, the frequency data of the audio interface clock domain is processed to obtain a first sampling rate, and the first sampling rate is then subjected to multi-stage frequency reduction processing. In this way the interface sampling data can be synchronized simply and directly to the internal clock domain of the audio subsystem, and the audio data converted across clock domains can be used directly by the intelligent analysis module. This saves chip area and power consumption and reduces the difficulty of layout and routing and the design complexity.

Drawings

Fig. 1 is a working principle diagram of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.

In the description of the present invention, it should be noted that the terms "first", "second", and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the embodiment of the invention, the invention provides a voice signal clock domain crossing processing method, which comprises the steps of processing frequency data of an audio interface clock domain to obtain a first sampling rate based on the clock frequency of an audio subsystem; and carrying out multi-stage frequency reduction processing on the first sampling rate to obtain a third sampling rate which is equal to the second sampling rate of the audio interface clock domain.

According to the invention, the frequency data of the audio interface clock domain is processed to obtain a first sampling rate, and the first sampling rate is then subjected to multi-stage frequency reduction processing. The invention can therefore quickly synchronize interface sampling data to the internal clock domain of the audio subsystem, which makes it convenient to reuse various signal processing modules, saves chip area and power consumption, and reduces the difficulty of layout and routing and the design complexity.

In an embodiment of the present invention, the sampling rate (also referred to as sampling speed or sampling frequency) refers to the number of samples per second that are extracted from a continuous signal and constitute a discrete signal, which is expressed in hertz (Hz).

In the embodiment of the invention, processing the frequency data of the audio interface clock domain to obtain a first sampling rate based on the audio subsystem clock frequency includes: converting the frequency data based on the audio interface clock domain into frequency data based on the audio subsystem clock domain; and performing oversampling processing on the frequency data of the audio subsystem clock domain to obtain the first sampling rate. In an embodiment of the invention, the second sampling rate is the interface sampling rate of the audio interface clock domain.

The invention converts the frequency data based on the audio interface clock domain into the frequency data based on the audio subsystem clock domain, so that the frequency data of the audio interface clock domain is synchronized to the audio subsystem clock domain, and further, the back-end voice recognition module can normally recognize the voice data.

According to the invention, the frequency data sampled under the audio interface clock is oversampled directly on the basis of the audio subsystem clock frequency to obtain data at the first sampling rate, and three-stage low-pass filtering is then applied. This changes the distribution of the noise, suppresses the noise introduced by converting data between different clock domains, reduces the source noise within the bandwidth of the useful signal, improves the time-domain resolution so that a better time-domain waveform is obtained, and improves the processing gain of the filter and the signal-to-noise ratio. The oversampling also makes the sampling rate more than twice the sampling rate of the audio interface clock domain, so the accuracy with which the intelligent voice module recognizes speech is not degraded.
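The conversion and oversampling step can be illustrated with a minimal sketch. The source does not specify how the asynchronous sample rate converter interpolates, so linear interpolation is used here purely as a stand-in, and the rate ratio is assumed to be known and constant (a real converter would have to track it continuously). The function name `asrc_linear` and its parameters are hypothetical.

```python
def asrc_linear(samples, fs_in_hz, fs_out_hz):
    """Resample `samples` taken at fs_in_hz onto a time grid at fs_out_hz.

    Assumption: linear interpolation stands in for the unspecified
    interpolation filter of a real asynchronous sample rate converter.
    """
    if len(samples) < 2:
        return list(samples)
    last = len(samples) - 1
    out = []
    n = 0
    while True:
        # Fractional read position in the input for output sample n.
        pos = (n * fs_in_hz) / fs_out_hz
        if pos > last:
            break
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        out.append(samples[i] + frac * (nxt - samples[i]))
        n += 1
    return out


if __name__ == "__main__":
    # 16 input samples at a nominal 16 kHz interface rate, read out on a 192 kHz grid.
    ramp = [float(k) for k in range(16)]
    upsampled = asrc_linear(ramp, fs_in_hz=16_000, fs_out_hz=192_000)
    print(len(upsampled), upsampled[:4])
```

Because the read position is derived from the ratio of the two rates, an interface clock that runs slightly fast or slow (for example `fs_in_hz=16_000.8`) only changes the step between read positions; the output grid stays tied to the audio subsystem clock.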

In an embodiment of the invention, oversampling refers to sampling the input signal at a frequency much greater than the Nyquist sampling frequency. Let the original sampling frequency of the digital audio system be fs, typically 44.1 kHz or 48 kHz. If the sampling frequency is increased to R × fs, R is called the oversampling ratio, with R > 1. In a digital signal sampled this way, the total quantization noise power is unchanged because the number of quantization bits is unchanged, but the spectral distribution of the quantization noise changes: the quantization noise originally distributed uniformly over the band 0 to fs/2 is now spread over the band 0 to R × fs/2.
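The redistribution of quantization noise described above can be written out as a short derivation. It relies on the usual assumption, consistent with the uniform distribution stated in the text, that the quantization noise has a flat spectral density. The symbol P_q for the total quantization noise power is introduced here for illustration.

```latex
% Flat noise density after oversampling by R
S_q(f) = \frac{P_q}{R f_s / 2}, \qquad 0 \le f \le \frac{R f_s}{2}

% Noise power remaining in the original signal band
P_{\text{in-band}} = \int_{0}^{f_s/2} S_q(f)\, df = \frac{P_q}{R}

% Resulting in-band signal-to-noise improvement
\Delta \mathrm{SNR} = 10 \log_{10} R \ \text{dB}
```

For example, R = 4 (48 kHz to 192 kHz) corresponds to about 6 dB and R = 12 (16 kHz to 192 kHz) to about 10.8 dB. This improvement is only realized once the noise outside 0 to fs/2 has been removed by low-pass filtering.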

In the embodiment of the invention, performing multi-stage frequency reduction processing on the first sampling rate to obtain a third sampling rate equal to the second sampling rate of the audio interface clock domain includes: performing frequency reduction processing on the first sampling rate to obtain a first-stage sampling rate; performing frequency reduction processing on the first-stage sampling rate to obtain a second-stage sampling rate; and performing frequency reduction processing on the second-stage sampling rate to obtain the third sampling rate.

By performing this three-stage frequency reduction processing on the first sampling rate (first sampling rate to first-stage sampling rate, first-stage to second-stage sampling rate, and second-stage sampling rate to third sampling rate), the invention further speeds up the synchronization of interface sampling data to the internal clock domain of the audio subsystem while preserving the accuracy with which the intelligent voice module recognizes speech.

In another aspect, an embodiment of the present invention further provides a voice signal cross-clock domain processing apparatus, including: an asynchronous sampling rate conversion module, configured to process frequency data of an audio interface clock domain to obtain a first sampling rate based on the clock frequency of an audio subsystem; and a filtering module, configured to perform multi-stage frequency reduction processing on the first sampling rate to obtain a third sampling rate equal to the second sampling rate of the audio interface clock domain.

The invention realizes the frequency processing of the clock domain of the audio interface through the asynchronous sampling rate conversion module, and simultaneously realizes the multi-stage frequency reduction processing of the first sampling rate through the filtering module, thereby saving the area and the power consumption of a chip, and reducing the difficulty of layout and wiring and the design complexity.

In the embodiment of the invention, the asynchronous sampling rate conversion module comprises a conversion unit and an oversampling unit. The conversion unit is used for converting frequency data based on the audio interface clock domain into frequency data based on the audio subsystem clock domain; the oversampling unit is used for performing oversampling processing on the frequency data of the audio subsystem clock domain to obtain the first sampling rate. In this embodiment, the preset frequency is a fixed frequency of the asynchronous sampling rate conversion module; in other embodiments, the preset frequency can also be set according to the user's requirements.

The asynchronous sampling rate conversion module is internally provided with the conversion unit and the oversampling unit, so that the asynchronous sampling rate conversion module realizes centralization and simplification, further saves chip area and power consumption, and reduces difficulty in layout and wiring and design complexity.

In the embodiment of the invention, the filtering module comprises a first filtering unit, which is used for carrying out frequency reduction processing on a first sampling rate to obtain a first-stage sampling rate; the second filtering unit is used for carrying out frequency reduction processing on the first-stage sampling rate to obtain a second-stage sampling rate; and the third filtering unit is used for carrying out frequency reduction processing on the second-stage sampling rate to obtain a third sampling rate. In this embodiment, the first filtering unit, the second filtering unit and the third filtering unit are all low-pass filtering units.

In another aspect, an embodiment of the present invention further provides an intelligent voice chip, including: an interface of an audio interface clock domain, an interface of an audio subsystem clock domain, and a processing device for implementing the method; the processing device comprises an asynchronous sample rate converter and a multi-stage filter.

In the embodiment of the invention, the interface of the audio interface clock domain is connected with the asynchronous sampling rate converter; the multistage filter comprises a first stage filter, a second stage filter and a third stage filter; the first-stage filter is connected with the asynchronous sampling rate converter, the first-stage filter is connected with the third-stage filter through the second-stage filter, and the third-stage filter is connected with an interface of the clock domain of the audio subsystem.

In order to better illustrate the invention, the following examples are given.

In an interface that uses an external interface clock (that is, an interface of the audio interface clock domain), such as I2S/TDM, an ASRC (asynchronous sample rate conversion) module is added at the next stage. The ASRC has two clocks: one is synchronous with the external interface clock, and the data sampling rate based on this clock is generally 16 kHz or 48 kHz; the other is synchronous with the audio subsystem. The ASRC module raises the sampling frequency to 192 kHz and, in doing so, carries the data across from the external clock domain to the audio subsystem clock domain. The 192 kHz stream (whose sampling rate is now based on the audio subsystem clock frequency) is then reduced to 16 kHz by a three-stage filter consisting of a first-stage filter, a second-stage filter and a third-stage filter: the first stage reduces 192 kHz to 96 kHz, the second stage reduces 96 kHz to 48 kHz, and the third stage reduces 48 kHz to 16 kHz. With this structure, data input at the interface at the 16 kHz or 48 kHz sampling frequency of the external clock domain is converted into the audio subsystem clock domain.
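The 192 kHz to 96 kHz to 48 kHz to 16 kHz chain in this example can be sketched as three low-pass-and-decimate stages with factors 2, 2 and 3. The source states only that the three stages are low-pass filtering units; the Hamming-windowed sinc design, the 63-tap length, the cutoff choice and all function names below are illustrative assumptions rather than the patented filter design.

```python
import math


def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass FIR (hypothetical design choice).

    cutoff is expressed as a fraction of the input sample rate (0 to 0.5).
    """
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2.0 * cutoff
        else:
            sinc = math.sin(2.0 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC


def decimate(samples, factor, num_taps=63):
    """Low-pass filter, then keep every `factor`-th output sample."""
    # Cutoff slightly below the new Nyquist frequency to limit aliasing.
    taps = lowpass_taps(num_taps, cutoff=0.45 / factor)
    out = []
    for start in range(0, len(samples), factor):
        acc = 0.0
        for k, tap in enumerate(taps):
            idx = start - k
            if idx >= 0:  # samples before the start are treated as zero
                acc += tap * samples[idx]
        out.append(acc)
    return out


def three_stage_downsample(samples, rate_hz=192_000, factors=(2, 2, 3)):
    """Apply the 192k -> 96k -> 48k -> 16k chain from the example."""
    for factor in factors:
        samples = decimate(samples, factor)
        rate_hz //= factor
    return samples, rate_hz


if __name__ == "__main__":
    # 10 ms of a 1 kHz tone sampled at 192 kHz.
    tone = [math.sin(2.0 * math.pi * 1_000 * n / 192_000) for n in range(1_920)]
    out, rate = three_stage_downsample(tone)
    print(len(out), "samples at", rate, "Hz")
```

Splitting the overall factor of 12 into stages lets each filter work against a relaxed transition band, and all three stages run in the audio subsystem clock domain, so they can be shared across interfaces.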

The working principle of the invention is as follows: the speech signal required by an artificial intelligence algorithm does not need to be a complete, high-fidelity restoration of the external audio input signal. The signal can be considered acceptable as long as the speech data fed to the artificial intelligence audio algorithm does not degrade the accuracy of that algorithm. Preliminary experiments show that the accuracy of the artificial intelligence audio algorithm is not affected by the structure of this scheme, so the method is well matched to voice artificial intelligence algorithms.

The invention solves the problem of interface data crossing into the clock domain of the audio subsystem. It synchronizes the interface sampling data to the internal clock domain of the audio subsystem as early as possible, which makes it convenient to reuse various signal processing modules, saves chip area and power consumption, and reduces the difficulty of layout and routing and the design complexity.

It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explaining the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

Claims

1. A method for processing a voice signal across clock domains, characterized by comprising the following steps:

processing frequency data of an audio interface clock domain to obtain a first sampling rate based on the clock frequency of an audio subsystem;

performing multi-stage frequency reduction processing on the first sampling rate to obtain a third sampling rate equal to a second sampling rate of the audio interface clock domain;

wherein performing multi-stage frequency reduction processing on the first sampling rate to obtain the third sampling rate equal to the second sampling rate of the audio interface clock domain comprises:

performing frequency reduction processing on the first sampling rate to obtain a first-stage sampling rate;

carrying out frequency reduction processing on the first-stage sampling rate to obtain a second-stage sampling rate;

and carrying out frequency reduction processing on the second-stage sampling rate to obtain a third sampling rate.

2. The method of claim 1,

wherein processing frequency data of an audio interface clock domain to obtain a first sampling rate based on an audio subsystem clock frequency comprises:

converting the frequency data based on the audio interface clock domain into frequency data based on the audio subsystem clock domain;

performing oversampling processing on the frequency data based on the clock domain of the audio subsystem to obtain the first sampling rate;

the second sampling rate is an interface sampling rate of an audio interface clock domain.

3. A speech signal cross-clock domain processing apparatus, comprising:

the asynchronous sampling rate conversion module is used for processing frequency data of an audio interface clock domain to obtain a first sampling rate based on the clock frequency of an audio subsystem;

the filtering module is used for carrying out multi-stage frequency reduction processing on the first sampling rate to obtain a third sampling rate equal to the second sampling rate of the audio interface clock domain;

wherein the filtering module comprises:

the first filtering unit is used for carrying out frequency reduction processing on the first sampling rate to obtain a first-stage sampling rate;

the second filtering unit is used for carrying out frequency reduction processing on the first-stage sampling rate to obtain a second-stage sampling rate;

and the third filtering unit is used for carrying out frequency reduction processing on the second-stage sampling rate to obtain a third sampling rate.

4. The apparatus of claim 3,

the asynchronous sample rate conversion module comprising:

the conversion unit is used for converting the frequency data based on the clock domain of the audio interface into the frequency data based on the clock domain of the audio subsystem;

and the oversampling unit is used for performing oversampling processing on the frequency data based on the clock domain of the audio subsystem to obtain the first sampling rate.

5. The apparatus of claim 3,

the first filtering unit, the second filtering unit and the third filtering unit are all low-pass filtering units.

6. An intelligent voice chip, comprising:

an interface to an audio interface clock domain, an interface to an audio subsystem clock domain and a processing device implementing the method according to any one of claims 1 to 2;

the processing device includes an asynchronous sample rate converter and a multi-stage filter.

7. The intelligent voice chip of claim 6,

the interface of the audio interface clock domain is connected with the asynchronous sampling rate converter;

the multistage filter comprises a first stage filter, a second stage filter and a third stage filter;

the first-stage filter is connected with the asynchronous sampling rate converter, the first-stage filter is connected with the third-stage filter through the second-stage filter, and the third-stage filter is connected with an interface of the clock domain of the audio subsystem.