CN112102825B

CN112102825B - Audio processing method and device based on vehicle-mounted machine voice recognition and computer equipment

Info

Publication number: CN112102825B
Application number: CN202010800257.0A
Authority: CN
Inventors: 楼赵辉
Original assignee: Hubei Ecarx Technology Co Ltd
Current assignee: Ecarx Hubei Tech Co Ltd
Priority date: 2020-08-11
Filing date: 2020-08-11
Publication date: 2021-11-26
Anticipated expiration: 2040-08-11
Also published as: CN112102825A

Abstract

The application relates to an audio processing method, an audio processing device, computer equipment and a storage medium based on vehicle machine voice recognition. The method comprises the following steps: when a voice recognition service process of the vehicle machine operating system is started, the voice recognition service process calls a recording interface; a microphone recording channel corresponding to the recording interface is opened to acquire a microphone signal, and a reference recording channel corresponding to the recording interface is opened to acquire a reference signal, wherein the microphone signal comprises a first microphone signal and a second microphone signal, and the reference signal comprises a first reference signal and a second reference signal; the microphone signal and the reference signal are mixed into one channel of audio data, which is sent to the voice recognition service process; and the voice recognition service process separates and parses the mixed audio data for use in voice recognition processing. This solves the problem of unstable audio signal delay in the vehicle machine voice recognition process.

Description

Audio processing method and device based on vehicle-mounted machine voice recognition and computer equipment

Technical Field

The present application relates to the field of vehicle multimedia technologies, and in particular, to an audio processing method and apparatus, a computer device, and a storage medium based on vehicle speech recognition.

Background

At present, vehicle machines offer increasingly rich functions. Voice recognition is used on the vehicle machine, and vehicle body control functions can be realized through voice recognition. A speech recognition system may use hardware-based noise reduction or software-based noise reduction. For a pure software noise reduction framework on the vehicle side, the related art generally transmits the voice signal (MIC PCM) data and the reference signal (REF PCM) data to an application service layer in a multithreaded, asynchronous manner for processing by a software voice noise reduction algorithm; however, this transmission manner often causes signal delay and instability.

In the related art, no effective solution is provided at present for the problems of delay and instability of audio signal transmission in the voice recognition process.

Disclosure of Invention

In view of the foregoing, it is desirable to provide an audio processing method and apparatus, a computer device, and a storage medium based on car machine voice recognition.

In a first aspect, an embodiment of the present application provides an audio processing method based on car-mounted device speech recognition, where the method includes:

when a voice recognition service process of a vehicle machine operating system is started, the voice recognition service process calls a recording interface;

opening a microphone recording channel corresponding to the recording interface to acquire a microphone signal, and opening a reference recording channel corresponding to the recording interface to acquire a reference signal, wherein the microphone signal comprises a first microphone signal and a second microphone signal, and the reference signal comprises a first reference signal and a second reference signal;

mixing the microphone signal and the reference signal into a path of audio data and sending the audio data to the voice recognition service process;

the voice recognition service process separates the mixed audio data and parses out the first microphone signal, the second microphone signal, the first reference signal and the second reference signal, so that a voice recognition application performs voice recognition processing based on the parsed first microphone signal, second microphone signal, first reference signal and second reference signal.

In one embodiment, the voice recognition service process calls a recording interface, including:

and the voice recognition service process calls a recording interface corresponding to the voice recognition service process through a function interface in the vehicle machine operating system.

In one embodiment, mixing the microphone signal and the reference signal into one channel of audio data includes:

mixing the first microphone signal, the second microphone signal, the first reference signal and the second reference signal into one channel of audio data in sequence.

In another embodiment, mixing the microphone signal and the reference signal into one channel of audio data includes:

mixing the first microphone signal and the second microphone signal into a first channel of audio data;

mixing the first reference signal and the second reference signal into a second channel of audio data;

and mixing the first channel of audio data and the second channel of audio data into one channel of audio data.

In one embodiment, the opening a microphone recording channel corresponding to the recording interface to acquire a microphone signal, and opening a reference recording channel corresponding to the recording interface to acquire a reference signal includes:

enabling a microphone recording channel corresponding to the recording interface to acquire a microphone signal;

and simultaneously, enabling a reference recording channel corresponding to the recording interface to acquire a reference signal.

In one embodiment, the first microphone signal is a left microphone signal, the second microphone signal is a right microphone signal, the first reference signal is a left reference signal, and the second reference signal is a right reference signal.

In a second aspect, an embodiment of the present application further provides an audio processing apparatus based on car machine voice recognition, the apparatus includes a recording module, a mixing module, and a voice recognition module:

the voice recognition module is used for calling a recording interface when a voice recognition service process of the vehicle-mounted operating system is started;

the recording module is used for opening a microphone recording channel corresponding to the recording interface to acquire microphone signals and opening a reference recording channel corresponding to the recording interface to acquire reference signals, wherein the microphone signals comprise first microphone signals and second microphone signals, and the reference signals comprise first reference signals and second reference signals;

the mixing module is used for mixing the microphone signal and the reference signal into a path of audio data and then sending the audio data to the voice recognition module;

the voice recognition module is further configured to separate the mixed audio data and analyze the first microphone signal, the second microphone signal, the first reference signal, and the second reference signal, so that a voice recognition application performs a voice recognition process based on the analyzed first microphone signal, the analyzed second microphone signal, the analyzed first reference signal, and the analyzed second reference signal.

In one embodiment, the mixing module is further configured to: mix the first microphone signal and the second microphone signal into a first channel of audio data; mix the first reference signal and the second reference signal into a second channel of audio data; and mix the first channel of audio data and the second channel of audio data into one channel of audio data.

In a third aspect, an embodiment of the present application provides an audio processing computer device based on car-in-vehicle speech recognition, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where when the processor executes the computer program, the audio processing method based on car-in-vehicle speech recognition is implemented.

In a fourth aspect, an embodiment of the present application further provides an audio processing computer-readable storage medium based on car machine voice recognition, where a computer program is stored on the storage medium, and when the computer program is executed by a processor, the audio processing method based on car machine voice recognition is implemented.

According to the audio processing method and device based on the vehicle-mounted machine voice recognition, the computer equipment and the storage medium, when a voice recognition service process of a vehicle-mounted machine operating system is started, the voice recognition service process calls the recording interface; opening a microphone recording channel corresponding to the recording interface to acquire a microphone signal, and opening a reference recording channel corresponding to the recording interface to acquire a reference signal, wherein the microphone signal comprises a first microphone signal and a second microphone signal, and the reference signal comprises a first reference signal and a second reference signal; mixing a microphone signal and a reference signal into a path of audio data and sending the audio data to a voice recognition service process; the voice recognition service process separates the mixed audio data and analyzes the first microphone signal, the second microphone signal, the first reference signal and the second reference signal so as to be used for voice recognition application to perform voice recognition processing, and the problem that the audio signal delay is unstable in the vehicle-mounted voice recognition process is solved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a block diagram of a car-mounted device hardware structure of an audio processing method based on car-mounted device speech recognition according to an embodiment of the present invention;

FIG. 2 is a flowchart of an audio processing method based on car-mounted speech recognition according to an embodiment of the present invention;

FIG. 3 is a flow chart illustrating mixing of a microphone signal and a reference signal in an audio processing method based on car-mounted speech recognition according to an embodiment of the present invention;

FIG. 4 is a flow chart illustrating mixing of a microphone signal and a reference signal in an audio processing method based on car-mounted speech recognition according to another embodiment of the present invention;

FIG. 5 is a flow chart of an audio processing method based on car machine voice recognition according to a preferred embodiment of the present invention;

FIG. 6 is a system architecture diagram of an audio processing method based on in-vehicle speech recognition according to a preferred embodiment of the present invention;

FIG. 7 is a schematic diagram of an audio processing apparatus based on car-in-vehicle speech recognition according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of an audio processing computer device based on car machine voice recognition according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.

It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.

Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.

Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. References to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof, as used in this application, are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. References to "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "And/or" describes an association relationship between associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.

The method provided by this embodiment can be used in the vehicle machine of a vehicle, where "vehicle machine" is short for the in-vehicle infotainment product installed in the vehicle; functionally, the vehicle machine enables information communication between a person and the vehicle and between the vehicle and the outside. Fig. 1 is a block diagram of the hardware structure of a vehicle machine for audio processing based on vehicle machine voice recognition according to an embodiment of the present invention. As shown in fig. 1, the vehicle machine may include one or more processors 102 (only one is shown in fig. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data, and the vehicle machine further includes a microphone 106 and a playing device 108. When a voice recognition service process of the vehicle machine operating system deployed on the processor 102 is started, the voice recognition service process calls a recording interface, opens a microphone recording channel corresponding to the recording interface, and collects microphone signals through the microphone 106, where the microphone signals comprise a first microphone signal and a second microphone signal; at the same time, a reference recording channel corresponding to the recording interface is opened to acquire reference signals corresponding to the playing device 108, where the reference signals comprise a first reference signal and a second reference signal. The processor 102 reads the microphone signal and the reference signal, mixes them into one channel of audio data, and sends that audio data to the voice recognition service process. The voice recognition service process separates the mixed audio data and parses out the first microphone signal, the second microphone signal, the first reference signal and the second reference signal, so that the voice recognition application performs voice recognition processing based on the parsed first microphone signal, second microphone signal, first reference signal and second reference signal.

It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the vehicle machine. The memory 104 may be used to store a computer program, for example, a software program and a module of application software, such as a computer program corresponding to the audio processing method based on vehicle machine voice recognition in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the method described above. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located from the processor 102, which may be connected to the vehicle machine through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

In an embodiment, fig. 2 is a flowchart of an audio processing method based on car-in-vehicle speech recognition according to an embodiment of the present invention, and as shown in fig. 2, an audio processing method based on car-in-vehicle speech recognition is provided, where the method includes the following steps:

Step S210, when the voice recognition service process of the vehicle machine operating system is started, the voice recognition service process calls a recording interface. Optionally, when the voice recognition service process is configured, the recording interface that the process calls at run time is preset; the recording interface may be a hardware interface or a software interface.

Step S220, a microphone recording channel corresponding to the recording interface is opened to collect a microphone signal, and a reference recording channel corresponding to the recording interface is opened to collect a reference signal, where the microphone signal includes a first microphone signal and a second microphone signal, and the reference signal includes a first reference signal and a second reference signal. The microphone recording channel and the reference recording channel are opened at the same time, with their switches turned on simultaneously under hardware control or software control. The microphone signal is the signal generated from speech captured by a microphone on the vehicle, and this speech is what is to be recognized; the reference signal is the background sound, such as background music, recorded while the speech signal is captured. Filtering the reference signal out of the microphone signal can be regarded as a noise reduction process, and performing speech recognition on the noise-reduced microphone signal improves recognition accuracy.
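As a rough illustration of filtering the reference signal out of the microphone signal, the following Python sketch subtracts a scaled copy of the reference from the microphone samples. The single least-squares gain and the function name `subtract_reference` are assumptions made for illustration only; the application does not specify the noise reduction algorithm, and a practical system would typically rely on an adaptive filter instead.

```python
from typing import List, Sequence


def subtract_reference(mic: Sequence[float], ref: Sequence[float]) -> List[float]:
    """Remove the part of `mic` that is a scaled copy of `ref`.

    Hypothetical single-gain model: mic = speech + gain * ref. The gain is
    estimated by least squares. This is an illustrative stand-in, not the
    noise reduction algorithm of the application.
    """
    if len(mic) != len(ref):
        raise ValueError("mic and ref must have the same number of samples")
    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0:
        # Nothing is being played back, so there is nothing to remove.
        return list(mic)
    gain = sum(m * r for m, r in zip(mic, ref)) / ref_energy
    return [m - gain * r for m, r in zip(mic, ref)]


# Toy data: speech that is uncorrelated with the background music.
speech = [1.0, -1.0, 1.0, -1.0]
music = [2.0, 2.0, 2.0, 2.0]
mic = [s + 0.5 * m for s, m in zip(speech, music)]
cleaned = subtract_reference(mic, music)
```

The sketch only works sample by sample when the microphone and reference streams are aligned in time, which is why the channels are opened together.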

Step S230, mixing the microphone signal and the reference signal into a channel of audio data, and sending the channel of audio data to the speech recognition service process. Optionally, the mixing of the audio data may be linear addition of the audio signals, or may be a splicing manner, that is, a microphone signal and a reference signal are spliced front and back to form a path of audio data, and the path of audio data is sent to a third-party speech recognition service program for soft noise reduction and speech recognition.
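The two mixing options named in step S230 can be sketched as follows. This is a minimal Python sketch that assumes 16-bit signed samples; the function names `mix_by_addition` and `mix_by_splicing` are hypothetical and do not come from the application.

```python
from typing import List, Sequence

# Assumption: samples are 16-bit signed PCM values.
INT16_MIN, INT16_MAX = -32768, 32767


def mix_by_addition(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Linear addition of two sample streams, clipped to the 16-bit range."""
    if len(a) != len(b):
        raise ValueError("streams must have the same number of samples")
    return [max(INT16_MIN, min(INT16_MAX, x + y)) for x, y in zip(a, b)]


def mix_by_splicing(mic_pcm: bytes, ref_pcm: bytes) -> bytes:
    """Splice the MIC block and the REF block front to back into one buffer."""
    return mic_pcm + ref_pcm
```

Of the two, only the spliced buffer can be split back into its parts by position alone, which is what the separation in step S240 relies on.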

Step S240, the voice recognition service process separates the mixed audio data and then parses out the first microphone signal, the second microphone signal, the first reference signal, and the second reference signal, so that the voice recognition application performs voice recognition processing based on the parsed first microphone signal, second microphone signal, first reference signal, and second reference signal. The voice recognition service process separates the microphone signal from the reference signal in the mixed audio data it receives. In some embodiments, the separation may be performed by data size, which follows from the storage format of the audio, such as 8-bit mono, 8-bit stereo, 16-bit mono, or 16-bit stereo. Optionally, a splicing mark may also be added at the splicing point during splicing; for example, the splicing point is indicated by a flag at a preset bit, and the data is split at that flag during separation. Speech recognition processing is then performed based on the separated microphone signal and reference signal. Specifically, noise reduction may be performed on the microphone signal according to the reference signal, and the control command in the microphone signal is then recognized through a speech recognition algorithm, so that the operation indicated by the control command, for example playing music or turning on the air conditioner, is executed through the corresponding software and hardware of the vehicle body.
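The two separation options in step S240 might look like the Python sketch below. It assumes the audio was spliced as one MIC block followed by one REF block. The 4-byte length prefix used as the splicing mark is a hypothetical stand-in for the "flag at a preset bit", whose actual format the application does not give; all function names are illustrative.

```python
import struct
from typing import Tuple


def block_size(frames: int, bits_per_sample: int, channels: int) -> int:
    """Size in bytes of a PCM block; 16-bit stereo uses 4 bytes per frame."""
    return frames * (bits_per_sample // 8) * channels


def split_by_size(mixed: bytes, mic_size: int) -> Tuple[bytes, bytes]:
    """Split a spliced buffer at a MIC block size known in advance."""
    if mic_size > len(mixed):
        raise ValueError("MIC block size exceeds the buffer length")
    return mixed[:mic_size], mixed[mic_size:]


def splice_with_mark(mic_pcm: bytes, ref_pcm: bytes) -> bytes:
    """Splice MIC and REF and record the splice point in a 4-byte header."""
    return struct.pack("<I", len(mic_pcm)) + mic_pcm + ref_pcm


def split_at_mark(marked: bytes) -> Tuple[bytes, bytes]:
    """Read the splice point from the header, then split the payload there."""
    (mic_size,) = struct.unpack_from("<I", marked, 0)
    return split_by_size(marked[4:], mic_size)
```

For example, with two frames of 16-bit stereo per block, `split_by_size(buffer, block_size(2, 16, 2))` cuts the buffer after the first 8 bytes.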

In steps S210 to S240, when the speech recognition service process of the vehicle-mounted operating system is started, the speech recognition service process calls a recording interface; opening a microphone recording channel corresponding to the recording interface to acquire a microphone signal, and opening a reference recording channel corresponding to the recording interface to acquire a reference signal, wherein the microphone signal comprises a first microphone signal and a second microphone signal, and the reference signal comprises a first reference signal and a second reference signal; mixing a microphone signal and a reference signal into a path of audio data and sending the audio data to a voice recognition service process; the voice recognition service process separates the mixed audio data and analyzes a first microphone signal, a second microphone signal, a first reference signal and a second reference signal so as to allow voice recognition application to perform voice recognition processing based on the analyzed first microphone signal, second microphone signal, first reference signal and second reference signal. The microphone signal and the corresponding reference signal are mixed into one path of audio data for transmission, so that the time delay of the asynchronous transmission of the microphone signal and the reference signal is avoided. The microphone signals correspond to the reference signals for noise reduction one to one, so that the noise reduction accuracy of the microphone signals is improved, meanwhile, the microphone signals and the reference signals are mixed into one path for transmission, the audio data for noise reduction is input in a single-thread mode, and the signal stability in the voice recognition process can be improved.

In an embodiment, fig. 3 is a flow chart illustrating a mixing process of a microphone signal and a reference signal in an audio processing method based on car-mounted speech recognition according to an embodiment of the present invention, and as shown in fig. 3, the mixing of the microphone signal and the reference signal into a channel of audio data includes the following steps:

Step S310, mixing the first microphone signal and the second microphone signal into a first channel of audio data.

Step S320, mixing the first reference signal and the second reference signal into the second channel of audio data.

Step S330, mixing the first channel of audio data and the second channel of audio data into a channel of audio data.

In steps S310 to S330, four channels of audio data are first superimposed onto two channels of data, which is applicable to a vehicle machine operating system whose recording interface supports only two channels of audio data, such as the Android system. In this case, the microphone signal collected by the microphone includes the first microphone signal and the second microphone signal, the reference signal includes the first reference signal and the second reference signal, and all four channels of audio data are needed for voice soft noise reduction and recognition processing, so the four channels of audio data are transmitted through the two audio data channels. In this embodiment, the middle layer of the vehicle machine operating system can still receive all four channels of audio data: the first microphone signal and the second microphone signal are mixed into one channel, the first reference signal and the second reference signal are mixed into the other channel, and the two channels are then mixed into one channel of audio data.
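One possible reading of steps S310 to S330 is sketched below in Python. It assumes that "mixing" means sample interleaving, which the application does not state explicitly; `mix_pair` and `mix_paths` are hypothetical helper names.

```python
from typing import List, Sequence


def mix_pair(first: Sequence[int], second: Sequence[int]) -> List[int]:
    """Interleave two equal-length sample streams into one path."""
    if len(first) != len(second):
        raise ValueError("streams must have the same number of samples")
    path: List[int] = []
    for a, b in zip(first, second):
        path.extend((a, b))
    return path


def mix_paths(first_path: Sequence[int], second_path: Sequence[int],
              width: int = 2) -> List[int]:
    """Alternate `width`-sample frames of the two paths into one stream."""
    if len(first_path) != len(second_path) or len(first_path) % width:
        raise ValueError("paths must be equal-length multiples of the frame width")
    mixed: List[int] = []
    for i in range(0, len(first_path), width):
        mixed.extend(first_path[i:i + width])
        mixed.extend(second_path[i:i + width])
    return mixed


left_mic, right_mic = [1, 2], [3, 4]
left_ref, right_ref = [5, 6], [7, 8]
first_path = mix_pair(left_mic, right_mic)     # step S310
second_path = mix_pair(left_ref, right_ref)    # step S320
one_path = mix_paths(first_path, second_path)  # step S330
```

Under this reading, every frame of the final stream carries one sample from each of the four signals, so the receiver can recover all four by position.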

Under the condition that the recording interface supports two paths of audio data channels, complete four paths of audio data can still be acquired, so that the method for carrying out voice recognition based on the audio data can be used for more types of car machines without additional software framework improvement or hardware improvement, and the applicability and compatibility of the car machines are improved.

In an embodiment, fig. 4 is a flowchart illustrating mixing of a microphone signal and a reference signal in an audio processing method based on car-mounted speech recognition according to another embodiment of the present invention, and as shown in fig. 4, mixing the microphone signal and the reference signal into a channel of audio data includes the following steps:

Step S410, mixing the first microphone signal, the second microphone signal, the first reference signal and the second reference signal into a channel of audio data in sequence.

In step S410, the two microphone signals and the two reference signals are mixed in a preset order. In one embodiment, the first microphone signal is a left microphone signal (left MIC), the second microphone signal is a right microphone signal (right MIC), the first reference signal is a left reference signal (left REF), and the second reference signal is a right reference signal (right REF). For example, the microphone signal and the reference signal are mixed into one channel of audio data in the order of left microphone signal, right microphone signal, left reference signal, right reference signal. After the voice noise reduction recognition process acquires the audio data, it can separate the four audio signals from the audio data according to the preset data format of left MIC, right MIC, left REF and right REF. Optionally, where each audio signal is 16 bits, the received audio data is divided in sequence into four 16-bit segments, where the first segment corresponds to the left MIC signal, the second segment corresponds to the right MIC signal, and so on. In practical applications, the order can be adjusted according to a preset rule, and the voice recognition service process separates the audio data according to the same preset rule to finally obtain the microphone signal and the reference signal. This embodiment mixes the microphone signal and the reference signal by splicing; while improving the stability of the audio data, it makes mixing and separating the audio data more efficient, which further improves the audio processing efficiency of vehicle machine voice recognition.
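The ordered mixing and the matching separation can be illustrated with a short Python sketch. It assumes 16-bit signed little-endian samples and one sample per signal in each frame; the byte order and the names `mix_in_order` and `separate` are assumptions, not details given in the application.

```python
import struct
from typing import List, Sequence, Tuple

# One frame = four 16-bit segments: left MIC, right MIC, left REF, right REF.
FRAME = struct.Struct("<4h")


def mix_in_order(left_mic: Sequence[int], right_mic: Sequence[int],
                 left_ref: Sequence[int], right_ref: Sequence[int]) -> bytes:
    """Pack the four signals frame by frame in the preset order."""
    lengths = {len(left_mic), len(right_mic), len(left_ref), len(right_ref)}
    if len(lengths) != 1:
        raise ValueError("all four signals must have the same length")
    frames = zip(left_mic, right_mic, left_ref, right_ref)
    return b"".join(FRAME.pack(*frame) for frame in frames)


def separate(mixed: bytes) -> Tuple[List[int], List[int], List[int], List[int]]:
    """Split the stream into its 16-bit segments using the same preset order."""
    signals: Tuple[List[int], ...] = ([], [], [], [])
    for frame in FRAME.iter_unpack(mixed):
        for signal, sample in zip(signals, frame):
            signal.append(sample)
    return signals


mixed = mix_in_order([1, -2], [3, 4], [5, 6], [7, -8])
left_mic, right_mic, left_ref, right_ref = separate(mixed)
```

Changing the preset order only requires changing the argument order on both sides, provided the sender and the receiver agree on it.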

In some embodiments, the voice recognition service process calls the recording interface corresponding to the voice recognition service process through a function interface in the vehicle machine operating system. The function interface may be an OpenSL interface: the voice noise reduction recognition algorithm service program calls the recording interface in the vehicle machine operating system through the OpenSL interface, and an identity (ID) of the recording interface is preconfigured in the OpenSL interface. When the voice recognition service process is started, the corresponding recording interface in the vehicle machine operating system can therefore be opened accurately according to its ID, so that subsequent audio data acquisition is performed in a more timely and accurate manner.

In one embodiment, opening a microphone recording channel corresponding to the recording interface to acquire a microphone signal, and opening a reference recording channel corresponding to the recording interface to acquire a reference signal comprises: enabling a microphone recording channel corresponding to the recording interface to acquire a microphone signal; and meanwhile, enabling a reference recording channel corresponding to the recording interface to acquire a reference signal. In this embodiment, the recording enable of the microphone channel and the recording enable of the reference recording channel are started in the same function interface, and the microphone channel and the reference recording channel are simultaneously notified to start recording under the condition that the function interface, that is, the software recording interface, is called, so that the problem of signal delay is avoided, and the audio processing accuracy of the car-mounted device speech recognition can be further improved.
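The idea of enabling both channels inside one function call can be sketched in Python as follows. The classes `RecordingChannel` and `RecordingInterface` and the `start_frame` counter are hypothetical modelling choices for illustration; they do not correspond to any real Android or OpenSL API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordingChannel:
    """Hypothetical model of one recording channel (MIC or REF)."""
    name: str
    enabled: bool = False
    start_frame: Optional[int] = None

    def enable(self, start_frame: int) -> None:
        self.enabled = True
        self.start_frame = start_frame


class RecordingInterface:
    """Hypothetical software recording interface that owns both channels."""

    def __init__(self) -> None:
        self.mic = RecordingChannel("MIC")
        self.ref = RecordingChannel("REF")

    def start_recording(self, start_frame: int) -> None:
        # Both enables happen in the same call with the same start position,
        # so neither channel starts later than the other.
        self.mic.enable(start_frame)
        self.ref.enable(start_frame)


interface = RecordingInterface()
interface.start_recording(start_frame=480)
```

Because both channels share one start position, each microphone sample has a reference sample at the same index.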

The embodiments of the present application are described and illustrated below by means of preferred embodiments. Fig. 5 is a flowchart of an audio processing method based on vehicle machine voice recognition in a preferred embodiment of the present invention, and fig. 6 is a system architecture diagram of that method. As shown in fig. 5 and fig. 6, in a preferred embodiment, audio processing for voice recognition on a vehicle machine running the Android system includes the following steps:

Step S510, a voice recognition service is started. When the Android system starts, it starts a voice recognition service at the local service (native) layer, namely the service corresponding to the third-party voice noise reduction recognition software in fig. 6, and this service starts a voice recognition service process in the vehicle machine operating system.

Step S520, open MIC recording channel and REF recording channel. The third-party speech noise reduction recognition algorithm service program calls a recording interface in the Android through an OpenSL interface in the Android system Android, an ID of the recording interface is configured in the OpenSL interface in advance, and the recording interface in the Android intermediate layer is opened according to the ID of the recording interface.

Step S530, MIC audio data and REF audio data are acquired. Optionally, the Android system middle layer identifies, according to the ID information of the recording interface opened in step S520, whether the recording interface is connected to an audio data acquisition device used for voice recognition. If so, the MIC recording channel and the REF recording channel of the driver layer are opened; if not, the MIC recording channel and the REF recording channel are not opened.
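The ID check in step S530 can be sketched as a simple lookup. The identifier value and the function name below are hypothetical; the patent does not give concrete ID values.

```python
# Hypothetical set of recording interface IDs wired to the
# speech-recognition acquisition device (values are assumptions).
SPEECH_RECOGNITION_INTERFACE_IDS = {"speech_input"}


def channels_to_open(interface_id):
    """Return the driver-layer channels to open for a recording interface ID."""
    if interface_id in SPEECH_RECOGNITION_INTERFACE_IDS:
        return ("MIC", "REF")
    # Any other interface: neither channel is opened.
    return ()
```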

The recording interface of the Android system supports only 2-channel Pulse Code Modulation (PCM) audio data, whereas the voice soft noise reduction scheme requires four channels of audio data, namely left MIC, right MIC, left REF and right REF. The four channels of audio data are therefore superimposed onto two channels of audio data, and are obtained through the MIC recording channel and the REF recording channel respectively. To avoid the signal delay problem, the recording enable of the microphone channel and the recording enable of the reference recording channel are started in the same function interface, that is, the microphone channel and the reference recording channel are notified simultaneously to start recording.

In the preferred embodiment, the reference signal is obtained from background music played by a music app in the car operating system, where the music app is any music playing app in the application software layer of the car operating system. When it receives a play command from the software layer, the music app transmits music PCM audio data through driver software to a player in the hardware layer for playback. The reference signal recording channel directly acquires the music PCM audio data played by the playing device as the reference signal.

Step S540, the MIC audio data is read. In the middle layer of the Android system, the audio data of the microphone is read through the MIC recording channel. Optionally, a sound card chip of the vehicle operating system acquires microphone voice data through an external microphone, and the microphone voice data is input into the microphone recording channel as MIC audio data through driver software. The audio data acquired by the microphone recording channel comprises two paths of data, namely left MIC data and right MIC data.

Step S550, the REF audio data is read. After the microphone signal in step S540 is read, the reference signal is read through the REF recording channel, and the obtained reference channel data also includes two paths of audio data, namely left REF data and right REF data.

Step S560, the MIC audio data and the REF audio data are mixed. The four paths of audio data read in step S540 and step S550 are mixed into the format MIC left, MIC right, REF left, REF right, the PCM data are merged, and the result is returned to the third-party voice recognition service program. During the mixing processing, the four paths of audio data are integrated into one path of audio data, but MIC left, MIC right, REF left and REF right each remain independent audio data within it. The mixed path of data is then transmitted to the third-party voice noise reduction recognition software through the recording interface and the OpenSL interface.
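One way to keep the four paths independent inside a single stream is frame-wise interleaving. The sketch below assumes 16-bit little-endian PCM samples and a fixed per-frame order of MIC left, MIC right, REF left, REF right; the sample width and the function name `mix_four_channels` are assumptions, not details given in the description.

```python
import struct


def mix_four_channels(mic_left, mic_right, ref_left, ref_right):
    """Interleave four equal-length sample lists into one PCM byte stream.

    Assumed layout per frame: MIC left, MIC right, REF left, REF right,
    each a signed 16-bit little-endian sample.
    """
    n = len(mic_left)
    if not (len(mic_right) == len(ref_left) == len(ref_right) == n):
        raise ValueError("all four channels must have the same length")
    out = bytearray()
    for frame in zip(mic_left, mic_right, ref_left, ref_right):
        out += struct.pack("<4h", *frame)  # one 8-byte frame
    return bytes(out)
```

Since no samples are summed, each path can later be recovered exactly by reading every fourth sample.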

Step S570, speech recognition. After receiving the mixed data, the third-party voice recognition service program parses and separates the mixed data by using the voice recognition algorithm to obtain the four paths of audio data. The program separates the microphone signal from the reference signal in the acquired audio data and then performs voice recognition processing to recognize the text content of the audio data.
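The separation in step S570 can be sketched as the inverse of a frame-wise interleaving. The sketch assumes the received stream consists of 8-byte frames holding signed 16-bit little-endian samples in the order MIC left, MIC right, REF left, REF right; this layout and the name `separate_mixed` are assumptions made for illustration.

```python
import struct

FRAME_BYTES = 8  # assumed: four 16-bit samples per frame


def separate_mixed(mixed):
    """Split one interleaved PCM byte stream back into four sample lists."""
    if len(mixed) % FRAME_BYTES != 0:
        raise ValueError("stream length is not a whole number of frames")
    samples = struct.unpack("<%dh" % (len(mixed) // 2), mixed)
    return {
        "mic_left": list(samples[0::4]),
        "mic_right": list(samples[1::4]),
        "ref_left": list(samples[2::4]),
        "ref_right": list(samples[3::4]),
    }
```

The two MIC lists would then feed the noise reduction algorithm as the signal to clean, with the two REF lists as the reference to cancel.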

Step S580, determine whether the speech content can be correctly recognized. If not, return to step S530 and re-acquire the MIC audio data and the REF audio data. If yes, proceed to step S590.

Step S590, the recognition result is processed. The third-party voice recognition service program sends the recognized text content information to the application program (APP) in the application software layer that processes voice recognition results, and this APP controls the vehicle body and other related functions according to the voice recognition result, such as playing music or turning on the air conditioner.
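The loop between steps S530 and S580 can be sketched as follows. The callables `acquire` and `recognize` and the `max_attempts` limit are hypothetical; the description does not specify a retry limit, so the bound is added here only to keep the sketch from looping forever.

```python
def recognize_with_retry(acquire, recognize, max_attempts=3):
    """Re-acquire audio until recognition succeeds or attempts run out.

    acquire()         -> (mic_data, ref_data)      # stands in for S530-S560
    recognize(m, r)   -> recognized text, or None  # stands in for S570-S580
    Returns (text_or_None, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        mic_data, ref_data = acquire()
        text = recognize(mic_data, ref_data)
        if text is not None:
            return text, attempt  # would proceed to step S590
    return None, max_attempts
```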

The preferred embodiment is based on the existing Android system and is implemented with hardware. For example, the microphone signal acquired by the microphone recording device in fig. 6 and the played background music signal are transmitted to the middle layer of the Android system through driver software. In the middle layer, the microphone (MIC) audio data corresponding to the microphone signal and the reference (REF) audio data corresponding to the background music signal are mixed and combined into an ordered audio data packet, which is then transmitted through the native framework of the Android system to the application program for processing by the voice noise reduction algorithm. The microphone signal obtained after the noise reduction processing is processed by the voice recognition algorithm, and finally a control function related to the vehicle body is performed according to the obtained text information. Because the MIC audio data and the REF audio data form one mixed signal that is transmitted to the required application program, the software scheme uses single-thread processing, which solves the problem of unstable signal delay caused by multithreaded asynchrony.

It should be understood that, although the steps in the flowcharts in fig. 2 to 5 are shown sequentially as indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and described, and may be performed in other orders. Moreover, at least some of the steps in fig. 2 to 5 may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the sub-steps or stages are not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, fig. 7 is a schematic diagram of an audio processing apparatus based on car machine speech recognition according to an embodiment of the present invention. As shown in fig. 7, an audio processing apparatus based on car machine speech recognition is provided, and the apparatus includes a recording module 72, a mixing module 74, and a speech recognition module 76.

The voice recognition module 76 is configured to call the recording interface when the voice recognition service process of the car machine operating system is started. The recording module 72 is configured to open a microphone recording channel corresponding to the recording interface to acquire a microphone signal, and open a reference recording channel corresponding to the recording interface to acquire a reference signal, where the microphone signal includes a first microphone signal and a second microphone signal, and the reference signal includes a first reference signal and a second reference signal. The mixing module 74 is configured to mix the microphone signal and the reference signal into a channel of audio data and send the channel of audio data to the speech recognition module. The voice recognition module 76 is further configured to separate one channel of mixed audio data and analyze a first microphone signal, a second microphone signal, a first reference signal, and a second reference signal, so that the voice recognition application performs voice recognition processing based on the analyzed first microphone signal, second microphone signal, first reference signal, and second reference signal.

In one embodiment, the mixing module 74 is further configured to mix the first microphone signal and the second microphone signal into a first path of audio data, mix the first reference signal and the second reference signal into a second path of audio data, and mix the first path of audio data and the second path of audio data into one path of audio data.
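The two-stage mixing performed by the mixing module can be sketched as below. The sketch assumes each stage is a lossless interleave of integer samples and that the final stream carries, per frame, the MIC pair followed by the REF pair; the function names are hypothetical and the description does not fix the exact packing.

```python
def interleave_pair(left, right):
    """Stage 1: combine a left/right pair into one 2-channel path."""
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    return [sample for pair in zip(left, right) for sample in pair]


def mix_two_stage(mic_left, mic_right, ref_left, ref_right):
    """Stage 2: merge the MIC path and the REF path into one path."""
    first_path = interleave_pair(mic_left, mic_right)    # MIC L/R
    second_path = interleave_pair(ref_left, ref_right)   # REF L/R
    if len(first_path) != len(second_path):
        raise ValueError("MIC and REF paths must have the same length")
    mixed = []
    for i in range(0, len(first_path), 2):
        mixed.extend(first_path[i:i + 2])   # MIC left, MIC right
        mixed.extend(second_path[i:i + 2])  # REF left, REF right
    return mixed
```

Mixing the MIC pair and the REF pair separately first keeps each intermediate path compatible with a 2-channel interface before the final merge.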

For specific limitations of the audio processing apparatus based on car-mounted speech recognition, reference may be made to the above limitations of the audio processing method based on car-mounted speech recognition, which are not described herein again. All or part of each module in the audio processing device based on the car machine voice recognition can be realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.

In an embodiment, fig. 8 is a schematic diagram of an audio processing computer device based on car machine speech recognition according to an embodiment of the present invention. As shown in fig. 8, a computer device is provided, which may be a terminal, and its internal structure may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an audio processing method based on car machine speech recognition. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a key, a track ball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad or mouse.

Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the audio processing method based on car-mounted speech recognition.

When the voice recognition service process of the vehicle-mounted operating system is started, the voice recognition service process calls a recording interface; opening a microphone recording channel corresponding to the recording interface to acquire a microphone signal, and opening a reference recording channel corresponding to the recording interface to acquire a reference signal, wherein the microphone signal comprises a first microphone signal and a second microphone signal, and the reference signal comprises a first reference signal and a second reference signal; mixing a microphone signal and a reference signal into a path of audio data and sending the audio data to a voice recognition service process; the voice recognition service process separates the mixed audio data and analyzes the first microphone signal, the second microphone signal, the first reference signal and the second reference signal so as to be used for voice recognition application to perform voice recognition processing, and the problem that the audio signal delay is unstable in the vehicle-mounted voice recognition process is solved.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a processor, implements the above-described audio processing method based on car machine voice recognition.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of the present specification.

The above-mentioned embodiments express only several embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. An audio processing method based on vehicle-mounted voice recognition is characterized by comprising the following steps:

enabling a microphone recording channel corresponding to the recording interface to acquire a microphone signal, and simultaneously enabling a reference recording channel corresponding to the recording interface to acquire a reference signal, wherein the microphone signal comprises a first microphone signal and a second microphone signal, and the reference signal comprises a first reference signal and a second reference signal;

the voice recognition service process separates the mixed audio data and analyzes the first microphone signal, the second microphone signal, the first reference signal and the second reference signal so as to perform voice recognition processing on the voice recognition application based on the analyzed first microphone signal, the analyzed second microphone signal, the analyzed first reference signal and the analyzed second reference signal;

wherein mixing the microphone signal and the reference signal into one path of audio data comprises:

mixing the first microphone signal and the second microphone signal into a first path of audio data, mixing the first reference signal and the second reference signal into a second path of audio data, and mixing the first path of audio data and the second path of audio data into a path of audio data.

2. The method of claim 1, wherein the speech recognition service process invokes a recording interface, comprising:

3. The method of any of claims 1-2, wherein the first microphone signal is a left microphone signal, the second microphone signal is a right microphone signal, the first reference signal is a left reference signal, and the second reference signal is a right reference signal.

4. An audio processing device based on car machine speech recognition, characterized in that the device comprises a recording module, a mixing module and a speech recognition module:

the recording module is used for enabling a microphone recording channel corresponding to the recording interface to acquire microphone signals, and simultaneously enabling a reference recording channel corresponding to the recording interface to acquire reference signals, wherein the microphone signals comprise first microphone signals and second microphone signals, and the reference signals comprise first reference signals and second reference signals;

the mixing module is used for mixing the first microphone signal and the second microphone signal into a first channel of audio data, mixing the first reference signal and the second reference signal into a second channel of audio data, mixing the first channel of audio data and the second channel of audio data into a channel of audio data, and sending the channel of audio data to the voice recognition module;

5. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 3 are implemented when the computer program is executed by the processor.

6. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 3.