CN113593577A - Vehicle-mounted artificial intelligence voice interaction system based on big data

Info

Publication number: CN113593577A
Application number: CN202111038179.6A
Authority: CN
Inventors: 王永锋; 朱方其; 徐波
Original assignee: Sichuan Yihaitian Technology Co ltd
Current assignee: Sichuan Yihaitian Technology Co ltd
Priority date: 2021-09-06
Filing date: 2021-09-06
Publication date: 2021-11-02

Abstract

The invention discloses a vehicle-mounted artificial intelligence voice interaction system based on big data, comprising a noise elimination module, a main control module, a real-time network sharing module, a main conversion module, and a deep learning module. In use, a voice wake-up module keeps a microphone array in a pickup state at all times and performs silence detection and noise reduction on the audio. The voice wake-up module determines whether a wake-up word has occurred and, if so, starts the subsequent voice recognition server. The deep learning module then compares the speech against a database using big data, and the resulting final transcribed text is processed by a semantic understanding server. In this way, stable noise reduction is maintained while speech is received and recorded, which helps ensure that the interaction information is captured accurately. In addition, the real-time network sharing module interconnects a plurality of in-vehicle head units using big data so that they can learn jointly, which greatly improves recognition accuracy.

Description

Vehicle-mounted artificial intelligence voice interaction system based on big data

Technical Field

The invention relates to the field of intelligent voice interaction, in particular to a vehicle-mounted artificial intelligence voice interaction system based on big data.

Background

With the development of in-vehicle head-unit systems, intelligent voice systems have come to provide useful assistance during driving, since direct manual interaction and function selection can be avoided, and in-vehicle intelligent voice interaction systems are gradually maturing.

However, existing vehicle voice interaction systems offer very limited functionality. They answer questions by connecting to a back-end database over a real-time network connection, which requires the database to store relatively comprehensive knowledge. For a commercially operated vehicle, the wide variety of questions from different customers, together with their individual accents, poses an extremely high challenge for intelligent voice interaction. Stable sound acquisition and accurate recognition of the information form the most basic layer, so guaranteeing audio acquisition is indispensable.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention provides a vehicle-mounted artificial intelligence voice interaction system based on big data. It addresses the following problem: for a commercially operated vehicle, the wide variety of questions from different customers poses an extremely high challenge for intelligent voice interaction, and because stable sound acquisition and accurate recognition of the information form the most basic layer, guaranteeing audio acquisition is indispensable.

In order to solve the technical problems, the invention provides the following technical scheme: a vehicle-mounted artificial intelligence voice interaction system based on big data comprises a main control module, wherein the input end of the main control module is electrically connected with the output end of a noise elimination module, the input end of the noise elimination module is electrically connected with the output end of a voice recognition server, the input end of the voice recognition server is electrically connected with the output end of a voice awakening module, the input end of the voice awakening module is electrically connected with the output end of the noise elimination module, the main control module is bidirectionally and electrically connected with a real-time network sharing module, the output end of the real-time network sharing module is electrically connected with the input end of a cloud server, the output end of the cloud server is electrically connected with the input end of a main conversion module, the output end of the main conversion module is electrically connected with the input end of a classification module, and the output end of the classification module is respectively and electrically connected with the input ends of a database and a deep learning module, the deep learning module and the classification module are both electrically connected with the input end of the database, the output end of the main conversion module is electrically connected with the input end of the semantic understanding server, and the output end of the semantic understanding server is electrically connected with the input end of the network sharing module.

As a preferred technical solution of the present invention, an output end of the main conversion module is electrically connected to an input end of the model library, and an output end of the model library is electrically connected to an input end of the deep learning module.
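
The connections listed in the two paragraphs above can be written down as a directed graph, which makes it easier to check which modules a signal can reach. The following is a minimal sketch, not part of the disclosure: the edge list is transcribed from the text (output end to input end), the names `EDGES`, `build_graph`, and `reachable` are hypothetical, and it assumes that "the network sharing module" fed by the semantic understanding server is the real-time network sharing module.

```python
from collections import defaultdict, deque

# Directed signal paths transcribed from the text (output end -> input end).
EDGES = [
    ("noise elimination module", "main control module"),
    ("voice recognition server", "noise elimination module"),
    ("voice wake-up module", "voice recognition server"),
    ("noise elimination module", "voice wake-up module"),
    ("main control module", "real-time network sharing module"),
    ("real-time network sharing module", "main control module"),  # bidirectional link
    ("real-time network sharing module", "cloud server"),
    ("cloud server", "main conversion module"),
    ("main conversion module", "classification module"),
    ("classification module", "database"),
    ("classification module", "deep learning module"),
    ("deep learning module", "database"),
    ("main conversion module", "semantic understanding server"),
    # Assumption: "the network sharing module" is the real-time network sharing module.
    ("semantic understanding server", "real-time network sharing module"),
    ("main conversion module", "model library"),
    ("model library", "deep learning module"),
]


def build_graph(edges):
    """Build an adjacency list mapping each module to its direct successors."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    return graph


def reachable(graph, start):
    """Breadth-first search: every module a signal leaving `start` can reach."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


GRAPH = build_graph(EDGES)
```

For example, `reachable(GRAPH, "voice wake-up module")` contains the database, which reflects that picked-up audio can travel through the cloud-side modules to storage.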

As a preferred technical solution of the present invention, the noise elimination module adopts a hardware coordination manner of a DSP chip dedicated to noise reduction and eight microphone arrays, the noise elimination module reduces noise by spectral subtraction, and the spectral subtraction calculation formula is as follows:

$$P_s(\omega) = P_x(\omega) - P_n(\omega)$$

wherein: $P_s(\omega)$ is the power spectrum of the clean speech signal, $P_x(\omega)$ is the power spectrum of the noisy acoustic signal, and $P_n(\omega)$ is the power spectrum of the noise signal.
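
As an illustration of spectral subtraction in the power domain, the sketch below subtracts an estimated noise power spectrum from the power spectrum of the noisy signal, bin by bin. It is a minimal sketch under stated assumptions, not the DSP implementation of the disclosure: the function name `spectral_subtract`, the `floor` parameter (which clamps negative results, a common practical safeguard not mentioned in the text), and the sample values are all hypothetical.

```python
def spectral_subtract(noisy_power, noise_power, floor=0.0):
    """Per-bin power spectral subtraction: P_s = max(P_x - P_n, floor)."""
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of frequency bins")
    return [max(px - pn, floor) for px, pn in zip(noisy_power, noise_power)]


# Hypothetical power spectra for one frame (arbitrary units, four frequency bins).
p_x = [4.0, 9.0, 1.0, 0.5]   # noisy acoustic signal
p_n = [1.0, 1.0, 1.0, 1.0]   # estimated noise
p_s = spectral_subtract(p_x, p_n)
```

Bins where the noise estimate exceeds the observed power are clamped to the floor instead of becoming negative, since a power spectrum cannot be negative.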

As a preferred technical solution of the present invention, the voice wake-up module keeps the microphone array continuously in a pickup state and samples and quantizes the collected audio.
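
Sampling and quantization can be sketched as follows. The text does not state a sample rate or bit depth, so the 16 kHz rate, the 16-bit depth, the function name `sample_and_quantize`, and the test tone are assumptions chosen only for illustration.

```python
import math


def sample_and_quantize(signal, duration_s, sample_rate_hz=16000, bits=16):
    """Sample a continuous signal (a function of time returning values in
    [-1.0, 1.0]) and quantize each sample to a signed integer of `bits` bits."""
    max_code = 2 ** (bits - 1) - 1            # 32767 for 16-bit audio
    n_samples = int(duration_s * sample_rate_hz)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                 # sampling instant in seconds
        x = max(-1.0, min(1.0, signal(t)))     # clip to the valid range
        codes.append(round(x * max_code))      # uniform quantization
    return codes


def tone(t):
    """Hypothetical input: a 440 Hz tone at half of full scale."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)


pcm = sample_and_quantize(tone, duration_s=0.25)
```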

As a preferred technical solution of the present invention, the voice recognition server uses a plurality of microphones to record a plurality of audio streams synchronously at high quality, directly compares the frequency spectra of the audio signals, and sends the recorded signal with the highest degree of repetition to the noise elimination module.
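
The text does not define how the degree of repetition between spectra is measured. One possible reading, sketched below, scores each microphone channel by its mean cosine similarity to the other channels and selects the channel with the highest score. The similarity measure, the function names, and the sample spectra are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_consistent_channel(spectra):
    """Return the index of the spectrum with the highest mean similarity to the others."""
    if len(spectra) < 2:
        raise ValueError("need at least two channels to compare")
    best_index, best_score = 0, -1.0
    for i, spec in enumerate(spectra):
        others = [cosine_similarity(spec, s) for j, s in enumerate(spectra) if j != i]
        score = sum(others) / len(others)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Hypothetical magnitude spectra from three microphones; the last one is an outlier.
spectra = [
    [1.0, 4.0, 2.0, 0.5],
    [1.1, 3.9, 2.1, 0.4],
    [3.0, 0.2, 0.1, 2.5],
]
chosen = most_consistent_channel(spectra)
```

With these values the outlier channel is never selected, because its spectrum agrees poorly with the other two.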

As a preferred technical solution of the present invention, the real-time network sharing module transmits data over a 4G network, and the main control module is built by installing a 9.0 system platform framework.

As a preferred technical solution of the present invention, the main conversion module converts speech into text; if the speech is recognized successfully, it sends the text directly to the semantic understanding server, and if the speech cannot be recognized, it sends the speech to the model library.
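
The routing decision of the main conversion module can be sketched as a simple branch. The text does not say how "recognized" is determined, so the `Transcript` type, the confidence score, and the 0.8 threshold below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    text: Optional[str]   # None when the recognizer produced nothing usable
    confidence: float     # hypothetical recognition score in [0, 1]


def route(transcript, threshold=0.8):
    """Return the next destination for a speech-to-text result."""
    recognized = bool(transcript.text) and transcript.confidence >= threshold
    return "semantic understanding server" if recognized else "model library"
```

For instance, `route(Transcript("open the window", 0.95))` returns `"semantic understanding server"`, while a result with no text or a low score is routed to the model library.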

As a preferred technical solution of the present invention, the model library records and compares the speech, searches for approximate terms through the deep learning module, and matches the retrieved approximate terms against language data using big data.
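
The search for approximate terms is attributed to the deep learning module, whose model is not described. As a stand-in only, the sketch below uses plain string similarity from the Python standard library (`difflib.get_close_matches`) to show the general idea of mapping an unclear transcription to the closest known entries. The vocabulary `KNOWN_COMMANDS` and the function `closest_commands` are hypothetical, and string matching is a substitute technique, not the method of the disclosure.

```python
import difflib

# Hypothetical vocabulary standing in for the language data in the database.
KNOWN_COMMANDS = [
    "open the window",
    "close the window",
    "turn on the radio",
    "navigate home",
]


def closest_commands(unclear_text, candidates=KNOWN_COMMANDS, n=3, cutoff=0.6):
    """Return up to `n` candidates ranked by string similarity to `unclear_text`."""
    return difflib.get_close_matches(unclear_text.lower(), candidates, n=n, cutoff=cutoff)


matches = closest_commands("opun the windo")
```

The best-ranked match here is "open the window"; an input that resembles no known entry yields an empty list.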

Compared with the prior art, the invention can achieve the following beneficial effects:

The system is provided with the noise elimination module, the voice recognition server, the voice wake-up module, the main control module, the real-time network sharing module, the main conversion module, and the deep learning module. In use, the voice wake-up module keeps the microphone array in a pickup state at all times and performs silence detection and noise reduction on the audio through built-in basic signal processing. The voice wake-up module determines whether a wake-up word has occurred and, if so, starts the subsequent voice recognition server. After the interactive voice information has been recognized, it is processed by the noise elimination module. Vehicle noise and wind noise are additive, locally stationary, and statistically independent of the speech signal. Based on the additive property of the noise, the spectral subtraction method obtains the noise power spectrum through noise energy estimation and gain calculation, and then subtracts the estimated noise from the power spectrum of the noisy speech to obtain clean speech. For interactive speech with an accent or in a dialect, the main conversion module has difficulty producing complete text, so a separate model file is established through the model library for comparison, and the result is then compared against the database using big data. The resulting final transcribed text is processed by the semantic understanding server. In this way, stable noise reduction is maintained while speech is recorded, which helps ensure that the interaction information is captured accurately. In addition, the real-time network sharing module interconnects a plurality of in-vehicle head units using big data so that they can learn jointly, which greatly improves recognition accuracy.

Drawings

FIG. 1 is a schematic diagram of the system of the present invention;

FIG. 2 is a block diagram of the processing flow of the noise elimination module of the present invention.

Wherein: the system comprises a main control module 1, a noise elimination module 2, a voice recognition server 3, a voice awakening module 4, a real-time network sharing module 5, a cloud server 6, a main conversion module 7, a classification module 8, a database 9, a deep learning module 10, a model library 11 and a semantic understanding server 12.

Detailed Description

To make the technical means, creative features, objectives, and effects of the present invention easy to understand, the invention is further described below with reference to specific embodiments. The embodiments described are not to be construed as limiting the scope of the invention as defined by the appended claims. Based on the embodiments described herein, other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present invention. Unless otherwise specified, the experimental methods in the following examples are conventional methods, and the materials, reagents, and the like used in the following examples are commercially available.

Examples

As shown in fig. 1-2, the present invention provides a vehicle-mounted artificial intelligence voice interaction system based on big data, which comprises a main control module 1, wherein an input end of the main control module 1 is electrically connected with an output end of a noise elimination module 2, an input end of the noise elimination module 2 is electrically connected with an output end of a voice recognition server 3, an input end of the voice recognition server 3 is electrically connected with an output end of a voice wake-up module 4, an input end of the voice wake-up module 4 is electrically connected with an output end of the noise elimination module 2, an output end of the main control module 1 is bidirectionally electrically connected with a real-time network sharing module 5, an output end of the real-time network sharing module 5 is electrically connected with an input end of a cloud server 6, an output end of the cloud server 6 is electrically connected with an input end of a main conversion module 7, an output end of the main conversion module 7 is electrically connected with an input end of a classification module 8, and an output end of the classification module 8 is electrically connected with input ends of a database 9 and a deep learning module 10 respectively, the deep learning module 10 and the classification module 8 are both electrically connected with the input end of the database 9, the output end of the main conversion module 7 is electrically connected with the input end of the semantic understanding server 12, and the output end of the semantic understanding server 12 is electrically connected with the input end of the network sharing module.

The output end of the main conversion module 7 is electrically connected with the input end of the model library 11, and the output end of the model library 11 is electrically connected with the input end of the deep learning module 10. The voice wake-up module 4 keeps the microphone array continuously in a pickup state and samples and quantizes the collected audio. The voice recognition server 3 uses a plurality of microphones to record a plurality of audio streams synchronously at high quality, directly compares the frequency spectra of the audio signals, and sends the recorded signal with the highest degree of repetition to the noise elimination module 2. The real-time network sharing module 5 transmits data over a 4G network, and the main control module 1 is built by installing a 9.0 system platform framework. The main conversion module 7 converts speech into text; if the speech is recognized successfully, it sends the text directly to the semantic understanding server 12, and if the speech cannot be recognized, it sends the speech to the model library 11. The model library 11 records and compares the speech, searches for approximate terms through the deep learning module 10, and matches the retrieved approximate terms against language data using big data.

The noise elimination module 2 adopts a hardware coordination manner of a DSP chip dedicated to noise reduction and eight microphone arrays, the noise elimination module 2 reduces noise by spectral subtraction, and the spectral subtraction calculation formula is as follows:

$$P_s(\omega) = P_x(\omega) - P_n(\omega)$$

wherein: $P_s(\omega)$ is the power spectrum of the clean speech signal, $P_x(\omega)$ is the power spectrum of the noisy acoustic signal, and $P_n(\omega)$ is the power spectrum of the noise signal.

In use, the voice wake-up module 4 keeps the microphone array in a pickup state at all times and performs silence detection and noise reduction on the audio through built-in basic signal processing. The voice wake-up module 4 determines whether a wake-up word has occurred and, if so, starts the subsequent voice recognition server 3. After the interactive voice information has been recognized, it is processed by the noise elimination module 2. Vehicle noise and wind noise are additive, locally stationary, and statistically independent of the speech signal. Based on the additive property of the noise, the spectral subtraction method obtains the noise power spectrum through noise energy estimation and gain calculation, and then subtracts the estimated noise from the power spectrum of the noisy speech to obtain clean speech. For interactive speech with an accent or in a dialect, the main conversion module 7 has difficulty producing complete text, so a separate model file is established through the model library 11 for comparison, and the deep learning module 10 then compares the result against the database 9 using big data. The resulting final transcribed text is processed by the semantic understanding server 12. In this way, stable noise reduction is maintained while speech is recorded, which helps ensure that the interaction information is captured accurately. In addition, the real-time network sharing module 5 interconnects a plurality of in-vehicle head units using big data so that they can learn jointly, which greatly improves recognition accuracy.
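
The noise energy estimation step relies on noise being locally stationary: frames that contain no speech can be used to estimate the noise power spectrum. The sketch below averages the power spectra of low-energy frames. It is a minimal illustration, not the method of the disclosure: the energy threshold used as a silence detector, the function names, and the sample frames are assumptions.

```python
def frame_energy(power_spectrum):
    """Total energy of one frame, taken as the sum over its frequency bins."""
    return sum(power_spectrum)


def estimate_noise(frames, silence_threshold):
    """Average the power spectra of frames whose energy is below the threshold."""
    silent = [f for f in frames if frame_energy(f) < silence_threshold]
    if not silent:
        raise ValueError("no silent frames available for noise estimation")
    n_bins = len(silent[0])
    return [sum(f[k] for f in silent) / len(silent) for k in range(n_bins)]


# Hypothetical per-frame power spectra: two quiet frames, then one with speech.
frames = [
    [1.0, 1.0, 1.0],
    [1.0, 3.0, 1.0],
    [9.0, 20.0, 6.0],
]
noise = estimate_noise(frames, silence_threshold=10.0)
```

Only the first two frames fall below the threshold, so the estimate is their per-bin average; the resulting spectrum is what would be subtracted from the noisy speech frame.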

In the present invention, unless otherwise expressly stated or limited, "above" or "below" a first feature means that the first and second features are in direct contact, or that the first and second features are not in direct contact but are in contact with each other via another feature therebetween. Also, the first feature being "on," "above" and "over" the second feature includes the first feature being directly on and obliquely above the second feature, or merely indicating that the first feature is at a higher level than the second feature. A first feature being "under," "below," and "beneath" a second feature includes the first feature being directly under and obliquely below the second feature, or simply meaning that the first feature is at a lesser elevation than the second feature.

The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and the preferred embodiments of the present invention are described in the above embodiments and the description, and are not intended to limit the present invention. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. A vehicle-mounted artificial intelligence voice interaction system based on big data, comprising a main control module (1), characterized in that: the input end of the main control module (1) is electrically connected with the output end of the noise elimination module (2), the input end of the noise elimination module (2) is electrically connected with the output end of the voice recognition server (3), the input end of the voice recognition server (3) is electrically connected with the output end of the voice wake-up module (4), the input end of the voice wake-up module (4) is electrically connected with the output end of the noise elimination module (2), the main control module (1) is bidirectionally and electrically connected with the real-time network sharing module (5), the output end of the real-time network sharing module (5) is electrically connected with the input end of the cloud server (6), the output end of the cloud server (6) is electrically connected with the input end of the main conversion module (7), the output end of the main conversion module (7) is electrically connected with the input end of the classification module (8), the output end of the classification module (8) is electrically connected with the input ends of the database (9) and the deep learning module (10) respectively, the deep learning module (10) and the classification module (8) are both electrically connected with the input end of the database (9), the output end of the main conversion module (7) is electrically connected with the input end of the semantic understanding server (12), and the output end of the semantic understanding server (12) is electrically connected with the input end of the network sharing module.

2. The vehicle-mounted artificial intelligence voice interaction system based on big data as claimed in claim 1, wherein: the output end of the main conversion module (7) is electrically connected with the input end of the model library (11), and the output end of the model library (11) is electrically connected with the input end of the deep learning module (10).

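3. The vehicle-mounted artificial intelligence voice interaction system based on big data as claimed in claim 1, wherein: the noise elimination module (2) adopts a hardware coordination manner of a DSP chip dedicated to noise reduction and eight microphone arrays, the noise elimination module (2) reduces noise by spectral subtraction, and the spectral subtraction calculation formula is as follows:

$$P_s(\omega) = P_x(\omega) - P_n(\omega)$$

wherein: $P_s(\omega)$ is the power spectrum of the clean speech signal, $P_x(\omega)$ is the power spectrum of the noisy acoustic signal, and $P_n(\omega)$ is the power spectrum of the noise signal.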

4. The vehicle-mounted artificial intelligence voice interaction system based on big data according to claim 3, characterized in that: the voice wake-up module (4) keeps the microphone array continuously in a pickup state and samples and quantizes the collected audio.

5. The vehicle-mounted artificial intelligence voice interaction system based on big data as claimed in claim 1, wherein: the voice recognition server (3) adopts a plurality of microphones to carry out synchronous high-quality recording on a plurality of audio data, directly compares a plurality of audio signal frequency spectrums, and sends a recording signal with the highest repetition degree to the noise elimination module (2).

6. The vehicle-mounted artificial intelligence voice interaction system based on big data as claimed in claim 1, wherein: the real-time network sharing module (5) adopts a 4G network mode for data transmission, and the main control module (1) is built by installing a 9.0 system platform framework.

7. The vehicle-mounted artificial intelligence voice interaction system based on big data as claimed in claim 1, wherein: the main conversion module (7) converts speech into text; if the speech is recognized successfully, it sends the text directly to the semantic understanding server (12), and if the speech cannot be recognized, it sends the speech to the model library (11).

8. The vehicle-mounted artificial intelligence voice interaction system based on big data according to claim 7, characterized in that: the model library (11) collects and compares voice, searches approximate items through the deep learning module (10), and matches the retrieved approximate items with language data through big data.