CN113593577A - Vehicle-mounted artificial intelligence voice interaction system based on big data - Google Patents

Vehicle-mounted artificial intelligence voice interaction system based on big data

Info

Publication number
CN113593577A
Authority
CN
China
Prior art keywords
module
electrically connected
voice
big data
output end
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111038179.6A
Other languages
Chinese (zh)
Inventor
王永锋
朱方其
徐波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Yihaitian Technology Co ltd
Original Assignee
Sichuan Yihaitian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Yihaitian Technology Co ltd filed Critical Sichuan Yihaitian Technology Co ltd
Priority to CN202111038179.6A priority Critical patent/CN113593577A/en
Publication of CN113593577A publication Critical patent/CN113593577A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a vehicle-mounted artificial intelligence voice interaction system based on big data, which is provided with a noise elimination module, a main control module, a real-time network sharing module, a main conversion module and a deep learning module. In use, the voice wake-up module keeps the microphone array in a pickup state at all times and performs mute recognition and noise reduction on the audio. The voice wake-up module judges whether a wake-up word appears and, if so, starts the subsequent voice recognition server; the speech is then compared with the database through the deep learning module in a big data manner, and the final translated text is processed by the semantic understanding server. In this way, stable noise reduction is maintained throughout voice recording and the accuracy of the recorded interactive information is ensured, while the real-time sharing module interconnects a plurality of vehicle head units through big data so that they learn jointly, greatly improving recognition accuracy.

Description

Vehicle-mounted artificial intelligence voice interaction system based on big data
Technical Field
The invention relates to the field of intelligent voice interaction, in particular to a vehicle-mounted artificial intelligence voice interaction system based on big data.
Background
With the development of modern vehicle head-unit systems, intelligent voice systems provide useful assistance during driving: functions can be selected through direct spoken communication, avoiding manual operation, and in-vehicle intelligent voice interaction systems are gradually becoming more complete as they develop.
However, existing in-vehicle voice interaction systems are limited in function: they access a background database for question-and-answer processing through real-time networking, which places fairly comprehensive demands on the knowledge stored in the database. For a commercially operated vehicle, the wide variety of questions raised by different customers, each with their own accent, poses a great challenge to intelligent voice interaction. Stable sound acquisition and accurate recognition of information form the most basic layer, so reliable audio acquisition is indispensable.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a vehicle-mounted artificial intelligence voice interaction system based on big data. For a commercially operated vehicle, the wide variety of questions raised by different customers poses a great challenge to intelligent voice interaction; stable sound acquisition and accurate recognition of information form the most basic layer, so reliable audio acquisition is indispensable.
In order to solve the technical problems, the invention provides the following technical scheme: a vehicle-mounted artificial intelligence voice interaction system based on big data comprises a main control module. The input end of the main control module is electrically connected with the output end of a noise elimination module, the input end of the noise elimination module is electrically connected with the output end of a voice recognition server, the input end of the voice recognition server is electrically connected with the output end of a voice wake-up module, and the input end of the voice wake-up module is electrically connected with the output end of the noise elimination module. The main control module is bidirectionally electrically connected with a real-time network sharing module, the output end of the real-time network sharing module is electrically connected with the input end of a cloud server, the output end of the cloud server is electrically connected with the input end of a main conversion module, the output end of the main conversion module is electrically connected with the input end of a classification module, and the output end of the classification module is electrically connected with the input ends of a database and a deep learning module respectively. The deep learning module and the classification module are both electrically connected with the input end of the database, the output end of the main conversion module is electrically connected with the input end of a semantic understanding server, and the output end of the semantic understanding server is electrically connected with the input end of the network sharing module.
As a preferred technical solution of the present invention, an output end of the main conversion module is electrically connected to an input end of the model library, and an output end of the model library is electrically connected to an input end of the deep learning module.
As a preferred technical solution of the present invention, the noise elimination module adopts a hardware combination of a dedicated noise-reduction DSP chip and an eight-microphone array. The noise reduction module performs noise reduction by spectral subtraction, the calculation formula of which is:
p_s(ω) = p_x(ω) - p_n(ω)
wherein p_s(ω) is the power spectrum of the clean speech signal, p_x(ω) is the power spectrum of the noisy speech signal, and p_n(ω) is the power spectrum of the noise signal.
As a preferred technical solution of the present invention, the voice wake-up module keeps the microphone array in a continuous pickup state, sampling and quantizing the collected audio.
As a preferred technical solution of the present invention, the voice recognition server uses a plurality of microphones to make synchronous high-quality recordings of the audio, directly compares the frequency spectra of the recorded signals, and transmits the recording with the highest degree of spectral agreement (repetition) to the noise elimination module.
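A minimal sketch of this channel-selection idea follows; the spectral-correlation score used to measure the agreement between channels is an assumption for illustration, since the patent does not give the exact comparison method:

    import numpy as np

    def select_best_channel(channels):
        """channels: list of equal-length 1-D arrays, one per microphone."""
        spectra = [np.abs(np.fft.rfft(ch)) for ch in channels]
        scores = []
        for i, s_i in enumerate(spectra):
            # Mean correlation of this channel's spectrum with every other channel.
            others = [np.corrcoef(s_i, s_j)[0, 1]
                      for j, s_j in enumerate(spectra) if j != i]
            scores.append(np.mean(others))
        best = int(np.argmax(scores))
        return best, channels[best]   # this recording goes on to noise elimination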
As a preferred technical scheme of the invention, the real-time network sharing module transmits data over a 4G network, and the main control module is built on an installed 9.0 system platform framework.
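The sharing step could look roughly like the sketch below, in which a vehicle unit uploads a confirmed recognition result over its network link so other units can learn from it; the endpoint URL, payload fields and use of HTTP are purely illustrative assumptions, as the patent only specifies 4G data transmission:

    import json
    import urllib.request

    # "https://cloud.example.com/shared-corpus" is a placeholder endpoint.
    def share_recognition(vehicle_id, audio_hash, recognized_text,
                          endpoint="https://cloud.example.com/shared-corpus"):
        payload = json.dumps({
            "vehicle_id": vehicle_id,
            "audio_hash": audio_hash,
            "text": recognized_text,
        }).encode("utf-8")
        req = urllib.request.Request(endpoint, data=payload,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req, timeout=5) as resp:   # sent over the 4G link
            return resp.status == 200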
As a preferred technical solution of the present invention, the main conversion module converts speech into text; if the speech can be recognized smoothly, the text is sent directly to the semantic understanding server, and if the speech cannot be recognized, it is sent to the model library.
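A minimal sketch of this routing decision follows; the confidence threshold and the understand/lookup interfaces are hypothetical names used only to make the branch explicit:

    def route_transcript(asr_result, semantic_server, model_library, threshold=0.8):
        """asr_result: (text, confidence) pair from the speech-to-text step."""
        text, confidence = asr_result
        if text and confidence >= threshold:
            return semantic_server.understand(text)    # smooth-recognition path
        return model_library.lookup(asr_result)        # fall back to model comparison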
As a preferred technical solution of the present invention, the model library records and compares the speech, searches for approximate terms through the deep learning module, and matches the retrieved approximate terms with language data through big data.
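As a rough illustration, the approximate-term search could be sketched as below, with difflib standing in for the deep learning module's similarity model, which the patent does not detail:

    import difflib

    def find_approximate_terms(fragment, model_entries, n=3, cutoff=0.6):
        """model_entries: known phrases accumulated through big data."""
        return difflib.get_close_matches(fragment, model_entries, n=n, cutoff=cutoff)

    # Example: find_approximate_terms("navgate hom", ["navigate home", "play music"])
    # returns ["navigate home"], which is then matched against language data.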
Compared with the prior art, the invention can achieve the following beneficial effects:
by arranging the noise elimination module, the voice recognition server, the voice awakening module, the main control module, the real-time network sharing module, the main conversion module and the deep learning module, when the voice awakening module is used, the voice awakening module keeps the microphone array in a pickup state all the time, and performs mute recognition and noise reduction processing on audio frequency through built-in basic signal processing, the voice awakening module can judge whether awakening words appear or not, if so, a subsequent voice recognition server is started, the voice awakening module processes the voice information after interactive voice information is recognized, noise of the vehicle and the wind is characterized by additive characteristics, stable local parts and independent of voice signal statistics, a power spectrum of noise is obtained through noise energy estimation and gain calculation according to the additive characteristics of the noise after a spectrum method is processed, and then the estimated noise is subtracted from the power spectrum with the voice noise to obtain pure voice, then, in some interactive voices of accent tape dialects, the main conversion module is difficult to detect complete characters, independent model file comparison is established through the model base, the main conversion module is compared with the database in a big data mode immediately, the obtained final translated characters are processed through the raincoat understanding server, the method can be used for stably reducing noise in the voice recording process, accuracy of the interactive information recording process is guaranteed, meanwhile, a real-time sharing module is used for enabling a plurality of vehicle machines to learn together in a big data interconnection mode, and accuracy of recognition is greatly improved.
Drawings
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is a block diagram of the processing flow of the noise elimination module of the present invention.
Wherein: the system comprises a main control module 1, a noise elimination module 2, a voice recognition server 3, a voice awakening module 4, a real-time network sharing module 5, a cloud server 6, a main conversion module 7, a classification module 8, a database 9, a deep learning module 10, a model library 11 and a semantic understanding server 12.
Detailed Description
The technical means, creative features, objects and effects of the present invention are described below with reference to the embodiments so as to provide a thorough understanding of the invention; the embodiments are not to be construed as limiting the scope of the invention as defined by the appended claims. Based on the embodiments described herein, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present invention. Unless otherwise specified, the experimental methods used in the following examples are conventional methods, and the materials, reagents and the like used are commercially available.
Examples
As shown in FIGS. 1-2, the present invention provides a vehicle-mounted artificial intelligence voice interaction system based on big data, which comprises a main control module 1. The input end of the main control module 1 is electrically connected with the output end of a noise elimination module 2, the input end of the noise elimination module 2 is electrically connected with the output end of a voice recognition server 3, the input end of the voice recognition server 3 is electrically connected with the output end of a voice wake-up module 4, and the input end of the voice wake-up module 4 is electrically connected with the output end of the noise elimination module 2. The main control module 1 is bidirectionally electrically connected with a real-time network sharing module 5, the output end of the real-time network sharing module 5 is electrically connected with the input end of a cloud server 6, the output end of the cloud server 6 is electrically connected with the input end of a main conversion module 7, the output end of the main conversion module 7 is electrically connected with the input end of a classification module 8, and the output end of the classification module 8 is electrically connected with the input ends of a database 9 and a deep learning module 10 respectively. The deep learning module 10 and the classification module 8 are both electrically connected with the input end of the database 9, the output end of the main conversion module 7 is electrically connected with the input end of a semantic understanding server 12, and the output end of the semantic understanding server 12 is electrically connected with the input end of the network sharing module.
The output end of the main conversion module 7 is electrically connected with the input end of the model library 11, and the output end of the model library 11 is electrically connected with the input end of the deep learning module 10. The voice wake-up module 4 keeps the microphone array in a continuous pickup state, sampling and quantizing the collected audio. The voice recognition server 3 uses a plurality of microphones to make synchronous high-quality recordings of the audio, directly compares the frequency spectra of the recorded signals, and sends the recording with the highest degree of spectral agreement to the noise elimination module 2. The real-time network sharing module 5 transmits data over a 4G network, and the main control module 1 is built on an installed 9.0 system platform framework. The main conversion module 7 converts speech into text: if the speech can be recognized smoothly, the text is sent directly to the semantic understanding server 12, and if it cannot be recognized, it is sent to the model library 11. The model library 11 records and compares the speech, and the deep learning module 10 searches for approximate terms and matches the retrieved terms with language data through big data.
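For readability, the connections described above can be written as a simple adjacency map that follows the reference numerals of FIG. 1; this is a plain restatement of the wiring, not an addition to it:

    # Directed edges run from each module's output end to the listed input ends;
    # the main control module and real-time network sharing module are bidirectional.
    MODULE_GRAPH = {
        "voice wake-up module (4)":             ["voice recognition server (3)"],
        "voice recognition server (3)":         ["noise elimination module (2)"],
        "noise elimination module (2)":         ["main control module (1)", "voice wake-up module (4)"],
        "main control module (1)":              ["real-time network sharing module (5)"],
        "real-time network sharing module (5)": ["cloud server (6)", "main control module (1)"],
        "cloud server (6)":                     ["main conversion module (7)"],
        "main conversion module (7)":           ["classification module (8)", "model library (11)",
                                                 "semantic understanding server (12)"],
        "classification module (8)":            ["database (9)", "deep learning module (10)"],
        "deep learning module (10)":            ["database (9)"],
        "model library (11)":                   ["deep learning module (10)"],
        "semantic understanding server (12)":   ["real-time network sharing module (5)"],
    }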
The noise elimination module 2 adopts a hardware combination of a dedicated noise-reduction DSP chip and an eight-microphone array. The noise reduction module performs noise reduction by spectral subtraction, the calculation formula of which is:
p_s(ω) = p_x(ω) - p_n(ω)
wherein p_s(ω) is the power spectrum of the clean speech signal, p_x(ω) is the power spectrum of the noisy speech signal, and p_n(ω) is the power spectrum of the noise signal.
In use, the voice wake-up module 4 keeps the microphone array in a pickup state at all times and performs mute recognition and noise reduction on the audio through built-in basic signal processing. The voice wake-up module 4 judges whether a wake-up word appears and, if so, starts the subsequent voice recognition server 3; the recognized interactive voice information is then processed by the noise elimination module 2. Vehicle and wind noise is additive, locally stationary and statistically independent of the speech signal, so after spectral processing the power spectrum of the noise is obtained through noise energy estimation and gain calculation according to this additive characteristic, and the estimated noise is then subtracted from the noisy speech power spectrum to obtain clean speech. For interactive speech with dialect accents, where the main conversion module 7 has difficulty detecting the complete text, an independent model file comparison is established through the model library 11 and compared with the database 9 in a big data manner through the deep learning module 10; the final translated text is then processed by the semantic understanding server. In this way, stable noise reduction is maintained throughout voice recording, the accuracy of the recorded interactive information is ensured, and the real-time sharing module interconnects a plurality of vehicle head units through big data so that they learn jointly, greatly improving recognition accuracy.
In the present invention, unless otherwise expressly stated or limited, "above" or "below" a first feature means that the first and second features are in direct contact, or that the first and second features are not in direct contact but are in contact with each other via another feature therebetween. Also, the first feature being "on," "above" and "over" the second feature includes the first feature being directly on and obliquely above the second feature, or merely indicating that the first feature is at a higher level than the second feature. A first feature being "under," "below," and "beneath" a second feature includes the first feature being directly under and obliquely below the second feature, or simply meaning that the first feature is at a lesser elevation than the second feature.
The foregoing shows and describes the general principles, essential features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the above embodiments and the description merely illustrate preferred embodiments of the invention and are not intended to limit it. The scope of the invention is defined by the appended claims and their equivalents.

Claims (8)

1. A vehicle-mounted artificial intelligence voice interaction system based on big data, comprising a main control module (1), characterized in that: the input end of the main control module (1) is electrically connected with the output end of the noise elimination module (2), the input end of the noise elimination module (2) is electrically connected with the output end of the voice recognition server (3), the input end of the voice recognition server (3) is electrically connected with the output end of the voice awakening module (4), the input end of the voice awakening module (4) is electrically connected with the output end of the noise elimination module (2), the main control module (1) is bidirectionally and electrically connected with the real-time network sharing module (5), the output end of the real-time network sharing module (5) is electrically connected with the input end of the cloud server (6), the output end of the cloud server (6) is electrically connected with the input end of the main conversion module (7), the output end of the main conversion module (7) is electrically connected with the input end of the classification module (8), and the output end of the classification module (8) is respectively and electrically connected with the input ends of the database (9) and the deep learning module (10), the deep learning module (10) and the classification module (8) are both electrically connected with the input end of the database (9), the output end of the main conversion module (7) is electrically connected with the input end of the semantic understanding server (12), and the output end of the semantic understanding server (12) is electrically connected with the input end of the network sharing module.
2. The vehicle-mounted artificial intelligence voice interaction system based on big data as claimed in claim 1, wherein: the output end of the main conversion module (7) is electrically connected with the input end of the model library (11), and the output end of the model library (11) is electrically connected with the input end of the deep learning module (10).
3. The vehicle-mounted artificial intelligence voice interaction system based on big data as claimed in claim 1, wherein: the noise elimination module (2) adopts a hardware combination of a dedicated noise-reduction DSP chip and an eight-microphone array, the noise reduction module performs noise reduction by spectral subtraction, and the spectral subtraction calculation formula is:
p_s(ω) = p_x(ω) - p_n(ω)
wherein p_s(ω) is the power spectrum of the clean speech signal, p_x(ω) is the power spectrum of the noisy speech signal, and p_n(ω) is the power spectrum of the noise signal.
4. The vehicle-mounted artificial intelligence voice interaction system based on big data according to claim 3, characterized in that: the voice awakening module (4) keeps the microphone array in a continuous pickup state and samples and quantizes the collected audio.
5. The vehicle-mounted artificial intelligence voice interaction system based on big data as claimed in claim 1, wherein: the voice recognition server (3) adopts a plurality of microphones to carry out synchronous high-quality recording on a plurality of audio data, directly compares a plurality of audio signal frequency spectrums, and sends a recording signal with the highest repetition degree to the noise elimination module (2).
6. The vehicle-mounted artificial intelligence voice interaction system based on big data as claimed in claim 1, wherein: the real-time network sharing module (5) adopts a 4G network mode for data transmission, and the main control module (1) is built by installing a 9.0 system platform framework.
7. The vehicle-mounted artificial intelligence voice interaction system based on big data as claimed in claim 1, wherein: the main conversion module (7) has a function of converting voice into text, and directly sends the text to the semantic understanding server (12) if the speech can be smoothly recognized, and sends it to the model library (11) if the speech cannot be recognized.
8. The vehicle-mounted artificial intelligence voice interaction system based on big data according to claim 7, characterized in that: the model library (11) collects and compares voice, searches approximate items through the deep learning module (10), and matches the retrieved approximate items with language data through big data.
CN202111038179.6A 2021-09-06 2021-09-06 Vehicle-mounted artificial intelligence voice interaction system based on big data Pending CN113593577A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111038179.6A CN113593577A (en) 2021-09-06 2021-09-06 Vehicle-mounted artificial intelligence voice interaction system based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111038179.6A CN113593577A (en) 2021-09-06 2021-09-06 Vehicle-mounted artificial intelligence voice interaction system based on big data

Publications (1)

Publication Number Publication Date
CN113593577A true CN113593577A (en) 2021-11-02

Family

ID=78241346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111038179.6A Pending CN113593577A (en) 2021-09-06 2021-09-06 Vehicle-mounted artificial intelligence voice interaction system based on big data

Country Status (1)

Country Link
CN (1) CN113593577A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103700370A (en) * 2013-12-04 2014-04-02 北京中科模识科技有限公司 Broadcast television voice recognition method and system
CN107135247A (en) * 2017-02-16 2017-09-05 江苏南大电子信息技术股份有限公司 A kind of service system and method for the intelligent coordinated work of person to person's work
CN112015874A (en) * 2020-07-30 2020-12-01 上海松鼠课堂人工智能科技有限公司 Student mental health accompany conversation system
CN112331207A (en) * 2020-09-30 2021-02-05 音数汇元(上海)智能科技有限公司 Service content monitoring method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110364143B (en) Voice awakening method and device and intelligent electronic equipment
CN111508498B (en) Conversational speech recognition method, conversational speech recognition system, electronic device, and storage medium
US9336780B2 (en) Identification of a local speaker
CN102543073B (en) Shanghai dialect phonetic recognition information processing method
CN109256150A (en) Speech emotion recognition system and method based on machine learning
CN108564952A (en) The method and apparatus of speech roles separation
CN111489765A (en) Telephone traffic service quality inspection method based on intelligent voice technology
CN110890096A (en) Intelligent voice system and method based on voice analysis
WO2021196802A1 (en) Method, apparatus, and device for training multimode voice recognition model, and storage medium
CN111145763A (en) GRU-based voice recognition method and system in audio
CN110858476A (en) Sound collection method and device based on microphone array
CN111508527B (en) Telephone answering state detection method, device and server
CN111883135A (en) Voice transcription method and device and electronic equipment
CN110148418B (en) Scene record analysis system, method and device
CN111107284B (en) Real-time generation system and generation method for video subtitles
CN111276150A (en) Intelligent voice-to-character and simultaneous interpretation system based on microphone array
CN101950564A (en) Remote digital voice acquisition, analysis and identification system
CN116825123A (en) Tone quality optimization method and system based on audio push
CN113593577A (en) Vehicle-mounted artificial intelligence voice interaction system based on big data
CN111833878A (en) Chinese voice interaction non-inductive control system and method based on raspberry Pi edge calculation
CN112151055A (en) Audio processing method and device
CN111179972A (en) Human voice detection algorithm based on deep learning
CN116129942A (en) Voice interaction device and voice interaction method
CN117059068A (en) Speech processing method, device, storage medium and computer equipment
CN114664303A (en) Continuous voice instruction rapid recognition control system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination