CN112908316A - AI Intelligent Voice stream Collection - Google Patents

AI Intelligent Voice Stream Collection

Info

Publication number
CN112908316A
Authority
CN
China
Prior art keywords
voice
voice stream
user
stream
collected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911209849.9A
Other languages
Chinese (zh)
Inventor
谢伟平
柯玉蒙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Sikaozhe Technology Co ltd
Original Assignee
Zhejiang Sikaozhe Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Sikaozhe Technology Co ltd filed Critical Zhejiang Sikaozhe Technology Co ltd
Priority to CN201911209849.9A priority Critical patent/CN112908316A/en
Publication of CN112908316A publication Critical patent/CN112908316A/en
Pending legal-status Critical Current

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/1822 - Parsing for meaning understanding
    • G10L15/28 - Constructional details of speech recognition systems
    • G10L15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L15/34 - Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • G10L2015/223 - Execution procedure of a spoken command
    • G10L2015/225 - Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses AI intelligent voice stream collection. After the program starts, it monitors the microphone's voice stream in real time; the user speaks into the microphone and the system collects the voice stream at the sampling frequency. The collected voice stream is converted into an array, and the system judges whether its volume is higher than a preset value; if it is, the voice stream is transmitted to the cloud server for the subsequent logic processing of the speech recognition machine. During testing, the program is started, a sentence is spoken into the microphone, and the system observes whether the collected voice stream is sent.

Description

AI Intelligent Voice stream Collection
Technical Field
The invention relates to the technical field of voice stream acquisition, in particular to AI intelligent voice stream acquisition.
Background
In human-machine dialogue, users inevitably make errors of expression, which causes the machine's understanding of the user's language to deviate; at such moments an error-correction mechanism is very important to the machine.
The prior art lacks a triggering condition and consumes a large amount of network resources, particularly when voice streams are collected and sent in real time. By processing the voice stream, the present technique collects voice streams in real time and sends them in a more user-friendly manner.
Disclosure of Invention
The invention aims to provide AI intelligent voice stream acquisition to solve the problems in the background art.
In order to achieve the purpose, the invention provides the following technical scheme:
the AI intelligent voice stream acquisition method comprises the following specific use methods:
the method comprises the following steps: deployment configuration:
firstly, download the latest Python release in a Windows 7 environment and run the installer package python-3.6.4-amd64.exe, matched with JDK 1.8 and Eclipse 4.6; then add the Python installation directory to the environment variables, appending only the Python installation directory to the Path system variable;
step two: deployment testing:
to test whether Python installed successfully, open a command line with cmd, enter the python command, then enter print('Hello World'); if the test passes, installation of the Python environment is complete;
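The step-two check can also be run as a short script instead of being typed into the interpreter by hand. The sketch below is illustrative; the 3.6 version floor is an assumption based on the python-3.6.4 package named in step one:

```python
import sys

def python_ok(min_version=(3, 6)):
    # True when the running interpreter meets the assumed minimum version.
    return sys.version_info[:2] >= min_version

if python_ok():
    print('Hello World')  # the same smoke test the step performs by hand
```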
step three: connecting with a cloud server:
the technical modules required by the AI voice cloud server comprise four parts: automatic speech recognition, natural language understanding, natural language generation, and text-to-speech conversion;
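The four modules named in step three form a chained pipeline. The function names and placeholder bodies below are illustrative stand-ins, not the patent's actual cloud interfaces; a real deployment would replace each body with a request to the cloud server:

```python
def automatic_speech_recognition(voice_stream: bytes) -> str:
    # Placeholder ASR: a real module returns the transcript of the audio.
    return "turn on the light"

def natural_language_understanding(text: str) -> dict:
    # Placeholder NLU: map the transcript to a machine instruction (intent).
    if "light" in text:
        return {"intent": "device_on", "slot": "light"}
    return {"intent": "unknown"}

def natural_language_generation(intent: dict) -> str:
    # Placeholder NLG: produce a reply sentence from the parsed intent.
    if intent["intent"] == "device_on":
        return "OK, the light is on."
    return "Sorry, please say that again."

def text_to_speech(sentence: str) -> bytes:
    # Placeholder TTS: a real module returns synthesized audio.
    return sentence.encode("utf-8")

def handle(voice_stream: bytes) -> bytes:
    # Chain the four modules in the order step three lists them.
    text = automatic_speech_recognition(voice_stream)
    intent = natural_language_understanding(text)
    reply = natural_language_generation(intent)
    return text_to_speech(reply)
```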
step four: program start test:
start the program and observe whether the collected voice stream is sent at this point; speak a sentence into the microphone, then check whether the voice stream was sent;
step five: formally starting a program:
once testing shows that the collected voice stream is sent successfully, that is, speaking into the microphone causes the voice stream to be sent, the program is formally started and the valid voice stream is sent to the cloud server for automatic speech recognition, natural language understanding, natural language generation, and text-to-speech conversion;
step six: collecting voice flow:
after the program starts, the microphone's voice stream is monitored in real time; the user speaks into the microphone and the system collects the voice stream at the sampling frequency;
step seven: human voice analysis and upload to the cloud server:
the collected voice stream is converted into an array, and the system judges whether the volume of the voice stream is higher than a preset value; if it is, the voice stream is transmitted to the cloud server for the subsequent logic processing of the speech recognition machine.
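Steps six and seven can be sketched in a few lines of Python using only the standard library. The chunk format (little-endian signed 16-bit PCM) and the preset value of 500 are assumptions for illustration, not values given by the patent:

```python
import array

PRESET_VOLUME = 500  # assumed threshold on 16-bit PCM amplitude; tune per microphone

def should_send(voice_chunk: bytes) -> bool:
    # Step seven: convert the collected chunk into an array of samples,
    # then judge whether its volume exceeds the preset value.
    samples = array.array('h')           # signed 16-bit integers
    samples.frombytes(voice_chunk)
    if not samples:
        return False                     # nothing collected
    peak = max(abs(s) for s in samples)  # maximum volume of the chunk
    return peak > PRESET_VOLUME          # above the preset value: send to cloud
```

A chunk below the threshold is treated as noise and dropped; one above it would be uploaded to the cloud server for recognition.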
Preferably, the core of the interaction flow among the four modules in step three is the intention.
Preferably, the intention is expressed mainly as the voice the user speaks while using AI voice stream collection, and it represents the core function of the AI intelligence: if the AI successfully identifies the user's intention, the AI voice stream collection must capture the user's voice and feed the result back to the user; if the AI cannot identify the user's intention, it must prompt the user and guide them in its use.
Preferably, the AI recognizes the intention by performing semantic parsing on the user's voice collected by the AI voice stream, that is, by analyzing and understanding the speech recognition result. In short, the AI maps the user's voice to the corresponding machine instruction, analyzes that instruction, sends a request to the cloud server, and returns a feedback response answering the user's question; according to the internal representation obtained by the parsing module, a natural language sentence is generated under the control of a dialogue management mechanism, and the sentence produced by the natural language generation module is converted into voice output.
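The intent-recognition behavior described above (answer when an intent is matched, prompt and guide the user otherwise) can be sketched with a simple keyword table. The table entries and instruction names are invented for illustration and are not part of the patent:

```python
from typing import Optional

# Hypothetical keyword-to-instruction table; entries are illustrative only.
INTENT_TABLE = {
    "weather": "QUERY_WEATHER",
    "time": "QUERY_TIME",
    "music": "PLAY_MUSIC",
}

def recognize_intent(transcript: str) -> Optional[str]:
    # Semantic parsing reduced to its simplest form: keyword matching.
    lowered = transcript.lower()
    for keyword, instruction in INTENT_TABLE.items():
        if keyword in lowered:
            return instruction
    return None

def respond(transcript: str) -> str:
    instruction = recognize_intent(transcript)
    if instruction is None:
        # Intent not recognized: prompt and guide the user, as described.
        return "Sorry, I didn't catch that. Try asking about the weather, the time, or music."
    return f"Executing {instruction}."
```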
Preferably, in step four, if speaking a sentence into the microphone does not cause the collected voice stream to be sent, the microphone is replaced and the test repeated; if the voice stream still cannot be sent, the program's operating environment and the automatic speech recognition module are debugged until the collected voice stream is sent successfully and speaking into the microphone causes the voice stream to be sent.
Preferably, the human voice analysis in step seven analyzes the collected voice: when the maximum volume of the voice stream exceeds the preset threshold, the voice stream is judged valid and sent; if the maximum volume is below the preset threshold, it is treated as noise and discarded.
The invention has the technical effects and advantages that:
after the program starts, the invention monitors the microphone's voice stream in real time: the user speaks into the microphone, the system collects the voice stream at the sampling frequency, converts the collected voice stream into an array, and judges whether the volume of the voice stream is higher than a preset value; if it is, the voice stream is transmitted to the cloud server for the subsequent logic processing of the speech recognition machine. This avoids the prior-art problem of lacking a triggering condition and consuming a large amount of network resources when voice streams are collected and sent in real time; the technique collects voice streams in real time and, by processing them, sends them in a more user-friendly manner.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Embodiment one:
The AI intelligent voice stream acquisition method is used as follows:
step one: deployment configuration:
firstly, download the latest Python release in a Windows 7 environment and run the installer package python-3.6.4-amd64.exe, matched with JDK 1.8 and Eclipse 4.6; then add the Python installation directory to the environment variables, appending only the Python installation directory to the Path system variable;
step two: deployment testing:
to test whether Python installed successfully, open a command line with cmd, enter the python command, then enter print('Hello World'); if the test passes, installation of the Python environment is complete;
step three: connecting to the cloud server:
the technical modules required by the AI voice cloud server comprise four parts: automatic speech recognition, natural language understanding, natural language generation, and text-to-speech conversion;
step four: program start test:
start the program and observe whether the collected voice stream is sent at this point; speak a sentence into the microphone, then check whether the voice stream was sent;
step five: formally starting the program:
once testing shows that the collected voice stream is sent successfully, that is, speaking into the microphone causes the voice stream to be sent, the program is formally started and the valid voice stream is sent to the cloud server for automatic speech recognition, natural language understanding, natural language generation, and text-to-speech conversion;
step six: collecting the voice stream:
after the program starts, the microphone's voice stream is monitored in real time; the user speaks into the microphone and the system collects the voice stream at the sampling frequency;
step seven: human voice analysis and upload to the cloud server:
the collected voice stream is converted into an array, and the system judges whether the volume of the voice stream is higher than a preset value; if it is, the voice stream is transmitted to the cloud server for the subsequent logic processing of the speech recognition machine.
Embodiment two:
Further, in the above scheme, the core of the interaction flow among the four modules in step three is the intention.
Further, in the above scheme, the intention is expressed mainly as the voice the user speaks while using AI voice stream collection and represents the core function of the AI intelligence; if the AI successfully identifies the user's intention, the AI voice stream collection must capture the user's voice and feed the result back to the user; if the AI cannot identify the user's intention, it must prompt the user and guide them in its use.
Further, in the above scheme, the AI recognizes the intention by performing semantic parsing on the user's voice collected by the AI voice stream, that is, by analyzing and understanding the speech recognition result; in short, the user's voice is mapped to the corresponding machine instruction, the instruction is analyzed, a request is sent to the cloud server, and a feedback response answering the user's question is returned; according to the internal representation obtained by the parsing module, a natural language sentence is generated under the control of a dialogue management mechanism, and the sentence produced by the natural language generation module is converted into voice output.
Further, in the above scheme, in step four, if speaking a sentence into the microphone does not cause the collected voice stream to be sent, the microphone is replaced and the test repeated; if the voice stream still cannot be sent, the program's operating environment and the automatic speech recognition module are debugged until the collected voice stream is sent successfully and speaking into the microphone causes the voice stream to be sent.
Further, in the above scheme, the human voice analysis in step seven analyzes the collected voice: when the maximum volume of the voice stream exceeds the preset threshold, the voice stream is judged valid and sent; if the maximum volume is below the preset threshold, it is treated as noise and discarded.
Combining embodiment one and embodiment two: in use, after the program starts, the microphone's voice stream is monitored in real time; the user speaks into the microphone, the system collects the voice stream at the sampling frequency, converts it into an array, and judges whether its volume is higher than a preset value; if it is, the subsequent speech-recognition logic processing is carried out. This solves the prior-art problem of lacking a triggering condition and consuming a large amount of network resources, particularly when voice streams are collected and sent in real time; the voice stream can be collected in real time and, after processing, sent in a more user-friendly manner.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments or portions thereof without departing from the spirit and scope of the invention.

Claims (6)

  1. AI intelligent voice stream collection, characterized by comprising the following method of use:
    step one: deployment configuration:
    firstly, download the latest Python release in a Windows 7 environment and run the installer package python-3.6.4-amd64.exe, matched with JDK 1.8 and Eclipse 4.6; then add the Python installation directory to the environment variables, appending only the Python installation directory to the Path system variable;
    step two: deployment testing:
    to test whether Python installed successfully, open a command line with cmd, enter the python command, then enter print('Hello World'); if the test passes, installation of the Python environment is complete;
    step three: connecting to the cloud server:
    the technical modules required by the AI voice cloud server comprise four parts: automatic speech recognition, natural language understanding, natural language generation, and text-to-speech conversion;
    step four: program start test:
    start the program and observe whether the collected voice stream is sent at this point; speak a sentence into the microphone, then check whether the voice stream was sent;
    step five: formally starting the program:
    once testing shows that the collected voice stream is sent successfully, that is, speaking into the microphone causes the voice stream to be sent, the program is formally started and the valid voice stream is sent to the cloud server for automatic speech recognition, natural language understanding, natural language generation, and text-to-speech conversion;
    step six: collecting the voice stream:
    after the program starts, the microphone's voice stream is monitored in real time; the user speaks into the microphone and the system collects the voice stream at the sampling frequency;
    step seven: human voice analysis and upload to the cloud server:
    the collected voice stream is converted into an array, and the system judges whether the volume of the voice stream is higher than a preset value; if it is, the voice stream is transmitted to the cloud server for the subsequent logic processing of the speech recognition machine.
  2. The AI intelligent voice stream collection as claimed in claim 1, wherein: the core of the interaction flow among the four modules in step three is the intention.
  3. The AI intelligent voice stream collection as claimed in claim 2, wherein: the intention is expressed mainly as the voice the user speaks while using AI voice stream collection and represents the core function of the AI intelligence; if the AI successfully identifies the user's intention, the AI voice stream collection must capture the user's voice and feed the result back to the user; if the AI cannot identify the user's intention, it must prompt the user and guide them in its use.
  4. The AI intelligent voice stream collection as claimed in claim 3, wherein: the AI recognizes the intention by performing semantic parsing on the user's voice collected by the AI voice stream, that is, by analyzing and understanding the speech recognition result; in short, the user's voice is mapped to the corresponding machine instruction, the instruction is analyzed, a request is sent to the cloud server, and a feedback response answering the user's question is returned; according to the internal representation obtained by the parsing module, a natural language sentence is generated under the control of a dialogue management mechanism, and the sentence produced by the natural language generation module is converted into voice output.
  5. The AI intelligent voice stream collection as claimed in claim 1, wherein: in step four, if speaking a sentence into the microphone does not cause the collected voice stream to be sent, the microphone is replaced and the test repeated; if the voice stream still cannot be sent, the program's operating environment and the automatic speech recognition module are debugged until the collected voice stream is sent successfully and speaking into the microphone causes the voice stream to be sent.
  6. The AI intelligent voice stream collection as claimed in claim 1, wherein: the human voice analysis in step seven analyzes the collected voice: when the maximum volume of the voice stream exceeds the preset threshold, the voice stream is judged valid and sent; if the maximum volume is below the preset threshold, it is treated as noise and discarded.
CN201911209849.9A 2019-12-02 2019-12-02 AI Intelligent Voice stream Collection Pending CN112908316A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911209849.9A CN112908316A (en) 2019-12-02 2019-12-02 AI Intelligent Voice stream Collection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911209849.9A CN112908316A (en) 2019-12-02 2019-12-02 AI Intelligent Voice stream Collection

Publications (1)

Publication Number Publication Date
CN112908316A true CN112908316A (en) 2021-06-04

Family

ID=76104216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911209849.9A Pending CN112908316A (en) 2019-12-02 2019-12-02 AI Intelligent Voice stream Collection

Country Status (1)

Country Link
CN (1) CN112908316A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107146602A (en) * 2017-04-10 2017-09-08 北京猎户星空科技有限公司 A kind of audio recognition method, device and electronic equipment
CN109509470A (en) * 2018-12-11 2019-03-22 平安科技(深圳)有限公司 Voice interactive method, device, computer readable storage medium and terminal device
CN109961777A (en) * 2019-02-16 2019-07-02 天津大学 A kind of voice interactive method based on intelligent robot
CN110223711A (en) * 2019-06-03 2019-09-10 清华大学 Interactive voice based on microphone signal wakes up electronic equipment, method and medium


Similar Documents

Publication Publication Date Title
CN108257616A (en) Interactive detection method and device
CN107277272A (en) A kind of bluetooth equipment voice interactive method and system based on software APP
CN106560892B (en) Intelligent robot, cloud interaction method thereof and cloud interaction system
TW201626363A (en) A voice recognition system of a robot system and method thereof
WO2021169615A1 (en) Voice response processing method and apparatus based on artificial intelligence, device, and medium
CN105488032A (en) Speech recognition input control method and system
WO2014023257A1 (en) Instruction processing method, apparatus, and system
CN109347708B (en) Voice recognition method and device, household appliance, cloud server and medium
CN110992955A (en) Voice operation method, device, equipment and storage medium of intelligent equipment
CN107016070A (en) A kind of interactive method and device for intelligent robot
CN111933149A (en) Voice interaction method, wearable device, terminal and voice interaction system
CN102023644A (en) Method for controlling cradle head based on voice recognition technology
CN112002315A (en) Voice control method and device, electrical equipment, storage medium and processor
CN111629110A (en) Voice interaction method and voice interaction system
CN112908316A (en) AI Intelligent Voice stream Collection
CN109935230A (en) A kind of survey hair password monitoring system and method based on voice driven
CN113241063B (en) Algorithm parameter updating method, device, terminal and medium in voice recognition system
CN110728980A (en) Intelligent service bus system based on voice and conversation robot
CN115567336A (en) Smart home based wake-up-free voice control system and method
CN113595811B (en) Equipment performance testing method and device, storage medium and electronic device
Meng et al. Application Research and Implementation of Voice Control System Based on Android Speech Recognition
CN114420103A (en) Voice processing method and device, electronic equipment and storage medium
CN111292746A (en) Voice input conversion system based on human-computer interaction
CN112656309A (en) Function execution method and device of sweeper, readable storage medium and electronic equipment
CN110266894A (en) A kind of call method and system of automatic busy tone detecting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination