CN112908316A - AI Intelligent Voice stream Collection - Google Patents
- Publication number
- CN112908316A (application CN201911209849.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/34—Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Abstract
The invention discloses AI intelligent voice stream collection. The program is started, the user speaks a sentence into the microphone, and it is observed whether the collected voice stream is sent. The collected voice stream is converted into an array, and whether the volume of the sentence is higher than a preset value is judged; if so, the voice stream is transmitted to a cloud server for the subsequent logic processing of the voice recognition machine. According to the invention, the voice stream of the microphone is monitored in real time: a user speaks into the microphone, the system collects the voice stream at the sampling frequency, the collected voice stream is converted into an array, and whether the volume of the voice stream is higher than a preset value is judged; if it is, the voice stream is transmitted to the cloud server and the subsequent logic processing of the voice recognition machine is carried out.
Description
Technical Field
The invention relates to the technical field of voice stream acquisition, in particular to AI intelligent voice stream acquisition.
Background
In the process of man-machine conversation, a user inevitably makes expression errors, which cause the machine's understanding of the user's language to deviate; at such moments an error-correction mechanism is very important for the machine.
The prior art has no triggering condition and consumes a large amount of network resources, especially when voice streams are collected and sent in real time. The present technology realizes real-time collection of voice streams and, by processing the voice streams before sending them, sends them in a more user-friendly way.
Disclosure of Invention
The invention aims to provide AI intelligent voice stream acquisition to solve the problems in the background art.
In order to achieve the purpose, the invention provides the following technical scheme:
the AI intelligent voice stream acquisition method comprises the following specific method of use:
the method comprises the following steps: deployment configuration:
firstly, downloading the latest Python release in the Windows 7 system environment, then installing the download package python-3.6.4-amd64.exe, matched with JDK 1.8 and Eclipse 4.6, and then adding the Python installation directory to the system Path environment variable (only the installation directory needs to be added);
step two: deployment testing:
testing whether the Python installation succeeded: opening a command line with cmd, entering the python command, then entering print('Hello World'); passing the test completes the installation of the Python environment;
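The smoke test in step two can also be scripted rather than typed at the cmd prompt. This is a minimal sketch, not part of the patent: it runs the same print statement through the current interpreter, using sys.executable so that no assumption is made about a `python` command being on the PATH.

```python
import subprocess
import sys

# Step two as a script: run the "Hello World" smoke test through the
# installed interpreter instead of typing it at the cmd prompt.
result = subprocess.run(
    [sys.executable, "-c", "print('Hello World')"],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # prints: Hello World
```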
step three: connecting with a cloud server:
the technical modules required by the AI voice cloud server are four parts, namely automatic voice recognition, natural language understanding, natural language generation and text-to-voice conversion;
step four: program start test:
starting the program, speaking a sentence into the microphone, and then observing whether the collected voice stream is sent;
step five: formally starting a program:
after the test confirms that the collected voice stream is sent successfully when speaking into the microphone, formally starting the program and sending the effective voice stream to the cloud server for automatic voice recognition, natural language understanding, natural language generation and text-to-voice conversion;
step six: collecting the voice stream:
after the program is started, the voice stream of the microphone is monitored in real time; the user speaks into the microphone, and the system collects the voice stream at the sampling frequency;
step seven: the human voice analysis and uploading cloud server:
converting the collected voice stream into an array, judging whether the volume of the voice stream is higher than a preset value, and if so, transmitting the voice stream to the cloud server for the subsequent logic processing of the voice recognition machine.
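The conversion and threshold judgment of steps six and seven can be sketched as follows. This is a minimal illustration assuming the microphone delivers raw 16-bit PCM buffers; the threshold value is illustrative, not a value taken from the patent.

```python
import array

VOLUME_THRESHOLD = 500  # illustrative preset value, not from the patent

def bytes_to_samples(pcm: bytes) -> array.array:
    """Step seven, first half: convert a raw 16-bit PCM buffer into an array."""
    samples = array.array("h")
    samples.frombytes(pcm)
    return samples

def should_upload(pcm: bytes, threshold: int = VOLUME_THRESHOLD) -> bool:
    """Step seven, second half: send only when peak volume exceeds the preset value."""
    samples = bytes_to_samples(pcm)
    if not samples:
        return False  # empty buffer: nothing to send
    peak = max(abs(s) for s in samples)
    return peak > threshold

# A quiet buffer is treated as noise; a loud one would go to the cloud server.
quiet = array.array("h", [10, -20, 15]).tobytes()
loud = array.array("h", [10, 2000, -1500]).tobytes()
```

In a real deployment the buffers would come from a microphone capture library rather than hand-built arrays; the gating logic is the part the patent describes.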
Preferably, the core of the interaction flow among the four modules in step three is the intention.
Preferably, the intention is expressed in the voice spoken by the user when using AI voice stream collection and represents the core function of the AI intelligence: if the AI successfully identifies the user's intention, the AI voice stream collects the user's voice and the result is fed back to the user; if the AI cannot identify the user's intention, the user is prompted with guidance.
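The two outcomes described above (intention identified versus user prompted with guidance) can be sketched as a tiny dispatcher. The keyword table, intent names, and reply strings are hypothetical assumptions for illustration, not the patent's implementation.

```python
from typing import Optional

# Hypothetical keyword-to-intent table; a real system would use an NLU model.
INTENT_KEYWORDS = {
    "weather": "query_weather",
    "music": "play_music",
    "light": "toggle_light",
}

def identify_intent(recognized_text: str) -> Optional[str]:
    """Return the matched intent, or None when no intention is identified."""
    lowered = recognized_text.lower()
    for keyword, intent in INTENT_KEYWORDS.items():
        if keyword in lowered:
            return intent
    return None

def respond(recognized_text: str) -> str:
    intent = identify_intent(recognized_text)
    if intent is None:
        # Intention not identified: prompt the user with guidance.
        return "Sorry, I did not understand. Try asking about weather, music or lights."
    return f"Executing intent: {intent}"
```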
Preferably, the AI recognizes the intention by performing semantic parsing on the user voice collected by the AI voice stream, that is, by analyzing and understanding the voice recognition result. In short, the user's voice is matched to a corresponding machine instruction; the instruction is analyzed, a request is sent to the cloud server for processing, and a feedback response containing the answer to the user's question is produced. According to the internal representation obtained by the parsing module, a natural language sentence is generated under the action of a dialogue management mechanism, and the sentence generated by the natural language generation module is converted into voice output.
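The flow just described (recognition, parsing to an instruction, cloud processing, sentence generation, voice output) can be sketched with placeholder functions. Every function body and sample string here is an assumption for illustration; real implementations would call cloud ASR/NLU/NLG/TTS services.

```python
def automatic_voice_recognition(voice_stream: bytes) -> str:
    """Stub for the cloud ASR module: returns a fixed transcript."""
    return "what is the weather"

def natural_language_understanding(text: str) -> dict:
    """Stub for semantic parsing: pairs the transcript with a machine instruction."""
    return {"instruction": "query_weather", "text": text}

def natural_language_generation(parsed: dict) -> str:
    """Stub for the NLG module acting under a dialogue-management mechanism."""
    return f"Here is the result for: {parsed['instruction']}"

def text_to_voice(sentence: str) -> bytes:
    """Stub for the TTS module: a real system would synthesize audio."""
    return sentence.encode("utf-8")

def handle_voice_stream(voice_stream: bytes) -> bytes:
    """End-to-end flow through the four modules of step three."""
    text = automatic_voice_recognition(voice_stream)
    parsed = natural_language_understanding(text)
    answer = natural_language_generation(parsed)
    return text_to_voice(answer)
```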
Preferably, in step four, if a sentence is spoken into the microphone and the collected voice stream is not sent, the microphone is replaced and the test is repeated; if the voice stream still cannot be sent, the program running environment and the automatic voice recognition module are debugged until speaking into the microphone causes the collected voice stream to be sent successfully.
Preferably, the human voice analysis in step seven analyzes the collected voice: when the maximum volume of the voice stream is greater than the preset threshold, the voice stream is collected and sent after the judgment; when it is less than the preset threshold, it is treated as noise and eliminated.
The invention has the technical effects and advantages that:
the invention monitors the voice flow of the microphone in real time after the program is started, a user speaks in the microphone, the system collects the voice flow through the sampling frequency, converts the collected voice flow into an array, then judges whether the volume of the voice flow is higher than a preset value, if the volume of the voice flow is higher than the preset value, the voice flow is transmitted to the cloud server, and the subsequent logic processing of the voice recognition machine is carried out, thereby avoiding the problem that the prior art has no triggering condition and consumes a large amount of network resources when the voice flow is collected and sent in real time, the technology can realize the real-time collection of the voice flow, and more humanization can be realized by processing the voice flow to send the voice flow.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The first embodiment is as follows:
the AI intelligent voice stream acquisition method comprises the following specific use methods:
the method comprises the following steps: deployment configuration:
firstly, downloading the latest Python release in the Windows 7 system environment, then installing the download package python-3.6.4-amd64.exe, matched with JDK 1.8 and Eclipse 4.6, and then adding the Python installation directory to the system Path environment variable (only the installation directory needs to be added);
step two: deployment testing:
testing whether the Python installation succeeded: opening a command line with cmd, entering the python command, then entering print('Hello World'); passing the test completes the installation of the Python environment;
step three: connecting with a cloud server:
the technical modules required by the AI voice cloud server are four parts, namely automatic voice recognition, natural language understanding, natural language generation and text-to-voice conversion;
step four: program start test:
starting the program, speaking a sentence into the microphone, and then observing whether the collected voice stream is sent;
step five: formally starting the program:
after the test confirms that the collected voice stream is sent successfully when speaking into the microphone, formally starting the program and sending the effective voice stream to the cloud server for automatic voice recognition, natural language understanding, natural language generation and text-to-voice conversion;
step six: collecting the voice stream:
after the program is started, the voice stream of the microphone is monitored in real time; the user speaks into the microphone, and the system collects the voice stream at the sampling frequency;
step seven: the human voice analysis and uploading cloud server:
converting the collected voice stream into an array, judging whether the volume of the voice stream is higher than a preset value, and if so, transmitting the voice stream to the cloud server for the subsequent logic processing of the voice recognition machine.
Example two:
further, in the above scheme, the core of the interaction flow of the four module groups in the third step is the intention.
Further, in the above scheme, the intention is expressed in the voice spoken by the user when using AI voice stream collection and represents the core function of the AI intelligence: if the AI successfully identifies the user's intention, the AI voice stream collects the user's voice and the result is fed back to the user; if the AI cannot identify the user's intention, the user is prompted with guidance.
Furthermore, in the above scheme, the AI recognizes the intention by performing semantic parsing on the user voice collected by the AI voice stream, that is, by analyzing and understanding the voice recognition result. In short, the user's voice is matched to a corresponding machine instruction; the instruction is analyzed, a request is sent to the cloud server for processing, and a feedback response containing the answer to the user's question is produced. According to the internal representation obtained by the parsing module, a natural language sentence is generated under the action of a dialogue management mechanism, and the sentence generated by the natural language generation module is converted into voice output.
Further, in the above scheme, in step four, if a sentence is spoken into the microphone and the collected voice stream is not sent, the microphone is replaced and the test is repeated; if the voice stream still cannot be sent, the program running environment and the automatic voice recognition module are debugged until speaking into the microphone causes the collected voice stream to be sent successfully.
Further, in the above scheme, the human voice analysis in step seven analyzes the collected voice: when the maximum volume of the voice stream is greater than the preset threshold, the voice stream is collected and sent after the judgment; when it is less than the preset threshold, noise cancellation is performed.
Combining embodiment one and embodiment two: in use, the voice stream of the microphone is monitored in real time after the program is started. The user speaks into the microphone, the system collects the voice stream at the sampling frequency, the collected voice stream is converted into an array, and whether the volume of the voice stream is higher than a preset value is judged; if it is, the subsequent logic processing of the voice recognition machine is carried out. This solves the problem that the prior art, especially when collecting and sending voice streams in real time, has no triggering condition and consumes a large amount of network resources; the technology realizes real-time collection of voice streams and, by processing them before sending, sends voice streams in a more user-friendly way.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments or portions thereof without departing from the spirit and scope of the invention.
Claims (6)
- 1. AI intelligent voice stream collection, characterized by comprising the following specific method of use: step one: deployment configuration: firstly, downloading the latest Python release in the Windows 7 system environment, then installing the download package python-3.6.4-amd64.exe, matched with JDK 1.8 and Eclipse 4.6, and then adding the Python installation directory to the system Path environment variable; step two: deployment testing: testing whether the Python installation succeeded by opening a command line with cmd, entering the python command, then entering print('Hello World'), passing the test completing the installation of the Python environment; step three: connecting to a cloud server: the technical modules required by the AI voice cloud server comprise four parts, namely automatic voice recognition, natural language understanding, natural language generation and text-to-voice conversion; step four: program start test: starting the program, speaking a sentence into the microphone, and observing whether the collected voice stream is sent; step five: formally starting the program: after the test confirms that the collected voice stream is sent successfully when speaking into the microphone, formally starting the program and sending the effective voice stream to the cloud server for automatic voice recognition, natural language understanding, natural language generation and text-to-voice conversion; step six: collecting the voice stream: after the program is started, monitoring the voice stream of the microphone in real time, the user speaking into the microphone and the system collecting the voice stream at the sampling frequency; step seven: human voice analysis and upload to the cloud server: converting the collected voice stream into an array, judging whether the volume of the voice stream is higher than a preset value, and if so, transmitting the voice stream to the cloud server for the subsequent logic processing of the voice recognition machine.
- 2. The AI smart voice stream capture as claimed in claim 1, wherein: the core of the interaction flow of the four module groups in the third step is the intention.
- 3. The AI smart voice stream capture as claimed in claim 2, wherein: the intention is expressed in the voice spoken by the user when using AI voice stream collection and represents the core function of the AI intelligence; if the AI successfully identifies the user's intention, the AI voice stream collects the user's voice and the result is fed back to the user; if the AI cannot identify the user's intention, the user is prompted with guidance.
- 4. The AI smart voice stream capture of claim 3, wherein: the AI recognizes the intention by performing semantic parsing on the user voice collected by the AI voice stream, that is, by analyzing and understanding the voice recognition result; in short, the user's voice is matched to a corresponding machine instruction, the instruction is analyzed, a request is sent to the cloud server for processing, and a feedback response containing the answer to the user's question is produced; according to the internal representation obtained by the parsing module, a natural language sentence is generated under the action of a dialogue management mechanism, and the sentence generated by the natural language generation module is converted into voice for output.
- 5. The AI smart voice stream capture as claimed in claim 1, wherein: in step four, if a sentence is spoken into the microphone and the collected voice stream is not sent, the microphone is replaced and the test is repeated; if the voice stream still cannot be sent, the program running environment and the automatic voice recognition module are debugged until speaking into the microphone causes the collected voice stream to be sent successfully.
- 6. The AI smart voice stream capture as claimed in claim 1, wherein: the human voice analysis in step seven analyzes the collected voice; when the maximum volume of the voice stream is greater than the preset threshold, the voice stream is collected and sent after the judgment, and when it is less than the preset threshold, noise elimination is performed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911209849.9A CN112908316A (en) | 2019-12-02 | 2019-12-02 | AI Intelligent Voice stream Collection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112908316A true CN112908316A (en) | 2021-06-04 |
Family
ID=76104216
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911209849.9A Pending CN112908316A (en) | 2019-12-02 | 2019-12-02 | AI Intelligent Voice stream Collection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112908316A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107146602A (en) * | 2017-04-10 | 2017-09-08 | 北京猎户星空科技有限公司 | A kind of audio recognition method, device and electronic equipment |
CN109509470A (en) * | 2018-12-11 | 2019-03-22 | 平安科技(深圳)有限公司 | Voice interactive method, device, computer readable storage medium and terminal device |
CN109961777A (en) * | 2019-02-16 | 2019-07-02 | 天津大学 | A kind of voice interactive method based on intelligent robot |
CN110223711A (en) * | 2019-06-03 | 2019-09-10 | 清华大学 | Interactive voice based on microphone signal wakes up electronic equipment, method and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||