CN112908316A - AI Intelligent Voice stream Collection - Google Patents

AI Intelligent Voice Stream Collection

Info

Publication number
CN112908316A
Authority
CN
China
Prior art keywords
voice
voice stream
user
stream
collected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911209849.9A
Other languages
Chinese (zh)
Inventor
谢伟平
柯玉蒙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Sikaozhe Technology Co ltd
Original Assignee
Zhejiang Sikaozhe Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Sikaozhe Technology Co ltd filed Critical Zhejiang Sikaozhe Technology Co ltd
Priority to CN201911209849.9A priority Critical patent/CN112908316A/en
Publication of CN112908316A publication Critical patent/CN112908316A/en
Pending legal-status Critical Current

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/1822 - Parsing for meaning understanding
    • G10L15/28 - Constructional details of speech recognition systems
    • G10L15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L15/34 - Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • G10L2015/223 - Execution procedure of a spoken command
    • G10L2015/225 - Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses AI intelligent voice stream collection. After the program starts, it monitors the microphone's voice stream in real time; the user speaks into the microphone and the system collects the voice stream at the sampling frequency. The collected voice stream is converted into an array, and the system judges whether its volume is higher than a preset value; if it is, the voice stream is transmitted to the cloud server for the subsequent logic processing of the speech recognition machine. During testing, the program is started, a sentence is spoken into the microphone, and the system observes whether the collected voice stream is sent.

Description

AI Intelligent Voice stream Collection
Technical Field
The invention relates to the technical field of voice stream acquisition, in particular to AI intelligent voice stream acquisition.
Background
In human-machine dialogue, users inevitably make errors of expression, which causes the machine's understanding of the user's language to deviate; at such moments an error-correction mechanism is very important to the machine.
The prior art lacks a triggering condition and consumes a large amount of network resources, particularly when voice streams are collected and sent in real time. By processing the voice stream, the present technique collects voice streams in real time and sends them in a more user-friendly manner.
Disclosure of Invention
The invention aims to provide AI intelligent voice stream acquisition to solve the problems in the background art.
In order to achieve the purpose, the invention provides the following technical scheme:
the AI intelligent voice stream acquisition method comprises the following specific use methods:
the method comprises the following steps: deployment configuration:
firstly, download the latest Python release in a Windows 7 environment and run the installer package python-3.6.4-amd64.exe, matched with JDK 1.8 and Eclipse 4.6; then add the Python installation directory to the environment variables, appending only the Python installation directory to the Path system variable;
step two: deployment testing:
to test whether Python installed successfully, open a command line with cmd, enter the python command, then enter print('Hello World'); if the test passes, installation of the Python environment is complete;
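The step-two check can also be run as a short script instead of being typed into the interpreter by hand. The sketch below is illustrative; the 3.6 version floor is an assumption based on the python-3.6.4 package named in step one:

```python
import sys

def python_ok(min_version=(3, 6)):
    # True when the running interpreter meets the assumed minimum version.
    return sys.version_info[:2] >= min_version

if python_ok():
    print('Hello World')  # the same smoke test the step performs by hand
```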
step three: connecting with a cloud server:
the technical modules required by the AI voice cloud server comprise four parts: automatic speech recognition, natural language understanding, natural language generation, and text-to-speech conversion;
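The four modules named in step three form a chained pipeline. The function names and placeholder bodies below are illustrative stand-ins, not the patent's actual cloud interfaces; a real deployment would replace each body with a request to the cloud server:

```python
def automatic_speech_recognition(voice_stream: bytes) -> str:
    # Placeholder ASR: a real module returns the transcript of the audio.
    return "turn on the light"

def natural_language_understanding(text: str) -> dict:
    # Placeholder NLU: map the transcript to a machine instruction (intent).
    if "light" in text:
        return {"intent": "device_on", "slot": "light"}
    return {"intent": "unknown"}

def natural_language_generation(intent: dict) -> str:
    # Placeholder NLG: produce a reply sentence from the parsed intent.
    if intent["intent"] == "device_on":
        return "OK, the light is on."
    return "Sorry, please say that again."

def text_to_speech(sentence: str) -> bytes:
    # Placeholder TTS: a real module returns synthesized audio.
    return sentence.encode("utf-8")

def handle(voice_stream: bytes) -> bytes:
    # Chain the four modules in the order step three lists them.
    text = automatic_speech_recognition(voice_stream)
    intent = natural_language_understanding(text)
    reply = natural_language_generation(intent)
    return text_to_speech(reply)
```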
step four: program start test:
start the program and observe whether the collected voice stream is sent at this point; speak a sentence into the microphone, then check whether the voice stream was sent;
step five: formally starting a program:
once testing shows that the collected voice stream is sent successfully, that is, speaking into the microphone causes the voice stream to be sent, the program is formally started and the valid voice stream is sent to the cloud server for automatic speech recognition, natural language understanding, natural language generation, and text-to-speech conversion;
step six: collecting voice flow:
after the program starts, the microphone's voice stream is monitored in real time; the user speaks into the microphone and the system collects the voice stream at the sampling frequency;
step seven: human voice analysis and upload to the cloud server:
the collected voice stream is converted into an array, and the system judges whether the volume of the voice stream is higher than a preset value; if it is, the voice stream is transmitted to the cloud server for the subsequent logic processing of the speech recognition machine.
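Steps six and seven can be sketched in a few lines of Python using only the standard library. The chunk format (little-endian signed 16-bit PCM) and the preset value of 500 are assumptions for illustration, not values given by the patent:

```python
import array

PRESET_VOLUME = 500  # assumed threshold on 16-bit PCM amplitude; tune per microphone

def should_send(voice_chunk: bytes) -> bool:
    # Step seven: convert the collected chunk into an array of samples,
    # then judge whether its volume exceeds the preset value.
    samples = array.array('h')           # signed 16-bit integers
    samples.frombytes(voice_chunk)
    if not samples:
        return False                     # nothing collected
    peak = max(abs(s) for s in samples)  # maximum volume of the chunk
    return peak > PRESET_VOLUME          # above the preset value: send to cloud
```

A chunk below the threshold is treated as noise and dropped; one above it would be uploaded to the cloud server for recognition.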
Preferably, the core of the interaction flow among the four modules in step three is the intention.
Preferably, the intention is expressed mainly as the voice the user speaks while using AI voice stream collection, and it represents the core function of the AI intelligence: if the AI successfully identifies the user's intention, the AI voice stream collection must capture the user's voice and feed the result back to the user; if the AI cannot identify the user's intention, it must prompt the user and guide them in its use.
Preferably, the AI recognizes the intention by performing semantic parsing on the user's voice collected by the AI voice stream, that is, by analyzing and understanding the speech recognition result. In short, the AI maps the user's voice to the corresponding machine instruction, analyzes that instruction, sends a request to the cloud server, and returns a feedback response answering the user's question; according to the internal representation obtained by the parsing module, a natural language sentence is generated under the control of a dialogue management mechanism, and the sentence produced by the natural language generation module is converted into voice output.
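The intent-recognition behavior described above (answer when an intent is matched, prompt and guide the user otherwise) can be sketched with a simple keyword table. The table entries and instruction names are invented for illustration and are not part of the patent:

```python
from typing import Optional

# Hypothetical keyword-to-instruction table; entries are illustrative only.
INTENT_TABLE = {
    "weather": "QUERY_WEATHER",
    "time": "QUERY_TIME",
    "music": "PLAY_MUSIC",
}

def recognize_intent(transcript: str) -> Optional[str]:
    # Semantic parsing reduced to its simplest form: keyword matching.
    lowered = transcript.lower()
    for keyword, instruction in INTENT_TABLE.items():
        if keyword in lowered:
            return instruction
    return None

def respond(transcript: str) -> str:
    instruction = recognize_intent(transcript)
    if instruction is None:
        # Intent not recognized: prompt and guide the user, as described.
        return "Sorry, I didn't catch that. Try asking about the weather, the time, or music."
    return f"Executing {instruction}."
```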
Preferably, in step four, if speaking a sentence into the microphone does not cause the collected voice stream to be sent, the microphone is replaced and the test repeated; if the voice stream still cannot be sent, the program's operating environment and the automatic speech recognition module are debugged until the collected voice stream is sent successfully and speaking into the microphone causes the voice stream to be sent.
Preferably, the human voice analysis in step seven analyzes the collected voice: when the maximum volume of the voice stream exceeds the preset threshold, the voice stream is judged valid and sent; if the maximum volume is below the preset threshold, it is treated as noise and discarded.
The invention has the technical effects and advantages that:
after the program starts, the invention monitors the microphone's voice stream in real time: the user speaks into the microphone, the system collects the voice stream at the sampling frequency, converts the collected voice stream into an array, and judges whether the volume of the voice stream is higher than a preset value; if it is, the voice stream is transmitted to the cloud server for the subsequent logic processing of the speech recognition machine. This avoids the prior-art problem of lacking a triggering condition and consuming a large amount of network resources when voice streams are collected and sent in real time; the technique collects voice streams in real time and, by processing them, sends them in a more user-friendly manner.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Embodiment one:
The AI intelligent voice stream acquisition method is used as follows:
step one: deployment configuration:
firstly, download the latest Python release in a Windows 7 environment and run the installer package python-3.6.4-amd64.exe, matched with JDK 1.8 and Eclipse 4.6; then add the Python installation directory to the environment variables, appending only the Python installation directory to the Path system variable;
step two: deployment testing:
to test whether Python installed successfully, open a command line with cmd, enter the python command, then enter print('Hello World'); if the test passes, installation of the Python environment is complete;
step three: connecting to the cloud server:
the technical modules required by the AI voice cloud server comprise four parts: automatic speech recognition, natural language understanding, natural language generation, and text-to-speech conversion;
step four: program start test:
start the program and observe whether the collected voice stream is sent at this point; speak a sentence into the microphone, then check whether the voice stream was sent;
step five: formally starting the program:
once testing shows that the collected voice stream is sent successfully, that is, speaking into the microphone causes the voice stream to be sent, the program is formally started and the valid voice stream is sent to the cloud server for automatic speech recognition, natural language understanding, natural language generation, and text-to-speech conversion;
step six: collecting the voice stream:
after the program starts, the microphone's voice stream is monitored in real time; the user speaks into the microphone and the system collects the voice stream at the sampling frequency;
step seven: human voice analysis and upload to the cloud server:
the collected voice stream is converted into an array, and the system judges whether the volume of the voice stream is higher than a preset value; if it is, the voice stream is transmitted to the cloud server for the subsequent logic processing of the speech recognition machine.
Embodiment two:
Further, in the above scheme, the core of the interaction flow among the four modules in step three is the intention.
Further, in the above scheme, the intention is expressed mainly as the voice the user speaks while using AI voice stream collection and represents the core function of the AI intelligence; if the AI successfully identifies the user's intention, the AI voice stream collection must capture the user's voice and feed the result back to the user; if the AI cannot identify the user's intention, it must prompt the user and guide them in its use.
Further, in the above scheme, the AI recognizes the intention by performing semantic parsing on the user's voice collected by the AI voice stream, that is, by analyzing and understanding the speech recognition result; in short, the user's voice is mapped to the corresponding machine instruction, the instruction is analyzed, a request is sent to the cloud server, and a feedback response answering the user's question is returned; according to the internal representation obtained by the parsing module, a natural language sentence is generated under the control of a dialogue management mechanism, and the sentence produced by the natural language generation module is converted into voice output.
Further, in the above scheme, in step four, if speaking a sentence into the microphone does not cause the collected voice stream to be sent, the microphone is replaced and the test repeated; if the voice stream still cannot be sent, the program's operating environment and the automatic speech recognition module are debugged until the collected voice stream is sent successfully and speaking into the microphone causes the voice stream to be sent.
Further, in the above scheme, the human voice analysis in step seven analyzes the collected voice: when the maximum volume of the voice stream exceeds the preset threshold, the voice stream is judged valid and sent; if the maximum volume is below the preset threshold, it is treated as noise and discarded.
Combining embodiment one and embodiment two: in use, after the program starts, the microphone's voice stream is monitored in real time; the user speaks into the microphone, the system collects the voice stream at the sampling frequency, converts it into an array, and judges whether its volume is higher than a preset value; if it is, the subsequent speech-recognition logic processing is carried out. This solves the prior-art problem of lacking a triggering condition and consuming a large amount of network resources, particularly when voice streams are collected and sent in real time; the voice stream can be collected in real time and, after processing, sent in a more user-friendly manner.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments or portions thereof without departing from the spirit and scope of the invention.

Claims (6)

  1. AI intelligent voice stream collection, characterized by comprising the following method of use:
    step one: deployment configuration:
    firstly, download the latest Python release in a Windows 7 environment and run the installer package python-3.6.4-amd64.exe, matched with JDK 1.8 and Eclipse 4.6; then add the Python installation directory to the environment variables, appending only the Python installation directory to the Path system variable;
    step two: deployment testing:
    to test whether Python installed successfully, open a command line with cmd, enter the python command, then enter print('Hello World'); if the test passes, installation of the Python environment is complete;
    step three: connecting to the cloud server:
    the technical modules required by the AI voice cloud server comprise four parts: automatic speech recognition, natural language understanding, natural language generation, and text-to-speech conversion;
    step four: program start test:
    start the program and observe whether the collected voice stream is sent at this point; speak a sentence into the microphone, then check whether the voice stream was sent;
    step five: formally starting the program:
    once testing shows that the collected voice stream is sent successfully, that is, speaking into the microphone causes the voice stream to be sent, the program is formally started and the valid voice stream is sent to the cloud server for automatic speech recognition, natural language understanding, natural language generation, and text-to-speech conversion;
    step six: collecting the voice stream:
    after the program starts, the microphone's voice stream is monitored in real time; the user speaks into the microphone and the system collects the voice stream at the sampling frequency;
    step seven: human voice analysis and upload to the cloud server:
    the collected voice stream is converted into an array, and the system judges whether the volume of the voice stream is higher than a preset value; if it is, the voice stream is transmitted to the cloud server for the subsequent logic processing of the speech recognition machine.
  2. The AI intelligent voice stream collection as claimed in claim 1, wherein: the core of the interaction flow among the four modules in step three is the intention.
  3. The AI intelligent voice stream collection as claimed in claim 2, wherein: the intention is expressed mainly as the voice the user speaks while using AI voice stream collection and represents the core function of the AI intelligence; if the AI successfully identifies the user's intention, the AI voice stream collection must capture the user's voice and feed the result back to the user; if the AI cannot identify the user's intention, it must prompt the user and guide them in its use.
  4. The AI intelligent voice stream collection as claimed in claim 3, wherein: the AI recognizes the intention by performing semantic parsing on the user's voice collected by the AI voice stream, that is, by analyzing and understanding the speech recognition result; in short, the user's voice is mapped to the corresponding machine instruction, the instruction is analyzed, a request is sent to the cloud server, and a feedback response answering the user's question is returned; according to the internal representation obtained by the parsing module, a natural language sentence is generated under the control of a dialogue management mechanism, and the sentence produced by the natural language generation module is converted into voice output.
  5. The AI intelligent voice stream collection as claimed in claim 1, wherein: in step four, if speaking a sentence into the microphone does not cause the collected voice stream to be sent, the microphone is replaced and the test repeated; if the voice stream still cannot be sent, the program's operating environment and the automatic speech recognition module are debugged until the collected voice stream is sent successfully and speaking into the microphone causes the voice stream to be sent.
  6. The AI intelligent voice stream collection as claimed in claim 1, wherein: the human voice analysis in step seven analyzes the collected voice: when the maximum volume of the voice stream exceeds the preset threshold, the voice stream is judged valid and sent; if the maximum volume is below the preset threshold, it is treated as noise and discarded.
CN201911209849.9A 2019-12-02 2019-12-02 AI Intelligent Voice stream Collection Pending CN112908316A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911209849.9A CN112908316A (en) 2019-12-02 2019-12-02 AI Intelligent Voice stream Collection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911209849.9A CN112908316A (en) 2019-12-02 2019-12-02 AI Intelligent Voice stream Collection

Publications (1)

Publication Number Publication Date
CN112908316A true CN112908316A (en) 2021-06-04

Family

ID=76104216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911209849.9A Pending CN112908316A (en) 2019-12-02 2019-12-02 AI Intelligent Voice stream Collection

Country Status (1)

Country Link
CN (1) CN112908316A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107146602A (en) * 2017-04-10 2017-09-08 北京猎户星空科技有限公司 A kind of audio recognition method, device and electronic equipment
CN109509470A (en) * 2018-12-11 2019-03-22 平安科技(深圳)有限公司 Voice interactive method, device, computer readable storage medium and terminal device
CN109961777A (en) * 2019-02-16 2019-07-02 天津大学 A kind of voice interactive method based on intelligent robot
CN110223711A (en) * 2019-06-03 2019-09-10 清华大学 Interactive voice based on microphone signal wakes up electronic equipment, method and medium


Similar Documents

Publication Publication Date Title
CN108257616A (en) Interactive detection method and device
CN107277272A (en) A kind of bluetooth equipment voice interactive method and system based on software APP
CN106560892B (en) Intelligent robot, cloud interaction method thereof and cloud interaction system
TW201626363A (en) A voice recognition system of a robot system and method thereof
WO2021169615A1 (en) Voice response processing method and apparatus based on artificial intelligence, device, and medium
CN105488032A (en) Speech recognition input control method and system
WO2014023257A1 (en) Instruction processing method, apparatus, and system
CN109347708B (en) Voice recognition method and device, household appliance, cloud server and medium
CN110992955A (en) Voice operation method, device, equipment and storage medium of intelligent equipment
CN107016070A (en) A kind of interactive method and device for intelligent robot
CN111933149A (en) Voice interaction method, wearable device, terminal and voice interaction system
CN102023644A (en) Method for controlling cradle head based on voice recognition technology
CN112002315A (en) Voice control method and device, electrical equipment, storage medium and processor
CN111629110A (en) Voice interaction method and voice interaction system
CN112908316A (en) AI Intelligent Voice stream Collection
CN109935230A (en) A kind of survey hair password monitoring system and method based on voice driven
CN113241063B (en) Algorithm parameter updating method, device, terminal and medium in voice recognition system
CN110728980A (en) Intelligent service bus system based on voice and conversation robot
CN115567336A (en) Smart home based wake-up-free voice control system and method
CN113595811B (en) Equipment performance testing method and device, storage medium and electronic device
Meng et al. Application Research and Implementation of Voice Control System Based on Android Speech Recognition
CN114420103A (en) Voice processing method and device, electronic equipment and storage medium
CN111292746A (en) Voice input conversion system based on human-computer interaction
CN112656309A (en) Function execution method and device of sweeper, readable storage medium and electronic equipment
CN110266894A (en) A kind of call method and system of automatic busy tone detecting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination