CN113140211A

CN113140211A - Intelligent voice recognition technology for real-time audio and video streams based on trusted calls

Info

Publication number: CN113140211A
Application number: CN202110422256.1A
Authority: CN
Inventors: 刘波涛
Original assignee: Wuhan Weiwu Yunlian Technology Co ltd
Current assignee: Wuhan Weiwu Yunlian Technology Co ltd
Priority date: 2021-04-20
Filing date: 2021-04-20
Publication date: 2021-07-20

Abstract

The invention discloses an intelligent voice recognition technology for real-time audio and video streams based on trusted calls, comprising a pre-preparation module, a mode matching module, a calling end and an answering end. The pre-preparation module comprises a trusted call source database, a voiceprint database and a voiceprint binding module. The calling end and the answering end are connected, in sequence, through an information transmission module, a voice input module, a voice extraction module, the mode matching module, a voice detection module, a voice comparison module and a voice recognition module, and the entire process at both the calling end and the answering end is enclosed by an encryption module. Through the voiceprint binding module of the pre-preparation module, the technology effectively protects the privacy of the answering and calling parties: before use, the user's voiceprint must first be bound. After the user registers and logs in, the voiceprint is recorded twice; binding succeeds only if the two recordings are detected as the same, and the technology can be used only once the user's voiceprint has been successfully added.

Description

Intelligent voice recognition technology for real-time audio and video streams based on trusted calls

Technical Field

The invention relates to the technical field of voice recognition, and in particular to an intelligent voice recognition technology for real-time audio and video streams based on trusted calls.

Background

With the development of science and technology, people can communicate in real time through electronic devices such as mobile phones and computers, but this is not the case for people with severe hearing impairment. Although voice conversion services are available in many countries and allow people with hearing impairment to communicate through an intermediary, these services still fall short in protecting user privacy, and they are costly in terms of equipment, training and labor. In addition, certain dedicated service numbers serve only specific important persons. The content of such calls is very important, yet forging a phone number to place a call is not difficult, so the source number alone cannot establish whether a call was really made by one of these persons. As a result, the privacy and call security of the answering and calling parties lack adequate protection.

Disclosure of Invention

In view of the above problems, the present invention provides an intelligent voice recognition technology for real-time audio and video streams based on trusted calls, so as to solve the problems set forth in the background.

To achieve this purpose, the invention adopts the following technical scheme: an intelligent voice recognition technology for real-time audio and video streams based on trusted calls comprises a pre-preparation module, an information transmission module, a voice input module, a voice extraction module, a mode matching module, a calling end and an answering end. The pre-preparation module comprises a trusted call source database, a voiceprint database and a voiceprint binding module. The calling end and the answering end are connected, in sequence, through the information transmission module, the voice input module, the voice extraction module, the mode matching module, a voice detection module, a voice comparison module and a voice recognition module. The entire process at both the calling end and the answering end is enclosed by an encryption module, and the information transmission module, the voice input module, the voice extraction module, the mode matching module, the voice detection module, the voice comparison module, a voice storage module and the voice recognition module are electrically connected with each other.

Preferably, the voiceprint binding module comprises user registration, user login and user detection, and binding requires two recordings.

Preferably, the pre-preparation module is connected to the encryption module and the voice storage module respectively, and the pre-preparation module is provided with a secret key.

Preferably, the voice extraction module is directly electrically connected to the voice storage module, and the voice storage module is directly electrically connected to the voice comparison module.

Preferably, the voice detection module is electrically connected to a voice reminding module and a voice feedback module respectively; the voice reminding module provides two modes, audio reminding and pop-up reminding; and the voice feedback module is connected to the trusted call source database.

Preferably, the voice recognition module is connected to a voice conversion module, and the voice conversion module includes text conversion, signal conversion and language conversion.

Preferably, the technology comprises the following steps:

S1, before the system is put into use, the user's voiceprint must be bound: after the user registers and logs in, the voiceprint is recorded twice; binding succeeds if the two recordings are detected as the same, and the system can be used only after the voiceprint has been successfully added;

S2, with the signal encrypted, the calling end transmits the signal through the information transmission module to the voice input module, which passes it to the voice extraction module; the extracted voiceprint information is passed in sequence to the mode matching module, the voice detection module and the voice comparison module for matching, and the information is then transmitted to the answering end through the voice recognition module and the information transmission module;

S3, when the voice detection module detects that the voiceprint differs from the voiceprint in the trusted call source database, the voice reminding module issues an audio reminder and a pop-up reminder; and

S4, when the voice feedback module feeds the information back to the trusted call source database and the voiceprint information is found to be inconsistent, the answering end is reminded and, at the same time, the information is fed back to the genuine calling terminal.

The invention has the following beneficial effects. Through the voiceprint binding module of the pre-preparation module, the technology effectively protects the privacy of the answering and calling parties: before use, the user's voiceprint must first be bound; after the user registers and logs in, the voiceprint is recorded twice, binding succeeds if the two recordings are detected as the same, and the technology can be used only once the user's voiceprint has been successfully added. The pre-preparation module is electrically connected to the encryption module and is provided with a secret key, and the entire process at the calling end and the answering end is enclosed by the encryption module, which further improves the privacy and call security of the answering and calling parties. In addition, the voice recognition module is connected to the voice conversion module, which includes text conversion, signal conversion and language conversion, so that for a user with hearing impairment the voice can be converted into text for recognition and can be converted between languages.
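
As an illustration of the binding step S1, the following is a minimal Python sketch, not the patented implementation. It assumes that each recording has already been reduced to a fixed-length feature vector and that "detected as the same" means the cosine similarity of the two vectors reaches a threshold. The class `VoiceprintBinder`, the 0.9 threshold and the averaging of the two recordings are hypothetical choices that do not appear in the original text.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class VoiceprintBinder:
    """Hypothetical stand-in for the voiceprint binding module (step S1)."""
    threshold: float = 0.9  # assumed similarity threshold, not from the source
    registered: set = field(default_factory=set)
    logged_in: set = field(default_factory=set)
    voiceprint_db: dict = field(default_factory=dict)

    def register(self, user_id):
        self.registered.add(user_id)

    def login(self, user_id):
        if user_id not in self.registered:
            raise KeyError(f"unknown user: {user_id}")
        self.logged_in.add(user_id)

    def bind(self, user_id, first, second):
        """Bind a voiceprint only if the two recordings are judged the same."""
        if user_id not in self.logged_in:
            raise PermissionError("user must register and log in before binding")
        if len(first) != len(second):
            return False
        if cosine_similarity(first, second) < self.threshold:
            return False
        # Store the mean of the two recordings as the enrolled voiceprint
        # (an assumption; the source does not say what is stored).
        self.voiceprint_db[user_id] = [(x + y) / 2 for x, y in zip(first, second)]
        return True
```

In this sketch a caller would invoke `register`, then `login`, then `bind` with the two recorded vectors; `bind` returns `False` when the two recordings disagree, so the user cannot proceed until a consistent pair has been recorded.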

Drawings

FIG. 1 is a schematic diagram of the pre-preparation process of the system of the present invention; FIG. 2 is a diagram of the call answering layout of the system of the present invention.

Detailed Description

To make the technical means, inventive features, objectives and effects of the invention easy to understand, the invention is further explained below.

In the description of the present invention, it should be noted that terms such as "upper", "lower", "inner", "outer", "front", "rear", "both ends", "one end" and "the other end" indicate orientations or positional relationships based on those shown in the drawings. They are used only for convenience and simplicity of description, and do not indicate or imply that the referred device or element must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present invention, it should also be noted that, unless otherwise explicitly specified or limited, the terms "mounted", "disposed", "connected" and the like are to be construed broadly. For example, "connected" may mean fixedly connected, detachably connected or integrally connected; mechanically or electrically connected; connected directly or indirectly through an intervening medium; or interconnected between two elements. The specific meanings of these terms in the present invention can be understood by those skilled in the art according to the specific case.

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.

Referring to FIGS. 1-2, the intelligent voice recognition technology for real-time audio and video streams based on trusted calls provided by the invention comprises a pre-preparation module, a calling end and an answering end. The pre-preparation module comprises a trusted call source database, a voiceprint database and a voiceprint binding module. The calling end and the answering end are connected, in sequence, through an information transmission module, a voice input module, a voice extraction module, a mode matching module, a voice detection module, a voice comparison module and a voice recognition module. The entire process at the calling end and the answering end is enclosed by an encryption module, and the information transmission module, the voice input module, the voice extraction module, the mode matching module, the voice detection module, the voice comparison module, a voice storage module and the voice recognition module are electrically connected with each other. The voiceprint binding module comprises user registration, user login and user detection, and binding requires two recordings. The pre-preparation module is electrically connected to the encryption module and the voice storage module respectively, and is provided with a secret key. The voice extraction module is directly electrically connected to the voice storage module, and the voice storage module is directly electrically connected to the voice comparison module. The voice detection module is electrically connected to a voice reminding module and a voice feedback module respectively; the voice reminding module provides two modes, audio reminding and pop-up reminding; and the voice feedback module is connected to the trusted call source database. The voice recognition module is connected to a voice conversion module, and the voice conversion module comprises text conversion, signal conversion and language conversion.

The specific workflow of the intelligent voice recognition technology for real-time audio and video streams based on trusted calls is as follows:

S1, before the system is put into use, the user's voiceprint must be bound: after the user registers and logs in, the voiceprint is recorded twice; binding succeeds if the two recordings are detected as the same, and the system can be used only after the voiceprint has been successfully added;

S2, with the signal encrypted, the calling end transmits the signal through the information transmission module to the voice input module, which passes it to the voice extraction module; the extracted voiceprint information is passed in sequence to the mode matching module, the voice detection module and the voice comparison module for matching, and the information is then transmitted to the answering end through the voice recognition module and the information transmission module;

S3, when the voice detection module detects that the voiceprint differs from the voiceprint in the trusted call source database, the voice reminding module issues an audio reminder and a pop-up reminder; and

S4, when the voice feedback module feeds the information back to the trusted call source database and the voiceprint information is found to be inconsistent, the answering end is reminded and, at the same time, the information is fed back to the genuine calling terminal.

The above are merely examples of the present invention; common general knowledge, such as well-known specific structures and characteristics of the schemes, is not described in detail here.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from its spirit or essential attributes. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
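
The detection and feedback behaviour of steps S3 and S4 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the method disclosed in the source: voiceprints are assumed to be fixed-length feature vectors compared by cosine similarity, the trusted call source database is modelled as a plain dictionary keyed by caller ID, and the names `Verdict`, `verify_caller` and the 0.9 threshold are hypothetical.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Verdict:
    trusted: bool
    # Reminders raised at the answering end (S3): audio and pop-up.
    answering_end_alerts: list = field(default_factory=list)
    # Notices sent back to the genuine calling terminal (S4).
    caller_feedback: list = field(default_factory=list)


def verify_caller(claimed_id, extracted, trusted_db, threshold=0.9):
    """Compare the extracted voiceprint with the one enrolled for claimed_id."""
    enrolled = trusted_db.get(claimed_id)
    if (
        enrolled is not None
        and len(enrolled) == len(extracted)
        and cosine_similarity(enrolled, extracted) >= threshold
    ):
        return Verdict(trusted=True)

    # S3: the voiceprint differs from the trusted database entry,
    # so both reminder modes are triggered at the answering end.
    verdict = Verdict(trusted=False, answering_end_alerts=["audio", "popup"])

    # S4: if the claimed caller is enrolled, also notify the genuine caller.
    if enrolled is not None:
        verdict.caller_feedback.append(
            f"voiceprint mismatch on a call claiming to be {claimed_id}"
        )
    return verdict
```

A call whose extracted voiceprint matches the enrolled one yields a trusted verdict with no alerts; a mismatch yields both reminder modes for the answering end plus one feedback notice for the enrolled caller. Treating an unknown caller ID as untrusted without caller feedback is a design choice of this sketch, since the source does not address that case.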

Claims

1. An intelligent voice recognition technology for real-time audio and video streams based on trusted calls, comprising a pre-preparation module, an information transmission module, a voice input module, a voice extraction module, a mode matching module, a calling end and an answering end, characterized in that: the pre-preparation module comprises a trusted call source database, a voiceprint database and a voiceprint binding module; the calling end and the answering end are connected, in sequence, through the information transmission module, the voice input module, the voice extraction module, the mode matching module, a voice detection module, a voice comparison module and a voice recognition module; the entire process at both the calling end and the answering end is enclosed by an encryption module; and the information transmission module, the voice input module, the voice extraction module, the mode matching module, the voice detection module, the voice comparison module, a voice storage module and the voice recognition module are electrically connected with each other.

2. The intelligent voice recognition technology for real-time audio and video streams based on trusted calls as claimed in claim 1, wherein: the voiceprint binding module comprises user registration, user login and user detection, and binding requires two recordings.

3. The intelligent voice recognition technology for real-time audio and video streams based on trusted calls, comprising a calling end and an answering end, characterized in that: the pre-preparation module is electrically connected to the encryption module and the voice storage module respectively, and is provided with a secret key.

4. The intelligent voice recognition technology for real-time audio and video streams based on trusted calls as claimed in claim 1, wherein: the voice extraction module is directly electrically connected to the voice storage module, and the voice storage module is directly electrically connected to the voice comparison module.

5. The intelligent voice recognition technology for real-time audio and video streams based on trusted calls as claimed in claim 1, wherein: the voice detection module is electrically connected to a voice reminding module and a voice feedback module respectively; the voice reminding module provides two modes, audio reminding and pop-up reminding; and the voice feedback module is connected to the trusted call source database.

6. The intelligent voice recognition technology for real-time audio and video streams based on trusted calls as claimed in claim 1, wherein: the voice recognition module is connected to a voice conversion module, and the voice conversion module comprises text conversion, signal conversion and language conversion.

7. An intelligent voice recognition technology for real-time audio and video streams based on trusted calls, characterized by comprising the following steps: S1, before the system is put into use, the user's voiceprint must be bound: after the user registers and logs in, the voiceprint is recorded twice; binding succeeds if the two recordings are detected as the same, and the system can be used only after the voiceprint has been successfully added; S2, with the signal encrypted, the calling end transmits the signal through the information transmission module to the voice input module, which passes it to the voice extraction module; the extracted voiceprint information is passed in sequence to the mode matching module, the voice detection module and the voice comparison module for matching, and the information is then transmitted to the answering end through the voice recognition module and the information transmission module; S3, when the voice detection module detects that the voiceprint differs from the voiceprint in the trusted call source database, the voice reminding module issues an audio reminder and a pop-up reminder; and S4, when the voice feedback module feeds the information back to the trusted call source database and the voiceprint information is found to be inconsistent, the answering end is reminded and, at the same time, the information is fed back to the genuine calling terminal.