CN115442273B - Voice recognition-based audio transmission integrity monitoring method and device

Info

Publication number: CN115442273B
Application number: CN202211117749.5A
Authority: CN
Inventors: 章笑春; 彭猛; 余怀军
Original assignee: Rivotek Technology Jiangsu Co Ltd
Current assignee: Rivotek Technology Jiangsu Co Ltd
Priority date: 2022-09-14
Filing date: 2022-09-14
Publication date: 2023-04-07
Anticipated expiration: 2042-09-14
Also published as: CN115442273A

Abstract

The invention provides a voice recognition-based audio transmission integrity monitoring method and device. The method comprises the following steps: an audio acquisition module acquires first audio information and transmits it to a transmission target terminal; the first audio information is converted into timestamped text information and uploaded to a monitoring analysis background; the transmission target terminal plays the received audio as second audio information, converts the second audio information into timestamped text information, and uploads it to the monitoring analysis background; and the monitoring analysis background automatically compares the text information bearing the same timestamp and analyzes the integrity of audio transmission. Because the audio is converted into text and compared automatically, no manual review is needed and user privacy is protected. Data is reported as timestamps and text, so traffic consumption is low, the loss rate is low, and the method is easy to popularize. The invention gives audio communication equipment manufacturers and operators an objective method for evaluating audio transmission integrity; at the same time, operators can learn of usage problems in real user scenarios through data analysis, which provides data support for product iteration.

Description

Voice recognition-based audio transmission integrity monitoring method and device

Technical Field

The invention relates to the technical field of audio transmission, in particular to an audio transmission integrity monitoring method and device based on voice recognition.

Background

With the growing demand for cloud-based office work and cloud-based teaching, the audio communication experience is increasingly important in work and study. Because audio services involve large data volumes, strict real-time requirements, and high user sensitivity, monitoring audio communication quality is very important for audio communication equipment manufacturers and operators.

The core index of audio communication quality is whether the audio content is transmitted to the target terminal completely and accurately, i.e. the integrity and accuracy of audio transmission. During network communication, a user who experiences inaudible, intermittent, or stuttering audio cannot determine the packet loss rate; alternatively, when packets are lost during a conversation, the information receiver automatically fills in the missing content, which causes ambiguity. Audio loss and ambiguity caused by packet loss are common in actual communication and seriously affect user satisfaction. However, the market currently lacks a method for accurately monitoring the integrity of audio transmission.

Disclosure of Invention

In network communication, a user who experiences inaudible, intermittent, or stuttering audio cannot determine the packet loss rate, and when packets are lost during a conversation the information receiver automatically fills in the missing content, which causes ambiguity. Audio loss and ambiguity caused by packet loss are common in actual communication and seriously affect user satisfaction. To address these problems, the invention provides a voice recognition-based audio transmission integrity monitoring method and device. Timestamps are attached to the input characters, and compensation output is performed automatically at output time according to missing timestamps, which addresses packet loss during a conversation. The audio is digitized and compared automatically, so no manual review is needed and user privacy is protected. Data is reported as timestamps and text, so traffic consumption is low, the loss rate is low, and the method is easy to popularize. Being based on a third link, the link is simpler and detection is more convenient. The invention gives audio communication equipment manufacturers and operators an objective method for evaluating audio transmission integrity; at the same time, operators can learn of usage problems in real user scenarios through data analysis, which provides data support for product iteration.

In order to achieve the purpose, the invention is realized by the following technical scheme:

a voice recognition-based audio transmission integrity monitoring method comprises the following steps:

the audio acquisition module acquires first audio information and transmits the first audio information to a transmission target terminal;

converting first audio information into character information as first data, and uploading the first data to a monitoring analysis background;

the transmission target terminal takes the received audio information as second audio information and plays the second audio information, the second audio information is converted into character information and serves as second data, and the second data is uploaded to a monitoring analysis background;

and the monitoring analysis background automatically compares the first data and the second data with the same timestamp, and analyzes the integrity of audio transmission.

As a preferred embodiment of the present invention, converting the first audio information into text information as first data specifically includes: cutting the first audio information to obtain first segmented audio information, and transmitting the first segmented audio information to a first voice recognition module of the input source, wherein the first voice recognition module recognizes the first segmented audio information and converts it into text information in which each character carries a timestamp, as the first data.

As a preferred scheme of the present invention, converting the second audio information into text information as second data by the transmission target terminal specifically includes: cutting the second audio information to obtain second segmented audio information, and transmitting the second segmented audio information to a second voice recognition module of the transmission target terminal, wherein the second voice recognition module recognizes the second segmented audio information and converts it into text information in which each character carries a timestamp, as the second data.
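The cutting and timestamping described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `TimedChar`, `cut_audio`, `to_timed_text`, and `fake_recognize` are hypothetical, the fixed segment length is an assumption, and the recognizer is a stand-in that treats each nonzero sample as the code of a character spoken at that sample (with one sample per millisecond). A real voice recognition module would take its place.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class TimedChar:
    timestamp_ms: int  # absolute time taken from the device's real-time clock
    char: str


def cut_audio(samples: Sequence[int], sample_rate: int,
              segment_ms: int) -> List[Tuple[int, Sequence[int]]]:
    """Cut audio into fixed-length segments: (start offset in ms, samples) pairs."""
    step = sample_rate * segment_ms // 1000  # samples per segment
    if step <= 0:
        raise ValueError("a segment must contain at least one sample")
    return [(i * 1000 // sample_rate, samples[i:i + step])
            for i in range(0, len(samples), step)]


# Hypothetical recognizer interface: takes one segment and returns
# (offset within the segment in ms, character) pairs.
Recognizer = Callable[[Sequence[int]], List[Tuple[int, str]]]


def to_timed_text(samples: Sequence[int], sample_rate: int, segment_ms: int,
                  clock_start_ms: int, recognize: Recognizer) -> List[TimedChar]:
    """Recognize each segment and stamp every character with an absolute time."""
    result = []
    for offset_ms, segment in cut_audio(samples, sample_rate, segment_ms):
        for rel_ms, ch in recognize(segment):
            result.append(TimedChar(clock_start_ms + offset_ms + rel_ms, ch))
    return result


def fake_recognize(segment: Sequence[int]) -> List[Tuple[int, str]]:
    """Stand-in recognizer (assumes 1 sample per ms): nonzero sample = character code."""
    return [(i, chr(v)) for i, v in enumerate(segment) if v]


samples = [0, ord("h"), 0, 0, ord("i"), 0, 0, 0]
print(to_timed_text(samples, 1000, 4, 1000, fake_recognize))
```

The same routine would run on both the input source and the transmission target terminal, each reading `clock_start_ms` from its own synchronized real-time clock.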

As a preferred scheme of the present invention, the automatic comparison of the first data and the second data with the same timestamp by the monitoring analysis background specifically includes: the monitoring analysis background automatically compares the text contents with the same timestamp to obtain the amount of lost or erroneous text, and calculates the audio distortion rate using the formula: distortion rate = amount of lost or erroneous text / total amount of text transmitted.

As a preferred aspect of the present invention, the lower the audio distortion rate, the higher the audio transmission integrity.
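As an illustration only, the comparison and the distortion rate formula can be sketched in Python as follows. The sketch assumes that the first data and second data are each held as a mapping from timestamp (in milliseconds) to character, and that the total amount of text transmitted is the number of characters in the first data; the function names are hypothetical.

```python
from typing import Dict, Tuple


def count_lost_and_wrong(first: Dict[int, str], second: Dict[int, str]) -> Tuple[int, int]:
    """Compare text with the same timestamp; return (lost, wrong) character counts."""
    lost = sum(1 for ts in first if ts not in second)
    wrong = sum(1 for ts, ch in first.items() if ts in second and second[ts] != ch)
    return lost, wrong


def distortion_rate(first: Dict[int, str], second: Dict[int, str]) -> float:
    """distortion rate = amount of lost or erroneous text / total amount of text transmitted."""
    if not first:
        return 0.0  # nothing was transmitted, so nothing can be distorted
    lost, wrong = count_lost_and_wrong(first, second)
    return (lost + wrong) / len(first)


first_data = {0: "h", 100: "e", 200: "l", 300: "l", 400: "o"}
second_data = {0: "h", 100: "e", 300: "l", 400: "a"}  # 200 ms lost, 400 ms wrong
print(distortion_rate(first_data, second_data))  # 0.4
```

Here one character is lost and one is wrong out of five transmitted, giving a distortion rate of 2/5 = 0.4.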

As a preferred scheme of the invention, the device comprises an input source, a transmission target terminal and a monitoring analysis background; the input source is in communication connection with the transmission target terminal;

the input source comprises a first real-time clock and a first controller, wherein the first real-time clock generates a time synchronization signal, and the first real-time clock is electrically connected with the first controller;

the transmission target terminal comprises a second real-time clock and a second controller, wherein the second real-time clock is used for generating a time synchronization signal and is electrically connected with the second controller;

the first real-time clock is synchronized with the second real-time clock.

As a preferable aspect of the present invention, the first controller includes:

the audio acquisition module is used by the input source to acquire first audio information;

the first audio cutting module is used for cutting the first audio information to obtain first segmented audio information;

the first audio transmission module is used for transmitting the first segmented audio information to the first voice recognition module;

the first voice recognition module is used for recognizing the first segmented audio information and converting it into text information in which each character carries a timestamp, as first data;

and the first communication module is used for uploading the first data to a monitoring analysis background.

As a preferable aspect of the present invention, the second controller includes:

the audio receiving module is used for receiving the audio information as second audio information and playing the second audio information;

the second audio cutting module is used for cutting the second audio information to obtain second segmented audio information;

the second audio transmission module is used for transmitting the second segmented audio information to the second voice recognition module;

the second voice recognition module is used for recognizing the second segmented audio information and converting it into text information in which each character carries a timestamp, as second data;

and the second communication module is used for uploading the second data to a monitoring analysis background.

As a preferred scheme of the present invention, the monitoring analysis background is configured to automatically compare first data and second data with the same timestamp, which specifically includes: the monitoring analysis background automatically compares the text contents with the same timestamp to obtain the amount of lost or erroneous text, and calculates the audio distortion rate using the formula: distortion rate = amount of lost or erroneous text / total amount of text transmitted; the lower the audio distortion rate, the higher the audio transmission integrity.

As a preferred embodiment of the present invention, the input source and the transmission target terminal are each one of a PC, a mobile phone, a tablet (PAD), a vehicle-mounted center console, or a smart speaker.

The invention has the following beneficial effects: the audio is digitized and compared automatically, so no manual review is needed and user privacy is protected; data is reported as timestamps and text, so traffic consumption is low, the loss rate is low, and the method is easy to popularize; being based on a third link, the link is simpler and detection is more convenient; audio communication equipment manufacturers and operators are given an objective method for evaluating audio transmission integrity, and operators can learn of usage problems in real user scenarios through data analysis, which provides data support for product iteration.

Drawings

The invention is described in detail below with reference to the drawings and the detailed description.

Fig. 1 is a flowchart of an audio transmission integrity monitoring method based on voice recognition according to an embodiment of the present invention;

Fig. 2 is a block diagram of an audio transmission integrity monitoring device based on voice recognition according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention. It should be apparent that the described embodiments are only some of the embodiments of the present invention, and not all of them. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention, are within the scope of the invention.

As shown in Fig. 1, an embodiment of the present invention provides an audio transmission integrity monitoring method based on voice recognition, including the following steps:

Step 1: the audio acquisition module acquires first audio information and transmits the first audio information to a transmission target terminal;

Step 2: the first audio information is converted into text information as first data, and the first data is uploaded to a monitoring analysis background;

Specifically, the first audio information is cut to obtain first segmented audio information, which is transmitted to a first voice recognition module of the input source; the first voice recognition module recognizes the first segmented audio information and converts it into text information in which each character carries a timestamp, as the first data.

Step 3: the transmission target terminal takes the received audio information as second audio information and plays it; the second audio information is converted into text information as second data, and the second data is uploaded to the monitoring analysis background;

Specifically, the second audio information is cut to obtain second segmented audio information, which is transmitted to a second voice recognition module of the transmission target terminal; the second voice recognition module recognizes the second segmented audio information and converts it into text information in which each character carries a timestamp, as the second data.

Step 4: the monitoring analysis background automatically compares the first data and the second data with the same timestamp, and analyzes the integrity of audio transmission. The monitoring analysis background automatically compares the text contents with the same timestamp to obtain the amount of lost or erroneous text, and calculates the audio distortion rate using the formula: distortion rate = amount of lost or erroneous text / total amount of text transmitted. The lower the audio distortion rate, the higher the audio transmission integrity.
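Steps 1 to 4 can be walked through end to end with a small simulation, given here as a sketch under stated assumptions rather than as the embodiment itself. Each segment is represented directly by its start timestamp and the text spoken in it, so the `recognize` function is only a stand-in for real speech recognition; `CHAR_MS`, `transmit`, and `monitor` are hypothetical names, and packet loss is modeled as whole segments that never reach the transmission target terminal.

```python
from typing import Dict, List, Set, Tuple

Segment = Tuple[int, str]  # (start timestamp in ms, text spoken in the segment)

CHAR_MS = 100  # assumed spacing between characters, for illustration only


def recognize(segments: List[Segment]) -> Dict[int, str]:
    """Stand-in for speech recognition: map each character to its timestamp."""
    data = {}
    for start_ms, text in segments:
        for i, ch in enumerate(text):
            data[start_ms + i * CHAR_MS] = ch
    return data


def transmit(segments: List[Segment], dropped: Set[int]) -> List[Segment]:
    """Simulate the network: segments whose index is in `dropped` never arrive."""
    return [seg for i, seg in enumerate(segments) if i not in dropped]


def monitor(segments: List[Segment], dropped: Set[int]) -> float:
    """Run steps 1 to 4 and return the audio distortion rate."""
    first_data = recognize(segments)            # step 2: on the input source
    received = transmit(segments, dropped)      # step 1: audio sent to the terminal
    second_data = recognize(received)           # step 3: on the target terminal
    bad = sum(1 for ts, ch in first_data.items()
              if second_data.get(ts) != ch)     # step 4: same-timestamp comparison
    return bad / len(first_data) if first_data else 0.0


speech = [(0, "good"), (400, " mor"), (800, "ning")]
print(monitor(speech, dropped={1}))   # 4 of 12 characters lost
print(monitor(speech, dropped=set())) # nothing lost
```

With the middle segment dropped, 4 of the 12 characters are missing from the second data, so the distortion rate is 4/12; with nothing dropped it is 0.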

As shown in Fig. 2, another embodiment of the present invention provides an audio transmission integrity monitoring device based on voice recognition, which includes an input source, a transmission target terminal, and a monitoring analysis background; the input source is in communication connection with the transmission target terminal; the input source comprises a first real-time clock and a first controller, wherein the first real-time clock generates a time synchronization signal and is electrically connected with the first controller; the transmission target terminal comprises a second real-time clock and a second controller, wherein the second real-time clock generates a time synchronization signal and is electrically connected with the second controller; the first real-time clock is synchronized with the second real-time clock. The input source and the transmission target terminal are each one of a PC, a mobile phone, a tablet (PAD), a vehicle-mounted center console, or a smart speaker.

The first controller includes:

the first voice recognition module is used for converting the first segmented audio information into text information in which each character carries a timestamp, as first data;

and the first communication module is used for uploading the first data to the monitoring analysis background.

The second controller includes:

the second audio cutting module is used for cutting the second audio information to obtain second segmented audio information;

the second voice recognition module is used for recognizing the second segmented audio information and converting it into text information in which each character carries a timestamp, as second data;

and the second communication module is used for uploading the second data to the monitoring analysis background.
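The report that a communication module uploads is not specified beyond "timestamp and text", so the following Python sketch is only one possible encoding. The JSON layout, the field names `role`, `chars`, `t`, and `c`, and the function names are all assumptions, and no actual upload is performed. For scale, the sketch also computes the size of one second of uncompressed 16 kHz, 16-bit mono PCM audio, a format chosen purely for comparison.

```python
import json
from typing import List, Tuple


def build_report(role: str, records: List[Tuple[int, str]]) -> bytes:
    """Serialize timestamped characters into a compact UTF-8 JSON report."""
    payload = {
        "role": role,  # hypothetical field: which end produced the data
        "chars": [{"t": ts, "c": ch} for ts, ch in records],
    }
    # Compact separators and ensure_ascii=False keep the report small.
    return json.dumps(payload, ensure_ascii=False, separators=(",", ":")).encode("utf-8")


def pcm_bytes(duration_ms: int, sample_rate: int = 16000, bytes_per_sample: int = 2) -> int:
    """Size in bytes of uncompressed mono PCM audio of the given duration."""
    return duration_ms * sample_rate * bytes_per_sample // 1000


report = build_report("transmission_target_terminal", [(1000, "你"), (1200, "好")])
print(len(report), "bytes of text report vs", pcm_bytes(1000), "bytes of raw audio")
```

A report of a few characters is far smaller than the audio it describes, which is consistent with the low traffic consumption attributed to timestamp-and-text reporting.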

The monitoring analysis background is used for automatically comparing the first data and the second data with the same timestamp: it automatically compares the text contents with the same timestamp to obtain the amount of lost or erroneous text, and calculates the audio distortion rate using the formula: distortion rate = amount of lost or erroneous text / total amount of text transmitted. The lower the audio distortion rate, the higher the audio transmission integrity.

Furthermore, the transmission target terminal can automatically perform compensation output according to missing timestamps, which addresses the problem of packet loss during a conversation.
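The description does not say how the compensation output is produced. One possible reading, sketched below in Python purely as an assumption, is that the transmission target terminal has access to the timestamped text of the input source (for example through the monitoring analysis background), detects which timestamps are absent from its own data, and fills in the corresponding characters. The function name `compensate` and this side channel are hypothetical.

```python
from typing import Dict, List, Tuple


def compensate(reference: Dict[int, str], received: Dict[int, str]) -> Tuple[str, List[int]]:
    """Fill characters whose timestamps are absent from `received`.

    `reference` is the timestamped text of the input source (assumed to be
    available to the terminal); `received` is what the terminal recognized.
    Returns the compensated text and the list of missing timestamps.
    """
    missing = sorted(ts for ts in reference if ts not in received)
    text = "".join(received.get(ts, reference[ts]) for ts in sorted(reference))
    return text, missing


reference = {0: "h", 100: "e", 200: "l", 300: "l", 400: "o"}
received = {0: "h", 100: "e", 400: "o"}  # 200 ms and 300 ms were lost
print(compensate(reference, received))  # ('hello', [200, 300])
```

Characters that arrived are kept as received; only the positions with missing timestamps are filled from the reference.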

In conclusion, the audio data is converted into text and compared automatically, so no manual review is needed and user privacy is protected; data is reported as timestamps and text, so traffic consumption is low, the loss rate is low, and the method is easy to popularize; being based on a third link, the link is simpler and detection is more convenient; audio communication equipment manufacturers and operators are given an objective method for evaluating audio transmission integrity, and operators can learn of usage problems in real user scenarios through data analysis, which provides data support for product iteration.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A voice recognition-based audio transmission integrity monitoring method is characterized by comprising the following steps:

converting the first audio information into character information with time stamps of each character as first data, and uploading the first data to a monitoring analysis background;

the transmission target terminal takes the received audio information as second audio information and plays the second audio information, the second audio information is converted into character information with time stamps and serves as second data, and the second data are uploaded to a monitoring analysis background;

the monitoring analysis background automatically compares the first data and the second data with the same timestamp, and analyzes the integrity of audio transmission;

the monitoring analysis background automatically compares the first data and the second data with the same timestamp, and specifically comprises the following steps: the monitoring analysis background automatically compares the text contents with the same timestamp to obtain the lost or wrong text amount, and the formula is utilized: the distortion rate = the lost or wrong text amount/total text transmission amount, and the audio distortion rate is calculated; the lower the audio distortion rate, the higher the audio transmission integrity.

2. The audio transmission integrity monitoring method based on voice recognition according to claim 1, wherein converting the first audio information into text information with a timestamp for each character as the first data specifically comprises: cutting the first audio information to obtain first segmented audio information, and transmitting the first segmented audio information to a first voice recognition module of an input source, wherein the first voice recognition module recognizes the first segmented audio information and converts it into text information in which each character carries a timestamp, as the first data.

3. The audio transmission integrity monitoring method based on voice recognition according to claim 1, wherein the transmission target terminal converting the second audio information into text information with a timestamp for each character as the second data specifically comprises: cutting the second audio information to obtain second segmented audio information, and transmitting the second segmented audio information to a second voice recognition module of the transmission target terminal, wherein the second voice recognition module recognizes the second segmented audio information and converts it into text information in which each character carries a timestamp, as the second data.

4. The device for monitoring the integrity of audio transmission based on voice recognition is characterized by comprising an input source, a transmission target terminal and a monitoring analysis background; the input source is in communication connection with the transmission target terminal;

the transmission target terminal comprises a second real-time clock and a second controller, wherein the second real-time clock generates a time synchronization signal, and the second real-time clock is electrically connected with the second controller;

the first real-time clock is synchronous with the second real-time clock;

the first controller includes:

the first communication module is used for uploading the first data to a monitoring analysis background;

the second controller includes:

the second communication module is used for uploading the second data to a monitoring analysis background;

the monitoring analysis background is used for automatically comparing first data and second data with the same timestamp, and specifically comprises the following steps: the monitoring analysis background automatically compares the text contents with the same timestamp to obtain the lost or wrong text amount, and the formula is utilized: the distortion rate = the lost or wrong text amount/total text transmission amount, and the audio distortion rate is calculated; the lower the audio distortion rate, the higher the audio transmission integrity.

5. The audio transmission integrity monitoring device based on voice recognition according to claim 4, wherein the input source and the transmission target terminal are each one of a PC, a mobile phone, a tablet (PAD), a vehicle-mounted center console, or a smart speaker.