CN116631401A

CN116631401A - Voice call recording data processing method

Info

Publication number: CN116631401A
Application number: CN202211663043.9A
Authority: CN
Inventors: 赵方捷; 吴磊; 黄相辉; 金斌斌; 余适; 陈帆
Original assignee: Zte Wenzhou Rail Communication Technology Co ltd
Current assignee: Zte Wenzhou Rail Communication Technology Co ltd
Priority date: 2022-12-23
Filing date: 2022-12-23
Publication date: 2023-08-22

Abstract

The invention relates to the technical field of recording processing and discloses a voice call recording data processing method comprising the following steps: S1: receiving a recording request; S2: confirming file retention; S3: intelligent file adjustment; S4: intelligent text matching; S5: confirming the pop-up window position. After a voice call has been recorded and retention has been confirmed, appropriate time domain processing and frequency domain processing are performed on the cached voice call recording data; until retention is confirmed, the recording data is not processed, which reduces the amount of data processing. The time domain and frequency domain processing improves recording quality and facilitates later use. Intelligent voice-to-text recognition and sentence group matching are then performed on the voice call recording data, so that the voice call content is converted into text. The required recording section can be identified by reading the text, without playing the recording back, which makes the method convenient to use.

Description

Voice call recording data processing method

Technical Field

The invention belongs to the technical field of recording processing, and particularly relates to a voice call recording data processing method.

Background

Recording means capturing sound by mechanical, optical or electromagnetic methods. With the progress of technology and the spread of electronic products, voice calls have become a common means of communication between electronic terminals, and a recording made during a voice call allows certain information to be kept and used later.

Existing voice call recording data is generally stored without any post-processing, so recording quality may be poor when the call environment is poor. In addition, selecting a recording section requires listening to the recording repeatedly, which is troublesome. The current situation therefore needs improvement.

Disclosure of Invention

In view of the above, and to overcome the defects of the prior art, the invention provides a voice call recording data processing method. It effectively solves two problems: existing voice call recording data is generally stored without post-processing, so recording quality is poor when the call environment is poor; and selecting a recording section requires listening to the recording data repeatedly, which is troublesome.

In order to achieve the above purpose, the present invention provides the following technical solution: a voice call recording data processing method comprising the following steps:

S1: Receiving a recording request: before or during a voice call, a recording request signal provided by one or both device ends of the voice call is received and verified; once the recording request is confirmed as received, the recording system starts automatically to record and save the current voice call content;

S2: File retention confirmation: on the basis of step S1, after the voice call ends, confirming whether the recording is to be retained; if retention is confirmed, proceeding to the subsequent steps; if the recording does not need to be stored, deleting the current voice call recording data;

S3: Intelligent file adjustment: on the basis of step S2, when the voice call recording data is to be saved, first caching the voice call recording data and then performing appropriate time domain processing and frequency domain processing on the cached data, wherein the time domain processing includes extracting the highest peak value and the next highest value and performing superposition processing on the WAV file data, and the frequency domain processing includes signal filtering transformation and signal editing;

S4: Intelligent text matching: on the basis of step S3, performing intelligent voice-to-text recognition and sentence group matching on the voice call recording data that has undergone the time domain processing and the frequency domain processing, wherein the intelligent voice-to-text recognition proceeds as follows: first, the voice signal of the voice call is analyzed and processed while redundant information in it is removed; second, feature information of the voice signal is extracted and words are matched according to that feature information; third, the matched words are reordered by an intelligent algorithm according to the grammatical characteristics of different languages, the relationships within the context are analyzed semantically, and the sentence currently being processed is corrected as appropriate; finally, recognition is completed;

S5: Pop-up window position confirmation: on the basis of step S4, the matched text and the voice are stored together at the port of the call device, and the voice call recording data is projected, in the form of a pop-up window, onto the recording time period and the window in which the recording was generated.
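The control flow of steps S1 to S5 can be sketched as follows. This is a minimal illustration in Python, not the implementation of the invention: the names `handle_call`, `RecordingResult`, `enhance` and `transcribe` are hypothetical, and the signal processing of S3 and the recognition of S4 are passed in as placeholder callables. What the sketch shows is that post-processing runs only after retention has been confirmed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RecordingResult:
    """Hypothetical container for what S3 to S5 produce."""
    samples: List[float]   # post-processed audio (S3)
    transcript: str        # recognized text (S4)
    popup_text: str        # text a pop-up window would show (S5)


def handle_call(
    samples: List[float],
    recording_requested: bool,
    keep: bool,
    enhance: Callable[[List[float]], List[float]],
    transcribe: Callable[[List[float]], str],
    period: str,
) -> Optional[RecordingResult]:
    # S1: record only if at least one side requested it.
    if not recording_requested:
        return None
    cached = list(samples)

    # S2: retention is confirmed after the call ends; otherwise delete.
    if not keep:
        cached.clear()
        return None

    # S3: time/frequency domain processing runs only on retained data.
    processed = enhance(cached)

    # S4: voice-to-text recognition on the processed audio.
    text = transcribe(processed)

    # S5: pair the text with the recording time period for display.
    return RecordingResult(processed, text, f"[{period}] {text}")
```

Because `enhance` is never called for a recording that is discarded in S2, the cost of S3 and S4 is paid only for calls the user decides to keep.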

Preferably, in step S1, the recording system adopts one or a combination of mechanical recording, optical recording, magnetic recording, parallel recording, packet-capture recording and switch-internal recording, and the recording system is equipped with an automatic noise reduction technology, an automatic volume adjustment technology and an automatic pressure limiting technology.
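As one illustration of automatic volume adjustment and pressure limiting, the sketch below scales a block of samples to a target peak and then clamps anything beyond a ceiling. The function names, the floating-point sample range of -1.0 to 1.0 and the default values are assumptions made for the example; the source does not specify how these technologies are implemented.

```python
from typing import List


def adjust_volume(samples: List[float], target_peak: float = 0.8) -> List[float]:
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def limit(samples: List[float], ceiling: float = 1.0) -> List[float]:
    """Hard limiter: clamp every sample into [-ceiling, +ceiling]."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]
```

A production limiter would normally apply a smoothed gain rather than hard clamping, which distorts the waveform; the hard clamp is used here only because it is the shortest correct form of the idea.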

Preferably, in step S3, the highest peak value and the next highest value are extracted by a peak frequency extraction method that combines a non-parametric method with a parametric method, and the WAV file data superposition processing uses a wavread function in C++.
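The source names a wavread function in C++ but gives no code. The sketch below uses Python's standard `wave` module as a stand-in for reading WAV data, sums two tracks sample by sample as one reading of "superposition", and finds the highest and next highest spectral peaks with a naive DFT periodogram, which is one non-parametric estimate. The parametric half of the combined method is not shown. All function names are hypothetical, and the reader handles 16-bit mono PCM only.

```python
import cmath
import math
import struct
import wave
from typing import List, Tuple


def read_wav_mono16(path: str) -> Tuple[int, List[int]]:
    """Read a 16-bit mono PCM WAV file; returns (sample_rate, samples)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch handles 16-bit mono PCM only")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    return rate, list(struct.unpack("<%dh" % (len(raw) // 2), raw))


def superpose(a: List[int], b: List[int]) -> List[int]:
    """Sample-wise sum of two tracks, zero-padded and clipped to int16."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [max(-32768, min(32767, x + y)) for x, y in zip(a, b)]


def magnitude_spectrum(samples: List[float], rate: int) -> List[Tuple[float, float]]:
    """Naive O(n^2) DFT; returns (frequency_hz, magnitude) for bins 1..n/2-1."""
    n = len(samples)
    spectrum = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        spectrum.append((k * rate / n, abs(acc)))
    return spectrum


def top_two_peaks(spectrum: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the highest and next highest local maxima of the spectrum."""
    peaks = [spectrum[i] for i in range(1, len(spectrum) - 1)
             if spectrum[i][1] > spectrum[i - 1][1]
             and spectrum[i][1] >= spectrum[i + 1][1]]
    peaks.sort(key=lambda p: p[1], reverse=True)
    return peaks[:2]
```

The naive DFT is quadratic in the number of samples, so it is suitable only for short frames; an FFT would be the usual choice for full recordings.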

Preferably, in the step S3, the signal filtering transformation specifically adopts wavelet transformation, and the signal editing specifically includes: amplitude companding, time delay and reverberation, frequency filtering and frequency equalization, spatial stereo processing, and tuning processing.
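The source specifies wavelet transformation for signal filtering without naming a wavelet or a thresholding rule. As a minimal sketch under those gaps, the example below assumes a single-level Haar wavelet with hard thresholding of the detail coefficients; the function names and the threshold value are hypothetical.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_forward(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform (even-length input)."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx: List[float], detail: List[float]) -> List[float]:
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def haar_denoise(x: List[float], threshold: float) -> List[float]:
    """Zero small detail coefficients, then reconstruct the signal."""
    approx, detail = haar_forward(x)
    detail = [d if abs(d) >= threshold else 0.0 for d in detail]
    return haar_inverse(approx, detail)
```

With a threshold of 0.0 nothing is removed and the input is reconstructed; raising the threshold removes progressively larger sample-to-sample fluctuations, which is where high-frequency noise tends to sit.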

Preferably, in step S4, the intelligent voice-to-text recognition of the voice call recording data adopts one or a combination of a linguistic and acoustic method, a template matching method and a neural network method.
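Of the three recognition approaches listed, template matching is the easiest to show in a few lines. The sketch below assumes dynamic time warping (DTW) over one-dimensional feature sequences, a classic way to compare an utterance with stored word templates despite differences in speaking rate. The source does not say that DTW is used; the function names and the toy templates are hypothetical.

```python
from typing import Dict, List


def dtw_distance(a: List[float], b: List[float]) -> float:
    """Dynamic time warping distance between two feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def match_template(features: List[float], templates: Dict[str, List[float]]) -> str:
    """Return the word whose template is closest to the features."""
    return min(templates, key=lambda w: dtw_distance(features, templates[w]))
```

For example, with templates `{"yes": [1, 3, 4, 3, 1], "no": [5, 5, 1, 1]}`, a slower utterance `[1, 1, 3, 4, 4, 3, 1]` still matches "yes", because DTW lets repeated frames align to a single template frame.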

Preferably, in step S4, the feature information of the voice signal is extracted by one or a combination of a spectral envelope method, a cepstrum method, an LPC interpolation method, an LPC root-finding method, a Hilbert transform method and a formant tracking algorithm.
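To make one of the listed feature extraction methods concrete, the sketch below computes the real cepstrum, defined as the inverse DFT of the log magnitude spectrum, and picks its dominant quefrency. It uses a naive DFT from the standard library so that it stays self-contained; the function names and the small constant added before the logarithm are assumptions of the example, not details from the source.

```python
import cmath
import math
from typing import List


def real_cepstrum(x: List[float]) -> List[float]:
    """Real cepstrum: inverse DFT of log|DFT(x)| (naive O(n^2) version)."""
    n = len(x)
    spectrum = [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n)
                    for i in range(n)) for k in range(n)]
    # A tiny offset avoids log(0) for bins with zero magnitude.
    log_mag = [math.log(abs(c) + 1e-12) for c in spectrum]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * i / n)
                for k in range(n)).real / n for i in range(n)]


def dominant_quefrency(cepstrum: List[float]) -> int:
    """Index (in samples) of the largest cepstral value, skipping index 0."""
    half = len(cepstrum) // 2
    return max(range(1, half), key=lambda i: cepstrum[i])
```

A signal that contains a delayed copy of itself, such as a periodic voiced sound or an echo, produces a cepstral peak at the delay, which is why the cepstrum is commonly used for pitch and echo analysis.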

Preferably, in step S4, the intelligent algorithm is one or a combination of naive Bayes, decision trees, random forests, neural networks, autoencoders and reinforcement learning.

Preferably, in step S5, the pop-up window is implemented as one or a combination of an alert("") pop-up window and a prompt("") pop-up window.

Preferably, the voice call carrier is one or a combination of mobile phones, computers, iPads and phone watches.

Compared with the prior art, the invention has the following beneficial effects: 1. after a voice call has been recorded and retention has been confirmed, appropriate time domain processing and frequency domain processing are performed on the cached voice call recording data, and the recording data is not processed until retention is confirmed, which reduces the amount of data processing; the time domain and frequency domain processing gives the recording data appropriate post-processing, which improves recording quality and facilitates later use;

2. after that processing is finished, intelligent voice-to-text recognition and sentence group matching are performed on the voice call recording data, so that the voice call content is converted into text; when the recording is later used, the required recording section can be identified by reading the text, without playing the recording back for confirmation, which makes the voice call content convenient to use;

3. after the text is matched, the matched text and the voice can be projected, in the form of a pop-up window, onto the recording time period and the window in which the recording was generated, so that the voice call recording data can be found directly when the recording is needed.
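The second and third effects amount to looking up a recording section by its text instead of by listening. A minimal sketch of that lookup follows, assuming the recognition step yields text segments with start and end times in seconds; the `Segment` type and both function names are hypothetical, since the source does not describe a data layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical transcript segment tied to a time period."""
    start_s: float
    end_s: float
    text: str


def find_segments(segments: List[Segment], query: str) -> List[Segment]:
    """Return segments whose text contains the query, ignoring case."""
    q = query.casefold()
    return [s for s in segments if q in s.text.casefold()]


def slice_samples(samples: List[float], rate: int, seg: Segment) -> List[float]:
    """Cut the audio samples that correspond to one segment."""
    return samples[int(seg.start_s * rate):int(seg.end_s * rate)]
```

A caller would search the transcript for a keyword, show the matching time periods, and extract only those sections of audio, so playback is needed only if the user wants to double-check a match.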

Drawings

The accompanying drawings provide a further understanding of the invention and constitute a part of this specification; together with the embodiments of the invention, they serve to explain the invention.

In the drawings:

Fig. 1 is a flowchart of the voice call recording data processing method according to the present invention.

Detailed Description

The embodiments of the present invention are described clearly and fully below with reference to the accompanying drawings. The embodiments described are evidently only some, not all, of the embodiments of the invention; all other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.

As shown in Fig. 1, the present invention provides a technical solution: a voice call recording data processing method comprising steps S1 to S5 as set out above, in which:

In step S1, the recording system adopts one or a combination of mechanical recording, optical recording, magnetic recording, parallel recording, packet-capture recording and switch-internal recording, and is equipped with an automatic noise reduction technology, an automatic volume adjustment technology and an automatic pressure limiting technology. In step S3, the highest peak value and the next highest value are extracted by a peak frequency extraction method that combines a non-parametric method with a parametric method, and the WAV file data superposition processing uses a wavread function in C++; the signal filtering transformation adopts wavelet transformation, and the signal editing comprises amplitude companding, time delay and reverberation, frequency filtering and frequency equalization, spatial stereo processing, and tuning processing. In step S4, the intelligent voice-to-text recognition of the voice call recording data adopts one or a combination of a linguistic and acoustic method, a template matching method and a neural network method; the feature information of the voice signal is extracted by one or a combination of a spectral envelope method, a cepstrum method, an LPC interpolation method, an LPC root-finding method, a Hilbert transform method and a formant tracking algorithm; and the intelligent algorithm is one or a combination of naive Bayes, decision trees, random forests, neural networks, autoencoders and reinforcement learning. In step S5, the pop-up window is implemented as one or a combination of an alert("") pop-up window and a prompt("") pop-up window. The voice call carrier is one or a combination of mobile phones, computers, iPads and phone watches.

Through the above steps, after a voice call has been recorded and retention has been confirmed, appropriate time domain processing and frequency domain processing are performed on the cached voice call recording data, and the recording data is not processed until retention is confirmed, which reduces the amount of data processing; the time domain and frequency domain processing gives the recording data appropriate post-processing, which improves recording quality and facilitates later use. After that processing is finished, intelligent voice-to-text recognition and sentence group matching are performed on the voice call recording data, so that the voice call content is converted into text; when the recording is later used, the required recording section can be identified by reading the text, without playing the recording back for confirmation, which makes the voice call content convenient to use. After the text is matched, the matched text and the voice can be projected, in the form of a pop-up window, onto the recording time period and the window in which the recording was generated, so that the voice call recording data can be found directly when the recording is needed.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A voice call recording data processing method is characterized in that: the method comprises the following steps:

2. The voice call recording data processing method according to claim 1, wherein: in step S1, the recording system adopts one or a combination of mechanical recording, optical recording, magnetic recording, parallel recording, packet-capture recording and switch-internal recording, and the recording system is equipped with an automatic noise reduction technology, an automatic volume adjustment technology and an automatic pressure limiting technology.

3. The voice call recording data processing method according to claim 1, wherein: in step S3, the highest peak value and the next highest value are extracted by a peak frequency extraction method that combines a non-parametric method with a parametric method, and the WAV file data superposition processing uses a wavread function in C++.

4. A voice call recording data processing method according to claim 3, wherein: in the step S3, the signal filtering transformation specifically adopts wavelet transformation, and the signal editing specifically includes: amplitude companding, time delay and reverberation, frequency filtering and frequency equalization, spatial stereo processing, and tuning processing.

5. The voice call recording data processing method according to claim 1, wherein: in step S4, the intelligent voice-to-text recognition of the voice call recording data adopts one or a combination of a linguistic and acoustic method, a template matching method and a neural network method.

6. The voice call recording data processing method according to claim 5, wherein: in step S4, the feature information of the voice signal is extracted by one or a combination of a spectral envelope method, a cepstrum method, an LPC interpolation method, an LPC root-finding method, a Hilbert transform method and a formant tracking algorithm.

7. The voice call recording data processing method according to claim 5, wherein: in step S4, the intelligent algorithm is one or a combination of naive Bayes, decision trees, random forests, neural networks, autoencoders and reinforcement learning.

8. The voice call recording data processing method according to claim 1, wherein: in step S5, the pop-up window is implemented as one or a combination of an alert("") pop-up window and a prompt("") pop-up window.

9. The voice call recording data processing method according to claim 1, wherein: the voice call carrier is one or a combination of mobile phones, computers, iPads and phone watches.