CN116631401A - Voice call recording data processing method - Google Patents


Info

Publication number
CN116631401A
CN116631401A CN202211663043.9A CN202211663043A CN116631401A CN 116631401 A CN116631401 A CN 116631401A CN 202211663043 A CN202211663043 A CN 202211663043A CN 116631401 A CN116631401 A CN 116631401A
Authority
CN
China
Prior art keywords
voice call
recording
recording data
processing
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211663043.9A
Other languages
Chinese (zh)
Inventor
赵方捷
吴磊
黄相辉
金斌斌
余适
陈帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zte Wenzhou Rail Communication Technology Co ltd
Original Assignee
Zte Wenzhou Rail Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zte Wenzhou Rail Communication Technology Co ltd
Priority to CN202211663043.9A
Publication of CN116631401A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/64 Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations
    • H04M1/65 Recording arrangements for recording a message from the calling party
    • H04M1/656 Recording arrangements for recording a message from the calling party for recording conversations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)

Abstract

The invention relates to the technical field of recording processing and discloses a voice call recording data processing method comprising the following steps: S1: receiving a recording request; S2: file retention confirmation; S3: intelligent file adjustment; S4: intelligent text matching; S5: pop-up window position confirmation. After a voice call has been recorded and retention has been confirmed, appropriate time-domain and frequency-domain processing is applied to the cached voice call recording data; no recording data is processed before retention is confirmed, which reduces the amount of data processing, and the time-domain and frequency-domain processing improves the recording quality and facilitates later use. Intelligent speech-to-text recognition and sentence-group matching are then performed on the voice call recording data so that the call content is converted into text; the required recording section can be identified by reading the text, without playing back the recording, which makes the recording convenient to use.

Description

Voice call recording data processing method
Technical Field
The invention belongs to the technical field of recording processing, and particularly relates to a voice call recording data processing method.
Background
Recording means capturing sound by mechanical, optical, or electromagnetic methods. As technology has advanced and electronic products have become widespread, voice calls between electronic terminals have become a common means of communication, and recording a call allows certain information to be captured and used later.
Existing voice call recording data is generally stored without any post-processing, so recording quality may be poor when the call environment is poor; moreover, selecting a section of a recording requires listening to the recording repeatedly, which is cumbersome. The current situation therefore needs improvement.
Disclosure of Invention
In view of the above, and to overcome the shortcomings of the prior art, the invention provides a voice call recording data processing method that effectively addresses two problems: existing voice call recording data is generally stored without post-processing, so recording quality is poor when the call environment is poor; and selecting a recording section requires listening to the recording repeatedly, which is cumbersome.
To achieve the above purpose, the present invention provides the following technical solution: a voice call recording data processing method comprising the following steps:
S1: Receiving a recording request: before or during a voice call, a recording request signal issued by one or both device ends of the call is received and verified; once the request is confirmed, the recording system starts automatically and records and saves the content of the current voice call;
S2: File retention confirmation: on the basis of step S1, after the voice call ends, a retention confirmation is carried out to determine whether the recording is to be kept; if it is, the method proceeds to the subsequent steps; if the recording does not need to be kept, the current voice call recording data is deleted;
S3: Intelligent file adjustment: on the basis of step S2, when the voice call recording data is to be kept, the data is first cached, and appropriate time-domain and frequency-domain processing is then applied to the cached data, wherein the time-domain processing includes extracting the highest peak value and the second-highest value and superposition processing of the WAV file data, and the frequency-domain processing includes signal filtering transformation and signal editing;
S4: Intelligent text matching: on the basis of step S3, intelligent speech-to-text recognition and sentence-group matching are performed on the voice call recording data that has undergone the time-domain and frequency-domain processing, wherein the recognition proceeds as follows: first, the speech signal of the voice call is analyzed and processed, and redundant information in it is removed; second, feature information of the speech signal is extracted and words are matched according to that feature information; then, using an intelligent algorithm, the matched words are reordered according to the grammatical characteristics of different languages, the relationships between contexts are analyzed with the help of semantics, and the sentence currently being processed is corrected as appropriate; finally, recognition is completed;
S5: Pop-up window position confirmation: on the basis of step S4, the matched text and the speech are stored together at the port of the call device, and the voice call recording data is projected, in the form of a pop-up window, onto the recording time period and the window in which the recording was generated.
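The control flow of steps S1 to S5 can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: `CallRecording`, `process_recording`, and the `enhance` and `transcribe` callables are hypothetical names that stand in for the S3 processing and the S4 recognition, which the text does not specify at code level. The point the sketch makes is that nothing is processed until retention is confirmed in S2.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CallRecording:
    """Hypothetical container for one recorded call (name not from the patent)."""
    samples: List[float]                 # raw audio captured in S1
    transcript: Optional[str] = None     # filled in by S4
    log: List[str] = field(default_factory=list)


def process_recording(
    rec: CallRecording,
    keep: bool,
    enhance: Callable[[List[float]], List[float]],
    transcribe: Callable[[List[float]], str],
) -> Optional[CallRecording]:
    """Run S2 to S5 on a recording that was captured in S1."""
    # S2: retention confirmation. No processing happens before this point.
    if not keep:
        rec.samples.clear()              # discard the recording data
        return None
    # S3: cache the data, then apply time/frequency-domain processing.
    cached = list(rec.samples)
    rec.samples = enhance(cached)
    rec.log.append("enhanced")
    # S4: speech-to-text recognition on the processed audio.
    rec.transcript = transcribe(rec.samples)
    rec.log.append("transcribed")
    # S5: text and audio are kept together for later lookup.
    rec.log.append("stored")
    return rec
```

Because `enhance` and `transcribe` are only called after the `keep` check, a discarded call costs no signal processing at all, which is the saving the method claims.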
Preferably, in step S1, the recording system uses one or a combination of mechanical recording, optical recording, magnetic recording, parallel recording, packet-capture recording, or exchange-internal recording, and is equipped with automatic noise reduction, automatic volume adjustment, and automatic limiting technologies.
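The text names automatic volume adjustment and automatic limiting without saying how they work. A minimal sketch of one common reading, assuming floating-point samples in the range [-1, 1]: the volume adjustment is modeled as peak normalization of a block and the limiting as a hard clamp. The function names and default levels are illustrative assumptions.

```python
from typing import List


def auto_gain(samples: List[float], target_peak: float = 0.5) -> List[float]:
    """Scale a block so that its largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)             # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def hard_limit(samples: List[float], ceiling: float = 0.8) -> List[float]:
    """Clamp every sample to the range [-ceiling, ceiling]."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]
```

A production limiter would normally use attack and release smoothing rather than a hard clamp; the clamp is used here only because it is the shortest correct example.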
Preferably, in step S3, the highest peak value and the second-highest value are extracted using a peak frequency extraction method that combines a non-parametric method with a parametric method, and the WAV file data superposition processing uses a wavread function in C++.
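The two time-domain operations named above can be illustrated as follows. This sketch makes several assumptions that are not in the text: only the non-parametric half of the peak extraction is shown (a plain periodogram computed with a naive DFT), the WAV data is read with Python's standard `wave` module rather than the `wavread` function the text mentions, and "superposition" is read as sample-wise addition of two 16-bit mono streams with clipping. All function names are hypothetical.

```python
import cmath
import io
import math
import struct
import wave
from typing import List, Tuple


def periodogram(x: List[float]) -> List[float]:
    """Power at DFT bins 0..N/2 (naive O(N^2) DFT, adequate for short frames)."""
    n = len(x)
    power = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2 / n)
    return power


def two_highest_peaks(x: List[float], sample_rate: float) -> Tuple[float, float]:
    """Frequencies (Hz) of the highest and second-highest local spectral maxima."""
    p = periodogram(x)
    n = len(x)
    peaks = [k for k in range(1, len(p) - 1) if p[k - 1] < p[k] >= p[k + 1]]
    peaks.sort(key=lambda k: p[k], reverse=True)
    if len(peaks) < 2:
        raise ValueError("fewer than two spectral peaks found")
    return peaks[0] * sample_rate / n, peaks[1] * sample_rate / n


def read_wav_mono16(data: bytes) -> Tuple[int, List[int]]:
    """Return (sample_rate, samples) from a 16-bit mono PCM WAV held in memory."""
    with wave.open(io.BytesIO(data), "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        rate = w.getframerate()
        frames = w.readframes(w.getnframes())
    return rate, list(struct.unpack("<%dh" % (len(frames) // 2), frames))


def superimpose(a: List[int], b: List[int]) -> List[int]:
    """Add two sample streams; the shorter one is padded with silence."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    # Clip to the 16-bit range so that the sum can be written back as PCM.
    return [max(-32768, min(32767, s + t)) for s, t in zip(a, b)]
```

A parametric estimate (for example an autoregressive spectrum) could refine the peak positions found by the periodogram; how the two are combined is not described in the text, so it is left out here.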
Preferably, in step S3, the signal filtering transformation uses a wavelet transform, and the signal editing includes: amplitude companding, delay and reverberation, frequency filtering and frequency equalization, spatial stereo processing, and tuning processing.
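The text specifies a wavelet transform for the filtering step but not which wavelet or which thresholding rule. As an illustration only, the sketch below assumes a single-level orthonormal Haar transform with soft thresholding of the detail coefficients, which is one of the simplest wavelet denoising schemes. The function names and the choice of threshold are assumptions.

```python
import math
from typing import List, Tuple


def haar_forward(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    if len(x) % 2:
        raise ValueError("length must be even")
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx: List[float], detail: List[float]) -> List[float]:
    """Invert haar_forward."""
    s = math.sqrt(2.0)
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def soft_threshold(coeffs: List[float], t: float) -> List[float]:
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def haar_denoise(x: List[float], threshold: float) -> List[float]:
    """Suppress small detail coefficients, then reconstruct the signal."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, threshold))
```

With a threshold of zero the signal is reconstructed unchanged; with a very large threshold every pair of samples is replaced by its mean, which shows the smoothing effect in its extreme form.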
Preferably, in step S4, the intelligent speech-to-text recognition of the voice call recording data uses one or a combination of a method based on linguistics and acoustics, a template matching method, and a neural network method.
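Of the three recognition approaches listed, template matching is the easiest to show in a few lines. The sketch below assumes dynamic time warping (DTW) as the matching rule and one-dimensional feature sequences; neither choice is stated in the text, and `dtw_distance` and `best_template` are hypothetical names.

```python
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance with absolute difference as local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)        # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [inf] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)]


def best_template(features: Sequence[float],
                  templates: Dict[str, Sequence[float]]) -> str:
    """Return the label of the stored template closest to the input features."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))
```

DTW tolerates differences in speaking rate: the sequence `[1, 2, 2, 3]` matches the template `[1, 2, 3]` at zero cost because the repeated value is absorbed by the warping path.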
Preferably, in step S4, the feature information of the speech signal is extracted using one or a combination of the spectral envelope method, the cepstrum method, the LPC interpolation method, the LPC root-finding method, the Hilbert transform method, and a formant tracking algorithm.
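Both LPC-based methods in this list start from linear prediction coefficients. As a hedged illustration, the sketch below computes them with the autocorrelation method and the Levinson-Durbin recursion; the text does not say which estimation procedure is used, and the function names are assumptions. The interpolation or root-finding step that would turn the coefficients into formant estimates is not shown.

```python
from typing import List


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of a finite frame."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(max_lag + 1)]


def lpc(x: List[float], order: int) -> List[float]:
    """Return [1, a1, ..., ap] minimizing the energy of
    e[n] = x[n] + a1*x[n-1] + ... + ap*x[n-p] (Levinson-Durbin recursion)."""
    r = autocorrelation(x, order)
    if r[0] == 0.0:
        raise ValueError("signal has zero energy")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break                        # frame is already perfectly predicted
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                   # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k               # remaining prediction error energy
    return a
```

For a decaying exponential `x[n] = 0.9**n`, a first-order fit gives `a1` very close to -0.9, meaning each sample is predicted as 0.9 times the previous one.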
Preferably, in step S4, the intelligent algorithm is one or a combination of naive Bayes, decision trees, random forests, neural networks, autoencoders, or reinforcement learning.
Preferably, in step S5, the pop-up window is implemented as one or a combination of alert("") pop-up windows or prompt("") pop-up windows.
Preferably, the voice call carrier is one or a combination of mobile phones, computers, iPads, or phone watches.
Compared with the prior art, the invention has the following beneficial effects: 1. appropriate time-domain and frequency-domain processing is applied to the cached voice call recording data only after the call has been recorded and retention has been confirmed; no recording data is processed before retention is confirmed, which reduces the amount of data processing, and the time-domain and frequency-domain post-processing improves the recording quality and facilitates later use;
2. after that processing, intelligent speech-to-text recognition and sentence-group matching convert the voice call content into text, so that the required recording section can later be identified by reading the text rather than by playing back the recording, which makes the recording convenient to use;
3. after the text has been matched, the matched text and speech can be projected, in the form of a pop-up window, onto the recording time period and the window in which the recording was generated, so that the voice call recording data can be looked up directly when it is needed.
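The second and third effects depend on the text being tied to positions in the recording. A minimal sketch of that idea, assuming each recognized sentence is stored with start and end offsets in seconds: the `Segment` structure and `find_segments` function are hypothetical and are not described in the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One recognized sentence and its position in the recording (assumed layout)."""
    start_s: float   # offset from the start of the recording, in seconds
    end_s: float
    text: str


def find_segments(segments: List[Segment], keyword: str) -> List[Segment]:
    """Return the segments whose text contains the keyword, ignoring case."""
    kw = keyword.casefold()
    return [s for s in segments if kw in s.text.casefold()]
```

Reading the returned offsets is enough to jump to the relevant part of the audio, which is how the text can replace repeated playback when a section is being selected.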
Drawings
The accompanying drawings provide a further understanding of the invention and are incorporated in and constitute a part of this specification. They illustrate the invention and, together with the embodiments, serve to explain it.
In the drawings:
Fig. 1 is a flowchart of a voice call recording data processing method according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention; all other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in Fig. 1, the present invention provides the following technical solution: a voice call recording data processing method comprising the following steps:
S1: Receiving a recording request: before or during a voice call, a recording request signal issued by one or both device ends of the call is received and verified; once the request is confirmed, the recording system starts automatically and records and saves the content of the current voice call;
S2: File retention confirmation: on the basis of step S1, after the voice call ends, a retention confirmation is carried out to determine whether the recording is to be kept; if it is, the method proceeds to the subsequent steps; if the recording does not need to be kept, the current voice call recording data is deleted;
S3: Intelligent file adjustment: on the basis of step S2, when the voice call recording data is to be kept, the data is first cached, and appropriate time-domain and frequency-domain processing is then applied to the cached data, wherein the time-domain processing includes extracting the highest peak value and the second-highest value and superposition processing of the WAV file data, and the frequency-domain processing includes signal filtering transformation and signal editing;
S4: Intelligent text matching: on the basis of step S3, intelligent speech-to-text recognition and sentence-group matching are performed on the voice call recording data that has undergone the time-domain and frequency-domain processing, wherein the recognition proceeds as follows: first, the speech signal of the voice call is analyzed and processed, and redundant information in it is removed; second, feature information of the speech signal is extracted and words are matched according to that feature information; then, using an intelligent algorithm, the matched words are reordered according to the grammatical characteristics of different languages, the relationships between contexts are analyzed with the help of semantics, and the sentence currently being processed is corrected as appropriate; finally, recognition is completed;
S5: Pop-up window position confirmation: on the basis of step S4, the matched text and the speech are stored together at the port of the call device, and the voice call recording data is projected, in the form of a pop-up window, onto the recording time period and the window in which the recording was generated.
In step S1, the recording system uses one or a combination of mechanical recording, optical recording, magnetic recording, parallel recording, packet-capture recording, or exchange-internal recording, and is equipped with automatic noise reduction, automatic volume adjustment, and automatic limiting technologies. In step S3, the highest peak value and the second-highest value are extracted using a peak frequency extraction method that combines a non-parametric method with a parametric method, and the WAV file data superposition processing uses a wavread function in C++; the signal filtering transformation uses a wavelet transform, and the signal editing includes amplitude companding, delay and reverberation, frequency filtering and frequency equalization, spatial stereo processing, and tuning processing. In step S4, the intelligent speech-to-text recognition of the voice call recording data uses one or a combination of a method based on linguistics and acoustics, a template matching method, and a neural network method; the feature information of the speech signal is extracted using one or a combination of the spectral envelope method, the cepstrum method, the LPC interpolation method, the LPC root-finding method, the Hilbert transform method, and a formant tracking algorithm; and the intelligent algorithm is one or a combination of naive Bayes, decision trees, random forests, neural networks, autoencoders, or reinforcement learning. In step S5, the pop-up window is implemented as one or a combination of alert("") pop-up windows and prompt("") pop-up windows. The voice call carrier is one or a combination of mobile phones, computers, iPads, or phone watches.
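Among the signal-editing operations listed for step S3, amplitude companding has a compact closed form. The sketch below assumes the μ-law curve with μ = 255 and samples normalized to [-1, 1]; the embodiment does not name a companding law, so this choice and the function names are illustrative assumptions.

```python
import math

MU = 255.0  # assumed companding parameter; not specified in the text


def mu_compress(x: float, mu: float = MU) -> float:
    """Compress a sample in [-1, 1]: quiet samples are boosted relative to loud ones."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_expand(y: float, mu: float = MU) -> float:
    """Invert mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)
```

Compression raises low-level speech before further processing or storage, and expansion restores the original dynamics, so `mu_expand(mu_compress(x))` returns `x` up to floating-point error.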
Through the above steps, appropriate time-domain and frequency-domain processing is applied to the cached voice call recording data only after the call has been recorded and retention has been confirmed; no recording data is processed before retention is confirmed, which reduces the amount of data processing, and the time-domain and frequency-domain post-processing improves the recording quality and facilitates later use. After that processing, intelligent speech-to-text recognition and sentence-group matching convert the voice call content into text, so that the required recording section can later be identified by reading the text rather than by playing back the recording, which makes the recording convenient to use. After the text has been matched, the matched text and speech can be projected, in the form of a pop-up window, onto the recording time period and the window in which the recording was generated, so that the voice call recording data can be looked up directly when it is needed.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. A voice call recording data processing method, characterized in that the method comprises the following steps:
S1: Receiving a recording request: before or during a voice call, a recording request signal issued by one or both device ends of the call is received and verified; once the request is confirmed, the recording system starts automatically and records and saves the content of the current voice call;
S2: File retention confirmation: on the basis of step S1, after the voice call ends, a retention confirmation is carried out to determine whether the recording is to be kept; if it is, the method proceeds to the subsequent steps; if the recording does not need to be kept, the current voice call recording data is deleted;
S3: Intelligent file adjustment: on the basis of step S2, when the voice call recording data is to be kept, the data is first cached, and appropriate time-domain and frequency-domain processing is then applied to the cached data, wherein the time-domain processing includes extracting the highest peak value and the second-highest value and superposition processing of the WAV file data, and the frequency-domain processing includes signal filtering transformation and signal editing;
S4: Intelligent text matching: on the basis of step S3, intelligent speech-to-text recognition and sentence-group matching are performed on the voice call recording data that has undergone the time-domain and frequency-domain processing, wherein the recognition proceeds as follows: first, the speech signal of the voice call is analyzed and processed, and redundant information in it is removed; second, feature information of the speech signal is extracted and words are matched according to that feature information; then, using an intelligent algorithm, the matched words are reordered according to the grammatical characteristics of different languages, the relationships between contexts are analyzed with the help of semantics, and the sentence currently being processed is corrected as appropriate; finally, recognition is completed;
S5: Pop-up window position confirmation: on the basis of step S4, the matched text and the speech are stored together at the port of the call device, and the voice call recording data is projected, in the form of a pop-up window, onto the recording time period and the window in which the recording was generated.
2. The voice call recording data processing method according to claim 1, wherein: in step S1, the recording system uses one or a combination of mechanical recording, optical recording, magnetic recording, parallel recording, packet-capture recording, or exchange-internal recording, and is equipped with automatic noise reduction, automatic volume adjustment, and automatic limiting technologies.
3. The voice call recording data processing method according to claim 1, wherein: in step S3, the highest peak value and the second-highest value are extracted using a peak frequency extraction method that combines a non-parametric method with a parametric method, and the WAV file data superposition processing uses a wavread function in C++.
4. The voice call recording data processing method according to claim 3, wherein: in step S3, the signal filtering transformation uses a wavelet transform, and the signal editing includes: amplitude companding, delay and reverberation, frequency filtering and frequency equalization, spatial stereo processing, and tuning processing.
5. The voice call recording data processing method according to claim 1, wherein: in step S4, the intelligent speech-to-text recognition of the voice call recording data uses one or a combination of a method based on linguistics and acoustics, a template matching method, and a neural network method.
6. The voice call recording data processing method according to claim 5, wherein: in step S4, the feature information of the speech signal is extracted using one or a combination of the spectral envelope method, the cepstrum method, the LPC interpolation method, the LPC root-finding method, the Hilbert transform method, and a formant tracking algorithm.
7. The voice call recording data processing method according to claim 5, wherein: in step S4, the intelligent algorithm is one or a combination of naive Bayes, decision trees, random forests, neural networks, autoencoders, or reinforcement learning.
8. The voice call recording data processing method according to claim 1, wherein: in step S5, the pop-up window is implemented as one or a combination of alert("") pop-up windows or prompt("") pop-up windows.
9. The voice call recording data processing method according to claim 1, wherein: the voice call carrier is one or a combination of mobile phones, computers, iPads, or phone watches.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211663043.9A CN116631401A (en) | 2022-12-23 | 2022-12-23 | Voice call recording data processing method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202211663043.9A CN116631401A (en) | 2022-12-23 | 2022-12-23 | Voice call recording data processing method

Publications (1)

Publication Number | Publication Date
CN116631401A (en) | 2023-08-22

Family

ID=87637090

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202211663043.9A Pending CN116631401A (en) | Voice call recording data processing method | 2022-12-23 | 2022-12-23

Country Status (1)

Country | Link
CN (1) | CN116631401A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination