CN116631401A - Voice call recording data processing method - Google Patents
- Publication number
- CN116631401A (application CN202211663043.9A)
- Authority
- CN
- China
- Prior art keywords
- voice call
- recording
- recording data
- processing
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/64—Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations
- H04M1/65—Recording arrangements for recording a message from the calling party
- H04M1/656—Recording arrangements for recording a message from the calling party for recording conversations
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Telephonic Communication Services (AREA)
- Telephone Function (AREA)
Abstract
The invention relates to the technical field of recording processing and discloses a voice call recording data processing method comprising the following steps: S1, receiving a recording request; S2, confirming file retention; S3, intelligent file adjustment; S4, intelligent text matching; S5, pop-up window position confirmation. After a voice call has been recorded and retention has been confirmed, appropriate time-domain and frequency-domain processing is applied to the cached recording data; until retention is confirmed, the recording data is not processed, which reduces the amount of data processed. The time-domain and frequency-domain processing improves recording quality and eases later use. Intelligent speech-to-text recognition and sentence-group matching are then performed on the recording data, converting the call content into text, so that the required recording segment can be identified by reading the text rather than by playing back the audio.
Description
Technical Field
The invention belongs to the technical field of recording processing, and particularly relates to a voice call recording data processing method.
Background
Recording means capturing sound by mechanical, optical, or electromagnetic methods. As technology has advanced and electronic products have become widespread, voice calls have become a common means of communication between electronic terminals, and recording a call allows certain information to be captured for later use.
Existing voice call recordings are generally stored without post-processing, so recording quality may be poor when the call environment is poor. Moreover, selecting a recording segment requires listening to the recording repeatedly, which is cumbersome. The current situation therefore needs improvement.
Disclosure of Invention
In view of the above, and to overcome the shortcomings of the prior art, the invention provides a voice call recording data processing method. It addresses two problems: existing voice call recordings are generally stored without post-processing, so recording quality is poor when the call environment is poor; and selecting a recording segment requires listening to the recording repeatedly, which is cumbersome.
To achieve the above purpose, the invention provides the following technical solution: a voice call recording data processing method comprising the following steps:
S1: Receiving a recording request: before or during a voice call, a recording request signal from one or both device ends of the call is received and verified; once the request is confirmed, the recording system starts automatically and records and saves the current call content;
S2: File retention confirmation: following step S1, after the voice call ends, confirmation is obtained as to whether the recording should be kept; if so, the method proceeds to the subsequent steps; if not, the current voice call recording data is deleted;
S3: Intelligent file adjustment: following step S2, when the recording data is to be kept, it is first cached, and appropriate time-domain and frequency-domain processing is applied to the cached data; the time-domain processing comprises extracting the highest peak and the second-highest value and superposing WAV file data; the frequency-domain processing comprises signal filtering transformation and signal editing;
S4: Intelligent text matching: following step S3, intelligent speech-to-text recognition and sentence-group matching are performed on the recording data that has undergone the time-domain and frequency-domain processing; the recognition proceeds as follows: first, the speech signal of the call is analyzed and redundant information is removed; second, feature information is extracted from the speech signal and words are matched against it; next, the matched words are reordered according to the grammatical characteristics of different languages, using an intelligent algorithm; the relationships between contexts are then analyzed semantically and the sentence currently being processed is corrected as appropriate; finally, recognition is completed;
S5: Pop-up window position confirmation: following step S4, the matched text and the voice are stored together at the call device end, and the voice call recording data is linked, by means of a pop-up window, to the recording time period and to the window in which the recording was generated.
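The control flow of steps S2 to S4 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `CallRecording`, `finish_call`, `enhance`, and `transcribe` are hypothetical, and the enhancement and transcription stages are passed in as placeholder callables. The point it shows is that no processing runs until retention has been confirmed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CallRecording:
    """Raw audio captured in S1 plus anything derived from it later."""
    samples: List[float]
    processed: Optional[List[float]] = None
    transcript: Optional[str] = None
    log: List[str] = field(default_factory=list)


def finish_call(
    rec: CallRecording,
    keep: bool,
    enhance: Callable[[List[float]], List[float]],
    transcribe: Callable[[List[float]], str],
) -> Optional[CallRecording]:
    """Run S2 to S4 once the call has ended.

    S2: if retention is declined, drop the recording unprocessed.
    S3: otherwise enhance a cached copy of the audio.
    S4: then convert the enhanced audio to text.
    """
    if not keep:
        rec.log.append("S2: discarded without processing")
        return None
    rec.log.append("S2: retention confirmed")
    rec.processed = enhance(list(rec.samples))  # S3 works on a cached copy
    rec.log.append("S3: time/frequency-domain processing done")
    rec.transcript = transcribe(rec.processed)  # S4 uses the enhanced audio
    rec.log.append("S4: transcript produced")
    return rec
```

Passing the stages in as callables keeps the sketch independent of any particular enhancement or recognition method.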
Preferably, in step S1, the recording system uses one or a combination of mechanical recording, optical recording, magnetic recording, parallel-line recording, packet-capture recording, and in-switch recording, and is equipped with automatic noise reduction, automatic volume adjustment, and automatic limiting.
Preferably, in step S3, the highest peak and the second-highest value are extracted with a peak-frequency extraction method that combines a non-parametric method with a parametric method, and the WAV file data superposition uses a wavread function implemented in C++.
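As an illustration of the non-parametric half of such a peak-frequency extraction, the sketch below computes a periodogram with a plain DFT and returns the frequencies of its two strongest local maxima. It is an assumption about what "highest peak and second-highest value" refers to; the source does not specify the estimator, and the function names are hypothetical. The parametric half is not shown.

```python
import cmath
import math
from typing import List, Tuple


def periodogram(x: List[float]) -> List[float]:
    """Power at DFT bins 0..N/2 (plain O(N^2) DFT, adequate for short frames)."""
    n = len(x)
    power = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2 / n)
    return power


def two_highest_peaks(x: List[float], fs: float) -> Tuple[float, float]:
    """Frequencies (Hz) of the highest and second-highest local maxima
    of the periodogram of x, sampled at fs Hz."""
    p = periodogram(x)
    n = len(x)
    # A bin is a peak if it exceeds its left neighbour and is not below its right one.
    peaks = [k for k in range(1, len(p) - 1) if p[k] > p[k - 1] and p[k] >= p[k + 1]]
    peaks.sort(key=lambda k: p[k], reverse=True)
    if len(peaks) < 2:
        raise ValueError("fewer than two spectral peaks found")
    return peaks[0] * fs / n, peaks[1] * fs / n
```

The frequency resolution is `fs / len(x)`, so a longer frame gives finer peak positions at a higher computational cost.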
Preferably, in step S3, the signal filtering transformation uses a wavelet transform, and the signal editing comprises amplitude companding, delay and reverberation, frequency filtering and equalization, spatial stereo processing, and tuning.
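The source names the wavelet transform but not the wavelet or the filtering rule. As one possible reading, the sketch below applies a single level of the orthonormal Haar transform, soft-thresholds the detail coefficients, and reconstructs the signal. The choice of Haar, the single level, and the threshold parameter are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

_S = math.sqrt(2.0)


def haar_forward(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    approx = [(x[i] + x[i + 1]) / _S for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / _S for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx: List[float], detail: List[float]) -> List[float]:
    """Exact inverse of haar_forward."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / _S)
        out.append((a - d) / _S)
    return out


def soft_threshold(coeffs: List[float], t: float) -> List[float]:
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in coeffs]


def haar_denoise(x: List[float], threshold: float) -> List[float]:
    """Suppress small sample-to-sample fluctuations in x."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, threshold))
```

A practical denoiser would typically use several decomposition levels and a smoother wavelet; the single-level Haar case is shown only because it fits in a few lines.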
Preferably, in step S4, the intelligent speech-to-text recognition of the recording data uses one or a combination of a linguistic-and-acoustic method, a template-matching method, and a neural-network method.
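Template matching in speech recognition is classically done with dynamic time warping (DTW), which compares a feature sequence against stored word templates while tolerating differences in speaking rate. The source does not say which matching rule is used, so the sketch below is an assumption; the one-dimensional features and the names `dtw_distance` and `best_template` are hypothetical.

```python
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic-time-warping cost between two 1-D feature sequences."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0 of the cumulative-cost table
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            # Extend the cheapest of: step in a only, step in both, step in b only.
            cur.append(cost + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1]


def best_template(features: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the label of the template with the lowest DTW cost."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))
```

In a real recognizer each element would be a feature vector (for example cepstral coefficients) rather than a single number, with a vector distance in place of `abs(x - y)`.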
Preferably, in step S4, the feature information of the speech signal is extracted with one or a combination of a spectral envelope method, a cepstrum method, an LPC interpolation method, an LPC root-finding method, a Hilbert transform method, and a formant tracking algorithm.
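To make the LPC root-finding idea concrete, the sketch below fits a second-order linear predictor with the autocorrelation method (Levinson-Durbin recursion) and reads a resonance frequency from the angle of the complex roots of the prediction polynomial. Order 2 is chosen only so the roots have a closed form; real formant estimation uses higher orders and a numerical root finder. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def autocorr(x: List[float], max_lag: int) -> List[float]:
    """Autocorrelation r[0..max_lag] of x (no normalisation)."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(max_lag + 1)]


def lpc_order2(x: List[float]) -> Tuple[float, float]:
    """Levinson-Durbin recursion specialised to order 2.

    Returns (a1, a2) of the prediction-error filter
    A(z) = 1 + a1*z^-1 + a2*z^-2.
    """
    r0, r1, r2 = autocorr(x, 2)
    k1 = -r1 / r0                   # first reflection coefficient
    e1 = r0 * (1.0 - k1 * k1)       # prediction error after order 1
    k2 = -(r2 + k1 * r1) / e1       # second reflection coefficient
    return k1 * (1.0 + k2), k2


def resonance_hz(x: List[float], fs: float) -> float:
    """Frequency given by the angle of the complex roots of z^2 + a1*z + a2."""
    a1, a2 = lpc_order2(x)
    disc = 4.0 * a2 - a1 * a1
    if disc <= 0:
        raise ValueError("roots are real; no resonance to report")
    angle = math.atan2(math.sqrt(disc) / 2.0, -a1 / 2.0)
    return angle * fs / (2.0 * math.pi)
```

For a frame dominated by a single sinusoid the estimate lands close to that sinusoid's frequency, with a small bias from the finite frame length.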
Preferably, in step S4, the intelligent algorithm is one or a combination of naive Bayes, decision trees, random forests, neural networks, autoencoders, and reinforcement learning.
Preferably, in step S5, the pop-up window is implemented with one or a combination of alert("") and prompt("") pop-up windows.
Preferably, the voice call carrier is one or a combination of a mobile phone, a computer, an iPad, and a telephone watch.
Compared with the prior art, the invention has the following beneficial effects: 1. Appropriate time-domain and frequency-domain processing is applied to the cached recording data only after the call has been recorded and retention has been confirmed; until retention is confirmed, the recording data is not processed, which reduces the amount of data processed. The time-domain and frequency-domain processing gives the recording data suitable post-processing, which improves recording quality and eases later use;
2. After this processing, intelligent speech-to-text recognition and sentence-group matching convert the call content into text; when the recording is later used, the required recording segment can be identified by reading the text, without playing back the recording data for confirmation, which makes it convenient to use;
3. After text matching, the matched text and voice can be linked, by means of a pop-up window, to the recording time period and to the window in which the recording was generated, so that the voice call recording data can be found directly when the recording is needed.
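The second and third effects rely on the text being tied to positions in the recording. A minimal sketch of that idea follows, assuming the recognizer emits text segments with start and end times; the `Segment` type and `find_segments` function are hypothetical and not described in the source.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    """A stretch of recognised text with its position in the recording."""
    start_s: float
    end_s: float
    text: str


def find_segments(transcript: List[Segment], query: str) -> List[Tuple[float, float]]:
    """Return the (start, end) times of every segment whose text contains query."""
    q = query.casefold()
    return [(s.start_s, s.end_s) for s in transcript if q in s.text.casefold()]
```

For example, searching a transcript for "invoice" would return the time ranges to play back, so the listener does not need to scan the whole recording.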
Drawings
The accompanying drawings provide a further understanding of the invention and constitute a part of this specification; together with the embodiments, they serve to explain the invention.
In the drawings:
Fig. 1 is a flowchart of the voice call recording data processing method according to the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the invention; all other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.
As shown in Fig. 1, the invention provides the following technical solution: a voice call recording data processing method comprising the following steps:
S1: Receiving a recording request: before or during a voice call, a recording request signal from one or both device ends of the call is received and verified; once the request is confirmed, the recording system starts automatically and records and saves the current call content;
S2: File retention confirmation: following step S1, after the voice call ends, confirmation is obtained as to whether the recording should be kept; if so, the method proceeds to the subsequent steps; if not, the current voice call recording data is deleted;
S3: Intelligent file adjustment: following step S2, when the recording data is to be kept, it is first cached, and appropriate time-domain and frequency-domain processing is applied to the cached data; the time-domain processing comprises extracting the highest peak and the second-highest value and superposing WAV file data; the frequency-domain processing comprises signal filtering transformation and signal editing;
S4: Intelligent text matching: following step S3, intelligent speech-to-text recognition and sentence-group matching are performed on the recording data that has undergone the time-domain and frequency-domain processing; the recognition proceeds as follows: first, the speech signal of the call is analyzed and redundant information is removed; second, feature information is extracted from the speech signal and words are matched against it; next, the matched words are reordered according to the grammatical characteristics of different languages, using an intelligent algorithm; the relationships between contexts are then analyzed semantically and the sentence currently being processed is corrected as appropriate; finally, recognition is completed;
S5: Pop-up window position confirmation: following step S4, the matched text and the voice are stored together at the call device end, and the voice call recording data is linked, by means of a pop-up window, to the recording time period and to the window in which the recording was generated.
In step S1, the recording system uses one or a combination of mechanical recording, optical recording, magnetic recording, parallel-line recording, packet-capture recording, and in-switch recording, and is equipped with automatic noise reduction, automatic volume adjustment, and automatic limiting. In step S3, the highest peak and the second-highest value are extracted with a peak-frequency extraction method that combines a non-parametric method with a parametric method, and the WAV file data superposition uses a wavread function implemented in C++; the signal filtering transformation uses a wavelet transform, and the signal editing comprises amplitude companding, delay and reverberation, frequency filtering and equalization, spatial stereo processing, and tuning. In step S4, the intelligent speech-to-text recognition of the recording data uses one or a combination of a linguistic-and-acoustic method, a template-matching method, and a neural-network method; the feature information of the speech signal is extracted with one or a combination of a spectral envelope method, a cepstrum method, an LPC interpolation method, an LPC root-finding method, a Hilbert transform method, and a formant tracking algorithm; and the intelligent algorithm is one or a combination of naive Bayes, decision trees, random forests, neural networks, autoencoders, and reinforcement learning. In step S5, the pop-up window is implemented with one or a combination of alert("") and prompt("") pop-up windows. The voice call carrier is one or a combination of a mobile phone, a computer, an iPad, and a telephone watch.
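The source does not define what superposing WAV file data involves. One plausible reading is sample-wise mixing of two recordings, for example the two ends of a call. The sketch below takes that reading and uses Python's standard `wave` module in place of the C++ wavread function the source mentions; the 16-bit mono format, the little-endian host, and the helper names are assumptions.

```python
import array
import io
import wave


def mix_pcm16(a: bytes, b: bytes) -> bytes:
    """Sample-wise sum of two 16-bit PCM buffers, clipped to the int16 range.

    The shorter buffer is treated as if padded with silence.
    Assumes a little-endian host, matching the WAV sample layout.
    """
    sa, sb = array.array("h"), array.array("h")
    sa.frombytes(a)
    sb.frombytes(b)
    if len(sa) < len(sb):
        sa, sb = sb, sa
    out = array.array("h", sa)
    for i, v in enumerate(sb):
        out[i] = max(-32768, min(32767, out[i] + v))
    return out.tobytes()


def write_wav(frames: bytes, rate: int) -> bytes:
    """Wrap raw 16-bit mono PCM frames in a WAV container, in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()


def read_wav(data: bytes) -> bytes:
    """Return the raw PCM frames stored in an in-memory WAV file."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return w.readframes(w.getnframes())
```

Clipping is the simplest way to keep the sum in range; scaling both inputs before adding them would avoid distortion at the cost of lower volume.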
Through the above steps, appropriate time-domain and frequency-domain processing is applied to the cached recording data only after the call has been recorded and retention has been confirmed; until retention is confirmed, the recording data is not processed, which reduces the amount of data processed. The time-domain and frequency-domain processing gives the recording data suitable post-processing, which improves recording quality and eases later use. After this processing, intelligent speech-to-text recognition and sentence-group matching convert the call content into text; when the recording is later used, the required recording segment can be identified by reading the text, without playing back the recording data for confirmation. After text matching, the matched text and voice can be linked, by means of a pop-up window, to the recording time period and to the window in which the recording was generated, so that the voice call recording data can be found directly when the recording is needed.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (9)
1. A voice call recording data processing method, characterized in that the method comprises the following steps:
S1: Receiving a recording request: before or during a voice call, a recording request signal from one or both device ends of the call is received and verified; once the request is confirmed, the recording system starts automatically and records and saves the current call content;
S2: File retention confirmation: following step S1, after the voice call ends, confirmation is obtained as to whether the recording should be kept; if so, the method proceeds to the subsequent steps; if not, the current voice call recording data is deleted;
S3: Intelligent file adjustment: following step S2, when the recording data is to be kept, it is first cached, and appropriate time-domain and frequency-domain processing is applied to the cached data; the time-domain processing comprises extracting the highest peak and the second-highest value and superposing WAV file data; the frequency-domain processing comprises signal filtering transformation and signal editing;
S4: Intelligent text matching: following step S3, intelligent speech-to-text recognition and sentence-group matching are performed on the recording data that has undergone the time-domain and frequency-domain processing; the recognition proceeds as follows: first, the speech signal of the call is analyzed and redundant information is removed; second, feature information is extracted from the speech signal and words are matched against it; next, the matched words are reordered according to the grammatical characteristics of different languages, using an intelligent algorithm; the relationships between contexts are then analyzed semantically and the sentence currently being processed is corrected as appropriate; finally, recognition is completed;
S5: Pop-up window position confirmation: following step S4, the matched text and the voice are stored together at the call device end, and the voice call recording data is linked, by means of a pop-up window, to the recording time period and to the window in which the recording was generated.
2. The voice call recording data processing method according to claim 1, wherein: in step S1, the recording system uses one or a combination of mechanical recording, optical recording, magnetic recording, parallel-line recording, packet-capture recording, and in-switch recording, and is equipped with automatic noise reduction, automatic volume adjustment, and automatic limiting.
3. The voice call recording data processing method according to claim 1, wherein: in step S3, the highest peak and the second-highest value are extracted with a peak-frequency extraction method that combines a non-parametric method with a parametric method, and the WAV file data superposition uses a wavread function implemented in C++.
4. The voice call recording data processing method according to claim 3, wherein: in step S3, the signal filtering transformation uses a wavelet transform, and the signal editing comprises amplitude companding, delay and reverberation, frequency filtering and equalization, spatial stereo processing, and tuning.
5. The voice call recording data processing method according to claim 1, wherein: in step S4, the intelligent speech-to-text recognition of the recording data uses one or a combination of a linguistic-and-acoustic method, a template-matching method, and a neural-network method.
6. The voice call recording data processing method according to claim 5, wherein: in step S4, the feature information of the speech signal is extracted with one or a combination of a spectral envelope method, a cepstrum method, an LPC interpolation method, an LPC root-finding method, a Hilbert transform method, and a formant tracking algorithm.
7. The voice call recording data processing method according to claim 5, wherein: in step S4, the intelligent algorithm is one or a combination of naive Bayes, decision trees, random forests, neural networks, autoencoders, and reinforcement learning.
8. The voice call recording data processing method according to claim 1, wherein: in step S5, the pop-up window is implemented with one or a combination of alert("") and prompt("") pop-up windows.
9. The voice call recording data processing method according to claim 1, wherein: the voice call carrier is one or a combination of a mobile phone, a computer, an iPad, and a telephone watch.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211663043.9A CN116631401A (en) | 2022-12-23 | 2022-12-23 | Voice call recording data processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211663043.9A CN116631401A (en) | 2022-12-23 | 2022-12-23 | Voice call recording data processing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116631401A (en) | 2023-08-22 |
Family
ID=87637090
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211663043.9A (Pending, published as CN116631401A) | 2022-12-23 | 2022-12-23 | Voice call recording data processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116631401A (en) |
- 2022-12-23: application CN202211663043.9A filed in CN (published as CN116631401A); status: Pending
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11488605B2 (en) | Method and apparatus for detecting spoofing conditions | |
US7610199B2 (en) | Method and apparatus for obtaining complete speech signals for speech recognition applications | |
CN103903612B (en) | Method for performing real-time digital speech recognition | |
EP1159737B9 (en) | Speaker recognition | |
JP2006079079A (en) | Distributed speech recognition system and its method | |
CN111145763A (en) | GRU-based voice recognition method and system in audio | |
CN111951796B (en) | Speech recognition method and device, electronic equipment and storage medium | |
EP3989217B1 (en) | Method for detecting an audio adversarial attack with respect to a voice input processed by an automatic speech recognition system, corresponding device, computer program product and computer-readable carrier medium | |
CN113192535B (en) | Voice keyword retrieval method, system and electronic device | |
CN109473102A (en) | A kind of robot secretary intelligent meeting recording method and system | |
CN113921026A (en) | Speech enhancement method and device | |
CN112489692B (en) | Voice endpoint detection method and device | |
CN112216270B (en) | Speech phoneme recognition method and system, electronic equipment and storage medium | |
CN110556114B (en) | Speaker identification method and device based on attention mechanism | |
US20070198255A1 (en) | Method For Noise Reduction In A Speech Input Signal | |
CN116229987B (en) | Campus voice recognition method, device and storage medium | |
CN116631401A (en) | Voice call recording data processing method | |
CN112927680B (en) | Voiceprint effective voice recognition method and device based on telephone channel | |
CN108962249B (en) | Voice matching method based on MFCC voice characteristics and storage medium | |
CN111696524A (en) | Character-overlapping voice recognition method and system | |
CN111429890B (en) | Weak voice enhancement method, voice recognition method and computer readable storage medium | |
WO2002069324A1 (en) | Detection of inconsistent training data in a voice recognition system | |
US20230197097A1 (en) | Sound enhancement method and related communication apparatus | |
CN116072104A (en) | Voice gender recognition method and device and related equipment | |
CN116682416A (en) | Method and device for identifying ringing tone type |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||