
CN111933176A - Method and device for positioning voice contents in batches

Info

Publication number: CN111933176A
Application number: CN202010999495.9A
Authority: CN
Inventors: 舒畅; 何云鹏; 许兵
Original assignee: Chipintelli Technology Co Ltd
Current assignee: Chipintelli Technology Co Ltd
Priority date: 2020-09-22
Filing date: 2020-09-22
Publication date: 2020-11-13
Anticipated expiration: 2040-09-22
Also published as: CN111933176B

Abstract

S1, a prompt tone is played before recording starts, and the path of each stored audio file is written into a path recording file; S2, all paths recorded in the path recording file are traversed and read; S3, prompt tone detection and positioning are performed on the front part of each audio file; S4, prompt tone detection is performed again on each audio file in which a prompt tone was detected and part of the audio segment was deleted, and the file is saved if no prompt tone is detected again; this is repeated until no new prompt tone is detected; and S5, processing ends once all detected audio files have been processed. The invention also discloses a device for positioning voice contents in batches. By accurately locating the prompt tone, the position where voice recording begins can be found in every audio file, which improves the processing speed of voice corpus files.

Description

Method and device for positioning voice contents in batches

Technical Field

The invention belongs to the technical field of intelligent voice recognition, relates to a corpus recognition technology, and particularly relates to a method and a device for positioning voice contents in batches.

Background

In the field of artificial intelligence, speech recognition is maturing steadily, and much artificial intelligence development builds on speech recognition and processing. Research and development in speech recognition, however, depends on a large amount of corpus data, and an ordinary recording is not valid from start to finish: it contains a large amount of redundant information. The large volume of speech data and the tedium of removing redundant information have become a stumbling block in speech recognition research and development.

The method for processing corpus redundancy in the prior art has the following defects:

1. corpora with different sampling frequencies and different numbers of sound channels need to be processed separately;

2. corpus processing requires files to follow the same path template, so audio stored in different layouts under a directory cannot be processed;

3. the audio processing speed is low;

4. audio positioning is easily affected by background noise, which makes the positioning inaccurate;

5. there is a lack of reproducible detection.

Disclosure of Invention

In order to overcome the defects of the existing corpus processing technology, the invention discloses a method and a device for positioning voice contents in batches.

The method for positioning the voice contents in batch comprises the following steps:

S1, playing a prompt tone before recording starts, recording audio after the prompt tone has been played, storing the recorded audio file including the prompt tone, and writing the path of the stored audio file into a path recording file;

S2, traversing and reading all paths recorded in the path recording file; when a path does not actually exist or no audio file can be found under it, reporting an error and recording it in a generated error log;

during the traversal, after a corpus audio file is found, reading the characteristics of the corpus audio file and processing it into a single-channel audio file;

S3, performing prompt tone detection and positioning on the front part of the audio file, wherein the front part comprises at least the first half of the audio file, and, for a file in which a prompt tone is detected, deleting the audio segment preceding the last prompt tone;

if no prompt tone is detected in the front part of the audio file, the audio file is considered erroneous, and its path is written into the error log;

S4, secondary screening and detection, which specifically comprises the following steps:

S41, performing prompt tone detection again on the audio file in which a prompt tone was detected and part of the audio segment was deleted, and saving the file if no prompt tone is detected again;

S42, repositioning within the audio file in which a new prompt tone is detected and deleting the audio segment preceding the new prompt tone;

repeating steps S41-S42 until no new prompt tone is detected;

S5, repeating steps S3-S4 until all detected audio files have been processed.

Preferably, the prompt tone is a periodically repeating audio signal.

Preferably, the prompt tone is detected as follows: the audio file is scanned, and when an audio segment matching the amplitude characteristic of the prompt tone is found, the starting point of that segment is recorded; the method then checks whether audio matching the periodic characteristic of the prompt tone appears in the following periods, and marks the segment as the prompt tone if the number of periods is the same.
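As one possible illustration of this detection, the following Python sketch models the prompt tone as a fixed number of loud bursts spaced one period apart. The burst model, the function names `burst_onsets` and `find_prompt_tone`, and the parameters `threshold`, `period`, `repeats` and `tolerance` are assumptions made for this example; the invention does not specify them.

```python
from typing import List, Optional, Sequence


def burst_onsets(samples: Sequence[float], threshold: float) -> List[int]:
    """Indices where |sample| rises from below the threshold to at or above it."""
    onsets = []
    previous_loud = False
    for index, value in enumerate(samples):
        loud = abs(value) >= threshold
        if loud and not previous_loud:
            onsets.append(index)
        previous_loud = loud
    return onsets


def _has_onset_near(onsets: List[int], target: int, tolerance: int) -> bool:
    return any(abs(onset - target) <= tolerance for onset in onsets)


def find_prompt_tone(samples: Sequence[float], threshold: float, period: int,
                     repeats: int, tolerance: int = 0) -> Optional[int]:
    """Return the starting sample of the first run of exactly `repeats` bursts
    spaced `period` samples apart, or None if there is no such run."""
    if period <= 0:
        raise ValueError("period must be positive")
    onsets = burst_onsets(samples, threshold)
    for start in onsets:
        # Skip onsets that belong to a periodic run that began earlier.
        if _has_onset_near(onsets, start - period, tolerance):
            continue
        count = 1
        while _has_onset_near(onsets, start + count * period, tolerance):
            count += 1
        # Amplitude matched at `start`; accept only if the period count matches too.
        if count == repeats:
            return start
    return None
```

A segment whose amplitude matches but whose number of periodic repetitions differs from `repeats` is rejected, which corresponds to the requirement that the period count be the same as that of the prompt tone.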

Preferably, the data processing in steps S1-S5 is implemented in Python.

Preferably, the specific process of processing into a single-channel audio file in step S2 is as follows: a Python library function is called to read the audio file automatically and obtain the number of sampling points, the sampling frequency and the number of sound channels of the current audio file; whether the current audio is single-channel or dual-channel is determined by checking whether the number of channels is 1 or 2, and for a dual-channel audio file the audio of one channel is separated out as a single-channel audio file.
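The source does not name the Python library it uses. As a minimal sketch, assuming the standard-library `wave` module is acceptable, the three characteristics can be read as follows; the names `WavFeatures` and `read_wav_features` are hypothetical.

```python
import wave
from typing import NamedTuple


class WavFeatures(NamedTuple):
    frame_count: int  # number of sampling points per channel
    sample_rate: int  # sampling frequency in Hz
    channels: int     # 1 = single-channel, 2 = dual-channel


def read_wav_features(path: str) -> WavFeatures:
    """Read the header of a wav file without loading its audio data."""
    with wave.open(path, "rb") as wav:
        return WavFeatures(wav.getnframes(), wav.getframerate(), wav.getnchannels())
```

The `channels` field can then be compared with 1 or 2 to choose the processing flow.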

The invention also discloses a device for positioning the voice contents in batches, which comprises a prompt tone broadcasting module, an audio recording module, a path reading module, an audio file traversing searching module and an audio processing module which are connected in sequence; the audio processing module comprises an audio file feature extraction module, a single sound channel processing module, a prompt tone detection module and an audio band deletion module which are connected in sequence; the audio file feature extraction module is connected with the audio file traversal searching module;

the device also comprises an error log generation module connected with the path reading module, the audio file traversal searching module and the prompt tone detection module.

Preferably, the device further comprises a window generation module connected to the path reading module.

Compared with the prior art, the method for positioning voice contents in batches has the following advantages:

1. Unified processing of audio in different formats. The system automatically identifies the characteristics and format of the current wav audio and normalizes the channel count (single or dual) and the amplitude, which improves applicability;

2. Audio processing under different file directories. The system traverses a given main directory by itself and searches for wav files everywhere under it; wav files that are found are sent to the processing module, and locations where no wav file is found are skipped automatically, so directories no longer need a uniform layout and convenience is improved;

3. The time for batch audio processing is shortened by about half. The system extracts only the audio that currently needs to be aligned and examines only the first half of the sample matrix, which greatly improves the operation speed;

4. With low background noise, the prompt tone can be located accurately, with an error of no more than 0.05 s; when the prompt tone appears repeatedly, the position of the last prompt tone, namely the position where the voice starts to be recorded correctly, can be located.

Drawings

FIG. 1 is a diagram illustrating an embodiment of processing an audio file according to the present invention;

FIG. 2 is a schematic diagram of an embodiment of a device for positioning voice contents in batches according to the present invention.

Detailed Description

The following provides a more detailed description of the present invention.


A prompt tone is played before recording starts, and audio is recorded after the prompt tone has been played. After the recorded corpus audio file is stored, the path of the folder holding it is written into a path recording file. The corpus audio file is usually a file with the extension WAV. The path of the path recording file is usually copied into a window generated by the program so that the file, and through it the audio files, can be read; the window generation module provides this path-copying function.

The program starts to traverse and read the paths in the path recording file; when a path does not exist or no audio file can be found under it, an error is reported and recorded in the generated error log.

After reading a path from the path recording file, the program traverses all files and folders under that path to search for audio files, usually files with the extension wav; when a path does not exist or no audio file can be found under it, a log file is generated, usually with log as its suffix.

While all paths contained in the path recording file are being traversed, the system automatically distinguishes files from folders under each path and keeps descending into folders to search for audio files; if no audio file is found in any folder under a path, an error is reported and an error log is generated to record that path.
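The traversal described above could be sketched in Python roughly as follows. This is an illustration under stated assumptions: the path recording file is taken to hold one directory path per line, and the function name `collect_wav_files` and the wording of the log messages are hypothetical.

```python
import os
from typing import List


def collect_wav_files(path_record_file: str, error_log: str) -> List[str]:
    """Return every wav file under the recorded paths; log paths that fail."""
    with open(path_record_file, encoding="utf-8") as record:
        roots = [line.strip() for line in record if line.strip()]

    found: List[str] = []
    errors: List[str] = []
    for root in roots:
        if not os.path.isdir(root):
            errors.append(f"path does not exist: {root}")
            continue
        # os.walk descends into every sub-folder under the recorded path.
        wavs = sorted(
            os.path.join(folder, name)
            for folder, _subfolders, names in os.walk(root)
            for name in names
            if name.lower().endswith(".wav")
        )
        if not wavs:
            errors.append(f"no audio file found under: {root}")
        found.extend(wavs)

    if errors:
        with open(error_log, "w", encoding="utf-8") as log:
            log.write("\n".join(errors) + "\n")
    return found
```

The error log is written only when at least one path fails, so a clean run leaves no log file behind; that choice is a convenience of the sketch rather than something the source requires.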

After the traversal finds a corpus audio file with the extension wav, the program automatically reads the characteristics of the wav file and handles single-channel and dual-channel audio files separately.

A Python library function can read audio automatically and directly obtain characteristic values of the current audio such as the number of sampling points, the sampling frequency and the number of channels. Whether the current audio is single-channel or dual-channel is determined by checking whether the number of channels is 1 or 2, and a different processing flow is followed for each. Dual-channel audio is processed by separating out the audio of one channel.
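A minimal sketch of the channel separation, assuming uncompressed PCM wav input and the standard-library `wave` module (the source does not say which library or which channel is used), might look like this. The function name `extract_first_channel` and the choice of the first channel are assumptions.

```python
import wave


def extract_first_channel(src_path: str, dst_path: str) -> int:
    """Write a single-channel copy of src_path and return the source channel count."""
    with wave.open(src_path, "rb") as src:
        channels = src.getnchannels()
        width = src.getsampwidth()      # bytes per sample
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    if channels == 1:
        mono = frames                   # already single-channel, keep as is
    elif channels == 2:
        # Frames are interleaved as [ch0][ch1][ch0][ch1]...; keep ch0 only.
        frame_size = width * channels
        mono = b"".join(
            frames[i:i + width] for i in range(0, len(frames), frame_size)
        )
    else:
        raise ValueError(f"unsupported channel count: {channels}")

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(width)
        dst.setframerate(rate)
        dst.writeframes(mono)
    return channels
```

Working on raw bytes keeps the sketch independent of the sample width, so 8-bit and 16-bit files take the same code path.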

After the audio file has been processed into a single-channel file, prompt tone detection and positioning are performed on the first half (by time) of the audio file. Specifically, the audio is checked against the characteristics of the prompt tone: for example, when the amplitude and period of a segment of the audio signal are found to be consistent with the settings of the prompt tone, the time starting point of that segment is recorded, and whether the segment is the prompt tone or some other sound is decided by counting how many times the amplitude corresponding to the prompt tone recurs periodically.

If the segment is judged to be the prompt tone, its start time is returned to the main program, the inherent length of the prompt tone is added to this time to obtain a new point, and all content before the new point is deleted. Audio files in which no prompt tone is detected are recorded and reported as errors.
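The arithmetic of this step is small enough to show directly. In the sketch below, the function names `cut_index` and `delete_before` and the representation of the audio as a list of samples are assumptions made for illustration.

```python
from typing import List, Sequence


def cut_index(tone_start_s: float, tone_length_s: float, sample_rate: int) -> int:
    """Sample index of the new point: tone start time plus the tone's inherent length."""
    return round((tone_start_s + tone_length_s) * sample_rate)


def delete_before(samples: Sequence[int], index: int) -> List[int]:
    """Keep only the samples at or after the new point."""
    return list(samples[index:])
```

For example, a prompt tone that starts at 0.5 s and lasts 0.25 s in audio sampled at 16000 Hz gives a new point at sample 12000, and everything before that sample is discarded.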

The waveform of the prompt tone is periodic and regular, with a distinctive periodicity: for example, a certain tone is repeated many times, or the waveform amplitude of a certain tone continuously reaches a certain value. This characteristic is therefore chosen as the detection mark. After the program reads the audio, it resides in memory as a sequence of sample values; when a segment of audio is found to match the characteristics of the prompt tone, the first sample of that segment must be recorded. Because the audio under detection is represented as amplitudes, samples with the same amplitude as the prompt tone may occur by chance. The prompt tone, however, is periodic, so the program checks whether the number of times these amplitude points recur periodically equals that of the prompt tone itself: if so, the audio is regarded as the prompt tone; if not, it is not.

If no prompt tone is detected in the first half of the audio, the entire audio file is considered erroneous, and its path is written into the error log.
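Restricting the search to the front part and logging failures could be sketched as below. The name `locate_in_front_part`, the `detector` callback (any function that returns a starting sample index or `None`) and the log message are assumptions of this example.

```python
from typing import Callable, List, Optional, Sequence

Detector = Callable[[Sequence[float]], Optional[int]]


def locate_in_front_part(path: str, samples: Sequence[float], detector: Detector,
                         error_lines: List[str],
                         front_fraction: float = 0.5) -> Optional[int]:
    """Run the detector on the front part only; record the path if nothing is found."""
    if not 0.5 <= front_fraction <= 1.0:
        raise ValueError("the front part must cover at least the first half")
    limit = int(len(samples) * front_fraction)
    start = detector(samples[:limit])   # later samples are never examined
    if start is None:
        error_lines.append(f"no prompt tone detected: {path}")
    return start
```

Because only `samples[:limit]` is passed to the detector, the amount of data examined per file is roughly halved when `front_fraction` is 0.5.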

Prompt tone detection is then performed again on the audio in which a prompt tone was detected. Because audio in which no prompt tone was initially detected has already been treated as erroneous, this step handles only audio that has already undergone prompt tone processing.

Detection continues on the audio that remains after the segment preceding the prompt tone has been deleted, and if no prompt tone is detected again, the file is saved. Audio in which a new prompt tone is detected is repositioned: the time starting point of the prompt tone is returned to the main program, the inherent length of the prompt tone is added to obtain the end point of the prompt tone, and all content before that end point is deleted. The procedure ends once all wav audio files have been processed in this way.

An audio file in which multiple prompt tones are detected usually indicates that an error or interruption occurred during recording and the speaker restarted the recording. Because the recorder stays on throughout each speaker's recording session, multiple prompt tones can appear; in that case the position of the last prompt tone is taken as the reference, and the audio after the last prompt tone is kept.

As shown in FIG. 1, a first prompt tone detection may be performed on the audio segment before the time midpoint of an audio file; after the prompt tone is detected, the preceding audio segment, including the prompt tone, is deleted. The audio remaining after the deletion then undergoes a second prompt tone detection; if a prompt tone is detected again, the audio segment before the new prompt tone is deleted again, and so on until no new prompt tone is detected, at which point the remaining audio is saved.
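The repeated detect-and-delete cycle of FIG. 1 can be sketched as a simple loop. The function name `strip_all_prompt_tones`, the `detector` callback and the sample-based `tone_length` are assumptions made for illustration; for brevity the sketch passes the whole remaining audio to the detector on every pass, and a caller could limit the first pass to the front part.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Detector = Callable[[Sequence[int]], Optional[int]]


def strip_all_prompt_tones(samples: Sequence[int], detector: Detector,
                           tone_length: int) -> Tuple[List[int], int]:
    """Delete audio up to the end of each detected prompt tone until none remains.

    Returns the remaining samples and the number of prompt tones removed.
    """
    if tone_length <= 0:
        raise ValueError("tone_length must be positive")
    remaining = list(samples)
    removed = 0
    while True:
        start = detector(remaining)
        if start is None:
            # No new prompt tone: what is left is the audio to save.
            return remaining, removed
        # Delete everything before the end point of the prompt tone.
        remaining = remaining[start + tone_length:]
        removed += 1
```

A result with zero removed tones corresponds to the error case in which no prompt tone was found at all, and a count above one corresponds to a recording that was restarted.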

The method for positioning voice contents in batches can be carried out with the device for positioning voice contents in batches and implemented in Python software. As shown in FIG. 2, the device comprises a prompt tone broadcasting module, an audio recording module, a path reading module, an audio file traversal searching module and an audio processing module which are connected in sequence; the audio processing module comprises an audio file feature extraction module, a single sound channel processing module, a prompt tone detection module and an audio band deletion module which are connected in sequence; the audio file feature extraction module is connected with the audio file traversal searching module.

Through the error log generated by the error log generating module, a user can conveniently search invalid paths and invalid audio files and correct errors in the recording process in time.

The foregoing describes preferred embodiments of the present invention. Unless they are clearly contradictory or one presupposes another, the preferred embodiments may be combined in any manner. The specific parameters in the embodiments and examples serve only to illustrate the inventors' verification process clearly and are not intended to limit the scope of protection of the invention, which is defined by the claims. Equivalent structural changes made using the description and drawings of the present invention likewise fall within the scope of protection of the present invention.

Claims

1. A method for batch locating voice content, comprising the steps of:

repeating the steps S41-S42 until no new prompt tone is detected;

2. The method for positioning voice contents in batches according to claim 1, wherein the prompt tone is a periodically repeating audio signal.

3. The method for positioning voice contents in batches according to claim 2, wherein the prompt tone is detected as follows: the audio file is scanned, and when an audio segment matching the amplitude characteristic of the prompt tone is found, the starting point of that segment is recorded; it is then checked whether audio matching the periodic characteristic of the prompt tone appears in the following periods, and the segment is marked as the prompt tone if the number of periods is the same.

4. The method for positioning voice contents in batches according to claim 1, wherein the data processing in steps S1-S5 is implemented in Python.

5. The method for positioning voice contents in batches according to claim 4, wherein the specific process of processing into a single-channel audio file in step S2 is as follows: a Python library function is called to read the audio file automatically and obtain the number of sampling points, the sampling frequency and the number of sound channels of the current audio file; whether the current audio is single-channel or dual-channel is determined by checking whether the number of channels is 1 or 2, and for a dual-channel audio file the audio of one channel is separated out as a single-channel audio file.

6. A device for positioning voice contents in batches is characterized by comprising a prompt tone broadcasting module, an audio recording module, a path reading module, an audio file traversing searching module and an audio processing module which are connected in sequence; the audio processing module comprises an audio file feature extraction module, a single sound channel processing module, a prompt tone detection module and an audio band deletion module which are connected in sequence; the audio file feature extraction module is connected with the audio file traversal searching module;

7. The device for positioning voice contents in batches according to claim 6, further comprising a window generation module connected to the path reading module.