WO2020186695A1 - Voice information batch processing method and apparatus, computer device, and storage medium

Info

Publication number
WO2020186695A1
Authority
WO
WIPO (PCT)
Prior art keywords
preset
script
voice information
running
voice
Prior art date
Application number
PCT/CN2019/103345
Other languages
French (fr)
Chinese (zh)
Inventor
王涛
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2020186695A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/16: Speech classification or search using artificial neural networks
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of data processing, and in particular to a method, device, computer equipment and storage medium for batch processing of voice information.
  • The embodiments of the present application provide a batch processing method, device, computer equipment, and storage medium for voice information, which can efficiently and accurately convert multiple pieces of to-be-processed voice information in a uniform way and reduce errors in the conversion process.
  • An embodiment of the present application provides a method for batch processing of voice information, the method including:
  • The preset Bash script includes at least one preset sub-running script, and each sub-running script is used to implement batch processing of all voice information to be processed.
  • The quantity of the target voice information is less than or equal to the quantity of the voice information to be processed;
  • An embodiment of the present application also provides a batch processing device for voice information, which includes:
  • an acquisition unit configured to obtain a preset training set if an information processing instruction is received, the training set including a plurality of pieces of to-be-processed voice information;
  • a batch processing unit configured to sequentially call and run the sub-running scripts in the preset Bash script according to the information processing instruction, so that each time one sub-running script is executed, all voice information to be processed is batch processed accordingly, until all sub-running scripts have finished running and multiple pieces of target voice information are obtained, wherein the preset Bash script includes at least one preset sub-running script, each sub-running script is used to implement batch processing of all voice information to be processed, and the quantity of the target voice information is less than or equal to the quantity of the voice information to be processed;
  • a noise removal unit configured to filter all target voice information through preset voice activation detection to obtain intermediate voice information after noise removal;
  • a framing unit configured to perform framing processing on all intermediate voice information according to preset framing rules to obtain test voice information for training a speech recognition model.
  • An embodiment of the present application also provides a computer device, which includes a memory and a processor; the memory stores a computer program, and the processor implements the above method when executing the computer program.
  • An embodiment of the present application also provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program implements the foregoing method when executed by a processor.
  • FIG. 1 is a schematic flowchart of a method for batch processing of voice information provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a sub-flow of a method for batch processing of voice information provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a sub-flow of a method for batch processing of voice information provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a sub-flow of a method for batch processing of voice information provided by an embodiment of the present application.
  • FIG. 5 is a schematic block diagram of an apparatus for batch processing of voice information according to an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of a batch processing unit of a voice information batch processing apparatus provided by an embodiment of the present application.
  • FIG. 7 is another schematic block diagram of a batch processing unit of a voice information batch processing apparatus provided by an embodiment of the present application.
  • FIG. 8 is another schematic block diagram of a batch processing unit of a voice information batch processing apparatus provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of the structural composition of a computer device provided by an embodiment of the present application.
  • The batch processing method of voice information is applied to a management server.
  • Before the management server trains a neural network on the training set, it performs batch preprocessing on the to-be-processed voice information in the acquired training set, for example: removing damaged or too-short voice information from the training set; converting the audio format and sampling rate of the voice information to a uniform audio format and sampling rate; and renaming all the voice information according to specific rules. This batch processing efficiently and accurately realizes the unified conversion of the to-be-processed voice information in the training set, and effectively reduces the errors that arise when each piece of voice information is fully processed and converted before the next one is started, so that the neural network can be trained accurately.
  • The method includes steps S101 to S104.
  • Step S101: If an information processing instruction is received, a preset training set is obtained, the training set including a plurality of pieces of to-be-processed voice information.
  • The training set can be preset; that is, voice information can be collected from various applications capable of obtaining voice information and then stored. The voice information stored in the training set is the voice information to be processed.
  • When the management server receives the information processing instruction initiated by the user, it obtains the preset training set, that is, the multiple pieces of to-be-processed voice information in the training set, to facilitate subsequent operations.
  • Step S102: According to the information processing instruction, call and run the sub-running scripts in the preset Bash script in turn, so that each time one sub-running script is executed, all voice information to be processed is batch processed accordingly, until all sub-running scripts have finished running and multiple pieces of target voice information are obtained.
  • The preset Bash script includes at least one preset sub-running script, each sub-running script is used to realize batch processing of all voice information to be processed, and the quantity of the target voice information is less than or equal to the quantity of the voice information to be processed.
  • The preset Bash script can integrate multiple preset sub-running scripts.
  • Each sub-running script can batch process all to-be-processed audio files in the same processing step.
  • Running one sub-running script performs the same conversion or change on all pending audio files, and the management server can call another sub-running script only after all pending audio files have completed the corresponding processing, so that another conversion or change is applied on top of the previous one.
  • The management server can sequentially call the sub-running scripts in the preset Bash script according to the information processing instruction. Each sub-running script is called once to perform the corresponding batch processing on all voice information to be processed, and then the next sub-running script in the Bash script is called, until all sub-running scripts have been run and multiple pieces of fully converted or changed target voice information are obtained.
  • Each batch conversion or change is performed only after the previous one has completed for all voice information to be processed, which can effectively reduce errors caused by the large number of voice information items and the many conversion steps in the conversion process, thereby greatly improving the processing efficiency of the voice information to be processed.
  • The management server can generally execute the Bash script through Python; that is, Python can execute the preset sub-running scripts in the Bash script in turn to batch process the voice information to be processed in the training set, reducing the errors of item-by-item iterative processing and improving the efficiency and accuracy of conversion.
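The sequential calling described for step S102 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `run_sub_scripts`, the convention that each sub-running script takes the data directory as its only argument, and the injectable `runner` parameter are all assumptions made for the example.

```python
import subprocess
from pathlib import Path


def run_sub_scripts(script_paths, data_dir, runner=subprocess.run):
    """Run each sub-running script once, in order, over the whole data directory.

    A later script starts only after the previous one has exited successfully,
    which mirrors the "finish one batch step for all files, then move on" flow.
    `runner` defaults to subprocess.run and can be swapped out for testing.
    """
    completed = []
    for script in script_paths:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit status, so a failed batch step stops the pipeline
        # instead of feeding half-processed files to the next step.
        runner(["bash", str(script), str(data_dir)], check=True)
        completed.append(Path(script).name)
    return completed
```

Calling `run_sub_scripts(["convert.sh", "filter.sh", "rename.sh"], "training_set")` would run the three hypothetical scripts one after another and return their names once all have finished.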
  • Step S102 may include steps S201 to S202.
  • S201: Call the first running script in the preset Bash script according to the information processing instruction.
  • The management server may call the first running script in the preset Bash script according to the received information processing instruction, so as to facilitate subsequent processing.
  • The first running script can convert the audio format and sampling rate of all voice information to be processed in the preset training set.
  • The first running script may be an FFmpeg script.
  • FFmpeg is a set of open-source computer programs that can record and convert digital audio and video and turn them into streams.
  • The FFmpeg script can convert the audio format and sampling rate of the voice information to be processed.
  • S202: Run the first running script to perform audio format conversion and sampling rate conversion on all voice information to be processed, so as to obtain multiple pieces of target voice information with the preset audio format and the preset sampling rate.
  • All the voice information to be processed can be converted into a unified audio format and a unified sampling rate.
  • When the management server runs the first running script, it can batch convert all the voice information to be processed into target voice information with the preset audio format and preset sampling rate set in the first running script.
  • Common audio formats include WAV, MIDI, MP3, RA, MP4, and other format types.
  • The preset audio format can be set to WAV; that is, any voice information whose audio format is not the preset audio format can be converted to WAV by running the first running script.
  • The sampling rate, also called sampling speed or sampling frequency, defines the number of samples extracted from a continuous signal per second to form a discrete signal, and is expressed in hertz (Hz).
  • Its reciprocal is called the sampling period or sampling time, which is the time interval between samples.
  • In plain terms, the sampling frequency is how many signal samples the computer collects per second.
  • The sampling rate indicates how many sampling points are collected per second: 8k means 8000 samples per second, and 16k means 16000 samples per second. That is, if the preset sampling rate is 8k and the sampling rate of the voice information to be converted is 16k, the first running script converts the sampling rate of that voice information from 16k to 8k.
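The format and sampling rate conversion of step S202 could be driven from Python roughly as follows. This sketch only builds the FFmpeg command lines and does not run them; the helper names `build_ffmpeg_command` and `build_batch`, the output directory layout, and the default of 8000 Hz are assumptions for illustration. The `-y`, `-i`, and `-ar` options are standard FFmpeg options.

```python
from pathlib import Path


def build_ffmpeg_command(src, dst_dir, sample_rate=8000):
    """Return the ffmpeg argument list that converts `src` to a WAV file
    at `sample_rate` Hz inside `dst_dir`."""
    dst = Path(dst_dir) / (Path(src).stem + ".wav")
    return [
        "ffmpeg",
        "-y",                     # overwrite an existing output file
        "-i", str(src),           # input file, any format ffmpeg can decode
        "-ar", str(sample_rate),  # output audio sampling rate in Hz
        str(dst),                 # the .wav extension selects the WAV format
    ]


def build_batch(sources, dst_dir, sample_rate=8000):
    """One command per pending file; every file gets the same target settings."""
    return [build_ffmpeg_command(s, dst_dir, sample_rate) for s in sources]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed. Building the commands separately from running them keeps the target format and sampling rate in one place for the whole batch.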
  • Where the preset Bash script includes a first running script for audio format conversion and a second running script for effective audio filtering, step S102 may include steps S301 to S304.
  • S301: Call the first running script in the preset Bash script according to the information processing instruction.
  • The management server may call the first running script in the preset Bash script according to the received information processing instruction, so as to facilitate subsequent processing.
  • The first running script can convert the audio format and sampling rate of all voice information to be processed in the preset training set.
  • S302: Run the first running script to perform audio format conversion and sampling rate conversion on all voice information to be processed, so as to obtain a corresponding number of pieces of first voice information with the preset audio format and the preset sampling rate.
  • All the voice information to be processed can be converted into a unified audio format and a unified sampling rate.
  • When the management server runs the first running script, it can batch convert all the voice information to be processed into first voice information with the preset audio format and preset sampling rate set in the first running script.
  • S303: Call the second running script in the preset Bash script.
  • The management server needs to call the second running script for effective audio filtering in the preset Bash script.
  • The preset specifications in the second running script set the conditions for screening voice information, so that voice information meeting the preset specifications can be selected from the plurality of pieces of first voice information as valid voice information.
  • The second running script may be SoX.
  • SoX can filter out valid voice information from the plurality of pieces of first voice information according to the preset specification.
  • S304: Run the second running script to filter all the first voice information, so as to obtain a plurality of pieces of target voice information meeting the preset specifications, the quantity of the target voice information being less than or equal to the quantity of the first voice information.
  • When the management server runs the second running script, it can filter all the first voice information according to the preset specifications to obtain the target voice information that meets the conditions, so after the screening the quantity of the target voice information is less than or equal to the quantity of the first voice information.
  • The preset specification may be a preset voice duration threshold. For example, if the duration of a piece of first voice information is lower than the preset threshold, that piece of first voice information is deleted.
  • The preset specification can also be a preset threshold for the number of sampling points of the voice information, a preset threshold for the scaling factor of the voice information, or a preset threshold for the maximum amplitude of the voice information.
  • Where the preset Bash script includes a first running script for audio format conversion, a second running script for effective audio filtering, and a third running script for renaming, step S102 may include steps S401 to S406.
  • S401: Call the first running script in the preset Bash script according to the information processing instruction.
  • The management server may call the first running script in the preset Bash script according to the received information processing instruction, so as to facilitate subsequent processing.
  • The first running script can convert the audio format and sampling rate of all voice information to be processed in the preset training set.
  • S402: Run the first running script to perform audio format conversion and sampling rate conversion on all voice information to be processed, so as to obtain a corresponding number of pieces of first voice information with the same audio format and sampling rate.
  • All the voice information to be processed can be converted into a unified audio format and a unified sampling rate.
  • When the management server runs the first running script, it can batch convert all the voice information to be processed into first voice information with the preset audio format and preset sampling rate set in the first running script.
  • S403: Call the second running script in the preset Bash script.
  • The management server needs to call the second running script for effective audio filtering in the preset Bash script.
  • The preset specifications in the second running script set the conditions for screening voice information, so that voice information meeting the preset specifications can be selected from the plurality of pieces of first voice information as valid voice information.
  • The second running script may be SoX.
  • SoX can filter out valid voice information from the plurality of pieces of first voice information according to the preset specification.
  • S404: Run the second running script to filter all the first voice information, so as to obtain a plurality of pieces of second voice information meeting the preset specifications, the quantity of the second voice information being less than or equal to the quantity of the first voice information.
  • When the management server runs the second running script, it can filter all the first voice information according to the preset specifications to obtain the second voice information that meets the conditions, so after the screening the quantity of the second voice information is less than or equal to the quantity of the first voice information.
  • The preset specification may be a preset voice duration threshold. For example, if the duration of a piece of first voice information is lower than the preset threshold, that piece of first voice information is deleted.
  • The preset specification can also be a preset threshold for the number of sampling points of the voice information, a preset threshold for the scaling factor of the voice information, or a preset threshold for the maximum amplitude of the voice information.
  • S405: Call the third running script in the preset Bash script.
  • The management server needs to call the third running script for renaming in the preset Bash script, so that the renamed voice information can be read more accurately and quickly.
  • A name format is preset in the third running script, so that the multiple pieces of second voice information can be renamed according to the preset name format.
  • The third running script is a renaming function, which may be the function rename() for renaming files.
  • S406: Run the third running script to rename all the second voice information, so as to obtain a corresponding number of pieces of target voice information with the preset name format.
  • The voice information in the training set can include multiple pieces generated by the same subject; that is, each subject can correspond to multiple pieces of different voice information.
  • Therefore, the second voice information needs to be renamed based on the preset name format and the existing information of the second voice information. After running the third running script, the management server obtains the corresponding renamed target voice information.
  • The naming of the target voice information conforms to the preset name format.
  • The quantity of the target voice information is equal to the quantity of the second voice information, and the two correspond one to one.
  • Step S103: Perform filtering processing on all target voice information through preset voice activation detection to obtain intermediate voice information after noise removal.
  • The voice activation detection is Voice Activity Detection (VAD), which can distinguish the speech signal from background noise in a voice signal, thereby improving the accuracy of training neural networks and reducing the time required for training.
  • The voice activation detection can trim the silence at the beginning and end of the voice information and reduce interference with subsequent steps. That is, the voice activation detection can filter all target voice information in a batch to obtain the corresponding multiple pieces of intermediate voice information after noise removal.
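The source does not say which VAD algorithm is used. As a stand-in, the sketch below trims leading and trailing silence with a simple short-time energy threshold; the function name `trim_silence`, the frame length of 160 samples, and the threshold value are assumptions chosen for illustration, and a production VAD would typically be more robust to noise.

```python
def trim_silence(samples, frame_len=160, energy_threshold=1e-4):
    """Drop leading and trailing frames whose mean energy is below the threshold.

    `samples` is a sequence of floats in [-1.0, 1.0]. Silence inside the
    utterance is kept; only the two ends are trimmed.
    """
    def is_speech(frame):
        return sum(x * x for x in frame) / len(frame) >= energy_threshold

    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    voiced = [i for i, frame in enumerate(frames) if is_speech(frame)]
    if not voiced:
        return []  # no frame reaches the threshold
    start = voiced[0] * frame_len
    end = min((voiced[-1] + 1) * frame_len, len(samples))
    return list(samples[start:end])
```

Applying `trim_silence` to every piece of target voice information in a loop gives the batch behaviour described for step S103.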
  • Step S104: Perform framing processing on all intermediate voice information according to preset framing rules to obtain test voice information for training a speech recognition model.
  • The management server also needs to perform framing processing on all intermediate voice information according to the preset framing rules, so as to obtain a corresponding number of pieces of framed test voice information.
  • The test voice information can be used to train a speech recognition model, so as to obtain a model capable of the corresponding speech recognition.
  • The preset framing rule may refer to framing the sound through a moving window function; that is, the voice information is cut into small segments, each segment is called a frame, and adjacent frames generally overlap.
  • Step S104 may specifically include: performing framing processing on the intermediate voice information through the Enframe function to obtain test voice information for training a speech recognition model.
  • The Enframe function is a specific framing function; after calling it, the management server can perform unified framing processing on all intermediate voice information, so as to obtain the final test voice information for training.
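Framing with overlapping frames can be sketched as follows. The Python function `enframe` here is a hypothetical stand-in written for this example, not the Enframe function the source refers to; it applies no window weighting and simply slices the signal, and dropping the incomplete final frame is a design choice of the sketch.

```python
def enframe(samples, frame_len, hop):
    """Split `samples` into frames of `frame_len` samples, advancing by `hop`.

    With hop < frame_len, consecutive frames overlap by frame_len - hop samples.
    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    if len(samples) < frame_len:
        return []
    n_frames = 1 + (len(samples) - frame_len) // hop
    return [list(samples[i * hop:i * hop + frame_len]) for i in range(n_frames)]
```

For a 10-sample signal with `frame_len=4` and `hop=2`, this yields four frames, each sharing two samples with its neighbour. At an 8k sampling rate, a 25 ms frame with a 10 ms hop would correspond to `frame_len=200` and `hop=80`.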
  • The embodiments of the present application can efficiently and accurately realize the unified conversion of multiple pieces of to-be-processed voice information in the training set and reduce errors in the conversion process, so that neural network training can be carried out accurately.
  • The computer program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments.
  • The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), etc.
  • An embodiment of the present application also proposes a device for batch processing of voice information.
  • The device 100 includes an acquisition unit 101, a batch processing unit 102, a noise removal unit 103, and a framing unit 104.
  • The acquisition unit 101 is configured to obtain a preset training set if an information processing instruction is received, the training set including a plurality of pieces of to-be-processed voice information.
  • The batch processing unit 102 is configured to sequentially call and run the sub-running scripts in the preset Bash script according to the information processing instruction, so that each time one sub-running script is executed, all voice information to be processed is batch processed accordingly, until all sub-running scripts have finished running and multiple pieces of target voice information are obtained.
  • The preset Bash script includes at least one preset sub-running script, each sub-running script is used to realize batch processing of all voice information to be processed, and the quantity of the target voice information is less than or equal to the quantity of the voice information to be processed.
  • Where the preset Bash script includes a first running script for converting audio format and sampling rate, the batch processing unit 102 may include a first calling unit 201 and a first running unit 202.
  • The first calling unit 201 is configured to call the first running script in the preset Bash script according to the information processing instruction.
  • The first running unit 202 is configured to run the first running script to perform audio format conversion and sampling rate conversion on all voice information to be processed, thereby obtaining multiple pieces of target voice information with the preset audio format and the preset sampling rate.
  • Where the preset Bash script includes a first running script for audio format conversion and a second running script for effective audio filtering, the batch processing unit 102 may include a first calling unit 301, a first running unit 302, a second calling unit 303, and a second running unit 304.
  • The first calling unit 301 is configured to call the first running script in the preset Bash script according to the information processing instruction.
  • The first running unit 302 is configured to run the first running script to perform audio format conversion and sampling rate conversion on all the voice information to be processed, so as to obtain a corresponding number of pieces of first voice information with the preset audio format and the preset sampling rate.
  • The second calling unit 303 is configured to call the second running script in the preset Bash script.
  • The second running unit 304 is configured to run the second running script to filter all the first voice information, so as to obtain a plurality of pieces of target voice information meeting the preset specifications, the quantity of the target voice information being less than or equal to the quantity of the first voice information.
  • Where the preset Bash script includes a first running script for audio format conversion, a second running script for effective audio filtering, and a third running script for renaming, the batch processing unit 102 may include a first calling unit 401, a first running unit 402, a second calling unit 403, a second running unit 404, a third calling unit 405, and a third running unit 406.
  • The first calling unit 401 is configured to call the first running script in the preset Bash script according to the information processing instruction.
  • The first running unit 402 is configured to run the first running script to perform audio format conversion and sampling rate conversion on all voice information to be processed, so as to obtain a corresponding number of pieces of first voice information with the same audio format and sampling rate.
  • The second calling unit 403 is configured to call the second running script in the preset Bash script.
  • The second running unit 404 is configured to run the second running script to filter all the first voice information, so as to obtain a plurality of pieces of second voice information that meet the preset specifications, the quantity of the second voice information being less than or equal to the quantity of the first voice information.
  • The third calling unit 405 is configured to call the third running script in the preset Bash script.
  • The third running unit 406 is configured to run the third running script to rename all the second voice information, so as to obtain a corresponding number of pieces of target voice information with the preset name format.
  • The noise removal unit 103 is configured to perform filtering processing on all target voice information through preset voice activation detection to obtain intermediate voice information after noise removal.
  • The framing unit 104 is configured to perform framing processing on all intermediate voice information according to a preset framing rule to obtain test voice information for training a speech recognition model.
  • The framing unit 104 may be specifically configured to perform framing processing on the intermediate voice information through the Enframe function to obtain test voice information for training a speech recognition model.
  • The above acquisition unit 101, batch processing unit 102, noise removal unit 103, and framing unit 104 can be embedded in hardware form in, or be independent of, the batch processing device for voice information, or can be stored in software form in the memory of the batch processing device for voice information, so that the processor can call and execute the operations corresponding to the above units.
  • The processor can be a central processing unit (CPU), a microprocessor, a single-chip microcomputer, etc.
  • The foregoing device for batch processing of voice information can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in FIG. 9.
  • FIG. 9 is a schematic diagram of the structural composition of a computer device of this application.
  • The computer device 500 can be a server, which can be an independent server or a server cluster composed of multiple servers.
  • The computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
  • The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032.
  • When the computer program 5032 is executed, the processor 502 can execute a method for batch processing of voice information.
  • The processor 502 provides computing and control capabilities and supports the operation of the entire computer device 500.
  • The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503.
  • The network interface 505 is used for network communication with other devices.
  • The structure shown in FIG. 9 is only a block diagram of the part of the structure related to the solution of the present application, and does not limit the computer device 500 to which the solution is applied.
  • A specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.
  • The processor 502 is configured to run the computer program 5032 stored in the memory, so as to implement the method for batch processing of voice information in any of the foregoing embodiments.
  • the processor 502 may be a central processing unit (CPU); it may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
  • the computer program may be stored in a storage medium, and the storage medium is a computer-readable storage medium.
  • the computer program is executed by at least one processor in the computer system to implement the process steps of the foregoing method embodiment.
  • the storage medium stores a computer program, and when the computer program is executed by the processor, the processor executes the voice information batch processing method in any of the above embodiments.
  • the storage medium is a physical, non-transitory storage medium, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk, or any other medium that can store program code.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of each unit is only a logical function division, and there may be other division methods in actual implementation.
  • multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented.
  • the steps in the methods of the embodiments of the present application may be reordered, merged, or deleted according to actual needs.
  • the units in the device in the embodiment of the present application may be combined, divided, and deleted according to actual needs.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a storage medium.
  • the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a terminal, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Disclosed are a voice information batch processing method and apparatus, a computer device, and a storage medium. The method comprises: if an information processing instruction is received, obtaining a preset training set, the training set comprising a plurality of voice information to be processed; sequentially invoking and running sub-run scripts in a preset Bash script according to the information processing instruction to perform corresponding batch processing on all the voice information to be processed, so as to obtain a plurality of target voice information; filtering all target voice information by means of preset voice activation detection to obtain intermediate voice information after noise removal; and performing frame segmentation on all the intermediate voice information according to a preset frame segmentation rule to obtain test voice information for training a voice recognition model.

Description

Method, apparatus, computer device, and storage medium for batch processing of voice information
This application claims priority to Chinese patent application No. 201910197848.0, filed with the Chinese Patent Office on March 15, 2019 and entitled "Method, apparatus, computer device, and storage medium for batch processing of voice information", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of data processing, and in particular to a method, apparatus, computer device, and storage medium for batch processing of voice information.
Background
Speech recognition projects usually require collecting a large amount of voice information from various channels and using it as training samples in a training set to train a neural network, thereby obtaining a corresponding recognition model for feature-based speech recognition. To ensure that the training of the neural network proceeds smoothly and that the resulting recognition model is accurate, the collected voice information usually has to be preprocessed before training. Preprocessing a large amount of voice information can only be completed through step-by-step iteration, and because of the large data volume this repeated iterative processing is highly prone to operational errors, which leads to inaccurate processing of the voice information.
Summary of the Invention
Embodiments of the present application provide a method, apparatus, computer device, and storage medium for batch processing of voice information, which can efficiently and accurately perform a unified conversion of multiple pieces of to-be-processed voice information and reduce errors in the conversion process.
In a first aspect, an embodiment of the present application provides a method for batch processing of voice information, the method comprising:
if an information processing instruction is received, obtaining a preset training set, the training set comprising multiple pieces of to-be-processed voice information;
sequentially invoking and running, according to the information processing instruction, the sub-run scripts in a preset Bash script, such that running each sub-run script performs the corresponding batch processing on all the to-be-processed voice information, until all the sub-run scripts have been run, thereby obtaining multiple pieces of target voice information, wherein the preset Bash script comprises at least one preset sub-run script, each sub-run script is used to batch-process all the to-be-processed voice information, and the number of pieces of target voice information is less than or equal to the number of pieces of to-be-processed voice information;
filtering all the target voice information by means of preset voice activation detection to obtain intermediate voice information after noise removal; and
performing frame segmentation on all the intermediate voice information according to a preset framing rule to obtain test voice information for training a voice recognition model.
In a second aspect, an embodiment of the present application further provides an apparatus for batch processing of voice information, the apparatus comprising:
an obtaining unit, configured to obtain a preset training set if an information processing instruction is received, the training set comprising multiple pieces of to-be-processed voice information;
a batch processing unit, configured to sequentially invoke and run, according to the information processing instruction, the sub-run scripts in a preset Bash script, such that running each sub-run script performs the corresponding batch processing on all the to-be-processed voice information, until all the sub-run scripts have been run, thereby obtaining multiple pieces of target voice information, wherein the preset Bash script comprises at least one preset sub-run script, each sub-run script is used to batch-process all the to-be-processed voice information, and the number of pieces of target voice information is less than or equal to the number of pieces of to-be-processed voice information;
a noise removal unit, configured to filter all the target voice information by means of preset voice activation detection to obtain intermediate voice information after noise removal; and
a framing unit, configured to perform frame segmentation on all the intermediate voice information according to a preset framing rule to obtain test voice information for training a voice recognition model.
In a third aspect, an embodiment of the present application further provides a computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the above method when executing the computer program.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, wherein the storage medium stores a computer program, and the computer program implements the above method when executed by a processor.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for batch processing of voice information provided by an embodiment of the present application;
FIG. 2 is a schematic sub-flowchart of a method for batch processing of voice information provided by an embodiment of the present application;
FIG. 3 is a schematic sub-flowchart of a method for batch processing of voice information provided by an embodiment of the present application;
FIG. 4 is a schematic sub-flowchart of a method for batch processing of voice information provided by an embodiment of the present application;
FIG. 5 is a schematic block diagram of an apparatus for batch processing of voice information provided by an embodiment of the present application;
FIG. 6 is a schematic block diagram of the batch processing unit of an apparatus for batch processing of voice information provided by an embodiment of the present application;
FIG. 7 is another schematic block diagram of the batch processing unit of an apparatus for batch processing of voice information provided by an embodiment of the present application;
FIG. 8 is another schematic block diagram of the batch processing unit of an apparatus for batch processing of voice information provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of the structure of a computer device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.
It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terms used in this specification are only for the purpose of describing specific embodiments and are not intended to limit the application. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
Please refer to FIG. 1, which is a schematic flowchart of a method for batch processing of voice information provided by an embodiment of the present application. The method is applied in a management server. Before training a neural network with a training set, the management server performs batch preprocessing on the to-be-processed voice information in the acquired training set, for example: removing damaged or overly short to-be-processed voice information from the training set; converting the audio format and sampling rate of the to-be-processed voice information to a unified audio format and sampling rate; and renaming all the to-be-processed voice information according to a specific rule. Processing in separate batch passes in this way makes it possible to efficiently and accurately perform a unified conversion of the multiple pieces of to-be-processed voice information in the training set, and effectively reduces the errors that arise when each piece of voice information is fully processed and converted before the next one is handled, so that the neural network can be trained accurately. As shown in FIG. 1, the method includes steps S101 to S104.
Step S101: if an information processing instruction is received, obtain a preset training set, the training set comprising multiple pieces of to-be-processed voice information.
In this embodiment, in order to train a neural network and obtain a corresponding voice recognition model, the voice information in the acquired training set needs to be preprocessed in batches so that it meets the requirements for training the neural network and improves the accuracy of the trained voice recognition model. The training set may be preset, that is, voice information may be collected from various applications capable of acquiring voice information and then stored; the voice information stored in the training set at this point is the to-be-processed voice information. When the management server receives an information processing instruction initiated by a user, it obtains the preset training set, that is, the multiple pieces of to-be-processed voice information in the training set, for use in subsequent operations.
Step S102: according to the information processing instruction, sequentially invoke and run the sub-run scripts in a preset Bash script, such that running each sub-run script performs the corresponding batch processing on all the to-be-processed voice information, until all the sub-run scripts have been run, thereby obtaining multiple pieces of target voice information, wherein the preset Bash script comprises at least one preset sub-run script, each sub-run script is used to batch-process all the to-be-processed voice information, and the number of pieces of target voice information is less than or equal to the number of pieces of to-be-processed voice information.
In this embodiment, the preset Bash script can integrate multiple preset sub-run scripts, and each sub-run script applies one and the same processing step to all the to-be-processed audio files as a batch. Specifically, each time a sub-run script is run, the same conversion or change is applied to all the to-be-processed audio files, and only after all the to-be-processed audio files have completed that processing can the management server invoke another sub-run script, which applies a further conversion or change on top of the result of the previous one.
Specifically, the management server can sequentially invoke the sub-run scripts in the preset Bash script according to the information processing instruction. Each sub-run script is run once when it is invoked, performing the corresponding batch processing on all the to-be-processed voice information; the next sub-run script in the Bash script is then invoked, and so on until all the sub-run scripts have been run, yielding multiple pieces of target voice information on which every conversion or change has been completed. Each batch conversion or change must finish for all the to-be-processed voice information before the next batch conversion or change begins. This effectively reduces the errors caused by the large number of pieces of to-be-processed voice information and the large number of conversion steps, and thereby greatly improves the efficiency of processing the to-be-processed voice information.
The management server can generally execute the Bash script through Python, that is, it can use Python to execute the multiple preset sub-run scripts in the Bash script one after another, thereby performing the batch processing operations on the to-be-processed voice information in the training set in sequence, reducing the errors that occur in step-by-step iterative processing and improving the efficiency and accuracy of the conversion.
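The sequential invocation described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation used in the application: the function name run_stages and the demo commands are hypothetical, and in practice each command might be something like ["bash", "convert.sh", "train_dir"], where the script name and directory are also assumptions. The point it shows is that each stage must finish for the whole training set before the next stage starts.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_stages(stages: Sequence[Sequence[str]],
               runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> int:
    """Run each stage command in order and return the number of stages completed.

    Each command stands in for one sub-run script (hypothetical naming).
    """
    completed = 0
    for command in stages:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a later stage never starts on a half-processed training set.
        runner(list(command), check=True)
        completed += 1
    return completed


if __name__ == "__main__":
    # Stand-in commands so the sketch runs without any audio tools installed.
    demo = [
        [sys.executable, "-c", "print('stage 1: convert format and sampling rate')"],
        [sys.executable, "-c", "print('stage 2: filter valid audio')"],
        [sys.executable, "-c", "print('stage 3: rename')"],
    ]
    run_stages(demo)
```

Stopping at the first failing stage is a design choice of this sketch; the application itself does not state how a failed sub-run script is handled.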
In an embodiment, as shown in FIG. 2, step S102 may include steps S201 to S202.
S201: invoke a first run script in the preset Bash script according to the information processing instruction.
The management server may invoke the first run script in the preset Bash script according to the received information processing instruction for subsequent processing. The first run script converts the audio format and sampling rate of all the to-be-processed voice information in the preset training set.
Optionally, the first run script may be an FFmpeg script. FFmpeg is a suite of open-source computer programs that can record and convert digital audio and video and turn them into streams. In this application, the FFmpeg script converts the audio format and the sampling rate of the to-be-processed voice information.
S202: run the first run script to perform audio format conversion and sampling rate conversion on all the to-be-processed voice information, thereby obtaining multiple pieces of target voice information having a preset audio format and a preset sampling rate.
To allow features to be extracted quickly from the to-be-processed voice information in the training set while the neural network is being trained, all the to-be-processed voice information can be converted to a unified audio format and a unified sampling rate. After the management server runs the first run script, all the to-be-processed voice information is converted in batch, according to the preset audio format and preset sampling rate set in the first run script, into target voice information having the preset audio format and the preset sampling rate.
Specifically, common audio formats include WAV, MIDI, MP3, RA, MP4, and others. To unify the audio format, the preset audio format may be set to WAV, so that any voice information whose audio format is not the preset format is converted to WAV by running the first run script.
The sampling rate, also called the sampling frequency, defines the number of samples per second taken from a continuous signal to form a discrete signal, and is expressed in hertz (Hz). The reciprocal of the sampling rate is the sampling period, or sampling time, which is the time interval between samples. Put simply, the sampling rate is the number of signal samples the computer collects per second: 8k means 8000 samples per second, and 16k means 16000 samples per second. For example, if the preset sampling rate is 8k and the sampling rate of a piece of to-be-processed voice information is 16k, the first run script converts the sampling rate of that voice information from 16k to 8k.
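As a hedged illustration of the conversion step, the Python sketch below builds one FFmpeg command per input file. The helper names and the directory layout are hypothetical and do not come from the application; the FFmpeg options used are the standard ones ("-i" for the input file, "-ar" for the output audio sampling rate, "-y" to overwrite an existing output), and the output container is chosen by the ".wav" extension. The commands are only constructed here, not executed.

```python
from pathlib import Path
from typing import List


def build_ffmpeg_command(src: Path, out_dir: Path, sample_rate_hz: int = 8000) -> List[str]:
    """Return an FFmpeg command that converts src to WAV at the given sampling rate."""
    dst = out_dir / (src.stem + ".wav")  # preset audio format: WAV
    return ["ffmpeg", "-y", "-i", str(src), "-ar", str(sample_rate_hz), str(dst)]


def sampling_period_seconds(sample_rate_hz: int) -> float:
    """Sampling period is the reciprocal of the sampling rate."""
    return 1.0 / sample_rate_hz


if __name__ == "__main__":
    # Hypothetical input files; one command is produced for every file in the batch.
    for name in ("clip_a.mp3", "clip_b.wav"):
        print(" ".join(build_ffmpeg_command(Path("raw") / name, Path("converted"))))
    print(sampling_period_seconds(8000))   # 0.000125 s between samples at 8k
    print(sampling_period_seconds(16000))  # 6.25e-05 s between samples at 16k
```

Each resulting list could be passed to subprocess.run on a machine where FFmpeg is installed.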
In an embodiment, as shown in FIG. 3, the preset Bash script includes a first run script for audio format conversion and a second run script for valid-audio screening, and step S102 may include steps S301 to S304.
S301: invoke the first run script in the preset Bash script according to the information processing instruction.
The management server may invoke the first run script in the preset Bash script according to the received information processing instruction for subsequent processing. The first run script converts the audio format and sampling rate of all the to-be-processed voice information in the preset training set.
S302: run the first run script to perform audio format conversion and sampling rate conversion on all the to-be-processed voice information, thereby obtaining a corresponding number of pieces of first voice information having a preset audio format and a preset sampling rate.
To allow features to be extracted quickly from the to-be-processed voice information in the training set while the neural network is being trained, all the to-be-processed voice information can be converted to a unified audio format and a unified sampling rate. After the management server runs the first run script, all the to-be-processed voice information is converted in batch, according to the preset audio format and preset sampling rate set in the first run script, into first voice information having the preset audio format and the preset sampling rate.
S303: invoke the second run script in the preset Bash script.
To screen the first voice information, whose audio format and sampling rate have already been converted, the management server invokes the second run script in the preset Bash script, which is used for valid-audio screening. The preset specification in the second run script sets the conditions for screening, so that voice information meeting the preset specification can be selected from the multiple pieces of first voice information as valid voice information. Optionally, the second run script may be SoX; as a voice processing tool, SoX can select valid voice information from the multiple pieces of first voice information according to the preset specification.
S304: run the second run script to screen all the first voice information, thereby obtaining multiple pieces of target voice information meeting the preset specification, the number of pieces of target voice information being less than or equal to the number of pieces of first voice information.
After the management server runs the second run script, all the first voice information is screened according to the preset specification to obtain the target voice information that meets the conditions; after screening, the number of pieces of target voice information is therefore less than or equal to the number of pieces of first voice information. The preset specification may be a preset threshold on voice duration: for example, if the duration of a piece of first voice information is below the preset threshold, that piece of first voice information is deleted. Similarly, the preset specification may be a preset threshold on the number of sampling points of the voice information, a preset threshold on its scaling factor, or a preset threshold on its maximum amplitude.
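The duration-threshold screening can be sketched as follows. Note the substitution: the application names SoX for this step, whereas this sketch uses Python's standard wave module so that it runs without external tools. The function names and the 0.5-second threshold are illustrative assumptions, and the sketch returns the kept files instead of deleting the rejected ones. Files that cannot be parsed as WAV are treated as damaged and dropped.

```python
import tempfile
import wave
from pathlib import Path
from typing import Iterable, List


def wav_duration_seconds(path: Path) -> float:
    """Duration = number of frames / frames per second."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def filter_by_duration(paths: Iterable[Path], min_seconds: float) -> List[Path]:
    """Keep only readable WAV files at least min_seconds long."""
    kept = []
    for path in paths:
        try:
            duration = wav_duration_seconds(path)
        except (wave.Error, EOFError, OSError):
            continue  # unreadable or damaged file: treated as invalid
        if duration >= min_seconds:
            kept.append(path)
    return kept


def write_silence(path: Path, seconds: float, rate_hz: int = 8000) -> None:
    """Demo helper: write a mono 16-bit WAV file containing silence."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(seconds * rate_hz))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        long_clip, short_clip = Path(tmp) / "long.wav", Path(tmp) / "short.wav"
        write_silence(long_clip, 1.0)
        write_silence(short_clip, 0.1)
        kept = filter_by_duration([long_clip, short_clip], min_seconds=0.5)
        print([p.name for p in kept])  # only the 1-second clip passes
```

The number of files kept can never exceed the number of input files, which matches the "less than or equal to" relationship stated for this step.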
在一实施例中,如图4所述,所述预设的Bash脚本包括用于进行音频格式转换的第一运行脚本、用于进行有效音频筛选的第二运行脚本以及用于进行重命名的第三运行脚本,所述步骤S102可以包括步骤S401~S406。In one embodiment, as shown in FIG. 4, the preset Bash script includes a first running script for audio format conversion, a second running script for effective audio filtering, and a script for renaming. The third running script, the step S102 may include steps S401 to S406.
S401,根据所述信息处理指令调用预设的Bash脚本中的第一运行脚本。S401: Call a first running script in a preset Bash script according to the information processing instruction.
其中,管理服务器可以根据接收到的信息处理指令调用预设的Bash脚本中的第一运行脚本,以便于进行后续处理。所述第一运行脚本能够实现对预设的训练集中的所有的待处理语音信息进行音频格式和采样率的转换。The management server may call the first running script in the preset Bash script according to the received information processing instruction, so as to facilitate subsequent processing. The first running script can realize the conversion of audio format and sampling rate of all voice information to be processed in the preset training set.
S402,运行所述第一运行脚本以对所有的待处理语音信息进行音频格式转换以及采样率转换,从而得到相应数量的具有相同音频格式以及采样率的第一语音信息。S402: Run the first running script to perform audio format conversion and sample rate conversion on all voice information to be processed, so as to obtain a corresponding number of first voice information with the same audio format and sampling rate.
其中,为了使得训练集中的待测语音信息在训练神经网络的过程中快速地进行特征提取,可使得所有的待测语音信息转换为统一的音频格式以及统一的采样率。管理服务器运行了所述第一运行脚本后,可以根据第一运行脚本中设置的预设音频格式和预设采样率将所有的待处理语音信息批量地转换为具有预设音频格式和预设采样率的目标语音信息。Among them, in order to enable the voice information to be tested in the training set to quickly perform feature extraction in the process of training the neural network, all the voice information to be tested can be converted into a unified audio format and a unified sampling rate. After the management server runs the first running script, it can batch convert all the voice information to be processed into preset audio formats and preset samples according to the preset audio format and preset sampling rate set in the first running script Rate the target voice information.
S403,调用预设的Bash脚本中的第二运行脚本。S403: Call the second running script in the preset Bash script.
其中,为了对当前的已经转换音频格式和采样率的第一语音信息进行筛选,需要管理服务器调用预设的Bash脚本中的用于进行有效音频筛选的第二运行脚本。该第二运行脚本中的预设规格为筛选语音信息设定了条件,从而能够从多 个第一语音信息中筛选符合该预设规格的语音信息作为有效的语音信息。作为可选的,所述第二运行脚本可以是SOX,SOX作为语音处理工具,能够根据设置的预设规格从多个第一语音信息中筛选出有效的语音信息。Among them, in order to filter the current first voice information of which the audio format and sampling rate have been converted, the management server needs to call the second running script for effective audio filtering in the preset Bash script. The preset specifications in the second running script set conditions for screening voice information, so that voice information that meets the preset specifications can be selected from multiple first voice messages as valid voice information. Optionally, the second running script may be SOX. As a voice processing tool, SOX can filter out effective voice information from a plurality of first voice information according to a set preset specification.
S404,运行所述第二运行脚本以对所有的第一语音信息进行筛选,从而得到多个符合预设规格的第二语音信息,所述第二语音信息的数量小于或等于第一语音信息的数量。S404. Run the second running script to filter all the first voice information, so as to obtain a plurality of second voice information meeting preset specifications, and the number of the second voice information is less than or equal to that of the first voice information. Quantity.
其中,管理服务器运行所述第二运行脚本后,能够根据预设规格对所有的第一语音信息进行筛选,得到符合条件的目标语音信息,故筛选之后,所述目标语音信息的数量小于或等于第一语音信息的数量。再者,该预设规格可以是预先设置语音时长的阀值,例如若第一语音信息的时长低于预设阀值,则删除该第一语音信息。同理,该预设规格还可以是预先设置的语音信息的采样点的阀值,也可以是预先设置的语音信息的缩放系数的阀值,还可以是预先设置的语音信息的最大幅度的阀值。Wherein, after the management server runs the second running script, it can filter all the first voice information according to the preset specifications to obtain the target voice information that meets the conditions, so after the screening, the number of the target voice information is less than or equal to The number of first voice messages. Furthermore, the preset specification may be a preset voice duration threshold. For example, if the duration of the first voice message is lower than the preset threshold, the first voice message is deleted. In the same way, the preset specification can also be a preset threshold for the sampling point of the voice information, or a preset threshold for the scaling factor of the voice information, or a preset threshold for the maximum amplitude of the voice information. value.
S405: Call the third running script in the preset Bash script.
To rename the second voice information, the management server calls the third running script in the preset Bash script, which performs renaming, so that the renamed voice information can be read more accurately and quickly. A preset name format is set in advance in the third running script, so that the multiple pieces of second voice information can be renamed according to that format. Optionally, the third running script is a renaming function, which may be the function rename() used for renaming files.
S406: Run the third running script to rename all of the second voice information, thereby obtaining a corresponding number of pieces of target voice information with the preset name format.
All voice information in the training set may be generated by the same subject; that is, each subject may correspond to several different pieces of voice information. To make the pieces easy to tell apart, they are renamed according to the preset name format and the existing information of the second voice information. After running the third running script, the management server obtains the corresponding renamed target voice information, whose names conform to the preset name format. The number of pieces of target voice information equals the number of pieces of second voice information, and the two are in one-to-one correspondence.
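A minimal sketch of renaming to a preset name format is given below. The format string `{speaker}_{index:04d}{suffix}`, the `speaker_id` argument, and both function names are assumptions made for illustration; the text does not specify the actual name format or how the subject is identified.

```python
from pathlib import Path


def plan_renames(paths, speaker_id, name_format="{speaker}_{index:04d}{suffix}"):
    """Map each existing path to a new path in the same directory.

    The mapping is one-to-one, so the number of files is unchanged.
    The name format is a hypothetical example, not the one in the source.
    """
    plan = {}
    for index, path in enumerate(sorted(Path(p) for p in paths), start=1):
        new_name = name_format.format(
            speaker=speaker_id, index=index, suffix=path.suffix
        )
        plan[path] = path.with_name(new_name)
    return plan


def apply_renames(plan):
    """Rename the files on disk according to a plan from plan_renames."""
    for old, new in plan.items():
        old.rename(new)
```

Separating the plan from its application makes it possible to inspect the mapping before any file is touched.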
Step S103: Filter all of the target voice information through preset voice activity detection to obtain denoised intermediate voice information.
In this embodiment, before neural network training is performed, the target voice information also needs to be filtered through voice activity detection (VAD). VAD can distinguish speech from background noise in a voice signal, which improves the accuracy of neural network training and reduces the time training requires. VAD can also cut off the silence at the beginning and end of the voice information, reducing interference with subsequent steps; that is, VAD can filter all of the target voice information in a batch and obtain the corresponding multiple pieces of denoised intermediate voice information.
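The trimming of leading and trailing silence can be illustrated with a very simple energy-threshold detector. This is a stand-in chosen for brevity, not the VAD used in the source, which is not specified; the frame length, the threshold value, and the function name are all assumptions, and samples are assumed to be floats in the range [-1, 1].

```python
def trim_silence(samples, frame_len=160, energy_threshold=1e-4):
    """Drop leading and trailing frames whose mean energy is below the threshold.

    Frames in the middle of the signal are kept even if they are quiet.
    Returns an empty list when no frame is active.
    """
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    active = [
        sum(x * x for x in frame) / len(frame) >= energy_threshold
        for frame in frames
    ]
    if not any(active):
        return []
    first = active.index(True)
    last = len(active) - 1 - active[::-1].index(True)
    return samples[first * frame_len:(last + 1) * frame_len]
```

Applying this function to every item of the target voice information in a loop gives the batch behaviour described above.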
Step S104: Perform framing on all of the intermediate voice information according to a preset framing rule to obtain test voice information for training a speech recognition model.
In this embodiment, the management server also needs to perform framing on all of the intermediate voice information according to the preset framing rule, thereby obtaining a corresponding number of pieces of framed test voice information. The test voice information can be used to train a speech recognition model, yielding a model capable of the corresponding speech recognition. Specifically, the preset framing rule may refer to framing the sound with a moving window function, that is, cutting the voice information into short segments, each called a frame, where adjacent frames generally overlap.
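Overlapping framing can be sketched as below. The function name `split_into_frames` and its parameters are hypothetical, no window function (such as Hamming) is applied to the frames, and trailing samples that do not fill a whole frame are dropped; these are simplifying assumptions rather than the behaviour of any particular framing function.

```python
def split_into_frames(samples, frame_len, hop):
    """Cut samples into frames of frame_len, advancing by hop samples each time.

    When hop < frame_len, consecutive frames overlap by frame_len - hop samples.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames
```

For example, with 16 kHz audio a frame length of 400 samples and a hop of 160 samples would correspond to 25 ms frames taken every 10 ms; these particular values are a common convention, not something the text states.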
In another embodiment, step S104 may specifically include: performing framing on the intermediate voice information through the Enframe function to obtain test voice information for training a speech recognition model.
The Enframe function is the specific framing function; after calling it, the management server can frame all of the intermediate voice information in a uniform way, thereby obtaining the final test voice information used for training.
In summary, the embodiments of the present application can efficiently and accurately perform uniform conversion of the multiple pieces of to-be-processed voice information in the training set and reduce errors in the conversion process, which facilitates accurate neural network training.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or the like.
Referring to FIG. 5, corresponding to the above method for batch processing of voice information, an embodiment of the present application further provides an apparatus for batch processing of voice information. The apparatus 100 includes an acquisition unit 101, a batch processing unit 102, a denoising unit 103, and a framing unit 104.
The acquisition unit 101 is configured to acquire a preset training set if an information processing instruction is received, the training set including multiple pieces of to-be-processed voice information.
The batch processing unit 102 is configured to call and run, in sequence and according to the information processing instruction, the sub-running scripts in the preset Bash script, so that running each sub-running script performs the corresponding batch processing on all of the to-be-processed voice information, until all sub-running scripts have been run, thereby obtaining multiple pieces of target voice information. The preset Bash script includes at least one preset sub-running script, each sub-running script implements batch processing of all of the to-be-processed voice information, and the number of pieces of target voice information is less than or equal to the number of pieces of to-be-processed voice information.
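The sequential behaviour of the batch processing unit, where each sub-running script processes the whole batch before the next one starts, can be sketched as a list of stage functions. Everything here is a placeholder: the dictionaries stand in for audio files, and `convert_all`, `keep_long_enough`, and `rename_all` are hypothetical stand-ins for the conversion, filtering, and renaming scripts rather than real implementations.

```python
def run_pipeline(items, stages):
    """Apply each stage to the whole batch, one stage after another."""
    batch = list(items)
    for stage in stages:
        batch = stage(batch)
    return batch


def convert_all(batch):
    # Stand-in for format and sampling-rate conversion: every item survives.
    return [{**item, "format": "wav", "rate": 16000} for item in batch]


def keep_long_enough(batch, min_seconds=1.0):
    # Stand-in for valid-audio filtering: items may be dropped, never added.
    return [item for item in batch if item["seconds"] >= min_seconds]


def rename_all(batch):
    # Stand-in for renaming: one output per input, in order.
    return [
        {**item, "name": f"utt_{i:04d}.wav"}
        for i, item in enumerate(batch, start=1)
    ]
```

Since only the filtering stage can remove items, the output batch can never be larger than the input batch, which is the count relation stated for the target voice information.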
In an embodiment, as shown in FIG. 6, the preset Bash script includes a first running script for audio format and sampling rate conversion, and the batch processing unit 102 may include a first calling unit 201 and a first running unit 202.
The first calling unit 201 is configured to call the first running script in the preset Bash script according to the information processing instruction.
The first running unit 202 is configured to run the first running script to perform audio format conversion and sampling rate conversion on all of the to-be-processed voice information, thereby obtaining multiple pieces of target voice information with the preset audio format and the preset sampling rate.
In an embodiment, as shown in FIG. 7, the preset Bash script includes a first running script for audio format conversion and a second running script for valid-audio filtering, and the batch processing unit 102 may include a first calling unit 301, a first running unit 302, a second calling unit 303, and a second running unit 304.
The first calling unit 301 is configured to call the first running script in the preset Bash script according to the information processing instruction.
The first running unit 302 is configured to run the first running script to perform audio format conversion and sampling rate conversion on all of the to-be-processed voice information, thereby obtaining a corresponding number of pieces of first voice information with the preset audio format and the preset sampling rate.
The second calling unit 303 is configured to call the second running script in the preset Bash script.
The second running unit 304 is configured to run the second running script to filter all of the first voice information, thereby obtaining multiple pieces of target voice information that meet the preset specification; the number of pieces of target voice information is less than or equal to the number of pieces of first voice information.
In an embodiment, as shown in FIG. 8, the preset Bash script includes a first running script for audio format conversion, a second running script for valid-audio filtering, and a third running script for renaming, and the batch processing unit 102 may include a first calling unit 401, a first running unit 402, a second calling unit 403, a second running unit 404, a third calling unit 405, and a third running unit 406.
The first calling unit 401 is configured to call the first running script in the preset Bash script according to the information processing instruction.
The first running unit 402 is configured to run the first running script to perform audio format conversion and sampling rate conversion on all of the to-be-processed voice information, thereby obtaining a corresponding number of pieces of first voice information with the same audio format and sampling rate.
The second calling unit 403 is configured to call the second running script in the preset Bash script.
The second running unit 404 is configured to run the second running script to filter all of the first voice information, thereby obtaining multiple pieces of second voice information that meet the preset specification; the number of pieces of second voice information is less than or equal to the number of pieces of first voice information.
The third calling unit 405 is configured to call the third running script in the preset Bash script.
The third running unit 406 is configured to run the third running script to rename all of the second voice information, thereby obtaining a corresponding number of pieces of target voice information with the preset name format.
The denoising unit 103 is configured to filter all of the target voice information through preset voice activity detection to obtain denoised intermediate voice information.
The framing unit 104 is configured to perform framing on all of the intermediate voice information according to a preset framing rule to obtain test voice information for training a speech recognition model.
In another embodiment, the framing unit 104 may specifically be configured to perform framing on the intermediate voice information through the Enframe function to obtain test voice information for training a speech recognition model.
It should be noted that those skilled in the art can clearly understand that, for the specific implementation of the above apparatus 100 for batch processing of voice information and of each of its units, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity, the details are not repeated here.
As can be seen from the above, in terms of implementation, the acquisition unit 101, the batch processing unit 102, the denoising unit 103, the framing unit 104, and so on may be embedded in, or independent of, the apparatus for batch processing of voice information in hardware form, or may be stored in software form in the memory of the apparatus for batch processing of voice information, so that the processor can call and execute the operations corresponding to each of the above units. The processor may be a central processing unit (CPU), a microprocessor, a single-chip microcomputer, or the like.
The above apparatus for batch processing of voice information may be implemented in the form of a computer program, and the computer program can run on a computer device as shown in FIG. 9.
FIG. 9 is a schematic structural diagram of a computer device of the present application. The device may be a server, where the server may be an independent server or a server cluster composed of multiple servers.
Referring to FIG. 9, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When executed, the computer program 5032 can cause the processor 502 to perform a method for batch processing of voice information.
The processor 502 provides computing and control capabilities and supports the operation of the entire computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503. When executed by the processor 502, the computer program 5032 can cause the processor 502 to perform a method for batch processing of voice information.
The network interface 505 is used for network communication with other devices. Those skilled in the art can understand that the structure shown in FIG. 9 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device 500 to which the solution is applied; a specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the method for batch processing of voice information in any of the above embodiments.
It should be understood that, in the embodiments of the present application, the processor 502 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a storage medium, which is a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the process steps of the above method embodiments.
Accordingly, the present application further provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program that, when executed by a processor, causes the processor to perform the method for batch processing of voice information in any of the above embodiments.
The storage medium is a physical, non-transitory storage medium, for example a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disc, or any other physical storage medium that can store program code.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
The order of the steps in the methods of the embodiments of the present application may be adjusted, and steps may be merged or deleted, according to actual needs. The units in the apparatuses of the embodiments of the present application may be combined, divided, or deleted according to actual needs. In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a terminal, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
The above are only specific implementations of the present application, but the scope of protection of the present application is not limited to them. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and these modifications or replacements shall all fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (20)

  1. A method for batch processing of voice information, comprising:
    if an information processing instruction is received, acquiring a preset training set, the training set including multiple pieces of to-be-processed voice information;
    calling and running, in sequence and according to the information processing instruction, the sub-running scripts in a preset Bash script, so that running each sub-running script performs the corresponding batch processing on all of the to-be-processed voice information, until all sub-running scripts have been run, thereby obtaining multiple pieces of target voice information, wherein the preset Bash script includes at least one preset sub-running script, each sub-running script implements batch processing of all of the to-be-processed voice information, and the number of pieces of target voice information is less than or equal to the number of pieces of to-be-processed voice information;
    filtering all of the target voice information through preset voice activity detection to obtain denoised intermediate voice information;
    performing framing on all of the intermediate voice information according to a preset framing rule to obtain test voice information for training a speech recognition model.
  2. The method of claim 1, wherein the preset Bash script includes a first running script for audio format and sampling rate conversion, and the step of calling and running, in sequence and according to the information processing instruction, the sub-running scripts in the preset Bash script, so that running each sub-running script performs the corresponding batch processing on all of the to-be-processed voice information, until all sub-running scripts have been run, thereby obtaining multiple pieces of target voice information, comprises:
    calling the first running script in the preset Bash script according to the information processing instruction;
    running the first running script to perform audio format conversion and sampling rate conversion on all of the to-be-processed voice information, thereby obtaining multiple pieces of target voice information with the preset audio format and the preset sampling rate.
  3. The method of claim 2, wherein the first running script is an FFmpeg script, and the step of running the first running script to perform audio format conversion and sampling rate conversion on all of the to-be-processed voice information, thereby obtaining multiple pieces of target voice information with the preset audio format and the preset sampling rate, comprises:
    running the FFmpeg script and determining the preset audio format and the preset sampling rate set in the FFmpeg script, so as to convert all of the to-be-processed voice information in a batch into target voice information with the preset audio format and the preset sampling rate.
  4. The method of claim 1, wherein the preset Bash script includes a first running script for audio format conversion and a second running script for valid-audio filtering, and the step of calling and running, in sequence and according to the information processing instruction, the sub-running scripts in the preset Bash script, so that running each sub-running script performs the corresponding batch processing on all of the to-be-processed voice information, until all sub-running scripts have been run, thereby obtaining multiple pieces of target voice information, comprises:
    calling the first running script in the preset Bash script according to the information processing instruction;
    running the first running script to perform audio format conversion and sampling rate conversion on all of the to-be-processed voice information, thereby obtaining a corresponding number of pieces of first voice information with the preset audio format and the preset sampling rate;
    calling the second running script in the preset Bash script;
    running the second running script to filter all of the first voice information, thereby obtaining multiple pieces of target voice information that meet the preset specification, the number of pieces of target voice information being less than or equal to the number of pieces of first voice information.
  5. The method of claim 4, wherein the second running script is the SOX voice processing tool, and the step of running the second running script to filter all of the first voice information, thereby obtaining multiple pieces of target voice information that meet the preset specification, comprises:
    running the SOX voice processing tool to filter all of the first voice information, thereby obtaining multiple pieces of target voice information that meet a preset voice duration threshold.
  6. The method of claim 1, wherein the preset Bash script includes a first running script for audio format conversion, a second running script for valid-audio filtering, and a third running script for renaming, and the step of calling and running, in sequence and according to the information processing instruction, the sub-running scripts in the preset Bash script, so that running each sub-running script performs the corresponding batch processing on all of the to-be-processed voice information, until all sub-running scripts have been run, thereby obtaining multiple pieces of target voice information, comprises:
    calling the first running script in the preset Bash script according to the information processing instruction;
    running the first running script to perform audio format conversion and sampling rate conversion on all of the to-be-processed voice information, thereby obtaining a corresponding number of pieces of first voice information with the same audio format and sampling rate;
    calling the second running script in the preset Bash script;
    running the second running script to filter all of the first voice information, thereby obtaining multiple pieces of second voice information that meet the preset specification, the number of pieces of second voice information being less than or equal to the number of pieces of first voice information;
    calling the third running script in the preset Bash script;
    running the third running script to rename all of the second voice information, thereby obtaining a corresponding number of pieces of target voice information with the preset name format.
  7. The method of claim 6, wherein the third running script is a renaming function, and the renaming function is the function rename().
  8. The method of claim 1, wherein the step of performing framing on all of the intermediate voice information according to a preset framing rule to obtain test voice information for training a speech recognition model comprises:
    performing framing on the intermediate voice information through the Enframe function to obtain test voice information for training a speech recognition model.
  9. A batch processing apparatus for voice information, comprising:
    an obtaining unit, configured to obtain a preset training set if an information processing instruction is received, the training set comprising a plurality of pieces of to-be-processed voice information;
    a batch processing unit, configured to sequentially call and run, according to the information processing instruction, the sub-running scripts in a preset Bash script, such that running any one sub-running script performs the corresponding batch processing on all of the to-be-processed voice information, until all of the sub-running scripts have been run, thereby obtaining a plurality of pieces of target voice information, wherein the preset Bash script comprises at least one preset sub-running script, each sub-running script is used to batch-process all of the to-be-processed voice information, and the number of pieces of target voice information is less than or equal to the number of pieces of to-be-processed voice information;
    a noise removal unit, configured to filter all of the target voice information through preset voice activity detection to obtain denoised intermediate voice information;
    a framing unit, configured to perform framing processing on all of the intermediate voice information according to a preset framing rule to obtain test voice information for training a voice recognition model.
  10. The apparatus of claim 9, wherein the preset Bash script comprises a first running script for audio format and sampling rate conversion, and the batch processing unit comprises:
    a first calling unit, configured to call the first running script in the preset Bash script according to the information processing instruction;
    a first running unit, configured to run the first running script to perform audio format conversion and sampling rate conversion on all of the to-be-processed voice information, thereby obtaining a plurality of pieces of target voice information having a preset audio format and a preset sampling rate.
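For illustration only, the batch conversion performed by the first running script can be sketched as a function that builds one FFmpeg command per input file. The target format `wav`, the 16000 Hz sampling rate, and the helper names are assumptions; the claims only say that both values are preset. The sketch uses the standard FFmpeg options `-i` (input), `-ar` (audio sampling rate), and `-y` (overwrite output), and it only builds the commands without executing them.

```python
from pathlib import Path
from typing import Iterable, List


def build_ffmpeg_command(src: Path, dst_dir: Path,
                         audio_format: str = "wav",
                         sample_rate: int = 16000) -> List[str]:
    """Build the FFmpeg argument list that converts one file.

    audio_format and sample_rate are hypothetical presets.
    """
    dst = dst_dir / f"{src.stem}.{audio_format}"
    return ["ffmpeg", "-y", "-i", str(src), "-ar", str(sample_rate), str(dst)]


def build_batch(sources: Iterable[Path], dst_dir: Path) -> List[List[str]]:
    """Build one conversion command for every to-be-processed file."""
    return [build_ffmpeg_command(src, dst_dir) for src in sources]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed.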
  11. A computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the following steps:
    if an information processing instruction is received, obtaining a preset training set, the training set comprising a plurality of pieces of to-be-processed voice information;
    sequentially calling and running, according to the information processing instruction, the sub-running scripts in a preset Bash script, such that running any one sub-running script performs the corresponding batch processing on all of the to-be-processed voice information, until all of the sub-running scripts have been run, thereby obtaining a plurality of pieces of target voice information, wherein the preset Bash script comprises at least one preset sub-running script, each sub-running script is used to batch-process all of the to-be-processed voice information, and the number of pieces of target voice information is less than or equal to the number of pieces of to-be-processed voice information;
    filtering all of the target voice information through preset voice activity detection to obtain denoised intermediate voice information;
    performing framing processing on all of the intermediate voice information according to a preset framing rule to obtain test voice information for training a voice recognition model.
  12. The computer device of claim 11, wherein the preset Bash script comprises a first running script for audio format and sampling rate conversion, and the step of sequentially calling and running the sub-running scripts in the preset Bash script according to the information processing instruction, batch-processing all of the to-be-processed voice information with each sub-running script until all of the sub-running scripts have been run, so as to obtain the plurality of pieces of target voice information, comprises:
    calling the first running script in the preset Bash script according to the information processing instruction;
    running the first running script to perform audio format conversion and sampling rate conversion on all of the to-be-processed voice information, thereby obtaining a plurality of pieces of target voice information having a preset audio format and a preset sampling rate.
  13. The computer device of claim 12, wherein the first running script is an FFmpeg script, and the step of running the first running script to perform audio format conversion and sampling rate conversion on all of the to-be-processed voice information, thereby obtaining a plurality of pieces of target voice information having a preset audio format and a preset sampling rate, comprises:
    running the FFmpeg script and determining the preset audio format and the preset sampling rate set in the FFmpeg script, so as to convert all of the to-be-processed voice information in batch into target voice information having the preset audio format and the preset sampling rate.
  14. The computer device of claim 11, wherein the preset Bash script comprises a first running script for audio format conversion and a second running script for valid-audio screening, and the step of sequentially calling and running the sub-running scripts in the preset Bash script according to the information processing instruction, batch-processing all of the to-be-processed voice information with each sub-running script until all of the sub-running scripts have been run, so as to obtain the plurality of pieces of target voice information, comprises:
    calling the first running script in the preset Bash script according to the information processing instruction;
    running the first running script to perform audio format conversion and sampling rate conversion on all of the to-be-processed voice information, thereby obtaining a corresponding number of pieces of first voice information having a preset audio format and a preset sampling rate;
    calling the second running script in the preset Bash script;
    running the second running script to screen all of the first voice information, thereby obtaining a plurality of pieces of target voice information that meet a preset specification, wherein the number of pieces of target voice information is less than or equal to the number of pieces of first voice information.
  15. The computer device of claim 14, wherein the second running script is the SOX voice processing tool, and the step of running the second running script to screen all of the first voice information, thereby obtaining a plurality of pieces of target voice information that meet a preset specification, comprises:
    running the SOX voice processing tool to screen all of the first voice information, thereby obtaining a plurality of pieces of target voice information that satisfy a preset voice duration threshold.
  16. The computer device of claim 11, wherein the preset Bash script comprises a first running script for audio format conversion, a second running script for valid-audio screening, and a third running script for renaming, and the step of sequentially calling and running the sub-running scripts in the preset Bash script according to the information processing instruction, batch-processing all of the to-be-processed voice information with each sub-running script until all of the sub-running scripts have been run, so as to obtain the plurality of pieces of target voice information, comprises:
    calling the first running script in the preset Bash script according to the information processing instruction;
    running the first running script to perform audio format conversion and sampling rate conversion on all of the to-be-processed voice information, thereby obtaining a corresponding number of pieces of first voice information having the same audio format and sampling rate;
    calling the second running script in the preset Bash script;
    running the second running script to screen all of the first voice information, thereby obtaining a plurality of pieces of second voice information that meet a preset specification, wherein the number of pieces of second voice information is less than or equal to the number of pieces of first voice information;
    calling the third running script in the preset Bash script;
    running the third running script to rename all of the second voice information, thereby obtaining a corresponding number of pieces of target voice information having a preset name format.
  17. The computer device of claim 16, wherein the third running script is a renaming function, and the renaming function is the function rename().
  18. The computer device of claim 11, wherein the step of performing framing processing on all of the intermediate voice information according to a preset framing rule to obtain test voice information for training a voice recognition model comprises:
    framing the intermediate voice information with the Enframe function to obtain the test voice information for training the voice recognition model.
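For illustration only, the screening and renaming steps of claims 14 to 17 can be sketched in Python. This sketch swaps in Python's standard `wave` module for the SOX tool named in the claims, so it only handles WAV files; the duration bounds and the `utt_00001.wav` name format are assumptions, since the claims only say that the threshold and name format are preset.

```python
import wave
from pathlib import Path
from typing import Iterable, List


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def filter_by_duration(paths: Iterable[Path],
                       min_seconds: float,
                       max_seconds: float) -> List[Path]:
    """Keep only files whose duration lies within the (hypothetical) bounds."""
    return [p for p in paths
            if min_seconds <= wav_duration_seconds(p) <= max_seconds]


def rename_sequentially(paths: Iterable[Path],
                        prefix: str = "utt",
                        width: int = 5) -> List[Path]:
    """Rename files in place to a hypothetical format such as utt_00001.wav."""
    renamed = []
    for index, path in enumerate(sorted(paths), start=1):
        target = path.with_name(f"{prefix}_{index:0{width}d}{path.suffix}")
        path.rename(target)
        renamed.append(target)
    return renamed
```

Because screening can only drop files, the number of renamed files is at most the number of converted files, which matches the count constraint stated in the claims.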
  19. A computer-readable storage medium, wherein the storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the following steps:
    if an information processing instruction is received, obtaining a preset training set, the training set comprising a plurality of pieces of to-be-processed voice information;
    sequentially calling and running, according to the information processing instruction, the sub-running scripts in a preset Bash script, such that running any one sub-running script performs the corresponding batch processing on all of the to-be-processed voice information, until all of the sub-running scripts have been run, thereby obtaining a plurality of pieces of target voice information, wherein the preset Bash script comprises at least one preset sub-running script, each sub-running script is used to batch-process all of the to-be-processed voice information, and the number of pieces of target voice information is less than or equal to the number of pieces of to-be-processed voice information;
    filtering all of the target voice information through preset voice activity detection to obtain denoised intermediate voice information;
    performing framing processing on all of the intermediate voice information according to a preset framing rule to obtain test voice information for training a voice recognition model.
  20. The computer-readable storage medium of claim 19, wherein the preset Bash script comprises a first running script for audio format and sampling rate conversion, and the step of sequentially calling and running the sub-running scripts in the preset Bash script according to the information processing instruction, batch-processing all of the to-be-processed voice information with each sub-running script until all of the sub-running scripts have been run, so as to obtain the plurality of pieces of target voice information, comprises:
    calling the first running script in the preset Bash script according to the information processing instruction;
    running the first running script to perform audio format conversion and sampling rate conversion on all of the to-be-processed voice information, thereby obtaining a plurality of pieces of target voice information having a preset audio format and a preset sampling rate.
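For illustration only, the control flow shared by the independent claims (run each sub-running script in turn over the whole batch, and end with no more items than were supplied) can be sketched in Python. The claims describe Bash sub-scripts; here each sub-step is modeled as a plain Python callable, and all names are hypothetical.

```python
from typing import Callable, List, Sequence

Batch = List[str]
Step = Callable[[Batch], Batch]


def run_sub_scripts(items: Batch, steps: Sequence[Step]) -> Batch:
    """Apply every step, in order, to the whole batch.

    Each step receives all items produced by the previous step, mirroring
    "each sub-running script batch-processes all voice information".
    """
    if not steps:
        raise ValueError("at least one sub-step is required")
    current = list(items)
    for step in steps:
        current = step(current)
    # The claims require the target count not to exceed the input count.
    if len(current) > len(items):
        raise RuntimeError("pipeline produced more items than it received")
    return current
```

A conversion step maps every item one to one, while a screening step may drop items, so the final count can shrink but should never grow.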
PCT/CN2019/103345 2019-03-15 2019-08-29 Voice information batch processing method and apparatus, computer device, and storage medium WO2020186695A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910197848.0A CN110060667B (en) 2019-03-15 2019-03-15 Batch processing method and device for voice information, computer equipment and storage medium
CN201910197848.0 2019-03-15

Publications (1)

Publication Number Publication Date
WO2020186695A1 (en) 2020-09-24

Family

ID=67317009

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103345 WO2020186695A1 (en) 2019-03-15 2019-08-29 Voice information batch processing method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN110060667B (en)
WO (1) WO2020186695A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110060667B (en) * 2019-03-15 2023-05-30 平安科技(深圳)有限公司 Batch processing method and device for voice information, computer equipment and storage medium
CN112820309A (en) * 2020-12-31 2021-05-18 北京天润融通科技股份有限公司 RNN-based noise reduction processing method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5428707A (en) * 1992-11-13 1995-06-27 Dragon Systems, Inc. Apparatus and methods for training speech recognition systems and their users and otherwise improving speech recognition performance
CN1588538A (en) * 2004-09-29 2005-03-02 上海交通大学 Training method for embedded automatic sound identification system
CN108922543A (en) * 2018-06-11 2018-11-30 平安科技(深圳)有限公司 Model library method for building up, audio recognition method, device, equipment and medium
CN109326305A (en) * 2018-09-18 2019-02-12 易诚博睿(南京)科技有限公司 A kind of batch testing speech recognition and text synthetic method and test macro
CN110060667A (en) * 2019-03-15 2019-07-26 平安科技(深圳)有限公司 Batch processing method, device, computer equipment and the storage medium of voice messaging

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9286892B2 (en) * 2014-04-01 2016-03-15 Google Inc. Language modeling in speech recognition
CN107908679B (en) * 2017-10-26 2020-11-27 平安科技(深圳)有限公司 Script statement conversion method and device and computer readable storage medium
CN108595656B (en) * 2018-04-28 2022-02-18 宁波银行股份有限公司 Data processing method and system
CN108877775B (en) * 2018-06-04 2023-03-31 平安科技(深圳)有限公司 Voice data processing method and device, computer equipment and storage medium
CN109376166B (en) * 2018-08-20 2023-07-04 中国平安财产保险股份有限公司 Script conversion method, script conversion device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110060667B (en) 2023-05-30
CN110060667A (en) 2019-07-26

Similar Documents

Publication Publication Date Title
AU2016260156B2 (en) Method and device for improving audio processing performance
CN108833722B (en) Speech recognition method, speech recognition device, computer equipment and storage medium
WO2015188581A1 (en) Method and device for audio control of motor vibration
WO2016112113A1 (en) Utilizing digital microphones for low power keyword detection and noise suppression
WO2020140374A1 (en) Voice data processing method, apparatus and device and storage medium
WO2020186695A1 (en) Voice information batch processing method and apparatus, computer device, and storage medium
CN110047472B (en) Batch conversion method and device for voice information, computer equipment and storage medium
US11790886B2 (en) System and method for synthesizing automated test cases from natural interactions
CN112185424B (en) Voice file clipping and restoring method, device, equipment and storage medium
JP2015537237A (en) Real-time traffic detection
WO2020228107A1 (en) Audio repair method and device, and readable storage medium
WO2023103253A1 (en) Audio detection method and apparatus, and terminal device
WO2024099359A1 (en) Voice detection method and apparatus, electronic device and storage medium
CN108428457B (en) Audio duplicate removal method and device
CN104240697A (en) Audio data feature extraction method and device
CN111968620B (en) Algorithm testing method and device, electronic equipment and storage medium
WO2021179470A1 (en) Method, device and system for recognizing sampling rate of pure voice data
US9978393B1 (en) System and method for automatically removing noise defects from sound recordings
CN110189763B (en) Sound wave configuration method and device and terminal equipment
CN111028860B (en) Audio data processing method and device, computer equipment and storage medium
CN113889086A (en) Training method of voice recognition model, voice recognition method and related device
CN110930986B (en) Voice processing method and device, electronic equipment and storage medium
CN113270118A (en) Voice activity detection method and device, storage medium and electronic equipment
CN110059059B (en) Batch screening method and device for voice information, computer equipment and storage medium
CN113156373B (en) Sound source positioning method, digital signal processing device and audio system

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19919963; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 19919963; Country of ref document: EP; Kind code of ref document: A1)