WO2021159635A1 - Method, apparatus, computer device, and storage medium for obtaining speech training samples - Google Patents

Method, apparatus, computer device, and storage medium for obtaining speech training samples Download PDF

Info

Publication number
WO2021159635A1
WO2021159635A1 (PCT/CN2020/093092)
Authority
WO
WIPO (PCT)
Prior art keywords
tearing
spectrogram
sound
time
point
Prior art date
Application number
PCT/CN2020/093092
Other languages
English (en)
French (fr)
Inventor
马坤
赵之砚
施奕明
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021159635A1 publication Critical patent/WO2021159635A1/zh

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/04 - Training, enrolment or model building
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - characterised by the type of extracted parameters
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 - characterised by the analysis technique

Definitions

  • This application relates to the field of artificial intelligence, in particular to methods, devices, computer equipment, and storage media for obtaining speech training samples.
  • Identifying a speaker from speech, i.e., voiceprint recognition, is an important direction in the field of artificial intelligence and an important application of artificial intelligence technology in biometric recognition scenarios.
  • Although the accuracy of voiceprint recognition keeps reaching new highs under laboratory conditions, in actual business scenarios voice transmission relies on transmission channels, such as telephone lines and broadband networks, and the received voice is affected by the channel. The accuracy of voiceprint recognition in practice is therefore still not high.
  • For example, the features extracted from the same speech transmitted over a telephone channel and over a network channel respectively carry the characteristics of those channels, which causes judgment errors in voiceprint recognition. The cross-channel problem therefore remains, to this day, a difficult problem in the field of voiceprint recognition.
  • The mainstream solution in the industry is to collect voice data from each channel and either train a model of feature conversion between channels or expand the training set of the original model with the collected cross-channel data.
  • the core is to collect enough cross-channel data as samples.
  • the main purpose of this application is to provide a method, device, computer equipment, and storage medium for acquiring voice training samples, aiming to solve the technical problem that sufficient and effective cross-channel voice data cannot be collected as samples in the prior art.
  • this application proposes a method for acquiring speech training samples, including:
  • The tearing spectrogram is obtained and used as the speech training sample, where the separation distance of the sound spectrogram on the two sides of the tearing point is s, s is a number randomly selected from the uniform distribution on [0, S], and S is the time-deformation parameter.
  • This application also provides a device for acquiring voice training samples, including:
  • the conversion unit is used to process the voice signal to obtain the sound spectrogram of the voice signal
  • a selection unit configured to randomly select a time point in the time direction on the sound spectrogram
  • The tearing unit is used to take the time point as the tearing point, separate the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing processing, and add transition information at the break according to preset rules to obtain the tearing spectrogram, which is used as the voice training sample. The separation distance of the sound spectrogram on both sides of the tearing point is s, where s is a number randomly selected from the uniform distribution on [0, S] and S is the time-deformation parameter.
  • the present application also provides a computer device including a memory and a processor, the memory stores a computer program, and when the processor executes the computer program, a method for acquiring a voice training sample is implemented.
  • the method includes the following steps:
  • The tearing spectrogram is obtained and used as the speech training sample, where the separation distance of the sound spectrogram on the two sides of the tearing point is s, s is a number randomly selected from the uniform distribution on [0, S], and S is the time-deformation parameter.
  • the present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, a method for obtaining a voice training sample is realized.
  • the method includes the following steps:
  • The tearing spectrogram is obtained and used as the speech training sample, where the separation distance of the sound spectrogram on the two sides of the tearing point is s, s is a number randomly selected from the uniform distribution on [0, S], and S is the time-deformation parameter.
  • The voice-training-sample acquisition method, device, computer equipment, and storage medium of this application can convert an original voice signal into a sound spectrogram and then, through tearing and masking, derive from it a large number of tearing spectrograms, first mask spectrograms, and second mask spectrograms, all of which can be used as samples for training the voiceprint recognition model.
  • This solves the problem in the prior art that too few samples are available for training the voiceprint recognition model, so an accurate model cannot be obtained; in particular, it addresses the scarcity of samples from different channel scenarios, which prevents the voiceprint recognition model from being trained well.
  • FIG. 1 is a schematic flowchart of a method for acquiring a voice training sample according to an embodiment of this application
  • FIG. 2 is a schematic diagram of the structure of an apparatus for acquiring a voice training sample according to an embodiment of the application
  • FIG. 3 is a schematic structural diagram of a computer device according to an embodiment of the application.
  • a method for obtaining speech training samples includes:
  • the voice signal as a sample is first converted into a sound spectrogram.
  • the sound spectrogram is generally a mel spectrogram.
  • the specific conversion process can use any one of the existing technologies.
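As an illustration (not part of the patent text), the conversion can be sketched with a plain framed-FFT magnitude spectrogram in NumPy; the patent generally uses a mel spectrogram, which would apply a mel filterbank on top of this (for example via librosa), and the frame and hop sizes here are illustrative assumptions:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Convert a 1-D speech signal into a magnitude spectrogram
    (frequency bins x time frames) via a windowed, framed FFT."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Keep the non-negative frequency half of each frame's spectrum.
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Example: a 1-second, 8 kHz sine tone.
sig = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
spec = spectrogram(sig)
print(spec.shape)  # (129, 61)
```

All subsequent tearing and masking operations act on such a 2-D (frequency x time) array.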
  • The sound spectrogram is torn at a certain time point, taken as the tearing point; that is, the spectrogram is separated in time at that point. The separation can be done in many ways, for example: the first side of the tearing point stays fixed while the second side moves away from it, or the first side and the second side move away from each other.
  • In one embodiment, the first side is fixed and the second side is moved away from it by s; then, on the original sound spectrogram, the second side is fixed and the first side is moved away by s. In this way, processing at one time point yields two tearing spectrograms with different moving directions. In other embodiments, one side may simply be moved a specified distance in a specified direction.
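The tearing operation described above can be sketched as follows (an illustration, not the patent's implementation; the function name and default S are assumptions, and the gap is zero-filled here as a placeholder for the transition information the text describes):

```python
import numpy as np

rng = np.random.default_rng(0)

def tear(spec, S=10):
    """Tear a spectrogram (freq x time) at a randomly chosen time
    point: the first side stays fixed and the second side moves away
    by s frames, with s drawn uniformly from [0, S] (S is the
    time-deformation parameter). The gap left at the break is filled
    with zeros as a stand-in for transition information."""
    n_freq, n_time = spec.shape
    tp = int(rng.integers(1, n_time))   # tearing point
    s = int(rng.integers(0, S + 1))     # separation distance
    gap = np.zeros((n_freq, s))
    return np.concatenate([spec[:, :tp], gap, spec[:, tp:]], axis=1)

spec = np.arange(12.0).reshape(3, 4)    # toy 3x4 spectrogram
torn = tear(spec)
print(torn.shape)                       # time axis grows by s frames
```

Fixing the second side and moving the first instead simply changes which half is shifted; the resulting array is the same up to the position of the gap.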
  • The above steps S2 and S3 are repeated, each time selecting a different time point, to obtain multiple tearing spectrograms corresponding to the sound spectrogram; finally, the sound spectrogram and the multiple tearing spectrograms form the first speech training sample set.
  • In this way, multiple tearing spectrograms can be derived from one sound spectrogram, enriching the number of voice training samples and thereby addressing the shortage of samples for training voiceprint recognition models in the prior art.
  • The above step of adding transition information at the break according to preset rules includes: randomly adding the transition information at the break of the tearing spectrogram.
  • The transition information can be added to the gap in various ways, for example by adding different smooth signals.
  • The transition information can be preset. Generally, several different pieces of transition information are preset, and one is randomly selected and added at the break. If it does not exactly fill the gap, the transition information can be enlarged or reduced proportionally so that it fits the gap exactly.
  • If the above S is a positive integer, S types of transition information are set, each type containing multiple pieces with different contents. When adding transition information, one piece is randomly selected from the corresponding s types, further improving the diversity of the training samples.
  • Alternatively, the preset rule is to add identical data at the break, such as all 0s, all 1s, or other data such as a continuously repeating 010101 pattern.
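The preset filling rules above can be sketched as follows (illustrative only; the rule names, the choice of a linear ramp as the "smooth signal", and the resizing strategy are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def make_transition(n_freq, width):
    """Generate transition information for a tear gap of the given
    width, per one of several preset rules: constant fill (all 0s or
    all 1s), a repeating 0/1 pattern, or a smooth signal stretched
    ("enlarged or reduced proportionally") to exactly fill the gap."""
    rule = rng.choice(["zeros", "ones", "pattern", "smooth"])
    if rule == "zeros":
        return np.zeros((n_freq, width))
    if rule == "ones":
        return np.ones((n_freq, width))
    if rule == "pattern":  # 010101... repeated across the gap
        return np.tile(np.arange(width) % 2, (n_freq, 1)).astype(float)
    # Smooth signal resized to the exact gap width.
    ramp = np.linspace(0.0, 1.0, width)
    return np.tile(ramp, (n_freq, 1))

print(make_transition(4, 6).shape)  # (4, 6)
```

Whichever rule fires, the returned block has exactly the gap's dimensions, so it can be spliced between the two torn halves.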
  • Before step S2 of randomly selecting a time point in the time direction on the sound spectrogram, the method includes: S201, acquiring the time length of the sound spectrogram; and S202, determining the number of tearing operations on the sound spectrogram according to that time length.
  • The tearing process cannot be performed an unlimited number of times on the sound spectrogram, so this application determines the number of tears according to the time length of the spectrogram. Specifically, a mapping table is set up: one column holds time-length ranges and the other holds the number of tears corresponding to each range. After determining the spectrogram's time length, look up which range it falls into and select the corresponding number of tears. The specific ranges and counts can be set manually from experience; the guiding idea is that the longer the time length, the more tears, and vice versa.
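The mapping-table lookup can be sketched as below; the concrete ranges and tear counts are illustrative assumptions, since the patent says they are set manually from experience:

```python
# Illustrative mapping from spectrogram time length (in frames) to the
# number of tearing operations: longer inputs get more tears.
TEAR_COUNT_TABLE = [
    (0, 100, 1),      # (min_len, max_len, tear count)
    (100, 300, 2),
    (300, 1000, 4),
]

def tear_count(n_frames):
    """Look up the number of tears for a spectrogram of n_frames."""
    for lo, hi, count in TEAR_COUNT_TABLE:
        if lo <= n_frames < hi:
            return count
    return TEAR_COUNT_TABLE[-1][2]  # anything longer uses the max count

print(tear_count(250))  # 2
```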
  • The step S203 of selecting as many time points as the number of tearing operations, so as to perform the tearing process on the sound spectrogram that many different times, includes:
  • Evenly allocating, over the time length, time points corresponding in number to the tearing operations, so as to perform the tearing process on the sound spectrogram at each of them.
  • When the time points are equally distributed within the time length, the allocation is fast and uniform, and the differences between samples are more even than with the random distribution described above.
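The even allocation of tearing points can be sketched as follows (the endpoint-excluding choice is an assumption, made so every tear has spectrogram material on both sides):

```python
import numpy as np

def even_time_points(n_frames, n_tears):
    """Evenly distribute n_tears tearing points over a spectrogram of
    n_frames frames, excluding the two endpoints."""
    return np.linspace(0, n_frames, n_tears + 2)[1:-1].astype(int)

print(even_time_points(100, 3))  # [25 50 75]
```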
  • The sound spectrogram may be torn at only one time point at a time, giving a tearing spectrogram with a single tear; in another embodiment, a sound spectrogram can be torn at multiple time points as tearing points, giving a tearing spectrogram with multiple tears.
  • After the above step S3, in which the time point is taken as the tearing point, the sound spectrogram on both sides of the tearing point is separated in the time direction, and transition information is added at the break according to preset rules to obtain the tearing spectrogram, the method includes: selecting a plurality of first spectrum blocks set at intervals in the time direction on the tearing spectrogram; and applying a mask sequence to each first spectrum block to obtain a first mask spectrogram. The first mask spectrograms and all tearing spectrograms are put together to form a second speech training sample set, which further increases the number and richness of the samples.
  • The time length represented by t above is less than the time length of the tearing spectrogram, and t0 is any time point in the tearing spectrogram.
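The time-masking step (masking t consecutive frames starting at t0) can be sketched as follows; the parameter name T for the time-mask bound and the zero fill value are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def time_mask(spec, T=20, value=0.0):
    """Mask t consecutive time frames [t0, t0 + t), with t drawn
    uniformly from [0, T] (T: time-mask parameter) and t0 chosen so
    the masked block stays inside the spectrogram."""
    out = spec.copy()
    n_time = out.shape[1]
    t = int(rng.integers(0, min(T, n_time - 1) + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = value
    return out
```

Calling this on a tearing spectrogram yields one "first mask spectrogram"; repeated calls with different random draws yield many.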
  • After the above step S3, in which the time point is taken as the tearing point, the sound spectrogram on both sides of the tearing point is separated in the time direction, and transition information is added at the break according to preset rules to obtain the tearing spectrogram, the method also includes: selecting a plurality of second spectrum blocks of different frequency channels in the frequency direction on the tearing spectrogram; and applying a mask sequence to each second spectrum block to obtain a second mask spectrogram.
  • the above-mentioned second spectrum block is a spectrum block in the frequency direction, not a time spectrum block.
  • Apply the mask sequence [v1, ...] to the spectrum block of n (a positive integer) consecutive frequency channels [m0, m0 + n], where v is a number randomly selected from the uniform distribution on [0, V] and V is the frequency-mask parameter.
  • The aforementioned m0 is any frequency-channel point in the tearing spectrogram, subject to the requirement that the masked block remains within the tearing spectrogram.
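The frequency-masking step is analogous, operating on channels rather than frames; again the zero fill value and default V are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def freq_mask(spec, V=8, value=0.0):
    """Mask n consecutive frequency channels [m0, m0 + n), with n
    drawn uniformly from [0, V] (V: frequency-mask parameter) and m0
    chosen so the block stays inside the spectrogram."""
    out = spec.copy()
    n_freq = out.shape[0]
    n = int(rng.integers(0, min(V, n_freq - 1) + 1))
    m0 = int(rng.integers(0, n_freq - n + 1))
    out[m0:m0 + n, :] = value
    return out
```

Each call yields one "second mask spectrogram" from a tearing spectrogram.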
  • The step S2 of randomly selecting a time point in the time direction on the sound spectrogram includes: randomly adding a mask in the time direction on the sound spectrogram to obtain a third mask spectrogram; and randomly selecting the time point in the time direction on the third mask spectrogram. In this way, a mask is first added to the sound spectrogram and the time point is then selected on the third mask spectrogram, so that richer samples can be obtained.
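This ordering, mask first and then pick a tearing point on the masked result, can be sketched as follows (illustrative; the parameter T and zero fill are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

def masked_then_pick(spec, T=10):
    """Add a random time mask to the sound spectrogram (yielding the
    'third mask spectrogram'), then randomly select a tearing time
    point on the masked result."""
    out = spec.copy()
    n_time = out.shape[1]
    t = int(rng.integers(0, min(T, n_time - 1) + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0                    # third mask spectrogram
    time_point = int(rng.integers(1, n_time))  # tearing point on it
    return out, time_point
```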
  • The voice-training-sample acquisition method of this embodiment converts an original voice signal into a sound spectrogram and then, through tearing and masking, derives from it a large number of tearing spectrograms, first mask spectrograms, and second mask spectrograms, all of which can be used as samples for training the voiceprint recognition model. This solves the prior-art problem that the sample size for training the voiceprint recognition model is small and an accurate voiceprint recognition model cannot be obtained.
  • an embodiment of the present application also provides an apparatus for acquiring voice training samples, including:
  • the conversion unit 10 is configured to process the voice signal to obtain a sound spectrogram of the voice signal
  • the selection unit 20 is configured to randomly select a time point in the time direction on the sound spectrogram
  • The tearing unit 30 is configured to take the time point as the tearing point, separate the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing processing, and add transition information at the break according to preset rules to obtain a tearing spectrogram, which is used as the voice training sample. The separation distance of the sound spectrogram on both sides of the tearing point is s, where s is a number randomly selected from the uniform distribution on [0, S] and S is the time-deformation parameter.
  • the conversion unit 10 first converts the voice signal as a sample into a sound spectrogram.
  • the sound spectrogram is generally a mel spectrogram.
  • the specific conversion process can use any one of the prior art.
  • The tearing unit 30 takes the time point as the tearing point to tear the sound spectrogram; that is, the sound spectrogram is separated in time at that time point, in the manner described above.
  • In one embodiment, the first side is fixed and the second side is moved away from it by s; then, on the original sound spectrogram, the second side is fixed and the first side is moved away by s. In this way, processing at one time point yields two tearing spectrograms with different moving directions. In other embodiments, one side may simply be moved a specified distance in a specified direction.
  • the tearing unit 30 further includes:
  • The adding unit is used to randomly add the transition information at the break of the tearing spectrogram; that is, the preset rule is to randomly add transition information at the break.
  • The transition information can be added to the gap in various ways, for example by adding different smooth signals.
  • The transition information can be preset. Generally, several different pieces of transition information are preset, and one is randomly selected and added at the break. If it does not exactly fill the gap, the transition information can be enlarged or reduced proportionally so that it fits the gap exactly.
  • If the above S is a positive integer, S types of transition information are set, each type containing multiple pieces with different contents. When adding transition information, one piece is randomly selected from the corresponding s types, further improving the diversity of the training samples.
  • Alternatively, the preset rule is to add identical data at the break, such as all 0s, all 1s, or other data such as a continuously repeating 010101 pattern.
  • the device for acquiring speech training samples further includes:
  • An acquiring unit configured to acquire the time length of the sound spectrogram
  • a determining unit configured to determine the number of times of tearing processing of the sound spectrogram according to the length of time
  • the selection unit is configured to select the same number of time points as the number of times of the tearing process, so as to perform different times of the tearing process on the sound spectrogram.
  • The tearing process cannot be performed an unlimited number of times on the sound spectrogram, so this application determines the number of tears according to the time length of the spectrogram. Specifically, a mapping table is set up: one column holds time-length ranges and the other holds the number of tears corresponding to each range. After determining the spectrogram's time length, look up which range it falls into and select the corresponding number of tears. The specific ranges and counts can be set manually from experience; the guiding idea is that the longer the time length, the more tears, and vice versa.
  • the above selection unit includes:
  • the average selection module is configured to evenly allocate time points corresponding to the number of times of the tearing processing over the length of time, so as to perform different times of tearing processing on the sound spectrogram.
  • When the time points are equally distributed within the time length, the allocation is fast and uniform, and the differences between samples are more even than with the random distribution described above.
  • The sound spectrogram may be torn at only one time point at a time, giving a tearing spectrogram with a single tear; in another embodiment, a sound spectrogram can be torn at multiple time points as tearing points, giving a tearing spectrogram with multiple tears.
  • the device for acquiring speech training samples further includes:
  • A time spectrum unit, configured to select a plurality of first spectrum blocks set at intervals in the time direction on the tearing spectrogram
  • the first mask unit is configured to apply a mask sequence on each of the first spectrum blocks to obtain a first mask spectrogram.
  • The first mask spectrograms and all tearing spectrograms are put together to form a second speech training sample set, which further increases the number and richness of the samples.
  • The time length represented by t above is less than the time length of the tearing spectrogram, and t0 is any time point in the tearing spectrogram.
  • the device for acquiring speech training samples further includes:
  • A frequency spectrum unit, configured to select a plurality of second spectrum blocks of different frequency channels in the frequency direction on the tearing spectrogram
  • the second mask unit is used to apply a mask sequence on each of the second spectrum blocks to obtain a second mask spectrogram.
  • the above-mentioned second spectrum block is a spectrum block in the frequency direction, not a time spectrum block.
  • Apply the mask sequence [v1, ...] to the spectrum block of n (a positive integer) consecutive frequency channels [m0, m0 + n], where v is a number randomly selected from the uniform distribution on [0, V] and V is the frequency-mask parameter.
  • The aforementioned m0 is any frequency-channel point in the tearing spectrogram, subject to the requirement that the masked block remains within the tearing spectrogram.
  • the aforementioned selection unit 20 includes:
  • a mask module configured to randomly add a mask in the time direction on the sound spectrogram to obtain a third mask spectrogram
  • the selection module is configured to randomly select the time point in the time direction on the third mask spectrogram.
  • A mask is first added to the sound spectrogram, and the time point is then randomly selected in the time direction on the third mask spectrogram, so that richer samples can be obtained.
  • The voice-training-sample acquisition device of this embodiment converts an original voice signal into a sound spectrogram and then, through tearing and masking, derives from it a large number of tearing spectrograms, first mask spectrograms, and second mask spectrograms, all of which can be used as samples for training the voiceprint recognition model. This solves the prior-art problem that the sample size for training the voiceprint recognition model is small and an accurate voiceprint recognition model cannot be obtained.
  • an embodiment of the present application also provides a computer device, including a memory and a processor, the memory stores a computer program, and its internal structure may be as shown in FIG. 3.
  • The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium.
  • the database of the computer equipment is used to store data such as sample sets.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • A method for acquiring a voice training sample includes: processing a voice signal to obtain a sound spectrogram of the voice signal; randomly selecting a time point in the time direction on the sound spectrogram; taking the time point as the tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing process, and adding transition information at the break according to preset rules to obtain the tearing spectrogram, which is used as the speech training sample. The separation distance of the sound spectrogram on both sides of the tearing point is s, where s is a number randomly selected from the uniform distribution on [0, S] and S is the time-deformation parameter.
  • The step of adding transition information at the break according to a preset rule includes: randomly adding the transition information at the break of the tearing spectrogram.
  • Before the step of randomly selecting a time point in the time direction on the sound spectrogram, the method includes: acquiring the time length of the sound spectrogram; determining, according to that time length, the number of tearing operations on the sound spectrogram; and selecting as many time points as the number of tearing operations, so as to perform the tearing process on the sound spectrogram that many different times.
  • The step of selecting as many time points as the number of tearing operations includes: evenly distributing, over the time length, time points corresponding in number to the tearing operations, so as to perform the tearing process on the sound spectrogram at each of them.
  • After the step in which the time point is taken as the tearing point, the sound spectrogram on both sides of the tearing point is separated in the time direction, and transition information is added at the break according to preset rules to obtain the tearing spectrogram, the method includes: selecting a plurality of first spectrum blocks set at intervals in the time direction on the tearing spectrogram; and applying a mask sequence to each first spectrum block to obtain a first mask spectrogram.
  • After the step in which the time point is taken as the tearing point, the sound spectrogram on both sides of the tearing point is separated in the time direction, and transition information is added at the break according to preset rules to obtain the tearing spectrogram, the method further includes: selecting a plurality of second spectrum blocks of different frequency channels in the frequency direction on the tearing spectrogram; and applying a mask sequence to each second spectrum block to obtain a second mask spectrogram.
  • The step of randomly selecting a time point in the time direction on the sound spectrogram includes: randomly adding a mask in the time direction on the sound spectrogram to obtain a third mask spectrogram; and randomly selecting the time point in the time direction on the third mask spectrogram.
  • The computer device of this embodiment converts an original voice signal into a sound spectrogram and then, through tearing and masking, derives from it a large number of tearing spectrograms, first mask spectrograms, and second mask spectrograms, all of which can be used as samples for training the voiceprint recognition model. This solves the prior-art problem that the sample size for training the voiceprint recognition model is small and an accurate voiceprint recognition model cannot be obtained.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • A computer program is stored thereon; when the computer program is executed by a processor, it implements a method for obtaining speech training samples. Specifically:
  • A method for acquiring a voice training sample includes: processing a voice signal to obtain a sound spectrogram of the voice signal; randomly selecting a time point in the time direction on the sound spectrogram; taking the time point as the tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing process, and adding transition information at the break according to preset rules to obtain the tearing spectrogram, which is used as the speech training sample. The separation distance of the sound spectrogram on both sides of the tearing point is s, where s is a number randomly selected from the uniform distribution on [0, S] and S is the time-deformation parameter.
  • The step of adding transition information at the break according to a preset rule includes: randomly adding the transition information at the break of the tearing spectrogram.
  • Before the step of randomly selecting a time point in the time direction on the sound spectrogram, the method includes: acquiring the time length of the sound spectrogram; determining, according to that time length, the number of tearing operations on the sound spectrogram; and selecting as many time points as the number of tearing operations, so as to perform the tearing process on the sound spectrogram that many different times.
  • The step of selecting as many time points as the number of tearing operations includes: evenly distributing, over the time length, time points corresponding in number to the tearing operations, so as to perform the tearing process on the sound spectrogram at each of them.
  • After the step in which the time point is taken as the tearing point, the sound spectrogram on both sides of the tearing point is separated in the time direction, and transition information is added at the break according to preset rules to obtain the tearing spectrogram, the method includes: selecting a plurality of first spectrum blocks set at intervals in the time direction on the tearing spectrogram; and applying a mask sequence to each first spectrum block to obtain a first mask spectrogram.
  • After the step in which the time point is taken as the tearing point, the sound spectrogram on both sides of the tearing point is separated in the time direction, and transition information is added at the break according to preset rules to obtain the tearing spectrogram, the method further includes: selecting a plurality of second spectrum blocks of different frequency channels in the frequency direction on the tearing spectrogram; and applying a mask sequence to each second spectrum block to obtain a second mask spectrogram.
  • the step of randomly selecting a time point in the time direction on the sound spectrogram includes: randomly adding a mask in the time direction on the sound spectrogram to obtain a third mask spectrogram; The time point is randomly selected in the time direction on the third mask spectrogram.
  • an original voice signal can be converted into a sound spectrogram, and then a large number of sound spectrograms can be derived from a sound spectrogram through the processing of tearing and masking.
  • the tearing spectrogram, the first mask spectrogram and the second mask spectrogram, and these tearing spectrograms, the first mask spectrogram and the second mask spectrogram can all be used as samples for training the voiceprint recognition model. This can solve the problem that the number of samples for training the voiceprint recognition model in the prior art is small and an accurate voiceprint recognition model cannot be obtained.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Auxiliary Devices For Music (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

A method, an apparatus, a computer device and a storage medium for acquiring speech training samples, the method including: processing a speech signal to obtain a sound spectrogram of the speech signal (S1); randomly selecting a time point in the time direction on the sound spectrogram (S2); using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, and using the torn spectrogram as a speech training sample (S3). After converting an original speech signal into a sound spectrogram, the method derives a large number of torn spectrograms, first mask spectrograms and second mask spectrograms from a single sound spectrogram through tearing and masking, thereby solving the prior-art problem that the small number of samples available for training a voiceprint recognition model prevents an accurate model from being obtained.

Description

Method, apparatus, computer device and storage medium for acquiring speech training samples
This application claims priority to Chinese patent application No. 202010093613.X, entitled "Method, apparatus, computer device and storage medium for acquiring speech training samples" and filed with the China National Intellectual Property Administration on February 14, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a method, an apparatus, a computer device and a storage medium for acquiring speech training samples.
Background
Identifying a speaker by voice, i.e. voiceprint recognition, is an important direction in artificial intelligence and a key application of AI in biometric scenarios. Although voiceprint recognition keeps setting new accuracy records under laboratory conditions, in real business scenarios speech is carried over transmission channels such as telephone lines or broadband networks, and the received speech is affected by the channel, so the accuracy of voiceprint recognition remains limited.
The inventors found that speech and channel cannot be fully separated, so the speaker features extracted during voiceprint recognition inevitably contain channel characteristics. For example, the features extracted for speaker A from a telephone recording and from a network voice call carry the characteristics of the telephone channel and the network channel respectively, which causes errors in voiceprint matching. Cross-channel recognition therefore remains a difficult problem in the voiceprint field.
The current mainstream solution in the industry is to collect speech data from each channel and either train a model that converts features between channels or use the collected cross-channel data to enlarge the training set of the original model. In both cases the core requirement is collecting enough cross-channel data as samples. In practice, however, the cost and conditions of sample collection make it impossible to gather sufficient, effective cross-channel speech data as samples.
Technical Solution
To achieve the above objective, this application proposes a method for acquiring speech training samples, including:
processing a speech signal to obtain a sound spectrogram of the speech signal;
randomly selecting a time point in the time direction on the sound spectrogram;
using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, and using the torn spectrogram as the speech training sample, where the separation distance between the sound spectrograms on the two sides of the tearing point is s, s being a number randomly drawn from a uniform distribution over [0, S] and S being the time-warp parameter.
This application also provides an apparatus for acquiring speech training samples, including:
a conversion unit configured to process a speech signal to obtain a sound spectrogram of the speech signal;
a selection unit configured to randomly select a time point in the time direction on the sound spectrogram;
a tearing unit configured to use the time point as a tearing point, separate the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, add transition information at the fracture according to a preset rule to obtain a torn spectrogram, and use the torn spectrogram as the speech training sample, where the separation distance between the sound spectrograms on the two sides of the tearing point is s, s being a number randomly drawn from a uniform distribution over [0, S] and S being the time-warp parameter.
This application also provides a computer device including a memory and a processor, the memory storing a computer program, and the processor, when executing the computer program, implementing a method for acquiring speech training samples that includes the following steps:
processing a speech signal to obtain a sound spectrogram of the speech signal;
randomly selecting a time point in the time direction on the sound spectrogram;
using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, and using the torn spectrogram as the speech training sample, where the separation distance between the sound spectrograms on the two sides of the tearing point is s, s being a number randomly drawn from a uniform distribution over [0, S] and S being the time-warp parameter.
This application also provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing a method for acquiring speech training samples that includes the following steps:
processing a speech signal to obtain a sound spectrogram of the speech signal;
randomly selecting a time point in the time direction on the sound spectrogram;
using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, and using the torn spectrogram as the speech training sample, where the separation distance between the sound spectrograms on the two sides of the tearing point is s, s being a number randomly drawn from a uniform distribution over [0, S] and S being the time-warp parameter.
Beneficial Effects
With the method, apparatus, computer device and storage medium of this application, an original speech signal can be converted into a sound spectrogram, and through tearing and masking a single sound spectrogram can be used to derive a large number of torn spectrograms, first mask spectrograms and second mask spectrograms, all of which can serve as samples for training a voiceprint recognition model. This solves the prior-art problem that too few samples are available to train an accurate voiceprint recognition model; in particular, it addresses the situation where few samples exist in each channel scenario, so that a voiceprint recognition model cannot be trained well.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of a method for acquiring speech training samples according to an embodiment of this application;
Fig. 2 is a schematic structural diagram of an apparatus for acquiring speech training samples according to an embodiment of this application;
Fig. 3 is a schematic structural diagram of a computer device according to an embodiment of this application.
Best Mode for Carrying Out the Invention
To make the objectives, technical solutions and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain this application and are not intended to limit it.
Referring to Fig. 1, a method for acquiring speech training samples includes:
S1: processing a speech signal to obtain a sound spectrogram of the speech signal;
S2: randomly selecting a time point in the time direction on the sound spectrogram;
S3: using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, and using the torn spectrogram as the speech training sample, where the separation distance between the sound spectrograms on the two sides of the tearing point is s, s being a number randomly drawn from a uniform distribution over [0, S] and S being the time-warp parameter.
In this embodiment, the speech signal serving as a sample is first converted into a sound spectrogram, generally a mel spectrogram; any existing conversion procedure may be used. A certain time point is taken as the tearing point and the sound spectrogram is torn, i.e. split in time at that point. The split can be performed in several ways: for example, the first side of the spectrogram next to the tearing point is kept fixed while the second side is moved away from it, or both sides are moved away from each other. In one specific embodiment, the first side may be fixed and the second side moved away by s; then, on the original sound spectrogram, the second side may be fixed and the first side moved away by s, so that two torn spectrograms with different movement directions are obtained from a single time point. In other embodiments, a specified distance in a specified direction may also be used. Further, steps S2 and S3 are repeated with a different time point each time, yielding multiple torn spectrograms corresponding to the sound spectrogram; finally, the sound spectrogram and the torn spectrograms form a first speech-training-sample set. With this technical solution, one sound spectrogram can be used to derive multiple torn spectrograms, enriching the number of speech training samples and thereby solving the prior-art problem that too few samples are available to train an accurate voiceprint recognition model, for example when few samples exist in each channel scenario.
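The tearing step above can be sketched as follows. This is a minimal NumPy reading of the method, not the patented implementation: the function name, the zero-filled gap and the array layout (frequency bins × time frames) are assumptions for illustration.

```python
import numpy as np

def tear_spectrogram(spec, t_point, S, rng=None):
    """Tear a (freq_bins, time_frames) spectrogram at time index t_point.

    The frames left of the tearing point stay fixed while the frames on
    the right are moved away by s frames, s drawn uniformly from [0, S]
    (S is the time-warp parameter). The fracture is left as zero frames
    here; the transition information described in the text would be
    written into this gap afterwards.
    """
    rng = rng or np.random.default_rng()
    s = int(rng.integers(0, S + 1))          # separation distance s ~ U[0, S]
    left, right = spec[:, :t_point], spec[:, t_point:]
    gap = np.zeros((spec.shape[0], s))       # blank frames at the fracture
    return np.concatenate([left, gap, right], axis=1), s

# repeating this with different tearing points derives many samples
spec = np.random.rand(80, 200)               # e.g. 80 mel bins, 200 frames
torn, s = tear_spectrogram(spec, t_point=100, S=10)
```

Repeating the call with different `t_point` values yields the multiple torn spectrograms that make up the first training-sample set.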
The above step of adding transition information at the fracture according to a preset rule includes:
randomly adding the transition information at the fracture of the torn spectrogram.
In this embodiment, the torn spectrogram contains a torn spectral segment, which leaves a blank. To increase the diversity of the training samples, transition information, such as different smoothing signals, can be added into the blank. The transition information may be preset; generally, multiple different pieces of transition information are preset and one is randomly selected and added at the fracture. If the selected transition information does not exactly fill the blank, it can be proportionally enlarged or reduced so that it fits exactly. In another specific embodiment, where S is a positive integer, S kinds of transition information are provided, each kind containing multiple pieces with different content; when adding transition information, one piece is randomly selected from the kind corresponding to s, further increasing the diversity of the training samples.
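The gap-filling just described might look like the sketch below: a transition signal is picked at random from a preset pool and resampled along the time axis so that it exactly fills the blank. The nearest-neighbour index resampling stands in for the proportional enlarging/shrinking mentioned in the text and is an assumption, as are all the names.

```python
import numpy as np

def fill_fracture(torn, gap_start, gap_len, transitions, rng=None):
    """Write a randomly chosen preset transition signal into the fracture.

    `transitions` is a list of (freq_bins, any_length) arrays; if the
    chosen one does not exactly fill the gap, it is stretched or shrunk
    proportionally along the time axis by nearest-neighbour resampling.
    """
    rng = rng or np.random.default_rng()
    t = transitions[int(rng.integers(len(transitions)))]
    idx = np.linspace(0, t.shape[1] - 1, gap_len).round().astype(int)
    out = torn.copy()
    out[:, gap_start:gap_start + gap_len] = t[:, idx]
    return out

# fill a 4-frame blank in a toy torn spectrogram
torn = np.zeros((3, 10))
pool = [np.ones((3, 7)), np.full((3, 2), 2.0)]   # hypothetical preset pool
filled = fill_fracture(torn, gap_start=4, gap_len=4, transitions=pool)
```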
In another embodiment, the preset rule is to add identical data at the fracture, for example all 0s, all 1s, or other repeating patterns such as 010101.
In one embodiment, before step S2 of randomly selecting a time point in the time direction on the sound spectrogram, the method includes:
S201: acquiring the time length of the sound spectrogram;
S202: determining the number of tearing operations for the sound spectrogram according to the time length;
S203: selecting as many time points as tearing operations so as to perform the tearing operations on the sound spectrogram.
In this embodiment, the sound spectrogram cannot be torn an unlimited number of times, so this application determines the number of tears according to the length of the time information in the sound spectrogram. Specifically, a mapping table is set up, one column listing time-length ranges and the other listing the tear count for each range. Once the time length of the sound spectrogram is determined, the range into which it falls is looked up and the corresponding tear count is selected. The specific time lengths and tear counts can be set manually based on experience, the guiding idea being that the longer the time length, the more tears it supports, and vice versa.
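A mapping-table lookup of the kind described above might be implemented as follows. The specific ranges and tear counts are illustrative assumptions; the text only fixes the monotonic rule that a longer spectrogram permits more tearing operations.

```python
# Illustrative mapping table: time-length range (seconds) -> tear count.
TEAR_TABLE = [
    ((0.0, 2.0), 1),
    ((2.0, 5.0), 2),
    ((5.0, 10.0), 4),
    ((10.0, float("inf")), 8),
]

def tear_count(duration):
    """Look up how many tearing operations a spectrogram of this
    duration supports; longer durations map to more tears."""
    for (lo, hi), n in TEAR_TABLE:
        if lo <= duration < hi:
            return n
    raise ValueError("duration must be non-negative")
```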
In one embodiment, step S203 of selecting as many time points as tearing operations includes:
evenly distributing over the time length the number of time points corresponding to the number of tearing operations, so as to perform the tearing operations on the sound spectrogram.
In this embodiment, the time points are evenly distributed over the time length; the distribution is fast and uniform, and the differences between samples are more even than with the random distribution described above.
In one embodiment, the sound spectrogram can only be torn at one time point at a time, producing a torn spectrogram with a single fracture; in another embodiment, a sound spectrogram can be torn at multiple time points simultaneously, producing a torn spectrogram with multiple fractures.
In one embodiment, after step S3 of using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, and adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, the method includes:
S4: selecting, on the torn spectrogram, multiple first spectrum blocks spaced apart in the time direction;
S5: applying a mask sequence to each first spectrum block to obtain a first mask spectrogram.
In this embodiment, in the time direction of the torn spectrogram, x (a positive integer) first spectrum blocks of consecutive time steps [t0, t0+t] are selected, and a mask sequence [w1, …] is applied to these blocks, where w is a number randomly drawn from a uniform distribution over [0, W] and W is the time-mask parameter. In one specific embodiment, choosing different values of t yields different first mask spectrograms, so that multiple first mask spectrograms are obtained for the torn spectrogram; the sound spectrogram, all first mask spectrograms and all torn spectrograms together form a second speech-training-sample set, further increasing the number and richness of the samples. In this embodiment, the time length represented by t is smaller than the time length of the torn spectrogram, and t0 is any time point in the torn spectrogram, provided it allows the torn spectrogram to be partitioned into blocks.
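One plausible NumPy reading of this time-masking step (in the spirit of SpecAugment-style augmentation) is sketched below. Replacing the masked frames with the spectrogram mean, and all names, are assumptions not fixed by the text.

```python
import numpy as np

def time_mask(spec, num_blocks, W, rng=None):
    """Mask `num_blocks` blocks of consecutive time frames.

    Each block starts at a random frame t0 and covers w frames, with w
    drawn uniformly from [0, W] (W is the time-mask parameter); the
    masked frames are replaced by the mean value of the spectrogram.
    """
    rng = rng or np.random.default_rng()
    out = spec.copy()
    T = spec.shape[1]
    for _ in range(num_blocks):
        w = int(rng.integers(0, W + 1))
        t0 = int(rng.integers(0, max(T - w, 1)))
        out[:, t0:t0 + w] = spec.mean()
    return out

spec = np.random.rand(80, 200)
masked = time_mask(spec, num_blocks=2, W=20)
```

Varying the block width from call to call yields the multiple distinct first mask spectrograms described in the text.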
In one embodiment, after step S3 of using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, and adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, the method further includes:
S6: selecting, on the torn spectrogram, multiple second spectrum blocks of different frequency channels in the frequency direction;
S7: applying a mask sequence to each second spectrum block to obtain a second mask spectrogram.
In this embodiment, the second spectrum blocks are blocks in the frequency direction rather than in the time direction. Specifically, in the frequency direction of the spectrogram, a mask sequence [v1, …] is applied to blocks of n (a positive integer) consecutive frequency channels [m0, m0+n], where v is a number randomly drawn from a uniform distribution over [0, V] and V is the frequency-mask parameter. Likewise, choosing different values of n yields different second mask spectrograms, so that multiple second mask spectrograms are obtained for the torn spectrogram; the sound spectrogram, all second mask spectrograms and all torn spectrograms together form a third speech-training-sample set. In this embodiment, m0 is any frequency-channel point in the torn spectrogram, provided it allows the torn spectrogram to be partitioned into blocks.
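The frequency-direction masking is the same operation applied along the frequency axis instead of the time axis; a hedged sketch, under the same assumptions as the time-mask sketch:

```python
import numpy as np

def freq_mask(spec, num_blocks, V, rng=None):
    """Mask `num_blocks` bands of consecutive frequency channels.

    Each band starts at a random channel m0 and covers v channels, with
    v drawn uniformly from [0, V] (V is the frequency-mask parameter);
    the masked channels are replaced by the spectrogram mean.
    """
    rng = rng or np.random.default_rng()
    out = spec.copy()
    F = spec.shape[0]
    for _ in range(num_blocks):
        v = int(rng.integers(0, V + 1))
        m0 = int(rng.integers(0, max(F - v, 1)))
        out[m0:m0 + v, :] = spec.mean()
    return out

spec = np.random.rand(80, 200)
masked = freq_mask(spec, num_blocks=2, V=15)
```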
In one embodiment, step S2 of randomly selecting a time point in the time direction on the sound spectrogram includes:
S21: randomly adding a mask in the time direction of the sound spectrogram to obtain a third mask spectrogram;
S22: randomly selecting the time point in the time direction on the third mask spectrogram.
In this embodiment, a mask is first added to the sound spectrogram, and the time point is then randomly selected in the time direction on the third mask spectrogram, so that richer samples can be obtained.
With the method for acquiring speech training samples of the embodiments of this application, an original speech signal can be converted into a sound spectrogram, and through tearing and masking a single sound spectrogram can be used to derive a large number of torn spectrograms, first mask spectrograms and second mask spectrograms, all of which can serve as samples for training a voiceprint recognition model, thereby solving the prior-art problem that too few samples are available to train an accurate voiceprint recognition model. For example, when speech is collected separately in different channel scenarios, using it directly as training samples would yield too few samples to train an accurate voiceprint recognition model; with the above method, a large number of training samples can be derived from the few samples corresponding to that speech, solving the scarce-sample problem and, in particular, the problem that a voiceprint recognition model cannot be trained well when few samples exist in each channel scenario.
Referring to Fig. 2, an embodiment of this application further provides an apparatus for acquiring speech training samples, including:
a conversion unit 10 configured to process a speech signal to obtain a sound spectrogram of the speech signal;
a selection unit 20 configured to randomly select a time point in the time direction on the sound spectrogram;
a tearing unit 30 configured to use the time point as a tearing point, separate the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, add transition information at the fracture according to a preset rule to obtain a torn spectrogram, and use the torn spectrogram as the speech training sample, where the separation distance between the sound spectrograms on the two sides of the tearing point is s, s being a number randomly drawn from a uniform distribution over [0, S] and S being the time-warp parameter.
In this embodiment, the conversion unit 10 first converts the speech signal serving as a sample into a sound spectrogram, generally a mel spectrogram; any existing conversion procedure may be used. After the selection unit 20 randomly selects a time point, the tearing unit 30 takes that time point as the tearing point and tears the sound spectrogram, i.e. splits it in time at that point. The split can be performed in several ways: for example, the first side of the spectrogram next to the tearing point is kept fixed while the second side is moved away from it, or both sides are moved away from each other. In one specific embodiment, the first side may be fixed and the second side moved away by s; then, on the original sound spectrogram, the second side may be fixed and the first side moved away by s, so that two torn spectrograms with different movement directions are obtained from a single time point. In other embodiments, a specified distance in a specified direction may also be used. Further, the random time-point selection and tearing are repeated with a different time point each time, yielding multiple torn spectrograms corresponding to the sound spectrogram; finally, the sound spectrogram and the torn spectrograms form a first speech-training-sample set. With this technical solution, one sound spectrogram can be used to derive multiple torn spectrograms, enriching the number of speech training samples and thereby solving the prior-art problem that too few samples are available to train an accurate voiceprint recognition model, for example when few samples exist in each channel scenario.
In one embodiment, the tearing unit 30 further includes:
an adding unit configured to randomly add the transition information at the fracture of the torn spectrogram; that is, the preset rule is to randomly add the transition information at the fracture.
In this embodiment, the torn spectrogram contains a torn spectral segment, which leaves a blank. To increase the diversity of the training samples, transition information, such as different smoothing signals, can be added into the blank. The transition information may be preset; generally, multiple different pieces of transition information are preset and one is randomly selected and added at the fracture. If the selected transition information does not exactly fill the blank, it can be proportionally enlarged or reduced so that it fits exactly. In another specific embodiment, where S is a positive integer, S kinds of transition information are provided, each kind containing multiple pieces with different content; when adding transition information, one piece is randomly selected from the kind corresponding to s, further increasing the diversity of the training samples.
In another embodiment, the preset rule is to add identical data at the fracture, for example all 0s, all 1s, or other repeating patterns such as 010101.
In one embodiment, the above apparatus for acquiring speech training samples further includes:
an acquiring unit configured to acquire the time length of the sound spectrogram;
a determining unit configured to determine the number of tearing operations for the sound spectrogram according to the time length;
a selecting unit configured to select as many time points as tearing operations so as to perform the tearing operations on the sound spectrogram.
In this embodiment, the sound spectrogram cannot be torn an unlimited number of times, so this application determines the number of tears according to the length of the time information in the sound spectrogram. Specifically, a mapping table is set up, one column listing time-length ranges and the other listing the tear count for each range. Once the time length of the sound spectrogram is determined, the range into which it falls is looked up and the corresponding tear count is selected. The specific time lengths and tear counts can be set manually based on experience, the guiding idea being that the longer the time length, the more tears it supports, and vice versa.
In one embodiment, the above selecting unit includes:
an even-selection module configured to evenly distribute over the time length the number of time points corresponding to the number of tearing operations, so as to perform the tearing operations on the sound spectrogram.
In this embodiment, the time points are evenly distributed over the time length; the distribution is fast and uniform, and the differences between samples are more even than with the random distribution described above.
In one embodiment, the sound spectrogram can only be torn at one time point at a time, producing a torn spectrogram with a single fracture; in another embodiment, a sound spectrogram can be torn at multiple time points simultaneously, producing a torn spectrogram with multiple fractures.
In one embodiment, the above apparatus for acquiring speech training samples further includes:
a time-spectrum unit configured to select, on the torn spectrogram, multiple first spectrum blocks spaced apart in the time direction;
a first mask unit configured to apply a mask sequence to each first spectrum block to obtain a first mask spectrogram.
In this embodiment, in the time direction of the torn spectrogram, x (a positive integer) first spectrum blocks of consecutive time steps [t0, t0+t] are selected, and a mask sequence [w1, …] is applied to these blocks, where w is a number randomly drawn from a uniform distribution over [0, W] and W is the time-mask parameter. In one specific embodiment, choosing different values of t yields different first mask spectrograms, so that multiple first mask spectrograms are obtained for the torn spectrogram; the sound spectrogram, all first mask spectrograms and all torn spectrograms together form a second speech-training-sample set, further increasing the number and richness of the samples. In this embodiment, the time length represented by t is smaller than the time length of the torn spectrogram, and t0 is any time point in the torn spectrogram, provided it allows the torn spectrogram to be partitioned into blocks.
In one embodiment, the above apparatus for acquiring speech training samples further includes:
a frequency-spectrum unit configured to select, on the torn spectrogram, multiple second spectrum blocks of different frequency channels in the frequency direction;
a second mask unit configured to apply a mask sequence to each second spectrum block to obtain a second mask spectrogram.
In this embodiment, the second spectrum blocks are blocks in the frequency direction rather than in the time direction. Specifically, in the frequency direction of the spectrogram, a mask sequence [v1, …] is applied to blocks of n (a positive integer) consecutive frequency channels [m0, m0+n], where v is a number randomly drawn from a uniform distribution over [0, V] and V is the frequency-mask parameter. Likewise, choosing different values of n yields different second mask spectrograms, so that multiple second mask spectrograms are obtained for the torn spectrogram; the sound spectrogram, all second mask spectrograms and all torn spectrograms together form a third speech-training-sample set. In this embodiment, m0 is any frequency-channel point in the torn spectrogram, provided it allows the torn spectrogram to be partitioned into blocks.
In one embodiment, the above selection unit 20 includes:
a mask module configured to randomly add a mask in the time direction of the sound spectrogram to obtain a third mask spectrogram;
a selection module configured to randomly select the time point in the time direction on the third mask spectrogram.
In this embodiment, a mask is first added to the sound spectrogram, and the time point is then randomly selected in the time direction on the third mask spectrogram, so that richer samples can be obtained.
With the apparatus for acquiring speech training samples of the embodiments of this application, an original speech signal can be converted into a sound spectrogram, and through tearing and masking a single sound spectrogram can be used to derive a large number of torn spectrograms, first mask spectrograms and second mask spectrograms, all of which can serve as samples for training a voiceprint recognition model, thereby solving the prior-art problem that too few samples are available to train an accurate voiceprint recognition model. For example, when speech is collected separately in different channel scenarios, using it directly as training samples would yield too few samples to train an accurate voiceprint recognition model; with the above method, a large number of training samples can be derived from the few samples corresponding to that speech, solving the scarce-sample problem in different channel scenarios.
Referring to Fig. 3, an embodiment of this application further provides a computer device including a memory and a processor, the memory storing a computer program; its internal structure may be as shown in Fig. 3. The computer device includes a processor, a memory, a network interface and a database connected through a system bus, where the processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database; the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store data such as sample sets. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer program implements a method for acquiring speech training samples. Specifically:
A method for acquiring speech training samples, including: processing a speech signal to obtain a sound spectrogram of the speech signal; randomly selecting a time point in the time direction on the sound spectrogram; using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, and using the torn spectrogram as the speech training sample, where the separation distance between the sound spectrograms on the two sides of the tearing point is s, s being a number randomly drawn from a uniform distribution over [0, S] and S being the time-warp parameter.
In one embodiment, the step of adding transition information at the fracture according to a preset rule includes: randomly adding the transition information at the fracture of the torn spectrogram.
In one embodiment, before the step of randomly selecting a time point in the time direction on the sound spectrogram, the method includes: acquiring the time length of the sound spectrogram; determining the number of tearing operations for the sound spectrogram according to the time length; selecting as many time points as tearing operations so as to perform the tearing operations on the sound spectrogram.
In one embodiment, the step of selecting as many time points as tearing operations includes: evenly distributing over the time length the number of time points corresponding to the number of tearing operations, so as to perform the tearing operations on the sound spectrogram.
In one embodiment, after the step of using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, and adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, the method includes: selecting, on the torn spectrogram, multiple first spectrum blocks spaced apart in the time direction; applying a mask sequence to each first spectrum block to obtain a first mask spectrogram.
In one embodiment, after the step of using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, and adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, the method further includes: selecting, on the torn spectrogram, multiple second spectrum blocks of different frequency channels in the frequency direction; applying a mask sequence to each second spectrum block to obtain a second mask spectrogram.
In one embodiment, the step of randomly selecting a time point in the time direction on the sound spectrogram includes: randomly adding a mask in the time direction of the sound spectrogram to obtain a third mask spectrogram; randomly selecting the time point in the time direction on the third mask spectrogram.
With the computer device of the embodiments of this application, an original speech signal can be converted into a sound spectrogram, and through tearing and masking a single sound spectrogram can be used to derive a large number of torn spectrograms, first mask spectrograms and second mask spectrograms, all of which can serve as samples for training a voiceprint recognition model, thereby solving the prior-art problem that too few samples are available to train an accurate voiceprint recognition model. For example, when speech is collected separately in different channel scenarios, using it directly as training samples would yield too few samples to train an accurate voiceprint recognition model; with the above method, a large number of training samples can be derived from the few samples corresponding to that speech, solving the scarce-sample problem in different channel scenarios.
An embodiment of this application further provides a computer-readable storage medium, which may be non-volatile or volatile, on which a computer program is stored; when executed by a processor, the computer program implements a method for acquiring speech training samples. Specifically:
A method for acquiring speech training samples, including: processing a speech signal to obtain a sound spectrogram of the speech signal; randomly selecting a time point in the time direction on the sound spectrogram; using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, and using the torn spectrogram as the speech training sample, where the separation distance between the sound spectrograms on the two sides of the tearing point is s, s being a number randomly drawn from a uniform distribution over [0, S] and S being the time-warp parameter.
In one embodiment, the step of adding transition information at the fracture according to a preset rule includes: randomly adding the transition information at the fracture of the torn spectrogram.
In one embodiment, before the step of randomly selecting a time point in the time direction on the sound spectrogram, the method includes: acquiring the time length of the sound spectrogram; determining the number of tearing operations for the sound spectrogram according to the time length; selecting as many time points as tearing operations so as to perform the tearing operations on the sound spectrogram.
In one embodiment, the step of selecting as many time points as tearing operations includes: evenly distributing over the time length the number of time points corresponding to the number of tearing operations, so as to perform the tearing operations on the sound spectrogram.
In one embodiment, after the step of using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, and adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, the method includes: selecting, on the torn spectrogram, multiple first spectrum blocks spaced apart in the time direction; applying a mask sequence to each first spectrum block to obtain a first mask spectrogram.
In one embodiment, after the step of using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, and adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, the method further includes: selecting, on the torn spectrogram, multiple second spectrum blocks of different frequency channels in the frequency direction; applying a mask sequence to each second spectrum block to obtain a second mask spectrogram.
In one embodiment, the step of randomly selecting a time point in the time direction on the sound spectrogram includes: randomly adding a mask in the time direction of the sound spectrogram to obtain a third mask spectrogram; randomly selecting the time point in the time direction on the third mask spectrogram.
When the computer program is executed by a processor to implement the method for acquiring speech training samples, an original speech signal can be converted into a sound spectrogram, and through tearing and masking a single sound spectrogram can be used to derive a large number of torn spectrograms, first mask spectrograms and second mask spectrograms, all of which can serve as samples for training a voiceprint recognition model, thereby solving the prior-art problem that too few samples are available to train an accurate voiceprint recognition model. For example, when speech is collected separately in different channel scenarios, using it directly as training samples would yield too few samples to train an accurate voiceprint recognition model; with the above method, a large number of training samples can be derived from the few samples corresponding to that speech, solving the scarce-sample problem in different channel scenarios.
A person of ordinary skill in the art can understand that all or part of the processes in the above embodiment methods can be implemented by instructing the relevant hardware through a computer program, which may be stored in a non-volatile computer-readable storage medium; when executed, the program may include the processes of the above method embodiments. Any reference to memory, storage, database or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The above are only preferred embodiments of this application and do not limit its patent scope; any equivalent structural or process transformation made using the contents of this specification and drawings, and any direct or indirect application in other related technical fields, are likewise included in the patent protection scope of this application.

Claims (20)

  1. A method for acquiring speech training samples, comprising:
    processing a speech signal to obtain a sound spectrogram of the speech signal;
    randomly selecting a time point in the time direction on the sound spectrogram;
    using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, and using the torn spectrogram as the speech training sample, wherein the separation distance between the sound spectrograms on the two sides of the tearing point is s, s being a number randomly drawn from a uniform distribution over [0, S] and S being the time-warp parameter.
  2. The method for acquiring speech training samples according to claim 1, wherein the step of adding transition information at the fracture according to a preset rule comprises:
    randomly adding the transition information at the fracture of the torn spectrogram.
  3. The method for acquiring speech training samples according to claim 1, wherein before the step of randomly selecting a time point in the time direction on the sound spectrogram, the method comprises:
    acquiring the time length of the sound spectrogram;
    determining the number of tearing operations for the sound spectrogram according to the time length;
    selecting as many time points as tearing operations so as to perform the tearing operations on the sound spectrogram.
  4. The method for acquiring speech training samples according to claim 3, wherein the step of selecting as many time points as tearing operations comprises:
    evenly distributing over the time length the number of time points corresponding to the number of tearing operations, so as to perform the tearing operations on the sound spectrogram.
  5. The method for acquiring speech training samples according to claim 1, wherein after the step of using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, and adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, the method comprises:
    selecting, on the torn spectrogram, multiple first spectrum blocks spaced apart in the time direction;
    applying a mask sequence to each of the first spectrum blocks to obtain a first mask spectrogram.
  6. The method for acquiring speech training samples according to claim 1, wherein after the step of using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, and adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, the method further comprises:
    selecting, on the torn spectrogram, multiple second spectrum blocks of different frequency channels in the frequency direction;
    applying a mask sequence to each of the second spectrum blocks to obtain a second mask spectrogram.
  7. The method for acquiring speech training samples according to claim 1, wherein the step of randomly selecting a time point in the time direction on the sound spectrogram comprises:
    randomly adding a mask in the time direction of the sound spectrogram to obtain a third mask spectrogram;
    randomly selecting the time point in the time direction on the third mask spectrogram.
  8. An apparatus for acquiring speech training samples, comprising:
    a conversion unit configured to process a speech signal to obtain a sound spectrogram of the speech signal;
    a selection unit configured to randomly select a time point in the time direction on the sound spectrogram;
    a tearing unit configured to use the time point as a tearing point, separate the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, add transition information at the fracture according to a preset rule to obtain a torn spectrogram, and use the torn spectrogram as the speech training sample, wherein the separation distance between the sound spectrograms on the two sides of the tearing point is s, s being a number randomly drawn from a uniform distribution over [0, S] and S being the time-warp parameter.
  9. A computer device comprising a memory and a processor, wherein the memory stores a computer program and the processor, when executing the computer program, implements a method for acquiring speech training samples, the method comprising the following steps:
    processing a speech signal to obtain a sound spectrogram of the speech signal;
    randomly selecting a time point in the time direction on the sound spectrogram;
    using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, and using the torn spectrogram as the speech training sample, wherein the separation distance between the sound spectrograms on the two sides of the tearing point is s, s being a number randomly drawn from a uniform distribution over [0, S] and S being the time-warp parameter.
  10. The computer device according to claim 9, wherein the step of adding transition information at the fracture according to a preset rule comprises:
    randomly adding the transition information at the fracture of the torn spectrogram.
  11. The computer device according to claim 9, wherein before the step of randomly selecting a time point in the time direction on the sound spectrogram, the method comprises:
    acquiring the time length of the sound spectrogram;
    determining the number of tearing operations for the sound spectrogram according to the time length;
    selecting as many time points as tearing operations so as to perform the tearing operations on the sound spectrogram.
  12. The computer device according to claim 11, wherein the step of selecting as many time points as tearing operations comprises:
    evenly distributing over the time length the number of time points corresponding to the number of tearing operations, so as to perform the tearing operations on the sound spectrogram.
  13. The computer device according to claim 9, wherein after the step of using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, and adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, the method comprises:
    selecting, on the torn spectrogram, multiple first spectrum blocks spaced apart in the time direction;
    applying a mask sequence to each of the first spectrum blocks to obtain a first mask spectrogram.
  14. The computer device according to claim 9, wherein after the step of using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, and adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, the method further comprises:
    selecting, on the torn spectrogram, multiple second spectrum blocks of different frequency channels in the frequency direction;
    applying a mask sequence to each of the second spectrum blocks to obtain a second mask spectrogram.
  15. The computer device according to claim 9, wherein the step of randomly selecting a time point in the time direction on the sound spectrogram comprises:
    randomly adding a mask in the time direction of the sound spectrogram to obtain a third mask spectrogram;
    randomly selecting the time point in the time direction on the third mask spectrogram.
  16. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements a method for acquiring speech training samples, the method comprising the following steps:
    processing a speech signal to obtain a sound spectrogram of the speech signal;
    randomly selecting a time point in the time direction on the sound spectrogram;
    using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, and using the torn spectrogram as the speech training sample, wherein the separation distance between the sound spectrograms on the two sides of the tearing point is s, s being a number randomly drawn from a uniform distribution over [0, S] and S being the time-warp parameter.
  17. The computer-readable storage medium according to claim 16, wherein the step of adding transition information at the fracture according to a preset rule comprises:
    randomly adding the transition information at the fracture of the torn spectrogram.
  18. The computer-readable storage medium according to claim 16, wherein before the step of randomly selecting a time point in the time direction on the sound spectrogram, the method comprises:
    acquiring the time length of the sound spectrogram;
    determining the number of tearing operations for the sound spectrogram according to the time length;
    selecting as many time points as tearing operations so as to perform the tearing operations on the sound spectrogram.
  19. The computer-readable storage medium according to claim 18, wherein the step of selecting as many time points as tearing operations comprises:
    evenly distributing over the time length the number of time points corresponding to the number of tearing operations, so as to perform the tearing operations on the sound spectrogram.
  20. The computer-readable storage medium according to claim 16, wherein after the step of using the time point as a tearing point, separating the sound spectrogram on both sides of the tearing point in the time direction to complete the tearing of the sound spectrogram, and adding transition information at the fracture according to a preset rule to obtain a torn spectrogram, the method comprises:
    selecting, on the torn spectrogram, multiple first spectrum blocks spaced apart in the time direction;
    applying a mask sequence to each of the first spectrum blocks to obtain a first mask spectrogram.
PCT/CN2020/093092 2020-02-14 2020-05-29 Method, apparatus, computer device and storage medium for acquiring speech training samples WO2021159635A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010093613.X 2020-02-14
CN202010093613.XA CN111370002B (zh) 2020-02-14 2020-02-14 Method, apparatus, computer device and storage medium for acquiring speech training samples

Publications (1)

Publication Number Publication Date
WO2021159635A1 true WO2021159635A1 (zh) 2021-08-19

Family

ID=71206253

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093092 WO2021159635A1 (zh) 2020-02-14 2020-05-29 Method, apparatus, computer device and storage medium for acquiring speech training samples

Country Status (2)

Country Link
CN (1) CN111370002B (zh)
WO (1) WO2021159635A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115580682A (zh) * 2022-12-07 2023-01-06 北京云迹科技股份有限公司 机器人拨打电话的接通挂断时刻的确定的方法及装置

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112017638A (zh) * 2020-09-08 2020-12-01 北京奇艺世纪科技有限公司 语音语义识别模型构建方法、语义识别方法、装置及设备
CN113241062B (zh) * 2021-06-01 2023-12-26 平安科技(深圳)有限公司 语音训练数据集的增强方法、装置、设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106898357A (zh) * 2017-02-16 2017-06-27 华南理工大学 一种基于正态分布规律的矢量量化方法
CN108830277A (zh) * 2018-04-20 2018-11-16 平安科技(深圳)有限公司 语义分割模型的训练方法、装置、计算机设备和存储介质
CN108922560A (zh) * 2018-05-02 2018-11-30 杭州电子科技大学 一种基于混合深度神经网络模型的城市噪声识别方法
CN109087632A (zh) * 2018-08-17 2018-12-25 平安科技(深圳)有限公司 语音处理方法、装置、计算机设备及存储介质
CN110751177A (zh) * 2019-09-17 2020-02-04 阿里巴巴集团控股有限公司 分类模型的训练方法、预测方法及装置

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR940002854B1 (ko) * 1991-11-06 1994-04-04 한국전기통신공사 음성 합성시스팀의 음성단편 코딩 및 그의 피치조절 방법과 그의 유성음 합성장치
CN104408681A (zh) * 2014-11-04 2015-03-11 南昌大学 基于分数梅林变换的多图像隐藏方法
CN104484872A (zh) * 2014-11-27 2015-04-01 浙江工业大学 基于方向的干涉图像边缘扩充方法
US10373073B2 (en) * 2016-01-11 2019-08-06 International Business Machines Corporation Creating deep learning models using feature augmentation
CN110148400B (zh) * 2018-07-18 2023-03-17 腾讯科技(深圳)有限公司 发音类型的识别方法、模型的训练方法、装置及设备
CN110379414B (zh) * 2019-07-22 2021-12-03 出门问问(苏州)信息科技有限公司 声学模型增强训练方法、装置、可读存储介质及计算设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106898357A (zh) * 2017-02-16 2017-06-27 华南理工大学 一种基于正态分布规律的矢量量化方法
CN108830277A (zh) * 2018-04-20 2018-11-16 平安科技(深圳)有限公司 语义分割模型的训练方法、装置、计算机设备和存储介质
CN108922560A (zh) * 2018-05-02 2018-11-30 杭州电子科技大学 一种基于混合深度神经网络模型的城市噪声识别方法
CN109087632A (zh) * 2018-08-17 2018-12-25 平安科技(深圳)有限公司 语音处理方法、装置、计算机设备及存储介质
CN110751177A (zh) * 2019-09-17 2020-02-04 阿里巴巴集团控股有限公司 分类模型的训练方法、预测方法及装置

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115580682A (zh) * 2022-12-07 2023-01-06 北京云迹科技股份有限公司 机器人拨打电话的接通挂断时刻的确定的方法及装置

Also Published As

Publication number Publication date
CN111370002A (zh) 2020-07-03
CN111370002B (zh) 2022-08-19

Similar Documents

Publication Publication Date Title
WO2021159635A1 (zh) 语音训练样本的获取方法、装置、计算机设备和存储介质
WO2021128256A1 (zh) 语音转换方法、装置、设备及存储介质
EP0366192B1 (de) Textverarbeitungsvorrichtung
CN108597496A (zh) 一种基于生成式对抗网络的语音生成方法及装置
CN103903627A (zh) 一种语音数据的传输方法及装置
EP1361739A1 (de) Verfahren und System zur Verarbeitung von Sprachdaten mit vorausgehender Erkennung der Sprache
CN110136696B (zh) 音频数据的监控处理方法和系统
DE60207217T2 (de) Verfahren zum ermöglichen der sprachinteraktion mit einer internet-seite
CN110310662A (zh) 音节自动标注方法、装置、计算机设备及存储介质
CN110600052B (zh) 一种语音评测的方法及装置
CN113870892A (zh) 基于语音识别的会议记录方法、装置、设备及存储介质
DE102015106280B4 (de) Systeme und Verfahren zum Kompensieren von Sprachartefakten in Spracherkennungssystemen
DE60133537T2 (de) Automatisches umtrainieren eines spracherkennungssystems
WO2008009429A1 (de) Verfahren, sprachdialogsystem und telekommunikationsendgerät zur multilingualen sprachausgabe
CN113470688A (zh) 语音数据的分离方法、装置、设备及存储介质
DE10220522A1 (de) Verfahren und System zur Verarbeitung von Sprachdaten mittels Spracherkennung und Frequenzanalyse
US11367456B2 (en) Streaming voice conversion method and apparatus and computer readable storage medium using the same
Albuquerque et al. Automatic no-reference speech quality assessment with convolutional neural networks
Kąkol et al. Improving objective speech quality indicators in noise conditions
CN109213466B (zh) 庭审信息的显示方法及装置
CN109086387A (zh) 一种音频流评分方法、装置、设备及存储介质
Zergat et al. The voice as a material clue: a new forensic Algerian Corpus
CN113270089A (zh) 语音重采样方法及装置
CN106971731B (zh) 一种声纹识别的修正方法
EP4064081B1 (de) Verfahren und system zum identifizieren und authentifizieren eines nutzers in einem ip netz

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20918539

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20918539

Country of ref document: EP

Kind code of ref document: A1