CN111312208A - Speaker-independent neural network vocoder system - Google Patents

Speaker-independent neural network vocoder system

Info

Publication number
CN111312208A
CN111312208A (application CN202010158293.1A)
Authority
CN
China
Prior art keywords
timbre
feature
neural network
acoustic
waveform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010158293.1A
Other languages
Chinese (zh)
Inventor
Zhou Junming (周俊明)
He Yingyang (何颖洋)
Wu Donghai (吴东海)
Huang Boxian (黄博贤)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shensheng Technology Co Ltd
Original Assignee
Guangzhou Shensheng Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shensheng Technology Co Ltd
Priority to CN202010158293.1A
Publication of CN111312208A
Legal status: Pending

Classifications

    • G PHYSICS
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L13/00 Speech synthesis; Text to speech systems
            • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
          • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
            • G10L19/04 using predictive techniques
              • G10L19/16 Vocoder architecture
          • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
            • G10L25/03 characterised by the type of extracted parameters
              • G10L25/24 the extracted parameters being the cepstrum
            • G10L25/27 characterised by the analysis technique
              • G10L25/30 using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The invention discloses a speaker-independent neural network vocoder system, which comprises the following steps: S1, a timbre feature extraction module receives an acoustic feature M and performs timbre feature extraction on it to obtain timbre feature information S, where the acoustic feature may be a mel spectrum, a mel cepstrum, or a linear magnitude spectrum; and S2, a waveform generation module receives the acoustic feature M and the timbre feature S output by the timbre extraction module and performs waveform generation to obtain a speech waveform W. The invention solves the problems that each single-timbre vocoder system can serve only one specific timbre, that service deployment and operating costs are high, that a new vocoder system must be trained whenever a new timbre is encountered, that training takes a long time, and that training requires a large amount of recorded data for a given timbre.

Description

Speaker-independent neural network vocoder system
Technical Field
The invention relates to the technical field of neural networks, and in particular to a speaker-independent neural network vocoder system.
Background
With the rapid development of neural network technology, the quality of speech synthesis has also improved rapidly. Realistic speech synthesis is now applied in news broadcasting, audiobooks, voice assistants, intelligent customer service, virtual characters, voice cloning, and the like. As artificial intelligence technology develops and application scenarios multiply, the demands placed on speech synthesis keep rising: not only must the synthesized speech sound realistic, it should also cover a wide variety of timbres. This poses many challenges for the development and deployment of speech synthesis technology.
The current mainstream speech synthesis stack comprises three subsystems: a speech synthesis front end (converting text to phonemes), a speech synthesis back end (converting phonemes to acoustic features), and a vocoder (converting acoustic features to audio). Among these, the vocoder plays an important role in the quality of the synthesized sound. In recent years, with the success of vocoders built from neural networks such as WaveNet, SampleRNN, and WaveRNN, single-timbre vocoder systems have become able to synthesize speech comparable to real recordings. However, these single-timbre vocoder systems can each synthesize only one timbre and cannot support high-quality synthesis of multiple timbres within a single system. Consequently, application scenarios with high timbre-diversity requirements (e.g., audiobooks, voice cloning) would need a very large number of vocoder systems to cover the required timbres. As the number of systems grows, the hardware needed for service deployment grows with it, greatly increasing operating costs. Moreover, the vocoder for each timbre must be trained on several hours of recordings of that timbre, and training must converge before speech can be synthesized; the time required varies with the training hardware but is typically 2-7 days. This poses a huge obstacle to multi-timbre, high-quality speech synthesis, especially in scenarios where sufficient training recordings cannot be obtained.
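For orientation, the following is a minimal sketch of the three-subsystem pipeline described above; the function names and signatures are illustrative assumptions, not an API defined by this patent:

```python
# Minimal sketch of the mainstream three-stage TTS pipeline; all names and
# signatures are illustrative placeholders, not the patent's API.
import numpy as np

def frontend(text: str) -> list[str]:
    """Speech synthesis front end: convert text to a phoneme sequence."""
    ...

def backend(phonemes: list[str]) -> np.ndarray:
    """Speech synthesis back end: convert phonemes to acoustic features,
    e.g. a mel spectrum of shape [frames, mel_bins]."""
    ...

def vocoder(acoustic_features: np.ndarray) -> np.ndarray:
    """Vocoder: convert acoustic features to an audio waveform.
    This is the subsystem the patent addresses."""
    ...

def synthesize(text: str) -> np.ndarray:
    """Full pipeline: text -> phonemes -> acoustic features -> waveform."""
    return vocoder(backend(frontend(text)))
```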
In summary, current vocoder systems have the following disadvantages in multi-timbre application scenarios:
1. Each single-timbre vocoder system can serve only one specific timbre, so service deployment and operating costs are high.
2. Each new timbre requires training a new vocoder system from scratch, with a long training time (typically 2-7 days).
3. Training requires a large amount of recorded data for the target timbre (generally 3 hours of recordings or more).
Disclosure of Invention
The invention aims to solve the problems that each single-timbre vocoder system can serve only one specific timbre, that service deployment and operating costs are high, that a new vocoder system must be trained from scratch for every new timbre with a long training time, and that training requires a large amount of recorded data for a given timbre.
To achieve this aim, the invention adopts the following technical scheme: a speaker-independent neural network vocoder system, comprising the following steps:
S1, a timbre feature extraction module receives the acoustic feature M and performs timbre feature extraction on it to obtain timbre feature information S, where the acoustic feature may be a mel spectrum, a mel cepstrum, or a linear magnitude spectrum;
S2, a waveform generation module receives the acoustic feature M and the timbre feature S output by the timbre extraction module and performs waveform generation to obtain the speech waveform W (a minimal interface sketch of this two-step dataflow follows the elaborations below).
Further, in S1, the acoustic feature may be selected from a mel spectrum, a mel cepstrum, and a linear magnitude spectrum.
Further, in S1, a traditional timbre feature extraction module extracts a traditional timbre feature sp from the input acoustic feature M, where the traditional timbre feature may be selected from the fundamental frequency, a voiced/unvoiced flag, a magnitude spectrum envelope, linear prediction coefficients, or line spectral pairs;
a feature mapping network module then maps the traditional timbre feature sp output by the traditional timbre feature extraction module into an abstract timbre feature S, and the timbre feature mapping network may be built from a residual network or a bidirectional recurrent neural network.
Further, in S2, the acoustic feature M and the timbre feature S are upsampled to the sampling rate of the audio waveform. For example, if the audio waveform sampling rate is 16000 Hz and the acoustic features have a frame rate of 80 Hz with a frame duration of 12.5 ms, the 80 Hz acoustic and timbre features are upsampled by a factor of 200 (16000 / 80 = 200), yielding the 16000 Hz-rate acoustic feature M1 and timbre feature S1;
the upsampled acoustic feature M1 and timbre feature S1 are input to neural network layer 1, which outputs the feature M2; this operation is then repeated N times, with the output feature Mi of the previous neural network layer and the timbre feature S1 input to the next neural network layer i, which outputs the feature Mi+1. Each neural network layer may be implemented as a CNN or a unidirectional RNN;
finally, a DNN layer converts the output feature MN+1 of neural network layer N into the speech waveform W.
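A minimal sketch of the S1/S2 dataflow follows; PyTorch, the module boundaries as Python classes, and the tensor shapes are assumptions made for illustration, not prescribed by the patent:

```python
import torch
import torch.nn as nn

class SpeakerIndependentVocoder(nn.Module):
    """Sketch of the two-step dataflow: S1 extracts timbre features S from
    the acoustic features M; S2 generates the waveform W from (M, S)."""

    def __init__(self, timbre_extractor: nn.Module, waveform_generator: nn.Module):
        super().__init__()
        self.timbre_extractor = timbre_extractor      # S1: M -> S
        self.waveform_generator = waveform_generator  # S2: (M, S) -> W

    def forward(self, M: torch.Tensor) -> torch.Tensor:
        # M: acoustic features, assumed shape [batch, frames, feat_dim]
        S = self.timbre_extractor(M)       # timbre feature information S
        W = self.waveform_generator(M, S)  # speech waveform W
        return W
```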
Compared with the prior art, the invention has the following beneficial effects: it uses an independent timbre feature extraction module to extract the timbre features of the target speaker and feeds these timbre features into every processing network of the waveform generation module, which strengthens the robustness of the waveform generation module and enables it to synthesize sounds of different timbres from acoustic features at high quality.
The speaker-independent neural network vocoder system can synthesize voices both within and beyond the training data set, with results close to real recordings. Because the system is independent of the target speaker, handling a new timbre requires neither collecting large amounts of the target speaker's speech data nor training a new model; a speaker-independent neural network vocoder trained once in advance can be applied directly to the new timbre. This greatly reduces the time and hardware costs of multi-timbre speech synthesis scenarios.
Drawings
The invention is described in further detail below with reference to the figures and the detailed description:
FIG. 1 is an overall schematic view of the invention;
FIG. 2 is a schematic diagram of the processing details of the timbre feature extraction module of the invention;
FIG. 3 is a schematic diagram of the processing details of the waveform generation module of the invention.
In the figures: timbre feature extraction module 101, waveform generation module 102, traditional timbre feature extraction module 01, feature mapping network module 02, upsampling processing module 03.
Detailed Description
The embodiments of the present invention are described below by way of specific examples, and other advantages and effects of the invention will be readily apparent to those skilled in the art from the disclosure of this specification.
Please refer to FIGS. 1 to 3. It should be understood that the structures, ratios, sizes, and the like shown in the drawings are used only to match the disclosure of the specification, so that those skilled in the art can understand and read it; they are not intended to limit the conditions under which the invention can be implemented and thus carry no technical significance. Any structural modification, change of ratio, or adjustment of size that does not affect the efficacy or the attainable purpose of the invention shall still fall within the scope covered by the disclosed technical content. In addition, terms such as "upper", "lower", "left", "right", "middle", and "one" used in this specification are for clarity of description only and are not intended to limit the implementable scope of the invention; changes or adjustments of their relative relationships, without substantial changes to the technical content, shall also be regarded as within the implementable scope of the invention.
The invention provides the following technical scheme: a speaker-independent neural network vocoder system comprising a timbre feature extraction module 101 and a waveform generation module 102, whose processing comprises the following two steps:
the timbre feature extraction module 101 receives the acoustic feature M, performs timbre feature extraction on it, and outputs the timbre feature information S; the acoustic feature in this example is a mel spectrum, but it is not limited to the mel spectrum;
the waveform generation module 102 receives the acoustic feature M and the timbre feature S output by the timbre extraction module 101, performs waveform generation, and outputs the speech waveform W.
In step 1, the processing details of the timbre feature extraction module 101 are shown in FIG. 2:
the traditional timbre feature extraction module 01 receives the acoustic feature M and outputs the traditional timbre feature sp; this example uses the fundamental frequency F0 and the magnitude spectrum envelope as the traditional timbre features, but is not limited to these two features;
the feature mapping network module 02 receives the traditional timbre feature sp output by the traditional timbre feature extraction module 01 and maps it into the abstract timbre feature S; the feature mapping network in this example is implemented as a 5-layer residual network, but is not limited to this implementation (a sketch follows below).
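A minimal PyTorch sketch of such a 5-layer residual feature mapping network; the layer widths, the per-frame linear layers, and the layout of sp (F0 and spectral-envelope values concatenated per frame) are illustrative assumptions beyond what the patent specifies:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One residual layer: a per-frame linear transform with a skip connection."""

    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.net(x)

class FeatureMappingNetwork(nn.Module):
    """Module 02 (sketch): maps traditional timbre features sp to abstract
    timbre features S through a stack of 5 residual layers."""

    def __init__(self, sp_dim: int, s_dim: int, n_layers: int = 5):
        super().__init__()
        self.proj_in = nn.Linear(sp_dim, s_dim)
        self.blocks = nn.Sequential(*(ResidualBlock(s_dim) for _ in range(n_layers)))

    def forward(self, sp: torch.Tensor) -> torch.Tensor:
        # sp: [batch, frames, sp_dim] -> S: [batch, frames, s_dim]
        return self.blocks(self.proj_in(sp))

# Example: map a 65-dim sp (1 F0 value + 64 envelope bins, assumed) to 64-dim S.
sp = torch.randn(1, 80, 65)
S = FeatureMappingNetwork(sp_dim=65, s_dim=64)(sp)  # shape [1, 80, 64]
```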
In step 2, the processing details of the waveform generation module 102 are shown in FIG. 3:
the upsampling processing module 03 receives the acoustic feature M and the abstract timbre feature S output by the feature mapping network module 02, and raises the sampling rate of both features by a factor of 200 to match the sampling rate of the audio waveform; in this example the speech audio sampling rate is 16000 Hz and the acoustic and timbre features have a frame rate of 80 Hz with a frame duration of 12.5 ms, but the system is not limited to these parameters;
the first neural network layer 04 receives the upsampled acoustic feature M1 and timbre feature S1 and outputs the feature M2; the operation is then repeated N times: the output feature Mi+1 of the previous neural network layer i and the timbre feature S1 output by the upsampling processing module 03 are input to the next neural network layer i+1, which outputs the feature Mi+2; in this embodiment each neural network layer is implemented as a CNN and the repetition count N is 10, but the system is not limited to these parameters;
the DNN network layer 07 receives the output feature MN+1 of neural network layer N 06 and outputs the speech waveform W (a sketch of the whole module follows below).
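A minimal PyTorch sketch of this waveform generation module; nearest-neighbor frame repetition for the upsampler, the kernel size and channel widths of the CNN layers, concatenative conditioning on S1, and a linear DNN output layer producing one sample per time step are all illustrative assumptions beyond what the patent specifies (CNN layers, N = 10, a 200x upsampling factor, and a DNN output layer):

```python
import torch
import torch.nn as nn

def upsample_frames(x: torch.Tensor, factor: int = 200) -> torch.Tensor:
    """Module 03 (sketch): repeat each feature frame `factor` times along the
    time axis, e.g. 80 Hz frames * 200 = 16000 Hz, matching the waveform rate.
    x: [batch, frames, dim] -> [batch, frames * factor, dim]."""
    return torch.repeat_interleave(x, factor, dim=1)

class ConditionedConvLayer(nn.Module):
    """One neural network layer (sketch): a 1-D convolution over the feature
    sequence, conditioned on the timbre feature S1 by channel concatenation."""

    def __init__(self, feat_dim: int, timbre_dim: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(feat_dim + timbre_dim, feat_dim,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, Mi: torch.Tensor, S1: torch.Tensor) -> torch.Tensor:
        # Mi, S1: [batch, time, dim]; concatenate, convolve, return [batch, time, feat_dim]
        x = torch.cat([Mi, S1], dim=-1).transpose(1, 2)
        return torch.relu(self.conv(x)).transpose(1, 2)

class WaveformGenerator(nn.Module):
    """Module 102 (sketch): N conditioned layers computing M_{i+1} from
    (M_i, S1), followed by a DNN output layer (07) mapping M_{N+1} to one
    waveform sample per time step."""

    def __init__(self, feat_dim: int, timbre_dim: int, n_layers: int = 10):
        super().__init__()
        self.layers = nn.ModuleList(
            ConditionedConvLayer(feat_dim, timbre_dim) for _ in range(n_layers))
        self.dnn = nn.Linear(feat_dim, 1)

    def forward(self, M1: torch.Tensor, S1: torch.Tensor) -> torch.Tensor:
        Mi = M1
        for layer in self.layers:  # M2 = layer_1(M1, S1), ..., M_{N+1} = layer_N(M_N, S1)
            Mi = layer(Mi, S1)
        return self.dnn(Mi).squeeze(-1)  # speech waveform W: [batch, time]

# Illustrative end-to-end use with the 80 Hz -> 16000 Hz example above:
M = torch.randn(1, 80, 80)   # one second of 80 Hz acoustic features (dims assumed)
S = torch.randn(1, 80, 64)   # matching abstract timbre features (dim assumed)
M1, S1 = upsample_frames(M), upsample_frames(S)            # each [1, 16000, dim]
W = WaveformGenerator(feat_dim=80, timbre_dim=64)(M1, S1)  # W: [1, 16000]
```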
The foregoing embodiments merely illustrate the principles and utilities of the invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas disclosed by the invention shall be covered by the claims of the invention.

Claims (4)

1. A speaker-independent neural network vocoder system, comprising the following steps:
S1, a timbre feature extraction module receives the acoustic feature M and performs timbre feature extraction on it to obtain timbre feature information S;
S2, a waveform generation module receives the acoustic feature M and the timbre feature S output by the timbre extraction module and performs waveform generation to obtain the speech waveform W.
2. The speaker-independent neural network vocoder system of claim 1, wherein in S1 the acoustic feature may be selected from a mel spectrum, a mel cepstrum, and a linear magnitude spectrum.
3. The speaker-independent neural network vocoder system of claim 1, wherein in S1 the timbre feature extraction module comprises a traditional timbre feature extraction module that extracts a traditional timbre feature sp from the input acoustic feature M, the traditional timbre feature being selectable from the fundamental frequency, a voiced/unvoiced flag, a magnitude spectrum envelope, linear prediction coefficients, or line spectral pairs;
and a feature mapping network module that maps the traditional timbre feature sp output by the traditional timbre feature extraction module into an abstract timbre feature S, the timbre feature mapping network being buildable from a residual network or a bidirectional recurrent neural network.
4. The speaker-independent neural network vocoder system of claim 1, wherein in S2 the acoustic feature M and the timbre feature S are upsampled to the sampling rate of the audio waveform; for example, if the audio waveform sampling rate is 16000 Hz and the acoustic features have a frame rate of 80 Hz with a frame duration of 12.5 ms, the 80 Hz acoustic and timbre features are upsampled by a factor of 200 to obtain the 16000 Hz-rate acoustic feature M1 and timbre feature S1;
the upsampled acoustic feature M1 and timbre feature S1 are input to neural network layer 1, which outputs the feature M2; the operation is then repeated N times, with the output feature Mi of the previous neural network layer and the timbre feature S1 input to the next neural network layer i, which outputs the feature Mi+1;
each neural network layer may be implemented as a CNN or a unidirectional RNN;
and a DNN layer converts the output feature MN+1 of neural network layer N into the speech waveform W.
CN202010158293.1A 2020-03-09 2020-03-09 Speaker-independent neural network vocoder system Pending CN111312208A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010158293.1A CN111312208A (en) 2020-03-09 2020-03-09 Speaker-independent neural network vocoder system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010158293.1A CN111312208A (en) 2020-03-09 2020-03-09 Speaker-independent neural network vocoder system

Publications (1)

Publication Number Publication Date
CN111312208A 2020-06-19

Family

ID=71147968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010158293.1A Pending CN111312208A (en) 2020-03-09 2020-03-09 Speaker-independent neural network vocoder system

Country Status (1)

Country Link
CN (1) CN111312208A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180130474A1 * 2015-06-19 2018-05-10 Google Llc Speech recognition with acoustic models
CN105788608A * 2016-03-03 2016-07-20 Bohai University Chinese initial consonant and compound vowel visualization method based on neural network
CN108615525A * 2016-12-09 2018-10-02 China Mobile Communications Co., Ltd. Research Institute Speech recognition method and device
CN107610707A * 2016-12-15 2018-01-19 Ping An Technology (Shenzhen) Co., Ltd. Voiceprint recognition method and device
CN110033755A * 2019-04-23 2019-07-19 Ping An Technology (Shenzhen) Co., Ltd. Speech synthesis method and device, computer equipment and storage medium
JP6578544B1 * 2019-06-14 2019-09-25 Techno-Speech, Inc. Audio processing apparatus and audio processing method

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111883106A * 2020-07-27 2020-11-03 Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. Audio processing method and device
CN111883106B * 2020-07-27 2024-04-19 Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. Audio processing method and device
CN112133278A * 2020-11-20 2020-12-25 Chengdu Qiyingtailun Technology Co., Ltd. Network training and personalized speech synthesis method for a personalized speech synthesis model
CN112133278B * 2020-11-20 2021-02-05 Chengdu Qiyingtailun Technology Co., Ltd. Network training and personalized speech synthesis method for a personalized speech synthesis model
CN112365877A * 2020-11-27 2021-02-12 Beijing Baidu Netcom Science and Technology Co., Ltd. Speech synthesis method and device, electronic equipment and storage medium
CN113724683A * 2021-07-23 2021-11-30 Alibaba Damo Academy (Hangzhou) Technology Co., Ltd. Audio generation method, computer device, and computer-readable storage medium
CN113724683B * 2021-07-23 2024-03-22 Alibaba Damo Academy (Hangzhou) Technology Co., Ltd. Audio generation method, computer device and computer readable storage medium
WO2023083252A1 * 2021-11-11 2023-05-19 Beijing Zitiao Network Technology Co., Ltd. Timbre selection method and apparatus, electronic device, readable storage medium, and program product

Similar Documents

Publication Publication Date Title
CN111312208A (en) Speaker-independent neural network vocoder system
CN110534089A (en) Chinese speech synthesis method based on phoneme and prosodic structure
US8706488B2 (en) Methods and apparatus for formant-based voice synthesis
CN101004911B (en) Method and device for generating frequency warping function and carrying out frequency warping
WO2021225829A1 (en) Speech recognition using unspoken text and speech synthesis
US9135923B1 (en) Pitch synchronous speech coding based on timbre vectors
CN109767778B (en) Bi-LSTM and WaveNet fused voice conversion method
JP2956548B2 (en) Voice band expansion device
CN111462769B (en) End-to-end accent conversion method
CN111210803B (en) System and method for training cloned timbre and prosody based on bottleneck features
Liu et al. High quality voice conversion through phoneme-based linear mapping functions with straight for mandarin
CN110675886A (en) Audio signal processing method, audio signal processing device, electronic equipment and storage medium
CN109616131B (en) Digital real-time voice changing method
Cooper et al. Can speaker augmentation improve multi-speaker end-to-end TTS?
CN111465982A (en) Signal processing device and method, training device and method, and program
Liu et al. Non-parallel voice conversion with autoregressive conversion model and duration adjustment
CN111724809A (en) Vocoder implementation method and device based on variational autoencoder
CN114283822A (en) Many-to-one voice conversion method based on gammatone frequency cepstral coefficients
Zhang et al. AccentSpeech: Learning accent from crowd-sourced data for target speaker TTS with accents
CN112908293A (en) Method and device for correcting pronunciations of polyphones based on semantic attention mechanism
CN112002302A (en) Speech synthesis method and device
CN113314109B (en) Voice generation method based on cycle generation network
CN115862590A (en) Text-driven speech synthesis method based on characteristic pyramid
Westall et al. Speech technology for telecommunications
Aso et al. Speakbysinging: Converting singing voices to speaking voices while retaining voice timbre

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2020-06-19