CN109326281A - Prosodic labeling method, apparatus and equipment - Google Patents

Prosodic labeling method, apparatus and equipment

Info

Publication number
CN109326281A
Authority
CN
China
Prior art keywords
prosodic
text
marked
voice data
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810988973.9A
Other languages
Chinese (zh)
Other versions
CN109326281B (en)
Inventor
孟君
曹琼
廖晓玲
郝玉峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Haitian Rui Sheng Polytron Technologies Inc
Original Assignee
Beijing Haitian Rui Sheng Polytron Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed: https://patents.darts-ip.com/?family=65263729&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=CN109326281(A) "Global patent litigation dataset" by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.
Application filed by Beijing Haitian Rui Sheng Polytron Technologies Inc filed Critical Beijing Haitian Rui Sheng Polytron Technologies Inc
Priority to CN201810988973.9A priority Critical patent/CN109326281B/en
Publication of CN109326281A publication Critical patent/CN109326281A/en
Application granted granted Critical
Publication of CN109326281B publication Critical patent/CN109326281B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/103 Formatting, i.e. changing of presentation of documents
    • G06F 40/117 Tagging; Marking up; Designating a block; Setting of attributes
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/04 Segmentation; Word boundary detection
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1807 Speech classification or search using natural language modelling using prosody or stress
    • G10L 2015/025 Phonemes, fenemes or fenones being the recognition units

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present invention provides a prosodic labeling method, apparatus, and device. The prosodic labeling method includes: obtaining voice data of a text to be labeled; determining, according to the voice data, prosodic information in the voice data, the prosodic information indicating pause durations in the voice data; and marking prosodic symbols in the text to be labeled according to the prosodic information in the voice data. The prosodic labeling method provided by the present invention improves the efficiency and accuracy of prosodic labeling.

Description

Prosodic labeling method, apparatus and equipment
Technical field
The present invention relates to the technical field of prosodic labeling, and in particular to a prosodic labeling method, apparatus, and device.
Background technique
Prosody, also known as suprasegmental features, rhythm, or cadence, generally covers rhythm, stress, intonation, and the like. Prosodic information is an essential means by which people express thoughts and emotions: the same text can convey entirely different meanings when spoken with different tones and rhythms. Prosodic information therefore plays a highly important role in speech synthesis systems.
Currently, prosodic labeling in speech synthesis systems is generally performed by predicting prosody from text information alone. For a given text, prosody is predicted from textual features such as initials, finals, words, phrases, and paragraphs, and professional annotators then complete the prosodic labeling based on the prediction result.
However, language expression is rich and varied. When prosodic labeling is carried out manually from text information alone, prosodic information cannot be predicted correctly at places in the text where a pause or an obvious silence is needed, so annotators have to make many corrections. As a result, both the efficiency and the accuracy of prosodic labeling are low.
Summary of the invention
The present invention provides a prosodic labeling method, apparatus, and device that improve the efficiency and accuracy of prosodic labeling.
In a first aspect, the present invention provides a prosodic labeling method, comprising:
obtaining voice data of a text to be labeled;
determining, according to the voice data, prosodic information in the voice data, the prosodic information indicating pause durations in the voice data; and
marking prosodic symbols in the text to be labeled according to the prosodic information in the voice data.
Optionally, in a possible embodiment, the method further comprises:
obtaining prosodic information in the text data of the text to be labeled.
Optionally, in a possible embodiment, marking prosodic symbols in the text to be labeled according to the prosodic information in the voice data comprises:
marking prosodic symbols in the text to be labeled according to both the prosodic information in the voice data and the prosodic information in the text data.
Optionally, in a possible embodiment, marking prosodic symbols in the text to be labeled according to the prosodic information in the voice data and the prosodic information in the text data comprises:
marking prosodic symbols in the text to be labeled according to the prosodic information in the voice data; and
updating the marked prosodic symbols in the text to be labeled according to the prosodic information in the text data.
Optionally, in a possible embodiment, updating the marked prosodic symbols in the text to be labeled according to the prosodic information in the text data comprises:
deleting at least one marked prosodic symbol if the prosodic information in the text data indicates that no prosodic symbol should be marked at the position of that symbol in the text to be labeled.
Optionally, in a possible embodiment, determining the prosodic information in the voice data according to the voice data comprises:
obtaining at least one silent segment in the voice data according to the voice data; and
determining, for each silent segment, the prosodic information corresponding to that silent segment in the voice data.
Optionally, in a possible embodiment, obtaining the at least one silent segment in the voice data according to the voice data comprises:
performing phoneme segmentation on the text data of the text to be labeled to obtain a voice annotation sequence; and
performing phoneme segmentation on the voice data according to the voice annotation sequence, the voice data, and a preset acoustic model to obtain the at least one silent segment in the voice data, wherein the preset acoustic model represents the phonetic features corresponding to different phonemes.
In a second aspect, the present invention provides a prosodic labeling apparatus, comprising:
a first obtaining module, configured to obtain voice data of a text to be labeled;
a prosodic information determining module, configured to determine, according to the voice data, prosodic information in the voice data, the prosodic information indicating pause durations in the voice data; and
a labeling module, configured to mark prosodic symbols in the text to be labeled according to the prosodic information in the voice data.
Optionally, in a possible embodiment, the apparatus further comprises a second obtaining module, configured to obtain prosodic information in the text data of the text to be labeled.
In that case, the labeling module is specifically configured to:
mark prosodic symbols in the text to be labeled according to both the prosodic information in the voice data and the prosodic information in the text data.
Optionally, in a possible embodiment, the labeling module is specifically configured to:
mark prosodic symbols in the text to be labeled according to the prosodic information in the voice data; and
update the marked prosodic symbols in the text to be labeled according to the prosodic information in the text data.
In a third aspect, the present invention provides a prosodic labeling device comprising a processor and a memory. The memory is configured to store instructions, and the processor is configured to execute the instructions stored in the memory, so that the prosodic labeling device performs the prosodic labeling method provided by any embodiment of the first aspect of the present invention.
In a fourth aspect, the present invention provides a storage medium, comprising a readable storage medium and a computer program, the computer program being used to implement the prosodic labeling method provided by any embodiment of the first aspect of the present invention.
The present invention provides a prosodic labeling method, apparatus, and device that mark prosodic symbols in a text to be labeled according to the voice data of that text. This takes the richness of language expression into account, in particular the places where the speaker pauses or where an obvious silent segment occurs in the speech, thereby improving the efficiency and accuracy of prosodic labeling and reducing its cost.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a prosodic labeling method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a prosodic labeling apparatus provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a prosodic labeling device provided by an embodiment of the present invention.
Specific embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of the prosodic labeling method provided by an embodiment of the present invention. The method may be executed by a prosodic labeling apparatus or by a prosodic labeling device. As shown in Fig. 1, the prosodic labeling method provided by this embodiment may include:
S101: obtain voice data of the text to be labeled.
S102: determine, according to the voice data, the prosodic information in the voice data.
The prosodic information indicates the pause durations in the voice data.
S103: mark prosodic symbols in the text to be labeled according to the prosodic information in the voice data.
Specifically, in this embodiment, the text that needs prosodic labeling is called the text to be labeled. The voice data of the text to be labeled is the voice data produced when a reader reads the text aloud; this embodiment places no restriction on the reader. The prosodic information in the voice data, which indicates the pause durations in the voice data, can be determined from the voice data of the text to be labeled. Prosodic symbols can then be marked in the text to be labeled according to those pause durations.
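The three steps can be pictured as a small program. The following is a minimal sketch, assuming that silence detection has already produced, for each pause, the index of the character in the text that the pause follows together with the pause duration in milliseconds; the mapping from duration to symbol is sketched after Table 1 below. All names are illustrative, not from the patent.

    from typing import Callable, List, Tuple

    def label_prosody(text: str,
                      pauses: List[Tuple[int, float]],   # (character index, duration in ms)
                      symbol_for: Callable[[float], str]) -> str:
        """S101-S103: insert prosodic symbols into the text by pause position."""
        chars = list(text)
        # Insert from the rightmost pause first so earlier indices stay valid.
        for char_index, duration_ms in sorted(pauses, reverse=True):
            symbol = symbol_for(duration_ms)
            if symbol:
                chars.insert(char_index + 1, symbol)
        return "".join(chars)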
The prosodic labeling method provided by this embodiment marks prosodic symbols in the text to be labeled according to the voice data of that text, taking the richness of language expression into account. Because it is based on the voice data produced when a reader reads the text aloud, it fully considers the places where the reader pauses or where an obvious silent segment occurs. Compared with manual prosodic labeling based on the text alone, this improves the accuracy of prosodic labeling; and because fewer places need correction, it also improves the efficiency of prosodic labeling and reduces its cost.
It should be noted that this embodiment places no restriction on how the prosodic symbols are implemented; they can be configured as needed. A pause duration range can be preset for each prosodic symbol, and this embodiment places no restriction on the specific values of the ranges.
For example, the prosodic symbols may include #1, #2, #3, and #4, in which case the voice data may contain four kinds of pause duration.
This is illustrated by the following example.
Table 1 shows the correspondence between the prosodic symbols, their meanings, and the pause duration ranges. In one scenario, the pauses corresponding to #1 and #2 are perceptually subtle and subjective, so no pause duration range needs to be defined for them in this embodiment, although one may be defined; this embodiment places no restriction on this. In Table 1, t3 < t4 ≤ t5 < t6, and this embodiment places no restriction on the specific values of t3 to t6; for example, t4 = t5 = 90 ms. Assuming one example of the text to be labeled is xxxxxxx, xxxxxxxx, the text after prosodic symbol marking may be xxxx#2xxx#3, xxx#2xxxxx#4.
Table 1
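As a hedged illustration of the lookup that Table 1 describes, the sketch below adopts the example values t3 = 30 ms and t4 = t5 = 90 ms given in this embodiment and assumes t6 = 400 ms, which the patent does not specify. It also adopts one consistent reading of the table, under which #2 covers [t3, t4), #3 covers [t5, t6), and #4 covers t6 and above; this reading matches the worked 60 ms example later in the description.

    T3, T4, T5, T6 = 30.0, 90.0, 90.0, 400.0  # ms; the patent only requires t3 < t4 <= t5 < t6

    def symbol_for_pause(duration_ms: float) -> str:
        """Map a measured pause duration to a prosodic symbol per Table 1 (assumed reading)."""
        if duration_ms >= T6:
            return "#4"
        if T5 <= duration_ms < T6:
            return "#3"
        if T3 <= duration_ms < T4:
            return "#2"
        return ""  # pause too short to mark a boundary from duration alone

    print(symbol_for_pause(60))   # -> #2, matching the worked example below
    print(symbol_for_pause(120))  # -> #3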
Optionally, the prosodic labeling method provided by this embodiment may further include:
obtaining prosodic information in the text data of the text to be labeled.
In that case, S103, marking prosodic symbols in the text to be labeled according to the prosodic information in the voice data, may include:
marking prosodic symbols in the text to be labeled according to both the prosodic information in the voice data and the prosodic information in the text data.
Specifically, the prosodic information in the text data of the text to be labeled indicates the pause durations in that text data. It should be noted that this embodiment places no restriction on how this prosodic information is obtained; any existing method for predicting prosody from text information can be used.
Marking prosodic symbols in the text to be labeled according to both the prosodic information in the voice data and the prosodic information in the text data combines the text prosody prediction result with the speech prosody analysis result, further improving the efficiency and accuracy of prosodic labeling.
Optionally, marking prosodic symbols in the text to be labeled according to the prosodic information in the voice data and the prosodic information in the text data may include:
marking prosodic symbols in the text to be labeled according to the prosodic information in the voice data; and
updating the marked prosodic symbols in the text to be labeled according to the prosodic information in the text data.
By first marking prosodic symbols in the text to be labeled from the prosodic information in the voice data and then updating them from the prosodic information in the text data, the text prosody prediction result is taken into account on top of the speech prosody analysis result, further improving the efficiency and accuracy of prosodic labeling.
Optionally, updating the marked prosodic symbols in the text to be labeled according to the prosodic information in the text data may include:
deleting at least one marked prosodic symbol if the prosodic information in the text data indicates that no prosodic symbol should be marked at the position of that symbol in the text to be labeled.
Specifically, the prosodic information in the text data is the text prosody prediction result determined from the text data of the text to be labeled. It usually reflects the pause durations at positions where a pause is grammatically possible, and it also identifies positions where no pause may occur. In some scenarios, it therefore indicates that no prosodic symbol should be marked at the position of at least one symbol already marked in the text to be labeled: for example, there is usually no pause in the middle of a fixed expression such as a set phrase, an idiom, or a common saying. In that case, the marked prosodic symbol can be deleted according to the prosodic information in the text data, which further improves the accuracy of prosodic labeling.
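A sketch of this deletion step is given below. It assumes the text prosody model outputs, for the plain (symbol-free) text, a list of character index ranges inside which no pause may occur, such as the span of a set phrase; the function then strips any marked symbol that falls strictly inside such a range. The representation is illustrative, not from the patent.

    import re

    def remove_forbidden_symbols(labeled: str, no_pause_spans) -> str:
        """Delete marked symbols at positions where the text prosody forbids a pause."""
        out, plain_index = [], 0
        for token in re.findall(r"#\d|.", labeled):
            if token.startswith("#"):
                # The symbol sits between plain characters plain_index-1 and plain_index;
                # drop it if that boundary lies strictly inside a no-pause span.
                if any(a < plain_index < b for a, b in no_pause_spans):
                    continue
            else:
                plain_index += 1
            out.append(token)
        return "".join(out)

    # No pause is allowed inside characters 2..5 (say, a four-character idiom):
    print(remove_forbidden_symbols("xxx#2xxx#3x", [(2, 6)]))  # -> xxxxxx#3x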
Optionally, S102, determining the prosodic information in the voice data according to the voice data, may include:
obtaining at least one silent segment in the voice data according to the voice data; and
determining, for each silent segment, the prosodic information corresponding to that silent segment in the voice data.
Specifically, at least one silent segment is obtained from the voice data; the duration of a silent segment is a pause duration in the voice data.
Optionally, determining, for each silent segment, the prosodic information corresponding to that silent segment in the voice data may include:
obtaining the duration of the silent segment from its start time and end time in the voice data.
This is illustrated by the following example.
Assume that a silent segment starts at 00:22:07:300 and ends at 00:22:07:360, so that its duration is 60 ms. Referring to Table 1, and assuming t3 = 30 ms and t4 = 90 ms, the prosodic symbol marked in the text to be labeled according to this duration is #2.
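A small sketch of this duration computation, assuming timestamps in the HH:MM:SS:mmm form used in the example:

    def timestamp_ms(ts: str) -> int:
        """Parse 'HH:MM:SS:mmm' into a count of milliseconds."""
        h, m, s, ms = (int(part) for part in ts.split(":"))
        return ((h * 60 + m) * 60 + s) * 1000 + ms

    duration = timestamp_ms("00:22:07:360") - timestamp_ms("00:22:07:300")
    print(duration)  # -> 60 (ms), which maps to #2 when t3 = 30 ms and t4 = 90 ms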
Optionally, obtaining the at least one silent segment in the voice data according to the voice data may include:
performing phoneme segmentation on the text data of the text to be labeled to obtain a voice annotation sequence; and
performing phoneme segmentation on the voice data according to the voice annotation sequence, the voice data, and a preset acoustic model to obtain the at least one silent segment in the voice data, where the preset acoustic model represents the phonetic features corresponding to different phonemes.
Specifically, a phoneme is the smallest speech unit obtained by dividing speech according to sound quality. Performing phoneme segmentation on the text data of the text to be labeled divides the text data into a series of temporally adjacent segments, each corresponding to a phoneme; this series is called the voice annotation sequence. The preset acoustic model describes the phonetic features corresponding to different phonemes. According to the voice annotation sequence, the voice data, and the preset acoustic model, phoneme segmentation can be performed on the voice data to obtain the at least one silent segment in the voice data.
It should be noted that this embodiment places no restriction on the phoneme segmentation method; an existing method can be used, for example an automatic speech segmentation algorithm based on a hidden Markov model (HMM). In such an algorithm, given the annotation sequence, an HMM-based model can use the Viterbi algorithm to force-align the speech signal with the HMM sequence corresponding to the phonetic annotation units (phonemes).
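The patent does not fix an output format for the alignment. As one hedged sketch, assuming the forced alignment has been exported in the common CTM format ("utterance channel start duration token", times in seconds), as Kaldi-based aligners typically can do, the silent segments are simply the aligned silence tokens:

    def silences_from_ctm(ctm_path: str, silence_tokens=("sil", "sp")):
        """Return [(start_s, end_s), ...] for every aligned silence token in a CTM file."""
        segments = []
        with open(ctm_path) as f:
            for line in f:
                fields = line.split()
                # CTM row: <utt-id> <channel> <start> <duration> <token>
                if len(fields) >= 5 and fields[4] in silence_tokens:
                    start, dur = float(fields[2]), float(fields[3])
                    segments.append((start, start + dur))
        return segments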
It should be noted that this embodiment places no restriction on the type of the preset acoustic model or on how it is obtained. For example, it can be trained with the open-source toolkit Kaldi, using the voice data whose prosody is to be predicted and the corresponding text; or it can be obtained with a deep neural network (DNN) algorithm. Optionally, when the amount of voice data is small, the preset acoustic model can be a GMM-HMM acoustic model; when the amount of voice data is large, it can be a DNN-HMM model.
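The choice between the two model types can be expressed as a one-line rule. The 10,000-utterance cut-off below is an assumed example value, since the description says only "small" and "large":

    def choose_acoustic_model(n_utterances: int) -> str:
        """Pick the preset acoustic model type by corpus size (illustrative threshold)."""
        return "GMM-HMM" if n_utterances < 10_000 else "DNN-HMM"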
Optionally, performing phoneme segmentation on the text data of the text to be labeled to obtain the voice annotation sequence may include:
performing phoneme segmentation on the text data of the text to be labeled and inserting a pause symbol between every two adjacent words of the text to be labeled to obtain the voice annotation sequence.
This is illustrated by the following example.
Assume that the phonemes are initials and finals, and that the text to be labeled is "Hello, dear motherland.", whose text data is "ni hao, qin ai de zu guo". The voice annotation sequence can then be "n i sp h ao sp q in sp ai sp d e sp z u sp g uo", where sp denotes the pause symbol.
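A sketch of this construction for pinyin text is given below: each syllable is split into initial and final by longest-prefix match against the standard pinyin initials, a zero-initial syllable such as "ai" maps to its final alone, and the pause symbol sp is inserted between adjacent syllables. The splitting rule is a simplification for illustration.

    INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
                "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

    def split_syllable(syllable: str):
        """Split one pinyin syllable into [initial, final], or [final] if zero-initial."""
        for initial in INITIALS:  # two-letter initials are listed first
            if syllable.startswith(initial):
                return [initial, syllable[len(initial):]]
        return [syllable]

    def annotation_sequence(pinyin: str) -> str:
        """Build the voice annotation sequence with sp between adjacent syllables."""
        syllables = pinyin.replace(",", " ").split()
        return " sp ".join(" ".join(split_syllable(s)) for s in syllables)

    print(annotation_sequence("ni hao, qin ai de zu guo"))
    # -> n i sp h ao sp q in sp ai sp d e sp z u sp g uo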
This embodiment provides a prosodic labeling method comprising: obtaining voice data of a text to be labeled; determining the prosodic information in the voice data according to the voice data; and marking prosodic symbols in the text to be labeled according to the prosodic information in the voice data. By marking prosodic symbols according to the voice data of the text to be labeled, the prosodic labeling method provided by this embodiment improves the efficiency and accuracy of prosodic labeling.
Fig. 2 is a schematic structural diagram of a prosodic labeling apparatus provided by an embodiment of the present invention. The apparatus provided by this embodiment is configured to execute the prosodic labeling method provided by the embodiment shown in Fig. 1. As shown in Fig. 2, the prosodic labeling apparatus provided by this embodiment may include:
a first obtaining module 11, configured to obtain voice data of a text to be labeled;
a prosodic information determining module 12, configured to determine, according to the voice data, the prosodic information in the voice data, the prosodic information indicating the pause durations in the voice data; and
a labeling module 13, configured to mark prosodic symbols in the text to be labeled according to the prosodic information in the voice data.
Optionally, the apparatus further includes a second obtaining module 14.
The second obtaining module 14 is configured to obtain prosodic information in the text data of the text to be labeled.
The labeling module 13 is specifically configured to:
mark prosodic symbols in the text to be labeled according to both the prosodic information in the voice data and the prosodic information in the text data.
Optionally, the labeling module 13 is specifically configured to:
mark prosodic symbols in the text to be labeled according to the prosodic information in the voice data; and
update the marked prosodic symbols in the text to be labeled according to the prosodic information in the text data.
Optionally, the labeling module 13 is specifically configured to:
delete at least one marked prosodic symbol if the prosodic information in the text data indicates that no prosodic symbol should be marked at the position of that symbol in the text to be labeled.
Optionally, the prosodic information determining module 12 is specifically configured to:
obtain at least one silent segment in the voice data according to the voice data; and
determine, for each silent segment, the prosodic information corresponding to that silent segment in the voice data.
Optionally, the prosodic information determining module 12 is specifically configured to:
perform phoneme segmentation on the text data of the text to be labeled to obtain a voice annotation sequence; and
perform phoneme segmentation on the voice data according to the voice annotation sequence, the voice data, and a preset acoustic model to obtain the at least one silent segment in the voice data, where the preset acoustic model represents the phonetic features corresponding to different phonemes.
The prosodic labeling apparatus provided by this embodiment is configured to execute the prosodic labeling method provided by the embodiment shown in Fig. 1; its principle and technical effect are similar and are not repeated here.
Fig. 3 is a schematic structural diagram of a prosodic labeling device provided by an embodiment of the present invention. The device provided by this embodiment is configured to execute the prosodic labeling method provided by the embodiment shown in Fig. 1.
As shown in Fig. 3, the prosodic labeling device may include a processor 21 and a memory 22. The memory 22 is configured to store instructions, and the processor 21 is configured to execute the instructions stored in the memory 22, so that the prosodic labeling device performs the prosodic labeling method provided by the embodiment shown in Fig. 1; the specific implementation and technical effect are similar and are not repeated here.
An embodiment of the present invention also provides a storage medium in which instructions are stored; when the instructions are run on a computer, they cause the computer to execute the prosodic labeling method of the embodiment shown in Fig. 1.
An embodiment of the present invention also provides a program product comprising a computer program stored in a storage medium. At least one processor can read the computer program from the storage medium, and the at least one processor, when executing the computer program, can realize the prosodic labeling method of the embodiment shown in Fig. 1.
Those of ordinary skill in the art will appreciate that all or part of the steps of each of the above method embodiments can be completed by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium; when the program is executed, it performs the steps of each of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as read-only memory (ROM), random access memory (RAM), and magnetic or optical disks.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some or all of the technical features therein, and such modifications or replacements do not take the essence of the corresponding technical solutions outside the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A prosodic labeling method, characterized by comprising:
obtaining voice data of a text to be labeled;
determining, according to the voice data, prosodic information in the voice data, the prosodic information indicating pause durations in the voice data; and
marking prosodic symbols in the text to be labeled according to the prosodic information in the voice data.
2. The method according to claim 1, characterized by further comprising:
obtaining prosodic information in text data of the text to be labeled;
wherein marking prosodic symbols in the text to be labeled according to the prosodic information in the voice data comprises:
marking prosodic symbols in the text to be labeled according to both the prosodic information in the voice data and the prosodic information in the text data.
3. The method according to claim 2, characterized in that marking prosodic symbols in the text to be labeled according to the prosodic information in the voice data and the prosodic information in the text data comprises:
marking prosodic symbols in the text to be labeled according to the prosodic information in the voice data; and
updating the marked prosodic symbols in the text to be labeled according to the prosodic information in the text data.
4. The method according to claim 3, characterized in that updating the marked prosodic symbols in the text to be labeled according to the prosodic information in the text data comprises:
deleting at least one marked prosodic symbol if the prosodic information in the text data indicates that no prosodic symbol should be marked at the position of that symbol in the text to be labeled.
5. The method according to any one of claims 1-4, characterized in that determining the prosodic information in the voice data according to the voice data comprises:
obtaining at least one silent segment in the voice data according to the voice data; and
determining, for each silent segment, the prosodic information corresponding to that silent segment in the voice data.
6. The method according to claim 5, characterized in that obtaining the at least one silent segment in the voice data according to the voice data comprises:
performing phoneme segmentation on the text data of the text to be labeled to obtain a voice annotation sequence; and
performing phoneme segmentation on the voice data according to the voice annotation sequence, the voice data, and a preset acoustic model to obtain the at least one silent segment in the voice data, wherein the preset acoustic model represents the phonetic features corresponding to different phonemes.
7. A prosodic labeling apparatus, characterized by comprising:
a first obtaining module, configured to obtain voice data of a text to be labeled;
a prosodic information determining module, configured to determine, according to the voice data, prosodic information in the voice data, the prosodic information indicating pause durations in the voice data; and
a labeling module, configured to mark prosodic symbols in the text to be labeled according to the prosodic information in the voice data.
8. The apparatus according to claim 7, characterized by further comprising a second obtaining module;
the second obtaining module being configured to obtain prosodic information in the text data of the text to be labeled; and
the labeling module being specifically configured to:
mark prosodic symbols in the text to be labeled according to both the prosodic information in the voice data and the prosodic information in the text data.
9. The apparatus according to claim 8, characterized in that the labeling module is specifically configured to:
mark prosodic symbols in the text to be labeled according to the prosodic information in the voice data; and
update the marked prosodic symbols in the text to be labeled according to the prosodic information in the text data.
10. A prosodic labeling device, characterized by comprising a memory and a processor;
the memory being configured to store program instructions; and
the processor being configured to call the program instructions stored in the memory to implement the prosodic labeling method according to any one of claims 1-6.
CN201810988973.9A 2018-08-28 2018-08-28 Prosodic labeling method, apparatus and equipment Active CN109326281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810988973.9A CN109326281B (en) 2018-08-28 2018-08-28 Prosodic labeling method, apparatus and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810988973.9A CN109326281B (en) 2018-08-28 2018-08-28 Prosodic labeling method, apparatus and equipment

Publications (2)

Publication Number Publication Date
CN109326281A true CN109326281A (en) 2019-02-12
CN109326281B CN109326281B (en) 2020-01-07

Family

ID=65263729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810988973.9A Active CN109326281B (en) Prosodic labeling method, apparatus and equipment

Country Status (1)

Country Link
CN (1) CN109326281B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105785A (en) * 2019-12-17 2020-05-05 广州多益网络股份有限公司 Text prosodic boundary identification method and device
CN111161725A (en) * 2019-12-17 2020-05-15 珠海格力电器股份有限公司 Voice interaction method and device, computing equipment and storage medium
CN111754978A (en) * 2020-06-15 2020-10-09 北京百度网讯科技有限公司 Rhythm hierarchy marking method, device, equipment and storage medium
CN115116427A (en) * 2022-06-22 2022-09-27 马上消费金融股份有限公司 Labeling method, voice synthesis method, training method and device
WO2023045433A1 (en) * 2021-09-24 2023-03-30 华为云计算技术有限公司 Prosodic information labeling method and related device
CN116030789A (en) * 2022-12-28 2023-04-28 南京硅基智能科技有限公司 Method and device for generating speech synthesis training data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050267758A1 (en) * 2004-05-31 2005-12-01 International Business Machines Corporation Converting text-to-speech and adjusting corpus
CN104916284A (en) * 2015-06-10 2015-09-16 百度在线网络技术(北京)有限公司 Prosody and acoustics joint modeling method and device for voice synthesis system
CN105244020A (en) * 2015-09-24 2016-01-13 百度在线网络技术(北京)有限公司 Prosodic hierarchy model training method, text-to-speech method and text-to-speech device
CN105355193A (en) * 2015-10-30 2016-02-24 百度在线网络技术(北京)有限公司 Speech synthesis method and device
CN106601228A (en) * 2016-12-09 2017-04-26 百度在线网络技术(北京)有限公司 Sample marking method and device based on artificial intelligence prosody prediction

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050267758A1 (en) * 2004-05-31 2005-12-01 International Business Machines Corporation Converting text-to-speech and adjusting corpus
CN104916284A (en) * 2015-06-10 2015-09-16 百度在线网络技术(北京)有限公司 Prosody and acoustics joint modeling method and device for voice synthesis system
CN105244020A (en) * 2015-09-24 2016-01-13 百度在线网络技术(北京)有限公司 Prosodic hierarchy model training method, text-to-speech method and text-to-speech device
CN105355193A (en) * 2015-10-30 2016-02-24 百度在线网络技术(北京)有限公司 Speech synthesis method and device
CN106601228A (en) * 2016-12-09 2017-04-26 百度在线网络技术(北京)有限公司 Sample marking method and device based on artificial intelligence prosody prediction

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105785A (en) * 2019-12-17 2020-05-05 广州多益网络股份有限公司 Text prosodic boundary identification method and device
CN111161725A (en) * 2019-12-17 2020-05-15 珠海格力电器股份有限公司 Voice interaction method and device, computing equipment and storage medium
CN111161725B (en) * 2019-12-17 2022-09-27 珠海格力电器股份有限公司 Voice interaction method and device, computing equipment and storage medium
CN111754978A (en) * 2020-06-15 2020-10-09 北京百度网讯科技有限公司 Rhythm hierarchy marking method, device, equipment and storage medium
CN111754978B (en) * 2020-06-15 2023-04-18 北京百度网讯科技有限公司 Prosodic hierarchy labeling method, device, equipment and storage medium
WO2023045433A1 (en) * 2021-09-24 2023-03-30 华为云计算技术有限公司 Prosodic information labeling method and related device
CN115116427A (en) * 2022-06-22 2022-09-27 马上消费金融股份有限公司 Labeling method, voice synthesis method, training method and device
CN115116427B (en) * 2022-06-22 2023-11-14 马上消费金融股份有限公司 Labeling method, voice synthesis method, training method and training device
CN116030789A (en) * 2022-12-28 2023-04-28 南京硅基智能科技有限公司 Method and device for generating speech synthesis training data
CN116030789B (en) * 2022-12-28 2024-01-26 南京硅基智能科技有限公司 Method and device for generating speech synthesis training data

Also Published As

Publication number Publication date
CN109326281B (en) 2020-01-07

Similar Documents

Publication Publication Date Title
CN109326281A (en) Prosodic labeling method, apparatus and equipment
Rao et al. Speech recognition using articulatory and excitation source features
KR900009170B1 (en) Rule synthesis voice synthesis system
US7890330B2 (en) Voice recording tool for creating database used in text to speech synthesis system
EP2958105B1 (en) Method and apparatus for speech synthesis based on large corpus
CN103985391A (en) Phonetic-level low power consumption spoken language evaluation and defect diagnosis method without standard pronunciation
EP2337006A1 (en) Speech processing and learning
CN105185373A (en) Rhythm-level prediction model generation method and apparatus, and rhythm-level prediction method and apparatus
Rebai et al. Text-to-speech synthesis system with Arabic diacritic recognition system
Ghai et al. Phone based acoustic modeling for automatic speech recognition for punjabi language
Klabbers Segmental and prosodic improvements to speech generation
Matoušek et al. Recording and annotation of speech corpus for Czech unit selection speech synthesis
Matoušek et al. On comparison of phonetic representations for Czech neural speech synthesis
Wu et al. Mandarin lexical tones: a corpus-based study of word length, syllable position and prosodic position on duration
Yu et al. Overview of SHRC-Ginkgo speech synthesis system for Blizzard Challenge 2013
WO2016200391A1 (en) System and method for outlier identification to remove poor alignments in speech synthesis
Evdokimova et al. Automatic phonetic transcription for Russian: Speech variability modeling
Hertz et al. Language-universal and language-specific components in the multi-language ETI-Eloquence text-to-speech system
Ng Survey of data-driven approaches to Speech Synthesis
Van Niekerk Tone realisation for speech synthesis of Yorubá
Liu Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset
Shah et al. Influence of various asymmetrical contextual factors for TTS in a low resource language
Ekpenyong et al. A Template-Based Approach to Intelligent Multilingual Corpora Transcription
Al-Saiyd et al. Unit selection model in Arabic speech synthesis
Thai et al. Tonal languages speech synthesis using an indirect pitch markers and the quantitative target approximation methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant