CN109326281A - Prosodic labeling method, apparatus and equipment - Google Patents
- Publication number
- CN109326281A (application CN201810988973.9A)
- Authority
- CN
- China
- Prior art keywords
- prosodic
- text
- marked
- voice data
- information
- Prior art date: 2018-08-28
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/117—Tagging; Marking up; Designating a block; Setting of attributes
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1807—Speech classification or search using natural language modelling using prosody or stress
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
Abstract
The present invention provides a prosodic labeling method, apparatus, and equipment. The prosodic labeling method includes: obtaining voice data of a text to be labeled; determining, according to the voice data, prosodic information in the voice data, the prosodic information being used to indicate pause durations in the voice data; and labeling the text to be labeled with prosodic symbols according to the prosodic information in the voice data. The prosodic labeling method provided by the present invention improves the efficiency and accuracy of prosodic labeling.
Description
Technical field
The present invention relates to the technical field of prosodic labeling, and in particular to a prosodic labeling method, apparatus, and equipment.
Background art
Prosody, also known as suprasegmental features, generally includes rhythm, stress, intonation, and the like. Prosodic information is an essential means by which people express thoughts and emotions: the same text can convey entirely different meanings when spoken with different tones and rhythms. Prosodic information therefore plays a highly important role in speech synthesis systems.
At present, prosodic labeling in speech synthesis systems is generally performed by predicting prosody from text information. For a given text, prosody prediction is carried out on the text information alone, and the prediction result is typically determined from information such as initials, finals, words, phrases, and paragraphs. Professional annotators then complete the prosodic labeling according to the prediction result.
However, language expression is rich and varied. When prosodic labeling is done manually from text information only, the prosodic information cannot be correctly predicted at positions in the text that require a marked pause or an obvious silence. Annotators consequently have many places to correct, which makes prosodic labeling inefficient and inaccurate.
Summary of the invention
The present invention provides a prosodic labeling method, apparatus, and equipment that improve the efficiency and accuracy of prosodic labeling.
In a first aspect, the present invention provides a prosodic labeling method, comprising:
obtaining voice data of a text to be labeled;
determining, according to the voice data, prosodic information in the voice data, the prosodic information being used to indicate pause durations in the voice data;
labeling the text to be labeled with prosodic symbols according to the prosodic information in the voice data.
Optionally, in a possible embodiment, the method further includes:
obtaining prosodic information in text data of the text to be labeled.
Optionally, in a possible embodiment, labeling the text to be labeled with prosodic symbols according to the prosodic information in the voice data comprises:
labeling the text to be labeled with prosodic symbols according to the prosodic information in the voice data and the prosodic information in the text data.
Optionally, in a possible embodiment, labeling the text to be labeled with prosodic symbols according to the prosodic information in the voice data and the prosodic information in the text data comprises:
labeling the text to be labeled with prosodic symbols according to the prosodic information in the voice data;
updating the prosodic symbols already labeled in the text to be labeled according to the prosodic information in the text data.
Optionally, in a possible embodiment, updating the prosodic symbols already labeled in the text to be labeled according to the prosodic information in the text data comprises:
if the prosodic information in the text data indicates that no prosodic symbol should be labeled at the position of at least one prosodic symbol already labeled in the text to be labeled, deleting the at least one labeled prosodic symbol.
Optionally, in a possible embodiment, determining the prosodic information in the voice data according to the voice data comprises:
obtaining at least one silent segment in the voice data according to the voice data;
for each silent segment, determining, according to the silent segment, the prosodic information corresponding to that silent segment in the voice data.
Optionally, in a possible embodiment, obtaining the at least one silent segment in the voice data according to the voice data comprises:
performing phoneme segmentation on the text data of the text to be labeled to obtain a voice annotation sequence;
performing phoneme segmentation on the voice data according to the voice annotation sequence, the voice data, and a preset acoustic model, to obtain the at least one silent segment in the voice data, wherein the preset acoustic model is used to represent the speech features corresponding to different phonemes.
In a second aspect, the present invention provides a prosodic labeling apparatus, comprising:
a first obtaining module, configured to obtain voice data of a text to be labeled;
a prosodic information determining module, configured to determine, according to the voice data, prosodic information in the voice data, the prosodic information being used to indicate pause durations in the voice data;
a labeling module, configured to label the text to be labeled with prosodic symbols according to the prosodic information in the voice data.
Optionally, in a possible embodiment, the apparatus further includes a second obtaining module;
the second obtaining module is configured to obtain prosodic information in text data of the text to be labeled;
the labeling module is then specifically configured to:
label the text to be labeled with prosodic symbols according to the prosodic information in the voice data and the prosodic information in the text data.
Optionally, in a possible embodiment, the labeling module is specifically configured to:
label the text to be labeled with prosodic symbols according to the prosodic information in the voice data;
update the prosodic symbols already labeled in the text to be labeled according to the prosodic information in the text data.
In a third aspect, the present invention provides a prosodic labeling equipment comprising a processor and a memory. The memory is configured to store instructions, and the processor is configured to execute the instructions stored in the memory, so that the prosodic labeling equipment performs the prosodic labeling method provided by any embodiment of the first aspect of the present invention.
In a fourth aspect, the present invention provides a storage medium, comprising a readable storage medium and a computer program, the computer program being used to implement the prosodic labeling method provided by any embodiment of the first aspect of the present invention.
The present invention thus provides a prosodic labeling method, apparatus, and equipment that label a text to be labeled with prosodic symbols according to the voice data of that text. This takes into account the richness of language expression, in particular marked pauses and obvious silent segments in the speech, thereby improving the efficiency and accuracy of prosodic labeling and reducing the cost of prosodic labeling.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the prosodic labeling method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the prosodic labeling apparatus provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the prosodic labeling equipment provided by an embodiment of the present invention.
Detailed description of the embodiments
In order to make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of the prosodic labeling method provided by an embodiment of the present invention. The method may be executed by a prosodic labeling apparatus or by a prosodic labeling equipment. As shown in Fig. 1, the prosodic labeling method provided by this embodiment may include:
S101: obtain voice data of a text to be labeled.
S102: determine, according to the voice data, prosodic information in the voice data.
The prosodic information is used to indicate pause durations in the voice data.
S103: label the text to be labeled with prosodic symbols according to the prosodic information in the voice data.
Specifically, in this embodiment, the text that needs prosodic labeling is called the text to be labeled. The voice data of the text to be labeled is the speech produced when a reader reads the text aloud; this embodiment places no restriction on the reader. The prosodic information in the voice data, which indicates the pause durations in the speech, can be determined from the voice data. In turn, the text to be labeled can be labeled with prosodic symbols according to those pause durations.
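The following minimal sketch illustrates the S101 to S103 flow. The character-indexed pause representation, the helper names, and the duration thresholds are all illustrative assumptions; the patent leaves the symbol set and the thresholds t3 to t6 to be configured.

```python
# Illustrative sketch of S101-S103; names and thresholds are assumptions.
# Pauses are given as (character_index, duration_ms) pairs obtained from
# the voice data (the alignment step is described later).

def duration_to_symbol(duration_ms: float) -> str:
    """Map a pause duration to a prosodic symbol (assumed thresholds)."""
    if duration_ms < 30:       # below t3 (assumed)
        return "#1"
    if duration_ms < 90:       # [t3, t4): matches the 60 ms -> "#2" example
        return "#2"
    if duration_ms < 300:      # up to an assumed t6 = 300 ms
        return "#3"
    return "#4"                # longest pauses

def annotate(text: str, pauses: list[tuple[int, float]]) -> str:
    """Insert a prosodic symbol after each character that a pause follows."""
    pause_after = dict(pauses)
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if i in pause_after:
            out.append(duration_to_symbol(pause_after[i]))
    return "".join(out)

print(annotate("xxxxxxx", [(3, 60.0), (6, 200.0)]))  # -> xxxx#2xxx#3
```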
The prosodic labeling method provided by this embodiment labels the text to be labeled with prosodic symbols according to the voice data of that text, and thus takes into account the richness of language expression. Because it works from the speech a reader actually produces, it fully accounts for marked pauses and obvious silent segments in the speech. Compared with manual prosodic labeling based on the text alone, this improves the accuracy of prosodic labeling; and because fewer positions need to be corrected, it also improves the efficiency of prosodic labeling and reduces the cost of prosodic labeling.
It should be noted that this embodiment places no restriction on how the prosodic symbols are implemented; they can be configured as needed. The pause duration range corresponding to each prosodic symbol can be preset, and this embodiment places no restriction on the specific values of those ranges.
For example, the prosodic symbols may include #1, #2, #3, and #4, in which case the pause durations in the voice data fall into four kinds. This is illustrated by the following example.
Table 1 shows the correspondence between each prosodic symbol, its meaning, and its pause duration range. The pauses corresponding to #1 and #2 are subtle to the ear and subjectively perceived, so in this embodiment their pause duration ranges may be left undefined; of course, they may also be defined, and this embodiment does not limit this. The thresholds satisfy t3 < t4 ≤ t5 < t6, and this embodiment places no restriction on their specific values; for example, t4 = t5 = 90 ms. Suppose an example of the text to be labeled is "xxxxxxx,xxxxxxxx". After prosodic symbol labeling, the text to be labeled may read "xxxx#2xxx#3,xxx#2xxxxx#4".
Table 1
Optionally, the prosodic labeling method provided by this embodiment may further include:
obtaining prosodic information in the text data of the text to be labeled.
In that case, S103 (labeling the text to be labeled with prosodic symbols according to the prosodic information in the voice data) may include:
labeling the text to be labeled with prosodic symbols according to the prosodic information in the voice data and the prosodic information in the text data.
Specifically, the prosodic information in the text data of the text to be labeled indicates the pause durations in that text data. It should be noted that this embodiment places no restriction on how the prosodic information in the text data is obtained; any existing method of prosody prediction based on text information may be used.
Labeling the text to be labeled according to both the prosodic information in the voice data and the prosodic information in the text data takes both the text-based prosody prediction result and the speech-based prosody analysis result into account, which further improves the efficiency and accuracy of prosodic labeling.
Optionally, labeling the text to be labeled with prosodic symbols according to the prosodic information in the voice data and the prosodic information in the text data may include:
labeling the text to be labeled with prosodic symbols according to the prosodic information in the voice data;
updating the prosodic symbols already labeled in the text to be labeled according to the prosodic information in the text data.
By first labeling the text on the basis of the prosodic information in the voice data and then updating the labels according to the prosodic information in the text data, the text-based prosody prediction result is applied on top of the speech-based prosody analysis result, which further improves the efficiency and accuracy of prosodic labeling.
Optionally, updating the prosodic symbols already labeled in the text to be labeled according to the prosodic information in the text data may include:
if the prosodic information in the text data indicates that no prosodic symbol should be labeled at the position of at least one prosodic symbol already labeled in the text to be labeled, deleting the at least one labeled prosodic symbol.
Specifically, the prosodic information in the text data is the text-based prosody prediction result determined from the text data of the text to be labeled. It usually reflects the pause durations at positions where a pause is grammatically possible, and it also identifies positions where no pause may occur. In some scenarios, the prosodic information in the text data indicates that no prosodic symbol should appear at the position of at least one symbol already labeled in the text to be labeled. For example, there is usually no pause inside a fixed expression such as a set phrase, an idiom, or a common saying. In such cases, the at least one already-labeled prosodic symbol can be deleted according to the prosodic information in the text data, which further improves the accuracy of prosodic labeling.
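A minimal sketch of this deletion step, assuming the text-based prediction is delivered as a set of character offsets (in the unannotated text) after which no pause may be marked; this representation is an illustrative choice, not something the patent specifies.

```python
import re

def remove_forbidden(annotated: str, forbidden: set[int]) -> str:
    """Drop prosodic symbols ("#1".."#4") that follow a character offset
    listed in `forbidden`; keep all other symbols and text unchanged."""
    out: list[str] = []
    plain_idx = -1                      # offset of the last plain character
    i = 0
    while i < len(annotated):
        m = re.match(r"#[1-4]", annotated[i:])
        if m:
            if plain_idx not in forbidden:
                out.append(m.group())   # symbol is allowed at this position
            i += 2                      # skip the 2-character symbol
        else:
            out.append(annotated[i])
            plain_idx += 1
            i += 1
    return "".join(out)

# A symbol inside a fixed expression (after offset 1 here) is removed:
print(remove_forbidden("xx#2xx#3", {1}))  # -> xxxx#3
```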
Optionally, S102 (determining the prosodic information in the voice data according to the voice data) may include:
obtaining at least one silent segment in the voice data according to the voice data;
for each silent segment, determining, according to the silent segment, the prosodic information corresponding to that silent segment in the voice data.
Specifically, at least one silent segment is obtained from the voice data; the duration of a silent segment is a pause duration in the voice data.
Optionally, for each silent segment, determining the prosodic information corresponding to that silent segment may include:
obtaining the duration of the silent segment from its start time and end time in the voice data.
This is illustrated by the following example.
Suppose a silent segment starts at 00:22:07:300 and ends at 00:22:07:360, so its duration is 60 ms. Referring to Table 1, and assuming t3 = 30 ms and t4 = 90 ms, the prosodic symbol labeled in the text to be labeled according to this silent segment's duration is #2.
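A small worked check of this arithmetic, assuming the hh:mm:ss:ms timestamp format implied by the example:

```python
def to_ms(ts: str) -> int:
    """Convert an hh:mm:ss:ms timestamp (assumed format) to milliseconds."""
    h, m, s, ms = (int(x) for x in ts.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + ms

duration = to_ms("00:22:07:360") - to_ms("00:22:07:300")
print(duration)  # 60 -> falls in [t3, t4) with t3 = 30 ms, t4 = 90 ms, hence "#2"
```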
Optionally, obtaining the at least one silent segment in the voice data according to the voice data may include:
performing phoneme segmentation on the text data of the text to be labeled to obtain a voice annotation sequence;
performing phoneme segmentation on the voice data according to the voice annotation sequence, the voice data, and a preset acoustic model, to obtain the at least one silent segment in the voice data, wherein the preset acoustic model is used to represent the speech features corresponding to different phonemes.
Specifically, a phoneme is the smallest speech unit obtained by dividing speech according to sound quality. Performing phoneme segmentation on the text data of the text to be labeled divides the text data into a series of temporally adjacent segments corresponding to phonemes; this sequence of segments is called the voice annotation sequence. The preset acoustic model describes the speech features corresponding to different phonemes. According to the voice annotation sequence, the voice data, and the preset acoustic model, the voice data can be segmented into phonemes, yielding the at least one silent segment in the voice data.
It should be noted that this embodiment places no restriction on the phoneme segmentation method; any existing method may be used, for example an automatic speech segmentation algorithm based on the hidden Markov model (HMM). In such an algorithm, an HMM-based model can, for a given annotation sequence, use the Viterbi algorithm to force-align the speech signal with the HMM sequence corresponding to the phonetic annotation units (phonemes).
It should also be noted that this embodiment places no restriction on the type of the preset acoustic model or on how it is obtained. For example, the preset acoustic model may be trained with the open-source toolkit Kaldi, using the voice data whose prosody is to be predicted together with the corresponding text. As another example, the preset acoustic model may be obtained with a deep neural network (DNN) algorithm. Optionally, when the amount of voice data is small, the preset acoustic model may be a GMM-HMM acoustic model; when the amount of voice data is large, the preset acoustic model may be a DNN-HMM model.
Optionally, performing phoneme segmentation on the text data of the text to be labeled to obtain the voice annotation sequence may include:
performing phoneme segmentation on the text data of the text to be labeled, and inserting a pause symbol between every two adjacent words in the text to be labeled, to obtain the voice annotation sequence.
This is illustrated by the following example.
Suppose the phonemes consist of initials and finals, and the text to be labeled is "Hello, dear motherland." in Chinese, whose text data in pinyin is "ni hao, qin ai de zu guo". The voice annotation sequence can then be "n i sp h ao sp q in sp ai sp d e sp z u sp g uo", where sp denotes the pause symbol.
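A minimal sketch of building such a sequence from the pinyin syllables. The initial/final split table covers only this example and is an assumption standing in for a full pinyin lexicon.

```python
# Initial/final splits for this example only; "ai" has a zero initial.
SPLITS = {
    "ni": ("n", "i"), "hao": ("h", "ao"), "qin": ("q", "in"),
    "ai": ("", "ai"), "de": ("d", "e"), "zu": ("z", "u"), "guo": ("g", "uo"),
}

def to_annotation_sequence(syllables: list[str]) -> str:
    """Split each syllable into phonemes and insert the pause symbol "sp"
    between adjacent syllables."""
    units: list[str] = []
    for k, syl in enumerate(syllables):
        if k > 0:
            units.append("sp")
        initial, final = SPLITS[syl]
        units.extend(p for p in (initial, final) if p)
    return " ".join(units)

print(to_annotation_sequence("ni hao qin ai de zu guo".split()))
# -> n i sp h ao sp q in sp ai sp d e sp z u sp g uo
```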
This embodiment provides a prosodic labeling method comprising: obtaining voice data of a text to be labeled; determining prosodic information in the voice data according to the voice data; and labeling the text to be labeled with prosodic symbols according to the prosodic information in the voice data. By labeling the text to be labeled according to its voice data, the prosodic labeling method provided by this embodiment improves the efficiency and accuracy of prosodic labeling.
Fig. 2 is a schematic structural diagram of the prosodic labeling apparatus provided by an embodiment of the present invention. The apparatus provided by this embodiment is configured to perform the prosodic labeling method of the embodiment shown in Fig. 1. As shown in Fig. 2, the prosodic labeling apparatus provided by this embodiment may include:
a first obtaining module 11, configured to obtain voice data of a text to be labeled;
a prosodic information determining module 12, configured to determine, according to the voice data, prosodic information in the voice data, the prosodic information being used to indicate pause durations in the voice data;
a labeling module 13, configured to label the text to be labeled with prosodic symbols according to the prosodic information in the voice data.
Optionally, the apparatus further includes a second obtaining module 14.
The second obtaining module 14 is configured to obtain prosodic information in the text data of the text to be labeled.
The labeling module 13 is then specifically configured to:
label the text to be labeled with prosodic symbols according to the prosodic information in the voice data and the prosodic information in the text data.
Optionally, the labeling module 13 is specifically configured to:
label the text to be labeled with prosodic symbols according to the prosodic information in the voice data;
update the prosodic symbols already labeled in the text to be labeled according to the prosodic information in the text data.
Optionally, the labeling module 13 is specifically configured to:
if the prosodic information in the text data indicates that no prosodic symbol should be labeled at the position of at least one prosodic symbol already labeled in the text to be labeled, delete the at least one labeled prosodic symbol.
Optionally, the prosodic information determining module 12 is specifically configured to:
obtain at least one silent segment in the voice data according to the voice data;
for each silent segment, determine, according to the silent segment, the prosodic information corresponding to that silent segment in the voice data.
Optionally, the prosodic information determining module 12 is specifically configured to:
perform phoneme segmentation on the text data of the text to be labeled to obtain a voice annotation sequence;
perform phoneme segmentation on the voice data according to the voice annotation sequence, the voice data, and a preset acoustic model, to obtain the at least one silent segment in the voice data, wherein the preset acoustic model is used to represent the speech features corresponding to different phonemes.
The prosodic labeling apparatus provided by this embodiment is configured to perform the prosodic labeling method of the embodiment shown in Fig. 1; its principle and technical effects are similar and are not repeated here.
Fig. 3 is a schematic structural diagram of the prosodic labeling equipment provided by an embodiment of the present invention. The equipment provided by this embodiment is configured to perform the prosodic labeling method of the embodiment shown in Fig. 1.
As shown in Fig. 3, the prosodic labeling equipment may include a processor 21 and a memory 22. The memory 22 is configured to store instructions, and the processor 21 is configured to execute the instructions stored in the memory 22, so that the prosodic labeling equipment performs the prosodic labeling method provided by the embodiment shown in Fig. 1; the specific implementation and technical effects are similar and are not repeated here.
An embodiment of the present invention also provides a storage medium storing instructions that, when run on a computer, cause the computer to perform the prosodic labeling method of the embodiment shown in Fig. 1.
An embodiment of the present invention also provides a program product comprising a computer program stored in a storage medium. At least one processor can read the computer program from the storage medium, and the at least one processor, when executing the computer program, can implement the prosodic labeling method of the embodiment shown in Fig. 1.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be completed by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced with equivalents, and that such modifications or replacements do not remove the essence of the corresponding technical solutions from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A prosodic labeling method, characterized by comprising:
obtaining voice data of a text to be labeled;
determining, according to the voice data, prosodic information in the voice data, the prosodic information being used to indicate pause durations in the voice data;
labeling the text to be labeled with prosodic symbols according to the prosodic information in the voice data.
2. The method according to claim 1, characterized by further comprising:
obtaining prosodic information in text data of the text to be labeled;
wherein labeling the text to be labeled with prosodic symbols according to the prosodic information in the voice data comprises:
labeling the text to be labeled with prosodic symbols according to the prosodic information in the voice data and the prosodic information in the text data.
3. The method according to claim 2, characterized in that labeling the text to be labeled with prosodic symbols according to the prosodic information in the voice data and the prosodic information in the text data comprises:
labeling the text to be labeled with prosodic symbols according to the prosodic information in the voice data;
updating the prosodic symbols already labeled in the text to be labeled according to the prosodic information in the text data.
4. The method according to claim 3, characterized in that updating the prosodic symbols already labeled in the text to be labeled according to the prosodic information in the text data comprises:
if the prosodic information in the text data indicates that no prosodic symbol should be labeled at the position of at least one prosodic symbol already labeled in the text to be labeled, deleting the at least one labeled prosodic symbol.
5. The method according to any one of claims 1 to 4, characterized in that determining the prosodic information in the voice data according to the voice data comprises:
obtaining at least one silent segment in the voice data according to the voice data;
for each silent segment, determining, according to the silent segment, the prosodic information corresponding to that silent segment in the voice data.
6. The method according to claim 5, characterized in that obtaining the at least one silent segment in the voice data according to the voice data comprises:
performing phoneme segmentation on the text data of the text to be labeled to obtain a voice annotation sequence;
performing phoneme segmentation on the voice data according to the voice annotation sequence, the voice data, and a preset acoustic model, to obtain the at least one silent segment in the voice data, wherein the preset acoustic model is used to represent speech features corresponding to different phonemes.
7. A prosodic labeling apparatus, characterized by comprising:
a first obtaining module, configured to obtain voice data of a text to be labeled;
a prosodic information determining module, configured to determine, according to the voice data, prosodic information in the voice data, the prosodic information being used to indicate pause durations in the voice data;
a labeling module, configured to label the text to be labeled with prosodic symbols according to the prosodic information in the voice data.
8. The apparatus according to claim 7, characterized by further comprising a second obtaining module;
the second obtaining module being configured to obtain prosodic information in text data of the text to be labeled;
the labeling module being specifically configured to:
label the text to be labeled with prosodic symbols according to the prosodic information in the voice data and the prosodic information in the text data.
9. The apparatus according to claim 8, characterized in that the labeling module is specifically configured to:
label the text to be labeled with prosodic symbols according to the prosodic information in the voice data;
update the prosodic symbols already labeled in the text to be labeled according to the prosodic information in the text data.
10. A prosodic labeling equipment, characterized by comprising a memory and a processor;
the memory being configured to store program instructions;
the processor being configured to call the program instructions stored in the memory to implement the prosodic labeling method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810988973.9A CN109326281B (en) | 2018-08-28 | 2018-08-28 | Rhythm labeling method, device and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109326281A (en) | 2019-02-12 |
CN109326281B CN109326281B (en) | 2020-01-07 |
Family
ID=65263729
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810988973.9A Active CN109326281B (en) | 2018-08-28 | 2018-08-28 | Rhythm labeling method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109326281B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050267758A1 (en) * | 2004-05-31 | 2005-12-01 | International Business Machines Corporation | Converting text-to-speech and adjusting corpus |
CN104916284A (en) * | 2015-06-10 | 2015-09-16 | 百度在线网络技术(北京)有限公司 | Prosody and acoustics joint modeling method and device for voice synthesis system |
CN105244020A (en) * | 2015-09-24 | 2016-01-13 | 百度在线网络技术(北京)有限公司 | Prosodic hierarchy model training method, text-to-speech method and text-to-speech device |
CN105355193A (en) * | 2015-10-30 | 2016-02-24 | 百度在线网络技术(北京)有限公司 | Speech synthesis method and device |
CN106601228A (en) * | 2016-12-09 | 2017-04-26 | 百度在线网络技术(北京)有限公司 | Sample marking method and device based on artificial intelligence prosody prediction |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111105785A (en) * | 2019-12-17 | 2020-05-05 | 广州多益网络股份有限公司 | Text prosodic boundary identification method and device |
CN111161725A (en) * | 2019-12-17 | 2020-05-15 | 珠海格力电器股份有限公司 | Voice interaction method and device, computing equipment and storage medium |
CN111161725B (en) * | 2019-12-17 | 2022-09-27 | 珠海格力电器股份有限公司 | Voice interaction method and device, computing equipment and storage medium |
CN111754978A (en) * | 2020-06-15 | 2020-10-09 | 北京百度网讯科技有限公司 | Rhythm hierarchy marking method, device, equipment and storage medium |
CN111754978B (en) * | 2020-06-15 | 2023-04-18 | 北京百度网讯科技有限公司 | Prosodic hierarchy labeling method, device, equipment and storage medium |
WO2023045433A1 (en) * | 2021-09-24 | 2023-03-30 | 华为云计算技术有限公司 | Prosodic information labeling method and related device |
CN115116427A (en) * | 2022-06-22 | 2022-09-27 | 马上消费金融股份有限公司 | Labeling method, voice synthesis method, training method and device |
CN115116427B (en) * | 2022-06-22 | 2023-11-14 | 马上消费金融股份有限公司 | Labeling method, voice synthesis method, training method and training device |
CN116030789A (en) * | 2022-12-28 | 2023-04-28 | 南京硅基智能科技有限公司 | Method and device for generating speech synthesis training data |
CN116030789B (en) * | 2022-12-28 | 2024-01-26 | 南京硅基智能科技有限公司 | Method and device for generating speech synthesis training data |
Also Published As
Publication number | Publication date |
---|---|
CN109326281B (en) | 2020-01-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||