CN109410915A - Voice quality evaluation method and device, and computer-readable storage medium - Google Patents
- Publication number
- CN109410915A CN201710698522.7A
- Authority
- CN
- China
- Prior art keywords
- voice data
- voice
- assessed
- keyword
- primary
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/01—Assessment or evaluation of speech recognition systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention discloses a voice quality evaluation method and device, and a computer-readable storage medium. The voice quality evaluation method includes: acquiring original voice data according to preselected voice content; performing call processing on the original voice data to obtain voice data to be evaluated; converting the voice data to be evaluated into speech text to be evaluated; splitting the preselected voice content into M keywords and searching the speech text to be evaluated with each of the M keywords to obtain, for each keyword, the number of occurrences that were not correctly restored; adding these numbers to obtain a first total quantity of keywords that were not correctly restored; calculating, according to the duration of the original voice data, a second total quantity of keywords corresponding to the original voice data; and evaluating the content completeness of the voice data to be evaluated according to the first total quantity and the second total quantity. The voice quality evaluation method and device of the embodiments of the invention make it possible to evaluate the completeness of voice call content.
Description
Technical field
The present invention relates to the technical field of voice communication, and in particular to a voice quality evaluation method and device, and a computer-readable storage medium.
Background
From early fixed-line telephones to today's mobile terminals, tools for voice communication have developed rapidly, and voice calls have become one of the basic needs of daily life. To convey accurately to the other party what one party to a call means to express, the completeness of the voice call content must be guaranteed.
However, voice quality evaluation methods in the prior art mainly evaluate the distortion of a voice call in terms of timbre and tone. For example, an auditory model is established based on an input-output mode, and the distortion between the received voice signal and the original voice signal is calculated; alternatively, based on an output mode, the distortion of the received voice signal is calculated according to IP network impairment parameters or audio stream parameters. Because the prior-art methods do not evaluate the completeness of voice call content, a new evaluation method for the completeness of voice call content is needed.
Summary of the invention
The embodiments of the present invention provide a voice quality evaluation method and device, and a computer-readable storage medium, which make it possible to evaluate the completeness of voice call content.
In a first aspect, an embodiment of the present invention provides a voice quality evaluation method, which includes:
Acquiring original voice data according to preselected voice content;
Performing call processing on the original voice data to obtain voice data to be evaluated;
Converting the voice data to be evaluated into speech text to be evaluated;
Splitting the preselected voice content into M keywords and searching the speech text to be evaluated with each of the M keywords to obtain, for each keyword, the number of occurrences that were not correctly restored, M being a positive integer;
Adding the numbers obtained for the individual keywords to obtain a first total quantity of keywords that were not correctly restored;
Calculating, according to the duration of the original voice data, a second total quantity of keywords corresponding to the original voice data;
Evaluating the content completeness of the voice data to be evaluated according to the first total quantity and the second total quantity.
In some embodiments of the first aspect, evaluating the content completeness of the voice data to be evaluated according to the first total quantity and the second total quantity includes:
Calculating the ratio of the first total quantity to the second total quantity;
Evaluating the content completeness of the voice data to be evaluated according to the ratio.
In some embodiments of the first aspect, the preselected voice content is configured to cover the frequently used sounds of any specified language and/or the basic pronunciations that make up the specified language.
In some embodiments of the first aspect, the preselected voice content is further configured so that the M keywords into which it is split satisfy at least one of the following conditions: they differ in meaning, they do not repeat, no inclusion relation exists between them, and no homophones exist among them.
In some embodiments of the first aspect, acquiring original voice data according to preselected voice content includes: acquiring original voice data of a male voice or a female voice according to the preselected voice content.
In some embodiments of the first aspect, calculating, according to the duration of the original voice data, the second total quantity of keywords corresponding to the original voice data includes:
Calculating, according to the duration of the original voice data, the number of times N that the preselected voice content needs to be repeated for one evaluation, N being a positive integer;
Calculating the product of M and N as the second total quantity of keywords corresponding to the original voice data.
In some embodiments of the first aspect, calculating, according to the duration of the original voice data, the number of times N that the preselected voice content needs to be repeated for one evaluation includes:
Obtaining the number of words contained in the preselected voice content;
Calculating the product of the number of words contained in the preselected voice content and the speech rate of the voice call to obtain the duration needed to repeat the original voice content once;
Calculating the ratio of the duration of the original voice data to the duration needed to repeat the original voice content once, as the number of times N that the preselected voice content needs to be repeated for one evaluation of the original voice data.
In some embodiments of the first aspect, a blank interval is left in the original voice data between adjacent repetitions of the preselected voice content.
In some embodiments of the first aspect, the duration of the original voice data is required to be greater than a duration threshold, the duration threshold being obtained from the relative transmission speed of the call channel, the transmission frequency of the call channel and the speech rate of the original voice data.
In some embodiments of the first aspect, the duration threshold of the original voice data is determined using the following formula:
T = 100 × α × max(c/νf, s)
Where T is the duration threshold of the original voice data, α is a constant, c is the speed of light, ν is the relative transmission speed of the call channel, f is the transmission frequency of the call channel, and s is the speech rate of the original voice data.
In a second aspect, an embodiment of the present invention provides a voice quality evaluation device, which includes:
An acquisition module, configured to acquire original voice data according to preselected voice content;
A processing module, configured to perform call processing on the original voice data to obtain voice data to be evaluated;
A conversion module, configured to convert the voice data to be evaluated into speech text to be evaluated;
A retrieval module, configured to split the preselected voice content into M keywords and to search the speech text to be evaluated with each of the M keywords to obtain, for each keyword, the number of occurrences that were not correctly restored, M being a positive integer;
A first computing module, configured to add the numbers obtained for the individual keywords to obtain a first total quantity of keywords that were not correctly restored;
A second computing module, configured to calculate, according to the duration of the original voice data, a second total quantity of keywords corresponding to the original voice data;
An evaluation module, configured to evaluate the content completeness of the voice data to be evaluated according to the first total quantity and the second total quantity.
In a third aspect, an embodiment of the present invention provides a voice quality evaluation device, including a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor implements the voice quality evaluation method described above when executing the program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a program is stored, wherein the program implements the voice quality evaluation method described above when executed by a processor.
According to the embodiments of the present invention, the voice data to be evaluated is converted into speech text to be evaluated, the preselected voice content is split into M keywords, and the speech text to be evaluated is searched with each of the M keywords, which yields, for each keyword, the number of occurrences that were not correctly restored. Adding these numbers gives the total quantity of keywords that were not correctly restored, which serves as the swallowed-word count of this voice evaluation. Because the embodiments of the present invention can obtain the swallowed-word count of a voice evaluation, the content completeness of the voice data to be evaluated can be evaluated once the relation between the swallowed-word count and the total number of keywords contained in the voice evaluation data is established.
Brief description of the drawings
The present invention may be better understood from the following description of specific embodiments taken in conjunction with the accompanying drawings, in which the same or similar reference numerals denote the same or similar features.
Fig. 1 is a flow diagram of a voice quality evaluation method provided by one embodiment of the present invention;
Fig. 2 is a flow diagram of a voice quality evaluation method provided by another embodiment of the present invention;
Fig. 3 is a flow diagram of a voice quality evaluation method provided by a further embodiment of the present invention;
Fig. 4 is a structural diagram of a voice quality evaluation device provided by an embodiment of the present invention;
Fig. 5 is a hardware structure diagram of a voice quality evaluation device provided by an embodiment of the present invention.
Detailed description
The features and exemplary embodiments of various aspects of the invention are described in detail below. The following detailed description sets out many specific details in order to provide a complete understanding of the invention. It will nevertheless be apparent to those skilled in the art that the invention can be practiced without some of these details. The description of the embodiments below is intended only to provide a better understanding of the invention by showing examples of it. The invention is in no way limited to any specific configuration or algorithm set out below, but covers any modification, replacement and improvement of elements, components and algorithms that does not depart from the spirit of the invention. Well-known structures and techniques are not shown in the drawings or in the description below, so as not to obscure the invention unnecessarily.
The embodiments of the present invention provide a voice quality evaluation method and device, and a computer-readable storage medium. With the embodiments of the present invention, the completeness of voice call content can be evaluated, so that what one party to a call means to express can be conveyed accurately to the other party during communication.
Fig. 1 is a flow diagram of a voice quality evaluation method provided by one embodiment of the present invention. As shown in Fig. 1, the evaluation method includes steps 101 to 107.
In step 101, original voice data is acquired according to preselected voice content.
The preselected voice content is test voice content chosen in advance for the evaluation. It may take the form of text or of a segment of speech, which is not limited here.
To obtain a more accurate evaluation result, the selection of the preselected voice content needs to satisfy some conditions.
In one example, the preselected voice content is configured to cover the frequently used sounds of any specified language and/or the basic pronunciations that make up the specified language. The selection rules for the preselected voice content are illustrated below taking Chinese as an example.
Frequently used words in Chinese may be pronouns, such as "you" and "I"; nouns, such as "family", "friend" and "weather"; or modal particles, such as "good", "uh" and "may".
The basic pronunciations of Chinese comprise 21 initials, namely b, p, m, f, d, t, n, l, g, k, h, j, q, x, zh, ch, sh, r, z, c, s, and 24 finals, of which the simple finals are a, o, e, i, u, v and the compound finals are ai, ei, ui, ao, ou, iu, ie, ve, er, an, en, in, un, vn, ang, eng, ing, ong.
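The coverage rule for initials can be checked mechanically. The following is a minimal sketch, assuming the preselected content has already been transcribed into toneless pinyin syllables by some other means; the helper names `initial_of` and `missing_initials` are hypothetical and do not come from the patent.

```python
# The 21 initials of Chinese; two-letter initials come first so that
# "zh" is matched before "z", "ch" before "c", and "sh" before "s".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]

def initial_of(syllable):
    """Return the initial of a toneless pinyin syllable, or None if it has none."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial
    return None  # e.g. "an", or syllables written with y/w

def missing_initials(syllables):
    """List the initials that the given syllables do not cover."""
    covered = {initial_of(s) for s in syllables}
    return [i for i in INITIALS if i not in covered]

print(missing_initials(["ni", "wo", "hao"]))
```

A non-empty result suggests that more words should be added to the preselected content before it is used for an evaluation. The same idea would extend to the 24 finals.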
In another example, the preselected voice content is further configured so that the M keywords into which it is split satisfy at least one of the following conditions: they differ in meaning, they do not repeat, no inclusion relation exists between them, and no homophones exist among them. Each condition is explained below.
Differing in meaning means that the words express different meanings; for example, "banana" and "desk lamp" are two words with completely different meanings. Not repeating means that the same word does not appear twice in the preselected voice content. The absence of an inclusion relation means that there is no obvious subordinate relation between words; for example, "banana" is a subordinate term of "fruit", so the two should not be used together. The absence of homophones means that no two words have the same or similar pronunciation; for example, the Chinese words for "stay" and "flow down" are pronounced identically.
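The mechanical part of these conditions can be sketched as a small validator. This is only an illustration under stated assumptions: the function name `check_keywords` is hypothetical, and the `hypernyms` and `pronunciations` tables are assumed to be supplied by the caller, because the patent does not say how subordinate relations or pronunciations are determined. Differences in meaning are not checked.

```python
from itertools import combinations

def check_keywords(keywords, hypernyms=None, pronunciations=None):
    """Return a list of rule violations found in a keyword list.

    hypernyms: maps a keyword to the set of broader terms that include it (assumed input).
    pronunciations: maps a keyword to a pronunciation string (assumed input).
    """
    hypernyms = hypernyms or {}
    pronunciations = pronunciations or {}
    problems = []
    unique = list(dict.fromkeys(keywords))  # drop repeats, keep order
    if len(unique) != len(keywords):
        problems.append("repeated keyword")
    for a, b in combinations(unique, 2):
        # Inclusion relation in either direction, e.g. banana / fruit.
        if b in hypernyms.get(a, ()) or a in hypernyms.get(b, ()):
            problems.append(f"inclusion relation: {a} / {b}")
        # Identical pronunciation means the pair are homophones.
        pa, pb = pronunciations.get(a), pronunciations.get(b)
        if pa is not None and pa == pb:
            problems.append(f"homophones: {a} / {b}")
    return problems

print(check_keywords(["banana", "fruit", "lamp"], hypernyms={"banana": {"fruit"}}))
```

An empty list means that none of the checked conditions is violated.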
Optionally, the embodiments of the present invention can acquire original voice data of a male voice or a female voice according to the preselected voice content. Because male and female voices differ noticeably in timbre, tone and audio frequency, their voice quality evaluation results also differ. The voice input of the original voice data can therefore be extended to male and female voices so that the voice quality evaluation result is more comprehensive. The voice input of the original voice data can also be extended to children's voices, elderly voices and so on, which is not limited here.
It should be noted that, depending on the actual voice evaluation test environment, the selection rules above may be satisfied in full or in part when the preselected voice content is determined; the voice quality evaluation result is most accurate when all of them are satisfied.
In step 102, call processing is performed on the original voice data to obtain voice data to be evaluated.
Call processing means simulating the communication environment of a voice call through a call channel when a voice evaluation test is carried out. Specifically, the original voice data can be fed into one end of the call channel, and the voice data received at the other end after transmission loss is taken as the voice data to be evaluated. For example, if A and B are on a call, the sound uttered by A can be understood as the original voice data; B hears that sound after transmission, and what B hears can be understood as the voice data to be evaluated.
When the communication environment of a voice call is simulated through a call channel, the duration of the original voice data is required to be greater than a duration threshold, which is the shortest duration of original voice data that must be acquired for one quality evaluation. For example, the duration threshold can be obtained from the relative transmission speed of the call channel, the transmission frequency of the call channel and the speech rate of the original voice data.
In one example, the duration threshold of the original voice data can be determined using the following formula:
T = 100 × α × max(c/νf, s) (1)
Where T is the duration threshold of the original voice data, α is a constant, c is the speed of light, ν is the relative transmission speed of the call channel, f is the transmission frequency of the call channel, and s is the speech rate of the original voice data, in seconds per word.
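Formula (1) can be evaluated with a few lines of code. The sketch below reads the term c/νf as c divided by the product of ν and f, which is an assumption, since the text does not say how the fraction groups. The function name, the parameter names and the sample values are illustrative only.

```python
SPEED_OF_LIGHT = 299_792_458.0  # c, in metres per second

def duration_threshold(alpha, rel_speed, frequency, speech_rate, c=SPEED_OF_LIGHT):
    """Evaluate T = 100 * alpha * max(c / (v * f), s).

    Assumption: c/vf is read as c / (v * f); the patent text is ambiguous here.
    speech_rate is s, in seconds per word.
    """
    return 100 * alpha * max(c / (rel_speed * frequency), speech_rate)

# Illustrative values only; they are not taken from the patent.
print(duration_threshold(alpha=1.0, rel_speed=1.0, frequency=4.0, speech_rate=0.25, c=2.0))
```

Original voice data shorter than the returned threshold would not satisfy the duration requirement described above.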
In step 103, the voice data to be evaluated is converted into speech text to be evaluated. For example, speech recognition technology can be used for the conversion. In one example, the voice data to be evaluated can be converted into speech text to be evaluated automatically as soon as it is obtained.
In step 104, the preselected voice content is split into M keywords, and the speech text to be evaluated is searched with each of the M keywords to obtain, for each keyword, the number of occurrences that were not correctly restored, M being a positive integer. In one example, retrieval technology can be used to search the speech text to be evaluated automatically.
In step 105, the numbers obtained for the individual keywords are added to obtain the first total quantity of keywords that were not correctly restored.
In step 106, the second total quantity of keywords corresponding to the original voice data is calculated according to the duration of the original voice data.
In step 107, the content completeness of the voice data to be evaluated is evaluated according to the first total quantity and the second total quantity.
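Steps 102 to 107 can be strung together as in the sketch below. It is not the patented implementation: `transmit` and `transcribe` are hypothetical callables standing in for the call channel and the speech recognizer, the number of repetitions N is passed in directly, and the number of unrestored occurrences of a keyword is assumed to be N minus the number of hits found in the text.

```python
def evaluate_completeness(original_audio, keywords, repetitions, transmit, transcribe):
    """Return (unrestored count per keyword, ratio of first to second total quantity)."""
    received_audio = transmit(original_audio)        # step 102: call processing
    text = transcribe(received_audio)                # step 103: speech to text
    # Step 104: search the text with each keyword; assume each keyword
    # should appear once per repetition of the preselected content.
    missing = {kw: max(repetitions - text.count(kw), 0) for kw in keywords}
    first_total = sum(missing.values())              # step 105
    second_total = len(keywords) * repetitions       # step 106: M x N
    return missing, first_total / second_total       # step 107: ratio

# Stubs for illustration: a lossless channel and a recognizer that
# returns a fixed transcript with one "lamp" swallowed.
missing, ratio = evaluate_completeness(
    b"", ["banana", "lamp"], 2,
    transmit=lambda audio: audio,
    transcribe=lambda audio: "banana lamp banana",
)
print(missing, ratio)
```

The per-keyword dictionary shows which words were lost, and the ratio is the quantity from which content completeness is judged. Both M and N must be positive, otherwise the division is undefined.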
According to the embodiments of the present invention, the voice data to be evaluated is converted into speech text to be evaluated, the preselected voice content is split into M keywords, and the speech text to be evaluated is searched with each of the M keywords, which yields, for each keyword, the number of occurrences that were not correctly restored. Adding these numbers gives the total quantity of keywords that were not correctly restored, which serves as the swallowed-word count of this voice evaluation. Because the embodiments of the present invention can obtain the swallowed-word count of a voice evaluation, the content completeness of the voice data to be evaluated can be evaluated once the relation between the swallowed-word count and the total number of keywords contained in the voice evaluation data is established.
Furthermore, because the embodiments of the present invention split the preselected voice content into M keywords and search the speech text to be evaluated with each keyword, the words that were not correctly restored can be located precisely, whereas the prior art can only express the overall distortion of the call voice.
In addition, the embodiments of the present invention can use speech recognition technology to convert voice call content into text automatically and use retrieval technology to search for the keywords of the voice content automatically. This saves a great deal of labor and time and avoids the subjective influence of human evaluators.
Preferably, according to an embodiment of the present invention, the ratio of the first total quantity to the second total quantity can be calculated, and the content completeness of the voice data to be evaluated can be evaluated according to the ratio.
The first total quantity is the total quantity of keywords that were not correctly restored, and the second total quantity is the total quantity of keywords corresponding to the original voice data. The first total quantity can be taken as the swallowed-word count of this voice evaluation, so the ratio of the first total quantity to the second total quantity can be understood as the swallowed-word rate of this voice evaluation.
In one example, the voice data to be evaluated is first recognized as text to be evaluated. The speech text to be evaluated is searched with each of the M keywords of the preselected voice content, and the number of hits for each keyword, $q_0, q_1, \ldots, q_{M-1}$, is recorded, which gives the number of times each keyword was not correctly restored, $p_0, p_1, \ldots, p_{M-1}$. These numbers are then added to obtain $\sum_{i=0}^{M-1} p_i$, which is recorded as the swallowed-word count of this voice quality evaluation; the ratio of $\sum_{i=0}^{M-1} p_i$ to the total number of keywords corresponding to the original voice data is the swallowed-word rate.
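A small worked example may make the arithmetic concrete. It assumes M = 3 keywords, N = 4 repetitions of the preselected content, and hit counts $(q_0, q_1, q_2) = (4, 3, 2)$; these numbers are invented for illustration. It also assumes that the unrestored count is $p_i = N - q_i$ and that the total number of keywords is $M \times N$, neither of which this passage states explicitly.

```latex
\begin{aligned}
p_i &= N - q_i \quad\Rightarrow\quad (p_0, p_1, p_2) = (0, 1, 2),\\
\sum_{i=0}^{M-1} p_i &= 0 + 1 + 2 = 3,\\
\text{swallowed-word rate} &= \frac{\sum_{i=0}^{M-1} p_i}{M \times N} = \frac{3}{3 \times 4} = 25\%.
\end{aligned}
```

Under these assumptions a quarter of the expected keywords were lost in transmission.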
It should be understood that a higher swallowed-word rate means that the original voice data is restored less completely and that the quality of the voice call is worse. The technical solution of the embodiments of the present invention therefore reproduces more closely the scenario in which a user makes a voice call, and evaluates the completeness with which a voice call is restored more quickly and reliably.
In addition, because the evaluation method of the embodiments of the present invention evaluates voice quality by calculating the swallowed-word rate and does not build a speech model, the evaluation result is not affected by changes in speech model parameters; the evaluation method therefore also has high stability.
Fig. 2 is a flow diagram of a voice quality evaluation method provided by another embodiment of the present invention. Fig. 2 differs from Fig. 1 in that step 106 in Fig. 1 can be refined into steps 1061 and 1062 in Fig. 2.
In step 1061, the number of times N that the preselected voice content needs to be repeated for one evaluation is calculated according to the duration of the original voice data, N being a positive integer.
In step 1062, the product of M and N is calculated as the second total quantity of keywords corresponding to the original voice data.
Fig. 3 is a flow diagram of a voice quality evaluation method provided by a further embodiment of the present invention. Fig. 3 differs from Fig. 2 in that step 1061 in Fig. 2 can be refined into steps 10611 to 10613 in Fig. 3.
In step 10611, the number of words contained in the preselected voice content is obtained. It should be noted that, taking Chinese as an example, the number of words here is counted by individual Chinese characters rather than by keywords.
In step 10612, the product of the number of words contained in the preselected voice content and the speech rate of the voice call is calculated to obtain the duration needed to repeat the original voice content once. The speech rate of the voice call is expressed in seconds per word.
In step 10613, the ratio of the duration of the original voice data to the duration needed to repeat the original voice content once is calculated, as the number of times N that the preselected voice content needs to be repeated for one evaluation of the original voice data.
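Steps 10611 to 10613, together with step 1062, amount to two short calculations, sketched below. The function names are hypothetical. Rounding the ratio down to a whole number is an assumption: the text only requires N to be a positive integer, and any blank intervals between repetitions are ignored here.

```python
def repetitions_needed(total_duration_s, word_count, seconds_per_word):
    """Steps 10611-10613: how many times the preselected content is repeated."""
    one_pass_s = word_count * seconds_per_word   # duration of one repetition
    return int(total_duration_s // one_pass_s)   # assumption: round down

def second_total_quantity(m, n):
    """Step 1062: total number of keywords, M x N."""
    return m * n

n = repetitions_needed(total_duration_s=60, word_count=20, seconds_per_word=0.25)
print(n, second_total_quantity(8, n))
```

With 20 words spoken at 0.25 seconds per word, one repetition lasts 5 seconds, so 60 seconds of original voice data hold 12 repetitions.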
According to an embodiment of the present invention, so that the multiple segments of preselected voice content in the voice data to be evaluated, and the corresponding keywords, can be recognized accurately, the starting positions of the segments of preselected voice content can be aligned. In one example, a blank interval can be left in the original voice data between adjacent repetitions of the preselected voice content; that is, k seconds of blank are added before each segment of preselected voice content for synchronization, k being a positive integer.
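The effect of the blank interval on timing can be illustrated by computing where each repetition starts. This is a sketch under the assumption that every repetition has the same length and is preceded by exactly k seconds of blank; the function name `segment_starts` is hypothetical.

```python
def segment_starts(repetitions, one_pass_s, blank_s):
    """Start time, in seconds, of each repetition of the preselected content."""
    period = blank_s + one_pass_s  # one blank interval plus one repetition
    return [i * period + blank_s for i in range(repetitions)]

print(segment_starts(repetitions=3, one_pass_s=5.0, blank_s=1))
```

Known start times of this kind give a fixed reference for aligning each received segment with the corresponding repetition of the preselected content.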
Fig. 4 is a structural diagram of a voice quality evaluation device provided by an embodiment of the present invention. The voice quality evaluation device in Fig. 4 includes an acquisition module 401, a processing module 402, a conversion module 403, a retrieval module 404, a first computing module 405, a second computing module 406 and an evaluation module 407.
The acquisition module 401 is configured to acquire original voice data according to preselected voice content;
The processing module 402 is configured to perform call processing on the original voice data to obtain voice data to be evaluated;
The conversion module 403 is configured to convert the voice data to be evaluated into speech text to be evaluated;
The retrieval module 404 is configured to split the preselected voice content into M keywords and to search the speech text to be evaluated with each of the M keywords to obtain, for each keyword, the number of occurrences that were not correctly restored, M being a positive integer;
The first computing module 405 is configured to add the numbers obtained for the individual keywords to obtain a first total quantity of keywords that were not correctly restored;
The second computing module 406 is configured to calculate, according to the duration of the original voice data, a second total quantity of keywords corresponding to the original voice data;
The evaluation module 407 is configured to evaluate the content completeness of the voice data to be evaluated according to the first total quantity and the second total quantity.
According to the embodiments of the present invention, the conversion module 403 converts the voice data to be evaluated into speech text to be evaluated, and the retrieval module 404 splits the preselected voice content into M keywords and searches the speech text to be evaluated with each of them, which yields, for each keyword, the number of occurrences that were not correctly restored. The first computing module 405 then adds these numbers to obtain the total quantity of keywords that were not correctly restored, which serves as the swallowed-word count of this voice evaluation. Because the embodiments of the present invention can obtain the swallowed-word count of a voice evaluation, the evaluation module 407 can evaluate the content completeness of the voice data to be evaluated once the relation between the swallowed-word count and the total number of keywords contained in the voice evaluation data is established.
Fig. 5 is a hardware structure diagram of a voice quality evaluation device provided by an embodiment of the present invention. As shown in Fig. 5, the voice quality evaluation device of the embodiment of the present invention includes a processor 501, a memory 502, a communication interface 503 and a bus 510. The processor 501, the memory 502 and the communication interface 503 are connected by the bus 510 and communicate with one another through it.
Specifically, the processor 501 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present invention.
The memory 502 may include mass storage for data or instructions. By way of example and not limitation, the memory 502 may include an HDD, a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape or a universal serial bus (USB) drive, or a combination of two or more of these. Where appropriate, the memory 502 may include removable or non-removable (or fixed) media. Where appropriate, the memory 502 may be internal or external to the resource interface device. In a particular embodiment, the memory 502 is non-volatile solid-state memory. In a particular embodiment, the memory 502 includes read-only memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM) or flash memory, or a combination of two or more of these.
The communication interface 503 is mainly used to implement communication between the modules, devices, units, and/or equipment in the embodiments of the invention.
That is, the device for assessing voice quality may be implemented to include the processor 501, the memory 502, the communication interface 503, and the bus 510. The processor 501, the memory 502, and the communication interface 503 are connected through the bus 510 and communicate with one another through it. The memory 502 stores program code; the processor 501 reads the executable program code stored in the memory 502 and runs the corresponding program so as to execute the voice quality assessment method described above, thereby implementing the method and device for assessing voice quality described in conjunction with Fig. 1 to Fig. 4.
It should be noted that the embodiments in this specification are described in a progressive manner: for parts that are the same or similar between embodiments, reference may be made from one to another, and each embodiment focuses on its differences from the others. For the device embodiments, reference may be made to the description of the method embodiments for related details. The invention is not limited to the particular steps and structures described above and shown in the figures. Those skilled in the art, having understood the spirit of the invention, may make various changes, modifications, and additions, or change the order of the steps. Also, for brevity, detailed descriptions of known methods and techniques are omitted here.
It should be made clear, however, that the invention is not limited to the specific configurations and processing described above and shown in the figures. Also, for brevity, detailed descriptions of known methods and techniques are omitted here. In the above embodiments, several specific steps are described and shown as examples. The method flow of the invention is nevertheless not limited to the specific steps described and shown; those skilled in the art, having understood the spirit of the invention, may make various changes, modifications, and additions, or change the order of the steps.
The invention may be implemented in other specific forms without departing from its spirit and essential characteristics. For example, the algorithms described in particular embodiments may be modified while the system architecture does not depart from the essential spirit of the invention. The present embodiments are therefore to be considered in all respects as illustrative rather than restrictive; the scope of the invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are included within the scope of the invention.
Claims (13)
1. A method for assessing voice quality, characterized by comprising:
acquiring primary voice data according to preselected voice content;
performing call processing on the primary voice data to obtain voice data to be assessed;
converting the voice data to be assessed into speech text to be assessed;
splitting the preselected voice content into M keywords, and searching the speech text to be assessed with each of the M keywords to obtain the quantity of each keyword that is not correctly restored, M being a positive integer;
adding up the quantities of the keywords that are not correctly restored to obtain a first total quantity of keywords that are not correctly restored;
calculating, according to the duration of the primary voice data, a second total quantity of keywords corresponding to the primary voice data;
assessing the content completeness of the voice data to be assessed according to the first total quantity and the second total quantity.
2. The method according to claim 1, characterized in that assessing the content completeness of the voice data to be assessed according to the first total quantity and the second total quantity comprises:
calculating the ratio of the first total quantity to the second total quantity;
assessing the content completeness of the voice data to be assessed according to the ratio.
3. The method according to claim 1, characterized in that the preselected voice content is configured to cover the frequently used tones of any specified language and/or the basic pronunciations that make up the specified language.
4. The method according to claim 3, characterized in that the preselected voice content is further configured such that the M keywords into which it is split satisfy at least one of the following conditions: the keywords are semantically different, are not repeated, neither contain nor are contained in one another, and have no homonyms among them.
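Two of the keyword conditions listed in this claim, no repetition and no containment relationship, can be checked mechanically. The sketch below is only an illustration with a hypothetical function name and sample words; checking semantic difference or homonyms would need linguistic resources that are outside its scope.

```python
from itertools import permutations
from typing import List


def keyword_set_problems(keywords: List[str]) -> List[str]:
    """Return human-readable violations of the 'no repeats' and
    'no containment' conditions; an empty list means both hold."""
    problems = []
    seen = set()
    for kw in keywords:
        if kw in seen:
            problems.append(f"duplicate keyword: {kw}")
        seen.add(kw)
    # Every ordered pair of distinct keywords: is the first inside the second?
    for a, b in permutations(sorted(seen), 2):
        if a in b:
            problems.append(f"'{a}' is contained in '{b}'")
    return problems


print(keyword_set_problems(["apple", "banana", "cherry"]))
print(keyword_set_problems(["apple", "pineapple", "apple"]))
```

Rejecting contained keywords matters for any retrieval based on substring search, since a hit on the longer keyword would otherwise also be counted as a hit on the shorter one.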
5. The method according to claim 1, characterized in that acquiring primary voice data according to preselected voice content comprises:
acquiring primary voice data of a male voice or a female voice according to the preselected voice content.
6. The method according to claim 1, characterized in that calculating, according to the duration of the primary voice data, the second total quantity of keywords corresponding to the primary voice data comprises:
calculating, according to the duration of the primary voice data, the number of times N that the preselected voice content needs to be repeated for one assessment, N being a positive integer;
calculating the product of M and N as the second total quantity of keywords corresponding to the primary voice data.
7. The method according to claim 6, characterized in that calculating, according to the duration of the primary voice data, the number of times N that the preselected voice content needs to be repeated for one assessment comprises:
obtaining the number of words contained in the preselected voice content;
calculating the product of the number of words contained in the preselected voice content and the voice call speech rate to obtain the duration required to repeat the original speech content once;
calculating the ratio of the duration of the primary voice data to the duration required to repeat the original speech content once, as the number of times N that the preselected voice content needs to be repeated for one assessment of the primary voice data.
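The arithmetic in claims 6 and 7 can be illustrated with a short sketch. Several points are assumptions rather than statements of the claims: the speech rate is taken as seconds per word (so that words times rate gives a duration), the ratio is rounded down and clamped to at least 1 so that N is a positive integer, the blank interval between repetitions is ignored, and all names and sample numbers are hypothetical.

```python
import math


def repetitions_needed(duration_s: float, words_in_content: int, seconds_per_word: float) -> int:
    """Number of times N the preselected content fits into the recording."""
    # Duration of one pass through the content: word count times speech rate.
    single_pass_s = words_in_content * seconds_per_word
    # Assumption: round the ratio down, but never below 1.
    return max(1, math.floor(duration_s / single_pass_s))


def second_total_quantity(num_keywords: int, repetitions: int) -> int:
    """Second total quantity of keywords: the product of M and N."""
    return num_keywords * repetitions


# Hypothetical numbers: a 60 s recording of 20-word content at 0.25 s per word.
n = repetitions_needed(60.0, words_in_content=20, seconds_per_word=0.25)
print(n, second_total_quantity(num_keywords=10, repetitions=n))
```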
8. The method according to claim 6, characterized in that a blank interval exists between adjacent repetitions of the preselected voice content in the primary voice data.
9. The method according to claim 1, characterized in that the duration of the primary voice data is required to be greater than a duration threshold, the duration threshold being obtained based on the relative transmission speed of the call channel, the transmission frequency of the call channel, and the speech rate of the primary voice data.
10. The method according to claim 9, characterized in that the duration threshold of the primary voice data is determined using the following formula:
T = 100 × α × max(c/νf, s)
where T is the duration threshold of the primary voice data, α is a constant, c is the speed of light, ν is the relative transmission speed of the call channel, f is the transmission frequency of the call channel, and s is the speech rate of the primary voice data.
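The threshold formula can be evaluated directly, as in the sketch below. It rests on assumptions the claim does not settle: the term c/νf is read as c divided by the product of ν and f, units are taken to be SI (metres per second, hertz, seconds per word), and the function names and sample values are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # c, in metres per second


def duration_threshold(alpha: float, relative_speed: float, frequency: float, speech_rate: float) -> float:
    """T = 100 * alpha * max(c / (v * f), s), with c/vf read as c / (v * f)."""
    return 100.0 * alpha * max(SPEED_OF_LIGHT / (relative_speed * frequency), speech_rate)


def is_long_enough(duration_s: float, threshold_s: float) -> bool:
    """Claim 9 requires the recording to be strictly longer than the threshold."""
    return duration_s > threshold_s


# Hypothetical channel: v equal to c, f = 1000 Hz, s = 0.25 s per word, alpha = 2.
t = duration_threshold(alpha=2.0, relative_speed=SPEED_OF_LIGHT, frequency=1000.0, speech_rate=0.25)
print(t, is_long_enough(60.0, t))
```

With these sample values the channel term is far smaller than the speech rate, so the speech rate alone determines the threshold.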
11. A device for assessing voice quality, characterized by comprising:
an acquisition module, configured to acquire primary voice data according to preselected voice content;
a processing module, configured to perform call processing on the primary voice data to obtain voice data to be assessed;
a conversion module, configured to convert the voice data to be assessed into speech text to be assessed;
a retrieval module, configured to split the preselected voice content into M keywords and to search the speech text to be assessed with each of the M keywords to obtain the quantity of each keyword that is not correctly restored, M being a positive integer;
a first computing module, configured to add up the quantities of the keywords that are not correctly restored to obtain a first total quantity of keywords that are not correctly restored;
a second computing module, configured to calculate, according to the duration of the primary voice data, a second total quantity of keywords corresponding to the primary voice data;
an evaluation module, configured to assess the content completeness of the voice data to be assessed according to the first total quantity and the second total quantity.
12. A device for assessing voice quality, comprising a memory, a processor, and a program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method for assessing voice quality according to any one of claims 1-10.
13. A computer-readable storage medium on which a program is stored, characterized in that the program, when executed by a processor, implements the method for assessing voice quality according to any one of claims 1-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710698522.7A CN109410915B (en) | 2017-08-15 | 2017-08-15 | Method and device for evaluating voice quality and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710698522.7A CN109410915B (en) | 2017-08-15 | 2017-08-15 | Method and device for evaluating voice quality and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109410915A (en) | 2019-03-01 |
CN109410915B (en) | 2022-03-04 |
Family
ID=65454290
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710698522.7A Active CN109410915B (en) | 2017-08-15 | 2017-08-15 | Method and device for evaluating voice quality and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109410915B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113593536A (en) * | 2021-06-09 | 2021-11-02 | 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) | Device and system for detecting voice recognition accuracy |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080151769A1 (en) * | 2004-06-15 | 2008-06-26 | Mohamed El-Hennawey | Method and Apparatus for Non-Intrusive Single-Ended Voice Quality Assessment in Voip |
CN101572964A (en) * | 2009-06-15 | 2009-11-04 | 北京爱达特网络科技有限公司 | Mobile communication voice evaluation system |
CN201585137U (en) * | 2009-06-15 | 2010-09-15 | 北京集耀网络科技有限公司 | Mobile communication voice evaluation device |
CN102970133A (en) * | 2012-11-12 | 2013-03-13 | 安徽量子通信技术有限公司 | Voice transmission method of quantum network and voice terminal |
CN103365849A (en) * | 2012-03-27 | 2013-10-23 | 富士通株式会社 | Keyword search method and equipment |
CN104464757A (en) * | 2014-10-28 | 2015-03-25 | 科大讯飞股份有限公司 | Voice evaluation method and device |
CN105103221A (en) * | 2013-03-05 | 2015-11-25 | 微软技术许可有限责任公司 | Speech recognition assisted evaluation on text-to-speech pronunciation issue detection |
CN106356053A (en) * | 2016-08-09 | 2017-01-25 | 北京金山安全软件有限公司 | Method and device for testing recognition accuracy of voice input method and electronic equipment |
CN106933561A (en) * | 2015-12-31 | 2017-07-07 | 北京搜狗科技发展有限公司 | Pronunciation inputting method and terminal device |
CN106937005A (en) * | 2015-12-29 | 2017-07-07 | 成都鼎桥通信技术有限公司 | Mobile terminal realizes the method and mobile terminal of speech quality evaluation |
Also Published As
Publication number | Publication date |
---|---|
CN109410915B (en) | 2022-03-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||