CN109410915A - Voice quality evaluation method and device, and computer-readable storage medium - Google Patents

Voice quality evaluation method and device, and computer-readable storage medium

Info

Publication number
CN109410915A
Authority
CN
China
Prior art keywords
voice data
voice
assessed
keyword
primary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710698522.7A
Other languages
Chinese (zh)
Other versions
CN109410915B (en
Inventor
赵奕晨
何成林
刘启飞
丁芹
曹艳艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd
Priority to CN201710698522.7A
Publication of CN109410915A
Application granted
Publication of CN109410915B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/01 Assessment or evaluation of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a voice quality evaluation method and device, and a computer-readable storage medium. The evaluation method includes: collecting original voice data according to preselected voice content; performing call processing on the original voice data to obtain voice data to be evaluated; converting the voice data to be evaluated into speech text to be evaluated; splitting the preselected voice content into M keywords, and searching the speech text to be evaluated with each of the M keywords to obtain, for each keyword, the number of occurrences that were not correctly restored; adding these per-keyword numbers to obtain a first total quantity of keywords not correctly restored; calculating, from the duration of the original voice data, a second total quantity of keywords corresponding to the original voice data; and evaluating the content completeness of the voice data to be evaluated according to the first total quantity and the second total quantity. The evaluation method and device of the embodiments of the invention make it possible to evaluate the completeness of voice call content.

Description

Voice quality evaluation method and device, and computer-readable storage medium
Technical field
The present invention relates to the technical field of voice communication, and in particular to a voice quality evaluation method and device, and a computer-readable storage medium.
Background
From early fixed-line telephones to today's mobile terminals, tools for voice calls have developed rapidly, and making voice calls has become one of the basic needs of daily life. To convey accurately to the other party what one party to a call means to express, the completeness of the voice call content must be guaranteed.
However, voice quality evaluation methods in the prior art mainly evaluate the distortion of a voice call in terms of timbre and pitch. For example, an auditory model is built based on an input-output approach, and the distortion between the received voice signal and the original voice signal is calculated; alternatively, based on an output-only approach, the distortion of the received voice signal is calculated from IP network impairment parameters or audio stream parameters. Because prior-art methods do not evaluate the completeness of the voice call content, a new evaluation method for that completeness is needed.
Summary of the invention
The embodiment of the invention provides a kind of appraisal procedure of voice quality and device, computer readable storage medium, energy It is enough that the integrity degree of voice communication content is assessed.
In a first aspect, the embodiment of the invention provides a kind of appraisal procedure of voice quality, which includes:
Primary voice data is acquired according to pre-selection voice content;
Call processing is carried out to the primary voice data, obtains voice data to be assessed;
The voice data to be assessed is converted into speech text to be assessed;
The pre-selection voice content is split into M keyword, is utilized respectively the M keyword to the language to be assessed Sound text is retrieved, and obtains the quantity for each keyword not restored correctly, and M is positive integer;
The quantity for each keyword not restored correctly is added, obtain the keyword not restored correctly first is total Quantity;
According to the duration of the primary voice data, the of keyword corresponding with the primary voice data is calculated Two total quantitys;
According to first total quantity and second total quantity, the content intact of the voice data to be assessed is assessed Degree.
It is described according to first total quantity and second total quantity, assessment in some embodiments of first aspect The content intact degree of the voice data to be assessed, comprising:
Calculate the ratio of first total quantity and second total quantity;
The content intact degree of the voice data to be assessed is assessed according to the ratio.
In some embodiments of the first aspect, the preselected voice content is configured to cover high-frequency words of any specified language and/or the basic pronunciations that make up the specified language.
In some embodiments of the first aspect, the preselected voice content is further configured so that the M keywords into which it is split satisfy at least one of the following conditions: the keywords differ in meaning, are not repeated, have no containing/contained relationship, and include no homophones.
In some embodiments of the first aspect, collecting original voice data according to preselected voice content includes: collecting original voice data in a male voice or a female voice according to the preselected voice content.
In some embodiments of the first aspect, calculating, according to the duration of the original voice data, the second total quantity of keywords corresponding to the original voice data includes:
calculating, according to the duration of the original voice data, the number of times N that the preselected voice content must be repeated for one evaluation, where N is a positive integer;
calculating the product of M and N as the second total quantity of keywords corresponding to the original voice data.
In some embodiments of the first aspect, calculating, according to the duration of the original voice data, the number of times N that the preselected voice content must be repeated for one evaluation includes:
obtaining the number of characters in the preselected voice content;
calculating the product of the number of characters in the preselected voice content and the speaking rate of the voice call, to obtain the duration required to repeat the original voice content once;
calculating the ratio of the duration of the original voice data to the duration required to repeat the original voice content once, as the number of times N that the preselected voice content must be repeated for one evaluation of the original voice data.
In some embodiments of the first aspect, a blank interval is left between adjacent repetitions of the preselected voice content in the original voice data.
In some embodiments of the first aspect, the duration of the original voice data is required to be greater than a duration threshold, the duration threshold being obtained from the relative transmission speed of the call channel, the transmission frequency of the call channel, and the speaking rate of the original voice data.
In some embodiments of the first aspect, the duration threshold of the original voice data is determined by the following formula:
T = 100 × α × max(c/νf, s)
where T is the duration threshold of the original voice data, α is a constant, c is the speed of light, ν is the relative transmission speed of the call channel, f is the transmission frequency of the call channel, and s is the speaking rate of the original voice data.
In a second aspect, an embodiment of the present invention provides a voice quality evaluation device, which includes:
an acquisition module, configured to collect original voice data according to preselected voice content;
a processing module, configured to perform call processing on the original voice data to obtain voice data to be evaluated;
a conversion module, configured to convert the voice data to be evaluated into speech text to be evaluated;
a retrieval module, configured to split the preselected voice content into M keywords and search the speech text to be evaluated with each of the M keywords to obtain, for each keyword, the number of occurrences not correctly restored, where M is a positive integer;
a first calculation module, configured to add the numbers for all keywords to obtain a first total quantity of keywords not correctly restored;
a second calculation module, configured to calculate, according to the duration of the original voice data, a second total quantity of keywords corresponding to the original voice data;
an evaluation module, configured to evaluate the content completeness of the voice data to be evaluated according to the first total quantity and the second total quantity.
In a third aspect, an embodiment of the present invention provides a voice quality evaluation device, including a memory, a processor, and a program stored in the memory and executable on the processor, where the processor implements the voice quality evaluation method described above when executing the program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a program which, when executed by a processor, implements the voice quality evaluation method described above.
According to the embodiments of the present invention, the voice data to be evaluated is converted into speech text to be evaluated, the preselected voice content is split into M keywords, and the speech text is searched with each of the M keywords, which yields the number of occurrences of each keyword that were not correctly restored. Adding these numbers gives the total quantity of keywords not correctly restored, which serves as the number of swallowed (dropped) words in this evaluation. Since the number of swallowed words can be obtained in this way, it suffices to relate it to the total number of keywords contained in the evaluated voice data in order to evaluate the content completeness of the voice data to be evaluated.
Brief description of the drawings
The present invention may be better understood from the following description of specific embodiments in conjunction with the accompanying drawings, in which the same or similar reference numerals denote the same or similar features.
Fig. 1 is a schematic flowchart of a voice quality evaluation method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a voice quality evaluation method provided by another embodiment of the present invention;
Fig. 3 is a schematic flowchart of a voice quality evaluation method provided by yet another embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a voice quality evaluation device provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of the hardware structure of a voice quality evaluation device provided by an embodiment of the present invention.
Detailed description of the embodiments
Features and exemplary embodiments of various aspects of the present invention are described in detail below. In the following detailed description, many specific details are set forth to provide a thorough understanding of the invention. It will be apparent to those skilled in the art, however, that the invention can be practiced without some of these details. The following description of the embodiments is intended only to provide a better understanding of the invention by showing examples of it. The invention is in no way limited to any specific configuration or algorithm set forth below, but covers any modification, replacement, and improvement of elements, components, and algorithms that does not depart from the spirit of the invention. In the drawings and the following description, well-known structures and techniques are not shown, to avoid unnecessarily obscuring the invention.
Embodiments of the present invention provide a voice quality evaluation method and device, and a computer-readable storage medium. With these embodiments, the completeness of voice call content can be evaluated, so that what one party to a call means to express can be conveyed accurately to the other party.
Fig. 1 is a schematic flowchart of a voice quality evaluation method provided by an embodiment of the present invention. As shown in Fig. 1, the evaluation method includes steps 101 to 107.
In step 101, original voice data is collected according to preselected voice content.
The preselected voice content is test voice content chosen in advance for the evaluation. It may take the form of text or of a segment of speech; no limitation is imposed here.
To obtain a more accurate evaluation result, the preselected voice content needs to satisfy certain conditions.
In one example, the preselected voice content is configured to cover high-frequency words of any specified language and/or the basic pronunciations that make up the specified language. The selection rules are illustrated below using Chinese as an example.
High-frequency words in Chinese may be pronouns, such as "you" and "I"; nouns, such as "family", "friend", and "weather"; or modal particles, such as "good", "uh", and "may".
The basic pronunciations of Chinese comprise 21 initials, namely b, p, m, f, d, t, n, l, g, k, h, j, q, x, zh, ch, sh, r, z, c, s, and 24 finals, of which the simple finals are a, o, e, i, u, v (v standing for ü) and the compound finals are ai, ei, ui, ao, ou, iu, ie, ve, er, an, en, in, un, vn, ang, eng, ing, ong.
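By way of illustration only, coverage of the 21 initials could be checked as sketched below. The sketch assumes that the preselected voice content is already available as toneless pinyin syllables; that input format and the names `INITIALS`, `initial_of`, and `missing_initials` are assumptions made here and do not come from the disclosure. Syllables that begin with y, w, or a vowel are treated as having no initial.

```python
# The 21 initials; multi-letter initials are listed first so that
# "zh" is matched before "z" (hypothetical helper data, not from the patent).
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]


def initial_of(syllable: str) -> str:
    """Return the initial of a toneless pinyin syllable, or "" if it has none."""
    for ini in INITIALS:
        if syllable.startswith(ini):
            return ini
    return ""


def missing_initials(syllables: list[str]) -> set[str]:
    """Return the initials that no syllable of the content uses."""
    covered = {initial_of(s) for s in syllables}
    return set(INITIALS) - covered
```

An empty result from `missing_initials` would indicate that every initial occurs at least once; a similar check could be written for the finals.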
In another example, the preselected voice content is further configured so that the M keywords into which it is split satisfy at least one of the following conditions: the keywords differ in meaning, are not repeated, have no containing/contained relationship, and include no homophones. Each condition is explained below.
Differing in meaning means that the words express different things; for example, "banana" and "desk lamp" are two words with entirely different meanings. Not repeated means that the same word does not appear more than once in the preselected voice content. Having no containing/contained relationship means that there is no obvious subordinate relationship between words; for example, "banana" is subordinate to "fruit". Including no homophones means that no two words have the same or similar pronunciation; for example, the Chinese words rendered here as "stay" and "flow down" are pronounced identically.
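Two of these conditions lend themselves to a simple mechanical check, sketched below under stated assumptions. The sketch detects only exact repeats and literal substring containment between keywords, which is a textual stand-in and not the semantic subordinate relationship described above; checks for meaning and homophones would need linguistic resources that the disclosure does not specify. The function name `keyword_conflicts` is hypothetical.

```python
from itertools import combinations


def keyword_conflicts(keywords: list[str]) -> list[tuple[str, str, str]]:
    """Return (kind, a, b) for each keyword pair that is repeated or textually contained."""
    conflicts = []
    for a, b in combinations(keywords, 2):
        if a == b:
            conflicts.append(("repeated", a, b))
        elif a in b or b in a:
            # Literal substring containment only; an assumption, not a semantic test.
            conflicts.append(("contained", a, b))
    return conflicts
```

Substring containment is worth flagging in any case, because a keyword that occurs inside another would be counted twice when the recognized text is searched.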
Optionally, original voice data in a male voice or a female voice may be collected according to the preselected voice content. Because male and female voices differ markedly in timbre, pitch, and audio frequency, their voice quality evaluation results will differ accordingly. The voices used to record the original voice data can therefore be extended to cover both male and female voices, making the evaluation result more comprehensive. They may further be extended to children's voices, elderly voices, and so on; no limitation is imposed here.
It should be noted that, depending on the actual test environment, the selection rules above may be satisfied in full or in part when the preselected voice content is determined; the evaluation result is most accurate when all of them are satisfied.
In step 102, call processing is performed on the original voice data to obtain voice data to be evaluated.
Call processing means simulating the communication environment of a voice call through a call channel during the evaluation test. Specifically, the original voice data is input at one end of the call channel and, after transmission loss, received at the other end as the voice data to be evaluated. For example, if A and B are on a call, the sound A utters can be understood as the original voice data, and the sound B hears after transmission can be understood as the voice data to be evaluated.
When a call channel is used to simulate the communication environment of a voice call, the duration of the original voice data is required to be greater than a duration threshold, which is the shortest duration of original voice data that must be collected for one quality evaluation. For example, the duration threshold can be obtained from the relative transmission speed of the call channel, the transmission frequency of the call channel, and the speaking rate of the original voice data.
In one example, the duration threshold of the original voice data can be determined by the following formula:
T = 100 × α × max(c/νf, s)    (1)
where T is the duration threshold of the original voice data, α is a constant, c is the speed of light, ν is the relative transmission speed of the call channel, f is the transmission frequency of the call channel, and s is the speaking rate of the original voice data, in seconds per character.
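A minimal sketch of formula (1) follows. The text does not state how c/νf is to be grouped or which units ν and f carry, so the sketch assumes the reading c / (ν · f) with c in metres per second; that grouping, the units, and the function name `duration_threshold` are assumptions made for illustration.

```python
C = 299_792_458.0  # speed of light in m/s


def duration_threshold(alpha: float, v: float, f: float, s: float) -> float:
    """Return T = 100 * alpha * max(c / (v * f), s).

    Assumes c/vf in formula (1) means c divided by (v * f); the source
    does not settle the grouping or the units of v and f.
    """
    if v <= 0 or f <= 0 or s <= 0:
        raise ValueError("v, f and s must be positive")
    return 100.0 * alpha * max(C / (v * f), s)
```

Under this reading, whichever of the channel term and the speaking rate is larger determines the threshold, scaled by 100 × α.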
In step 103, the voice data to be evaluated is converted into speech text to be evaluated, for example by speech recognition. In one example, the conversion is performed automatically once the voice data to be evaluated has been obtained.
In step 104, the preselected voice content is split into M keywords, and the speech text to be evaluated is searched with each of the M keywords to obtain, for each keyword, the number of occurrences not correctly restored, where M is a positive integer. In one example, the search is performed automatically using retrieval technology.
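The retrieval in step 104 can be sketched as a plain substring count over the recognized text. The disclosure does not name a particular retrieval technique, so the use of Python's `str.count` and the function name `count_keywords` are assumptions made for illustration.

```python
def count_keywords(text: str, keywords: list[str]) -> dict[str, int]:
    """Count non-overlapping occurrences of each keyword in the recognized text."""
    return {kw: text.count(kw) for kw in keywords}
```

Comparing each count with the number of times the keyword was actually spoken then shows which keywords were lost in transmission.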
In step 105, the numbers for all keywords are added to obtain a first total quantity of keywords not correctly restored.
In step 106, a second total quantity of keywords corresponding to the original voice data is calculated according to the duration of the original voice data.
In step 107, the content completeness of the voice data to be evaluated is evaluated according to the first total quantity and the second total quantity.
According to the embodiments of the present invention, the voice data to be evaluated is converted into speech text to be evaluated, the preselected voice content is split into M keywords, and the speech text is searched with each of the M keywords, which yields the number of occurrences of each keyword that were not correctly restored. Adding these numbers gives the total quantity of keywords not correctly restored, which serves as the number of swallowed (dropped) words in this evaluation. Since the number of swallowed words can be obtained in this way, it suffices to relate it to the total number of keywords contained in the evaluated voice data in order to evaluate the content completeness of the voice data to be evaluated.
In addition, because the embodiment splits the preselected voice content into M keywords and searches the speech text to be evaluated for each keyword, it can pinpoint the words that were not correctly restored, whereas the prior art can only reflect the overall distortion of the call voice.
Furthermore, the embodiment can use speech recognition to convert the voice call content into text automatically and use retrieval technology to search for the keywords automatically. This saves considerable labor and time and avoids the subjective influence of human evaluators.
Preferably, the content completeness of the voice data to be evaluated can be evaluated by calculating the ratio of the first total quantity to the second total quantity.
The first total quantity is the total number of keywords not correctly restored, and the second total quantity is the total number of keywords corresponding to the original voice data. The first total quantity can be taken as the number of swallowed words in this evaluation, so the ratio of the first total quantity to the second can be understood as the swallowed-word rate of this evaluation.
In one example, the voice data to be evaluated is first recognized as text to be evaluated. The text is then searched with each of the M keywords of the preselected voice content, and the retrieved count of each keyword, $q_0, q_1, \ldots, q_{m-1}$, is recorded, from which the number of times each keyword was not correctly restored, $p_0, p_1, \ldots, p_{m-1}$, is obtained. These numbers are added to give $\sum_{i=0}^{m-1} p_i$, the number of swallowed words in this evaluation; the ratio of $\sum_{i=0}^{m-1} p_i$ to the total number of keywords corresponding to the original voice data is the swallowed-word rate.
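The calculation in this example can be sketched as follows. The text does not say how each p_i is derived from q_i; the sketch assumes that every keyword was spoken N times, so that p_i = max(N − q_i, 0), and that the total number of keywords is M × N as in the embodiment of Fig. 2. Both assumptions and the function name `swallowed_word_rate` are illustrative only.

```python
def swallowed_word_rate(retrieved: list[int], repeats: int) -> float:
    """Return sum(p_i) / (M * N), with p_i = max(N - q_i, 0).

    `retrieved` holds the counts q_0 .. q_{m-1}; `repeats` is N.
    The derivation of p_i from q_i is an assumption, not stated in the source.
    """
    if not retrieved or repeats <= 0:
        raise ValueError("need at least one keyword and a positive repeat count")
    missing = [max(repeats - q, 0) for q in retrieved]
    return sum(missing) / (len(retrieved) * repeats)
```

For instance, with three keywords spoken twice each and retrieved counts of 2, 1, and 2, one of six keyword occurrences is missing.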
It should be understood that a higher swallowed-word rate means that the original voice data was restored less completely and that the quality of the voice call is worse. The technical solution of the embodiment therefore reproduces more faithfully the scenario of a user making a voice call and evaluates the completeness of the restored call more quickly and reliably.
In addition, because the evaluation method evaluates voice quality by calculating the swallowed-word rate and does not build a speech model, its results are not affected by changes in speech model parameters; the method is therefore also highly stable.
Fig. 2 is a schematic flowchart of a voice quality evaluation method provided by another embodiment of the present invention. Fig. 2 differs from Fig. 1 in that step 106 in Fig. 1 can be refined into steps 1061 and 1062 in Fig. 2.
In step 1061, the number of times N that the preselected voice content must be repeated for one evaluation is calculated according to the duration of the original voice data, where N is a positive integer.
In step 1062, the product of M and N is calculated as the second total quantity of keywords corresponding to the original voice data.
Fig. 3 is a schematic flowchart of a voice quality evaluation method provided by yet another embodiment of the present invention. Fig. 3 differs from Fig. 2 in that step 1061 in Fig. 2 can be refined into steps 10611 to 10613 in Fig. 3.
In step 10611, the number of characters in the preselected voice content is obtained. Note that, taking Chinese as an example, the count here is by individual Chinese character, not by keyword.
In step 10612, the product of the number of characters in the preselected voice content and the speaking rate of the voice call is calculated to obtain the duration required to repeat the original voice content once. The speaking rate is expressed in seconds per character.
In step 10613, the ratio of the duration of the original voice data to the duration required to repeat the original voice content once is calculated as the number of times N that the preselected voice content must be repeated for one evaluation of the original voice data.
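Steps 10611 to 10613 and step 1062 amount to two short calculations, sketched below. The text requires N to be a positive integer but does not say how a non-integer ratio is handled; rounding down is an assumption made here, as are the function names.

```python
def repetitions_needed(duration_s: float, char_count: int, speed_s_per_char: float) -> int:
    """Return N = duration / (char_count * speed), rounded down (rounding is an assumption)."""
    once = char_count * speed_s_per_char  # seconds for one pass, as in step 10612
    if once <= 0:
        raise ValueError("char_count and speed must be positive")
    return int(duration_s // once)


def second_total(m_keywords: int, n_repeats: int) -> int:
    """Return the second total quantity M * N, as in step 1062."""
    return m_keywords * n_repeats
```

With 20 characters at 0.25 seconds per character, one pass takes 5 seconds, so 60 seconds of voice data holds 12 repetitions; with 5 keywords per pass the second total quantity is 60.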
According to an embodiment of the present invention, so that the multiple segments of preselected voice content and their keywords can be identified accurately in the voice data to be evaluated, the start positions of the segments can be aligned. In one example, a blank interval may be left between adjacent repetitions of the preselected voice content in the original voice data; that is, k seconds of blank are added before each segment of preselected voice content for synchronization, where k is a positive integer.
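The resulting layout of the voice data can be sketched as a list of segment start times. The sketch assumes that every repetition has the same duration and is preceded by the same k-second blank; the function name `segment_start_times` and its parameters are hypothetical.

```python
def segment_start_times(n_repeats: int, segment_s: float, blank_s: int) -> list[float]:
    """Return the start time of each repetition when blank_s seconds of blank precede every segment."""
    period = blank_s + segment_s  # one blank plus one pass of the content
    return [i * period + blank_s for i in range(n_repeats)]
```

Knowing these start times in advance makes it possible to cut the received audio into per-repetition segments before recognition.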
Fig. 4 is a schematic structural diagram of a voice quality evaluation device provided by an embodiment of the present invention. The device in Fig. 4 includes an acquisition module 401, a processing module 402, a conversion module 403, a retrieval module 404, a first calculation module 405, a second calculation module 406, and an evaluation module 407.
The acquisition module 401 is configured to collect original voice data according to preselected voice content;
the processing module 402 is configured to perform call processing on the original voice data to obtain voice data to be evaluated;
the conversion module 403 is configured to convert the voice data to be evaluated into speech text to be evaluated;
the retrieval module 404 is configured to split the preselected voice content into M keywords and search the speech text to be evaluated with each of the M keywords to obtain, for each keyword, the number of occurrences not correctly restored, where M is a positive integer;
the first calculation module 405 is configured to add the numbers for all keywords to obtain a first total quantity of keywords not correctly restored;
the second calculation module 406 is configured to calculate, according to the duration of the original voice data, a second total quantity of keywords corresponding to the original voice data;
the evaluation module 407 is configured to evaluate the content completeness of the voice data to be evaluated according to the first total quantity and the second total quantity.
According to the embodiments of the present invention, the conversion module 403 converts the voice data to be evaluated into speech text to be evaluated, and the retrieval module 404 splits the preselected voice content into M keywords and searches the speech text with each of them, which yields the number of occurrences of each keyword that were not correctly restored. The first calculation module 405 then adds these numbers to obtain the total quantity of keywords not correctly restored, which serves as the number of swallowed words in this evaluation. Since the number of swallowed words can be obtained in this way, once it is related to the total number of keywords contained in the evaluated voice data, the evaluation module 407 can evaluate the content completeness of the voice data to be evaluated.
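How the modules might fit together is sketched below, purely as an illustration. The call channel and the speech recognizer are passed in as callables because the disclosure names no concrete implementation for either; the class name `VoiceQualityEvaluator`, its fields, and the assumption that each keyword is spoken `repeats` times are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceQualityEvaluator:
    keywords: list[str]                      # the M keywords
    call_channel: Callable[[bytes], bytes]   # stands in for processing module 402
    speech_to_text: Callable[[bytes], str]   # stands in for conversion module 403

    def evaluate(self, original_audio: bytes, repeats: int) -> float:
        """Return the swallowed-word rate for one evaluation run."""
        received = self.call_channel(original_audio)
        text = self.speech_to_text(received)
        # Retrieval and first calculation: shortfall summed over all keywords.
        missing = sum(max(repeats - text.count(kw), 0) for kw in self.keywords)
        # Second calculation: M * N keyword occurrences were expected.
        return missing / (len(self.keywords) * repeats)
```

In a test, the two callables can be replaced by simple fakes, for example a channel that deletes one word and a recognizer that decodes bytes to text, which exercises the counting logic without real audio.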
Fig. 5 is the hardware structural diagram of the assessment device of voice quality provided in an embodiment of the present invention.As shown in figure 5, The assessment device of voice quality in the embodiment of the present invention includes: processor 501, memory 502, communication interface 503 and bus 510.Wherein, processor 501, memory 502 and communication interface 503 connect by bus 510 and complete mutual communication.
Specifically, above-mentioned processor 501 may include central processing unit 501 (CPU) or specific integrated circuit (ASIC), or may be configured to implement the embodiment of the present invention one or more integrated circuits.
Memory 502 may include for data or the mass storage of instruction 502.For example it rather than limits, deposits Reservoir 502 may include HDD, floppy disk drive, flash memory, CD, magneto-optic disk, tape or universal serial bus 510 (USB) driver Or the combination of two or more the above.In a suitable case, memory 502 may include removable or non-removable The medium of (or fixed).In a suitable case, memory 502 can be inside or outside resource interface equipment.In specific reality It applies in example, memory 502 is non-volatile solid state memory 502.In a particular embodiment, memory 502 includes read-only storage Device 502 (ROM).In a suitable case, which can be the ROM of masked edit program, programming ROM (PROM), erasable PROM (EPROM), electric erasable PROM (EEPROM), electrically-alterable ROM (EAROM) or flash memory or two or more the above Combination.
The communication interface 503 is mainly used to implement communication between the modules, devices, units, and/or equipment in the embodiments of the present invention.
That is, the voice quality assessment device may be implemented to include a processor 501, a memory 502, a communication interface 503, and a bus 510. The processor 501, the memory 502, and the communication interface 503 are connected by the bus 510 and communicate with one another through it. The memory 502 stores program code. The processor 501 reads the executable program code stored in the memory 502 and runs the corresponding program to execute the voice quality assessment method described above, thereby implementing the voice quality assessment method and device described with reference to Fig. 1 to Fig. 4.
It should be understood that the embodiments in this specification are described in a progressive manner: identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on what distinguishes it from the others. For the device embodiments, relevant details may be found in the description of the method embodiments. The invention is not limited to the particular steps and structures described above and shown in the figures. After understanding the spirit of the invention, those skilled in the art may make various changes, modifications, and additions, or change the order of the steps. For brevity, detailed descriptions of known methods and techniques are omitted here.
It should also be made clear that the invention is not limited to the specific configurations and processing described above and shown in the figures. In the above embodiments, several specific steps are described and shown as examples. However, the method flow of the invention is not limited to the specific steps described and shown; after understanding the spirit of the invention, those skilled in the art may make various changes, modifications, and additions, or change the order of the steps.
The present invention may be embodied in other specific forms without departing from its spirit and essential characteristics. For example, the algorithms described in particular embodiments may be modified while the system architecture remains within the essential spirit of the invention. The present embodiments are therefore to be regarded in all respects as illustrative rather than restrictive. The scope of the invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are included within the scope of the invention.

Claims (13)

1. A method for assessing voice quality, characterized by comprising:
acquiring original voice data according to preselected voice content;
performing call processing on the original voice data to obtain voice data to be assessed;
converting the voice data to be assessed into speech text to be assessed;
splitting the preselected voice content into M keywords, and searching the speech text to be assessed with each of the M keywords to obtain, for each keyword, the number of occurrences not correctly restored, M being a positive integer;
adding the per-keyword numbers of occurrences not correctly restored to obtain a first total quantity of keywords not correctly restored;
calculating, according to the duration of the original voice data, a second total quantity of keywords corresponding to the original voice data; and
assessing the content integrity of the voice data to be assessed according to the first total quantity and the second total quantity.
2. The assessment method according to claim 1, characterized in that assessing the content integrity of the voice data to be assessed according to the first total quantity and the second total quantity comprises:
calculating the ratio of the first total quantity to the second total quantity; and
assessing the content integrity of the voice data to be assessed according to the ratio.
3. The assessment method according to claim 1, characterized in that the preselected voice content is configured to cover the tones with a high frequency of use in any specified language and/or the basic pronunciations that constitute the specified language.
4. The assessment method according to claim 3, characterized in that the preselected voice content is further configured such that the M keywords into which it is split satisfy at least one of the following conditions: the keywords differ semantically, the keywords do not repeat, no keyword contains or is contained in another keyword, and the keywords include no homophones.
5. The assessment method according to claim 1, characterized in that acquiring original voice data according to preselected voice content comprises:
acquiring original voice data of a male voice or a female voice according to the preselected voice content.
6. The assessment method according to claim 1, characterized in that calculating, according to the duration of the original voice data, the second total quantity of keywords corresponding to the original voice data comprises:
calculating, according to the duration of the original voice data, the number of times N that the preselected voice content needs to be repeated for one assessment, N being a positive integer; and
calculating the product of M and N as the second total quantity of keywords corresponding to the original voice data.
7. The assessment method according to claim 6, characterized in that calculating, according to the duration of the original voice data, the number of times N that the preselected voice content needs to be repeated for one assessment comprises:
obtaining the number of words contained in the preselected voice content;
calculating the product of the number of words contained in the preselected voice content and the speech rate of the voice call to obtain the duration needed to repeat the original voice content once; and
calculating the ratio of the duration of the original voice data to the duration needed to repeat the original voice content once, as the number of times N that the preselected voice content needs to be repeated for one assessment of the original voice data.
8. The assessment method according to claim 6, characterized in that a blank interval exists between adjacent repetitions of the preselected voice content in the original voice data.
9. The assessment method according to claim 1, characterized in that the duration of the original voice data is required to be greater than a duration threshold, the duration threshold being obtained based on the relative transmission speed of the call channel, the transmission frequency of the call channel, and the speech rate of the original voice data.
10. The assessment method according to claim 9, characterized in that the duration threshold of the original voice data is determined using the following formula:
T = 100 × α × max(c/νf, s)
where T is the duration threshold of the original voice data, α is a constant, c is the speed of light, ν is the relative transmission speed of the call channel, f is the transmission frequency of the call channel, and s is the speech rate of the original voice data.
11. A device for assessing voice quality, characterized by comprising:
an acquisition module, configured to acquire original voice data according to preselected voice content;
a processing module, configured to perform call processing on the original voice data to obtain voice data to be assessed;
a conversion module, configured to convert the voice data to be assessed into speech text to be assessed;
a retrieval module, configured to split the preselected voice content into M keywords and search the speech text to be assessed with each of the M keywords to obtain, for each keyword, the number of occurrences not correctly restored, M being a positive integer;
a first calculation module, configured to add the per-keyword numbers of occurrences not correctly restored to obtain a first total quantity of keywords not correctly restored;
a second calculation module, configured to calculate, according to the duration of the original voice data, a second total quantity of keywords corresponding to the original voice data; and
an evaluation module, configured to assess the content integrity of the voice data to be assessed according to the first total quantity and the second total quantity.
12. A device for assessing voice quality, comprising a memory, a processor, and a program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the voice quality assessment method according to any one of claims 1 to 10.
13. A computer-readable storage medium having a program stored thereon, characterized in that the program, when executed by a processor, implements the voice quality assessment method according to any one of claims 1 to 10.
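The calculation of the second total quantity recited in claims 6 and 7 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the function and parameter names are hypothetical, the speech rate is assumed to be expressed in seconds per word so that its product with the word count is a duration, the ratio is rounded down with a minimum of 1 so that N is a positive integer, and the blank interval of claim 8 is ignored.

```python
import math


def repetitions_needed(duration_s, word_count, seconds_per_word):
    """Number of times N the preselected content fits into the recording.

    Assumptions: speech rate is given in seconds per word, the ratio is
    rounded down, and N is never less than 1.
    """
    one_pass_s = word_count * seconds_per_word  # time to read the content once
    if one_pass_s <= 0:
        raise ValueError("word_count and seconds_per_word must be positive")
    return max(1, math.floor(duration_s / one_pass_s))


def second_total_quantity(m_keywords, duration_s, word_count, seconds_per_word):
    """Second total quantity of keywords: the product of M and N."""
    return m_keywords * repetitions_needed(duration_s, word_count, seconds_per_word)


# Hypothetical figures: a 60 s recording of 20-word content at 0.25 s per word,
# split into 8 keywords.
print(repetitions_needed(60, 20, 0.25))
print(second_total_quantity(8, 60, 20, 0.25))
```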
CN201710698522.7A 2017-08-15 2017-08-15 Method and device for evaluating voice quality and computer readable storage medium Active CN109410915B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710698522.7A CN109410915B (en) 2017-08-15 2017-08-15 Method and device for evaluating voice quality and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN109410915A true CN109410915A (en) 2019-03-01
CN109410915B CN109410915B (en) 2022-03-04

Family

ID=65454290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710698522.7A Active CN109410915B (en) 2017-08-15 2017-08-15 Method and device for evaluating voice quality and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109410915B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113593536A (en) * 2021-06-09 2021-11-02 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) Device and system for detecting voice recognition accuracy

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080151769A1 (en) * 2004-06-15 2008-06-26 Mohamed El-Hennawey Method and Apparatus for Non-Intrusive Single-Ended Voice Quality Assessment in Voip
CN101572964A (en) * 2009-06-15 2009-11-04 北京爱达特网络科技有限公司 Mobile communication voice evaluation system
CN201585137U (en) * 2009-06-15 2010-09-15 北京集耀网络科技有限公司 Mobile communication voice evaluation device
CN102970133A (en) * 2012-11-12 2013-03-13 安徽量子通信技术有限公司 Voice transmission method of quantum network and voice terminal
CN103365849A (en) * 2012-03-27 2013-10-23 富士通株式会社 Keyword search method and equipment
CN104464757A (en) * 2014-10-28 2015-03-25 科大讯飞股份有限公司 Voice evaluation method and device
CN105103221A (en) * 2013-03-05 2015-11-25 微软技术许可有限责任公司 Speech recognition assisted evaluation on text-to-speech pronunciation issue detection
CN106356053A (en) * 2016-08-09 2017-01-25 北京金山安全软件有限公司 Method and device for testing recognition accuracy of voice input method and electronic equipment
CN106933561A (en) * 2015-12-31 2017-07-07 北京搜狗科技发展有限公司 Pronunciation inputting method and terminal device
CN106937005A (en) * 2015-12-29 2017-07-07 成都鼎桥通信技术有限公司 Mobile terminal realizes the method and mobile terminal of speech quality evaluation

Also Published As

Publication number Publication date
CN109410915B (en) 2022-03-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant