WO2010128560A1 - Speech recognition device, method, and program - Google Patents

Info

Publication number
WO2010128560A1
Authority
WO
WIPO (PCT)
Prior art keywords
pass
reliability
availability determination
recognition
speech recognition
Prior art date
Application number
PCT/JP2009/058707
Other languages
English (en)
Japanese (ja)
Inventor
川添 佳洋
吉田 実
Original Assignee
パイオニア株式会社
Priority date
Filing date
Publication date
Application filed by パイオニア株式会社
Priority to JP2011512291A (JPWO2010128560A1)
Priority to PCT/JP2009/058707 (WO2010128560A1)
Publication of WO2010128560A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/32 Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems

Definitions

  • The present invention relates to a speech recognition technique using multi-pass search, in which speech recognition processing is executed a plurality of times.
  • Patent Document 1 discloses a speech recognition device that includes first pass processing means for performing recognition processing on continuous speech based on a simple acoustic model and a simple language model, and second pass processing means for generating a word string based on the recognition result of the first pass processing, a detailed acoustic model, and a detailed language model.
  • The first pass process can be executed in real time, almost in parallel with the input of the speech.
  • The second pass process, however, adds its processing time to the response time of the entire speech recognition process. That is, the result output is delayed by the processing time of the second pass process.
  • If the recognition result of the first pass process can be regarded as sufficiently reliable, the speech recognition apparatus does not necessarily have to execute the recognition processing from the second pass onward.
  • Patent Document 1 does not consider this problem at all.
  • The present invention has been made to solve the above-described problem, and its object is to provide a speech recognition apparatus capable of reducing the processing amount in multi-pass search and improving the processing speed up to the output of the result.
  • The speech recognition apparatus includes: an acoustic model storage unit that stores one or more acoustic models; a language model storage unit that stores one or more language models; first pass processing means for determining word string candidates and scores from the input speech signal based on the acoustic model and the language model; second pass execution availability determination means for determining whether or not the second pass processing should be executed based on the recognition result of the first pass processing means and/or recognition environment information; and second pass processing means for re-determining the candidates and the scores based on the acoustic model and the language model when the second pass execution availability determination means determines that the second pass processing should be executed.
  • The invention according to claim 11 is a speech recognition method using an acoustic model storage unit that stores one or more acoustic models and a language model storage unit that stores one or more language models, the method including: a first pass processing step of determining word string candidates and scores from the input speech signal based on the acoustic model and the language model; a second pass execution availability determination step of determining whether or not the second pass processing should be executed based on the recognition result of the first pass processing and/or recognition environment information; and a second pass processing step of re-determining the candidates and the scores based on the acoustic model and the language model when the second pass execution availability determination step determines that the second pass processing should be executed.
  • The invention according to claim 12 is a speech recognition program executed by a computer using an acoustic model storage unit that stores one or more acoustic models and a language model storage unit that stores one or more language models, the program comprising: first pass processing means for determining word string candidates and scores from an input speech signal based on the acoustic model and the language model; second pass execution availability determination means for determining whether or not the second pass processing should be executed based on the recognition result of the first pass processing means and/or recognition environment information; and second pass processing means for re-determining the candidates and the scores based on the acoustic model and the language model when the second pass execution availability determination means determines that the second pass processing should be executed.
  • A block diagram of the processing performed by the speech recognition apparatus is shown.
  • An example of a word graph showing the recognition result of the first pass matching processing unit 11c is shown.
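  • The speech recognition apparatus includes: an acoustic model storage unit that stores one or more acoustic models; a language model storage unit that stores one or more language models; first pass processing means for determining word string candidates and scores from the input speech signal based on the acoustic model and the language model; second pass execution availability determination means for determining whether or not the second pass processing should be executed based on the recognition result of the first pass processing means and/or recognition environment information; and second pass processing means for re-determining the candidates and the scores based on the acoustic model and the language model when the second pass execution availability determination means determines that the second pass processing should be executed.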
  • The speech recognition apparatus includes an acoustic model storage unit, a language model storage unit, a first pass processing unit, a second pass execution availability determination unit, and a second pass processing unit.
  • The first pass processing unit determines word string candidates and scores from the input speech signal based on the acoustic model and the language model.
  • The second pass execution availability determination unit determines whether or not the second pass process should be executed based on the recognition result of the first pass processing unit and/or the recognition environment information.
  • “Recognition environment information” refers to information about the environment in which the speech recognition apparatus executes recognition processing, and includes, for example, the S/N ratio, speech speed, input speech volume, and vehicle information.
  • The second pass processing unit re-determines the word string candidates and scores based on the acoustic model and the language model when the second pass execution availability determination unit determines that the second pass process should be executed.
  • By determining appropriately whether the second pass process should be executed based on the recognition result and/or the recognition environment information, the speech recognition apparatus can avoid unnecessarily executing the recognition processing from the second pass onward. The speech recognition apparatus can therefore reduce the processing amount and improve the processing speed up to the output of the result.
  • When the second pass execution availability determination unit determines that the second pass process should be executed, the second pass processing unit re-determines the candidates and the scores based on an acoustic model and a language model whose accuracy is equal to or higher than that of the acoustic model and the language model used by the first pass processing unit.
  • Here, “accuracy equal to or higher than that of the acoustic model and language model used by the first pass processing unit” covers not only an acoustic model and a language model with higher accuracy than those used by the first pass processing unit, but also the same acoustic model and language model as those used by the first pass processing unit.
  • The speech recognition apparatus reduces the processing amount of the first pass process and performs the second pass process only when necessary, thereby reducing the overall processing amount and improving the processing speed up to the output of the result.
  • The second pass execution availability determination unit calculates a reliability of the recognition result based on the recognition result and/or the recognition environment information, and determines that the second pass process should not be executed when the reliability is higher than a first threshold or lower than a second threshold.
  • The first threshold and the second threshold are set to appropriate values based on experiments or the like.
  • The speech recognition apparatus calculates the reliability of the recognition result of the first pass process, and determines whether the second pass process should be executed based on that reliability. When the reliability is greater than the first threshold, the speech recognition apparatus judges that the recognition result of the first pass process is sufficiently reliable and is likely to be correct.
  • When the reliability is smaller than the second threshold, the speech recognition apparatus judges that the recognition result of the first pass process has low reliability and that a correct word string is extremely unlikely to be obtained even if the second pass process is executed. In either case, the speech recognition apparatus determines that the second pass process should not be executed, and so avoids unnecessarily executing the recognition processing from the second pass onward. The speech recognition apparatus can thus determine appropriately, based on the reliability, whether or not to execute the second pass process, and can improve the processing speed up to the output of the result.
  • The second pass execution availability determination unit determines the reliability based on the number of candidates for each word constituting the word string determined by the first pass process.
  • The speech recognition apparatus can thereby set the reliability appropriately.
  • The second pass execution availability determination unit sets the reliability to a value higher than the first threshold when the number of keyword candidates determined by the first pass process is equal to or less than a first predetermined number, and sets the reliability to a value lower than the second threshold when the number of candidates is equal to or greater than a second predetermined number.
  • A keyword is a word in the word string that the speech recognition apparatus particularly needs to recognize.
  • The first predetermined number is set to a value less than the second predetermined number. Specifically, the first predetermined number and the second predetermined number are set to appropriate values based on experiments or the like.
  • When the number of keyword candidates is equal to or less than the first predetermined number, the speech recognition apparatus judges that the keywords are sufficiently narrowed down and that the recognition result of the first pass processing unit is highly reliable.
  • When the number of keyword candidates is equal to or greater than the second predetermined number, the speech recognition apparatus judges that the reliability of the recognition result of the first pass processing unit is low, for example because an unknown word was input.
  • By determining the reliability based on the number of keyword candidates, the speech recognition apparatus can appropriately determine whether or not the second pass process should be executed.
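The keyword-count rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name, the `margin` offset, and the midpoint returned for the in-between case are hypothetical, because the text only states on which side of each threshold the reliability must fall.

```python
def reliability_from_keyword_count(num_candidates, first_number, second_number,
                                   tth1, tth2, margin=0.1):
    """Map the number of keyword candidates left after the first pass to a reliability.

    first_number / second_number stand for the first and second predetermined
    numbers; tth1 / tth2 stand for the first and second thresholds.
    """
    if not first_number < second_number:
        raise ValueError("first_number must be less than second_number")
    if num_candidates <= first_number:
        # Keywords are narrowed down well: reliability above the first threshold.
        return tth1 + margin
    if num_candidates >= second_number:
        # Too many candidates (e.g. unknown word): reliability below the second threshold.
        return tth2 - margin
    # Assumed value for the in-between case: midway between the two thresholds.
    return (tth1 + tth2) / 2.0
```

With thresholds of 0.8 and 0.2 and predetermined numbers of 2 and 10 (all made-up values), one candidate yields a reliability above 0.8, twelve candidates yield a reliability below 0.2, and five candidates fall between the thresholds.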
  • The second pass execution availability determination unit sets the reliability higher than the first threshold when the S/N ratio is greater than a first predetermined value, and sets the reliability lower than the second threshold when the S/N ratio is smaller than a second predetermined value.
  • The first predetermined value is set to a value equal to or greater than the second predetermined value.
  • The first predetermined value and the second predetermined value are set to appropriate values based on experiments or the like. In general, the S/N ratio correlates with the correct answer rate of the recognition result. Therefore, when the S/N ratio is high, the recognition result of the first pass process alone is likely to be sufficient.
  • By setting the reliability based on the S/N ratio, the speech recognition apparatus can appropriately determine whether or not the second pass process should be executed.
  • The second pass execution availability determination unit may set the reliability higher than the first threshold when the difference between the best score and the second-best score among the scores is greater than a predetermined value. The predetermined value is set based on experiments or the like. In general, this score difference tends to be large when the recognition result is correct. Therefore, in this aspect, the speech recognition apparatus can appropriately determine whether or not the second pass process should be executed by setting the reliability based on the score difference.
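The score-difference check can be illustrated with a short Python sketch. It assumes that a larger score is better (for example, a log-likelihood); the function names and the example numbers are hypothetical and are not taken from the patent.

```python
def score_margin(scores):
    """Return the best score minus the second-best score (larger score = better)."""
    if len(scores) < 2:
        raise ValueError("at least two candidate scores are required")
    best, second = sorted(scores, reverse=True)[:2]
    return best - second


def first_pass_looks_reliable(scores, predetermined_value):
    """True when the best candidate leads the runner-up by more than the predetermined value."""
    return score_margin(scores) > predetermined_value
```

A wide margin suggests that the top candidate has no close competitor, which is the situation in which the text sets the reliability above the first threshold.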
  • The second pass execution availability determination unit determines the reliability based on at least one of speech speed, speech volume, and the presence or absence of sudden noise.
  • The correct answer rate of the recognition result depends strongly on the speech speed, the speech volume, and the presence or absence of sudden noise. Therefore, the speech recognition apparatus can appropriately determine whether or not the second pass process should be executed by setting the reliability in consideration of these elements.
  • The speech recognition apparatus is mounted on a vehicle, and the second pass execution availability determination unit determines the reliability based on information indicating a state of the vehicle.
  • Examples of the information indicating the state of the vehicle include the traveling speed based on a vehicle speed pulse, on/off information of the air conditioner, and information on whether or not a window is open.
  • The speech recognition apparatus can appropriately estimate the recognition environment based on the information indicating the state of the vehicle, and can set the reliability appropriately.
  • The speech recognition apparatus further includes subword recognition means that is executed in parallel with the first pass processing means and calculates a score by analyzing the speech signal in units of subwords, and the second pass execution availability determination unit determines the reliability based on the difference between the best score obtained by the subword recognition means and the best score obtained by the first pass processing means.
  • The speech recognition apparatus performs subword recognition in parallel with the first pass process and monitors the score difference to judge whether the recognition result of the first pass process is reliable. By setting the reliability appropriately in this way, the speech recognition apparatus can appropriately determine whether or not the second pass process should be executed.
  • The speech recognition method uses an acoustic model storage unit that stores one or more acoustic models and a language model storage unit that stores one or more language models, and includes: a first pass processing step of determining word string candidates and scores from the input speech signal based on the acoustic model and the language model; a second pass execution availability determination step of determining whether or not the second pass processing should be executed based on the recognition result of the first pass processing and/or recognition environment information; and a second pass processing step of re-determining the candidates and the scores based on the acoustic model and the language model when the second pass execution availability determination step determines that the second pass processing should be executed.
  • The speech recognition program is executed by a computer that uses an acoustic model storage unit that stores one or more acoustic models and a language model storage unit that stores one or more language models, and comprises: a first pass processing unit that determines word string candidates and scores from an input speech signal based on the acoustic model and the language model; a second pass execution availability determination unit that determines whether or not to execute the second pass process based on the recognition result of the first pass processing unit and/or the recognition environment information; and a second pass processing unit that re-determines the candidates and the scores based on the acoustic model and the language model when the second pass execution availability determination unit determines that the second pass process should be executed.
  • The speech recognition apparatus can appropriately determine whether or not the second pass process should be executed, and can suppress unnecessary execution of the recognition processing from the second pass onward.
  • The program is recorded on a storage medium.
  • FIG. 1 is a schematic configuration diagram of a speech recognition apparatus using a language model.
  • A speech recognition apparatus using a language model recognizes a user's utterance as a combination of words.
  • The process of recognizing an utterance as a combination of words and converting it into text is called “dictation”.
  • By recognizing an utterance as a combination of words, dictation makes it possible to recognize sentences other than those prepared in advance, that is, sentences formed by arbitrarily combining a plurality of words.
  • The speech recognition apparatus includes a dictation unit 10 that performs dictation and a keyword extraction unit 30.
  • The dictation unit 10 includes a first pass execution processing unit 11, a second pass execution availability determination unit 12, a second pass execution processing unit 13, a language model database 24 that stores language models (hereinafter, “database” is abbreviated as “DB”), and an acoustic model DB 25 that stores acoustic models.
  • The dictation unit 10 performs a multi-pass search, in which speech recognition processing is performed a plurality of times, on utterance data input through a microphone or the like (hereinafter referred to as “utterance data Sa”).
  • The utterance data Sa is an input signal that includes speech.
  • The utterance data Sa is the input signal recorded from the microphone during a predetermined time after the user presses the utterance button.
  • The acoustic model DB 25 is a database that stores the acoustic features of sounds in units of syllables and phonemes. The acoustic features of each word included in the utterance are compared with the features recorded in the acoustic model, and the result is calculated as an acoustic score.
  • The acoustic model DB 25 stores a high-accuracy acoustic model with a large model size (hereinafter referred to as “high-accuracy acoustic model Hsm”) and an acoustic model with lower accuracy and a smaller model size than the high-accuracy acoustic model Hsm (hereinafter referred to as “low-accuracy acoustic model Lsm”).
  • The acoustic model DB 25 corresponds to the acoustic model storage unit of the present invention.
  • The language model DB 24 is a database that stores the appearance probabilities of combinations of adjacent words.
  • A word N-gram model, which is one type of statistical language model, is used.
  • The language model DB 24 stores a high-accuracy language model (hereinafter referred to as “high-accuracy language model Hlm”) and a language model with lower accuracy than the high-accuracy language model Hlm (hereinafter referred to as “low-accuracy language model Llm”).
  • A language score is calculated using the language model.
  • The “language score” is a value indicating the appearance probability (appearance frequency) of a combination of adjacent words.
  • The language model DB 24 corresponds to the language model storage unit of the present invention.
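To make the notion of a language score concrete, the sketch below computes a log-probability score with a word bigram model (the N = 2 case of a word N-gram). The toy corpus, the `<s>` start marker, and add-one smoothing are assumptions chosen for illustration; the patent does not specify the order of the N-gram or any smoothing method.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left-context words over tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + list(words)
        contexts.update(tokens[:-1])                   # words that precede another word
        bigrams.update(zip(tokens[:-1], tokens[1:]))   # adjacent word pairs
    return contexts, bigrams


def language_score(words, contexts, bigrams, vocab_size):
    """Sum of log P(word | previous word), with add-one smoothing (assumed)."""
    tokens = ["<s>"] + list(words)
    score = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        score += math.log(prob)
    return score
```

A word sequence whose adjacent pairs were seen more often in the training text receives a higher (less negative) score than one built from rare pairs.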
  • The first pass execution processing unit 11 outputs, as a recognition result, word string candidates and their corresponding scores (total scores) based on the low-accuracy acoustic model Lsm and the low-accuracy language model Llm. The total score will be described later.
  • The first pass execution processing unit 11 performs its processing in parallel with the input of the utterance data Sa, and outputs a recognition result as soon as the input of the utterance data Sa ends. Details of the first pass execution processing unit 11 are described later with reference to the drawings.
  • The second pass execution availability determination unit 12 determines whether to continue the recognition process. This process is described in detail in the “executability determination process” below.
  • When the second pass execution availability determination unit 12 determines that the recognition process should be continued, it supplies the recognition result of the first pass execution processing unit 11 to the second pass execution processing unit 13.
  • When the second pass execution availability determination unit 12 determines that the recognition process should not be continued, it supplies the recognition result of the first pass execution processing unit 11 to the keyword extraction unit 30.
  • The second pass execution processing unit 13 recalculates the total score of the word string candidates obtained by the first pass execution processing unit 11 based on the high-accuracy acoustic model Hsm and the high-accuracy language model Hlm. As described above, the second pass execution processing unit 13 performs the recognition process only when the second pass execution availability determination unit 12 determines that the recognition process should be continued. The processing of the second pass execution processing unit 13 is executed after the input of the utterance data Sa has ended.
  • The keyword extraction unit 30 extracts predetermined keywords from the word string with the highest total score obtained as the recognition result. Keywords are determined in advance, and information for distinguishing keywords from non-keywords is stored in the dictionary. For example, among the many words stored in the dictionary, a keyword flag is attached to each word that is a keyword. Preferably, the operation commands of the device to which the speech recognition method of the present invention is applied are set as keywords.
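The keyword extraction step can be sketched as a dictionary lookup on the best-scoring word string. The data layout (a list of `(words, total_score)` pairs and a dictionary mapping each word to a boolean keyword flag) and the example vocabulary are assumptions made for this illustration.

```python
def extract_keywords(candidates, dictionary):
    """Return the flagged keywords of the candidate with the highest total score.

    candidates: list of (word_list, total_score) pairs, larger score = better.
    dictionary: maps each word to True when its keyword flag is set.
    """
    best_words, _ = max(candidates, key=lambda item: item[1])
    return [word for word in best_words if dictionary.get(word, False)]
```

Words that are missing from the dictionary are treated as non-keywords here, which is a design choice of the sketch rather than something the text specifies.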
  • FIG. 2 shows a block diagram of the processing executed by the speech recognition apparatus.
  • The first pass execution processing unit 11 includes a speech segment cutout unit 11a, a feature parameter calculation unit 11b, and a first pass matching processing unit 11c.
  • The second pass execution processing unit 13 includes a second pass matching processing unit 13a.
  • The recognition result output processing unit 31 corresponds to the keyword extraction unit 30 described above.
  • The speech segment cutout unit 11a detects a speech segment in the utterance data Sa and outputs the speech data in that segment. That is, “speech data” refers to data obtained by cutting out only the section corresponding to speech from the utterance data Sa.
  • The feature parameter calculation unit 11b divides the speech data cut out by the speech segment cutout unit 11a into units of time, calculates a feature parameter for each unit, and supplies it to the first pass matching processing unit 11c.
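The division into unit-time frames can be sketched as follows. The patent does not say which feature parameters are used, so log energy is only a stand-in here; the function names, the non-overlapping framing, and the small constant added before the logarithm are likewise assumptions.

```python
import math


def split_into_frames(samples, frame_len):
    """Split speech samples into consecutive, non-overlapping frames of frame_len samples."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    last_start = len(samples) - frame_len
    # A trailing remainder shorter than frame_len is dropped in this sketch.
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def log_energy(frame):
    """Stand-in feature parameter: log of the summed squared amplitude of one frame."""
    energy = sum(sample * sample for sample in frame)
    return math.log(energy + 1e-10)  # small constant avoids log(0) on silence


def feature_parameters(samples, frame_len):
    """One feature value per unit-time frame, in chronological order."""
    return [log_energy(frame) for frame in split_into_frames(samples, frame_len)]
```

A practical recognizer would normally use overlapping frames and a richer feature vector per frame; the sketch only shows the per-unit-time structure that the matching process consumes.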
  • The first pass matching processing unit 11c performs a matching process that outputs a recognition result by applying the feature parameters obtained for each unit of time to the low-accuracy language model Llm and the low-accuracy acoustic model Lsm.
  • The first pass matching processing unit 11c searches, in chronological order from the beginning of the speech data, for the combination of words registered in a dictionary DB (not shown) or the like that best matches the speech data. This search produces a plurality of word string candidates (hereinafter also referred to as “candidate patterns”).
  • During the search, a pruning process is performed so that combinations with low scores are excluded from further matching.
  • The first pass matching processing unit 11c calculates an acoustic score and a language score for each of the candidate patterns to obtain a total score.
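The total score and the pruning step can be illustrated as below. The text does not state how the acoustic and language scores are combined or which pruning criterion is used; the weighted sum of log-domain scores and the beam-width rule are common choices assumed here, and all names are hypothetical.

```python
def total_score(acoustic_score, language_score, lm_weight=1.0):
    """Combine log-domain scores as a weighted sum (assumed combination rule)."""
    return acoustic_score + lm_weight * language_score


def prune(hypotheses, beam_width):
    """Keep only hypotheses whose total score is within beam_width of the best one.

    hypotheses: dict mapping a word tuple to its total score (larger = better).
    """
    best = max(hypotheses.values())
    return {words: score for words, score in hypotheses.items()
            if score >= best - beam_width}
```

Hypotheses removed by `prune` are not extended further, which is what keeps the first pass fast enough to run alongside the speech input.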
  • FIG. 3 shows an example of a word graph representing the recognition result of the first pass matching processing unit 11c.
  • FIG. 3 shows, as a word graph, the recognition result for utterance data Sa input when operating a navigation device.
  • In the word graph, the horizontal axis represents time and the white circles represent nodes.
  • FIG. 3A shows a case where there are few candidate patterns, that is, a case where few arrows terminate at each node.
  • FIG. 3B shows a case where there are many candidate patterns, that is, a case where many arrows terminate at each node.
  • The first pass matching processing unit 11c applies the feature parameters obtained for each unit time width (for each frame) to the low-accuracy language model Llm and the low-accuracy acoustic model Lsm, and generates candidate patterns such as those shown in FIG. 3A or FIG. 3B. The candidate patterns generated by the first pass matching processing unit 11c are then supplied to the second pass execution availability determination unit 12.
  • The second pass execution availability determination unit 12 determines whether the second pass execution processing unit 13 should perform the recognition process, based on the recognition result of the first pass execution processing unit 11 and the recognition environment information Ri.
  • When the second pass execution availability determination unit 12 determines that the processing by the second pass execution processing unit 13 is unnecessary, it supplies the recognition result of the first pass execution processing unit 11 to the recognition result output processing unit 31.
  • When the second pass execution availability determination unit 12 determines that the processing by the second pass execution processing unit 13 is necessary, it supplies the recognition result of the first pass execution processing unit 11 to the second pass matching processing unit 13a.
  • When the second pass execution availability determination unit 12 determines that the recognition process should be continued, the second pass matching processing unit 13a recalculates the total score of the candidate patterns obtained by the first pass execution processing unit 11, or of those candidate patterns with the highest total scores, using the high-accuracy acoustic model Hsm and the high-accuracy language model Hlm. The second pass matching processing unit 13a then determines the word string to be output as the final result based on the recalculated total scores, and supplies the recognition result to the recognition result output processing unit 31.
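The second pass rescoring can be sketched as re-ranking the first-pass candidates with a more accurate scorer. The `rescore` callable stands in for scoring with the high-accuracy models Hsm and Hlm, and the `top_n` parameter models the option of rescoring only the highest-ranked candidates; both are assumptions of this illustration.

```python
def second_pass_rescore(candidates, rescore, top_n=None):
    """Re-rank first-pass candidates with a high-accuracy scoring function.

    candidates: list of (word_list, first_pass_total_score), larger = better.
    rescore:    callable returning the recalculated total score for a word list.
    top_n:      if given, only the top_n first-pass candidates are rescored.
    """
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    rescored = [(words, rescore(words)) for words, _ in ranked]
    # The final result is the word string with the best recalculated score.
    return max(rescored, key=lambda item: item[1])
```

Because only candidates that survived the first pass are rescored, the expensive models are applied to a small search space rather than to the whole utterance from scratch.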
  • The recognition result output processing unit 31 outputs a predetermined image or sound through an output device such as a display or a speaker, based on the supplied recognition result.
  • Based on the recognition result of the first pass execution processing unit 11 and the recognition environment information Ri, the second pass execution availability determination unit 12 calculates a degree indicating how reliable the recognition result of the first pass execution processing unit 11 is (hereinafter referred to as “reliability T”). The second pass execution availability determination unit 12 then determines that the processing of the second pass execution processing unit 13 is unnecessary when the reliability T is greater than a predetermined threshold (hereinafter referred to as “first threshold Tth1”) or smaller than a predetermined threshold (hereinafter referred to as “second threshold Tth2”). By doing so, the speech recognition apparatus reduces the processing amount and improves the response.
  • FIG. 4 is a diagram illustrating the processing executed by the speech recognition apparatus based on the reliability T.
  • “First pass processing” indicates the processing executed by the first pass execution processing unit 11.
  • “Second pass processing” indicates the processing executed by the second pass execution processing unit 13.
  • The first threshold Tth1 and the second threshold Tth2 shown in FIG. 4 are set to appropriate values through experiments or the like. The method for calculating the reliability T is described in detail separately.
  • When the reliability T is smaller than the second threshold Tth2, the speech recognition apparatus executes only the first pass process. That is, because the reliability T is smaller than the second threshold Tth2, the second pass execution availability determination unit 12 judges that a correct recognition result cannot be obtained even if the second pass process is executed. In this case, the second pass execution availability determination unit 12 therefore determines that the second pass process should not be executed.
  • The second threshold Tth2 is set to the lower limit of the reliability T at which executing the second pass process may still yield a correct recognition result. When the reliability T is smaller than the second threshold Tth2, the speech recognition apparatus can thus reduce wasteful processing and improve the response by executing only the first pass process.
  • When the reliability T is greater than the first threshold Tth1, the speech recognition apparatus also executes only the first pass process. That is, because the reliability T is greater than the first threshold Tth1, the second pass execution availability determination unit 12 judges that the recognition result of the first pass process is likely to be correct. In this case, the second pass execution availability determination unit 12 therefore determines that the second pass process should not be executed.
  • The first threshold Tth1 is set to the upper limit of the reliability T at which executing the second pass process may yield a more accurate recognition result than the first pass process.
  • When the reliability T is greater than the first threshold Tth1, the speech recognition apparatus can thus reduce unnecessary processing and improve the response by executing only the first pass process.
  • When the reliability T is equal to or higher than the second threshold Tth2 and equal to or lower than the first threshold Tth1, the speech recognition apparatus executes the second pass process in addition to the first pass process. That is, the second pass execution availability determination unit 12 judges that a more accurate recognition result can be obtained by executing the second pass process on the recognition result of the first pass process, and therefore determines that the second pass process should be executed. By executing the second pass process in this range of the reliability T, the speech recognition apparatus can obtain a more accurate recognition result.
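The three reliability ranges described above reduce to a single comparison, sketched below. The function and parameter names are hypothetical; `second_pass` stands for whatever callable performs the second pass on the first-pass result.

```python
def should_run_second_pass(t, tth1, tth2):
    """True only when Tth2 <= T <= Tth1, the range in which a second pass can help."""
    if tth2 > tth1:
        raise ValueError("Tth2 must not exceed Tth1")
    return tth2 <= t <= tth1


def recognize(first_pass_result, t, tth1, tth2, second_pass):
    """Return the first-pass result directly unless the reliability T calls for a second pass."""
    if should_run_second_pass(t, tth1, tth2):
        return second_pass(first_pass_result)
    # T > Tth1: the first pass is trusted. T < Tth2: a second pass is judged futile.
    return first_pass_result
```

Both skip cases return the first-pass result unchanged, so the response time in those cases is bounded by the first pass alone.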
  • the reliability T is determined based on the recognition result of the first pass execution processing unit 11 and the recognition environment information Ri.
  • the second pass execution availability determination unit 12 uses the difference in total score between candidate patterns and/or the number of keyword candidates as the recognition result of the first pass execution processing unit 11.
  • the second pass execution availability determination unit 12 uses, as the recognition environment information Ri, acoustic information such as the SN ratio and other external information (hereinafter simply referred to as “external information”) from which the acquisition environment of the utterance data Sa can be estimated.
  • the external information corresponds to information on on / off of the air conditioner and information on the traveling speed transmitted from the vehicle.
  • the second pass execution availability determination unit 12 acquires acoustic information from a voice input device such as a microphone, and acquires external information from the device on which the speech recognition apparatus is mounted or from a device that is electrically connected to that device. Then, the second pass execution availability determination unit 12 calculates the reliability T from these pieces of information based on a predetermined formula or map.
  • the above formula or map is appropriately created by experiments or the like and stored in the memory of the speech recognition apparatus. By doing in this way, the speech recognition apparatus can determine whether or not the second pass process should be executed based on the reliability T set appropriately.
  • the second pass execution availability determination unit 12 calculates the reliability T based on the item having the highest priority according to, for example, a predetermined priority between the items. Alternatively, the reliability T may be calculated in consideration of each item by performing predetermined weighting or the like. In addition, the second pass execution availability determination unit 12 calculates the reliability T based on the above-described predetermined map or expression.
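The two ways of combining items described above (highest priority item only, or predetermined weighting) can be sketched as follows. This is an illustration under stated assumptions: each item is assumed to have already been converted to a per-item reliability value, and the function names, item names, and the weighted-average form are hypothetical, since the text only says that the formula or map is created by experiments.

```python
from typing import Dict, List


def reliability_by_priority(item_scores: Dict[str, float], priority: List[str]) -> float:
    """Return the reliability of the highest-priority item that is available."""
    for name in priority:
        if name in item_scores:
            return item_scores[name]
    raise ValueError("no prioritized item is available")


def reliability_by_weighting(item_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of the per-item reliabilities.

    Every item in item_scores must have an entry in weights.
    """
    total_weight = sum(weights[name] for name in item_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(item_scores[name] * weights[name] for name in item_scores)
    return weighted_sum / total_weight
```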
  • the second pass execution availability determination unit 12 sets the reliability T high when the SN ratio is large. For example, when the SN ratio is greater than or equal to a predetermined value, the second pass execution availability determination unit 12 determines that only the first pass process is executed and the second pass process need not be executed.
  • the predetermined value is set to an appropriate value based on experiments or the like.
  • the voice recognition rate varies depending on the SN ratio. Therefore, when the SN ratio is larger than the predetermined value, it is estimated that the recognition rate (correct rate) by the first pass process is high.
  • the second pass execution availability determination unit 12 determines that it is not necessary to execute the second pass process when the SN ratio is equal to or greater than a predetermined value. Thereby, the second pass execution availability determination unit 12 can reduce an unnecessary processing amount.
  • the speech recognition apparatus identifies a candidate pattern having the maximum total score as a word string to be output based only on the recognition result of the first pass process.
  • the second pass execution availability determination unit 12 sets the reliability T higher as the difference in total score between the candidate pattern having the maximum total score and the candidate pattern having the second largest total score, among the candidate patterns obtained by the first pass process, increases.
  • the second pass execution availability determination unit 12 sets the reliability T higher than the first threshold Tth1 when the above-described difference value is equal to or greater than a predetermined value.
  • the predetermined value is set to an appropriate value based on experiments or the like.
  • the second pass execution availability determination unit 12 sets the reliability T higher than the first threshold value Tth1 when the difference value of the total score between the candidate pattern having the maximum total score and the second largest candidate pattern is equal to or larger than a predetermined value. Thereby, the second pass execution availability determination unit 12 determines that it is not necessary to execute the second pass process, and can reduce an unnecessary processing amount.
  • the speech recognition apparatus specifies a candidate pattern having the maximum total score as a word string to be output based only on the recognition result of the first pass process.
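The score-difference criterion can be sketched as below. This is a minimal illustration that assumes the total scores are available as plain numbers where larger means better; the function names and the `min_margin` parameter (standing in for the experimentally chosen predetermined value) are hypothetical.

```python
from typing import Sequence


def top_two_margin(total_scores: Sequence[float]) -> float:
    """Return the difference between the best and second-best total scores."""
    if len(total_scores) < 2:
        raise ValueError("at least two candidate patterns are required")
    best, second = sorted(total_scores, reverse=True)[:2]
    return best - second


def first_pass_is_decisive(total_scores: Sequence[float], min_margin: float) -> bool:
    """True when the margin reaches the predetermined value.

    In that case the reliability T would be set above Tth1, so the
    second pass process is skipped.
    """
    return top_two_margin(total_scores) >= min_margin
```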
  • the second pass execution availability determination unit 12 sets the reliability T higher as the number of word candidates corresponding to the keyword obtained by the first pass process is smaller. For example, the second pass execution availability determination unit 12 sets the reliability T higher than the first threshold Tth1 when the number of such candidates is equal to or less than a predetermined value (for example, 1).
  • the predetermined value is set to an appropriate value based on experiments or the like.
  • the second pass execution availability determination unit 12 sets the reliability T higher than the first threshold Tth1.
  • a plurality of candidates such as “100 meter scale”, “200 meter scale”, and “500 meter scale” are recognized as candidates corresponding to the keyword. That is, a large number of candidates are recognized by the first pass process.
  • the second pass execution availability determination unit 12 sets the reliability T lower than the first threshold Tth1.
  • the second pass execution availability determination unit 12 sets the reliability T based on the acoustic information as described above. For example, when the second pass execution availability determination unit 12 determines that the recognition rate is very likely to be extremely low due to acoustic factors, it sets the reliability T to a value lower than the second threshold Tth2.
  • the following are major examples of acoustic factors.
  • the second pass execution availability determination unit 12 sets the reliability T to a lower value as the SN ratio is lower. For example, when the SN ratio is lower than a predetermined value, the second pass execution availability determination unit 12 determines that the reliability of the recognition result of the first pass process is extremely low and sets the reliability T to a value lower than the second threshold Tth2.
  • the predetermined value is set to an appropriate value based on experiments or the like. That is, in this case, the second pass execution availability determination unit 12 determines that the recognition result cannot be improved even if the second pass process is executed, and determines that the second pass process is not executed. Thereby, the speech recognition apparatus can reduce a wasteful processing amount.
  • the second pass execution availability determination unit 12 sets the reliability T to a lower value as the difference between the detected utterance speed and the assumed utterance speed increases. For example, when the utterance speed is faster than a predetermined speed (referred to as “first predetermined speed”) or slower than another predetermined speed (referred to as “second predetermined speed”), the second pass execution availability determination unit 12 sets the reliability T to a value lower than the second threshold value Tth2.
  • the first and second predetermined speeds are set to appropriate values based on experiments and the like.
  • the second pass execution availability determination unit 12 determines that the utterance speed differs from the speech speed assumed in the acoustic models Lsm and Hsm, and that the possibility of obtaining a correct recognition result is extremely low. Therefore, in this case, the speech recognition apparatus can reduce a wasteful processing amount by setting the reliability T to a value lower than the second threshold value Tth2.
  • the second pass execution availability determination unit 12 calculates the utterance speed, for example, by dividing the input time width of the utterance data Sa by the recognized number of characters.
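The utterance-speed check can be sketched as follows. The measure is computed exactly as the text describes (input time width divided by the number of recognized characters), which gives time per character, so a smaller value means faster speech. The function names and the two bounds `fastest_spc` and `slowest_spc`, which stand in for the first and second predetermined speeds, are assumptions for illustration.

```python
def seconds_per_character(duration_s: float, num_chars: int) -> float:
    """Input time width of the utterance divided by the recognized character count."""
    if num_chars <= 0:
        raise ValueError("num_chars must be positive")
    return duration_s / num_chars


def speech_rate_out_of_range(duration_s: float, num_chars: int,
                             fastest_spc: float, slowest_spc: float) -> bool:
    """True when the utterance is faster or slower than the assumed range.

    fastest_spc and slowest_spc are hypothetical bounds in seconds per
    character. When this returns True, the reliability T would be set
    below Tth2.
    """
    spc = seconds_per_character(duration_s, num_chars)
    return spc < fastest_spc or spc > slowest_spc
```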
  • the second pass execution availability determination unit 12 sets the reliability T to a lower value as the difference between the loudness of the input voice (that is, the signal level of the input voice data) and the assumed loudness of voice increases.
  • the second pass execution availability determination unit 12 sets the reliability T to a value lower than the second threshold Tth2 when the sound volume is greater than a predetermined value (referred to as “first predetermined value”) or smaller than another predetermined value (referred to as “second predetermined value”).
  • the first and second predetermined values are set to appropriate values based on experiments or the like.
  • the second pass execution availability determination unit 12 determines that the sound volume differs from the sound volume assumed in each model and that a correct recognition result is very unlikely to be obtained even if the second pass process is executed. Thereby, the speech recognition apparatus can set the reliability T appropriately, and can reduce a wasteful processing amount.
  • the second pass execution availability determination unit 12 sets the reliability T to a low value. For example, when the utterance data Sa includes non-stationary noise, the second pass execution availability determination unit 12 sets the reliability T to a value lower than the second threshold Tth2. As another example, when the utterance data Sa includes a predetermined number or more of non-stationary noises, the second pass execution availability determination unit 12 sets the reliability T to a value lower than the second threshold Tth2.
  • the predetermined number described above is set to an appropriate value based on experiments or the like. Also by this, the speech recognition apparatus can appropriately set the reliability T and reduce the amount of useless processing.
  • the second pass execution availability determination unit 12 sets the reliability T to a lower value as the number of word candidates corresponding to the keyword obtained by the first pass process increases. For example, the second pass execution availability determination unit 12 sets the reliability T lower than the second threshold Tth2 when the number of such candidates is equal to or greater than a predetermined value.
  • the predetermined value is set to an appropriate value based on experiments or the like. Generally, when an unknown word that is not registered in the dictionary DB or the like is input, the number of word candidates tends to increase. Therefore, when the number of word candidates corresponding to the keyword is greater than or equal to the predetermined value, the second pass execution availability determination unit 12 sets the reliability T lower than the second threshold Tth2 and can thereby reduce an unnecessary processing amount.
  • the second pass execution availability determination unit 12 sets the reliability T to a value lower than the first threshold value Tth1.
  • the second pass execution availability determination unit 12 sets the reliability T to a low value when it is estimated that there is a low possibility of obtaining a correct recognition result based on the acquired external information. In this way as well, the second pass execution availability determination unit 12 can set the reliability T appropriately.
  • the second pass execution availability determination unit 12 acquires, from the vehicle, information on whether or not the air conditioner is activated, the traveling speed of the vehicle, and the opening and closing of windows provided in the vehicle. Then, the second pass execution availability determination unit 12 determines the reliability T based on these pieces of information. For example, the second pass execution availability determination unit 12 sets the reliability T to a value lower than the second threshold Tth2 when the air conditioner is operating and/or when the traveling speed is high and the window is open.
  • the second pass execution availability determination unit 12 can appropriately set the reliability T based on the external information.
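The vehicle example can be sketched as a simple predicate over the external information. This is an illustration only: the `ExternalInfo` record, its field names, the km/h unit, and the `speed_limit_kmh` parameter are hypothetical, and the condition is read as "air conditioner operating, or high speed with an open window".

```python
from dataclasses import dataclass


@dataclass
class ExternalInfo:
    """External information reported by the vehicle (field names are assumptions)."""
    air_conditioner_on: bool
    speed_kmh: float
    window_open: bool


def environment_is_noisy(info: ExternalInfo, speed_limit_kmh: float) -> bool:
    """True when the external information suggests a poor recognition environment.

    When this returns True, the reliability T would be set below Tth2 so
    that the second pass process is not executed.
    """
    fast_with_open_window = info.speed_kmh >= speed_limit_kmh and info.window_open
    return info.air_conditioner_on or fast_with_open_window
```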
  • the speech recognition apparatus includes: an acoustic model storage unit that stores one or more acoustic models; a language model storage unit that stores one or more language models; a first pass execution processing unit that determines word string candidates and a total score from the input speech signal based on the low-accuracy acoustic model and the low-accuracy language model; and a second pass execution availability determination unit that determines whether the second pass process should be executed based on the recognition result of the first pass execution processing unit and/or the recognition environment information.
  • the speech recognition apparatus can suppress unnecessary execution of the second pass process by appropriately determining whether the second pass process should be executed based on the recognition result and/or the recognition environment information. Therefore, the speech recognition apparatus can reduce the processing amount and improve the processing speed until the result is output.
  • FIG. 5 is an example of a flowchart showing a procedure of processing executed by the speech recognition apparatus in the present embodiment.
  • the speech recognition apparatus repeatedly executes the process of the flowchart shown in FIG. 5 each time the utterance data Sa is input.
  • the speech recognition apparatus executes a first pass process (step S101). Specifically, the voice segment cutout unit 11a cuts out voice data from the utterance data Sa. Then, the feature parameter calculation unit 11b divides the voice data cut out by the voice segment cutout unit 11a for each unit time, and calculates the feature parameter in each. Then, the first path matching processing unit 11c outputs the candidate pattern and the total score by applying the feature parameters obtained every unit time to the low-accuracy language model Llm and the low-accuracy acoustic model Lsm.
  • the speech recognition apparatus determines whether or not to execute the second pass process (step S102). Specifically, the second pass execution availability determination unit 12 calculates the reliability T based on the recognition result by the first pass process and the recognition environment information Ri. Then, the second pass execution availability determination unit 12 determines whether or not the second pass process should be executed based on the reliability T.
  • when it is determined that the second pass process should be executed (step S102; Yes), the speech recognition apparatus executes the second pass process (step S103).
  • the second path matching processing unit 13a recalculates the total score, using the high-accuracy acoustic model Hsm and the high-accuracy language model Hlm, for the candidate patterns obtained by the first path execution processing unit 11 or for the candidate pattern having the highest total score.
  • when it is determined in step S102 that the second pass process should not be executed (step S102; No), that is, when the reliability T is smaller than the second threshold Tth2 or larger than the first threshold Tth1, the speech recognition apparatus advances the process to step S104. Thereby, the speech recognition apparatus can reduce unnecessary processing and improve the response.
  • the speech recognition apparatus outputs a recognition result (step S104). That is, the speech recognition apparatus outputs the word string obtained as a recognition result as synthesized speech or shows it on a display. In addition, the speech recognition apparatus extracts keywords from the word string obtained as a recognition result as necessary.
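The flow of steps S101 to S104 can be sketched as a small driver function. This is a hedged outline rather than the apparatus's implementation: the recognizers and the reliability estimator are passed in as callables, and the names `recognize`, `first_pass`, `estimate_reliability`, and `second_pass` as well as the `(word string, total score)` tuple representation are assumptions.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]  # (word string, total score); larger score is better


def recognize(utterance: bytes,
              first_pass: Callable[[bytes], List[Candidate]],
              estimate_reliability: Callable[[List[Candidate]], float],
              second_pass: Callable[[bytes, List[Candidate]], List[Candidate]],
              tth1: float, tth2: float) -> str:
    """Run the first pass, decide on the second pass, and return the best word string."""
    candidates = first_pass(utterance)                    # step S101
    reliability = estimate_reliability(candidates)        # step S102
    if tth2 <= reliability <= tth1:                       # step S102; Yes
        candidates = second_pass(utterance, candidates)   # step S103
    if not candidates:
        raise ValueError("no candidate pattern was produced")
    # Step S104: output the candidate pattern with the maximum total score.
    return max(candidates, key=lambda c: c[1])[0]
```

Keeping the two passes behind callables keeps the gating logic independent of any particular acoustic or language model.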
  • the second pass execution availability determination unit 12 sets the reliability T based on the recognition result of the first pass process and the recognition environment information Ri.
  • the method to which the present invention is applicable is not limited to this.
  • the second pass execution availability determination unit 12 may perform subword recognition in units of syllables or phonemes in parallel with the first pass process, and may set the reliability T based on the difference between the score obtained by the subword recognition and the score obtained by the first pass process (hereinafter simply referred to as “score difference”). As a result, the second pass execution availability determination unit 12 can more appropriately determine whether or not the second pass process should be executed.
  • FIG. 6 is an example of a block diagram of the speech recognition apparatus according to the first modification.
  • the speech recognition apparatus includes a subword recognition processing unit 41 and an acoustic model DB 42.
  • the subword recognition processing unit 41 analyzes the speech data included in the utterance data Sa in units of subwords based on the acoustic model stored in the acoustic model DB 42, and evaluates the entire speech data. Then, the subword recognition processing unit 41 calculates a predetermined score. The subword recognition processing unit 41 supplies the recognition result to the second pass execution availability determination unit 12.
  • the acoustic model DB 42 stores an acoustic model for executing subword recognition.
  • An acoustic model such as a filler model is an example of this type of acoustic model.
  • the second pass execution availability determination unit 12 calculates a score difference between the best score obtained by the subword recognition processing unit 41 and the best total score obtained by the first pass process. Then, the second pass execution availability determination unit 12 sets the reliability T based on the score difference.
  • the second pass execution availability determination unit 12 sets the reliability T to a value lower than the second threshold Tth2.
  • the above threshold value is set to an appropriate value based on experiments or the like. That is, in this case, the second pass execution availability determination unit 12 determines that the reliability of the recognition result obtained by the first pass process is low and that a correct result is unlikely to be obtained even if the second pass process is executed, and therefore does not execute the second pass process.
  • the second pass execution availability determination unit 12 sets the reliability T to a value higher than the first threshold Tth1. That is, in this case, the second pass execution availability determination unit 12 determines that the recognition result obtained by the first pass process is highly reliable, and does not execute the second pass process.
  • the second pass execution availability determination unit 12 can reduce unnecessary processing by performing subword recognition in parallel with the first pass process and setting the reliability T based on the score difference between the subword recognition score and the total score of the first pass process.
  • the subword recognition processing unit 41 uses an acoustic model different from the acoustic model used by the dictation unit 10, but it may instead use the same acoustic model as the one used by the dictation unit 10. Thereby, the speech recognition apparatus can reduce the amount of memory used.
  • the speech recognition apparatus executes the two-pass search method using the first pass execution processing unit 11 and the second pass execution processing unit 13. That is, the speech recognition apparatus executes the recognition process twice.
  • the method to which the present invention is applicable is not limited to this. Instead, the speech recognition apparatus may execute the recognition process three times or more.
  • the speech recognition apparatus is provided, between the respective recognition processing units after the second pass execution processing unit 13, with a determination unit that determines whether or not to execute the next recognition process based on the input recognition result and the recognition environment information Ri.
  • each recognition processing unit uses, for example, a language model and an acoustic model whose accuracy is higher the later the recognition processing unit is in the sequence.
  • the language model DB 24 and the acoustic model DB 25 include language models or acoustic models having different accuracy depending on the number of recognition processes, for example.
  • the speech recognition apparatus executes the subsequent recognition process only when the determination unit determines that the next recognition process should be executed.
  • the speech recognition apparatus can reduce unnecessary processes and improve the response by applying the present invention even if the recognition process is performed three times or more.
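The extension to three or more recognition processes can be sketched as a cascade in which a determination step sits between consecutive stages. The function name `run_cascade` and the callable-based stage and gate interfaces are assumptions for illustration; each gate stands in for a determination unit that inspects the preceding recognition result.

```python
from typing import Callable, List, Optional, Tuple

Candidates = List[Tuple[str, float]]  # (word string, total score)
Stage = Callable[[bytes, Optional[Candidates]], Candidates]
Gate = Callable[[Candidates], bool]


def run_cascade(utterance: bytes, stages: List[Stage],
                gates: List[Gate]) -> Tuple[Candidates, int]:
    """Run recognition stages in order, stopping at the first gate that says no.

    gates[i] decides whether stages[i + 1] should run. Returns the final
    candidates and the number of stages actually executed.
    """
    if not stages or len(gates) != len(stages) - 1:
        raise ValueError("need one gate between each pair of stages")
    candidates = stages[0](utterance, None)
    executed = 1
    for stage, gate in zip(stages[1:], gates):
        if not gate(candidates):
            break  # later, more accurate stages are skipped as well
        candidates = stage(utterance, candidates)
        executed += 1
    return candidates, executed
```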
  • the second pass execution availability determination unit 12 determines the reliability T based on the number of keyword candidates as one method of determining the reliability T. Instead of this, the second pass execution availability determination unit 12 may determine the reliability T based on the number of words terminating at each node of the word graph.
  • the second pass execution availability determination unit 12 calculates the number of words ending at each node, that is, the number of arrows input to each node, from the word graph as shown in FIG.
  • the second pass execution availability determination unit 12 sets the reliability T to a value higher than the first threshold value Tth1.
  • the second pass execution availability determination unit 12 sets the reliability T to a value lower than the second threshold Tth2.
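Counting the words that end at each node of a word graph is the same as counting the arrows entering each node. A minimal sketch follows, assuming the word graph is given as a list of `(start node, end node, word)` edges; that representation and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable, str]  # (start node, end node, word)


def words_ending_at_each_node(edges: Iterable[Edge]) -> Dict[Hashable, int]:
    """Return, for every node with incoming arrows, the number of words ending there."""
    counts = Counter(end for _start, end, _word in edges)
    return dict(counts)
```

The resulting counts would then be compared with experimentally chosen values to set the reliability T.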
  • the second pass execution availability determination unit 12 sets the reliability T based on the recognition result of the first pass process and the recognition environment information Ri. Instead, the second pass execution availability determination unit 12 may set the reliability T based on either the recognition result of the first pass process or the recognition environment information Ri. Further, as described above, when the reliability T is set using the recognition environment information Ri, the second pass execution availability determination unit 12 may set the reliability T based on any one or more of the pieces of acoustic information and external information exemplified above.
  • the second pass execution processing unit 13 uses the high-accuracy acoustic model Hsm and the high-accuracy language model Hlm, which are higher in accuracy than the low-accuracy acoustic model Lsm and the low-accuracy language model Llm used by the first pass execution processing unit 11.
  • the method to which the present invention is applicable is not limited to this.
  • the second pass execution processing unit 13 may use the same acoustic model and language model as the acoustic model and language model used by the first pass execution processing unit 11.
  • the language model DB 24 and the acoustic model DB 25 include at least one language model or acoustic model.
  • the present invention can be applied to various devices that perform voice recognition processing.
  • the present invention can be applied to various devices having a voice input function such as a car navigation device, a mobile phone, a personal computer, an AV device, and a home appliance.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a speech recognition device equipped with: an acoustic model storage unit; a language model storage unit; a first pass processing means; a second pass execution availability determination means; and a second pass processing means. The first pass processing means determines word string candidates and scores from the input speech signal on the basis of the acoustic model and the language model. The second pass execution availability determination means determines whether the second pass process should be executed, on the basis of the recognition result of the first pass processing means and/or recognition environment information. When the second pass execution availability determination means determines that the second pass process should be executed, the second pass processing means re-determines the word string candidates and scores on the basis of the acoustic model and the language model.
PCT/JP2009/058707 2009-05-08 2009-05-08 Dispositif, procédé et programme de reconnaissance vocale WO2010128560A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2011512291A JPWO2010128560A1 (ja) 2009-05-08 2009-05-08 音声認識装置、音声認識方法、及び音声認識プログラム
PCT/JP2009/058707 WO2010128560A1 (fr) 2009-05-08 2009-05-08 Dispositif, procédé et programme de reconnaissance vocale

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2009/058707 WO2010128560A1 (fr) 2009-05-08 2009-05-08 Dispositif, procédé et programme de reconnaissance vocale

Publications (1)

Publication Number Publication Date
WO2010128560A1 true WO2010128560A1 (fr) 2010-11-11

Family

ID=43050073

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2009/058707 WO2010128560A1 (fr) 2009-05-08 2009-05-08 Dispositif, procédé et programme de reconnaissance vocale

Country Status (2)

Country Link
JP (1) JPWO2010128560A1 (fr)
WO (1) WO2010128560A1 (fr)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093747A (zh) * 2011-11-04 2013-05-08 卡西欧计算机株式会社 自动调修正装置及自动调修正方法
JP2014013302A (ja) * 2012-07-04 2014-01-23 Seiko Epson Corp 音声認識システム、音声認識プログラム、記録媒体及び音声認識方法
JPWO2013005248A1 (ja) * 2011-07-05 2015-02-23 三菱電機株式会社 音声認識装置およびナビゲーション装置
WO2015118645A1 (fr) * 2014-02-06 2015-08-13 三菱電機株式会社 Dispositif de recherche vocale et procédé de recherche vocale
US9607619B2 (en) 2013-01-24 2017-03-28 Huawei Device Co., Ltd. Voice identification method and apparatus
US9666186B2 (en) 2013-01-24 2017-05-30 Huawei Device Co., Ltd. Voice identification method and apparatus
KR20180025379A (ko) * 2016-08-30 2018-03-09 자동차부품연구원 음성 인식률을 고려한 운전자 및 주행상황 맞춤형 hud 정보 제공 시스템 및 방법
JP2019020683A (ja) * 2017-07-21 2019-02-07 トヨタ自動車株式会社 音声認識システム及び音声認識方法
CN117351944A (zh) * 2023-12-06 2024-01-05 科大讯飞股份有限公司 语音识别方法、装置、设备及可读存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03266898A (ja) * 1990-03-16 1991-11-27 Fujitsu Ltd 大語彙音声認識処理方式
JPH07160822A (ja) * 1993-12-07 1995-06-23 Ricoh Co Ltd パターン認識方法
JP2001092496A (ja) * 1999-09-22 2001-04-06 Nippon Hoso Kyokai <Nhk> 連続音声認識装置および記録媒体
JP2002006878A (ja) * 2000-06-07 2002-01-11 Sony Internatl Europ Gmbh 音声フレーズ認識方法及び音声認識装置
JP2006030908A (ja) * 2004-07-21 2006-02-02 Honda Motor Co Ltd 車両用音声認識装置及び移動体
JP2007108407A (ja) * 2005-10-13 2007-04-26 Nec Corp 音声認識システムと音声認識方法およびプログラム
JP2008009153A (ja) * 2006-06-29 2008-01-17 Xanavi Informatics Corp 音声対話システム

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10254480A (ja) * 1997-03-13 1998-09-25 Nippon Telegr & Teleph Corp <Ntt> 音声認識方法
JP3813491B2 (ja) * 2001-10-30 2006-08-23 日本放送協会 連続音声認識装置およびそのプログラム
JP4413564B2 (ja) * 2003-09-16 2010-02-10 三菱電機株式会社 情報端末および音声認識システム

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPWO2013005248A1 (ja) * 2011-07-05 2015-02-23 三菱電機株式会社 音声認識装置およびナビゲーション装置
CN103093747A (zh) * 2011-11-04 2013-05-08 卡西欧计算机株式会社 自动调修正装置及自动调修正方法
JP2013097302A (ja) * 2011-11-04 2013-05-20 Casio Comput Co Ltd 自動調修正装置、自動調修正方法及びそのプログラム
JP2014013302A (ja) * 2012-07-04 2014-01-23 Seiko Epson Corp 音声認識システム、音声認識プログラム、記録媒体及び音声認識方法
US9666186B2 (en) 2013-01-24 2017-05-30 Huawei Device Co., Ltd. Voice identification method and apparatus
US9607619B2 (en) 2013-01-24 2017-03-28 Huawei Device Co., Ltd. Voice identification method and apparatus
JPWO2015118645A1 (ja) * 2014-02-06 2017-03-23 三菱電機株式会社 音声検索装置および音声検索方法
CN105981099A (zh) * 2014-02-06 2016-09-28 三菱电机株式会社 语音检索装置和语音检索方法
WO2015118645A1 (fr) * 2014-02-06 2015-08-13 三菱電機株式会社 Dispositif de recherche vocale et procédé de recherche vocale
KR20180025379A (ko) * 2016-08-30 2018-03-09 자동차부품연구원 음성 인식률을 고려한 운전자 및 주행상황 맞춤형 hud 정보 제공 시스템 및 방법
KR102036606B1 (ko) * 2016-08-30 2019-10-28 자동차부품연구원 음성 인식률을 고려한 운전자 및 주행상황 맞춤형 hud 정보 제공 시스템 및 방법
JP2019020683A (ja) * 2017-07-21 2019-02-07 トヨタ自動車株式会社 音声認識システム及び音声認識方法
CN117351944A (zh) * 2023-12-06 2024-01-05 科大讯飞股份有限公司 语音识别方法、装置、设备及可读存储介质
CN117351944B (zh) * 2023-12-06 2024-04-12 科大讯飞股份有限公司 语音识别方法、装置、设备及可读存储介质

Also Published As

Publication number Publication date
JPWO2010128560A1 (ja) 2012-11-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09844347

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2011512291

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09844347

Country of ref document: EP

Kind code of ref document: A1