WO2019119553A1 - Semantic recognition method and apparatus (语义识别方法及装置) - Google Patents

Semantic recognition method and apparatus (语义识别方法及装置)

Info

Publication number
WO2019119553A1
WO2019119553A1 (PCT/CN2018/072008, CN2018072008W)
Authority
WO
WIPO (PCT)
Prior art keywords
word
semantic
preset
semantics
voice
Prior art date
Application number
PCT/CN2018/072008
Other languages
English (en)
French (fr)
Inventor
张立新
周毕兴
Original Assignee
深圳市沃特沃德股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市沃特沃德股份有限公司
Publication of WO2019119553A1 publication Critical patent/WO2019119553A1/zh

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/1815 - Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/1822 - Parsing for meaning understanding

Definitions

  • The present invention relates to the field of speech recognition technologies and, in particular, to a semantic recognition method and apparatus.
  • Speech recognition is applied ever more widely: it is used not only in voice input methods but also in translation devices, artificial intelligence, and other applications.
  • Not counting tones, Chinese has only about 400 distinct syllable pronunciations, so homophones are extremely common.
  • Text recognition can usually improve accuracy by reasoning from word combinations or context, but when irregular strings of digits, symbols, and letters are recognized the error rate is high and manual correction is often needed; on devices with no display or keyboard such correction is difficult, which lowers the accuracy and efficiency of speech recognition.
  • The main object of the present invention is to provide a semantic recognition method and apparatus that improve semantic recognition accuracy.
  • The invention provides a semantic recognition method comprising the following steps: acquiring voice information to be recognized; identifying the duration of each single-word utterance in the voice information; comparing the duration of each single-word utterance with a first preset value; and, according to the comparison result, matching the semantics corresponding to the single-word utterance in a preset semantic database.
  • When the comparison result is that the duration of the single-word utterance is less than the first preset value, the preset semantic database is a Chinese-character semantic library, which includes preset single-word utterances and the corresponding preset Chinese characters.
  • When the comparison result is that the duration of the single-word utterance is not less than the first preset value, the preset semantic database is a feature-character semantic library, which includes preset single-word utterances and at least one of corresponding preset digits, letters, and symbols.
  • After the step of matching the semantics corresponding to the single-word utterance in the preset semantic database, the method further includes: monitoring whether the semantics corresponding to the single-word utterance are matched successfully; and, if not, matching the semantics corresponding to the single-word utterance in the Chinese-character semantic library.
  • After the step of matching the semantics corresponding to the single-word utterance in the preset semantic database, the method further includes: monitoring the number of semantics matched for the single-word utterance; if the single-word utterance corresponds to multiple semantics, analyzing whether the two semantics corresponding to the two single-word utterances before or after it form a word when combined; if so, analyzing the meaning of the word; and selecting, from the plurality of semantics corresponding to the single-word utterance, the semantic associated with the meaning of the word.
  • The step of analyzing whether the two semantics corresponding to the two single-word utterances before or after the single-word utterance form a word when combined includes: combining the two semantics; analyzing whether a preset word database contains a preset word identical to the combination; and, if so, determining that the two semantics form a word.
  • Further, the first preset value is 1 second.
  • After the step of analyzing whether the preset word database contains a preset word identical to the two semantics, the method further includes: if not, determining that the two semantics corresponding to the two single-word utterances do not form a word when combined.
  • The invention also provides a semantic recognition apparatus, comprising:
  • an acquiring unit configured to obtain voice information to be recognized;
  • a first identifying unit configured to identify the duration of each single-word utterance in the voice information;
  • a comparing unit configured to compare the duration of each single-word utterance with a first preset value; and
  • a second identifying unit configured to match, according to the comparison result, the semantics corresponding to the single-word utterance in a preset semantic database.
  • When the comparison result is that the duration of the single-word utterance is less than the first preset value, the preset semantic database is a Chinese-character semantic library, which includes preset single-word utterances and the corresponding preset Chinese characters.
  • When the comparison result is that the duration of the single-word utterance is not less than the first preset value, the preset semantic database is a feature-character semantic library, which includes preset single-word utterances and at least one of corresponding preset digits, letters, and symbols.
  • Further, the apparatus includes: a first monitoring unit configured to monitor whether the semantics corresponding to the single-word utterance are matched successfully; and a matching unit configured to match the semantics corresponding to the single-word utterance in the Chinese-character semantic library when the matching is unsuccessful.
  • Further, the apparatus includes: a first monitoring unit configured to monitor the number of semantics matched for the single-word utterance; a first analyzing unit configured to analyze, when the single-word utterance corresponds to multiple semantics, whether the two semantics corresponding to the two single-word utterances before or after it form a word when combined; a second analyzing unit configured to analyze the meaning of the word if they do form a word; and a selecting unit configured to select, from the plurality of semantics corresponding to the single-word utterance, the semantic associated with the meaning of the word.
  • The first analyzing unit includes: a combining subunit configured to combine the two semantics corresponding to the two single-word utterances before or after the single-word utterance; an analyzing subunit configured to analyze whether a preset word database contains a preset word identical to the two semantics; and a determining subunit configured to determine that the two semantics form a word when the preset word database contains such a preset word.
  • Further, the first preset value is 1 second.
  • The determining subunit is further configured to determine that the two semantics corresponding to the two single-word utterances do not form a word when combined if the preset word database does not contain a preset word identical to the two semantics.
  • The semantic recognition method and apparatus acquire the voice information to be recognized, identify the duration of each single-word utterance in the voice information, compare the duration of each single-word utterance with the first preset value, and match, according to the comparison result, the semantics corresponding to the single-word utterance in the preset semantic database. Because each single-word utterance is recognized according to its duration, digits, letters, and symbols in the voice information are easily distinguished, which increases recognition accuracy.
  • FIG. 1 is a schematic diagram showing the steps of a semantic recognition method in an embodiment of the present invention.
  • FIG. 2 is a schematic diagram showing the steps of a semantic recognition method in another embodiment of the present invention.
  • FIG. 3 is a schematic diagram showing the steps of a semantic recognition method in still another embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a semantic recognition apparatus according to an embodiment of the present invention.
  • FIG. 5 is a structural block diagram of a semantic recognition apparatus according to another embodiment of the present invention.
  • FIG. 6 is a structural block diagram of a semantic recognition apparatus in still another embodiment of the present invention.
  • FIG. 7 is a structural block diagram of a first analysis unit in an embodiment of the present invention.
  • Referring to FIG. 1, which is a schematic diagram of the steps of the semantic recognition method in an embodiment of the present invention,
  • an embodiment of the present invention provides a semantic recognition method, which includes the following steps:
  • Step S1: acquiring voice information to be recognized;
  • Step S2: identifying the duration of each single-word utterance in the voice information;
  • Step S3: comparing the duration of each single-word utterance with a first preset value;
  • Step S4: according to the comparison result, matching the semantics corresponding to the single-word utterance in a preset semantic database.
  • A semantic recognition method is provided, which is used to recognize voice information that conforms to a preset rule.
  • Voice information conforming to the preset rule is speech in which the final sound is lengthened to represent digits, letters, and symbols, so that they can be distinguished from other Chinese characters.
  • The pronunciation of a Chinese character usually lasts 0.2-0.4 s (seconds); when the pronunciation of a digit, letter, or symbol is lengthened to about 1 s, it can be clearly distinguished from a Chinese character. For pronunciations that remain easy to confuse among digits, letters, and symbols, a distinguishing Chinese word (for example "number", "uppercase", or "lowercase") can also be spoken before or after them.
  • Through the above steps, the semantic recognition method in this embodiment can accurately recognize the semantics corresponding to the voice information.
  • Specifically, when the voice information to be recognized is obtained, the duration of each single-word utterance (i.e., the length of its pronunciation) is identified, and a first preset value (which may be 0.4 s) is set.
  • When the pronunciation is shorter than the first preset value, it is judged to be the pronunciation of a Chinese character; when the pronunciation is not shorter than the first preset value, it is judged to be a possible pronunciation of a digit, letter, or symbol.
  • Preferably, the first preset value may also be 1 s.
  • After the duration of the single-word utterance has been compared with the first preset value, the semantics corresponding to the single-word utterance can be matched in the preset semantic database according to the comparison result.
  • Using different recognition methods for different pronunciation types makes it easy to recognize each single-word utterance accurately, which improves both the accuracy and the speed of semantic recognition.
  • In this embodiment, the preset semantic database may be a Chinese-character semantic library or a feature-character semantic library.
  • When the comparison result is that the duration of the single-word utterance is less than the first preset value, the preset semantic database is the Chinese-character semantic library, which includes preset single-word utterances and the corresponding preset Chinese characters.
  • When the comparison result is that the duration of the single-word utterance is not less than the first preset value, the preset semantic database is the feature-character semantic library, which includes preset single-word utterances and at least one of corresponding preset digits, letters, and symbols.
  • Referring to FIG. 2, in another embodiment, after step S4 of matching the semantics corresponding to the single-word utterance in the preset semantic database, the method includes:
  • Step S5: monitoring whether the semantics corresponding to the single-word utterance are matched successfully;
  • Step S6: if not, matching the semantics corresponding to the single-word utterance in the Chinese-character semantic library.
  • If the single-word utterance cannot be matched in the feature-character semantic library, it is judged to be a misrecognition, and the single-word utterance is then recognized in the Chinese-character semantic library.
  • Referring to FIG. 3, in yet another embodiment, after step S4 of matching the semantics corresponding to the single-word utterance in the preset semantic database, the method includes:
  • Step S5a: monitoring the number of semantics matched for the single-word utterance;
  • Step S6a: if the single-word utterance corresponds to multiple semantics, analyzing whether the two semantics corresponding to the two single-word utterances before or after it form a word when combined;
  • Step S7: if so, analyzing the meaning of the word;
  • Step S8: selecting, from the plurality of semantics corresponding to the single-word utterance, the semantic associated with the meaning of the word.
  • For example, the single-word utterance "Yi" may correspond to the digit "1" or to the letter "E" or "e", which is easy to confuse and hard to recognize clearly. Therefore, in this embodiment it is determined whether the two semantics corresponding to the two single-word utterances immediately before or after the utterance "Yi" form a word when combined; if so, the meaning of that word is analyzed and used to recognize the single-word utterance by association.
  • For instance, if the semantics corresponding to the two single-word utterances before or after "Yi" form the Chinese word for "uppercase", the letter "E" is selected as the semantic of "Yi" according to that meaning; if they form the word for "lowercase", the letter "e" is selected. If the semantics corresponding to the two adjacent single-word utterances are neither "uppercase" nor "lowercase", the digit "1" is selected as the semantic of "Yi".
  • Specifically, the step of analyzing whether the two semantics corresponding to the two single-word utterances before or after the single-word utterance form a word when combined includes: combining the two semantics; analyzing whether the preset word database contains a preset word identical to the combination; if so, determining that the two semantics form a word; and, if not, determining that they do not form a word.
  • In summary, the semantic recognition method acquires the voice information to be recognized, identifies the duration of each single-word utterance in the voice information, compares the duration of each single-word utterance with the first preset value, and matches, according to the comparison result, the semantics corresponding to the single-word utterance in the preset semantic database. Recognizing each single-word utterance according to its duration makes it easy to distinguish digits, letters, and symbols in the voice information and increases recognition accuracy.
  • This effectively solves the problem of recognizing spoken input of digits, letters, and symbols on devices with no display or keyboard, in particular when setting passwords of various kinds; moreover, the semantic recognition method is simple, and both its recognition rate and its recognition speed are high.
  • Referring to FIG. 4, an embodiment of the present invention further provides a semantic recognition apparatus, including:
  • an acquiring unit 10 configured to obtain voice information to be recognized, the voice information being voice information that conforms to a preset rule;
  • a first identifying unit 20 configured to identify the duration of each single-word utterance in the voice information;
  • a comparing unit 30 configured to compare the duration of each single-word utterance with a first preset value; and
  • a second identifying unit 40 configured to match, according to the comparison result, the semantics corresponding to the single-word utterance in a preset semantic database.
  • This embodiment provides a semantic recognition apparatus for recognizing voice information that conforms to a preset rule.
  • Voice information conforming to the preset rule is speech in which the final sound is lengthened to represent digits, letters, and symbols, so that they can be distinguished from other Chinese characters.
  • The pronunciation of a Chinese character usually lasts 0.2-0.4 s (seconds).
  • Through the above modules, the apparatus in this embodiment can accurately recognize the semantics corresponding to the voice information.
  • Specifically, when the acquiring unit 10 obtains the voice information to be recognized, the first identifying unit 20 identifies the duration of each single-word utterance (i.e., the length of its pronunciation), and a first preset value (which may be 0.4 s) is set.
  • The comparing unit 30 compares the pronunciation length of each single-word utterance with the first preset value: when the pronunciation is shorter than the first preset value, it is judged to be the pronunciation of a Chinese character; when the pronunciation is not shorter than the first preset value, it is judged to be a possible pronunciation of a digit, letter, or symbol.
  • Preferably, the first preset value may also be 1 s.
  • After the comparison, the second identifying unit 40 can match, according to the comparison result, the semantics corresponding to the single-word utterance in the preset semantic database.
  • Using different recognition methods for different pronunciation types makes it easy to recognize each single-word utterance accurately, which improves both the accuracy and the speed of semantic recognition.
  • In one embodiment, a Chinese-character library and a feature-character library are provided. When the second identifying unit 40 matches the semantics of a single-word utterance, if the comparison result of the comparing unit 30 is that the duration of the single-word utterance is less than the first preset value, the preset semantic database is the Chinese-character semantic library, which includes preset single-word utterances and the corresponding preset Chinese characters.
  • If the comparison result of the comparing unit 30 is that the duration of the single-word utterance is not less than the first preset value, the preset semantic database is the feature-character semantic library, which includes preset single-word utterances and at least one of corresponding preset digits, letters, and symbols.
  • Referring to FIG. 5, in another embodiment, the semantic recognition apparatus further includes:
  • a first monitoring unit 50 configured to monitor whether the semantics corresponding to the single-word utterance are matched successfully; and
  • a matching unit 60 configured to match the semantics corresponding to the single-word utterance in the Chinese-character semantic library when the matching of the semantics corresponding to the single-word utterance is unsuccessful.
  • If the single-word utterance cannot be matched in the feature-character semantic library, it is judged to be a misrecognition, and the single-word utterance is then recognized in the Chinese-character semantic library.
  • Referring to FIG. 6, in yet another embodiment, the semantic recognition apparatus further includes:
  • a first monitoring unit 50a configured to monitor the number of semantics matched for the single-word utterance;
  • a first analyzing unit 60a configured to analyze, when the single-word utterance corresponds to multiple semantics, whether the two semantics corresponding to the two single-word utterances before or after it form a word when combined;
  • a second analyzing unit 70 configured to analyze the meaning of the word if they do form a word; and
  • a selecting unit 80 configured to select, from the plurality of semantics corresponding to the single-word utterance, the semantic associated with the meaning of the word.
  • In this embodiment, the first monitoring unit 50a monitors the number of semantics matched for the single-word utterance in the feature-character semantic library. For example, the single-word utterance "Yi" may be the digit "1" or the letter "E" or "e", which is easy to confuse and hard to recognize clearly. Therefore, when the single-word utterance corresponds to multiple semantics, the first analyzing unit 60a analyzes whether the two semantics corresponding to the two single-word utterances before or after it form a word when combined; if so, the second analyzing unit 70 analyzes the meaning of that word, and the selecting unit 80 then selects, from the plurality of semantics corresponding to the single-word utterance, the semantic associated with that meaning.
  • For example, if the first analyzing unit 60a finds that the semantics corresponding to the two single-word utterances form the Chinese word for "uppercase", the second analyzing unit 70 analyzes the meaning of "uppercase" and the selecting unit 80 selects the corresponding semantic, the letter "E"; if the two semantics form the word for "lowercase", the second analyzing unit 70 analyzes its meaning and the selecting unit 80 selects the letter "e".
  • If the second analyzing unit 70 finds that the semantics corresponding to the two single-word utterances mean neither "uppercase" nor "lowercase", the selecting unit 80 selects the digit "1" as the semantic corresponding to the single-word utterance "Yi".
  • Referring to FIG. 7, the first analyzing unit 60a includes:
  • a combining subunit 601 configured to combine the two semantics corresponding to the two single-word utterances before or after the single-word utterance;
  • an analyzing subunit 602 configured to analyze whether the preset word database contains a preset word identical to the two semantics; and
  • a determining subunit 603 configured to determine that the two semantics form a word when the preset word database contains such a preset word, and to determine that the two semantics corresponding to the two single-word utterances do not form a word when combined if it does not.
  • In summary, the semantic recognition method and apparatus acquire the voice information to be recognized, identify the duration of each single-word utterance in the voice information, compare the duration of each single-word utterance with the first preset value, and match, according to the comparison result, the semantics corresponding to the single-word utterance in the preset semantic database. Recognizing each single-word utterance according to its duration makes it easy to distinguish digits, letters, and symbols in the voice information and increases recognition accuracy.
  • Those skilled in the art will appreciate that each block of these structural diagrams, block diagrams, and/or flow diagrams, and combinations of blocks therein, can be implemented by computer program instructions.
  • These computer program instructions may be provided to a general-purpose computer, a special-purpose computer, or a processor of another programmable data-processing apparatus, so that the schemes specified in one or more blocks of the structural diagrams, block diagrams, and/or flow diagrams disclosed herein are carried out by the computer or by that processor.
  • Those skilled in the art will appreciate that the steps, measures, and schemes in the various operations, methods, and processes discussed in the present invention may be alternated, changed, combined, or deleted. Further, other steps, measures, and schemes in the various operations, methods, and processes discussed in the present invention may also be alternated, changed, rearranged, decomposed, combined, or deleted. Further, steps, measures, and schemes in the prior art that correspond to the various operations, methods, and processes disclosed in the present invention may also be alternated, changed, rearranged, decomposed, combined, or deleted.

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A semantic recognition method and apparatus, comprising: acquiring voice information to be recognized (S1); identifying the duration of each single-word utterance in the voice information (S2); comparing the duration of each single-word utterance with a first preset value (S3); and, according to the comparison result, matching the semantics corresponding to the single-word utterance in a preset semantic database (S4). Recognizing each single-word utterance according to its duration makes it easy to distinguish digits, letters, and symbols in the voice information and increases recognition accuracy.

Description

Semantic recognition method and apparatus

Technical Field

The present invention relates to the field of speech recognition technologies and, in particular, to a semantic recognition method and apparatus.

Background Art

Speech recognition is applied ever more widely: it is used not only in voice input methods but also in translation devices, artificial intelligence, and other applications. Not counting tones, Chinese has only about 400 distinct syllable pronunciations, so homophones are extremely common. Text recognition can usually improve accuracy by reasoning from word combinations or context, but when irregular strings of digits, symbols, and letters are recognized the error rate is high and manual correction is often needed; on devices with no display or keyboard such correction is difficult, which lowers the accuracy and efficiency of speech recognition.
Technical Problem

The main object of the present invention is to provide a semantic recognition method and apparatus that improve semantic recognition accuracy.

Technical Solution

The present invention provides a semantic recognition method, comprising the following steps:

acquiring voice information to be recognized;

identifying the duration of each single-word utterance in the voice information;

comparing the duration of each single-word utterance with a first preset value;

according to the comparison result, matching the semantics corresponding to the single-word utterance in a preset semantic database.
Further, when the comparison result is that the duration of the single-word utterance is less than the first preset value, the preset semantic database is a Chinese-character semantic library, the Chinese-character semantic library including preset single-word utterances and the corresponding preset Chinese characters.

Further, when the comparison result is that the duration of the single-word utterance is not less than the first preset value, the preset semantic database is a feature-character semantic library, the feature-character semantic library including preset single-word utterances and at least one of corresponding preset digits, letters, and symbols.

Further, after the step of matching the semantics corresponding to the single-word utterance in the preset semantic database, the method includes:

monitoring whether the semantics corresponding to the single-word utterance are matched successfully;

if not, matching the semantics corresponding to the single-word utterance in the Chinese-character semantic library.

Further, after the step of matching the semantics corresponding to the single-word utterance in the preset semantic database, the method includes:

monitoring the number of semantics matched for the single-word utterance;

if the single-word utterance corresponds to multiple semantics, analyzing whether the two semantics corresponding to the two single-word utterances before or after the single-word utterance form a word when combined;

if so, analyzing the meaning of the word;

selecting, from the plurality of semantics corresponding to the single-word utterance, the semantic associated with the meaning of the word.

Further, the step of analyzing whether the two semantics corresponding to the two single-word utterances before or after the single-word utterance form a word when combined includes:

combining the two semantics corresponding to the two single-word utterances before or after the single-word utterance;

analyzing whether a preset word database contains a preset word identical to the two semantics;

if so, determining that the two semantics form a word.

Further, the first preset value is 1 second.

Further, after the step of analyzing whether the preset word database contains a preset word identical to the two semantics, the method includes:

if not, determining that the two semantics corresponding to the two single-word utterances do not form a word when combined.
The present invention also provides a semantic recognition apparatus, comprising:

an acquiring unit configured to obtain voice information to be recognized;

a first identifying unit configured to identify the duration of each single-word utterance in the voice information;

a comparing unit configured to compare the duration of each single-word utterance with a first preset value;

a second identifying unit configured to match, according to the comparison result, the semantics corresponding to the single-word utterance in a preset semantic database.

Further, when the comparison result is that the duration of the single-word utterance is less than the first preset value, the preset semantic database is a Chinese-character semantic library, the Chinese-character semantic library including preset single-word utterances and the corresponding preset Chinese characters;

when the comparison result is that the duration of the single-word utterance is not less than the first preset value, the preset semantic database is a feature-character semantic library, the feature-character semantic library including preset single-word utterances and at least one of corresponding preset digits, letters, and symbols.

Further, the apparatus includes:

a first monitoring unit configured to monitor whether the semantics corresponding to the single-word utterance are matched successfully;

a matching unit configured to match the semantics corresponding to the single-word utterance in the Chinese-character semantic library when the matching of the semantics corresponding to the single-word utterance is unsuccessful.

Further, the apparatus includes:

a first monitoring unit configured to monitor the number of semantics matched for the single-word utterance;

a first analyzing unit configured to analyze, when the single-word utterance corresponds to multiple semantics, whether the two semantics corresponding to the two single-word utterances before or after the single-word utterance form a word when combined;

a second analyzing unit configured to analyze the meaning of the word if they form a word when combined;

a selecting unit configured to select, from the plurality of semantics corresponding to the single-word utterance, the semantic associated with the meaning of the word.

Further, the first analyzing unit includes:

a combining subunit configured to combine the two semantics corresponding to the two single-word utterances before or after the single-word utterance;

an analyzing subunit configured to analyze whether a preset word database contains a preset word identical to the two semantics;

a determining subunit configured to determine that the two semantics form a word when the preset word database contains a preset word identical to the two semantics.

Further, the first preset value is 1 second.

Further, the determining subunit is further configured to determine that the two semantics corresponding to the two single-word utterances do not form a word when combined if the preset word database does not contain a preset word identical to the two semantics. The semantic recognition method and apparatus provided in the present invention have the following beneficial effects:
Beneficial Effects

The semantic recognition method and apparatus provided in the present invention acquire the voice information to be recognized, identify the duration of each single-word utterance in the voice information, compare the duration of each single-word utterance with a first preset value, and match, according to the comparison result, the semantics corresponding to the single-word utterance in a preset semantic database. Recognizing each single-word utterance according to its duration makes it easy to distinguish digits, letters, and symbols in the voice information and increases recognition accuracy.
Brief Description of the Drawings

FIG. 1 is a schematic diagram of the steps of a semantic recognition method in an embodiment of the present invention;

FIG. 2 is a schematic diagram of the steps of a semantic recognition method in another embodiment of the present invention;

FIG. 3 is a schematic diagram of the steps of a semantic recognition method in still another embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a semantic recognition apparatus in an embodiment of the present invention;

FIG. 5 is a structural block diagram of a semantic recognition apparatus in another embodiment of the present invention;

FIG. 6 is a structural block diagram of a semantic recognition apparatus in still another embodiment of the present invention;

FIG. 7 is a structural block diagram of a first analyzing unit in an embodiment of the present invention.

The realization of the objects, the functional features, and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Best Mode for Carrying Out the Invention

It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.

Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "the", "said", and "the above" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present invention refers to the presence of the stated features, integers, steps, operations, elements, units, modules, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, units, modules, components, and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may also be present. In addition, "connected" or "coupled" as used herein may include a wireless connection or wireless coupling. The phrase "and/or" as used herein includes all of, or any unit of, and all combinations of one or more of the associated listed items.

Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have meanings consistent with their meanings in the context of the prior art and, unless specifically defined as herein, will not be interpreted in an idealized or overly formal sense.
Referring to FIG. 1, which is a schematic diagram of the steps of the semantic recognition method in an embodiment of the present invention,

an embodiment of the present invention provides a semantic recognition method, comprising the following steps:

Step S1: acquiring voice information to be recognized;

Step S2: identifying the duration of each single-word utterance in the voice information;

Step S3: comparing the duration of each single-word utterance with a first preset value;

Step S4: according to the comparison result, matching the semantics corresponding to the single-word utterance in a preset semantic database.

When semantic recognition is performed on voice information, confusion is common: for example, the utterance "Yi" may be recognized as the Chinese character "一", the digit "1", or the letter "E". When the pronunciations are identical, recognition is therefore easily unclear, which lowers recognition accuracy. This embodiment provides a semantic recognition method that recognizes voice information conforming to a preset rule. Voice information conforming to the preset rule is speech in which the final sound is lengthened to represent digits, letters, and symbols, so that they can be distinguished from other Chinese characters. The pronunciation of a Chinese character usually lasts 0.2-0.4 s (seconds); when the pronunciation of a digit, letter, or symbol is lengthened to 1 s, it can be clearly distinguished from a Chinese character. For pronunciations that remain easy to confuse among digits, letters, and symbols, a distinguishing Chinese word can also be spoken before or after them, for example "number", "uppercase", or "lowercase". Through the above steps, the semantic recognition method in this embodiment can accurately recognize the semantics corresponding to the voice information.

Specifically, when the voice information to be recognized is obtained, the duration of each single-word utterance (i.e., the length of its pronunciation) is identified, and a first preset value (which may be 0.4 s) is set. When the pronunciation is shorter than the first preset value, it is judged to be the pronunciation of a Chinese character; when the pronunciation is not shorter than the first preset value, it is judged to be a possible pronunciation of a digit, letter, or symbol. Preferably, the first preset value may also be 1 s.

After the duration of the single-word utterance has been compared with the first preset value, the semantics corresponding to the single-word utterance can be matched in the preset semantic database according to the comparison result. Using different recognition methods for different pronunciation types makes it easy to recognize each single-word utterance accurately, which improves both the accuracy and the speed of semantic recognition.

In this embodiment, the preset semantic database may be a Chinese-character semantic library or a feature-character semantic library. When the comparison result is that the duration of the single-word utterance is less than the first preset value, the preset semantic database is the Chinese-character semantic library, which includes preset single-word utterances and the corresponding preset Chinese characters; when the comparison result is that the duration of the single-word utterance is not less than the first preset value, the preset semantic database is the feature-character semantic library, which includes preset single-word utterances and at least one of corresponding preset digits, letters, and symbols.
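The duration-based routing between the two libraries can be pictured with the short Python sketch below. It is only an illustration of the idea under stated assumptions: the library contents, the `WordSegment` type, and the function names are invented for the example and are not taken from the patent.

```python
# Minimal sketch of the duration-based lookup described above. Library contents,
# the WordSegment type, and the helper names are illustrative assumptions.
from dataclasses import dataclass
from typing import List

# Hypothetical semantic libraries: pronunciation -> candidate semantics.
HANZI_LIBRARY = {"yi": ["一", "衣", "医"]}   # Chinese-character semantic library
FEATURE_LIBRARY = {"yi": ["1", "E", "e"]}    # feature-character (digit/letter/symbol) library

FIRST_PRESET_VALUE = 0.4  # seconds; the text notes that 1 s may also be used


@dataclass
class WordSegment:
    pronunciation: str  # recognized syllable, e.g. "yi"
    duration: float     # length of the utterance in seconds


def match_semantics(segment: WordSegment) -> List[str]:
    """Return candidate semantics for one single-word utterance."""
    if segment.duration < FIRST_PRESET_VALUE:
        # Short utterance: treat it as an ordinary Chinese character.
        return HANZI_LIBRARY.get(segment.pronunciation, [])
    # Lengthened utterance: look it up in the feature-character library first;
    # fall back to the Chinese-character library if nothing matches (cf. step S6 below).
    return (FEATURE_LIBRARY.get(segment.pronunciation)
            or HANZI_LIBRARY.get(segment.pronunciation, []))
```

For example, under these assumptions `match_semantics(WordSegment("yi", 0.3))` would return the Chinese-character candidates, while `match_semantics(WordSegment("yi", 1.0))` would return the digit/letter candidates.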
Referring to FIG. 2, in another embodiment, after step S4 of matching the semantics corresponding to the single-word utterance in the preset semantic database, the method includes:

Step S5: monitoring whether the semantics corresponding to the single-word utterance are matched successfully;

Step S6: if not, matching the semantics corresponding to the single-word utterance in the Chinese-character semantic library.

If the single-word utterance cannot be matched in the feature-character semantic library, it is judged to be a misrecognition, and the single-word utterance is then recognized in the Chinese-character semantic library.

Referring to FIG. 3, in yet another embodiment, after step S4 of matching the semantics corresponding to the single-word utterance in the preset semantic database, the method includes:

Step S5a: monitoring the number of semantics matched for the single-word utterance;

Step S6a: if the single-word utterance corresponds to multiple semantics, analyzing whether the two semantics corresponding to the two single-word utterances before or after it form a word when combined;

Step S7: if so, analyzing the meaning of the word;

Step S8: selecting, from the plurality of semantics corresponding to the single-word utterance, the semantic associated with the meaning of the word.

In this embodiment, if multiple semantics are matched for the single-word utterance in the feature-character semantic library, recognition is easily confused and unclear; for example, the single-word utterance "Yi" may be the digit "1" or the letter "E" or "e". Therefore, in this embodiment it is determined whether the two semantics corresponding to the two single-word utterances immediately before or after the utterance "Yi" form a word when combined; if so, the meaning of that word is analyzed and the single-word utterance is recognized by association. For example, if the semantics corresponding to the two single-word utterances before or after it form the Chinese word for "uppercase", the letter "E" is selected as the semantic of "Yi" according to that meaning; if the two semantics form the word for "lowercase", the letter "e" is selected. If the semantics corresponding to the two single-word utterances before or immediately after the utterance "Yi" are neither "uppercase" nor "lowercase", the digit "1" is selected as the semantic of "Yi".
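The selection step can be sketched as follows; the association table from a neighbouring word to a preferred candidate is an assumed example, since the patent only describes the "uppercase"/"lowercase" case in prose.

```python
# Illustrative sketch of choosing among multiple candidate semantics using the
# meaning of the neighbouring word. The association table is an assumption.
from typing import List, Optional

CONTEXT_PREFERENCE = {
    "大写": "E",  # neighbouring word "uppercase" -> capital letter
    "小写": "e",  # neighbouring word "lowercase" -> small letter
}


def select_semantic(candidates: List[str],
                    neighbour_word: Optional[str] = None,
                    default: str = "1") -> str:
    """Pick one semantic for an ambiguous utterance such as "yi".

    candidates     -- semantics matched in the feature-character library, e.g. ["1", "E", "e"]
    neighbour_word -- word formed by the two single-word utterances before or after it, if any
    default        -- semantic used when no helpful context word is found
    """
    if len(candidates) == 1:
        return candidates[0]
    preferred = CONTEXT_PREFERENCE.get(neighbour_word)
    if preferred in candidates:
        return preferred
    # No helpful context: fall back to the digit reading, as in the example above.
    return default if default in candidates else candidates[0]
```

Under these assumptions, `select_semantic(["1", "E", "e"], "大写")` would return "E", while `select_semantic(["1", "E", "e"])` would fall back to "1".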
Specifically, the step of analyzing whether the two semantics corresponding to the two single-word utterances before or after the single-word utterance form a word when combined includes:

combining the two semantics corresponding to the two single-word utterances before or after the single-word utterance; analyzing whether a preset word database contains a preset word identical to the two semantics; if so, determining that the two semantics form a word; if not, determining that the two semantics corresponding to the two single-word utterances do not form a word when combined.
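A minimal sketch of this combination check, assuming the preset word database can be represented as an in-memory set of words (the set contents are examples only, not taken from the patent):

```python
# Hypothetical preset word database of two-character words.
PRESET_WORDS = {"大写", "小写", "数字"}


def combined_is_word(first_semantic: str, second_semantic: str) -> bool:
    """Combine the two neighbouring semantics and look the result up in the
    preset word database; True means the two semantics form a word."""
    return (first_semantic + second_semantic) in PRESET_WORDS
```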
In summary, the semantic recognition method provided in the embodiments of the present invention acquires the voice information to be recognized, identifies the duration of each single-word utterance in the voice information, compares the duration of each single-word utterance with the first preset value, and matches, according to the comparison result, the semantics corresponding to the single-word utterance in the preset semantic database. Recognizing each single-word utterance according to its duration makes it easy to distinguish digits, letters, and symbols in the voice information and increases recognition accuracy; it effectively solves the problem of recognizing spoken input of digits, letters, and symbols on devices with no display or keyboard, in particular when setting passwords of various kinds, and the semantic recognition method is simple, with a high recognition rate and a fast recognition speed.
Referring to FIG. 4, an embodiment of the present invention further provides a semantic recognition apparatus, including:

an acquiring unit 10 configured to obtain voice information to be recognized, the voice information being voice information that conforms to a preset rule;

a first identifying unit 20 configured to identify the duration of each single-word utterance in the voice information;

a comparing unit 30 configured to compare the duration of each single-word utterance with a first preset value;

a second identifying unit 40 configured to match, according to the comparison result, the semantics corresponding to the single-word utterance in a preset semantic database.

When semantic recognition is performed on voice information, confusion is common: for example, the utterance "Yi" may be recognized as the Chinese character "一", the digit "1", or the letter "E". When the pronunciations are identical, recognition is therefore easily unclear, which lowers recognition accuracy. This embodiment provides a semantic recognition apparatus that recognizes voice information conforming to a preset rule. Voice information conforming to the preset rule is speech in which the final sound is lengthened to represent digits, letters, and symbols, so that they can be distinguished from other Chinese characters. The pronunciation of a Chinese character usually lasts 0.2-0.4 s (seconds); when the pronunciation of a digit, letter, or symbol is lengthened to 1 s, it can be clearly distinguished from a Chinese character. For pronunciations that remain easy to confuse among digits, letters, and symbols, a distinguishing Chinese word can also be spoken before or after them, for example "number", "uppercase", or "lowercase". Through the above modules, the apparatus in this embodiment can accurately recognize the semantics corresponding to the voice information.

Specifically, when the acquiring unit 10 obtains the voice information to be recognized, the first identifying unit 20 identifies the duration of each single-word utterance (i.e., the length of its pronunciation), and a first preset value (which may be 0.4 s) is set. The comparing unit 30 compares the pronunciation length of each single-word utterance with the first preset value: when the pronunciation is shorter than the first preset value, it is judged to be the pronunciation of a Chinese character; when the pronunciation is not shorter than the first preset value, it is judged to be a possible pronunciation of a digit, letter, or symbol. Preferably, the first preset value may also be 1 s.

After the comparing unit 30 has compared the duration of the single-word utterance with the first preset value, the second identifying unit 40 can match, according to the comparison result, the semantics corresponding to the single-word utterance in the preset semantic database. Using different recognition methods for different pronunciation types makes it easy to recognize each single-word utterance accurately, which improves both the accuracy and the speed of semantic recognition.

In one embodiment, a Chinese-character library and a feature-character library are provided. When the second identifying unit 40 matches the semantics of a single-word utterance, if the comparison result of the comparing unit 30 is that the duration of the single-word utterance is less than the first preset value, the preset semantic database is the Chinese-character semantic library, which includes preset single-word utterances and the corresponding preset Chinese characters;

if the comparison result of the comparing unit 30 is that the duration of the single-word utterance is not less than the first preset value, the preset semantic database is the feature-character semantic library, which includes preset single-word utterances and at least one of corresponding preset digits, letters, and symbols.
Referring to FIG. 5, in another embodiment, the semantic recognition apparatus further includes:

a first monitoring unit 50 configured to monitor whether the semantics corresponding to the single-word utterance are matched successfully;

a matching unit 60 configured to match the semantics corresponding to the single-word utterance in the Chinese-character semantic library when the matching of the semantics corresponding to the single-word utterance is unsuccessful.

If the single-word utterance cannot be matched in the feature-character semantic library, it is judged to be a misrecognition, and the single-word utterance is then recognized in the Chinese-character semantic library.

Referring to FIG. 6, in yet another embodiment, the semantic recognition apparatus further includes:

a first monitoring unit 50a configured to monitor the number of semantics matched for the single-word utterance;

a first analyzing unit 60a configured to analyze, when the single-word utterance corresponds to multiple semantics, whether the two semantics corresponding to the two single-word utterances before or after it form a word when combined;

a second analyzing unit 70 configured to analyze the meaning of the word if they form a word when combined;

a selecting unit 80 configured to select, from the plurality of semantics corresponding to the single-word utterance, the semantic associated with the meaning of the word.

In this embodiment, if the first monitoring unit 50a finds that multiple semantics are matched for the single-word utterance in the feature-character semantic library, recognition is easily confused and unclear; for example, the single-word utterance "Yi" may be the digit "1" or the letter "E" or "e". Therefore, the first monitoring unit 50a monitors the number of semantics matched for the single-word utterance; when the single-word utterance corresponds to multiple semantics, the first analyzing unit 60a analyzes whether the two semantics corresponding to the two single-word utterances before or after it form a word when combined; if so, the second analyzing unit 70 analyzes the meaning of that word, and the selecting unit 80 then selects, from the plurality of semantics corresponding to the single-word utterance, the semantic associated with that meaning. For example, if the first analyzing unit 60a finds that the semantics corresponding to the two single-word utterances form the Chinese word for "uppercase", the second analyzing unit 70 analyzes the meaning of "uppercase" and the selecting unit 80 selects the corresponding semantic, the letter "E"; if the first analyzing unit 60a finds that the two semantics form the word for "lowercase", the second analyzing unit 70 analyzes its meaning and the selecting unit 80 selects the letter "e". If the second analyzing unit 70 finds that the semantics corresponding to the two single-word utterances mean neither "uppercase" nor "lowercase", the selecting unit 80 selects the digit "1" as the semantic corresponding to the single-word utterance "Yi".
Referring to FIG. 7, the first analyzing unit 60a includes:

a combining subunit 601 configured to combine the two semantics corresponding to the two single-word utterances before or after the single-word utterance;

an analyzing subunit 602 configured to analyze whether a preset word database contains a preset word identical to the two semantics;

a determining subunit 603 configured to determine that the two semantics form a word when the preset word database contains a preset word identical to the two semantics, and to determine that the two semantics corresponding to the two single-word utterances do not form a word when combined if it does not.
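For orientation only, the cooperation of these units can be pictured as a single class that wires the pieces together; every name and every piece of per-unit logic below is an assumption made for the sketch, not the patent's implementation.

```python
# Purely illustrative composition of the units described above into one class.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    pronunciation: str
    duration: float


class SemanticRecognizer:
    def __init__(self,
                 hanzi_lib: Dict[str, List[str]],
                 feature_lib: Dict[str, List[str]],
                 context_preference: Dict[str, str],
                 threshold: float = 0.4) -> None:
        self.hanzi_lib = hanzi_lib                    # used by the second identifying unit
        self.feature_lib = feature_lib                # used by the second identifying unit
        self.context_preference = context_preference  # stands in for the word-analysis subunits
        self.threshold = threshold                    # the comparing unit's first preset value

    def recognize(self, segments: List[Segment]) -> List[str]:
        out: List[str] = []
        for seg in segments:
            if seg.duration < self.threshold:         # comparing unit: short -> Chinese character
                out.append(self.hanzi_lib.get(seg.pronunciation, ["?"])[0])
                continue
            candidates = self.feature_lib.get(seg.pronunciation, [])
            if not candidates:                        # matching unit: fall back to Chinese characters
                out.append(self.hanzi_lib.get(seg.pronunciation, ["?"])[0])
            elif len(candidates) == 1:                # first monitoring unit: only one match
                out.append(candidates[0])
            else:                                     # analysis and selection units: use preceding word
                context = "".join(out[-2:])
                preferred = self.context_preference.get(context)
                out.append(preferred if preferred in candidates else candidates[0])
        return out
```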
In summary, the semantic recognition method and apparatus provided in the embodiments of the present invention acquire the voice information to be recognized, identify the duration of each single-word utterance in the voice information, compare the duration of each single-word utterance with the first preset value, and match, according to the comparison result, the semantics corresponding to the single-word utterance in the preset semantic database. Recognizing each single-word utterance according to its duration makes it easy to distinguish digits, letters, and symbols in the voice information and increases recognition accuracy.
Those skilled in the art will understand that each block of these structural diagrams, block diagrams, and/or flow diagrams, and combinations of blocks therein, can be implemented by computer program instructions. Those skilled in the art will understand that these computer program instructions may be provided to a general-purpose computer, a special-purpose computer, or a processor of another programmable data-processing apparatus, so that the schemes specified in one or more blocks of the structural diagrams, block diagrams, and/or flow diagrams disclosed in the present invention are executed by the computer or by that processor.

Those skilled in the art will understand that the steps, measures, and schemes in the various operations, methods, and processes discussed in the present invention may be alternated, changed, combined, or deleted. Further, other steps, measures, and schemes in the various operations, methods, and processes discussed in the present invention may also be alternated, changed, rearranged, decomposed, combined, or deleted. Further, steps, measures, and schemes in the prior art that correspond to the various operations, methods, and processes disclosed in the present invention may also be alternated, changed, rearranged, decomposed, combined, or deleted.

The foregoing is merely a preferred embodiment of the present invention and does not limit the patent scope of the present invention. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (16)

  1. A semantic recognition method, comprising the following steps:
    acquiring voice information to be recognized;
    identifying the duration of each single-word utterance in the voice information;
    comparing the duration of each single-word utterance with a first preset value;
    according to the comparison result, matching the semantics corresponding to the single-word utterance in a preset semantic database.
  2. The semantic recognition method according to claim 1, wherein, when the comparison result is that the duration of the single-word utterance is less than the first preset value, the preset semantic database is a Chinese-character semantic library, the Chinese-character semantic library comprising preset single-word utterances and the corresponding preset Chinese characters.
  3. The semantic recognition method according to claim 2, wherein, when the comparison result is that the duration of the single-word utterance is not less than the first preset value, the preset semantic database is a feature-character semantic library, the feature-character semantic library comprising preset single-word utterances and at least one of corresponding preset digits, letters, and symbols.
  4. The semantic recognition method according to claim 3, wherein, after the step of matching the semantics corresponding to the single-word utterance in the preset semantic database, the method comprises:
    monitoring whether the semantics corresponding to the single-word utterance are matched successfully;
    if not, matching the semantics corresponding to the single-word utterance in the Chinese-character semantic library.
  5. The semantic recognition method according to claim 3, wherein, after the step of matching the semantics corresponding to the single-word utterance in the preset semantic database, the method comprises:
    monitoring the number of semantics matched for the single-word utterance;
    if the single-word utterance corresponds to multiple semantics, analyzing whether the two semantics corresponding to the two single-word utterances before or after the single-word utterance form a word when combined;
    if so, analyzing the meaning of the word;
    selecting, from the plurality of semantics corresponding to the single-word utterance, the semantic associated with the meaning of the word.
  6. The semantic recognition method according to claim 5, wherein the step of analyzing whether the two semantics corresponding to the two single-word utterances before or after the single-word utterance form a word when combined comprises:
    combining the two semantics corresponding to the two single-word utterances before or after the single-word utterance;
    analyzing whether a preset word database contains a preset word identical to the two semantics;
    if so, determining that the two semantics form a word.
  7. The semantic recognition method according to claim 1, wherein the first preset value is 1 second.
  8. The semantic recognition method according to claim 1, wherein, after the step of analyzing whether the preset word database contains a preset word identical to the two semantics, the method comprises:
    if not, determining that the two semantics corresponding to the two single-word utterances do not form a word when combined.
  9. A semantic recognition apparatus, comprising:
    an acquiring unit configured to obtain voice information to be recognized;
    a first identifying unit configured to identify the duration of each single-word utterance in the voice information;
    a comparing unit configured to compare the duration of each single-word utterance with a first preset value;
    a second identifying unit configured to match, according to the comparison result, the semantics corresponding to the single-word utterance in a preset semantic database.
  10. The semantic recognition apparatus according to claim 9, wherein, when the comparison result is that the duration of the single-word utterance is less than the first preset value, the preset semantic database is a Chinese-character semantic library, the Chinese-character semantic library comprising preset single-word utterances and the corresponding preset Chinese characters.
  11. The semantic recognition apparatus according to claim 10, wherein, when the comparison result is that the duration of the single-word utterance is not less than the first preset value, the preset semantic database is a feature-character semantic library, the feature-character semantic library comprising preset single-word utterances and at least one of corresponding preset digits, letters, and symbols.
  12. The semantic recognition apparatus according to claim 10, further comprising:
    a first monitoring unit configured to monitor whether the semantics corresponding to the single-word utterance are matched successfully;
    a matching unit configured to match the semantics corresponding to the single-word utterance in the Chinese-character semantic library when the matching of the semantics corresponding to the single-word utterance is unsuccessful.
  13. The semantic recognition apparatus according to claim 10, further comprising:
    a first monitoring unit configured to monitor the number of semantics matched for the single-word utterance;
    a first analyzing unit configured to analyze, when the single-word utterance corresponds to multiple semantics, whether the two semantics corresponding to the two single-word utterances before or after the single-word utterance form a word when combined;
    a second analyzing unit configured to analyze the meaning of the word if they form a word when combined;
    a selecting unit configured to select, from the plurality of semantics corresponding to the single-word utterance, the semantic associated with the meaning of the word.
  14. The semantic recognition apparatus according to claim 13, wherein the first analyzing unit comprises:
    a combining subunit configured to combine the two semantics corresponding to the two single-word utterances before or after the single-word utterance;
    an analyzing subunit configured to analyze whether a preset word database contains a preset word identical to the two semantics;
    a determining subunit configured to determine that the two semantics form a word when the preset word database contains a preset word identical to the two semantics.
  15. The semantic recognition method according to claim 1, wherein the first preset value is 1 second.
  16. The semantic recognition apparatus according to claim 14, wherein the determining subunit is further configured to determine that the two semantics corresponding to the two single-word utterances do not form a word when combined if the preset word database does not contain a preset word identical to the two semantics.
PCT/CN2018/072008 2017-12-21 2018-01-09 Semantic recognition method and apparatus (语义识别方法及装置) WO2019119553A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711397017.5 2017-12-21
CN201711397017.5A CN108133706B (zh) 2017-12-21 2017-12-21 语义识别方法及装置

Publications (1)

Publication Number Publication Date
WO2019119553A1 true WO2019119553A1 (zh) 2019-06-27

Family

ID=62391316

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/072008 WO2019119553A1 (zh) 2017-12-21 2018-01-09 语义识别方法及装置

Country Status (2)

Country Link
CN (1) CN108133706B (zh)
WO (1) WO2019119553A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108133706B (zh) * 2017-12-21 2020-10-27 深圳市沃特沃德股份有限公司 语义识别方法及装置


Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2947143B2 (ja) * 1995-10-16 1999-09-13 ソニー株式会社 音声認識装置及びナビゲーション装置
CN1137449C (zh) * 1997-09-19 2004-02-04 国际商业机器公司 在中文语音识别系统中识别字母/数字串的方法
US7689404B2 (en) * 2004-02-24 2010-03-30 Arkady Khasin Method of multilingual speech recognition by reduction to single-language recognizer engine components
CN1674092B (zh) * 2004-03-26 2010-06-09 松下电器产业株式会社 连续数字识别的声韵母跨词建模、解码方法及系统
CN1889171B (zh) * 2005-06-29 2010-09-01 诺基亚(中国)投资有限公司 用于识别字符/字符串的语音识别方法和系统
CN101436404A (zh) * 2007-11-16 2009-05-20 鹏智科技(深圳)有限公司 可会话的类生物装置及其会话方法
KR101493552B1 (ko) * 2008-05-14 2015-02-13 닛토보 온쿄 엔지니어링 가부시키가이샤 신호판정방법, 신호판정장치, 프로그램, 신호판정시스템
US20090326945A1 (en) * 2008-06-26 2009-12-31 Nokia Corporation Methods, apparatuses, and computer program products for providing a mixed language entry speech dictation system
CN101794576A (zh) * 2010-02-02 2010-08-04 重庆大学 一种脏话检测助手及其使用方法
CN103943109A (zh) * 2014-04-28 2014-07-23 深圳如果技术有限公司 一种将语音转换为文字的方法及装置
CN105741832B (zh) * 2016-01-27 2020-01-07 广东外语外贸大学 一种基于深度学习的口语评测方法和系统
CN107305768B (zh) * 2016-04-20 2020-06-12 上海交通大学 语音交互中的易错字校准方法
CN107195300B (zh) * 2017-05-15 2019-03-19 珠海格力电器股份有限公司 语音控制方法和系统
CN107423275A (zh) * 2017-06-27 2017-12-01 北京小度信息科技有限公司 订单信息生成方法及装置

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040199389A1 (en) * 2001-08-13 2004-10-07 Hans Geiger Method and device for recognising a phonetic sound sequence or character sequence
CN1750121A (zh) * 2004-09-16 2006-03-22 北京中科信利技术有限公司 一种基于语音识别及语音分析的发音评估方法
CN201323053Y (zh) * 2008-12-02 2009-10-07 无敌科技(西安)有限公司 自动分割单字语音信号的装置
CN102237088A (zh) * 2011-06-17 2011-11-09 盛乐信息技术(上海)有限公司 语音识别多信息文本获取装置及方法
CN103559880A (zh) * 2013-11-08 2014-02-05 百度在线网络技术(北京)有限公司 语音输入系统和方法
CN108133706A (zh) * 2017-12-21 2018-06-08 深圳市沃特沃德股份有限公司 语义识别方法及装置

Also Published As

Publication number Publication date
CN108133706A (zh) 2018-06-08
CN108133706B (zh) 2020-10-27

Similar Documents

Publication Publication Date Title
CN108962282B (zh) 语音检测分析方法、装置、计算机设备及存储介质
CN108847241B (zh) 将会议语音识别为文本的方法、电子设备及存储介质
CN105931644B (zh) 一种语音识别方法及移动终端
CN107798052B (zh) 词典更新装置及词典更新方法
CN107578770B (zh) 网络电话语音识别方法、装置、计算机设备和存储介质
US6763331B2 (en) Sentence recognition apparatus, sentence recognition method, program, and medium
CN112951275B (zh) 语音质检方法、装置、电子设备及介质
WO2005101235A1 (ja) 対話支援装置
KR20070001020A (ko) 문자 규정 방법 및 문자 선택 장치
CN103578471A (zh) 语音辨识方法及其电子装置
TW201606750A (zh) 使用外國字文法的語音辨識
US11373638B2 (en) Presentation assistance device for calling attention to words that are forbidden to speak
TW201337911A (zh) 電子裝置以及語音識別方法
US9805740B2 (en) Language analysis based on word-selection, and language analysis apparatus
WO2019119553A1 (zh) 语义识别方法及装置
US10380998B2 (en) Voice and textual interface for closed-domain environment
US9978368B2 (en) Information providing system
CN115881108A (zh) 语音识别方法、装置、设备及存储介质
JP4220151B2 (ja) 音声対話装置
US20200211533A1 (en) Processing method, device and electronic apparatus
JP2010197709A (ja) 音声認識応答方法、音声認識応答システム、及びそのプログラム
WO2013035293A1 (ja) 音声認識装置
CN110827800A (zh) 基于语音的性别识别方法及其装置、存储介质和设备
JP6538399B2 (ja) 音声処理装置、音声処理方法およびプログラム
CN109658933A (zh) 一种语音识别解锁方法、移动终端及存储器

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18890925

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18890925

Country of ref document: EP

Kind code of ref document: A1