WO2020024352A1 - Method, apparatus, computer device, and storage medium for adding symbols in speech recognition - Google Patents


Info

Publication number
WO2020024352A1
WO2020024352A1 (PCT/CN2018/104046)
Authority
WO
WIPO (PCT)
Prior art keywords
duration
comma
period
mute segment
segment
Prior art date
Application number
PCT/CN2018/104046
Other languages
English (en)
French (fr)
Inventor
彭捷
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2020024352A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G10L 15/04 Segmentation; Word boundary detection
    • G10L 15/05 Word boundary detection

Definitions

  • the present application relates to the field of speech recognition, and in particular, to a method, device, computer equipment, and storage medium for adding symbols in speech recognition.
  • a detection module configured to obtain speech to be recognized, perform speech recognition on the speech to be recognized, synchronously detect a silence segment in the speech to be recognized, and determine whether the duration of the silence segment exceeds a first duration;
  • an output module configured to, when the duration of the silence segment exceeds the first duration, output the text sequence before the silence segment and, according to the duration of the silence segment, insert a comma or period at the position in the text sequence corresponding to the silence segment;
  • a correction module configured to obtain the speech to be recognized after the silence segment and perform speech recognition on it, while correcting the comma or period inserted into the text sequence according to a preset discrimination model.
  • a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • the processor executes the computer-readable instructions, the following steps are implemented:
  • One or more non-volatile readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:
  • FIG. 1 is a schematic diagram of an application environment of a method for adding a symbol in speech recognition according to an embodiment of the present application
  • FIG. 2 is a flowchart of a method for adding symbols in speech recognition according to an embodiment of the present application
  • FIG. 3 is a flowchart of step S20 of a method for adding symbols in speech recognition according to an embodiment of the present application;
  • FIG. 4 is a flowchart of step S20 of a method for adding symbols in speech recognition according to another embodiment of the present application;
  • FIG. 5 is a flowchart of step S30 of a method for adding symbols in speech recognition according to an embodiment of the present application;
  • FIG. 6 is a flowchart of step S303 of a method for adding symbols in speech recognition according to an embodiment of the present application;
  • FIG. 7 is a principle block diagram of a symbol adding device in speech recognition according to an embodiment of the present application.
  • FIG. 8 is a schematic block diagram of an output module of a symbol adding device in speech recognition according to an embodiment of the present application.
  • FIG. 9 is a principle block diagram of a correction module of a symbol adding device in speech recognition according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a computer device in an embodiment of the present application.
  • the method for adding symbols in speech recognition can be applied in the application environment as shown in FIG. 1, in which a client (computer device) communicates with a server through a network.
  • the client includes, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, cameras, and portable wearable devices.
  • the server can be implemented by an independent server or a server cluster composed of multiple servers.
  • a method for adding symbols in speech recognition is provided.
  • the method is applied to the server in FIG. 1 as an example, and includes the following steps:
  • The speech to be recognized may be obtained from various audio types, such as a recording, the audio of an unsubtitled video, a piece of music, or a voice conversation; the silence segment may be located at the beginning, middle, or end of a sentence in the speech to be recognized.
  • Specifically, during speech recognition, the silence segment in the speech to be recognized is detected synchronously, and it is determined whether the duration of the silence segment exceeds the first duration.
  • The first duration can be set as required. When the duration of the silence segment exceeds (is greater than) the first duration, the silence segment is by default located at the end of a sentence in the speech to be recognized.
  • The sentence then requires a pause, and a punctuation mark can be inserted at the position corresponding to the silence segment; when the duration of the silence segment does not exceed (is less than or equal to) the first duration, the silence segment is by default located in the middle of a sentence in the speech to be recognized.
  • In that case the silence segment does not affect the output of the text sequence corresponding to the speech to be recognized, and at the position corresponding to the silence segment the text sequence can continue to be output through speech recognition, without inserting any punctuation mark as a spacer or pause.
  • Speech recognition is performed on the speech to be recognized while silence segments in it are detected synchronously; that is, generating the corresponding text sequence and inserting punctuation marks at the positions of qualifying silence segments can both be done in real time.
  • During conversion of the speech to be recognized into a text sequence, the silence segments are detected synchronously; when the duration of a silence segment exceeds (is greater than) the first duration, the silence segment is by default located at the end of a sentence in the speech to be recognized. The sentence then requires a pause, and a punctuation mark can be inserted at the position corresponding to the silence segment. Further, it can be determined whether the duration of the silence segment exceeds a second duration, and on that basis whether the inserted punctuation mark should be a comma or a period.
  • Otherwise, the silence segment is by default merely a normal interval between words within one sentence rather than a pause between two sentences; no punctuation needs to be inserted, and the text sequence simply continues to be output.
  • the inserted comma or period can be modified.
  • The method for adding symbols in speech recognition performs speech recognition on the speech to be recognized while synchronously detecting silence segments in it and determining whether their duration exceeds the first duration. This avoids mistakenly splitting a single sentence in the middle, and the text sequence corresponding to the speech can be output in real time and segmented into sentences. A comma or period is inserted according to the position and duration of the silence segment, so there is no need to wait until all of the user's speech has been recognized as a text sequence before adding punctuation at the pause positions between sentences; punctuation can be output in real time, achieving accurate sentence segmentation.
  • The period or comma inserted into the text sequence is then corrected through a preset discrimination model.
  • On the basis of accurate sentence segmentation, the preset discrimination model accurately determines the tone type of the sentence, achieving the purpose of accurately expressing the sentence's emotion.
  • The method for adding symbols in speech recognition provided by the present application can thus add punctuation and recognize the text sequence simultaneously in real time, which significantly improves the efficiency of speech recognition and the user experience.
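The real-time behavior described above can be sketched as a simple streaming loop. The Python below is an illustrative sketch only, not the patent's implementation: the `(is_silence, char)` frame format, the function name `stream_punctuate`, and the threshold values are all assumptions for the example.

```python
# Illustrative sketch: emit recognized text in real time and insert
# punctuation as soon as a qualifying silence segment ends.

FIRST_DURATION = 3   # frames; a longer pause marks a sentence boundary
SECOND_DURATION = 6  # frames; a longer pause takes a period instead of a comma

def stream_punctuate(frames):
    """frames: iterable of (is_silence, char); char is None for silence frames."""
    out = []
    silence_run = 0
    for is_silence, char in frames:
        if is_silence:
            silence_run += 1
            continue
        # Speech resumed: decide what the preceding pause meant.
        if silence_run > SECOND_DURATION:
            out.append(".")   # long pause: end of sentence, insert a period
        elif silence_run > FIRST_DURATION:
            out.append(",")   # medium pause: clause boundary, insert a comma
        # A pause of at most FIRST_DURATION frames is a normal word gap.
        silence_run = 0
        out.append(char)
    return "".join(out)
```

For example, a 5-frame pause between "ab" and "cd" yields "ab,cd", while an 8-frame pause yields "ab.cd"; nothing is inserted for a 2-frame pause.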
  • After step S10, the following steps are further included:
  • When performing speech recognition on the speech to be recognized and synchronously detecting a silence segment in it, if the duration of the silence segment does not exceed the first duration, it can be determined that the silence segment is located in the middle of a sentence in the speech to be recognized, and no sentence segmentation is required.
  • In that case the silence segment does not affect the output of the text sequence corresponding to the speech to be recognized, and the text sequence after the silence segment can continue to be output, so that the text sequence corresponding to the speech to be recognized is output continuously.
  • Optionally, a placeholder mute symbol may be temporarily output in the text sequence at the position corresponding to each silence segment, and later replaced with the recognized text sequence corresponding to the following speech or with the punctuation mark corresponding to the silence segment.
  • Step S20, that is, outputting the text sequence before the silence segment when its duration exceeds the first duration and inserting a comma or period into the text sequence according to the duration of the silence segment, includes the following steps:
  • When the duration of the silence segment exceeds the first duration, the text sequence before the silence segment is output.
  • A punctuation mark can then be inserted into the output text sequence, and it can further be judged whether the duration of the silence segment exceeds a second duration in order to determine whether the position corresponding to the silence segment takes a comma or a period.
  • The second duration can be set as required, and the second duration must be greater than the first duration.
  • If the duration of the silence segment does not exceed the second duration, a comma should be inserted at the position corresponding to the silence segment; if the duration of the silence segment exceeds the second duration, a period should be inserted at the position corresponding to the silence segment. For example, if the frame length of each frame of the speech to be recognized is 20 ms, and a pause lasting 3 to 6 frames corresponds to a comma, the first duration can be set to 3 frames and the second duration to 6 frames. When the duration of the silence segment exceeds the first duration, it can then be further determined whether it exceeds the second duration.
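Using the example values just given (20 ms frames, first duration of 3 frames, second duration of 6 frames), the two-threshold decision can be written as a small helper. This is a sketch under those assumed values, not a prescribed implementation:

```python
# Two-threshold decision from the example above: 20 ms frames, first
# duration = 3 frames, second duration = 6 frames. Values are illustrative.

FRAME_MS = 20
FIRST_DURATION_FRAMES = 3
SECOND_DURATION_FRAMES = 6

def symbol_for_pause(pause_ms):
    """Map a silence-segment duration in milliseconds to the symbol to insert."""
    frames = pause_ms / FRAME_MS
    if frames <= FIRST_DURATION_FRAMES:
        return None   # does not exceed the first duration: no punctuation
    if frames <= SECOND_DURATION_FRAMES:
        return ","    # exceeds the first but not the second duration: comma
    return "."        # exceeds the second duration: period
```

A 40 ms pause (2 frames) yields no punctuation, a 100 ms pause (5 frames) a comma, and a 200 ms pause (10 frames) a period.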
  • both the first duration and the second duration may be increased or decreased according to requirements.
  • Specifically, the actual speech rate of the speaker producing the speech to be recognized can be obtained and compared with a preset ideal speech rate (the ideal speech rate is associated with preset initial values of the first duration and the second duration).
  • If the actual speech rate is greater than the ideal speech rate, the first duration can be reduced, so that the minimum threshold for inserting a comma at the position corresponding to the silence segment is lowered; if the actual speech rate is less than the ideal speech rate, the first duration can be increased, so that the minimum threshold for inserting a comma is raised.
  • Likewise, when the actual speech rate is greater than the ideal speech rate, the second duration can be reduced, so that the minimum threshold for inserting a period at the position corresponding to the silence segment is lowered; when the actual speech rate is less than the ideal speech rate, the second duration can be increased, so that the minimum threshold for inserting a period is raised.
  • The first duration and the second duration together form a duration range, and this range can be adjusted: when the actual speech rate is greater than the ideal speech rate, both the first duration and the second duration decrease; when the actual speech rate is less than the ideal speech rate, both increase.
  • For example, let the frame length of each frame of the speech to be recognized be 20 ms and the ideal speech rate be 0.32 characters per frame. If the duration of the speech to be recognized is 100 frames and the number of characters output within those 100 frames is 16, then from the obtained duration and the number of characters in the corresponding output text sequence, the actual speech rate is 0.16 characters per frame. Since the actual speech rate is less than the ideal speech rate, the first duration can be increased.
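The speech-rate check in this example can be expressed numerically as follows. The linear rescaling rule (scaling both durations by the ratio of ideal to actual rate) is an assumption; the text only states the direction in which each threshold should move:

```python
# Speech-rate-adaptive thresholds. The ideal rate (0.32 characters per frame)
# matches the example above; scaling both durations by ideal/actual rate is an
# assumed rule that moves them in the direction the text describes.

IDEAL_RATE = 0.32  # characters per frame

def adjust_durations(first, second, chars_output, frames_elapsed):
    """Return (first, second) durations rescaled for the observed speech rate."""
    actual_rate = chars_output / frames_elapsed
    scale = IDEAL_RATE / actual_rate  # > 1 for slow speech, < 1 for fast speech
    return first * scale, second * scale
```

With 16 characters in 100 frames the actual rate is 0.16 characters per frame, half the ideal rate, so both thresholds double; a fast speaker at 0.64 characters per frame would halve them.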
  • For example, let the frame length of each frame of the speech to be recognized be 20 ms. If a pause lasting 3 to 6 frames corresponds to a comma, the first duration can be set to 3 frames and the second duration to 6 frames. Then, during synchronous detection, when the duration of the silence segment is greater than 3 frames and less than or equal to 6 frames, a comma is inserted at the position corresponding to the silence segment.
  • Under the same settings, when the duration of the silence segment is greater than 6 frames, a period is inserted at the position corresponding to the silence segment. It can be understood that once all the text before the silence segment has been output and the silence segment is recognized as a comma or period, the comma or period is inserted into the output text sequence in real time, while the subsequent speech to be recognized continues to be recognized and its corresponding text sequence output in real time.
  • With the method for adding symbols in speech recognition provided in the present application, when it is synchronously detected that the duration of a silence segment exceeds the first duration, a comma or period is inserted at the position in the text sequence corresponding to the silence segment according to the duration of the silence segment, without waiting until all of the user's speech has been recognized as a text sequence before adding punctuation at the pause positions between sentences. Punctuation can thus be output in real time, which improves the output efficiency of symbols in speech recognition, achieves accurate sentence segmentation, and improves the efficiency of speech recognition.
  • In another embodiment, step S20, that is, outputting the text sequence before the silence segment when its duration exceeds the first duration and inserting a comma or period at the position in the text sequence corresponding to the silence segment, includes the following steps:
  • The text sequence before the silence segment is output, a sentence-segmentation identifier is automatically generated after the output text sequence, and the text sequence continues to be output in real time after the identifier. That is, when the duration of the silence segment exceeds the first duration, the silence segment is by default located at the end of a sentence in the speech to be recognized; a pause is required there, and the segmentation identifier can be automatically generated at the end of that speech segment.
  • The sentence-segmentation identifier may be a space, an underscore, or a vertical bar, output directly after the output text sequence when the duration of the silence segment exceeds the first duration, as in "It's sunny today_".
  • S205. Acquire the speech to be recognized after the silence segment and perform speech recognition on it, while determining whether the duration of the silence segment exceeds a second duration, where the second duration is greater than the first duration.
  • In other words, speech recognition is not interrupted: the speech to be recognized after the silence segment continues to be acquired and recognized, and its text sequence is output in real time, while it is simultaneously determined whether the duration of the silence segment exceeds the second duration so that the sentence-segmentation identifier can be replaced with a comma or a period.
  • That is, the operation of replacing the segmentation identifier is synchronized with the operation of recognizing the speech after it, so identifying the segmentation marker does not delay the speech-recognition process.
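This deferred-decision variant can be sketched as a two-phase operation: emit a placeholder identifier immediately, then replace it once the pause's full duration is known, without pausing recognition. The underscore identifier, the function names, and the 6-frame second duration are example choices, not specified by the text:

```python
# Sketch of the deferred-decision variant: a segmentation identifier is
# output immediately when the first duration is exceeded, and replaced with a
# comma or period later, once the silence segment's full length is known.

SEGMENT_MARK = "_"

def mark_sentence_break(text):
    """Emit the segmentation identifier directly after the output text."""
    return text + SEGMENT_MARK

def resolve_mark(text, pause_frames, second_duration=6):
    """Replace the earliest pending identifier once the pause has ended."""
    symbol = "." if pause_frames > second_duration else ","
    return text.replace(SEGMENT_MARK, symbol, 1)
```

For example, "It's sunny today" first becomes "It's sunny today_", and once an 8-frame pause is measured the identifier is resolved to a period.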
  • Step S30, that is, acquiring the speech to be recognized after the silence segment and performing speech recognition on it while correcting the comma or period already inserted into the text sequence, includes the following steps:
  • For the training text, various types of text content and the punctuation marks corresponding to that content may be collected from the Internet or from books, and the training text is divided into single sentences using commas or other symbols (such as periods, question marks, or exclamation marks) as delimiters. The single sentences may be of different lengths and moods, and the training text may include sentences of different tone types (the tone types include, but are not limited to, declarative sentences, interrogative sentences, imperative sentences, and exclamatory sentences).
  • The rules for using commas or other symbols can be obtained from the discriminant model (for example, the probability of replacing a period with a question mark or an exclamation mark after the silence segment can be obtained from the discriminant model).
  • The discriminant model can determine whether the tone symbol with the highest output probability is consistent with the detected comma or period, so as to correct the detected comma or period in sequence according to the result.
  • In other words, a comma or period is first inserted into the text sequence recognized before the silence segment, and the inserted comma or period is then further corrected through the discriminant model: whenever a comma or period is detected in the text sequence output before the silence segment, it is corrected in sequence through the discriminant model.
  • The method for adding symbols in speech recognition provided by the present application acquires the speech to be recognized after the silence segment and performs speech recognition on it, while correcting the comma or period already inserted into the text sequence according to the preset discrimination model.
  • This improves the accuracy of the symbols in speech recognition, so that the tone type of the sentence can be determined accurately and the sentence's emotion expressed accurately.
  • Step S303, that is, correcting the detected comma or period in sequence through the discriminant model when a comma or period is detected in the output text sequence, includes the following steps:
  • determining the tone type of the sentence before the comma or period by the discrimination model includes:
  • If the sentence is declarative, the corresponding output tone symbol is a comma or a period; for example: "It will rain tomorrow.", "He said that he won't go home today."
  • If the sentence is interrogative, the corresponding output tone symbol is a question mark; for example: "Why don't you go?", "Are you back today?", "Were you happy yesterday?"
  • If the sentence is imperative or exclamatory, the corresponding output tone symbol is an exclamation mark; for example: "No smoking!", "Wow! This dress is so beautiful!"
  • S3032. Obtain the output probability of each tone symbol inserted at the end of the sentence for the corresponding tone type. Through the discrimination model, the output probability of each tone symbol for the sentence before the comma or period can be obtained, and the tone symbol with the highest output probability selected.
  • S3033. Determine whether the tone symbol with the highest output probability is consistent with the comma or period. By judging this consistency, the comma or period inserted into the text sequence output before the silence segment is corrected.
  • For example, if the output text sequence is "Do you talk about guitar.", the discriminant model can find that the question mark has the highest output probability, that is, the probability of outputting a question mark is greater than that of outputting a comma or a period.
  • The period in the text sequence can therefore be corrected to a question mark.
  • Meanwhile, speech recognition of the subsequent speech to be recognized continues, and the output text obtained may be "Do you talk about guitar? No,".
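The correction step can be illustrated with a toy stand-in for the discrimination model. A real model would be trained on sentences of different tone types as described above; the keyword cues, probability values, and function names below are purely illustrative assumptions:

```python
# Toy stand-in for the discrimination model of step S303: score the candidate
# tone symbols for the sentence preceding an inserted comma/period, then keep
# or replace the inserted symbol (S3033). Cues and probabilities are made up.

QUESTION_CUES = ("do you", "are you", "why")
EXCLAIM_CUES = ("wow", "no smoking")

def tone_probabilities(sentence):
    """Return assumed output probabilities for each candidate tone symbol."""
    s = sentence.lower()
    if any(cue in s for cue in QUESTION_CUES):
        return {"?": 0.8, ".": 0.15, "!": 0.05}
    if any(cue in s for cue in EXCLAIM_CUES):
        return {"!": 0.7, ".": 0.25, "?": 0.05}
    return {".": 0.9, "?": 0.05, "!": 0.05}

def correct_symbol(sentence, inserted):
    """Keep or replace an inserted comma/period per step S3033."""
    probs = tone_probabilities(sentence)
    best = max(probs, key=probs.get)
    if best == ".":
        return inserted  # declarative tone: the comma/period stands as inserted
    return best          # otherwise replace with the most probable tone symbol
```

For the example above, `correct_symbol("Do you talk about guitar", ".")` replaces the period with a question mark, while a declarative sentence keeps its inserted comma or period.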
  • a device for adding symbols in speech recognition corresponds to the method for adding symbols in speech recognition in the above embodiment.
  • the device for adding symbols in speech recognition includes a detection module 110, an output module 120, and a correction module 130.
  • the detailed description of each function module is as follows:
  • The detection module 110 is configured to acquire speech to be recognized, perform speech recognition on the speech to be recognized, synchronously detect a silence segment in the speech to be recognized, and determine whether the duration of the silence segment exceeds a first duration.
  • The output module 120 is configured to, when the duration of the silence segment exceeds the first duration, output the text sequence before the silence segment, and insert a comma or period at the position in the text sequence corresponding to the silence segment according to the duration of the silence segment.
  • The correction module 130 is configured to acquire the speech to be recognized after the silence segment and perform speech recognition on it, while correcting the comma or period inserted into the text sequence according to a preset discrimination model.
  • The output module 120 specifically includes a judgment sub-module 121, a comma output sub-module 122, and a period output sub-module 123.
  • the detailed description of each function sub-module is as follows:
  • The judgment sub-module 121 is configured to output the text sequence before the silence segment when the duration of the silence segment exceeds the first duration, and determine whether the duration of the silence segment exceeds a second duration, where the second duration is greater than the first duration.
  • The comma output sub-module 122 is configured to insert a comma at the position corresponding to the silence segment when the duration of the silence segment does not exceed the second duration.
  • The period output sub-module 123 is configured to insert a period at the position corresponding to the silence segment when the duration of the silence segment exceeds the second duration.
  • The output module 120 is further configured to: when the duration of the silence segment exceeds the first duration, output the text sequence before the silence segment and automatically generate a sentence-segmentation identifier after the text sequence; acquire the speech to be recognized after the silence segment and perform speech recognition on it, while determining whether the duration of the silence segment exceeds a second duration, where the second duration is greater than the first duration; when the duration of the silence segment does not exceed the second duration, replace the sentence-segmentation identifier with a comma; and when the duration of the silence segment exceeds the second duration, replace the sentence-segmentation identifier with a period.
  • the correction module 130 specifically includes a training sub-module 131, an input detection sub-module 132, and a correction sub-module 133.
  • the detailed description of each function sub-module is as follows:
  • The training sub-module 131 is configured to obtain training text containing sentences of different tone types and generate a discriminant model based on the training text; the discriminant model is used to obtain the output probabilities of tone symbols inserted at the end of a sentence.
  • The detection sub-module 132 is configured to acquire the speech to be recognized after the silence segment and perform speech recognition on it, while detecting, in output order, whether a comma or period exists in the output text sequence.
  • The correction sub-module 133 is configured to correct the detected comma or period in sequence through the discriminant model when a comma or period is detected in the output text sequence.
  • The correction sub-module 133 is specifically configured to: when a comma or period is detected in the output text sequence, determine the tone type of the sentence before the comma or period through the discriminative model; obtain the output probability of each tone symbol inserted at the end of the sentence for that tone type; determine whether the tone symbol with the highest output probability is consistent with the comma or period; keep the current comma or period unchanged when they are consistent; and modify the current comma or period to the tone symbol with the highest output probability when they are inconsistent.
  • Each module in the symbol adding device in the speech recognition described above may be implemented in whole or in part by software, hardware, and a combination thereof.
  • Each of the above modules may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 10.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for operating the operating system and computer-readable instructions in a non-volatile storage medium.
  • the computer-readable instructions are executed by a processor to implement a method for adding symbols in speech recognition.
  • a computer device including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor.
  • the processor executes the computer-readable instructions, the following steps are implemented:
  • Acquire speech to be recognized, perform speech recognition on the speech to be recognized, and synchronously detect the silence segment in the speech to be recognized to determine whether the duration of the silence segment exceeds the first duration.
  • When the duration of the silence segment exceeds the first duration, output the text sequence before the silence segment, and insert a comma or period in the text sequence at the position corresponding to the silence segment according to the duration of the silence segment.
  • Acquire the speech to be recognized after the silence segment and perform speech recognition on it, while correcting the comma or period inserted into the text sequence according to a preset discrimination model.
  • One or more non-volatile readable storage media storing computer-readable instructions are provided; the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • Acquire speech to be recognized, perform speech recognition on the speech to be recognized, and synchronously detect the silence segment in the speech to be recognized to determine whether the duration of the silence segment exceeds the first duration.
  • When the duration of the silence segment exceeds the first duration, output the text sequence before the silence segment, and insert a comma or period in the text sequence at the position corresponding to the silence segment according to the duration of the silence segment.
  • Acquire the speech to be recognized after the silence segment and perform speech recognition on it, while correcting the comma or period inserted into the text sequence according to a preset discrimination model.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).


Abstract

The present application discloses a method, apparatus, computer device, and storage medium for adding symbols in speech recognition. The method includes: performing speech recognition on acquired speech to be recognized, synchronously detecting silence segments in the speech to be recognized, and determining whether the duration of a silence segment exceeds a first duration; when the duration of the silence segment exceeds the first duration, outputting the text sequence before the silence segment and inserting a comma or period at the corresponding position in the text sequence according to the duration of the silence segment; and performing speech recognition on the acquired speech after the silence segment while correcting the comma or period already inserted into the text sequence according to a preset discrimination model. The present application improves the output efficiency and accuracy of symbols in speech recognition, so as to improve speech-recognition efficiency, segment sentences accurately, and express emotion accurately.

Description

Method, Apparatus, Computer Device, and Storage Medium for Adding Symbols in Speech Recognition
This application is based on, and claims priority to, Chinese invention patent application No. 201810865807.X, filed on August 1, 2018 and entitled "Method, apparatus, computer device, and storage medium for adding symbols in speech recognition".
Technical Field
The present application relates to the field of speech recognition, and in particular to a method, apparatus, computer device, and storage medium for adding symbols in speech recognition.
Background
At present, little research has been done on automatically adding punctuation marks during speech recognition. In most cases, when speech is recognized, a pause in the middle is recognized as a comma and a period is automatically added at the end, with the whole sentence treated as declarative in tone. This causes incorrect sentence pauses: a single sentence may be mistakenly split into multiple disconnected word groups, and in some situations the speaker's tone and emotion cannot be expressed. There is therefore a lack of a method for automatically completing symbols in speech recognition that achieves proper sentence segmentation and normal expression of emotion.
Summary
In view of the above technical problem, it is necessary to provide a method, apparatus, computer device, and storage medium for adding symbols in speech recognition that improve the output efficiency and accuracy of symbols in speech recognition, so as to improve speech-recognition efficiency, segment sentences accurately, and express emotion accurately.
A method for adding symbols in speech recognition includes:
acquiring speech to be recognized, performing speech recognition on the speech to be recognized, synchronously detecting a silence segment in the speech to be recognized, and determining whether the duration of the silence segment exceeds a first duration;
when the duration of the silence segment exceeds the first duration, outputting the text sequence before the silence segment, and inserting a comma or period at the position in the text sequence corresponding to the silence segment according to the duration of the silence segment; and
acquiring the speech to be recognized after the silence segment and performing speech recognition on it, while correcting the comma or period already inserted into the text sequence according to a preset discrimination model.
An apparatus for adding symbols in speech recognition includes:
a detection module configured to acquire speech to be recognized, perform speech recognition on the speech to be recognized, synchronously detect a silence segment in the speech to be recognized, and determine whether the duration of the silence segment exceeds a first duration;
an output module configured to, when the duration of the silence segment exceeds the first duration, output the text sequence before the silence segment, and insert a comma or period at the position in the text sequence corresponding to the silence segment according to the duration of the silence segment; and
a correction module configured to acquire the speech to be recognized after the silence segment and perform speech recognition on it, while correcting the comma or period already inserted into the text sequence according to a preset discrimination model.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor, when executing the computer-readable instructions, implements the following steps:
acquiring speech to be recognized, performing speech recognition on the speech to be recognized, synchronously detecting a silence segment in the speech to be recognized, and determining whether the duration of the silence segment exceeds a first duration;
when the duration of the silence segment exceeds the first duration, outputting the text sequence before the silence segment, and inserting a comma or period at the position in the text sequence corresponding to the silence segment according to the duration of the silence segment; and
acquiring the speech to be recognized after the silence segment and performing speech recognition on it, while correcting the comma or period already inserted into the text sequence according to a preset discrimination model.
One or more non-volatile readable storage media storing computer-readable instructions are provided, where the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
acquiring speech to be recognized, performing speech recognition on the speech to be recognized, synchronously detecting a silence segment in the speech to be recognized, and determining whether the duration of the silence segment exceeds a first duration;
when the duration of the silence segment exceeds the first duration, outputting the text sequence before the silence segment, and inserting a comma or period at the position in the text sequence corresponding to the silence segment according to the duration of the silence segment; and
acquiring the speech to be recognized after the silence segment and performing speech recognition on it, while correcting the comma or period already inserted into the text sequence according to a preset discrimination model.
本申请的一个或多个实施例的细节在下面的附图和描述中提出,本申请的其他特征和优点将从说明书、附图以及权利要求变得明显。
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application environment of the method for adding symbols in speech recognition according to an embodiment of the present application;
FIG. 2 is a flowchart of the method for adding symbols in speech recognition according to an embodiment of the present application;
FIG. 3 is a flowchart of step S20 of the method for adding symbols in speech recognition according to an embodiment of the present application;
FIG. 4 is a flowchart of step S20 of the method for adding symbols in speech recognition according to another embodiment of the present application;
FIG. 5 is a flowchart of step S30 of the method for adding symbols in speech recognition according to an embodiment of the present application;
FIG. 6 is a flowchart of step S303 of the method for adding symbols in speech recognition according to an embodiment of the present application;
FIG. 7 is a schematic block diagram of the apparatus for adding symbols in speech recognition according to an embodiment of the present application;
FIG. 8 is a schematic block diagram of the output module of the apparatus for adding symbols in speech recognition according to an embodiment of the present application;
FIG. 9 is a schematic block diagram of the correction module of the apparatus for adding symbols in speech recognition according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
The method for adding symbols in speech recognition provided by the present application can be applied in the application environment shown in FIG. 1, in which a client (computer device) communicates with a server over a network. The client (computer device) includes, but is not limited to, various personal computers, notebook computers, smartphones, tablet computers, cameras, and portable wearable devices. The server can be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, a method for adding symbols in speech recognition is provided. Taking the method applied to the server in FIG. 1 as an example, it comprises the following steps:
S10: acquire speech to be recognized, perform speech recognition on the speech to be recognized, synchronously detect silent segments in the speech to be recognized, and determine whether the duration of a silent segment exceeds a first duration.
The speech to be recognized can be obtained from different types of audio, such as a recording, an unsubtitled video soundtrack, a piece of music, or a spoken dialogue; the silent segment may be located at the beginning, the end, or the middle of a sentence in the speech to be recognized.
Specifically, during speech recognition of the speech to be recognized, silent segments in the speech are detected synchronously, and it is determined whether the duration of each silent segment exceeds a first duration. The first duration can be set as needed. When the duration of the silent segment exceeds (is greater than) the first duration, the silent segment is assumed to be at the end of a sentence in the speech to be recognized; the sentence needs a pause, and a punctuation mark can be inserted at the position corresponding to the silent segment. When the duration of the silent segment does not exceed (is less than or equal to) the first duration, the silent segment is assumed to be in the middle of a sentence; in that case the segment does not affect the output of the text sequence corresponding to the speech, and at the position of the silent segment the text sequence can be output continuously through speech recognition, without inserting a punctuation mark as a separator or pause. It can be understood that speech recognition of the speech to be recognized and detection of its silent segments are performed synchronously; that is, both the generation of the corresponding text sequence and the insertion of punctuation at qualifying silent segments can be output in real time.
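Step S10 can be sketched in code. The sketch below, a simplified stand-in for a real voice-activity detector, groups consecutive low-energy frames into silent segments and checks each against the first duration; the 20 ms frame size, the energy threshold, and the 3-frame first duration are illustrative assumptions, not values fixed by the text.

```python
# Minimal sketch of silent-segment detection (step S10), assuming 20 ms
# frames and a simple per-frame energy threshold for "silence".

def find_silent_segments(frame_energies, energy_threshold=0.01):
    """Group consecutive low-energy frames into (start, length) silent segments."""
    segments = []
    start = None
    for i, e in enumerate(frame_energies):
        if e < energy_threshold:
            if start is None:
                start = i          # a silent run begins
        elif start is not None:
            segments.append((start, i - start))
            start = None           # the silent run ended at frame i
    if start is not None:          # silence running to the end of the buffer
        segments.append((start, len(frame_energies) - start))
    return segments

def exceeds_first_duration(segment_length, first_duration=3):
    """True when a silent segment is long enough to count as a sentence pause."""
    return segment_length > first_duration

energies = [0.5, 0.4, 0.005, 0.004, 0.003, 0.002, 0.6, 0.003, 0.5]
segments = find_silent_segments(energies)
# segments -> [(2, 4), (7, 1)]: only the 4-frame segment passes the check
flags = [exceeds_first_duration(length) for _, length in segments]
```

In a production system the energy check would be replaced by a proper voice-activity detector, but the segment-grouping and threshold comparison are the same.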
S20: when the duration of the silent segment exceeds the first duration, output the text sequence preceding the silent segment, and insert a comma or a period at the position in the text sequence corresponding to the silent segment according to the duration of the silent segment.
Specifically, while the speech to be recognized is converted into a text sequence, silent segments in the speech are detected synchronously. When the duration of a silent segment exceeds (is greater than) the first duration, the silent segment is assumed to be at the end of a sentence in the speech to be recognized; the speech needs a pause, and a punctuation mark can be inserted at the position corresponding to the silent segment. Further, it can be determined whether the duration of the silent segment exceeds a second duration, and based on the second duration it is decided whether the inserted punctuation mark is a comma or a period. It can be understood that when the duration of the silent segment does not exceed the first duration (corresponding to step S40), the silent segment is assumed to be merely a normal interval between words within a sentence rather than a pause between two sentences; no punctuation mark needs to be inserted, and the text sequence is simply output continuously.
S30: acquire the speech to be recognized that follows the silent segment and perform speech recognition on it, while correcting the comma or period already inserted into the text sequence according to a preset discriminative model.
It can be understood that when the speech to be recognized that follows the silent segment is acquired and recognized, the text sequence preceding the silent segment, together with the comma or period inserted into it, has already been output. To convey the emotion of the sentence better, the inserted comma or period can be corrected on the basis of the sentence segmentation that has already been completed.
In summary, the method for adding symbols in speech recognition provided by the present application performs speech recognition on the speech to be recognized while synchronously detecting silent segments and determining whether their duration exceeds the first duration, which avoids erroneous breaks in the middle of a single sentence while the corresponding text sequence is output and segmented in real time. Commas or periods are inserted according to the position and duration of each silent segment, so punctuation can be output in real time and sentences can be segmented accurately, without waiting until the user's entire speech has been recognized into a text sequence before punctuation is added at the pauses between sentences. The inserted periods or commas are corrected by a preset discriminative model, so that, on the basis of accurate sentence segmentation, the mood type of each sentence is judged accurately and its emotion is expressed accurately. The method can add punctuation and recognize the text sequence synchronously and in real time, which significantly improves speech recognition efficiency and user experience.
In another embodiment, after step S10, the method further comprises the following step:
when the duration of the silent segment does not exceed the first duration, continuously output the text sequence corresponding to the speech to be recognized.
Specifically, speech recognition is performed on the speech to be recognized while silent segments in it are detected synchronously. When the duration of a silent segment does not exceed the first duration, it can be determined that the silent segment lies in the middle of a sentence in the speech to be recognized and no sentence break is needed; in that case the segment does not affect the output of the text sequence corresponding to the speech, and the text following the silent segment can continue to be output, so that the text sequence corresponding to the speech to be recognized is output continuously.
Preferably, while the text sequence is being output continuously, if a silent segment appears and it has not yet been determined whether a punctuation mark should be inserted, or which mark should be inserted, a blinking placeholder symbol such as "|", "-", or "_" can temporarily be output at the position in the output text sequence corresponding to the silent segment. When the text sequence for the next piece of speech, or the punctuation mark for the next silent segment, is recognized, the blinking symbol is replaced with that text sequence or punctuation mark.
In one embodiment, as shown in FIG. 3, step S20, i.e., outputting the text sequence preceding the silent segment when its duration exceeds the first duration and inserting a comma or a period at the position in the text sequence corresponding to the silent segment according to the duration of the silent segment, specifically comprises the following steps:
S201: when the duration of the silent segment exceeds the first duration, output the text sequence preceding the silent segment, and determine whether the duration of the silent segment exceeds a second duration, wherein the second duration is greater than the first duration.
Specifically, when the duration of the silent segment exceeds the first duration, the text sequence preceding the silent segment is output. At this point a punctuation mark can be inserted into the output text sequence, and it can further be determined whether the duration of the silent segment exceeds the second duration, i.e., whether the mark at the position corresponding to the silent segment should be a comma or a period.
The second duration can be set as needed, and it must be greater than the first duration. In this embodiment, if the duration of the silent segment exceeds the first duration but not the second duration, a comma should be inserted at the position corresponding to the silent segment; if it exceeds the second duration, a period should be inserted. For example, let each frame of the speech to be recognized be 20 ms long; if a comma-length pause lasts 3 to 6 consecutive frames, the first duration can be set to 3 frames and the second duration to 6 frames. Then, once it is detected synchronously that the duration of the silent segment exceeds the first duration, it can immediately be determined whether it also exceeds the second duration.
In another embodiment, both the first duration and the second duration can be increased or decreased as needed. Specifically, from the length of the speech to be recognized acquired so far and the number of characters of the text sequence output for that length, the actual speaking rate of the speaker can be derived. This actual rate is compared with a preset ideal rate (which is associated with the preset initial values of the first and second durations). If the actual rate is greater than the ideal rate, the first duration can be decreased, lowering the minimum threshold for inserting a comma at the position of the silent segment; if the actual rate is less than the ideal rate, the first duration is increased, raising that minimum threshold. Similarly, when the actual rate is greater than the ideal rate, the second duration can be decreased, lowering the minimum threshold for inserting a period at the position of the silent segment; when the actual rate is less than the ideal rate, the second duration can be increased, raising that threshold. It can be understood that the first and second durations form an adjustable duration range with two adjustment directions: when the actual rate is greater than the ideal rate, both the first and second durations decrease; when the actual rate is less than the ideal rate, both increase. For example, let each frame of the speech be 20 ms long, the first duration be 2 frames, and the ideal rate be 0.32 characters per frame. If the acquired speech is 100 frames long and 16 characters have been output within those 100 frames, the actual rate derived from the acquired length and the corresponding output character count is 0.16 characters per frame; since the actual rate is less than the ideal rate, the first duration can be increased.
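The rate-adaptive adjustment above can be sketched as follows. The ideal rate of 0.32 characters per frame comes from the text's example; the one-frame step size and the lower bounds keeping the range valid are assumptions for illustration, since the text only fixes the direction of adjustment.

```python
# Sketch of rate-adaptive pause thresholds: fast speakers get shorter
# comma/period thresholds, slow speakers get longer ones.

def adjust_durations(first, second, frames_seen, chars_output,
                     ideal_rate=0.32, step=1):
    """Return updated (first, second) duration thresholds in frames."""
    actual_rate = chars_output / frames_seen
    if actual_rate > ideal_rate:        # fast speech: shorter pauses suffice
        first, second = first - step, second - step
    elif actual_rate < ideal_rate:      # slow speech: require longer pauses
        first, second = first + step, second + step
    # keep the range valid: durations positive, second strictly greater
    first = max(first, 1)
    second = max(second, first + 1)
    return first, second

# The example from the text: 100 frames, 16 characters -> 0.16 chars/frame,
# slower than the ideal rate, so both durations grow.
print(adjust_durations(2, 6, frames_seen=100, chars_output=16))  # (3, 7)
```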
S202: when the duration of the silent segment does not exceed the second duration, insert a comma at the position corresponding to the silent segment.
Let each frame of the speech to be recognized be 20 ms long, and let a comma-length pause last 3 to 6 consecutive frames, so the first duration is set to 3 frames and the second duration to 6 frames. When it is detected synchronously that the duration of the silent segment is greater than 3 frames and less than or equal to 6 frames, a comma is inserted at the position corresponding to the silent segment.
S203: when the duration of the silent segment exceeds the second duration, insert a period at the position corresponding to the silent segment.
With the same settings as above, when it is detected synchronously that the duration of the silent segment is greater than 6 frames, a period is inserted at the position corresponding to the silent segment. It can be understood that once all the text preceding the silent segment has been output, and the silent segment has been identified as a comma or a period, the comma or period is appended to the output text sequence in real time, and the speech following the silent segment continues to be recognized with its text sequence output in real time.
In summary, when the method for adding symbols in speech recognition provided by the present application detects synchronously that the duration of a silent segment exceeds the first duration, it inserts a comma or a period at the position in the text sequence corresponding to the silent segment according to the duration of the silent segment, without waiting until the user's entire speech has been recognized into a text sequence before punctuation is added at the pauses between sentences. Punctuation can thus be output in real time, which improves the efficiency of symbol output in speech recognition, achieves accurate sentence segmentation, and improves the efficiency of speech recognition.
In another embodiment, as shown in FIG. 4, step S20, i.e., outputting the text sequence preceding the silent segment when its duration exceeds the first duration and inserting a comma or a period at the position in the text sequence corresponding to the silent segment according to the duration of the silent segment, specifically comprises the following steps:
S204: when the duration of the silent segment exceeds the first duration, output the text sequence preceding the silent segment, and automatically generate a sentence-break marker after the text sequence.
Specifically, when the duration of the silent segment exceeds the first duration, the text sequence preceding the silent segment is output, a sentence-break marker is automatically generated after the output text sequence, and the marker is output in real time after the text sequence. That is, when the duration of the silent segment exceeds the first duration, the silent segment is assumed to be at the end of a sentence in the speech to be recognized; the speech needs a pause, and a sentence-break marker can be generated automatically at the end position.
In this embodiment, the sentence-break marker can be a space, an underscore, or the like; that is, when the duration of the silent segment exceeds the first duration, a space, underscore, vertical bar, or similar symbol is output directly after the already-output text sequence, e.g., "今天天气很晴朗_" ("The weather is sunny today_") or "如果明天是雨天|" ("If it rains tomorrow|").
S205: acquire the speech to be recognized that follows the silent segment and perform speech recognition on it, while determining whether the duration of the silent segment exceeds a second duration, wherein the second duration is greater than the first duration.
Specifically, after the sentence-break marker is generated, speech recognition is not interrupted: the speech following the silent segment continues to be acquired and recognized, with its text sequence output in real time. At the same time, it is determined whether the duration of the silent segment exceeds the second duration, and the marker is replaced with a period or a comma. That is, replacing the sentence-break marker and recognizing the speech that follows it proceed synchronously, so the recognition process is not delayed by the need to resolve the marker.
S206: when the duration of the silent segment does not exceed the second duration, replace the sentence-break marker with a comma. In this case, when it is detected synchronously that the duration of the silent segment exceeds the first duration but not the second duration, the marker already output after the text sequence is replaced with a comma.
S207: when the duration of the silent segment exceeds the second duration, replace the sentence-break marker with a period. In this case, when it is detected synchronously that the duration of the silent segment exceeds the first duration and also exceeds the second duration, the marker already output after the text sequence is replaced with a period.
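The marker variant (steps S204 to S207) can be sketched as a small streaming-transcript class: emit a provisional marker as soon as the first duration is exceeded, keep recognizing, then resolve the marker once the pause length is known. The class and method names are illustrative, not from the patent.

```python
# Sketch of the provisional break-marker flow: text keeps streaming while
# a pending "_" marker waits to be resolved into a comma or period.

class StreamingTranscript:
    MARKER = "_"

    def __init__(self, first_duration=3, second_duration=6):
        self.first = first_duration
        self.second = second_duration
        self.text = ""

    def emit(self, recognized_text):
        """Append newly recognized text to the transcript."""
        self.text += recognized_text

    def open_break(self):
        """Called once a pause has exceeded the first duration."""
        self.text += self.MARKER

    def close_break(self, pause_frames):
        """Called when the pause ends: resolve the pending marker."""
        mark = "。" if pause_frames > self.second else ","
        i = self.text.rfind(self.MARKER)  # the most recent pending marker
        if i != -1:
            self.text = self.text[:i] + mark + self.text[i + 1:]

t = StreamingTranscript()
t.emit("今天天气很晴朗")
t.open_break()            # transcript now ends with the provisional "_"
t.emit("如果明天是雨天")  # recognition continues while the pause resolves
t.close_break(pause_frames=4)
# t.text == "今天天气很晴朗,如果明天是雨天"
```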
In one embodiment, as shown in FIG. 5, step S30, i.e., acquiring the speech to be recognized that follows the silent segment and performing speech recognition on it while correcting the comma or period already inserted into the text sequence according to the preset discriminative model, specifically comprises the following steps:
S301: acquire training text containing sentences of different mood types, and generate a discriminative model from the training text; the discriminative model is used to obtain the output probability of each mood symbol inserted at the end of a sentence.
The training text can be collected from the web or from books as all kinds of text content together with the punctuation of that content, and the training text is split into single sentences using commas or other symbols (such as periods, question marks, or exclamation marks) as delimiters; that is, the single sentences can be of different lengths and different moods, and the training text can contain sentences of different mood types (including but not limited to declarative, interrogative, imperative, and exclamatory sentences). After a discriminative model is generated from the training text, the rules by which sentences use commas or other symbols can be derived from the model (for example, the probability of replacing the period after a silent segment with a question mark or an exclamation mark can be obtained from the model).
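Step S301 can be sketched with a deliberately toy model: split punctuated training text into (sentence, final symbol) pairs and count which symbol follows each sentence-final cue character. A real system would train a proper sequence classifier; this sketch only illustrates the interface of "output probability per mood symbol", and the last-character cue is an assumption.

```python
# Toy sketch of building a discriminative model from punctuated training
# text: estimate P(final symbol | sentence-final cue character) by counting.

import re
from collections import Counter, defaultdict

def train_discriminative_model(training_text):
    """Map a sentence-final cue character to mood-symbol probabilities."""
    counts = defaultdict(Counter)
    # split the text into (sentence, following symbol) pairs
    for sentence, symbol in re.findall(r"([^,。?!]+)([,。?!])", training_text):
        cue = sentence[-1]                 # crude cue: the last character
        counts[cue][symbol] += 1
    model = {}
    for cue, c in counts.items():
        total = sum(c.values())
        model[cue] = {sym: n / total for sym, n in c.items()}
    return model

model = train_discriminative_model(
    "你今天回来吗?他说今天不回家。你怎么不去呢?明天要下雨。")
# particles like "吗" and "呢" end up mapped to "?", declarative endings to "。"
best = max(model["吗"], key=model["吗"].get)
```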
S302: acquire the speech to be recognized that follows the silent segment and perform speech recognition on it, while checking, in the output order of the text sequence, whether a comma or period exists in the already-output text sequence.
The speech to be recognized that follows the silent segment is acquired and recognized while, in the output order of the text sequence, the already-output text sequence is checked in real time for commas or periods. When a comma or period is detected in the already-output text sequence, the discriminative model can be used to determine whether the mood symbol with the highest output probability matches the detected comma or period, and the detected commas or periods are then corrected in turn according to the result.
S303: when a comma or period is detected in the already-output text sequence, correct the detected commas or periods in turn through the discriminative model.
It can be understood that, to keep the output synchronized with the speech, the comma or period is first inserted after the text sequence recognized before the silent segment, and the inserted comma or period is then further corrected through the discriminative model: when a comma or period is detected in the text sequence output before the silent segment, the detected commas or periods are corrected in turn through the model.
In summary, the method for adding symbols in speech recognition provided by the present application acquires the speech following the silent segment and performs speech recognition on it while correcting, according to the preset discriminative model, the comma or period already inserted into the text sequence. This improves the accuracy of symbols in speech recognition, so that the mood type of each sentence is judged accurately and the emotion of the sentence is expressed accurately.
In one embodiment, as shown in FIG. 6, step S303, i.e., correcting the detected commas or periods in turn through the discriminative model when a comma or period is detected in the already-output text sequence, specifically comprises the following steps:
S3031: when a comma or period is detected in the already-output text sequence, determine, through the discriminative model, the mood type of the sentence preceding the comma or period.
Determining the mood type of the sentence preceding the comma or period through the discriminative model includes:
when the mood type of the sentence is judged to be declarative, the corresponding output mood symbol is a comma or a period, e.g., "明天要下雨。" ("It will rain tomorrow.") or "他说了今天不回家的。" ("He said he is not coming home today.");
when the mood type of the sentence is judged to be interrogative, the corresponding output mood symbol is a question mark, e.g., "你怎么不去呢?" ("Why aren't you going?"), "你今天回来吗?" ("Are you coming back today?"), or "昨天玩的高兴不高兴?" ("Did you have fun yesterday?");
when the mood type of the sentence is judged to be exclamatory or imperative, the corresponding output mood symbol is an exclamation mark, e.g., "禁止吸烟!" ("No smoking!") or "哇!这衣服真漂亮!" ("Wow! This dress is really beautiful!").
S3032: obtain the output probability of each mood symbol inserted at the end of a sentence of that mood type. Through the discriminative model, the output probability of each candidate mood symbol for the sentence preceding the comma or period can be derived, from which the mood symbol with the highest output probability is obtained.
S3033: determine whether the mood symbol with the highest output probability matches the comma or period. By determining whether the highest-probability mood symbol matches the comma or period, the comma or period inserted into the text sequence output before the silent segment is corrected.
S3034: when the mood symbol with the highest output probability matches the comma or period, keep the current comma or period unchanged. In this case, if the highest-probability mood symbol is a comma or period, the current comma or period need not be changed and is retained in the already-output text sequence.
S3035: when the mood symbol with the highest output probability does not match the comma or period, correct the current comma or period to the mood symbol with the highest output probability. In this case, if the highest-probability mood symbol is a question mark, the current comma or period is corrected to a question mark; if it is an exclamation mark, the current comma or period is corrected to an exclamation mark.
For example, suppose the output text sequence is "你会谈吉他吗。" ("Can you play the guitar."). When the period in the text sequence is detected, the comma or period already inserted into the text sequence is corrected through the discriminative model. Here the model yields the highest probability for a question mark, i.e., the probability of outputting a question mark is greater than that of outputting a comma or period, so the period in the text sequence can be corrected to a question mark. While the correction is made, speech recognition continues on the speech following the silent segment, and the resulting output text is "你会谈吉他吗?不会," ("Can you play the guitar? No,").
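Steps S3033 to S3035 amount to a keep-or-replace rule over the model's symbol probabilities. In the sketch below, the `symbol_probabilities` dictionary stands in for the discriminative model's output for one sentence; that interface is an assumption for illustration.

```python
# Sketch of the correction rule (steps S3033-S3035): keep the inserted
# comma/period if it already has the highest output probability,
# otherwise replace it with the highest-probability mood symbol.

def correct_symbol(inserted, symbol_probabilities):
    """Return the mark that should end the sentence."""
    best = max(symbol_probabilities, key=symbol_probabilities.get)
    return inserted if best == inserted else best

# "你会谈吉他吗。": the model rates "?" above "。", so the period is replaced.
probs = {"。": 0.2, ",": 0.1, "?": 0.6, "!": 0.1}
assert correct_symbol("。", probs) == "?"
# A declarative sentence keeps its period (step S3034).
assert correct_symbol("。", {"。": 0.7, "?": 0.2, "!": 0.1}) == "。"
```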
In one embodiment, as shown in FIG. 7, an apparatus for adding symbols in speech recognition is provided, corresponding one-to-one to the method for adding symbols in speech recognition in the above embodiments. The apparatus comprises a detection module 110, an output module 120, and a correction module 130. The functional modules are described in detail as follows:
The detection module 110 is configured to acquire speech to be recognized, perform speech recognition on the speech to be recognized, synchronously detect a silent segment in the speech to be recognized, and determine whether the duration of the silent segment exceeds a first duration.
The output module 120 is configured to, when the duration of the silent segment exceeds the first duration, output the text sequence preceding the silent segment and insert a comma or a period at the position in the text sequence corresponding to the silent segment according to the duration of the silent segment.
The correction module 130 is configured to acquire the speech to be recognized that follows the silent segment and perform speech recognition on it, while correcting the comma or period already inserted into the text sequence according to a preset discriminative model.
Preferably, as shown in FIG. 8, the output module 120 specifically comprises a determination submodule 121, a comma-output submodule 122, and a period-output submodule 123. The functional submodules are described in detail as follows:
The determination submodule 121 is configured to, when the duration of the silent segment exceeds the first duration, output the text sequence preceding the silent segment and determine whether the duration of the silent segment exceeds a second duration, wherein the second duration is greater than the first duration.
The comma-output submodule 122 is configured to insert a comma at the position corresponding to the silent segment when the duration of the silent segment does not exceed the second duration.
The period-output submodule 123 is configured to insert a period at the position corresponding to the silent segment when the duration of the silent segment exceeds the second duration.
Preferably, the output module 120 is further configured to: when the duration of the silent segment exceeds the first duration, output the text sequence preceding the silent segment and automatically generate a sentence-break marker after the text sequence; acquire the speech to be recognized that follows the silent segment and perform speech recognition on it, while determining whether the duration of the silent segment exceeds a second duration, wherein the second duration is greater than the first duration; replace the sentence-break marker with a comma when the duration of the silent segment does not exceed the second duration; and replace the sentence-break marker with a period when the duration of the silent segment exceeds the second duration.
Preferably, as shown in FIG. 9, the correction module 130 specifically comprises a training submodule 131, a detection submodule 132, and a correction submodule 133. The functional submodules are described in detail as follows:
The training submodule 131 is configured to acquire training text containing sentences of different mood types and generate a discriminative model from the training text; the discriminative model is used to obtain the output probability of each mood symbol inserted at the end of a sentence.
The detection submodule 132 is configured to acquire the speech to be recognized that follows the silent segment and perform speech recognition on it, while checking, in the output order of the text sequence, whether a comma or period exists in the already-output text sequence.
The correction submodule 133 is configured to correct the detected commas or periods in turn through the discriminative model when a comma or period is detected in the already-output text sequence.
Preferably, the correction submodule 133 is specifically configured to: when a comma or period is detected in the already-output text sequence, determine, through the discriminative model, the mood type of the sentence preceding the comma or period; obtain the output probability of each mood symbol inserted at the end of a sentence of that mood type; determine whether the mood symbol with the highest output probability matches the comma or period; keep the current comma or period unchanged when the highest-probability mood symbol matches the comma or period; and correct the current comma or period to the highest-probability mood symbol when it does not.
For specific limitations of the apparatus for adding symbols in speech recognition, refer to the limitations of the method for adding symbols in speech recognition above, which are not repeated here. The modules in the apparatus can be implemented wholly or partly in software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10. The computer device comprises a processor, a memory, a network interface, and a database connected via a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. When executed by the processor, the computer-readable instructions implement a method for adding symbols in speech recognition.
In one embodiment, a computer device is provided, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
acquiring speech to be recognized, performing speech recognition on the speech to be recognized, synchronously detecting a silent segment in the speech to be recognized, and determining whether the duration of the silent segment exceeds a first duration;
when the duration of the silent segment exceeds the first duration, outputting the text sequence preceding the silent segment, and inserting a comma or a period at the position in the text sequence corresponding to the silent segment according to the duration of the silent segment;
acquiring the speech to be recognized that follows the silent segment and performing speech recognition on it, while correcting the comma or period already inserted into the text sequence according to the preset discriminative model.
In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to implement the following steps:
acquiring speech to be recognized, performing speech recognition on the speech to be recognized, synchronously detecting a silent segment in the speech to be recognized, and determining whether the duration of the silent segment exceeds a first duration;
when the duration of the silent segment exceeds the first duration, outputting the text sequence preceding the silent segment, and inserting a comma or a period at the position in the text sequence corresponding to the silent segment according to the duration of the silent segment;
acquiring the speech to be recognized that follows the silent segment and performing speech recognition on it, while correcting the comma or period already inserted into the text sequence according to the preset discriminative model.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing the relevant hardware through computer-readable instructions, which can be stored in a non-volatile computer-readable storage medium; when executed, the computer-readable instructions may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly understand that, for convenience and brevity of description, only the division into the above functional units and modules is given as an example; in practical applications, the above functions can be assigned to different functional units or modules as needed, i.e., the internal structure of the apparatus can be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the scope of protection of the present application.

Claims (20)

  1. A method for adding symbols in speech recognition, comprising:
    acquiring speech to be recognized, performing speech recognition on the speech to be recognized, synchronously detecting a silent segment in the speech to be recognized, and determining whether the duration of the silent segment exceeds a first duration;
    when the duration of the silent segment exceeds the first duration, outputting the text sequence preceding the silent segment, and inserting a comma or a period at the position in the text sequence corresponding to the silent segment according to the duration of the silent segment;
    acquiring the speech to be recognized that follows the silent segment and performing speech recognition on it, while correcting the comma or period already inserted into the text sequence according to a preset discriminative model.
  2. The method for adding symbols in speech recognition according to claim 1, wherein outputting the text sequence preceding the silent segment when the duration of the silent segment exceeds the first duration, and inserting a comma or a period at the position in the text sequence corresponding to the silent segment according to the duration of the silent segment, comprises:
    when the duration of the silent segment exceeds the first duration, outputting the text sequence preceding the silent segment, and determining whether the duration of the silent segment exceeds a second duration, wherein the second duration is greater than the first duration;
    when the duration of the silent segment does not exceed the second duration, inserting a comma at the position corresponding to the silent segment;
    when the duration of the silent segment exceeds the second duration, inserting a period at the position corresponding to the silent segment.
  3. The method for adding symbols in speech recognition according to claim 1, wherein outputting the text sequence preceding the silent segment when the duration of the silent segment exceeds the first duration, and inserting a comma or a period at the position in the text sequence corresponding to the silent segment according to the duration of the silent segment, comprises:
    when the duration of the silent segment exceeds the first duration, outputting the text sequence preceding the silent segment, and automatically generating a sentence-break marker after the text sequence;
    acquiring the speech to be recognized that follows the silent segment and performing speech recognition on it, while determining whether the duration of the silent segment exceeds a second duration, wherein the second duration is greater than the first duration;
    when the duration of the silent segment does not exceed the second duration, replacing the sentence-break marker with a comma;
    when the duration of the silent segment exceeds the second duration, replacing the sentence-break marker with a period.
  4. The method for adding symbols in speech recognition according to claim 1, wherein acquiring the speech to be recognized that follows the silent segment and performing speech recognition on it, while correcting the comma or period already inserted into the text sequence according to the preset discriminative model, comprises:
    acquiring training text containing sentences of different mood types, and generating a discriminative model from the training text, the discriminative model being used to obtain the output probability of each mood symbol inserted at the end of a sentence;
    acquiring the speech to be recognized that follows the silent segment and performing speech recognition on it, while checking, in the output order of the text sequence, whether a comma or period exists in the already-output text sequence;
    when a comma or period is detected in the already-output text sequence, correcting the detected commas or periods in turn through the discriminative model.
  5. The method for adding symbols in speech recognition according to claim 4, wherein correcting the detected commas or periods in turn through the discriminative model when a comma or period is detected in the already-output text sequence comprises:
    when a comma or period is detected in the already-output text sequence, determining, through the discriminative model, the mood type of the sentence preceding the comma or period;
    obtaining the output probability of each mood symbol inserted at the end of a sentence of the mood type;
    determining whether the mood symbol with the highest output probability matches the comma or period;
    when the mood symbol with the highest output probability matches the comma or period, keeping the current comma or period unchanged;
    when the mood symbol with the highest output probability does not match the comma or period, correcting the current comma or period to the mood symbol with the highest output probability.
  6. An apparatus for adding symbols in speech recognition, comprising:
    a detection module, configured to acquire speech to be recognized, perform speech recognition on the speech to be recognized, synchronously detect a silent segment in the speech to be recognized, and determine whether the duration of the silent segment exceeds a first duration;
    an output module, configured to, when the duration of the silent segment exceeds the first duration, output the text sequence preceding the silent segment and insert a comma or a period at the position in the text sequence corresponding to the silent segment according to the duration of the silent segment;
    a correction module, configured to acquire the speech to be recognized that follows the silent segment and perform speech recognition on it, while correcting the comma or period already inserted into the text sequence according to a preset discriminative model.
  7. The apparatus for adding symbols in speech recognition according to claim 6, wherein the output module comprises:
    a determination submodule, configured to, when the duration of the silent segment exceeds the first duration, output the text sequence preceding the silent segment and determine whether the duration of the silent segment exceeds a second duration, wherein the second duration is greater than the first duration;
    a comma-output submodule, configured to insert a comma at the position corresponding to the silent segment when the duration of the silent segment does not exceed the second duration;
    a period-output submodule, configured to insert a period at the position corresponding to the silent segment when the duration of the silent segment exceeds the second duration.
  8. The apparatus for adding symbols in speech recognition according to claim 6, wherein the output module is further configured to:
    when the duration of the silent segment exceeds the first duration, output the text sequence preceding the silent segment, and automatically generate a sentence-break marker after the text sequence;
    acquire the speech to be recognized that follows the silent segment and perform speech recognition on it, while determining whether the duration of the silent segment exceeds a second duration, wherein the second duration is greater than the first duration;
    when the duration of the silent segment does not exceed the second duration, replace the sentence-break marker with a comma;
    when the duration of the silent segment exceeds the second duration, replace the sentence-break marker with a period.
  9. The apparatus for adding symbols in speech recognition according to claim 6, wherein the correction module comprises:
    a training submodule, configured to acquire training text containing sentences of different mood types and generate a discriminative model from the training text, the discriminative model being used to obtain the output probability of each mood symbol inserted at the end of a sentence;
    a detection submodule, configured to acquire the speech to be recognized that follows the silent segment and perform speech recognition on it, while checking, in the output order of the text sequence, whether a comma or period exists in the already-output text sequence;
    a correction submodule, configured to correct the detected commas or periods in turn through the discriminative model when a comma or period is detected in the already-output text sequence.
  10. The apparatus for adding symbols in speech recognition according to claim 8, wherein the correction submodule is further configured to:
    when a comma or period is detected in the already-output text sequence, determine, through the discriminative model, the mood type of the sentence preceding the comma or period;
    obtain the output probability of each mood symbol inserted at the end of a sentence of the mood type;
    determine whether the mood symbol with the highest output probability matches the comma or period;
    when the mood symbol with the highest output probability matches the comma or period, keep the current comma or period unchanged;
    when the mood symbol with the highest output probability does not match the comma or period, correct the current comma or period to the mood symbol with the highest output probability.
  11. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
    acquiring speech to be recognized, performing speech recognition on the speech to be recognized, synchronously detecting a silent segment in the speech to be recognized, and determining whether the duration of the silent segment exceeds a first duration;
    when the duration of the silent segment exceeds the first duration, outputting the text sequence preceding the silent segment, and inserting a comma or a period at the position in the text sequence corresponding to the silent segment according to the duration of the silent segment;
    acquiring the speech to be recognized that follows the silent segment and performing speech recognition on it, while correcting the comma or period already inserted into the text sequence according to a preset discriminative model.
  12. The computer device according to claim 11, wherein outputting the text sequence preceding the silent segment when the duration of the silent segment exceeds the first duration, and inserting a comma or a period at the position in the text sequence corresponding to the silent segment according to the duration of the silent segment, comprises:
    when the duration of the silent segment exceeds the first duration, outputting the text sequence preceding the silent segment, and determining whether the duration of the silent segment exceeds a second duration, wherein the second duration is greater than the first duration;
    when the duration of the silent segment does not exceed the second duration, inserting a comma at the position corresponding to the silent segment;
    when the duration of the silent segment exceeds the second duration, inserting a period at the position corresponding to the silent segment.
  13. The computer device according to claim 11, wherein outputting the text sequence preceding the silent segment when the duration of the silent segment exceeds the first duration, and inserting a comma or a period at the position in the text sequence corresponding to the silent segment according to the duration of the silent segment, comprises:
    when the duration of the silent segment exceeds the first duration, outputting the text sequence preceding the silent segment, and automatically generating a sentence-break marker after the text sequence;
    acquiring the speech to be recognized that follows the silent segment and performing speech recognition on it, while determining whether the duration of the silent segment exceeds a second duration, wherein the second duration is greater than the first duration;
    when the duration of the silent segment does not exceed the second duration, replacing the sentence-break marker with a comma;
    when the duration of the silent segment exceeds the second duration, replacing the sentence-break marker with a period.
  14. The computer device according to claim 11, wherein acquiring the speech to be recognized that follows the silent segment and performing speech recognition on it, while correcting the comma or period already inserted into the text sequence according to the preset discriminative model, comprises:
    acquiring training text containing sentences of different mood types, and generating a discriminative model from the training text, the discriminative model being used to obtain the output probability of each mood symbol inserted at the end of a sentence;
    acquiring the speech to be recognized that follows the silent segment and performing speech recognition on it, while checking, in the output order of the text sequence, whether a comma or period exists in the already-output text sequence;
    when a comma or period is detected in the already-output text sequence, correcting the detected commas or periods in turn through the discriminative model.
  15. The computer device according to claim 14, wherein correcting the detected commas or periods in turn through the discriminative model when a comma or period is detected in the already-output text sequence comprises:
    when a comma or period is detected in the already-output text sequence, determining, through the discriminative model, the mood type of the sentence preceding the comma or period;
    obtaining the output probability of each mood symbol inserted at the end of a sentence of the mood type;
    determining whether the mood symbol with the highest output probability matches the comma or period;
    when the mood symbol with the highest output probability matches the comma or period, keeping the current comma or period unchanged;
    when the mood symbol with the highest output probability does not match the comma or period, correcting the current comma or period to the mood symbol with the highest output probability.
  16. One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring speech to be recognized, performing speech recognition on the speech to be recognized, synchronously detecting a silent segment in the speech to be recognized, and determining whether the duration of the silent segment exceeds a first duration;
    when the duration of the silent segment exceeds the first duration, outputting the text sequence preceding the silent segment, and inserting a comma or a period at the position in the text sequence corresponding to the silent segment according to the duration of the silent segment;
    acquiring the speech to be recognized that follows the silent segment and performing speech recognition on it, while correcting the comma or period already inserted into the text sequence according to a preset discriminative model.
  17. The non-volatile readable storage media according to claim 16, wherein outputting the text sequence preceding the silent segment when the duration of the silent segment exceeds the first duration, and inserting a comma or a period at the position in the text sequence corresponding to the silent segment according to the duration of the silent segment, comprises:
    when the duration of the silent segment exceeds the first duration, outputting the text sequence preceding the silent segment, and determining whether the duration of the silent segment exceeds a second duration, wherein the second duration is greater than the first duration;
    when the duration of the silent segment does not exceed the second duration, inserting a comma at the position corresponding to the silent segment;
    when the duration of the silent segment exceeds the second duration, inserting a period at the position corresponding to the silent segment.
  18. The non-volatile readable storage media according to claim 16, wherein outputting the text sequence preceding the silent segment when the duration of the silent segment exceeds the first duration, and inserting a comma or a period at the position in the text sequence corresponding to the silent segment according to the duration of the silent segment, comprises:
    when the duration of the silent segment exceeds the first duration, outputting the text sequence preceding the silent segment, and automatically generating a sentence-break marker after the text sequence;
    acquiring the speech to be recognized that follows the silent segment and performing speech recognition on it, while determining whether the duration of the silent segment exceeds a second duration, wherein the second duration is greater than the first duration;
    when the duration of the silent segment does not exceed the second duration, replacing the sentence-break marker with a comma;
    when the duration of the silent segment exceeds the second duration, replacing the sentence-break marker with a period.
  19. The non-volatile readable storage media according to claim 16, wherein acquiring the speech to be recognized that follows the silent segment and performing speech recognition on it, while correcting the comma or period already inserted into the text sequence according to the preset discriminative model, comprises:
    acquiring training text containing sentences of different mood types, and generating a discriminative model from the training text, the discriminative model being used to obtain the output probability of each mood symbol inserted at the end of a sentence;
    acquiring the speech to be recognized that follows the silent segment and performing speech recognition on it, while checking, in the output order of the text sequence, whether a comma or period exists in the already-output text sequence;
    when a comma or period is detected in the already-output text sequence, correcting the detected commas or periods in turn through the discriminative model.
  20. The non-volatile readable storage media according to claim 19, wherein correcting the detected commas or periods in turn through the discriminative model when a comma or period is detected in the already-output text sequence comprises:
    when a comma or period is detected in the already-output text sequence, determining, through the discriminative model, the mood type of the sentence preceding the comma or period;
    obtaining the output probability of each mood symbol inserted at the end of a sentence of the mood type;
    determining whether the mood symbol with the highest output probability matches the comma or period;
    when the mood symbol with the highest output probability matches the comma or period, keeping the current comma or period unchanged;
    when the mood symbol with the highest output probability does not match the comma or period, correcting the current comma or period to the mood symbol with the highest output probability.
PCT/CN2018/104046 2018-08-01 2018-09-05 Method, apparatus, computer device and storage medium for adding symbols in speech recognition WO2020024352A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810865807.XA CN108831481A (zh) 2018-08-01 2018-08-01 Method, apparatus, computer device and storage medium for adding symbols in speech recognition
CN201810865807.X 2018-08-01

Publications (1)

Publication Number Publication Date
WO2020024352A1 true WO2020024352A1 (zh) 2020-02-06

Family

ID=64153440

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/104046 WO2020024352A1 (zh) 2018-08-01 2018-09-05 语音识别中符号添加方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN108831481A (zh)
WO (1) WO2020024352A1 (zh)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754808B (zh) * 2018-12-13 2024-02-13 平安科技(深圳)有限公司 Method, apparatus, computer device and storage medium for converting speech to text
CN112151073A (zh) * 2019-06-28 2020-12-29 北京声智科技有限公司 Speech processing method, system, device and medium
CN110502631B (zh) * 2019-07-17 2022-11-04 招联消费金融有限公司 Input information response method and apparatus, computer device and storage medium
CN110675861B (zh) * 2019-09-26 2022-11-01 深圳追一科技有限公司 Speech sentence segmentation method, apparatus, device and storage medium
CN111261162B (zh) * 2020-03-09 2023-04-18 北京达佳互联信息技术有限公司 Speech recognition method, speech recognition apparatus and storage medium
CN111986654B (zh) * 2020-08-04 2024-01-19 云知声智能科技股份有限公司 Method and system for reducing the latency of a speech recognition system
CN112101003B (zh) * 2020-09-14 2023-03-14 深圳前海微众银行股份有限公司 Sentence text segmentation method, apparatus, device and computer-readable storage medium
CN114613357A (zh) * 2020-12-04 2022-06-10 广东博智林机器人有限公司 Speech processing method, system, electronic device and storage medium
CN112712802A (zh) * 2020-12-23 2021-04-27 江西远洋保险设备实业集团有限公司 Speech recognition operation control system for intelligent information processing of compact shelving
CN112634876B (zh) * 2021-01-04 2023-11-10 北京有竹居网络技术有限公司 Speech recognition method and apparatus, storage medium and electronic device
CN112927679B (zh) * 2021-02-07 2023-08-15 虫洞创新平台(深圳)有限公司 Method for adding punctuation in speech recognition and speech recognition apparatus
CN112992117B (zh) * 2021-02-26 2023-05-26 平安科技(深圳)有限公司 Multilingual speech model generation method, apparatus, computer device and storage medium
CN115512687B (zh) * 2022-11-08 2023-02-17 之江实验室 Speech sentence segmentation method, apparatus, storage medium and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102231278A (zh) * 2011-06-10 2011-11-02 安徽科大讯飞信息科技股份有限公司 Method and system for automatically adding punctuation in speech recognition
KR20120042381A (ko) * 2010-10-25 2012-05-03 한국전자통신연구원 Apparatus and method for identifying the sentence pattern of speech-recognized sentences
CN107767870A (zh) * 2017-09-29 2018-03-06 百度在线网络技术(北京)有限公司 Punctuation adding method, apparatus and computer device
CN107910021A (zh) * 2017-11-08 2018-04-13 天脉聚源(北京)传媒科技有限公司 Symbol insertion method and apparatus

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8566088B2 (en) * 2008-11-12 2013-10-22 Scti Holdings, Inc. System and method for automatic speech to text conversion
GB0905457D0 (en) * 2009-03-30 2009-05-13 Touchtype Ltd System and method for inputting text into electronic devices
CN107247706B (zh) * 2017-06-16 2021-06-25 中国电子技术标准化研究院 Text sentence-break model building method, sentence-break method, apparatus and computer device
CN107632980B (zh) * 2017-08-03 2020-10-27 北京搜狗科技发展有限公司 Speech translation method and apparatus, and apparatus for speech translation


Also Published As

Publication number Publication date
CN108831481A (zh) 2018-11-16

Similar Documents

Publication Publication Date Title
WO2020024352A1 (zh) Method, apparatus, computer device and storage medium for adding symbols in speech recognition
CN107632980B (zh) Speech translation method and apparatus, and apparatus for speech translation
US10114809B2 (en) Method and apparatus for phonetically annotating text
US20220076693A1 (en) Bi-directional recurrent encoders with multi-hop attention for speech emotion recognition
US9502036B2 (en) Correcting text with voice processing
US10811005B2 (en) Adapting voice input processing based on voice input characteristics
US11450311B2 (en) System and methods for accent and dialect modification
US20160092438A1 (en) Machine translation apparatus, machine translation method and program product for machine translation
US10839788B2 (en) Systems and methods for selecting accent and dialect based on context
US11043213B2 (en) System and method for detection and correction of incorrectly pronounced words
CN108682420A (zh) Dialect recognition method for audio and video calls and terminal device
US20200160850A1 (en) Speech recognition system, speech recognition method and computer program product
WO2021027029A1 (zh) Data processing method and apparatus, computer device and storage medium
US10304439B2 (en) Image processing device, animation display method and computer readable medium
US20140372117A1 (en) Transcription support device, method, and computer program product
WO2014183373A1 (en) Systems and methods for voice identification
CN107564526B (zh) Processing method, apparatus and machine-readable medium
US11676607B2 (en) Contextual denormalization for automatic speech recognition
US20140207451A1 (en) Method and Apparatus of Adaptive Textual Prediction of Voice Data
CN111192586B (zh) Speech recognition method and apparatus, electronic device and storage medium
CN114449310A (zh) Video editing method and apparatus, computer device and storage medium
CN113506586A (zh) Method and system for user emotion recognition
US11600279B2 (en) Transcription of communications
CN115862631A (zh) Subtitle generation method and apparatus, electronic device and storage medium
CN114783405B (zh) Speech synthesis method and apparatus, electronic device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18928465

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18928465

Country of ref document: EP

Kind code of ref document: A1