WO2019120247A1 - A Text Verification Method and Apparatus - Google Patents

A Text Verification Method and Apparatus

Info

Publication number
WO2019120247A1
WO2019120247A1 (PCT/CN2018/122343)
Authority
WO
WIPO (PCT)
Prior art keywords
timestamp
voice signal
text
signal segment
target
Prior art date
Application number
PCT/CN2018/122343
Other languages
English (en)
French (fr)
Inventor
王群
Original Assignee
北京君林科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京君林科技股份有限公司 filed Critical 北京君林科技股份有限公司
Publication of WO2019120247A1 publication Critical patent/WO2019120247A1/zh

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/221 — Announcement of recognition results
    • G10L 2015/225 — Feedback of the input speech

Definitions

  • The embodiments of the present invention relate to the field of voice-to-text conversion technologies, and in particular to a text verification method and apparatus.
  • Intelligent voice-to-text conversion technology can be applied to meeting minutes, training records, or interview records.
  • When a voice signal is converted into text, the feature parameters of the voice signal are first extracted and then matched against the feature parameters corresponding to characters in a speech database, so that the best-matching text is obtained and output.
  • For standard Mandarin speech recorded in a quiet environment, the conversion accuracy is relatively high.
  • In real-life scenarios, however, a speaker inevitably has some local accent, and recording in a quiet environment cannot be guaranteed, so the accuracy of voice-to-text conversion cannot be guaranteed.
  • When the accuracy of voice-to-text conversion cannot be guaranteed, the converted text needs to be verified manually.
  • When verification personnel find incorrect text, it needs to be corrected according to the original recording.
  • For a long recording, although the approximate position of the voice content used to correct the erroneous text can be judged from the position of that text, this approach usually requires multiple playback attempts and repeated listening before the needed voice content is accurately located. It therefore wastes time and results in relatively low verification efficiency.
  • The embodiments of the present invention provide a text verification method and terminal, so as to provide a method capable of improving the efficiency of text verification.
  • An embodiment of the present invention provides a text verification method, including:
  • when a play instruction is detected, acquiring text to be verified in the text, the text to be verified including at least one character;
  • determining a target voice signal segment corresponding to the text to be verified, where the text field in which the text to be verified is located is a text field generated after the target voice signal segment undergoes speech recognition, the text includes at least one text field, and each text field corresponds to one voice signal segment;
  • playing the target voice signal segment.
  • Further, determining the target voice signal segment corresponding to the text to be verified includes:
  • determining a first timestamp of the text field in which the text to be verified is located; and determining a target voice signal segment corresponding to a second timestamp that is the same as the first timestamp, the second timestamp being a timestamp marked on a voice signal segment.
  • Further, determining the target voice signal segment corresponding to the second timestamp that is the same as the first timestamp includes:
  • grouping all voice signal segments; determining a third timestamp of the first voice signal segment in each group of voice signal segments, the third timestamp being the same as the second timestamp of that first voice signal segment; determining a target timestamp having the minimum difference from the first timestamp, the target timestamp being one of the third timestamps; and, if the first timestamp is greater than the target timestamp, searching the second timestamps of the voice signal segments one by one backward (toward later segments) from the voice signal segment corresponding to the target timestamp, and otherwise searching them one by one forward (toward earlier segments), to determine the target voice signal segment corresponding to the second timestamp that is the same as the first timestamp.
  • Further, before the text to be verified in the text is acquired upon detection of the play instruction, the method includes: acquiring a voice signal; breaking the voice signal into sentences at sentence pause signals in the voice signal to generate voice signal segments; converting the voice signal segments into text fields; and marking the text field with a first timestamp and the voice signal segment with a second timestamp.
  • Further, marking the text field with a first timestamp and marking the voice signal segment with a second timestamp includes: marking the text field with the first timestamp and the voice signal segment with the second timestamp by using the start time and/or end time of the voice signal segment, the first timestamp being the same as the second timestamp.
  • Further, after the voice signal is acquired, and before the text field is marked with the first timestamp and the voice signal segment is marked with the second timestamp, the method further includes: converting the voice signal into text; breaking the text into sentences according to the semantics of the text to generate text fields; and breaking the voice signal according to the text fields to generate voice signal segments.
  • the embodiment of the invention further provides a text verification device, comprising:
  • a first acquiring unit configured to: when the play instruction is detected, acquire text to be verified in the text, where the text to be verified includes at least one character;
  • a determining unit configured to determine a target voice signal segment corresponding to the text to be verified, where the text field in which the text to be verified is located is a text field generated after the target voice signal segment undergoes speech recognition, the text includes at least one text field, and each text field corresponds to one voice signal segment;
  • a playing unit configured to play the target voice signal segment.
  • the determining unit includes:
  • a first determining subunit configured to determine a first timestamp of a text field in which the text to be verified is located
  • a second determining subunit configured to determine a target voice signal segment corresponding to a second timestamp that is the same as the first timestamp, where the second timestamp is a timestamp marked on a voice signal segment.
  • the second determining subunit is specifically configured to:
  • group all voice signal segments; determine a third timestamp of the first voice signal segment in each group of voice signal segments, the third timestamp being the same as the second timestamp of that first voice signal segment; determine a target timestamp having the minimum difference from the first timestamp, the target timestamp being one of the third timestamps; and, if the first timestamp is greater than the target timestamp, search the second timestamps of the voice signal segments one by one backward (toward later segments) from the voice signal segment corresponding to the target timestamp, and otherwise search them one by one forward (toward earlier segments), to determine the target voice signal segment corresponding to the second timestamp that is the same as the first timestamp.
  • the device further includes:
  • a second acquiring unit configured to acquire a voice signal
  • a sentence-breaking unit configured to break the voice signal at sentence pause signals in the voice signal to generate voice signal segments;
  • a converting unit configured to convert the voice signal segment into a text field
  • a marking unit configured to mark the text field with a first timestamp, and mark the voice signal segment with a second timestamp.
  • After the text to be verified is determined, the corresponding voice signal segment can be accurately determined according to the timestamp corresponding to the text to be verified and played. The method provided in this embodiment is therefore convenient to use and can improve verification efficiency.
  • FIG. 1 is a flowchart of a text verification method according to an embodiment of the present invention;
  • FIG. 2 is a structural block diagram of a text verification apparatus according to an embodiment of the present invention;
  • FIG. 3 is a structural block diagram of another text verification apparatus according to an embodiment of the present invention;
  • FIG. 4 is a structural block diagram of a text verification system according to an embodiment of the present invention.
  • the embodiments of the present invention can be applied to a terminal, such as a mobile phone, a computer, a tablet, and the like.
  • One implementation manner may be real-time text conversion, that is, the voice signal output by the speaker is collected on the spot, and the voice signal is converted into text and saved.
  • Real-time conversion requires a terminal with a voice signal acquisition function, such as a terminal with a microphone.
  • Another implementation manner may be non-real-time text conversion, that is, a device with a recording function pre-records the voice signal output by the speaker and then transmits the complete recorded voice signal to the terminal, and the terminal performs text conversion on the acquired voice signal.
  • the embodiment of the present invention can also be applied to a terminal and a cloud server connected to the terminal.
  • the terminal can send the voice signal to be converted to the cloud server, and the cloud server performs text conversion on the obtained voice signal, and sends the converted text to the terminal.
  • The voice signal to be converted into text may be recorded in real time by the terminal itself, or may be a voice signal sent to the terminal after being recorded by another recording device.
  • a method for text verification is provided according to an embodiment of the present invention, and the method can be applied to a terminal. As shown in FIG. 1, the method can include the following steps.
  • step 11 the voice signal is obtained.
  • the speech signal is a speech signal to be converted into text.
  • The way of converting a voice signal into text can be real-time or non-real-time conversion, so the voice signal can be acquired either in real time or in non-real time.
  • the terminal itself may directly acquire a voice signal, or may use a voice signal collection device to acquire a voice signal.
  • Step 12: Break the voice signal at a sentence pause signal in the voice signal to generate voice signal segments.
  • The duration of silent signals can be detected in real time while the voice signal is acquired.
  • When the duration of a silent signal reaches a set value, the silent signal can be used as a sentence pause signal, and a sentence break is made at the sentence pause signal, thereby generating a voice signal segment, with the break point being the end of that voice signal segment.
  • To determine a sentence pause signal, the average time interval between two adjacent syllables in the voice signal may first be determined, and a threshold set according to that average interval; when the duration of a silent signal is detected to exceed the threshold, the silent signal can be determined to be a sentence pause signal.
  • The break point can be at any position within the sentence pause signal; the specific break position is not limited in the embodiments of the present invention.
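The threshold rule above can be sketched as follows. This is a minimal illustration only: the syllable onset times as input, the `factor` multiplier, and the choice of the gap midpoint as the break position are all assumptions for the sketch, not details fixed by the embodiment.

```python
def find_pause_breaks(syllable_times, factor=3.0):
    """Locate sentence-pause break points in a voice signal, given the
    onset times (in seconds) of the detected syllables.

    A silent gap is treated as a sentence pause when its duration exceeds
    a threshold derived from the average interval between adjacent
    syllables; the break is placed at the midpoint of the gap (assumed).
    """
    if len(syllable_times) < 2:
        return []
    # Average time interval between two adjacent syllables.
    intervals = [b - a for a, b in zip(syllable_times, syllable_times[1:])]
    threshold = factor * (sum(intervals) / len(intervals))
    # Any gap longer than the threshold is a sentence pause signal.
    return [(a + b) / 2.0
            for a, b, gap in zip(syllable_times, syllable_times[1:], intervals)
            if gap > threshold]
```

For seven syllables at 0.0–0.6 s and 2.6–3.0 s, the 2-second gap exceeds three times the 0.5-second average interval, so a single break is reported inside that gap.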
  • The accuracy of the text field into which each voice signal segment is converted may also be calculated separately.
  • For example, the text fields are: "Yan Yuejin, research director of the think tank center of the Yiju Research Institute, said", with an accuracy of 80%; "In October, 10 hotspot cities such as Beijing, Shanghai, Shenzhen, Chengdu, Fuzhou, Nanjing, Hangzhou, Hefei, Zhengzhou and Wuxi", with an accuracy of 80%; "new homes experienced negative year-on-year growth", with an accuracy of 60%.
  • Text marked with a color can have either a colored text background or colored characters.
  • The sentence pause signals in the complete voice signal can also be determined first, and sentence breaks made at each sentence pause signal, thereby generating a plurality of voice signal segments.
  • The time corresponding to each break point can then be calculated to determine the start time and end time of each voice signal segment after the breaks are made.
  • The start time of the first voice signal segment in the voice signal may be set to 0, and its end time is the time at the first break point.
  • The voice signal segment may be marked with a timestamp using the start time and/or end time of that segment. That is, a voice signal segment may be marked with one timestamp, which may be its start timestamp or its end timestamp, or it may be marked with two timestamps, namely its start timestamp and its end timestamp.
  • The duration of a silent signal can be determined from the waveform of the voice signal, and details are not described here again.
  • step 13 the voice signal segment is converted into a text field.
  • the generated plurality of voice signal segments may be sequentially converted into text fields.
  • Voiceprint recognition may also be performed on a voice signal segment to determine the speaker corresponding to that segment, and the speaker's name may be added, in text form, to the text field corresponding to the voice signal segment.
  • For example, the speaker's name may be added at the very beginning of the text field, and the name may be enclosed in parentheses or followed by a colon.
  • Step 14 Mark the text field with a first timestamp, and mark the voice signal segment with a second timestamp, the first timestamp being the same as the second timestamp.
  • the start time and the end time of each voice signal segment may be calculated according to the sampling frequency of each voice signal segment, for time stamping each voice signal segment and the text field.
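As an illustration of computing the start and end time of each segment from the sampling frequency, the sketch below derives the times from each segment's length in samples; the 16 kHz default and the helper name are assumptions, not part of the embodiment.

```python
def segment_times(segment_sample_counts, fs=16000):
    """Compute (start, end) times in seconds for consecutive voice signal
    segments from their lengths in samples, given sampling frequency fs.
    The first segment starts at time 0, as in the embodiment."""
    times, start = [], 0
    for n in segment_sample_counts:
        end = start + n
        # A sample index divided by the sampling frequency gives seconds.
        times.append((start / fs, end / fs))
        start = end
    return times
```

For example, at 16 kHz, segments of 16000 and 32000 samples span 0.0–1.0 s and 1.0–3.0 s respectively.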
  • Each speech signal segment may be marked with a time stamp, which may be the start time or end time of the speech signal segment, or may be marked with two time stamps, which are the start time and end time of the speech signal segment, respectively.
  • Each text field can be tagged with the same timestamp as the voice signal segment.
  • the start time of the speech signal segment may be marked at the beginning of each segment of text, and the end time of the speech signal segment may be marked at the end of each segment of text.
  • the specific location of each text time stamp is not limited.
  • For example, the text and its corresponding timestamps are: "[0:00.000] Yan Yuejin, research director of the think tank center of the Yiju Research Institute, said [0:03.145] In October, new homes in 10 hotspot cities such as Beijing, Shanghai, Shenzhen, Chengdu, Fuzhou, Nanjing, Hangzhou, Hefei, Zhengzhou and Wuxi all experienced negative year-on-year growth."
  • The voice signal segment corresponding to "Yan Yuejin, research director of the think tank center of the Yiju Research Institute, said" is marked with the timestamp "0:00.000" at the beginning of the field, and the voice signal segment corresponding to "In October, new homes in 10 hotspot cities such as Beijing, Shanghai, Shenzhen, Chengdu, Fuzhou, Nanjing, Hangzhou, Hefei, Zhengzhou and Wuxi all experienced negative year-on-year growth" is marked with the timestamp "0:03.145" at the beginning of the field.
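A minimal sketch of producing the "[m:ss.mmm]"-prefixed text shown in this example might look like the following; the function names and the (start-time, text) input representation are assumptions made for illustration.

```python
def format_timestamp(seconds):
    """Render a time in seconds in the "m:ss.mmm" form used in the
    example, e.g. 3.145 -> "0:03.145"."""
    minutes = int(seconds) // 60
    ms_total = round(seconds * 1000)
    rem_ms = ms_total - minutes * 60 * 1000
    return f"{minutes}:{rem_ms // 1000:02d}.{rem_ms % 1000:03d}"

def mark_text_fields(fields):
    """Prefix each text field with the start timestamp of its voice
    signal segment. `fields` is a list of (start_seconds, text) pairs."""
    return "".join(f"[{format_timestamp(t)}]{text}" for t, text in fields)
```

With start times 0.0 and 3.145 this reproduces the "[0:00.000]…[0:03.145]…" layout of the example above.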
  • the manner of breaking the sentence of the voice signal is not limited.
  • For example, the voice signal may be broken at fixed time intervals.
  • When the voice signal is broken at fixed time intervals, the timestamp marked on each voice signal segment is also the timestamp of the text field corresponding to that segment; however, a text field obtained this way may not be a semantically complete sentence. Therefore, after the voice signal is acquired, the voice signal may instead first be converted into text, the text broken into sentences according to its semantics to generate text fields, and the voice signal then broken according to the text fields to generate voice signal segments; the text field is marked with a first timestamp by using the start time and/or end time of the voice signal segment, the voice signal segment is marked with a second timestamp, and the first timestamp and the second timestamp may be the same.
  • In other embodiments, the terminal may also send the voice signal segment to the cloud server; the cloud server converts the voice signal segment into a text field, marks the text field with a first timestamp and the voice signal segment with a second timestamp, and transmits the timestamped text field and voice signal segment to the terminal.
  • the above steps 11 to 14 can also be performed by the cloud server.
  • the terminal sends the collected voice signal to the cloud server in real time, and after executing the steps 11 to 14, the cloud server sends the text field corresponding to the voice signal segment to the terminal, thereby realizing real-time conversion.
  • Alternatively, the terminal can send the complete voice signal to the cloud server; after the cloud server performs text conversion, it sends the timestamped text to the terminal, and the timestamped voice signal segments can also be sent to the terminal.
  • the converted text field can be composed of readable and editable text that can be verified on the terminal.
  • the process of verifying the text may include the following steps.
  • Step 15 When the play instruction is detected, the text to be verified in the text is obtained, and the text to be verified includes at least one character.
  • The verification personnel can select the text to be verified and click the play button, whereupon the system detects the play instruction.
  • the display mode of the play button is not limited.
  • For example, the play button may be displayed after the text is opened, may be displayed after text is selected, or may be displayed after text is selected and the mouse is right-clicked.
  • Step 16 Determine a target voice signal segment corresponding to the character to be verified, where the text field in which the character to be verified is located is a text field generated after the target voice signal segment is voice-recognized, and the text includes at least A text field, each text field corresponding to a speech signal segment.
  • When determining the target voice signal segment corresponding to the text to be verified, a first timestamp of the text field in which the text to be verified is located is determined, and then the target voice signal segment corresponding to a second timestamp that is the same as the first timestamp is determined, the second timestamp being a timestamp marked on a voice signal segment.
  • the determining of the target voice signal segment corresponding to the second timestamp that is the same as the first timestamp may include:
  • step 161 all voice signal segments are grouped.
  • The number of groups corresponding to different numbers of voice signal segments may be preset. For example, when the number of voice signal segments is in the range of 10 to 50, they are divided into two groups; when it is in the range of 50 to 100, they are divided into three groups. The number of voice signal segments in each group is the same or similar.
  • voice signal segments may be grouped according to the chronological order of the voice signal segments, so as to search for each voice signal segment in chronological order.
  • Step 162 Determine a third timestamp of the first voice signal segment of each group of voice signal segments, the third timestamp being the same as the second timestamp of the first voice signal segment.
  • each set of speech signal segments includes 30 speech signal segments, and each speech signal segment can be arranged in the order of time stamps.
  • the time stamp corresponding to the voice signal segment in the first set of voice signal segments is a1 to a30
  • the time stamp corresponding to the voice signal segment in the second group of voice signal segments is b1 to b30
  • the voice signal in the third group of voice signal segments The timestamps corresponding to the segments are c1 to c30
  • the time stamps of the first voice signal segments in each group of voice signal segments are a1, b1, and c1, respectively.
  • Step 163 Determine a target timestamp with a minimum difference from the first timestamp, where the target timestamp is one of the third timestamps.
  • the voice signal segments of each group and the third timestamp corresponding to each group may be saved.
  • When the target voice signal segment needs to be determined again later, step 163 may be performed directly, without re-grouping the voice signal segments and re-determining the third timestamp corresponding to each group.
  • Step 164: If the first timestamp is greater than the target timestamp, search the second timestamps of the voice signal segments one by one backward (toward later segments) from the voice signal segment corresponding to the target timestamp, and otherwise search them one by one forward (toward earlier segments), to determine the target voice signal segment corresponding to the second timestamp that is the same as the first timestamp.
  • For example, if the target timestamp is b1 and the first timestamp is greater than b1, the second timestamps of the voice signal segments are searched one by one starting from the voice signal segment corresponding to b1, that is, b2, b3, b4, ... are searched in order until the target voice signal segment corresponding to the second timestamp that is the same as the first timestamp is found.
  • In this way, this embodiment can not only improve the calculation speed of the program and quickly find the voice signal segment corresponding to the text to be verified, but can also reduce the occupancy of the terminal's CPU resources.
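Putting steps 161 to 164 together, the grouped search might be sketched as below. The grouping thresholds follow the example above; the single-group fallback outside those ranges, the index-based return value, and the tie-breaking in the nearest-timestamp choice are assumptions of the sketch, not claimed details.

```python
def group_segments(stamps):
    """Step 161: split the time-ordered second timestamps of all voice
    signal segments into groups of the same or similar size, using the
    illustrative scheme: 2 groups for 10-50 segments, 3 groups for
    50-100, otherwise 1 group (assumed default)."""
    n = len(stamps)
    k = 2 if 10 <= n < 50 else 3 if 50 <= n < 100 else 1
    size = -(-n // k)  # ceiling division keeps group sizes similar
    return [stamps[i:i + size] for i in range(0, n, size)]

def find_target_segment(groups, first_stamp):
    """Steps 162-164: take each group's leading (third) timestamp, pick
    the one with minimum difference from the first timestamp, then scan
    backward (toward later segments) when the first timestamp is greater
    than the target timestamp, otherwise forward (toward earlier ones).
    Returns the matching segment's index in the flattened list, or None."""
    flat = [t for g in groups for t in g]
    # Third timestamps: first timestamp of each group, with its offset.
    thirds, offset = [], 0
    for g in groups:
        thirds.append((g[0], offset))
        offset += len(g)
    target, start = min(thirds, key=lambda p: abs(p[0] - first_stamp))
    step = 1 if first_stamp >= target else -1
    i = start
    while 0 <= i < len(flat):
        if flat[i] == first_stamp:
            return i
        i += step
    return None
```

With 30 segments stamped 0..29, a first timestamp of 17 starts the scan at the second group's leading timestamp 15 and moves toward later segments, while 12 starts at 15 and moves toward earlier segments.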
  • step 17 the target speech signal segment is played.
  • After the text to be verified is selected, the system can detect the play instruction and then play the voice signal segment corresponding to the timestamp of the field in which that text is located, so that the correct text can be determined from the context.
  • In other embodiments, each frame of the voice signal may be timestamped and each character marked with a timestamp, so that when a piece of text is selected and the play instruction is given, the voice signal segment corresponding to that piece of text can be played.
  • the specific implementation process can refer to the above steps, and details are not described herein again.
  • After the text to be verified is determined, the corresponding voice signal segment can be accurately determined according to the timestamp corresponding to the text to be verified and played. The method provided in this embodiment is therefore convenient to use and can improve verification efficiency.
  • FIG. 2 is a text verification apparatus according to an embodiment of the present invention.
  • the apparatus may be deployed on a terminal or a terminal itself.
  • The apparatus may include a second acquiring unit 21, a sentence-breaking unit 22, a converting unit 23, a marking unit 24, a first acquiring unit 25, a determining unit 26, and a playing unit 27.
  • the second obtaining unit 21 is configured to acquire a voice signal.
  • The sentence-breaking unit 22 is configured to break the voice signal at a sentence pause signal in the voice signal to generate voice signal segments.
  • the converting unit 23 is configured to convert the voice signal segment into a text field.
  • the conversion unit 23 can be connected to a cloud server, and the conversion unit 23 can convert the voice signal segment into a text field through a cloud server.
  • the marking unit 24 is configured to mark the text field with a first time stamp and mark the voice signal segment with a second time stamp.
  • The marking unit 24 is specifically configured to mark the text field with a first timestamp and the voice signal segment with a second timestamp by using the start time and/or end time of the voice signal segment, the first timestamp being the same as the second timestamp.
  • The marking unit 24 may also mark the text field with the first timestamp and the voice signal segment with the second timestamp through the cloud server.
  • the first obtaining unit 25 is configured to: when the play instruction is detected, acquire text to be verified in the text, where the text to be verified includes at least one character.
  • a determining unit 26 configured to determine a target voice signal segment corresponding to the character to be verified, where the text field of the character to be verified is a text field generated after the target voice signal segment is voice-recognized,
  • the text includes at least one text field, each text field corresponding to a speech signal segment.
  • the determining unit 26 can include:
  • a first determining subunit configured to determine a first timestamp of a text field in which the text to be verified is located
  • a second determining subunit configured to determine a target voice signal segment corresponding to a second timestamp that is the same as the first timestamp, where the second timestamp is a timestamp marked on a voice signal segment.
  • the second determining subunit may be specifically used for:
  • group all voice signal segments; determine a third timestamp of the first voice signal segment in each group of voice signal segments, the third timestamp being the same as the second timestamp of that first voice signal segment; determine a target timestamp having the minimum difference from the first timestamp, the target timestamp being one of the third timestamps; and, if the first timestamp is greater than the target timestamp, search the second timestamps of the voice signal segments one by one backward (toward later segments) from the voice signal segment corresponding to the target timestamp, and otherwise search them one by one forward (toward earlier segments), to determine the target voice signal segment corresponding to the second timestamp that is the same as the first timestamp.
  • the playing unit 27 is configured to play the target voice signal segment.
  • After the text to be verified is determined, the corresponding voice signal segment can be accurately determined according to the timestamp corresponding to the text to be verified and played. The apparatus provided in this embodiment is therefore convenient to use and can improve verification efficiency.
  • FIG. 3 is a text verification apparatus according to an embodiment of the present invention.
  • the apparatus may be deployed on a terminal or a terminal itself.
  • The apparatus may include a second acquiring unit 31, a converting unit 32, a sentence-breaking unit 33, and a marking unit 34.
  • the second obtaining unit 31 is configured to acquire a voice signal.
  • the converting unit 32 is configured to convert the voice signal into a text.
  • The sentence-breaking unit 33 is configured to break the text into sentences according to the semantics of the text to generate text fields, and to break the voice signal according to the text fields to generate voice signal segments.
  • the marking unit 34 is configured to mark the text field with a first time stamp and mark the voice signal segment with a second time stamp.
  • After the text to be verified is determined, the corresponding voice signal segment can be accurately determined according to the timestamp corresponding to the text to be verified and played. The apparatus provided in this embodiment is therefore convenient to use and can improve verification efficiency.
  • FIG. 4 is a text verification system according to an embodiment of the present invention.
  • the system includes a terminal 41 and a cloud server 42.
  • the terminal 41 is configured to send the collected voice signal to the cloud server 42.
  • the cloud server 42 may include a second obtaining unit 421, a sentence breaking unit 422, a converting unit 423, and a marking unit 424,
  • the second obtaining unit 421 is configured to acquire a voice signal.
  • The sentence-breaking unit 422 is configured to break the voice signal at a sentence pause signal in the voice signal to generate voice signal segments.
  • the converting unit 423 is configured to convert the voice signal segment into a text field.
  • the marking unit 424 is configured to mark the text field with a first time stamp and mark the voice signal segment with a second time stamp.
  • the cloud server 42 transmits the text field with the first time stamp and the voice signal segment with the second time stamp to the terminal 41.
  • the terminal 41 is configured to view and verify text, and may include a first obtaining unit 411, a determining unit 412, and a playing unit 413.
  • the first obtaining unit 411 is configured to: when the play instruction is detected, acquire text to be verified in the text, where the text to be verified includes at least one character.
  • a determining unit 412 configured to determine a target voice signal segment corresponding to the character to be verified, where the text field of the character to be verified is a text field generated after the target voice signal segment is voice-recognized,
  • the text includes at least one text field, each text field corresponding to a speech signal segment.
  • the playing unit 413 is configured to play the target voice signal segment.
  • After the text to be verified is determined, the corresponding voice signal segment can be accurately determined according to the timestamp corresponding to the text to be verified and played. The system provided in this embodiment is therefore convenient to use and can improve verification efficiency.
  • Since the apparatus embodiments substantially correspond to the method embodiments, their description is relatively simple, and for relevant parts reference may be made to the description of the method embodiments.
  • The embodiments of the present invention may be provided as a method, an apparatus, or a computer program product.
  • The embodiments of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware.
  • The embodiments of the invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
  • The embodiments of the invention are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the invention. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
  • These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • The computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Abstract

A text verification method and apparatus. The method includes: when a play instruction is detected, acquiring text to be verified in the text, the text to be verified including at least one character (15); determining a target voice signal segment corresponding to the text to be verified, where the text field in which the text to be verified is located is a text field generated after the target voice signal segment undergoes speech recognition, the text includes at least one text field, and each text field corresponds to one voice signal segment (16); and playing the target voice signal segment (17). After the text to be verified is determined, the corresponding voice signal segment can be accurately determined according to the timestamp corresponding to the text to be verified and played, which is convenient to use and can improve verification efficiency.

Description

A Text Verification Method and Apparatus
Technical Field
The embodiments of the present invention relate to the field of voice-to-text conversion technologies, and in particular to a text verification method and apparatus.
Background
At present, with the development of intelligent voice-to-text conversion technology, the efficiency of converting speech into text has been greatly improved. Intelligent voice-to-text conversion technology can be applied to meeting minutes, training records, or interview records. When a voice signal is converted into text, the feature parameters of the voice signal are first extracted and then matched against the feature parameters corresponding to characters in a speech database, so that the best-matching text is obtained and output. For standard Mandarin speech in a quiet environment, the conversion accuracy is relatively high. In real-life scenarios, however, a speaker inevitably has some local accent, and recording in a quiet environment cannot be guaranteed, so the accuracy of voice-to-text conversion cannot be guaranteed.
When the accuracy of voice-to-text conversion cannot be guaranteed, the converted text needs to be verified manually. When verification personnel find incorrect text, it needs to be corrected according to the original recording. For a long recording, although the approximate position of the voice content used to correct the erroneous text can be judged from the position of that text, this approach usually requires multiple playback attempts and repeated listening before the needed voice content is accurately played, so it wastes time and results in relatively low verification efficiency.
Summary
Embodiments of the present invention provide a text verification method and terminal, so as to provide a method capable of improving text verification efficiency.
An embodiment of the present invention provides a text verification method, including:
when a play instruction is detected, obtaining text to be verified in a document, the text to be verified including at least one character;
determining a target voice signal segment corresponding to the text to be verified, where the text segment containing the text to be verified is the text segment generated by speech recognition of the target voice signal segment, the document includes at least one text segment, and each text segment corresponds to one voice signal segment; and
playing the target voice signal segment.
Further, determining the target voice signal segment corresponding to the text to be verified includes:
determining a first timestamp of the text segment containing the text to be verified; and
determining the target voice signal segment corresponding to a second timestamp identical to the first timestamp, the second timestamp being a timestamp marked on a voice signal segment.
Further, determining the target voice signal segment corresponding to the second timestamp identical to the first timestamp includes:
grouping all voice signal segments;
determining a third timestamp of the first voice signal segment in each group, the third timestamp being identical to the second timestamp of that first voice signal segment;
determining a target timestamp having the smallest difference from the first timestamp, the target timestamp being one of the third timestamps; and
if the first timestamp is greater than the target timestamp, searching the second timestamps of the subsequent voice signal segments one by one, starting from the voice signal segment corresponding to the target timestamp, to determine the target voice signal segment corresponding to the second timestamp identical to the first timestamp; otherwise, searching the second timestamps of the preceding voice signal segments one by one, starting from the voice signal segment corresponding to the target timestamp, to determine the target voice signal segment corresponding to the second timestamp identical to the first timestamp.
Further, before obtaining the text to be verified in the document when the play instruction is detected, the method includes:
acquiring a voice signal;
segmenting the voice signal at sentence pause signals therein to generate voice signal segments;
converting the voice signal segments into text segments; and
marking the text segments with first timestamps and marking the voice signal segments with second timestamps.
Further, marking the text segments with first timestamps and marking the voice signal segments with second timestamps includes:
using the start time and/or end time of a voice signal segment to mark the corresponding text segment with a first timestamp and to mark the voice signal segment with a second timestamp, the first timestamp being identical to the second timestamp.
Further, after acquiring the voice signal, and before marking the text segments with first timestamps and the voice signal segments with second timestamps, the method further includes:
converting the voice signal into text;
segmenting the text into text segments according to the semantics of the text; and
segmenting the voice signal according to the text segments to generate voice signal segments.
An embodiment of the present invention further provides a text verification apparatus, including:
a first obtaining unit, configured to obtain, when a play instruction is detected, text to be verified in a document, the text to be verified including at least one character;
a determining unit, configured to determine a target voice signal segment corresponding to the text to be verified, where the text segment containing the text to be verified is the text segment generated by speech recognition of the target voice signal segment, the document includes at least one text segment, and each text segment corresponds to one voice signal segment; and
a playing unit, configured to play the target voice signal segment.
Further, the determining unit includes:
a first determining subunit, configured to determine a first timestamp of the text segment containing the text to be verified; and
a second determining subunit, configured to determine the target voice signal segment corresponding to a second timestamp identical to the first timestamp, the second timestamp being a timestamp marked on a voice signal segment.
Further, the second determining subunit is specifically configured to:
group all voice signal segments;
determine a third timestamp of the first voice signal segment in each group, the third timestamp being identical to the second timestamp of that first voice signal segment;
determine a target timestamp having the smallest difference from the first timestamp, the target timestamp being one of the third timestamps; and
if the first timestamp is greater than the target timestamp, search the second timestamps of the subsequent voice signal segments one by one, starting from the voice signal segment corresponding to the target timestamp, to determine the target voice signal segment corresponding to the second timestamp identical to the first timestamp; otherwise, search the second timestamps of the preceding voice signal segments one by one, starting from the voice signal segment corresponding to the target timestamp, to determine the target voice signal segment corresponding to the second timestamp identical to the first timestamp.
Further, the apparatus further includes:
a second obtaining unit, configured to acquire a voice signal;
a segmenting unit, configured to segment the voice signal at sentence pause signals therein to generate voice signal segments;
a conversion unit, configured to convert the voice signal segments into text segments; and
a marking unit, configured to mark the text segments with first timestamps and mark the voice signal segments with second timestamps.
In the embodiments of the present invention, after the text to be verified is determined, the corresponding voice signal segment can be located accurately from the timestamp of the text to be verified and played back. The method provided by the embodiments is therefore convenient to use and improves verification efficiency.
Brief Description of the Drawings
To explain the technical solutions of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a text verification method according to an embodiment of the present invention;
FIG. 2 is a structural block diagram of a text verification apparatus according to an embodiment of the present invention;
FIG. 3 is a structural block diagram of another text verification apparatus according to an embodiment of the present invention;
FIG. 4 is a structural block diagram of a text verification system according to an embodiment of the present invention.
具体实施方式
为使本发明的上述目的、特征和优点能够更加明显易懂,下面结合附图和具体实施方式对本发明作进一步详细的说明。
本发明实施例可以应用在终端上,例如手机、计算机、平板电脑等等。在将语音信号转换为文本后,可以在终端上进行文字校验。在对文本进行校验并发现错误的文字时,可以播放该文字对应的语音信号片段,以便通过语境确定正确的文字。为能够快速确定待校验文字对应的语音信号片段,可以在将语音信号转换为文字的过程中,对语音信号以及文字添加相同的时间戳,从而通过相同的时间戳快速确定待校验文字对应的语音信号片段。
在将语音信号转换为文字的过程中,其中一种实现方式可以是文字实时转换,即在当场采集发言者输出的语音信号的同时,将该语音信号转换为文字,并进行保存。在该实现方式中,可以使用带有语音信号采集功能的终端,例如带有话筒的终端。另一种实现方式可以是文字非实时转换,即利用带有录音功能的设备对发言者输出的语音信号进行预先录制,再将录制好的完整语音信号发送给终端,终端对获取到的语音信号进行文字转换。
本发明实施例还可以应用在终端和与终端连接的云端服务器上。终端可以将待进行文字转换的语音信号发送给云端服务器,云端服务器对获取到的语音信号进行文字转换,并将转换后的文字发送给终端。该待进行文字转换的语音信号可以是终端本身实时录制而成,也可以是其他录音设备录制完后发送给终端的语音信号。
Referring to FIG. 1, a text verification method according to an embodiment of the present invention may be applied to a terminal. As shown in FIG. 1, the method may include the following steps.
Step 11: acquire a voice signal.
This is the voice signal to be converted into text. The conversion may be real-time, in which case the voice signal is acquired in real time, or non-real-time. In a specific implementation, the terminal itself may acquire the voice signal directly, or a voice signal capture device may be used.
Step 12: segment the voice signal at the sentence pause signals therein to generate voice signal segments.
In real-time conversion, the duration of silent signals can be monitored while the voice signal is being acquired. When the duration of a silent signal exceeds a preset time, that silent signal can be treated as a sentence pause signal, and the voice signal is cut at it to generate a voice signal segment; the cut point is the end of that segment. One way to determine a sentence pause signal is to first determine the average time interval between two syllables in the voice signal, set a threshold from that average interval, and, when the duration of a silent signal exceeds the threshold, treat that silent signal as a sentence pause signal. Note that the cut may be placed at any position within the pause signal; embodiments of the present invention do not limit the exact cut position.
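The pause-detection rule above (derive a threshold from the average silent gap, cut where a silent run exceeds it) can be sketched as follows. This is an illustrative implementation, not the patent's: the frame length, energy floor, and `pause_factor` multiplier are all assumed parameters.

```python
def split_on_pauses(samples, rate, frame_ms=20, energy_floor=1e-4, pause_factor=3.0):
    """Cut a mono signal at silences longer than a threshold derived
    from the average silent gap between voiced frames.
    Returns sample offsets of the cut points."""
    frame = int(rate * frame_ms / 1000)
    # Mean energy per frame; a frame is "voiced" if it clears the floor.
    energies = [
        sum(s * s for s in samples[i:i + frame]) / frame
        for i in range(0, len(samples) - frame + 1, frame)
    ]
    voiced = [e > energy_floor for e in energies]

    # Average length of the silent runs between voiced frames -> threshold.
    gaps, run = [], 0
    for v in voiced:
        if v:
            if run:
                gaps.append(run)
            run = 0
        else:
            run += 1
    avg_gap = (sum(gaps) / len(gaps)) if gaps else 1
    threshold = avg_gap * pause_factor

    # Emit a cut the moment a silent run exceeds the threshold.
    cuts, run = [], 0
    for i, v in enumerate(voiced):
        run = 0 if v else run + 1
        if not v and run == int(threshold) + 1:
            cuts.append(i * frame)
    return cuts
```

With `pause_factor` tuned down, a signal of speech / long silence / speech yields a single cut inside the silence, which would serve as the end of the first voice signal segment.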
In real-time conversion, the accuracy of the text segment converted from each voice signal segment can also be computed separately. For example: the segment "易居研究院智库研究中心总监严跃进表示" (Yan Yuejin, research director of the E-house think tank, stated) reaches 80% accuracy; "10月份，北京、上海、深圳、成都、福州、南京、杭州、合肥、郑州、无锡等10个热点城市" (in October, in ten hot-spot cities including Beijing, Shanghai, Shenzhen, Chengdu, Fuzhou, Nanjing, Hangzhou, Hefei, Zhengzhou, and Wuxi) reaches 80% accuracy; and "新房均出现同比负增长" (new homes all showed negative year-on-year growth) reaches only 60% accuracy.
When highlighting a text segment, the preset range into which its accuracy falls can be determined first, and a colour mark is then chosen according to that range. When the document is displayed, the marked text may be shown with a coloured background or coloured characters, for example: "易居研究院智库研究中心总监严跃进表示【yellow】，10月份，北京、上海、深圳、成都、福州、南京、杭州、合肥、郑州、无锡等10个热点城市【yellow】新房均出现同比负增长【red】". During subsequent verification, the correctness of "新房均出现同比负增长" can then be checked against the context of the whole passage.
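A minimal sketch of the "preset accuracy range to colour mark" mapping described above. The two cut-offs (0.9 and 0.7) and the colour names are assumptions for illustration; the patent does not fix the ranges.

```python
def highlight_colour(accuracy):
    """Map a per-segment recognition accuracy (0.0-1.0) to a colour mark."""
    if accuracy >= 0.9:
        return None        # confident enough, no mark
    if accuracy >= 0.7:
        return "yellow"    # review recommended
    return "red"           # likely wrong, play back the audio

def mark_segment(text, accuracy):
    """Append the colour mark to a text segment, mimicking the 【colour】 style."""
    colour = highlight_colour(accuracy)
    return f"{text}[{colour}]" if colour else text
```

A renderer would translate the returned colour into a coloured background or coloured characters when displaying the document.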
In non-real-time conversion, the sentence pause signals in the complete voice signal can be determined first, and the signal is cut at them to generate multiple voice signal segments.
After the voice signal has been segmented, the time of each cut point can be computed to determine the start time and end time of each voice signal segment. In a specific implementation, the start time of the first voice signal segment in the voice signal can be set to 0, and its end time is the time of the first cut point.
In an embodiment of the present invention, the voice signal segments can be time-stamped with their start time and/or end time. That is, a voice signal segment may carry one timestamp, namely its start timestamp or its end timestamp, or two timestamps, namely its start timestamp and its end timestamp.
The duration of a silent signal can be determined from the waveform of the voice signal, which is not described again here.
Step 13: convert the voice signal segments into text segments.
In real-time conversion, the conversion of a voice signal segment can start as soon as the segment is generated, saving conversion time. The converted text can be displayed on the spot, errors can be checked and recorded on the spot, which facilitates later correction.
In non-real-time conversion, the generated voice signal segments can be converted into text segments one after another.
In an embodiment of the present invention, voiceprint recognition can be performed on a voice signal segment to determine its speaker, and the speaker's name can then be added, as text, to the text segment corresponding to that segment. Specifically, the name can be added at the very beginning of the text segment, enclosed in brackets or followed by a colon. Through voiceprint recognition, in a multi-speaker scenario each text segment can be labelled with its speaker, which makes the transcript easier to read and edit.
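The two labelling conventions mentioned above (bracketed name, or name followed by a colon) can be expressed as a trivial formatting helper. The function name and `style` values are illustrative only; the voiceprint recognition itself is out of scope here.

```python
def label_segment(speaker, text, style="colon"):
    """Prefix a diarized speaker name to its text segment.

    style="colon"   -> 'Name: text'
    style="bracket" -> '(Name) text'
    """
    if style == "colon":
        return f"{speaker}: {text}"
    return f"({speaker}) {text}"
```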
Step 14: mark the text segments with first timestamps and mark the voice signal segments with second timestamps, the first timestamp being identical to the second timestamp.
In a specific implementation, the start time and end time of each voice signal segment can be computed from its sampling frequency and used to time-stamp the voice signal segments and the text segments. Each voice signal segment may carry one timestamp, namely its start time or its end time, or two timestamps, namely its start time and its end time. Each text segment can carry the same timestamp as its voice signal segment. The start time of a voice signal segment can be marked at the beginning of the corresponding text segment, and the end time at its end; embodiments of the present invention do not limit the exact position of the timestamp within each text segment.
For example, the voice signal and its timestamps may be: "[0:00.000]易居研究院智库研究中心总监严跃进表示，[0:03.145]10月份，北京、上海、深圳、成都、福州、南京、杭州、合肥、郑州、无锡等10个热点城市新房均出现同比负增长。"
After this voice signal is converted into text, the voice signal segment "易居研究院智库研究中心总监严跃进表示" can be marked with the timestamp "0:00.000" at the beginning of its paragraph, and the voice signal segment "10月份，北京、上海、深圳、成都、福州、南京、杭州、合肥、郑州、无锡等10个热点城市新房均出现同比负增长" can be marked with the timestamp "0:03.145" at the beginning of its paragraph.
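The "[m:ss.mmm]" stamp shown in the example above can be produced from a start time in seconds with a small helper. This is a hypothetical sketch of the marking step; only the rendered format is taken from the document.

```python
def stamp(seconds):
    """Render a time in seconds as the '[m:ss.mmm]' timestamp format."""
    minutes = int(seconds) // 60
    rest = seconds - minutes * 60
    return f"[{minutes}:{rest:06.3f}]"

def stamp_segment(start_s, text):
    """Prefix a text segment with its first timestamp; the matching voice
    signal segment would carry the same value as its second timestamp."""
    return stamp(start_s) + text
```

`stamp(3.145)` reproduces the "[0:03.145]" mark from the example, so the text segment and its voice signal segment share an identical, directly comparable key.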
This embodiment does not limit how the voice signal is segmented; for example, the voice signal may also be cut at fixed time intervals.
Note that when the voice signal is cut at fixed time intervals and the resulting voice signal segments are time-stamped, if those timestamps are also used as the timestamps of the corresponding text segments, the text segment at a given timestamp is not, semantically, a complete sentence. Therefore, after the voice signal has been acquired, it can first be converted into text; the text is segmented according to its semantics to generate text segments; the voice signal is then segmented according to the text segments to generate voice signal segments; and the start time and/or end time of each voice signal segment is used to mark the text segment with a first timestamp and the voice signal segment with a second timestamp, the first timestamp and the second timestamp being identical.
In one implementation, after performing step 12 and generating a voice signal segment, the terminal may also send that segment to a cloud server; the cloud server converts the segment into a text segment, marks the text segment with a first timestamp and the voice signal segment with a second timestamp, and then sends the time-stamped text segment and voice signal segment back to the terminal.
In another implementation, steps 11 to 14 may also be executed by the cloud server. For real-time conversion, the terminal sends the captured voice signal to the cloud server in real time; after executing steps 11 to 14, the cloud server sends the text segment corresponding to each voice signal segment back to the terminal, thereby achieving real-time conversion. For non-real-time conversion, the terminal can send the complete voice signal to the cloud server; after the conversion, the cloud server sends the time-stamped text back to the terminal, and may also send the time-stamped voice signal segments.
The converted text segments can form a readable and editable document, which can be verified on the terminal. The process of verifying the document may include the following steps.
Step 15: when a play instruction is detected, obtain the text to be verified in the document, the text to be verified including at least one character.
In a specific implementation, the proofreader can select the text to be verified and click a play button; after the play button is clicked, the system detects the play instruction. Note that embodiments of the present invention do not limit how the play button is displayed: for example, it may be displayed after the document is opened, after text is selected, or after text is selected and the mouse is right-clicked.
Step 16: determine the target voice signal segment corresponding to the text to be verified, where the text segment containing the text to be verified is the text segment generated by speech recognition of the target voice signal segment, the document includes at least one text segment, and each text segment corresponds to one voice signal segment.
When determining the target voice signal segment corresponding to the text to be verified, the first timestamp of the text segment containing the text to be verified can be determined first, and the target voice signal segment corresponding to a second timestamp identical to the first timestamp is then determined, the second timestamp being a timestamp marked on a voice signal segment.
Determining the target voice signal segment corresponding to the second timestamp identical to the first timestamp may include the following steps.
Step 161: group all voice signal segments.
In a specific implementation, the number of groups corresponding to different numbers of voice signal segments can be preset. For example, when the number of segments is in the range of 10 to 50, they are divided into two groups; in the range of 50 to 100, into three groups. Each group contains the same or a similar number of segments.
Note that the voice signal segments can be grouped in chronological order, so that the segments can then be searched one by one in chronological order.
Step 162: determine the third timestamp of the first voice signal segment in each group, the third timestamp being identical to the second timestamp of that first voice signal segment.
For example, suppose there are three groups of voice signal segments with 30 segments each, arranged in timestamp order. The timestamps of the segments in the first group are a1 to a30, those in the second group are b1 to b30, and those in the third group are c1 to c30; the timestamps of the first segment of each group are then a1, b1, and c1 respectively.
Step 163: determine the target timestamp having the smallest difference from the first timestamp, the target timestamp being one of the third timestamps.
In a specific implementation, the groups of voice signal segments and their third timestamps can be saved, so that when the voice signal segment corresponding to another piece of text to be verified is determined later, step 163 can be executed directly, without regrouping or redetermining the third timestamps.
Step 164: if the first timestamp is greater than the target timestamp, search the second timestamps of the subsequent voice signal segments one by one, starting from the voice signal segment corresponding to the target timestamp, to determine the target voice signal segment corresponding to the second timestamp identical to the first timestamp; otherwise, search the second timestamps of the preceding voice signal segments one by one, starting from the voice signal segment corresponding to the target timestamp, to determine the target voice signal segment corresponding to the second timestamp identical to the first timestamp.
For example, if the first timestamp has the smallest difference from b1 and is greater than b1, the second timestamps are searched one by one starting from the segment corresponding to b1, that is, b2, b3, b4, and so on in turn, until the target voice signal segment whose second timestamp equals the first timestamp is found.
When determining the target voice signal segment corresponding to the second timestamp identical to the first timestamp, the search does not need to start from the timestamp of the first voice signal segment; it starts from the third timestamp closest to the first timestamp. This embodiment therefore not only speeds up the computation and quickly finds the voice signal segment corresponding to the text to be verified, but also reduces the occupancy of the terminal's CPU resources.
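Steps 161 to 164 can be sketched as follows: group the segments, keep the first ("third") timestamp of each group, jump to the group whose leading timestamp is nearest the query, then scan toward later or earlier segments. The segment representation (timestamp, payload) and the fixed `group_size` are illustrative assumptions.

```python
def find_segment(segments, first_ts, group_size=30):
    """segments: list of (second_timestamp, audio) pairs in timestamp order.
    Returns the audio whose second timestamp equals first_ts, else None."""
    # Step 161: group the segments in chronological order.
    groups = [segments[i:i + group_size]
              for i in range(0, len(segments), group_size)]
    # Step 162: third timestamp = second timestamp of each group's first segment.
    third_ts = [g[0][0] for g in groups]

    # Step 163: group whose leading timestamp differs least from the query.
    k = min(range(len(third_ts)), key=lambda i: abs(third_ts[i] - first_ts))
    start = k * group_size

    # Step 164: scan toward later segments if the query lies after the anchor,
    # otherwise toward earlier segments.
    step = 1 if first_ts > third_ts[k] else -1
    i = start
    while 0 <= i < len(segments):
        if segments[i][0] == first_ts:
            return segments[i][1]
        i += step
    return None
```

Because the scan starts at the nearest group boundary rather than at the first segment, at most roughly one group is traversed, which matches the efficiency argument above.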
Step 17: play the target voice signal segment.
In a specific implementation, after the play button is clicked, the system detects the play instruction and then plays the voice signal segment corresponding to the timestamp of the text segment containing the selected text, so that the correct text can be determined from the context.
For example, given the document "[0:00.000]易居研究院智库研究中心总监严跃进表示，[0:03.145]10月份，北京、上海、深圳、成都、福州、南京、杭州、合肥、郑州、无锡等10个热点城市心房均出现同比负增长", when "心房" (atrium, a mis-recognition of its homophone "新房", new homes) is selected and the play button is clicked, the voice signal segment "10月份，北京、上海、深圳、成都、福州、南京、杭州、合肥、郑州、无锡等10个热点城市心房均出现同比负增长" can be played, so that the correctness of "心房" can be checked against the context of the whole passage.
In a specific implementation, a timestamp can also be marked on every frame of the voice signal and on every character, so that when a stretch of text is selected and a play instruction is sent, the voice signal segment corresponding to that stretch of text can be played for verification. The specific procedure is as described in the steps above and is not repeated here.
In the embodiments of the present invention, after the text to be verified is determined, the corresponding voice signal segment can be located accurately from the timestamp of the text to be verified and played back. The method provided by this embodiment is therefore convenient to use and improves verification efficiency.
Referring to FIG. 2, a text verification apparatus according to an embodiment of the present invention may be deployed on a terminal or be the terminal itself. As shown in FIG. 2, the apparatus may include a second obtaining unit 21, a segmenting unit 22, a conversion unit 23, a marking unit 24, a first obtaining unit 25, a determining unit 26, and a playing unit 27.
The second obtaining unit 21 is configured to acquire a voice signal.
The segmenting unit 22 is configured to segment the voice signal at sentence pause signals therein to generate voice signal segments.
The conversion unit 23 is configured to convert the voice signal segments into text segments.
The conversion unit 23 may be connected to a cloud server and may convert the voice signal segments into text segments by means of the cloud server.
The marking unit 24 is configured to mark the text segments with first timestamps and mark the voice signal segments with second timestamps.
The marking unit 24 may specifically be configured to use the start time and/or end time of a voice signal segment to mark the corresponding text segment with a first timestamp and to mark the voice signal segment with a second timestamp, the first timestamp being identical to the second timestamp.
When the conversion unit 23 is connected to a cloud server, the marking unit 24 may also mark the text segments with first timestamps and the voice signal segments with second timestamps by means of the cloud server.
The first obtaining unit 25 is configured to obtain, when a play instruction is detected, text to be verified in a document, the text to be verified including at least one character.
The determining unit 26 is configured to determine the target voice signal segment corresponding to the text to be verified, where the text segment containing the text to be verified is the text segment generated by speech recognition of the target voice signal segment, the document includes at least one text segment, and each text segment corresponds to one voice signal segment.
The determining unit 26 may include:
a first determining subunit, configured to determine a first timestamp of the text segment containing the text to be verified; and
a second determining subunit, configured to determine the target voice signal segment corresponding to a second timestamp identical to the first timestamp, the second timestamp being a timestamp marked on a voice signal segment.
The second determining subunit may specifically be configured to:
group all voice signal segments;
determine a third timestamp of the first voice signal segment in each group, the third timestamp being identical to the second timestamp of that first voice signal segment;
determine a target timestamp having the smallest difference from the first timestamp, the target timestamp being one of the third timestamps; and
if the first timestamp is greater than the target timestamp, search the second timestamps of the subsequent voice signal segments one by one, starting from the voice signal segment corresponding to the target timestamp, to determine the target voice signal segment corresponding to the second timestamp identical to the first timestamp; otherwise, search the second timestamps of the preceding voice signal segments one by one, starting from the voice signal segment corresponding to the target timestamp, to determine the target voice signal segment corresponding to the second timestamp identical to the first timestamp.
The playing unit 27 is configured to play the target voice signal segment.
In the embodiments of the present invention, after the text to be verified is determined, the corresponding voice signal segment can be located accurately from the timestamp of the text to be verified and played back. The method provided by this embodiment is therefore convenient to use and improves verification efficiency.
Referring to FIG. 3, a text verification apparatus according to an embodiment of the present invention may be deployed on a terminal or be the terminal itself. As shown in FIG. 3, the apparatus may include a second obtaining unit 31, a conversion unit 32, a segmenting unit 33, and a marking unit 34.
The second obtaining unit 31 is configured to acquire a voice signal.
The conversion unit 32 is configured to convert the voice signal into text.
The segmenting unit 33 is configured to segment the text into text segments according to the semantics of the text, and to segment the voice signal according to the text segments to generate voice signal segments.
The marking unit 34 is configured to mark the text segments with first timestamps and mark the voice signal segments with second timestamps.
In the embodiments of the present invention, after the text to be verified is determined, the corresponding voice signal segment can be located accurately from the timestamp of the text to be verified and played back. The method provided by this embodiment is therefore convenient to use and improves verification efficiency.
Referring to FIG. 4, a text verification system according to an embodiment of the present invention includes a terminal 41 and a cloud server 42. The terminal 41 is configured to send the captured voice signal to the cloud server 42. The cloud server 42 may include a second obtaining unit 421, a segmenting unit 422, a conversion unit 423, and a marking unit 424.
The second obtaining unit 421 is configured to acquire the voice signal.
The segmenting unit 422 is configured to segment the voice signal at sentence pause signals therein to generate voice signal segments.
The conversion unit 423 is configured to convert the voice signal segments into text segments.
The marking unit 424 is configured to mark the text segments with first timestamps and mark the voice signal segments with second timestamps.
The cloud server 42 sends the text segments carrying the first timestamps and the voice signal segments carrying the second timestamps to the terminal 41.
The terminal 41 is used to view and verify the document, and may include a first obtaining unit 411, a determining unit 412, and a playing unit 413.
The first obtaining unit 411 is configured to obtain, when a play instruction is detected, text to be verified in a document, the text to be verified including at least one character.
The determining unit 412 is configured to determine the target voice signal segment corresponding to the text to be verified, where the text segment containing the text to be verified is the text segment generated by speech recognition of the target voice signal segment, the document includes at least one text segment, and each text segment corresponds to one voice signal segment.
The playing unit 413 is configured to play the target voice signal segment.
In the embodiments of the present invention, after the text to be verified is determined, the corresponding voice signal segment can be located accurately from the timestamp of the text to be verified and played back. The method provided by this embodiment is therefore convenient to use and improves verification efficiency.
The apparatus embodiments are basically similar to the method embodiments and are therefore described relatively briefly; for relevant details, refer to the description of the method embodiments.
A person skilled in the art will understand that embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk memory, CD-ROM, and optical memory) containing computer-usable program code.
Embodiments of the present invention are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, so that a series of operational steps is executed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, a person skilled in the art can make further changes and modifications to these embodiments once the basic inventive concept is known. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present invention.
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or terminal device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.
The text verification method and apparatus provided by the present invention have been introduced in detail above. Specific examples have been used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. At the same time, for a person of ordinary skill in the art, there will be changes in the specific implementations and scope of application in accordance with the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (11)

  1. A text verification method, characterized in that the method comprises:
    when a play instruction is detected, obtaining text to be verified in a document, the text to be verified including at least one character;
    determining a target voice signal segment corresponding to the text to be verified, where the text segment containing the text to be verified is the text segment generated by speech recognition of the target voice signal segment, the document includes at least one text segment, and each text segment corresponds to one voice signal segment; and
    playing the target voice signal segment.
  2. The method according to claim 1, characterized in that determining the target voice signal segment corresponding to the text to be verified comprises:
    determining a first timestamp of the text segment containing the text to be verified; and
    determining the target voice signal segment corresponding to a second timestamp identical to the first timestamp, the second timestamp being a timestamp marked on a voice signal segment.
  3. The method according to claim 2, characterized in that determining the target voice signal segment corresponding to the second timestamp identical to the first timestamp comprises:
    grouping all voice signal segments;
    determining a third timestamp of the first voice signal segment in each group, the third timestamp being identical to the second timestamp of that first voice signal segment;
    determining a target timestamp having the smallest difference from the first timestamp, the target timestamp being one of the third timestamps; and
    if the first timestamp is greater than the target timestamp, searching the second timestamps of the subsequent voice signal segments one by one, starting from the voice signal segment corresponding to the target timestamp, to determine the target voice signal segment corresponding to the second timestamp identical to the first timestamp; otherwise, searching the second timestamps of the preceding voice signal segments one by one, starting from the voice signal segment corresponding to the target timestamp, to determine the target voice signal segment corresponding to the second timestamp identical to the first timestamp.
  4. The method according to claim 2 or 3, characterized in that, before obtaining the text to be verified in the document when the play instruction is detected, the method comprises:
    acquiring a voice signal;
    segmenting the voice signal at sentence pause signals therein to generate voice signal segments;
    converting the voice signal segments into text segments; and
    marking the text segments with first timestamps and marking the voice signal segments with second timestamps.
  5. The method according to claim 4, characterized in that marking the text segments with first timestamps and marking the voice signal segments with second timestamps comprises:
    using the start time and/or end time of a voice signal segment to mark the corresponding text segment with a first timestamp and to mark the voice signal segment with a second timestamp, the first timestamp being identical to the second timestamp.
  6. The method according to claim 4, characterized in that, after acquiring the voice signal, and before marking the text segments with first timestamps and the voice signal segments with second timestamps, the method further comprises:
    converting the voice signal into text;
    segmenting the text into text segments according to the semantics of the text; and
    segmenting the voice signal according to the text segments to generate voice signal segments.
  7. The method according to claim 6, characterized in that the method further comprises:
    computing the accuracy of a text segment;
    highlighting the text segment with a mark according to the accuracy;
    determining, according to the displayed mark, the timestamp corresponding to the voice to be played; and
    playing the voice signal segment corresponding to the timestamp.
  8. A text verification apparatus, characterized by comprising:
    a first obtaining unit, configured to obtain, when a play instruction is detected, text to be verified in a document, the text to be verified including at least one character;
    a determining unit, configured to determine a target voice signal segment corresponding to the text to be verified, where the text segment containing the text to be verified is the text segment generated by speech recognition of the target voice signal segment, the document includes at least one text segment, and each text segment corresponds to one voice signal segment; and
    a playing unit, configured to play the target voice signal segment.
  9. The apparatus according to claim 8, characterized in that the determining unit comprises:
    a first determining subunit, configured to determine a first timestamp of the text segment containing the text to be verified; and
    a second determining subunit, configured to determine the target voice signal segment corresponding to a second timestamp identical to the first timestamp, the second timestamp being a timestamp marked on a voice signal segment.
  10. The apparatus according to claim 9, characterized in that the second determining subunit is specifically configured to:
    group all voice signal segments;
    determine a third timestamp of the first voice signal segment in each group, the third timestamp being identical to the second timestamp of that first voice signal segment;
    determine a target timestamp having the smallest difference from the first timestamp, the target timestamp being one of the third timestamps; and
    if the first timestamp is greater than the target timestamp, search the second timestamps of the subsequent voice signal segments one by one, starting from the voice signal segment corresponding to the target timestamp, to determine the target voice signal segment corresponding to the second timestamp identical to the first timestamp; otherwise, search the second timestamps of the preceding voice signal segments one by one, starting from the voice signal segment corresponding to the target timestamp, to determine the target voice signal segment corresponding to the second timestamp identical to the first timestamp.
  11. The apparatus according to claim 8 or 9, characterized in that the apparatus further comprises:
    a second obtaining unit, configured to acquire a voice signal;
    a segmenting unit, configured to segment the voice signal at sentence pause signals therein to generate voice signal segments;
    a conversion unit, configured to convert the voice signal segments into text segments; and
    a marking unit, configured to mark the text segments with first timestamps and mark the voice signal segments with second timestamps.
PCT/CN2018/122343 2017-12-20 2018-12-20 一种文字校验方法及装置 WO2019120247A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711386355.9 2017-12-20
CN201711386355.9A CN109949828B (zh) 2017-12-20 2017-12-20 一种文字校验方法及装置

Publications (1)

Publication Number Publication Date
WO2019120247A1 true WO2019120247A1 (zh) 2019-06-27

Family

ID=66994409

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/122343 WO2019120247A1 (zh) 2017-12-20 2018-12-20 一种文字校验方法及装置

Country Status (2)

Country Link
CN (1) CN109949828B (zh)
WO (1) WO2019120247A1 (zh)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1282072A (zh) * 1999-07-27 2001-01-31 国际商业机器公司 对语音识别结果中的错误进行校正的方法和语音识别系统
JP2004151614A (ja) * 2002-11-01 2004-05-27 Nippon Hoso Kyokai <Nhk> 文字データ修正装置、その方法及びそのプログラム、並びに、字幕の生成方法
JP2008051895A (ja) * 2006-08-22 2008-03-06 Casio Comput Co Ltd 音声認識装置および音声認識処理プログラム
US20110110647A1 (en) * 2009-11-06 2011-05-12 Altus Learning Systems, Inc. Error correction for synchronized media resources
CN103065659A (zh) * 2012-12-06 2013-04-24 广东欧珀移动通信有限公司 一种多媒体记录方法
CN105120195A (zh) * 2015-09-18 2015-12-02 谷鸿林 内容录制、再现系统和方法
CN105185377A (zh) * 2015-09-24 2015-12-23 百度在线网络技术(北京)有限公司 一种基于语音的文件生成方法及装置
CN106528715A (zh) * 2016-10-27 2017-03-22 广东小天才科技有限公司 一种音频内容校核方法及装置

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7483833B2 (en) * 2003-10-21 2009-01-27 Koninklijke Philips Electronics N.V. Intelligent speech recognition with user interfaces
US7617106B2 (en) * 2003-11-05 2009-11-10 Koninklijke Philips Electronics N.V. Error detection for speech to text transcription systems
US10923116B2 (en) * 2015-06-01 2021-02-16 Sinclair Broadcast Group, Inc. Break state detection in content management systems
CN105159870B (zh) * 2015-06-26 2018-06-29 徐信 一种精准完成连续自然语音文本化的处理系统及方法
CN106448675B (zh) * 2016-10-21 2020-05-01 科大讯飞股份有限公司 识别文本修正方法及系统


Also Published As

Publication number Publication date
CN109949828B (zh) 2022-05-24
CN109949828A (zh) 2019-06-28

Similar Documents

Publication Publication Date Title
US20180315429A1 (en) System and method for automated legal proceeding assistant
CN108766418B (zh) 语音端点识别方法、装置及设备
CN109410664B (zh) 一种发音纠正方法及电子设备
TW202008349A (zh) 語音標註方法、裝置及設備
US8843369B1 (en) Speech endpointing based on voice profile
CN103021409B (zh) 一种语音启动拍照系统
TWI711967B (zh) 播報語音的確定方法、裝置和設備
CN104538034A (zh) 一种语音识别方法及系统
CN105378830A (zh) 音频数据的处理
CN108305618B (zh) 语音获取及搜索方法、智能笔、搜索终端及存储介质
US10062384B1 (en) Analysis of content written on a board
EP4322029A1 (en) Method and apparatus for generating video corpus, and related device
CN108877779B (zh) 用于检测语音尾点的方法和装置
CN111402892A (zh) 一种基于语音识别的会议记录模板生成方法
WO2019120248A1 (zh) 一种将语音转换为文字的方法、装置及系统
CN109213970B (zh) 笔录生成方法及装置
CN111966839B (zh) 数据处理方法、装置、电子设备及计算机存储介质
CN113658594A (zh) 歌词识别方法、装置、设备、存储介质及产品
TWI769520B (zh) 多國語言語音辨識及翻譯方法與相關的系統
WO2019120247A1 (zh) 一种文字校验方法及装置
CN111221987A (zh) 混合音频标记方法和装置
US20220215839A1 (en) Method for determining voice response speed, related device and computer program product
CN109688430A (zh) 一种法院庭审文件回放方法、系统及存储介质
CN114999464A (zh) 语音数据处理方法及装置
CN114155841A (zh) 语音识别方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18891782

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18891782

Country of ref document: EP

Kind code of ref document: A1


32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13.01.2021)
