WO2019120247A1 - Text verification method and apparatus (一种文字校验方法及装置) - Google Patents
- Publication number
- WO2019120247A1 (PCT/CN2018/122343, CN2018122343W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- timestamp
- voice signal
- text
- signal segment
- target
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/221—Announcement of recognition results
- G10L2015/225—Feedback of the input speech
Definitions
- the embodiments of the present invention relate to the field of voice text conversion technologies, and in particular, to a text verification method and apparatus.
- Intelligent voice text conversion technology can be applied to meeting minutes, training records or interview records.
- in speech-to-text conversion, the feature parameters of the speech signal are first extracted and matched against the feature parameters corresponding to characters in a speech database, and the best-matching text is obtained and output.
- when the speaker's pronunciation is standard and the environment is quiet, the accuracy is relatively high.
- in real-life scenarios, however, the speaker inevitably has some local accent, and there is no guarantee that the recording is made in a quiet environment, so the accuracy of the voice-to-text conversion cannot be guaranteed.
- the converted text therefore needs to be verified manually.
- when the verification personnel find erroneous text, it must be corrected against the original recording content.
- this usually requires playing the recording several times before the needed voice content is located accurately, which wastes time and makes verification relatively inefficient.
- the embodiment of the invention provides a text verification method and terminal in order to improve the efficiency of text verification.
- the embodiment of the invention provides a text verification method, including:
- when a play instruction is detected, obtaining the text to be verified in the text, the text to be verified including at least one character;
- determining a target speech signal segment corresponding to the character to be verified, where the text field in which the character to be verified is located is a text field generated after speech recognition of the target speech signal segment, the text includes at least one text field, and each text field corresponds to one speech signal segment;
- playing the target speech signal segment.
- determining the target speech signal segment corresponding to the character to be verified includes: determining a first timestamp of the text field in which the text to be verified is located; and determining the target speech signal segment corresponding to a second timestamp identical to the first timestamp, the second timestamp being a timestamp marked on the speech signal segment.
- determining the target speech signal segment corresponding to the second timestamp identical to the first timestamp includes: grouping all speech signal segments; determining a third timestamp of the first speech signal segment of each group, the third timestamp being the same as the second timestamp of that first segment; determining a target timestamp having the minimum difference from the first timestamp, the target timestamp being one of the third timestamps; and, if the first timestamp is greater than the target timestamp, searching the second timestamps of the speech signal segments backward in sequence starting from the segment corresponding to the target timestamp, otherwise searching them forward in sequence from that segment, to determine the target speech signal segment corresponding to the second timestamp identical to the first timestamp.
- the method includes:
- marking the text field with a first timestamp and marking the voice signal segment with a second timestamp including:
- the method further includes:
- the embodiment of the invention further provides a text verification device, comprising:
- a first acquiring unit configured to: when the play instruction is detected, acquire text to be verified in the text, where the text to be verified includes at least one character;
- a determining unit configured to determine a target speech signal segment corresponding to the character to be verified, where the text field in which the character to be verified is a text field generated by the target speech signal segment after speech recognition, the text Include at least one text field, each text field corresponding to a voice signal segment;
- a playing unit configured to play the target voice signal segment.
- the determining unit includes:
- a first determining subunit configured to determine a first timestamp of a text field in which the text to be verified is located
- a second determining subunit configured to determine a target voice signal segment corresponding to the second timestamp that is the same as the first timestamp, where the second timestamp is a timestamp of the voice signal segment flag.
- the second determining subunit is specifically configured to:
- group all speech signal segments; determine a third timestamp of the first speech signal segment of each group, the third timestamp being the same as the second timestamp of that first segment; determine a target timestamp having the minimum difference from the first timestamp, the target timestamp being one of the third timestamps; and, if the first timestamp is greater than the target timestamp, search the second timestamps of the speech signal segments backward in sequence starting from the segment corresponding to the target timestamp, otherwise search them forward in sequence from that segment, to determine the target speech signal segment corresponding to the second timestamp identical to the first timestamp.
- the device further includes:
- a second acquiring unit configured to acquire a voice signal
- a sentence segment unit configured to perform a segmentation at a sentence pause signal in the voice signal to generate a voice signal segment
- a converting unit configured to convert the voice signal segment into a text field
- a marking unit configured to mark the text field with a first timestamp, and mark the voice signal segment with a second timestamp.
- the corresponding voice signal segment can be accurately determined and played according to the timestamp of the text to be verified; the method provided in this embodiment is therefore convenient to use and can improve verification efficiency.
- FIG. 1 is a flowchart of a text verification method according to an embodiment of the present invention
- FIG. 2 is a structural block diagram of a character verification apparatus according to an embodiment of the present invention.
- FIG. 3 is a structural block diagram of another text verification apparatus according to an embodiment of the present invention.
- FIG. 4 is a structural block diagram of a character verification system according to an embodiment of the present invention.
- the embodiments of the present invention can be applied to a terminal, such as a mobile phone, a computer, or a tablet.
- one of the implementation manners may be real-time text conversion, that is, the voice signal output by the speaker is collected on the spot, and the voice signal is converted into text and saved.
- real-time conversion requires a terminal with a voice signal acquisition function, such as a terminal with a microphone.
- Another implementation may be non-real-time text conversion: a device with a recording function pre-records the voice signal output by the speaker, then transmits the complete recorded voice signal to the terminal, and the terminal performs text conversion on the acquired voice signal.
- the embodiment of the present invention can also be applied to a terminal and a cloud server connected to the terminal.
- the terminal can send the voice signal to be converted to the cloud server, and the cloud server performs text conversion on the obtained voice signal, and sends the converted text to the terminal.
- the voice signal to be converted by the text may be recorded in real time by the terminal itself, or may be a voice signal sent to the terminal after recording by other recording devices.
- a method for text verification is provided according to an embodiment of the present invention, and the method can be applied to a terminal. As shown in FIG. 1, the method can include the following steps.
- step 11 the voice signal is obtained.
- the speech signal is a speech signal to be converted into text.
- the way to convert a voice signal into text can be real-time conversion, so the voice signal can be acquired in real time or non-real time.
- the terminal itself may directly acquire a voice signal, or may use a voice signal collection device to acquire a voice signal.
- Step 12 Perform a sentence break at a sentence pause signal in the voice signal to generate a voice signal segment.
- the duration of the silent signal can be detected in real time while acquiring the speech signal.
- a silent signal of sufficient duration can be used as a sentence pause signal, and the sentence is broken at that pause signal, thereby generating a voice signal segment whose end is the break point.
- To determine a sentence pause, an average time interval between two syllables in the voice signal may first be determined and a threshold set according to that average interval; when the duration of a silent signal is detected to exceed the threshold, the silent signal is determined to be a sentence pause signal.
- the sentence segment can be any position on the statement pause signal, and the specific sentence segment position is not limited in the embodiment of the present invention.
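The patent does not give an implementation of this silence-threshold idea; a minimal sketch, assuming a mono list of PCM samples and illustrative parameter values (`frame_ms`, `energy_thresh`, and the 3x multiplier are not from the patent), might look like:

```python
def find_pause_points(samples, rate, frame_ms=20, energy_thresh=1e-4):
    """Return sample indices at which sentence pauses occur.

    A frame is 'silent' when its mean energy is below energy_thresh.
    A run of silent frames much longer than the average silent run
    (which approximates the inter-syllable gap) is treated as a
    sentence pause signal; the break point is placed in the middle
    of the silent run.
    """
    frame_len = int(rate * frame_ms / 1000)
    silent = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        silent.append(energy < energy_thresh)

    # Average length of silent runs approximates the inter-syllable gap;
    # a sentence pause must be noticeably longer (here: 3x, illustrative).
    runs, run = [], 0
    for is_sil in silent:
        if is_sil:
            run += 1
        elif run:
            runs.append(run)
            run = 0
    if run:
        runs.append(run)
    avg_gap = sum(runs) / len(runs) if runs else 1
    pause_frames = 3 * avg_gap

    pauses, run = [], 0
    for i, is_sil in enumerate(silent):
        if is_sil:
            run += 1
        else:
            if run >= pause_frames:
                # break the sentence in the middle of the silent run
                pauses.append((i - run // 2) * frame_len)
            run = 0
    if run >= pause_frames:
        pauses.append((len(silent) - run // 2) * frame_len)
    return pauses
```

Because the threshold is derived from the signal itself, short inter-syllable gaps are ignored while long pauses become break points, matching the description above.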
- the accuracy of the text segment into which each speech signal segment is converted is calculated separately.
- for example, the converted text segments and their accuracies may be: "Yan Yuejin, director of the research center of the Yiju Research Institute, said" with an accuracy of 80%; "in October, 10 hotspot cities such as Beijing, Shanghai, Shenzhen, Chengdu, Fuzhou, Nanjing, Hangzhou, Hefei, Zhengzhou and Wuxi" with an accuracy of 80%; and "new homes experienced negative year-on-year growth" with an accuracy of 60%.
- the text with the color logo can be either the text background with the color or the text itself with the color.
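The patent leaves the rendering of the accuracy mark open; a minimal sketch, assuming per-segment accuracies are already computed and using a hypothetical HTML-style `<mark>` highlight (the 0.7 cutoff is an illustrative choice, not a value from the patent):

```python
def highlight_segments(segments, threshold=0.7):
    """Wrap text fields whose recognition accuracy falls below
    `threshold` in a <mark> tag so a reviewer can spot likely errors.
    `segments` is a list of (text, accuracy) pairs."""
    out = []
    for text, acc in segments:
        if acc < threshold:
            out.append(f"<mark>{text}</mark>")  # coloured background
        else:
            out.append(text)
    return " ".join(out)
```

With the example above, the 60%-accuracy segment would be highlighted while the two 80% segments are left plain.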
- the sentence pause signal in the complete speech signal can be determined first, and the sentence is broken at the sentence pause signal, thereby generating a plurality of speech signal segments.
- the corresponding time at the sentence can be calculated to determine the start time and end time of each speech signal segment after the sentence is broken.
- the start time of the first voice signal segment in the voice signal may be set to 0, and the end time may be the time at the first segment.
- the speech signal segment may be time-stamped using its start time and/or end time; that is, a segment may carry one timestamp (its start timestamp or its end timestamp) or two timestamps (its start timestamp and its end timestamp).
- the duration of the silent signal can be determined according to the waveform of the voice signal, and details are not described herein again.
- step 13 the voice signal segment is converted into a text field.
- the generated plurality of voice signal segments may be sequentially converted into text fields.
- the voice signal segment may undergo voiceprint recognition to determine the speaker corresponding to the segment, and the name of the speaker may be added, in text form, to the text field of that segment.
- the name of the speaker may be added at the forefront of the paragraph field, the name of the speaker may be marked with parentheses, or the name of the speaker may be followed by a colon.
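The labelling options just described (name in parentheses, or followed by a colon) amount to simple string formatting; a sketch, assuming the speaker name comes from a separate voiceprint step not shown here:

```python
def label_speaker(text_field, speaker, style="colon"):
    """Prepend the recognized speaker's name to a text field,
    either as "Name: text" or as "(Name) text"."""
    if style == "parentheses":
        return f"({speaker}) {text_field}"
    return f"{speaker}: {text_field}"
```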
- Step 14 Mark the text field with a first timestamp, and mark the voice signal segment with a second timestamp, the first timestamp being the same as the second timestamp.
- the start time and the end time of each voice signal segment may be calculated according to the sampling frequency of each voice signal segment, for time stamping each voice signal segment and the text field.
- Each speech signal segment may be marked with a time stamp, which may be the start time or end time of the speech signal segment, or may be marked with two time stamps, which are the start time and end time of the speech signal segment, respectively.
- Each text field can be tagged with the same timestamp as the voice signal segment.
- the start time of the speech signal segment may be marked at the beginning of each segment of text, and the end time of the speech signal segment may be marked at the end of each segment of text.
- the specific location of each text time stamp is not limited.
- for example, the text and its corresponding timestamps are: "[0:00.000] Yan Yuejin, director of the research center of the Yiju Research Institute, said [0:03.145] in October, new homes in 10 hotspot cities such as Beijing, Shanghai, Shenzhen, Chengdu, Fuzhou, Nanjing, Hangzhou, Hefei, Zhengzhou and Wuxi all experienced negative year-on-year growth."
- the speech signal segment "Yan Yuejin, director of the research center of the Yiju Research Institute, said" is marked with the timestamp "0:00.000" at the beginning of its text field, and the segment "in October, new homes in 10 hotspot cities such as Beijing, Shanghai, Shenzhen, Chengdu, Fuzhou, Nanjing, Hangzhou, Hefei, Zhengzhou and Wuxi all experienced negative year-on-year growth" is marked with the timestamp "0:03.145" at the beginning of its text field.
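The pairing of text fields and speech signal segments by identical timestamps can be represented as one record per segment; the field names below are assumptions for illustration, not from the patent:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: str    # second timestamp marked on the audio segment
    audio: bytes  # raw samples of the speech signal segment
    text: str     # text field produced by speech recognition

# The first timestamp on a text field equals the second timestamp on
# its segment, so finding the audio for a piece of text is a simple
# timestamp-equality lookup.
segments = [
    Segment("0:00.000", b"...", "Yan Yuejin, director of the research "
                                "center of the Yiju Research Institute, said"),
    Segment("0:03.145", b"...", "in October, new homes in 10 hotspot cities "
                                "all experienced negative year-on-year growth"),
]

def audio_for(first_timestamp, segments):
    for seg in segments:
        if seg.start == first_timestamp:  # second timestamp == first timestamp
            return seg.audio
    return None
```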
- the manner of breaking the sentence of the voice signal is not limited.
- the voice signal may be sentenced at a fixed time interval.
- the timestamp marked on each speech signal segment is also the timestamp of the text field corresponding to that segment.
- when sentences are broken at fixed time intervals, a text field may not be a semantically complete sentence; in that case, after the voice signal is acquired it can first be converted into text, the text can be broken into text fields according to its semantics, and the voice signal can then be segmented according to the text fields to generate speech signal segments; the text field is marked with a first timestamp and the speech signal segment with a second timestamp using the start time and/or end time of the segment, the first timestamp and the second timestamp being the same.
- the terminal may also send the voice signal segment to the cloud server, and the cloud server converts the voice signal segment into a text field, and the text field is Marking the first timestamp, and marking the voice signal segment with a second timestamp, and transmitting the timestamped text field and the voice signal segment to the terminal.
- the above steps 11 to 14 can also be performed by the cloud server.
- the terminal sends the collected voice signal to the cloud server in real time, and after executing the steps 11 to 14, the cloud server sends the text field corresponding to the voice signal segment to the terminal, thereby realizing real-time conversion.
- for non-real-time conversion, the terminal can send the complete voice signal to the cloud server; after the cloud server performs text conversion, it sends the timestamped text to the terminal, and may also send the timestamped voice signal segments to the terminal.
- the converted text field can be composed of readable and editable text that can be verified on the terminal.
- the process of verifying the text may include the following steps.
- Step 15 When the play instruction is detected, the text to be verified in the text is obtained, and the text to be verified includes at least one character.
- the verification personnel can select the text to be verified and click the play button; when the play button is clicked, the system detects the play instruction.
- the display mode of the play button is not limited.
- the play button may be displayed after the text is opened, after the text is selected, or after the text is selected and the mouse is right-clicked.
- Step 16 Determine a target voice signal segment corresponding to the character to be verified, where the text field in which the character to be verified is located is a text field generated after the target voice signal segment is voice-recognized, and the text includes at least A text field, each text field corresponding to a speech signal segment.
- when determining the target voice signal segment corresponding to the character to be verified, a first timestamp of the text field in which the text to be verified is located is determined, and the target voice signal segment corresponding to a second timestamp identical to the first timestamp is then determined, the second timestamp being a timestamp marked on the voice signal segment.
- the determining of the target voice signal segment corresponding to the second timestamp that is the same as the first timestamp may include:
- step 161 all voice signal segments are grouped.
- the number of packets corresponding to the number of different voice signal segments may be preset. For example, when the number of voice signal segments is in the range of 10 to 50, they are divided into two groups, and when they are in the range of 50 to 100, they are divided into three groups. The number of speech signal segments in each group is the same or similar.
- voice signal segments may be grouped according to the chronological order of the voice signal segments, so as to search for each voice signal segment in chronological order.
- Step 162 Determine a third timestamp of the first voice signal segment of each group of voice signal segments, the third timestamp being the same as the second timestamp of the first voice signal segment.
- each set of speech signal segments includes 30 speech signal segments, and each speech signal segment can be arranged in the order of time stamps.
- the time stamp corresponding to the voice signal segment in the first set of voice signal segments is a1 to a30
- the time stamp corresponding to the voice signal segment in the second group of voice signal segments is b1 to b30
- the voice signal in the third group of voice signal segments The timestamps corresponding to the segments are c1 to c30
- the time stamps of the first voice signal segments in each group of voice signal segments are a1, b1, and c1, respectively.
- Step 163 Determine a target timestamp with a minimum difference from the first timestamp, where the target timestamp is one of the third timestamps.
- the voice signal segments of each group and the third timestamp corresponding to each group may be saved.
- in subsequent verifications, step 163 may then be performed directly, without performing the grouping again or re-determining the third timestamp of each group.
- Step 164 If the first timestamp is greater than the target timestamp, search the second timestamps of the speech signal segments backward in sequence starting from the segment corresponding to the target timestamp, and determine the target speech signal segment corresponding to the second timestamp identical to the first timestamp; otherwise, search the second timestamps forward in sequence starting from that segment, and determine the target speech signal segment corresponding to the second timestamp identical to the first timestamp.
- for example, if the target timestamp is b1 and the first timestamp is greater than b1, the second timestamps are searched backward starting from the segment corresponding to b1, that is, b2, b3, b4, ... are searched in order until the target speech signal segment whose second timestamp is identical to the first timestamp is found.
- this embodiment can not only improve the program calculation speed, but also quickly find the voice signal segment corresponding to the character to be verified, and can also reduce the occupancy rate of the terminal CPU resources.
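Steps 161 to 164 can be sketched as follows, assuming timestamps are comparable numbers (e.g. seconds) and segments are stored in chronological order; the group size of 30 mirrors the example above, and the function name is an assumption:

```python
def find_segment(first_ts, second_ts_list, group_size=30):
    """Locate the index of the segment whose second timestamp equals
    first_ts, using the grouped search of steps 161-164.
    second_ts_list holds the second timestamps in chronological order."""
    # Step 161: group the segments. Step 162: the third timestamp of a
    # group is the second timestamp of its first segment.
    group_starts = list(range(0, len(second_ts_list), group_size))
    third_ts = [second_ts_list[i] for i in group_starts]

    # Step 163: pick the third timestamp closest to first_ts.
    best = min(range(len(third_ts)), key=lambda g: abs(third_ts[g] - first_ts))
    start = group_starts[best]

    # Step 164: search backward (toward later segments) if first_ts is
    # greater than the target timestamp, otherwise forward (earlier ones).
    step = 1 if first_ts > third_ts[best] else -1
    i = start
    while 0 <= i < len(second_ts_list):
        if second_ts_list[i] == first_ts:
            return i
        i += step
    return None
```

Starting near the closest group boundary means at most roughly one group of segments is scanned, rather than the whole list, which is the speed-up the embodiment claims.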
- step 17 the target speech signal segment is played.
- the system can detect the play instruction, and then play the voice signal segment corresponding to the timestamp of the field in which the text is located, so that the correct text can be determined through the context.
- alternatively, each frame of the voice signal can be time-stamped so that each character carries a timestamp; then, when a piece of text is selected and the play instruction is issued, the voice signal segment corresponding to that piece of text can be played.
- the specific implementation process can refer to the above steps, and details are not described herein again.
- the corresponding voice signal segment can be accurately determined and played according to the timestamp of the text to be verified; the method provided in this embodiment is therefore convenient to use and can improve verification efficiency.
- FIG. 2 is a text verification apparatus according to an embodiment of the present invention.
- the apparatus may be deployed on a terminal or a terminal itself.
- the apparatus may include a second acquiring unit 21, a sentence breaking unit 22, a converting unit 23, a marking unit 24, a first acquiring unit 25, a determining unit 26, and a playing unit 27.
- the second obtaining unit 21 is configured to acquire a voice signal.
- the sentence segment unit 22 is configured to perform a segmentation at a sentence pause signal in the voice signal to generate a voice signal segment.
- the converting unit 23 is configured to convert the voice signal segment into a text field.
- the conversion unit 23 can be connected to a cloud server, and the conversion unit 23 can convert the voice signal segment into a text field through a cloud server.
- the marking unit 24 is configured to mark the text field with a first time stamp and mark the voice signal segment with a second time stamp.
- the marking unit 24 is specifically configured to: mark the text field with a first time stamp by using a start time and/or an end time of the voice signal segment, and mark a second time stamp with the voice signal segment,
- the first timestamp is the same as the second timestamp.
- the marking unit 24 may also mark the text field with the first time stamp through the cloud server and mark the second time stamp with the voice signal segment.
- the first obtaining unit 25 is configured to: when the play instruction is detected, acquire text to be verified in the text, where the text to be verified includes at least one character.
- a determining unit 26 configured to determine a target voice signal segment corresponding to the character to be verified, where the text field of the character to be verified is a text field generated after the target voice signal segment is voice-recognized,
- the text includes at least one text field, each text field corresponding to a speech signal segment.
- the determining unit 26 can include:
- a first determining subunit configured to determine a first timestamp of a text field in which the text to be verified is located
- a second determining subunit configured to determine a target voice signal segment corresponding to the second timestamp that is the same as the first timestamp, where the second timestamp is a timestamp of the voice signal segment flag.
- the second determining subunit may be specifically used for:
- group all speech signal segments; determine a third timestamp of the first speech signal segment of each group, the third timestamp being the same as the second timestamp of that first segment; determine a target timestamp having the minimum difference from the first timestamp, the target timestamp being one of the third timestamps; and, if the first timestamp is greater than the target timestamp, search the second timestamps of the speech signal segments backward in sequence starting from the segment corresponding to the target timestamp, otherwise search them forward in sequence from that segment, to determine the target speech signal segment corresponding to the second timestamp identical to the first timestamp.
- the playing unit 27 is configured to play the target voice signal segment.
- the corresponding voice signal segment can be accurately determined and played according to the timestamp of the text to be verified; the apparatus provided in this embodiment is therefore convenient to use and can improve verification efficiency.
- FIG. 3 is a text verification apparatus according to an embodiment of the present invention.
- the apparatus may be deployed on a terminal or a terminal itself.
- the apparatus may include a second acquiring unit 31, a converting unit 32, a sentence breaking unit 33, and a marking unit 34.
- the second obtaining unit 31 is configured to acquire a voice signal.
- the converting unit 32 is configured to convert the voice signal into a text.
- the sentence segment unit 33 is configured to perform a sentence segmentation on the text according to the semantic meaning of the text, generate a text field, and perform a segmentation on the voice signal according to the text field to generate a voice signal segment.
- the marking unit 34 is configured to mark the text field with a first time stamp and mark the voice signal segment with a second time stamp.
- the corresponding voice signal segment can be accurately determined and played according to the timestamp of the text to be verified; the apparatus provided in this embodiment is therefore convenient to use and can improve verification efficiency.
- FIG. 4 is a text verification system according to an embodiment of the present invention.
- the system includes a terminal 41 and a cloud server 42.
- the terminal 41 is configured to send the collected voice signal to the cloud server 42.
- the cloud server 42 may include a second obtaining unit 421, a sentence breaking unit 422, a converting unit 423, and a marking unit 424,
- the second obtaining unit 421 is configured to acquire a voice signal.
- the sentence segment unit 422 is configured to perform a segmentation at a sentence pause signal in the voice signal to generate a voice signal segment.
- the converting unit 423 is configured to convert the voice signal segment into a text field.
- the marking unit 424 is configured to mark the text field with a first time stamp and mark the voice signal segment with a second time stamp.
- the cloud server 42 transmits the text field with the first time stamp and the voice signal segment with the second time stamp to the terminal 41.
- the terminal 41 is configured to view and verify text, and may include a first obtaining unit 411, a determining unit 412, and a playing unit 413.
- the first obtaining unit 411 is configured to: when the play instruction is detected, acquire text to be verified in the text, where the text to be verified includes at least one character.
- a determining unit 412 configured to determine a target voice signal segment corresponding to the character to be verified, where the text field of the character to be verified is a text field generated after the target voice signal segment is voice-recognized,
- the text includes at least one text field, each text field corresponding to a speech signal segment.
- the playing unit 413 is configured to play the target voice signal segment.
- the corresponding voice signal segment can be accurately determined and played according to the timestamp of the text to be verified; the system provided in this embodiment is therefore convenient to use and can improve verification efficiency.
- the apparatus embodiments are described relatively simply; for relevant parts, refer to the description of the method embodiment.
- embodiments of the embodiments of the invention may be provided as a method, apparatus, or computer program product.
- embodiments of the invention may be in the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware.
- embodiments of the invention may take the form of a computer program product embodied on one or more computer usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
- Embodiments of the invention are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions.
- These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
- the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer readable memory comprise an article of manufacture comprising the instruction device.
- the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Claims (11)
- 一种文字校验方法,其特征在于,所述方法包括:当检测到播放指令时,获取文本中待校验的文字,所述待校验的文字包括至少一个文字;确定所述待校验的文字对应的目标语音信号片段,所述待校验的文字所在的文字段为所述目标语音信号片段经过语音识别后生成的文字段,所述文本包括至少一个文字段,每个文字段对应一个语音信号片段;播放所述目标语音信号片段。
- 根据权利要求1所述的方法,其特征在于,确定所述待校验的文字对应的目标语音信号片段,包括:确定所述待校验的文字所在的文字段的第一时间戳;确定与所述第一时间戳相同的第二时间戳对应的目标语音信号片段,所述第二时间戳为语音信号片段标记的时间戳。
- 根据权利要求2所述的方法,其特征在于,确定与所述第一时间戳相同的第二时间戳对应的目标语音信号片段,包括:对全部语音信号片段进行分组;确定各组语音信号片段中第一个语音信号片段的第三时间戳,所述第三时间戳与所述第一个语音信号片段的第二时间戳相同;确定与所述第一时间戳差值最小的目标时间戳,所述目标时间戳为所述第三时间戳中的一个;如果所述第一时间戳大于所述目标时间戳,则从所述目标时间戳对应的语音信号片段开始向后依次搜索各个语音信号片段的第二时间戳,确定与所述第一时间戳相同的第二时间戳对应的目标语音信号片段,否则从所述目标时间戳对应的语音信号片段开始向前依次搜索各个语音信号片段的第二时间戳,确定与所述第一时间戳相同的第二时间戳对应的目标语音信号片段。
- 根据权利要求2或3所述的方法,其特征在于,当检测到播放指令时,获取文本中待校验的文字之前,包括:获取语音信号;在所述语音信号中的语句停顿信号处进行断句,生成语音信号片段;将所述语音信号片段转换为文字段;对所述文字段标记第一时间戳,以及对所述语音信号片段标记第二时间戳。
- 根据权利要求4所述的方法,其特征在于,对所述文字段标记第一时间戳,以及对所述语音信号片段标记第二时间戳,包括:利用所述语音信号片段的起始时间和/或结束时间对所述文字段标记第一时间戳,以及对所述语音信号片段标记第二时间戳,所述第一时间戳与所述第二时间戳相同。
- 根据权利要求4所述的方法,其特征在于,获取语音信号之后,以及对所述文字段标记第一时间戳,对所述语音信号片段标记第二时间戳之前,还包括:将所述语音信号转换为文字;根据所述文字的语意,对所述文字进行断句,生成文字段;根据所述文字段对所述语音信号进行断句,生成语音信号片段。
- 根据权利要求6所述的方法,其特征在于,所述方法还包括:计算所述文字片段的准确度;根据所述准确度,对所述文字片段进行突出显示标记;根据所述显示标记,确定所述待播放的语音对应的时间戳;播放所述时间戳对应的语音信号片段。
- 一种文字校验装置,其特征在于,包括:第一获取单元,用于当检测到播放指令时,获取文本中待校验的文字,所述待校验的文字包括至少一个文字;确定单元,用于确定所述待校验的文字对应的目标语音信号片段,所述待校验的文字所在的文字段为所述目标语音信号片段经过语音识别后生成的文字段,所述文本包括至少一个文字段,每个文字段对应一个语音信号片段;播放单元,用于播放所述目标语音信号片段。
- 根据权利要求8所述的装置,其特征在于,所述确定单元包括:第一确定子单元,用于确定所述待校验的文字所在的文字段的第一时间戳;第二确定子单元,用于确定与所述第一时间戳相同的第二时间戳对应的目标语音信号片段,所述第二时间戳为语音信号片段标记的时间戳。
- 根据权利要求9所述的装置,其特征在于,第二确定子单元具体用于:对全部语音信号片段进行分组;确定各组语音信号片段中第一个语音信号片段的第三时间戳,所述第三时间戳与所述第一个语音信号片段的第二时间戳相同;确定与所述第一时间戳差值最小的目标时间戳,所述目标时间戳为所述第三时间戳中的一个;如果所述第一时间戳大于所述目标时间戳,则从所述目标时间戳对应的语音信号片段开始向后依次搜索各个语音信号片段的第二时间戳,确定与所述第一时间戳相同的第二时间戳对应的目标语音信号片段,否则从所述目标时间戳对应的语音信号片段开始向前依次搜索各个语音信号片段的第二时间戳,确定与所述第一时间戳相同的第二时间戳对应的目标语音信号片段。
- The apparatus according to claim 8 or 9, wherein the apparatus further comprises: a second obtaining unit configured to obtain a voice signal; a segmenting unit configured to segment the voice signal at sentence-pause signals to generate voice signal segments; a conversion unit configured to convert the voice signal segments into text segments; and a marking unit configured to mark the text segments with first timestamps and the voice signal segments with second timestamps.
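Claims 3 and 10 describe a grouped timestamp search: the voice signal segments are grouped, the group-head ("third") timestamp closest to the text segment's first timestamp is chosen as the target timestamp, and the search then walks toward later or earlier segments depending on the comparison. A minimal Python sketch of that search, assuming a hypothetical `Segment` record and a configurable `group_size` (neither is specified in the claims):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Segment:
    timestamp: float  # second timestamp: start time of the voice signal segment
    audio: bytes      # raw audio payload (placeholder)

def find_segment(segments: List[Segment], first_ts: float,
                 group_size: int = 8) -> Optional[Segment]:
    """Locate the segment whose second timestamp equals first_ts using the
    grouped search of claim 3."""
    if not segments:
        return None
    # Third timestamps: the second timestamp of the first segment in each group.
    group_heads = range(0, len(segments), group_size)
    # Target timestamp: the group-head timestamp closest to first_ts.
    head = min(group_heads,
               key=lambda i: abs(segments[i].timestamp - first_ts))
    if first_ts >= segments[head].timestamp:
        indices = range(head, len(segments))   # search later segments
    else:
        indices = range(head, -1, -1)          # search earlier segments
    for i in indices:
        if segments[i].timestamp == first_ts:
            return segments[i]
    return None
```

Because only the group heads are compared first, at most one group (plus the walk to the match) is scanned instead of every segment's second timestamp.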
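Claims 4 and 5 segment the voice signal at sentence pauses and stamp each resulting segment (and its transcribed text segment) with the segment's start time. A minimal sketch of pause-based segmentation, assuming a simple frame-energy silence detector with hypothetical thresholds (the claims do not prescribe how pauses are detected):

```python
import numpy as np

def split_on_pauses(samples, rate, frame_ms=20,
                    energy_thresh=1e-3, min_pause_frames=5):
    """Split a mono waveform at sentence pauses (runs of low-energy frames)
    and return (start_time, segment) pairs; the start time serves as the
    shared first/second timestamp of the text and voice segments."""
    frame = int(rate * frame_ms / 1000)
    # Frame-level voice activity: mean energy above a fixed threshold.
    voiced = [np.mean(samples[i:i + frame] ** 2) >= energy_thresh
              for i in range(0, len(samples) - frame + 1, frame)]
    segments, i = [], 0
    while i < len(voiced):
        if not voiced[i]:
            i += 1
            continue
        # Extend the segment until min_pause_frames silent frames in a row.
        j, silent = i, 0
        while j < len(voiced) and silent < min_pause_frames:
            silent = 0 if voiced[j] else silent + 1
            j += 1
        end = j - silent  # trim the trailing pause
        segments.append((i * frame / rate, samples[i * frame:end * frame]))
        i = j
    return segments
```

Each returned start time would then be marked on both the voice signal segment and the text segment produced from it, so the two share an identical timestamp as claim 5 requires.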
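Claim 7 computes an accuracy score for each text segment, highlights low-accuracy segments, and uses the highlight marks to pick which voice signal segments to replay. A minimal sketch, assuming the recognizer's confidence stands in for the accuracy score and bracket marks stand in for the highlight (both are assumptions; the claims leave scoring and display unspecified):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TextSegment:
    text: str
    timestamp: float   # first timestamp, shared with the voice segment
    confidence: float  # recognizer confidence, used here as the accuracy

def mark_for_review(segments: List[TextSegment],
                    threshold: float = 0.8) -> Tuple[str, List[float]]:
    """Return (display_string, timestamps_to_replay): segments whose
    accuracy falls below the threshold are highlighted with brackets
    and their timestamps queued for playback of the voice segments."""
    parts, to_replay = [], []
    for seg in segments:
        if seg.confidence < threshold:
            parts.append(f"[{seg.text}]")    # highlight low-accuracy text
            to_replay.append(seg.timestamp)  # replay its voice segment
        else:
            parts.append(seg.text)
    return "".join(parts), to_replay
```

The returned timestamps are exactly the second timestamps of the voice signal segments to play back, closing the loop between the highlighted text and the audio behind it.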
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711386355.9 | 2017-12-20 | ||
CN201711386355.9A CN109949828B (zh) | 2017-12-20 | 2017-12-20 | Text verification method and apparatus
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019120247A1 (zh) | 2019-06-27 |
Family
ID=66994409
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/122343 WO2019120247A1 (zh) | 2018-12-20 | Text verification method and apparatus
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109949828B (zh) |
WO (1) | WO2019120247A1 (zh) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1282072A (zh) * | 1999-07-27 | 2001-01-31 | International Business Machines Corporation | Method and speech recognition system for correcting errors in speech recognition results
JP2004151614A (ja) * | 2002-11-01 | 2004-05-27 | Nippon Hoso Kyokai <Nhk> | Character data correction apparatus, method and program therefor, and subtitle generation method
JP2008051895A (ja) * | 2006-08-22 | 2008-03-06 | Casio Comput Co Ltd | Speech recognition apparatus and speech recognition processing program
US20110110647A1 (en) * | 2009-11-06 | 2011-05-12 | Altus Learning Systems, Inc. | Error correction for synchronized media resources
CN103065659A (zh) * | 2012-12-06 | 2013-04-24 | Guangdong OPPO Mobile Telecommunications Corp., Ltd. | Multimedia recording method
CN105120195A (zh) * | 2015-09-18 | 2015-12-02 | Gu Honglin | Content recording and reproduction system and method
CN105185377A (zh) * | 2015-09-24 | 2015-12-23 | Baidu Online Network Technology (Beijing) Co., Ltd. | Voice-based file generation method and apparatus
CN106528715A (zh) * | 2016-10-27 | 2017-03-22 | Guangdong Genius Technology Co., Ltd. | Audio content checking method and apparatus
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7483833B2 (en) * | 2003-10-21 | 2009-01-27 | Koninklijke Philips Electronics N.V. | Intelligent speech recognition with user interfaces |
US7617106B2 (en) * | 2003-11-05 | 2009-11-10 | Koninklijke Philips Electronics N.V. | Error detection for speech to text transcription systems |
US10923116B2 (en) * | 2015-06-01 | 2021-02-16 | Sinclair Broadcast Group, Inc. | Break state detection in content management systems |
CN105159870B (zh) * | 2015-06-26 | 2018-06-29 | Xu Xin | Processing system and method for accurately transcribing continuous natural speech into text
CN106448675B (zh) * | 2016-10-21 | 2020-05-01 | iFlytek Co., Ltd. | Recognized-text correction method and system
- 2017-12-20: CN application CN201711386355.9A filed; granted as CN109949828B (status: active)
- 2018-12-20: PCT application PCT/CN2018/122343 filed, published as WO2019120247A1 (application filing)
Also Published As
Publication number | Publication date |
---|---|
CN109949828B (zh) | 2022-05-24 |
CN109949828A (zh) | 2019-06-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20180315429A1 (en) | System and method for automated legal proceeding assistant | |
CN108766418B (zh) | Voice endpoint recognition method, apparatus and device | |
CN109410664B (zh) | Pronunciation correction method and electronic device | |
TW202008349A (zh) | Speech annotation method, apparatus and device | |
US8843369B1 (en) | Speech endpointing based on voice profile | |
CN103021409B (zh) | Voice-activated photographing system | |
TWI711967B (zh) | Method, apparatus and device for determining broadcast speech | |
CN104538034A (zh) | Speech recognition method and system | |
CN105378830A (zh) | Processing of audio data | |
CN108305618B (zh) | Voice acquisition and search method, smart pen, search terminal and storage medium | |
US10062384B1 (en) | Analysis of content written on a board | |
EP4322029A1 (en) | Method and apparatus for generating video corpus, and related device | |
CN108877779B (zh) | Method and apparatus for detecting speech tail point | |
CN111402892A (zh) | Meeting-minutes template generation method based on speech recognition | |
WO2019120248A1 (zh) | Method, apparatus and system for converting speech into text | |
CN109213970B (zh) | Transcript generation method and apparatus | |
CN111966839B (zh) | Data processing method, apparatus, electronic device and computer storage medium | |
CN113658594A (zh) | Lyric recognition method, apparatus, device, storage medium and product | |
TWI769520B (zh) | Multilingual speech recognition and translation method and related system | |
WO2019120247A1 (zh) | Text verification method and apparatus | |
CN111221987A (zh) | Mixed audio tagging method and apparatus | |
US20220215839A1 (en) | Method for determining voice response speed, related device and computer program product | |
CN109688430A (zh) | Court hearing file playback method, system and storage medium | |
CN114999464A (zh) | Voice data processing method and apparatus | |
CN114155841A (zh) | Speech recognition method, apparatus, device and storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18891782 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 18891782 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13.01.2021) |
|