CN105845129A - Method and system for dividing sentences in audio and automatic caption generation method and system for video files - Google Patents

Method and system for dividing sentences in audio and automatic caption generation method and system for video files

Info

Publication number
CN105845129A
CN105845129A (application CN201610178500.3A)
Authority
CN
China
Prior art keywords
sentence
audio
audio frequency
pause
cutting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610178500.3A
Other languages
Chinese (zh)
Inventor
蔡炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Leshi Zhixin Electronic Technology Tianjin Co Ltd
LeTV Holding Beijing Co Ltd
Original Assignee
Leshi Zhixin Electronic Technology Tianjin Co Ltd
LeTV Holding Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Leshi Zhixin Electronic Technology Tianjin Co Ltd, LeTV Holding Beijing Co Ltd filed Critical Leshi Zhixin Electronic Technology Tianjin Co Ltd
Priority to CN201610178500.3A priority Critical patent/CN105845129A/en
Publication of CN105845129A publication Critical patent/CN105845129A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/04 - Segmentation; Word boundary detection
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 - Processing of audio elementary streams
    • H04N21/4394 - Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 - End-user applications
    • H04N21/488 - Data services, e.g. news ticker
    • H04N21/4884 - Data services, e.g. news ticker for displaying subtitles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiment of the invention discloses a method and system for dividing sentences in audio and an automatic caption generation method and system for video files. The method for dividing sentences in audio includes the steps of identifying a first pause, identifying a first sentence, identifying a second pause, and determining whether the audio is finished; if not, the sentence/pause identification steps are repeated until the audio is finished. The pause has a minimum length restriction, and the sentence has both a minimum length restriction and a maximum length restriction. The speech recognition rate is thus increased, which makes fully automatic caption production possible.

Description

Method and system for dividing sentences in audio and automatic caption generation method and system for video files
Technical field
The present invention relates to the field of electronic technology, and in particular to a method and system for segmenting sentences in audio, and to a method and system for automatically generating captions for video files.
Background art
Captions (subtitles) display in written form the non-visual content of film and television programs, such as dialogue; the term also refers to text added during the post-production of such works, and captions are indispensable for film and television programs. Existing caption production is done mainly by hand by caption producers and involves processes such as transcription, translation, polishing, time-axis alignment and post-processing; it is inefficient, procedurally complex, and requires a great deal of manpower and material resources.
Summary of the invention
Therefore, the technical problem to be solved by the present invention is that existing caption production is inefficient, procedurally complex, and requires a great deal of manpower and material resources.
To this end, an embodiment of the present invention provides a method for segmenting sentences in audio, comprising:
S1, identifying a first pause, the pause comprising a silent segment and/or a non-speech segment, and recording the start time and end time of the first pause;
S2, identifying a first sentence, the sentence comprising speech segments, and setting the start time of the first sentence to the end time of the first pause;
S3, identifying a second pause, recording the start time and end time of the second pause, setting the end time of the first sentence to the start time of the second pause, and thereby completing the segmentation of the first sentence;
S4, judging whether the audio has ended; if not, repeating steps S2-S3; if so, proceeding to step S5;
S5, ending;
wherein the pause has a minimum length restriction, used to ignore short sounds; the sentence has a minimum length restriction, used to filter out short-lived invalid information in the audio; and the sentence also has a maximum length restriction, used to limit the length of the sentence and improve recognition accuracy.
Preferably, the minimum length restriction of the pause is 2 audio sections.
Preferably, the minimum length restriction of the sentence is 3 audio sections.
Preferably, the maximum length restriction of the sentence is 50 audio sections.
An embodiment of the present invention also provides a method for automatically generating captions for a video file, comprising the following steps:
S1, extracting the audio from the video file to be processed;
S2, classifying the audio sections in the audio, the classes comprising silence, speech and non-speech;
S3, segmenting sentences in the audio using any of the aforementioned methods for segmenting sentences in audio;
S4, performing speech recognition on the sentences, and recording the corresponding text and the start and end time information of each sentence;
S5, generating captions according to the text and the start and end time information.
Preferably, in step S1, ffmpeg is used to extract the audio, and a corresponding decoder decodes the audio into PCM data.
Preferably, in step S2, Marsyas is used to classify the audio sections.
Preferably, in step S4, HTK is used as the recognition tool to perform speech recognition on the sentences.
An embodiment of the present invention also provides a system for segmenting sentences in audio, comprising:
a pause identification module, configured to identify pauses comprising silent segments and/or non-speech segments, and to record the start time and end time of each pause;
a sentence identification module, configured to identify sentences comprising speech segments, to set the start time of a sentence to the end time of the adjacent preceding pause, and to set the end time of the sentence to the start time of the adjacent following pause;
an audio end judging module, configured to judge whether the audio has ended;
wherein the pause has a minimum length restriction, used to ignore short sounds; the sentence has a minimum length restriction, used to filter out short-lived invalid information in the audio; and the sentence also has a maximum length restriction, used to limit the length of the sentence and improve recognition accuracy.
An embodiment of the present invention also provides a system for automatically generating captions for a video file, comprising:
an audio extraction module, configured to extract the audio from the video file;
an audio section classification module, configured to classify the audio sections in the audio, the classes comprising silence, speech and non-speech;
a sentence segmentation module, configured to segment sentences in the audio using the system for segmenting sentences in audio according to claim 9;
a speech recognition module, configured to perform speech recognition on the sentences and to record the corresponding text and the start and end time information of each sentence;
a caption generation module, configured to generate captions according to the text corresponding to the sentences and the start and end time information.
By introducing three variables, namely a minimum pause length restriction, a minimum sentence length restriction and a maximum sentence length restriction, the method and system for segmenting sentences in audio and the method and system for automatically generating captions for video files according to the embodiments of the present invention improve the speech recognition rate and make fully automatic caption production possible.
Brief description of the drawings
In order to explain the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required for describing the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of the method for segmenting sentences in audio according to an embodiment of the present invention;
Fig. 2 is a flowchart of the method for automatically generating captions for a video file according to an embodiment of the present invention;
Fig. 3 is a structural block diagram of the system for segmenting sentences in audio according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of the system for automatically generating captions for a video file according to an embodiment of the present invention.
Detailed description of the invention
The technical solutions of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.
The technical solutions of the present invention are described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, an embodiment of the present invention provides a method for segmenting sentences in audio, comprising:
S1, identifying a first pause, the pause comprising a silent segment and/or a non-speech segment, and recording the start time and end time of the first pause.
Specifically, the start time of the first pause may be the start time of the audio, and its end time may be the time at which the first speech segment begins.
S2, identifying a first sentence, a sentence comprising speech segments, and setting the start time of the first sentence to the end time of the first pause.
S3, identifying a second pause, recording the start time and end time of the second pause, setting the end time of the first sentence to the start time of the second pause, and completing the segmentation of the first sentence.
S4, judging whether the audio has ended; if not, repeating steps S2-S3; if so, proceeding to step S5.
S5, ending.
Here, the pause has a minimum length restriction, used to ignore short sounds; the sentence has a minimum length restriction, used to filter out short-lived invalid information in the audio; and the sentence also has a maximum length restriction, used to limit the length of the sentence and improve recognition accuracy.
The purpose of sentence segmentation is to obtain short sentences that are easy to recognize. Accurately detecting the start time and end time of each sentence is crucial, because only with sufficiently high endpoint-detection precision can sentence length be controlled in a targeted way. However, detecting sentence breakpoints tends to produce two extreme cases. The first is that many extremely short sentences appear, some only one or two audio sections long; such sentences usually contain only one or two words, or even no valid speech information at all. The second is that some very long sentences appear, some lasting ten or even tens of seconds and containing several semantically complete units. Both cases seriously degrade the recognition rate.
The sentence segmentation method of the embodiment of the present invention introduces the three variables mentioned above, namely the minimum pause length restriction, the minimum sentence length restriction and the maximum sentence length restriction, which effectively prevent both extreme cases and thereby improve the speech recognition rate.
Preferably, the minimum length restriction of the pause is 2 audio sections.
As described above, the minimum pause length restriction is set in order to ignore short sounds, such as a speaker's momentary breath, and thereby preserve the integrity of a sentence. Through repeated research and experiments, the applicant found that setting the minimum pause length to 2 audio sections ensures that a single non-speech unit inside a continuous speech unit is not treated as a pause, which preserves the integrity of the sentence.
Preferably, the minimum length restriction of the sentence is 3 audio sections.
Specifically, the minimum sentence length is the number of speech sections that a sentence must contain. The minimum sentence length restriction serves to filter out short-lived invalid information in the audio, such as a speaker's slight cough. It has been found that setting the minimum sentence length to 3 audio sections, i.e. ignoring speech units with a total length of less than 0.48 seconds, effectively filters out short-lived invalid information such as coughs, sighs and breaths.
Preferably, the maximum length restriction of the sentence is 50 audio sections.
An overly long sentence increases the difficulty of speech recognition and lowers the recognition rate. Therefore, when the number of speech sections in a sentence reaches a certain limit, measures should be taken to end the sentence as soon as possible. By setting the maximum sentence length to 50 audio sections, once this limit is reached even a single non-speech unit is treated as a pause, which effectively limits the length of the sentence and improves its recognition accuracy.
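The following Python sketch is offered only as an illustration of the segmentation logic described above; it is not the claimed method itself. It assumes the audio has already been classified into a per-section label sequence (as in step S2 of the caption generation method below), and the function name, the labels argument and the returned (start, end) pairs are illustrative choices. The three limits use the preferred values of the embodiments, and one audio section is assumed to last 0.16 s as described later.

    # Illustrative sketch only; names and data layout are assumptions, not the patented method.
    MIN_PAUSE_SECTIONS = 2      # preferred minimum pause length
    MIN_SENTENCE_SECTIONS = 3   # preferred minimum sentence length
    MAX_SENTENCE_SECTIONS = 50  # preferred maximum sentence length
    SECTION_SEC = 0.16          # one audio section = 5 frames x 32 ms

    def segment_sentences(labels):
        """labels: per-section classes, e.g. ['silence', 'speech', 'non-speech', ...].
        Returns a list of (start_sec, end_sec) sentence boundaries."""
        sentences = []
        sent_start = None   # index of the first section of the current sentence
        pause_run = 0       # length of the current run of silent/non-speech sections

        for i, label in enumerate(labels):
            if label == "speech":
                if sent_start is None:
                    sent_start = i          # a sentence starts where the pause ended
                pause_run = 0
                continue

            pause_run += 1                  # silent or non-speech section
            if sent_start is None:
                continue                    # still inside a leading or inter-sentence pause

            pause_start = i - pause_run + 1
            sent_len = pause_start - sent_start
            # Once the sentence is long enough, even a single non-speech section ends it.
            needed = 1 if sent_len >= MAX_SENTENCE_SECTIONS else MIN_PAUSE_SECTIONS
            if pause_run >= needed:
                if sent_len >= MIN_SENTENCE_SECTIONS:   # drop coughs, sighs, breaths
                    sentences.append((sent_start * SECTION_SEC, pause_start * SECTION_SEC))
                sent_start = None

        if sent_start is not None:                      # audio ended inside a sentence
            end = len(labels) - pause_run
            if end - sent_start >= MIN_SENTENCE_SECTIONS:
                sentences.append((sent_start * SECTION_SEC, end * SECTION_SEC))
        return sentences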
As shown in Fig. 2, an embodiment of the present invention also provides a method for automatically generating captions for a video file, comprising the following steps:
S1, extracting the audio from the video file to be processed.
S2, classifying the audio sections in the audio, the classes comprising silence, speech and non-speech.
S3, segmenting sentences in the audio using any of the above methods for segmenting sentences in audio.
S4, performing speech recognition on the sentences, and recording the corresponding text and the start and end time information of each sentence.
S5, generating captions according to the text and the start and end time information.
Specifically, the captions are srt text subtitles. There are many kinds of subtitles, and the most popular subtitle formats fall into two classes: graphical formats and text formats. Compared with graphical subtitles, text subtitles are small, simple in format, and easy to create and modify. Among them, the srt text subtitle format is the most widely used and is compatible with all common media players.
Preferably, in order to optimize the display and make the captions easier for viewers to read, longer sentences in the recognition result are split across multiple display lines.
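As an illustration of step S5, the sketch below writes recognized sentences to an srt file and splits long sentences across several display lines. The 20-character line limit and the fixed-width wrapping are arbitrary illustrative choices; the patent does not prescribe how the split is performed.

    def fmt_ts(seconds):
        """Format a time offset as the srt timestamp HH:MM:SS,mmm."""
        ms = int(round(seconds * 1000))
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1_000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    def wrap(text, max_chars=20):
        """Split a long recognized sentence into display lines (naive fixed-width wrap)."""
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)] or [""]

    def write_srt(sentences, path):
        """sentences: list of (start_sec, end_sec, text) produced by recognition."""
        with open(path, "w", encoding="utf-8") as f:
            for idx, (start, end, text) in enumerate(sentences, 1):
                f.write(f"{idx}\n{fmt_ts(start)} --> {fmt_ts(end)}\n")
                f.write("\n".join(wrap(text)) + "\n\n")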
Preferably, in step S1, ffmpeg is used to extract the audio, and a corresponding decoder decodes the audio into PCM data.
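A minimal sketch of this extraction step, assuming the ffmpeg command-line tool is available on the path; the 16 kHz mono output format is an assumption made for the recognizer, the patent only requires decoding to PCM.

    import subprocess

    def extract_pcm(video_path, wav_path, sample_rate=16000):
        """Extract the audio track of a video file and decode it to 16-bit mono PCM (WAV)."""
        subprocess.run(
            ["ffmpeg", "-y", "-i", video_path,
             "-vn",                   # drop the video stream
             "-acodec", "pcm_s16le",  # decode to signed 16-bit little-endian PCM
             "-ac", "1",              # mono
             "-ar", str(sample_rate), # resample for the recognizer (assumed rate)
             wav_path],
            check=True)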
Preferably, in step S2, Marsyas is used to classify the audio sections.
Specifically, using the interface provided by Marsyas, the frame length is set to 32 ms and the section length to 0.16 s, i.e. one audio section contains 5 audio frames.
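With these settings the relation between frames, audio sections and time is fixed, which is what allows the length limits above to be expressed in seconds. The small sketch below only restates that arithmetic; the constant names are illustrative.

    FRAME_MS = 32            # Marsyas analysis frame length used in the embodiment
    FRAMES_PER_SECTION = 5   # one audio section = 5 frames
    SECTION_SEC = FRAME_MS * FRAMES_PER_SECTION / 1000.0   # = 0.16 s

    def section_to_seconds(section_index):
        """Time offset of the start of a given audio section, in seconds."""
        return section_index * SECTION_SEC

    # With these values the preferred limits of the embodiments correspond to:
    #   minimum pause    = 2 sections  = 0.32 s
    #   minimum sentence = 3 sections  = 0.48 s (the 0.48 s figure mentioned above)
    #   maximum sentence = 50 sections = 8.0 s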
Preferably, in step S4, HTK is used as the recognition tool to perform speech recognition on the sentences.
Specifically, HTK is used as a large-vocabulary continuous speech recognition tool to recognize the sentences, producing a piece of text for each sentence; the recognized text of each sentence and the corresponding start and end time information are stored.
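The patent does not spell out how HTK is invoked. The sketch below shells out to HTK's HVite decoder using its commonly documented options; the model, word-network, dictionary and HMM-list file names are placeholders, and a trained model set is assumed to exist.

    import subprocess

    def recognise_with_htk(scp_file, out_mlf,
                           hmmdefs="hmmdefs", wdnet="wdnet",
                           dictionary="dict", hmmlist="tiedlist"):
        """Decode the feature files listed in scp_file (one per segmented sentence) and
        write the recognised words with their start/end times to the MLF out_mlf."""
        subprocess.run(
            ["HVite",
             "-H", hmmdefs,   # acoustic model definitions
             "-S", scp_file,  # script file listing the feature files to decode
             "-i", out_mlf,   # output master label file (words + times)
             "-w", wdnet,     # word network / language model
             dictionary, hmmlist],
            check=True)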
As shown in Fig. 3, an embodiment of the present invention also provides a system 1 for segmenting sentences in audio, comprising:
a pause identification module 2, configured to identify pauses comprising silent segments and/or non-speech segments, and to record the start time and end time of each pause;
a sentence identification module 3, configured to identify sentences comprising speech segments, to set the start time of a sentence to the end time of the adjacent preceding pause, and to set the end time of the sentence to the start time of the adjacent following pause;
an audio end judging module 4, configured to judge whether the audio has ended;
wherein the pause has a minimum length restriction, used to ignore short sounds; the sentence has a minimum length restriction, used to filter out short-lived invalid information in the audio; and the sentence also has a maximum length restriction, used to limit the length of the sentence and improve recognition accuracy.
As shown in Fig. 4, an embodiment of the present invention also provides a system 11 for automatically generating captions for a video file, comprising:
an audio extraction module 12, configured to extract the audio from the video file;
an audio section classification module 13, configured to classify the audio sections in the audio, the classes comprising silence, speech and non-speech;
a sentence segmentation module 14, configured to segment sentences in the audio using the above system for segmenting sentences in audio;
a speech recognition module 15, configured to perform speech recognition on the sentences and to record the corresponding text and the start and end time information of each sentence;
a caption generation module 16, configured to generate captions according to the text corresponding to the sentences and the start and end time information.
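To show how the modules of Fig. 4 fit together, the sketch below chains the helpers sketched earlier in this description; classify_sections() and recognise() are hypothetical stand-ins for the Marsyas classification and HTK recognition steps and are not defined here.

    def generate_captions(video_path, srt_path, wav_path="audio.wav"):
        """End-to-end illustration of the caption pipeline (not the claimed system itself)."""
        extract_pcm(video_path, wav_path)          # audio extraction module
        labels = classify_sections(wav_path)       # audio section classification module (stand-in)
        spans = segment_sentences(labels)          # sentence segmentation module
        results = [(start, end, recognise(wav_path, start, end))   # speech recognition (stand-in)
                   for start, end in spans]
        write_srt(results, srt_path)               # caption generation module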
Obviously, the above embodiments are merely examples given for the sake of clear illustration and are not intended to limit the embodiments. For those of ordinary skill in the art, other changes or variations in different forms can also be made on the basis of the above description. It is neither necessary nor possible to exhaustively list all embodiments here. Obvious changes or variations derived therefrom still fall within the scope of protection of the present invention.

Claims (10)

1. A method for segmenting sentences in audio, characterized by comprising the following steps:
S1, identifying a first pause, the pause comprising a silent segment and/or a non-speech segment, and recording the start time and end time of the first pause;
S2, identifying a first sentence, the sentence comprising speech segments, and setting the start time of the first sentence to the end time of the first pause;
S3, identifying a second pause, recording the start time and end time of the second pause, setting the end time of the first sentence to the start time of the second pause, and thereby completing the segmentation of the first sentence;
S4, judging whether the audio has ended; if not, repeating steps S2-S3; if so, proceeding to step S5;
S5, ending;
wherein the pause has a minimum length restriction, used to ignore short sounds; the sentence has a minimum length restriction, used to filter out short-lived invalid information in the audio; and the sentence also has a maximum length restriction, used to limit the length of the sentence and improve recognition accuracy.
2. The method according to claim 1, characterized in that the minimum length restriction of the pause is 2 audio sections.
3. The method according to any one of claims 1-2, characterized in that the minimum length restriction of the sentence is 3 audio sections.
4. The method according to any one of claims 1-3, characterized in that the maximum length restriction of the sentence is 50 audio sections.
5. A method for automatically generating captions for a video file, characterized by comprising the following steps:
S1, extracting the audio from the video file to be processed;
S2, classifying the audio sections in the audio, the classes comprising silence, speech and non-speech;
S3, segmenting sentences in the audio using the method for segmenting sentences in audio according to any one of claims 1-4;
S4, performing speech recognition on the sentences, and recording the corresponding text and the start and end time information of each sentence;
S5, generating captions according to the text and the start and end time information.
6. The method according to claim 5, characterized in that in step S1, ffmpeg is used to extract the audio, and a corresponding decoder decodes the audio into PCM data.
7. The method according to any one of claims 5-6, characterized in that in step S2, Marsyas is used to classify the audio sections.
8. The method according to any one of claims 5-7, characterized in that in step S4, HTK is used as the recognition tool to perform speech recognition on the sentences.
9. A system for segmenting sentences in audio, characterized by comprising:
a pause identification module, configured to identify pauses comprising silent segments and/or non-speech segments, and to record the start time and end time of each pause;
a sentence identification module, configured to identify sentences comprising speech segments, to set the start time of a sentence to the end time of the adjacent preceding pause, and to set the end time of the sentence to the start time of the adjacent following pause;
an audio end judging module, configured to judge whether the audio has ended;
wherein the pause has a minimum length restriction, used to ignore short sounds; the sentence has a minimum length restriction, used to filter out short-lived invalid information in the audio; and the sentence also has a maximum length restriction, used to limit the length of the sentence and improve recognition accuracy.
10. A system for automatically generating captions for a video file, characterized by comprising:
an audio extraction module, configured to extract the audio from the video file;
an audio section classification module, configured to classify the audio sections in the audio, the classes comprising silence, speech and non-speech;
a sentence segmentation module, configured to segment sentences in the audio using the system for segmenting sentences in audio according to claim 9;
a speech recognition module, configured to perform speech recognition on the sentences and to record the corresponding text and the start and end time information of each sentence;
a caption generation module, configured to generate captions according to the text corresponding to the sentences and the start and end time information.
CN201610178500.3A 2016-03-25 2016-03-25 Method and system for dividing sentences in audio and automatic caption generation method and system for video files Pending CN105845129A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610178500.3A CN105845129A (en) 2016-03-25 2016-03-25 Method and system for dividing sentences in audio and automatic caption generation method and system for video files

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610178500.3A CN105845129A (en) 2016-03-25 2016-03-25 Method and system for dividing sentences in audio and automatic caption generation method and system for video files

Publications (1)

Publication Number Publication Date
CN105845129A true CN105845129A (en) 2016-08-10

Family

ID=56583579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610178500.3A Pending CN105845129A (en) 2016-03-25 2016-03-25 Method and system for dividing sentences in audio and automatic caption generation method and system for video files

Country Status (1)

Country Link
CN (1) CN105845129A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105280206A (en) * 2014-06-23 2016-01-27 广东小天才科技有限公司 Audio playing method and device
CN105159870A (en) * 2015-06-26 2015-12-16 徐信 Processing system for precisely completing continuous natural speech textualization and method for precisely completing continuous natural speech textualization

Cited By (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106331844A (en) * 2016-08-17 2017-01-11 北京金山安全软件有限公司 Method and device for generating subtitles of media file and electronic equipment
CN106504754A (en) * 2016-09-29 2017-03-15 浙江大学 A kind of real-time method for generating captions according to audio output
CN106528715A (en) * 2016-10-27 2017-03-22 广东小天才科技有限公司 Method and device for checking audio content
CN106504773A (en) * 2016-11-08 2017-03-15 上海贝生医疗设备有限公司 A kind of wearable device and voice and activities monitoring system
CN106506335B (en) * 2016-11-10 2019-08-30 北京小米移动软件有限公司 The method and device of sharing video frequency file
CN106506335A (en) * 2016-11-10 2017-03-15 北京小米移动软件有限公司 The method and device of sharing video frequency file
CN106782506A (en) * 2016-11-23 2017-05-31 语联网(武汉)信息技术有限公司 A kind of method that recorded audio is divided into section
CN106792145A (en) * 2017-02-22 2017-05-31 杭州当虹科技有限公司 A kind of method and apparatus of the automatic overlapping text of audio frequency and video
CN107291676A (en) * 2017-06-20 2017-10-24 广东小天才科技有限公司 Block method, terminal device and the computer-readable storage medium of voice document
CN107766325A (en) * 2017-09-27 2018-03-06 百度在线网络技术(北京)有限公司 Text joining method and its device
US11024332B2 (en) 2017-11-06 2021-06-01 Baidu Online Network Technology (Beijing) Co., Ltd. Cloud-based speech processing method and apparatus
CN107919130A (en) * 2017-11-06 2018-04-17 百度在线网络技术(北京)有限公司 Method of speech processing and device based on high in the clouds
CN107919130B (en) * 2017-11-06 2021-12-17 百度在线网络技术(北京)有限公司 Cloud-based voice processing method and device
CN108062955A (en) * 2017-12-12 2018-05-22 深圳证券信息有限公司 A kind of intelligence report-generating method, system and equipment
CN108597521A (en) * 2018-05-04 2018-09-28 徐涌 Audio role divides interactive system, method, terminal and the medium with identification word
CN110473519B (en) * 2018-05-11 2022-05-27 北京国双科技有限公司 Voice processing method and device
CN110473519A (en) * 2018-05-11 2019-11-19 北京国双科技有限公司 A kind of method of speech processing and device
CN109005445A (en) * 2018-06-26 2018-12-14 卫军征 Multi-medium play method, system, storage medium and playback equipment
CN109166570A (en) * 2018-07-24 2019-01-08 百度在线网络技术(北京)有限公司 A kind of method, apparatus of phonetic segmentation, equipment and computer storage medium
CN109389999A (en) * 2018-09-28 2019-02-26 北京亿幕信息技术有限公司 A kind of high performance audio-video is made pauses in reading unpunctuated ancient writings method and system automatically
CN110418208A (en) * 2018-11-14 2019-11-05 腾讯科技(深圳)有限公司 A kind of subtitle based on artificial intelligence determines method and apparatus
CN110381388B (en) * 2018-11-14 2021-04-13 腾讯科技(深圳)有限公司 Subtitle generating method and device based on artificial intelligence
CN110381389A (en) * 2018-11-14 2019-10-25 腾讯科技(深圳)有限公司 A kind of method for generating captions and device based on artificial intelligence
CN110381388A (en) * 2018-11-14 2019-10-25 腾讯科技(深圳)有限公司 A kind of method for generating captions and device based on artificial intelligence
CN110381389B (en) * 2018-11-14 2022-02-25 腾讯科技(深圳)有限公司 Subtitle generating method and device based on artificial intelligence
CN109379641A (en) * 2018-11-14 2019-02-22 腾讯科技(深圳)有限公司 A kind of method for generating captions and device
CN110223697A (en) * 2019-06-13 2019-09-10 苏州思必驰信息科技有限公司 Interactive method and system
CN110263313A (en) * 2019-06-19 2019-09-20 安徽声讯信息技术有限公司 A kind of man-machine coordination edit methods for meeting shorthand
CN110265026A (en) * 2019-06-19 2019-09-20 安徽声讯信息技术有限公司 A kind of meeting shorthand system and meeting stenography method
CN110265027A (en) * 2019-06-19 2019-09-20 安徽声讯信息技术有限公司 A kind of audio frequency transmission method for meeting shorthand system
CN110263313B (en) * 2019-06-19 2021-08-24 安徽声讯信息技术有限公司 Man-machine collaborative editing method for conference shorthand
CN110265026B (en) * 2019-06-19 2021-07-27 安徽声讯信息技术有限公司 Conference shorthand system and conference shorthand method
CN110335612A (en) * 2019-07-11 2019-10-15 招商局金融科技有限公司 Minutes generation method, device and storage medium based on speech recognition
CN110246500A (en) * 2019-07-12 2019-09-17 携程旅游信息技术(上海)有限公司 Audio recognition method and system based on recording file
CN110942764B (en) * 2019-11-15 2022-04-22 北京达佳互联信息技术有限公司 Stream type voice recognition method
CN110942764A (en) * 2019-11-15 2020-03-31 北京达佳互联信息技术有限公司 Stream type voice recognition method
WO2022037419A1 (en) * 2020-08-18 2022-02-24 北京字节跳动网络技术有限公司 Audio content recognition method and apparatus, and device and computer-readable medium
US11783808B2 (en) 2020-08-18 2023-10-10 Beijing Bytedance Network Technology Co., Ltd. Audio content recognition method and apparatus, and device and computer-readable medium
CN111986655B (en) * 2020-08-18 2022-04-01 北京字节跳动网络技术有限公司 Audio content identification method, device, equipment and computer readable medium
CN111986655A (en) * 2020-08-18 2020-11-24 北京字节跳动网络技术有限公司 Audio content identification method, device, equipment and computer readable medium
CN111970311A (en) * 2020-10-23 2020-11-20 北京世纪好未来教育科技有限公司 Session segmentation method, electronic device and computer readable medium
CN112287914A (en) * 2020-12-27 2021-01-29 平安科技(深圳)有限公司 PPT video segment extraction method, device, equipment and medium
CN112287914B (en) * 2020-12-27 2021-04-02 平安科技(深圳)有限公司 PPT video segment extraction method, device, equipment and medium
CN112820293A (en) * 2020-12-31 2021-05-18 讯飞智元信息科技有限公司 Voice recognition method and related device
CN113207032A (en) * 2021-04-29 2021-08-03 读书郎教育科技有限公司 System and method for increasing subtitles by recording videos in intelligent classroom
CN113225618A (en) * 2021-05-06 2021-08-06 阿里巴巴新加坡控股有限公司 Video editing method and device


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20160810)