CN105845129A - Method and system for dividing sentences in audio and automatic caption generation method and system for video files - Google Patents
- Publication number
- CN105845129A CN105845129A CN201610178500.3A CN201610178500A CN105845129A CN 105845129 A CN105845129 A CN 105845129A CN 201610178500 A CN201610178500 A CN 201610178500A CN 105845129 A CN105845129 A CN 105845129A
- Authority
- CN
- China
- Prior art keywords
- sentence
- audio
- audio frequency
- pause
- cutting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4884—Data services, e.g. news ticker for displaying subtitles
Abstract
Embodiments of the invention disclose a method and system for segmenting sentences in audio, and a method and system for automatically generating captions for video files. The sentence segmentation method includes: identifying a first pause; identifying a first sentence; identifying a second pause; and determining whether the audio has ended, repeating the sentence/pause identification steps until it has. A pause is subject to a minimum-length limit, and a sentence to both a minimum-length and a maximum-length limit. These limits raise the speech recognition rate and make fully automatic caption production possible.
Description
Technical field
The present invention relates to the field of electronic technology, and in particular to a method and system for segmenting sentences in audio, and to a method and system for automatically generating captions for video files.
Background art
Captions display non-visual content, such as the dialogue in a film or television program, in written form; they are also the text added to such works in post-production, and are indispensable to them. Existing caption production is carried out largely by hand, through stages including transcription, translation, polishing, time-axis alignment and post-processing. It is inefficient, procedurally complex, and demands substantial manpower and material resources.
Summary of the invention
The technical problem to be solved by the present invention is therefore that existing caption production is inefficient, procedurally complex, and demands substantial manpower and material resources.
To this end, an embodiment of the invention provides a method for segmenting sentences in audio, including:

S1, identifying a first pause, the pause comprising a silent segment and/or a non-speech segment, and recording the start time and end time of the first pause;

S2, identifying a first sentence, the sentence comprising speech segments, and setting the start time of the first sentence to the end time of the first pause;

S3, identifying a second pause, recording its start time and end time, setting the end time of the first sentence to the start time of the second pause, and thereby completing the segmentation of the first sentence;

S4, judging whether the audio has ended; if not, repeating steps S2-S3; if so, proceeding to step S5;

S5, ending;

wherein the pause is subject to a minimum-length limit, used to ignore brief noises; the sentence is subject to a minimum-length limit, used to filter out short-lived invalid information in the audio, and to a maximum-length limit, used to bound sentence length and improve recognition accuracy.
Preferably, the minimum pause length is 2 audio segments.
Preferably, the minimum sentence length is 3 audio segments.
Preferably, the maximum sentence length is 50 audio segments.
An embodiment of the invention further provides a method for automatically generating captions for a video file, comprising the following steps:

S1, extracting the audio from the video file to be processed;

S2, classifying the audio segments in the audio, the classes being silence, speech and non-speech;

S3, segmenting sentences in the audio with any of the sentence segmentation methods above;

S4, performing speech recognition on each sentence and recording its corresponding text and its start and end times;

S5, generating captions from the text and the start/end time information.
Preferably, in step S1, ffmpeg is used to extract the audio, which is decoded into PCM data by the corresponding decoder.
Preferably, in step S2, Marsyas is used to classify the audio segments.
Preferably, in step S4, HTK is used as the recognition tool to perform speech recognition on the sentences.
An embodiment of the invention further provides a system for segmenting sentences in audio, comprising:

a pause identification module, for identifying pauses comprising silent segments and/or non-speech segments, and recording the start and end time of each pause;

a sentence identification module, for identifying sentences comprising speech segments, setting the start time of a sentence to the end time of the adjacent preceding pause and its end time to the start time of the adjacent following pause; and

an audio-end judgment module, for judging whether the audio has ended;

wherein the pause is subject to a minimum-length limit, used to ignore brief noises; the sentence is subject to a minimum-length limit, used to filter out short-lived invalid information in the audio, and to a maximum-length limit, used to bound sentence length and improve recognition accuracy.
An embodiment of the invention further provides a system for automatically generating captions for a video file, comprising:

an audio extraction module, for extracting the audio from the video file;

an audio segment classification module, for classifying the audio segments in the audio, the classes being silence, speech and non-speech;

a sentence segmentation module, for segmenting sentences in the audio using the sentence segmentation system described above;

a speech recognition module, for performing speech recognition on each sentence and recording its corresponding text and its start and end times; and

a caption generation module, for generating captions from each sentence's text and start/end time information.
By introducing three variables, the minimum pause length, the minimum sentence length and the maximum sentence length, the method and system for segmenting sentences in audio and the method and system for automatically generating captions for a video file of the embodiments of the invention improve the speech recognition rate and make fully automatic caption production possible.
Brief description of the drawings
In order to describe the embodiments of the invention, or the prior art, more clearly, the drawings needed in their description are briefly introduced below. The drawings described here show only some embodiments of the invention; those of ordinary skill in the art can derive further drawings from them without creative effort.
Fig. 1 is a flow chart of the method for segmenting sentences in audio of an embodiment of the invention;
Fig. 2 is a flow chart of the method for automatically generating captions for a video file of an embodiment of the invention;
Fig. 3 is a structural block diagram of the system for segmenting sentences in audio of an embodiment of the invention;
Fig. 4 is a structural block diagram of the system for automatically generating captions for a video file of an embodiment of the invention.
Detailed description of the embodiments
The technical solutions of the invention are described below clearly and completely with reference to the drawings. The embodiments described are only some, not all, of the possible embodiments of the invention; all other embodiments obtained from them by those of ordinary skill in the art without creative effort fall within the scope of protection of the invention.

The technical solutions are now described in detail with reference to the drawings and specific embodiments.
As shown in Fig. 1, an embodiment of the invention provides a method for segmenting sentences in audio, comprising:

S1, identifying a first pause, the pause comprising a silent segment and/or a non-speech segment, and recording its start and end time.

Specifically, the start time of the first pause may be the start of the audio, and its end time the start of the first speech segment.

S2, identifying a first sentence, a sentence comprising speech segments, and setting the start time of the first sentence to the end time of the first pause.

S3, identifying a second pause, recording its start and end time, setting the end time of the first sentence to the start time of the second pause, and completing the segmentation of the first sentence.

S4, judging whether the audio has ended; if not, repeating steps S2-S3; if so, proceeding to step S5.

S5, ending.
Here the pause is subject to a minimum-length limit, used to ignore brief noises; the sentence is subject to a minimum-length limit, used to filter out short-lived invalid information in the audio, and to a maximum-length limit, used to bound sentence length and improve recognition accuracy.
The purpose of sentence segmentation is to obtain short sentences that lend themselves to speech recognition, so accurately detecting the start and end time of each sentence is crucial: only with sufficiently precise endpoint detection can sentence length be controlled as intended. Detecting sentence breakpoints, however, is prone to two extremes. First, many extremely short sentences may appear, some only one or two audio segments long; such sentences contain at most one or two words, and often no valid speech information at all. Second, overly long sentences may appear, some lasting ten or even tens of seconds and spanning several complete semantic units. Both cases severely degrade the recognition rate.
The sentence segmentation method of the embodiment of the invention introduces the three variables mentioned above, the minimum pause length, the minimum sentence length and the maximum sentence length, and thereby effectively avoids both extremes, improving the speech recognition rate.
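As an illustration, the segmentation just described can be sketched as a single pass over per-segment class labels. This is a minimal reconstruction from the description, not the patented implementation: the label names are assumptions, the 0.16 s segment duration is taken from the Marsyas configuration given later, and embedded non-speech segments are counted toward sentence length for simplicity.

```python
SEG_SEC = 0.16       # one audio segment = 5 frames of 32 ms (per the description)
MIN_PAUSE = 2        # non-speech segments required to count as a pause
MIN_SENTENCE = 3     # shortest sentence kept, i.e. anything under 0.48 s is dropped
MAX_SENTENCE = 50    # past this, even a single non-speech segment ends the sentence

def split_sentences(labels):
    """Return (start_sec, end_sec) pairs for the sentences found in a list of
    per-segment labels ('speech', 'silence' or 'nonspeech')."""
    sentences = []
    start = None     # index of the first segment of the sentence being built
    gap = 0          # length of the current run of non-speech segments
    for i, lab in enumerate(labels):
        if lab == "speech":
            if start is None:
                start = i            # sentence starts where the last pause ended
            gap = 0
            continue
        gap += 1                     # silence or non-speech segment
        if start is None:
            continue                 # still inside a leading pause
        pause_start = i - gap + 1
        length = pause_start - start
        # a normal pause needs MIN_PAUSE segments; past MAX_SENTENCE one suffices
        if gap >= MIN_PAUSE or length >= MAX_SENTENCE:
            if length >= MIN_SENTENCE:   # filter coughs, sighs, breaths
                sentences.append((start * SEG_SEC, pause_start * SEG_SEC))
            start, gap = None, 0
    if start is not None and len(labels) - start >= MIN_SENTENCE:
        sentences.append((start * SEG_SEC, len(labels) * SEG_SEC))
    return sentences
```

Note that a single non-speech segment inside continuous speech (a gap of 1, below the 50-segment limit) is absorbed into the sentence, which is exactly the sentence-integrity behaviour the 2-segment minimum pause is meant to provide.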
Preferably, the minimum pause length is 2 audio segments.

As noted above, the minimum-length limit on pauses serves to ignore brief noises, such as a speaker's momentary intake of breath, and so to protect the integrity of a sentence. Through repeated study and experiment, the applicant found that setting the minimum pause length to 2 audio segments ensures that a single non-speech segment inside a continuous run of speech is not treated as a pause, preserving the integrity of the sentence.
Preferably, the minimum sentence length is 3 audio segments.

Specifically, the minimum sentence length is the number of speech segments a sentence must contain. Its purpose is to filter out short-lived invalid information in the audio, such as a speaker's slight cough. The study found that setting the minimum sentence length to 3 audio segments, i.e. ignoring speech units shorter than 0.48 seconds in total, effectively filters out short-lived invalid information such as coughs, sighs and breaths.
Preferably, the maximum sentence length is 50 audio segments.

An overly long sentence increases the difficulty of speech recognition and lowers the recognition rate. Therefore, once the number of speech segments in a sentence reaches a certain limit, measures should be taken to end the sentence as soon as possible. By setting the maximum sentence length to 50 audio segments, after which even a single non-speech segment is treated as a pause, the invention effectively bounds sentence length and improves the recognition accuracy of sentences.
As shown in Fig. 2, an embodiment of the invention further provides a method for automatically generating captions for a video file, comprising the following steps:

S1, extracting the audio from the video file to be processed.

S2, classifying the audio segments in the audio, the classes being silence, speech and non-speech.

S3, segmenting sentences in the audio with any of the sentence segmentation methods above.

S4, performing speech recognition on each sentence and recording its corresponding text and its start and end times.

S5, generating captions from the text and the start/end time information.
Specifically, the captions are srt text subtitles. Many subtitle formats exist; the most popular fall into two classes, graphical and text. Compared with graphical subtitles, text subtitles are small, simply structured, and easy to create and modify. Among text formats, srt is the most widely used and is compatible with the common media players.

Preferably, to optimize the display and make the captions easier to watch, longer sentences in the recognition result are split across multiple lines.
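A minimal sketch of step S5 under these conventions: writing the recognized sentences as an srt file, with srt's HH:MM:SS,mmm timestamps, and wrapping long recognition results onto multiple lines as suggested above. The 40-character line width is an assumption, chosen only for illustration.

```python
import textwrap

def srt_time(t):
    """Format a time in seconds as the srt timestamp HH:MM:SS,mmm."""
    ms = int(round(t * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(entries, max_chars=40):
    """entries: (start_sec, end_sec, text) triples from speech recognition.
    Longer sentences are wrapped so no caption line exceeds max_chars."""
    blocks = []
    for n, (start, end, text) in enumerate(entries, 1):
        body = "\n".join(textwrap.wrap(text, max_chars))
        blocks.append(f"{n}\n{srt_time(start)} --> {srt_time(end)}\n{body}\n")
    return "\n".join(blocks)
```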
Preferably, in step S1, ffmpeg is used to extract the audio, which is decoded into PCM data by the corresponding decoder.
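For example, the extraction could be driven through ffmpeg's command-line interface. The flags below are standard ffmpeg options, but the 16 kHz mono, 16-bit output format is an assumption, since the description only specifies PCM data.

```python
import subprocess

def pcm_command(video_path, wav_path, rate=16000):
    """Build an ffmpeg command that extracts the audio track as PCM."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,        # input video file
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # decode to signed 16-bit little-endian PCM
        "-ar", str(rate),        # resample (assumed rate)
        "-ac", "1",              # downmix to mono
        wav_path,
    ]

def extract_audio(video_path, wav_path, rate=16000):
    """Run the extraction; requires ffmpeg on the PATH."""
    subprocess.run(pcm_command(video_path, wav_path, rate), check=True)
```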
Preferably, in step S2, Marsyas is used to classify the audio segments.

Specifically, through the interface provided by Marsyas, the frame length is set to 32 ms and the segment length to 0.16 s, i.e. each audio segment comprises 5 audio frames.
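Under this configuration the segment-count thresholds given earlier translate directly into durations; a quick check of the arithmetic:

```python
FRAME_MS = 32                                 # Marsyas frame length from the description
FRAMES_PER_SEG = 5                            # frames per audio segment
SEG_SEC = FRAME_MS * FRAMES_PER_SEG / 1000    # 0.16 s per audio segment

MIN_PAUSE_SEC = 2 * SEG_SEC       # 0.32 s of non-speech before a gap counts as a pause
MIN_SENTENCE_SEC = 3 * SEG_SEC    # 0.48 s, matching the filter threshold stated above
MAX_SENTENCE_SEC = 50 * SEG_SEC   # 8 s, after which any gap ends the sentence
```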
Preferably, in step S4, HTK is used as the recognition tool to perform speech recognition on the sentences.

Specifically, HTK is used as a large-vocabulary continuous speech recognition tool to recognize each sentence, generating its text and storing the recognized text result of each sentence together with the corresponding start and end time information.
As shown in Fig. 3, an embodiment of the invention further provides a system 1 for segmenting sentences in audio, comprising:

a pause identification module 2, for identifying pauses comprising silent segments and/or non-speech segments, and recording the start and end time of each pause;

a sentence identification module 3, for identifying sentences comprising speech segments, setting the start time of a sentence to the end time of the adjacent preceding pause and its end time to the start time of the adjacent following pause; and

an audio-end judgment module 4, for judging whether the audio has ended;

wherein the pause is subject to a minimum-length limit, used to ignore brief noises; the sentence is subject to a minimum-length limit, used to filter out short-lived invalid information in the audio, and to a maximum-length limit, used to bound sentence length and improve recognition accuracy.
As shown in Fig. 4, an embodiment of the invention further provides a system 11 for automatically generating captions for a video file, comprising:

an audio extraction module 12, for extracting the audio from the video file;

an audio segment classification module 13, for classifying the audio segments in the audio, the classes being silence, speech and non-speech;

a sentence segmentation module 14, for segmenting sentences in the audio using the sentence segmentation system described above;

a speech recognition module 15, for performing speech recognition on each sentence and recording its corresponding text and its start and end times; and

a caption generation module 16, for generating captions from each sentence's text and start/end time information.
Obviously, the embodiments above are given only as clear illustrations and do not limit the possible implementations. Those of ordinary skill in the art may make other changes in different forms on this basis; an exhaustive enumeration of all implementations is neither necessary nor possible here, and the obvious changes and variations that follow from the above remain within the scope of protection of the invention.
Claims (10)
1. A method for segmenting sentences in audio, characterized in that it comprises the following steps:
S1, identifying a first pause, the pause comprising a silent segment and/or a non-speech segment, and recording the start time and end time of the first pause;
S2, identifying a first sentence, the sentence comprising speech segments, and setting the start time of the first sentence to the end time of the first pause;
S3, identifying a second pause, recording its start time and end time, setting the end time of the first sentence to the start time of the second pause, and completing the segmentation of the first sentence;
S4, judging whether the audio has ended; if not, repeating steps S2-S3; if so, proceeding to step S5;
S5, ending;
wherein the pause is subject to a minimum-length limit, used to ignore brief noises; the sentence is subject to a minimum-length limit, used to filter out short-lived invalid information in the audio, and to a maximum-length limit, used to bound sentence length and improve recognition accuracy.
2. The method according to claim 1, characterized in that the minimum pause length is 2 audio segments.
3. The method according to claim 1 or 2, characterized in that the minimum sentence length is 3 audio segments.
4. The method according to any one of claims 1-3, characterized in that the maximum sentence length is 50 audio segments.
5. A method for automatically generating captions for a video file, characterized in that it comprises the following steps:
S1, extracting the audio from the video file to be processed;
S2, classifying the audio segments in the audio, the classes being silence, speech and non-speech;
S3, segmenting sentences in the audio using the method for segmenting sentences in audio according to any one of claims 1-4;
S4, performing speech recognition on each sentence and recording its corresponding text and its start and end times;
S5, generating captions from the text and the start/end time information.
6. The method according to claim 5, characterized in that in step S1 ffmpeg is used to extract the audio, which is decoded into PCM data by the corresponding decoder.
7. The method according to claim 5 or 6, characterized in that in step S2 Marsyas is used to classify the audio segments.
8. The method according to any one of claims 5-7, characterized in that in step S4 HTK is used as the recognition tool to perform speech recognition on the sentences.
9. A system for segmenting sentences in audio, characterized in that it comprises:
a pause identification module, for identifying pauses comprising silent segments and/or non-speech segments, and recording the start and end time of each pause;
a sentence identification module, for identifying sentences comprising speech segments, setting the start time of a sentence to the end time of the adjacent preceding pause and its end time to the start time of the adjacent following pause; and
an audio-end judgment module, for judging whether the audio has ended;
wherein the pause is subject to a minimum-length limit, used to ignore brief noises; the sentence is subject to a minimum-length limit, used to filter out short-lived invalid information in the audio, and to a maximum-length limit, used to bound sentence length and improve recognition accuracy.
10. A system for automatically generating captions for a video file, characterized in that it comprises:
an audio extraction module, for extracting the audio from the video file;
an audio segment classification module, for classifying the audio segments in the audio, the classes being silence, speech and non-speech;
a sentence segmentation module, for segmenting sentences in the audio using the system for segmenting sentences in audio according to claim 9;
a speech recognition module, for performing speech recognition on each sentence and recording its corresponding text and its start and end times; and
a caption generation module, for generating captions from each sentence's text and start/end time information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610178500.3A CN105845129A (en) | 2016-03-25 | 2016-03-25 | Method and system for dividing sentences in audio and automatic caption generation method and system for video files |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610178500.3A CN105845129A (en) | 2016-03-25 | 2016-03-25 | Method and system for dividing sentences in audio and automatic caption generation method and system for video files |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105845129A true CN105845129A (en) | 2016-08-10 |
Family
ID=56583579
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610178500.3A Pending CN105845129A (en) | 2016-03-25 | 2016-03-25 | Method and system for dividing sentences in audio and automatic caption generation method and system for video files |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105845129A (en) |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106331844A (en) * | 2016-08-17 | 2017-01-11 | 北京金山安全软件有限公司 | Method and device for generating subtitles of media file and electronic equipment |
CN106504773A (en) * | 2016-11-08 | 2017-03-15 | 上海贝生医疗设备有限公司 | A kind of wearable device and voice and activities monitoring system |
CN106504754A (en) * | 2016-09-29 | 2017-03-15 | 浙江大学 | A kind of real-time method for generating captions according to audio output |
CN106506335A (en) * | 2016-11-10 | 2017-03-15 | 北京小米移动软件有限公司 | The method and device of sharing video frequency file |
CN106528715A (en) * | 2016-10-27 | 2017-03-22 | 广东小天才科技有限公司 | Method and device for checking audio content |
CN106782506A (en) * | 2016-11-23 | 2017-05-31 | 语联网(武汉)信息技术有限公司 | A kind of method that recorded audio is divided into section |
CN106792145A (en) * | 2017-02-22 | 2017-05-31 | 杭州当虹科技有限公司 | A kind of method and apparatus of the automatic overlapping text of audio frequency and video |
CN107291676A (en) * | 2017-06-20 | 2017-10-24 | 广东小天才科技有限公司 | Block method, terminal device and the computer-readable storage medium of voice document |
CN107766325A (en) * | 2017-09-27 | 2018-03-06 | 百度在线网络技术(北京)有限公司 | Text joining method and its device |
CN107919130A (en) * | 2017-11-06 | 2018-04-17 | 百度在线网络技术(北京)有限公司 | Method of speech processing and device based on high in the clouds |
CN108062955A (en) * | 2017-12-12 | 2018-05-22 | 深圳证券信息有限公司 | A kind of intelligence report-generating method, system and equipment |
CN108597521A (en) * | 2018-05-04 | 2018-09-28 | 徐涌 | Audio role divides interactive system, method, terminal and the medium with identification word |
CN109005445A (en) * | 2018-06-26 | 2018-12-14 | 卫军征 | Multi-medium play method, system, storage medium and playback equipment |
CN109166570A (en) * | 2018-07-24 | 2019-01-08 | 百度在线网络技术(北京)有限公司 | A kind of method, apparatus of phonetic segmentation, equipment and computer storage medium |
CN109379641A (en) * | 2018-11-14 | 2019-02-22 | 腾讯科技(深圳)有限公司 | A kind of method for generating captions and device |
CN109389999A (en) * | 2018-09-28 | 2019-02-26 | 北京亿幕信息技术有限公司 | A kind of high performance audio-video is made pauses in reading unpunctuated ancient writings method and system automatically |
CN110223697A (en) * | 2019-06-13 | 2019-09-10 | 苏州思必驰信息科技有限公司 | Interactive method and system |
CN110246500A (en) * | 2019-07-12 | 2019-09-17 | 携程旅游信息技术(上海)有限公司 | Audio recognition method and system based on recording file |
CN110265027A (en) * | 2019-06-19 | 2019-09-20 | 安徽声讯信息技术有限公司 | A kind of audio frequency transmission method for meeting shorthand system |
CN110265026A (en) * | 2019-06-19 | 2019-09-20 | 安徽声讯信息技术有限公司 | A kind of meeting shorthand system and meeting stenography method |
CN110263313A (en) * | 2019-06-19 | 2019-09-20 | 安徽声讯信息技术有限公司 | A kind of man-machine coordination edit methods for meeting shorthand |
CN110335612A (en) * | 2019-07-11 | 2019-10-15 | 招商局金融科技有限公司 | Minutes generation method, device and storage medium based on speech recognition |
CN110473519A (en) * | 2018-05-11 | 2019-11-19 | 北京国双科技有限公司 | A kind of method of speech processing and device |
CN110942764A (en) * | 2019-11-15 | 2020-03-31 | 北京达佳互联信息技术有限公司 | Stream type voice recognition method |
CN111970311A (en) * | 2020-10-23 | 2020-11-20 | 北京世纪好未来教育科技有限公司 | Session segmentation method, electronic device and computer readable medium |
CN111986655A (en) * | 2020-08-18 | 2020-11-24 | 北京字节跳动网络技术有限公司 | Audio content identification method, device, equipment and computer readable medium |
CN112287914A (en) * | 2020-12-27 | 2021-01-29 | 平安科技(深圳)有限公司 | PPT video segment extraction method, device, equipment and medium |
CN112820293A (en) * | 2020-12-31 | 2021-05-18 | 讯飞智元信息科技有限公司 | Voice recognition method and related device |
CN113207032A (en) * | 2021-04-29 | 2021-08-03 | 读书郎教育科技有限公司 | System and method for increasing subtitles by recording videos in intelligent classroom |
CN113225618A (en) * | 2021-05-06 | 2021-08-06 | 阿里巴巴新加坡控股有限公司 | Video editing method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105159870A (en) * | 2015-06-26 | 2015-12-16 | 徐信 | Processing system for precisely completing continuous natural speech textualization and method for precisely completing continuous natural speech textualization |
CN105280206A (en) * | 2014-06-23 | 2016-01-27 | 广东小天才科技有限公司 | Audio playing method and device |
- 2016-03-25: Application CN201610178500.3A filed; publication CN105845129A (en), status Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105280206A (en) * | 2014-06-23 | 2016-01-27 | 广东小天才科技有限公司 | Audio playing method and device |
CN105159870A (en) * | 2015-06-26 | 2015-12-16 | 徐信 | Processing system for precisely completing continuous natural speech textualization and method for precisely completing continuous natural speech textualization |
Cited By (46)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106331844A (en) * | 2016-08-17 | 2017-01-11 | 北京金山安全软件有限公司 | Method and device for generating subtitles of media file and electronic equipment |
CN106504754A (en) * | 2016-09-29 | 2017-03-15 | 浙江大学 | A kind of real-time method for generating captions according to audio output |
CN106528715A (en) * | 2016-10-27 | 2017-03-22 | 广东小天才科技有限公司 | Method and device for checking audio content |
CN106504773A (en) * | 2016-11-08 | 2017-03-15 | 上海贝生医疗设备有限公司 | Wearable device and voice and activity monitoring system |
CN106506335B (en) * | 2016-11-10 | 2019-08-30 | 北京小米移动软件有限公司 | Method and device for sharing video files |
CN106506335A (en) * | 2016-11-10 | 2017-03-15 | 北京小米移动软件有限公司 | Method and device for sharing video files |
CN106782506A (en) * | 2016-11-23 | 2017-05-31 | 语联网(武汉)信息技术有限公司 | Method for dividing recorded audio into segments |
CN106792145A (en) * | 2017-02-22 | 2017-05-31 | 杭州当虹科技有限公司 | Method and apparatus for automatically subtitling audio and video |
CN107291676A (en) * | 2017-06-20 | 2017-10-24 | 广东小天才科技有限公司 | Voice file segmentation method, terminal device and computer-readable storage medium |
CN107766325A (en) * | 2017-09-27 | 2018-03-06 | 百度在线网络技术(北京)有限公司 | Text splicing method and device |
US11024332B2 (en) | 2017-11-06 | 2021-06-01 | Baidu Online Network Technology (Beijing) Co., Ltd. | Cloud-based speech processing method and apparatus |
CN107919130A (en) * | 2017-11-06 | 2018-04-17 | 百度在线网络技术(北京)有限公司 | Cloud-based speech processing method and device |
CN107919130B (en) * | 2017-11-06 | 2021-12-17 | 百度在线网络技术(北京)有限公司 | Cloud-based voice processing method and device |
CN108062955A (en) * | 2017-12-12 | 2018-05-22 | 深圳证券信息有限公司 | Intelligent report generation method, system and device |
CN108597521A (en) * | 2018-05-04 | 2018-09-28 | 徐涌 | Interactive system, method, terminal and medium for audio role division and text recognition |
CN110473519B (en) * | 2018-05-11 | 2022-05-27 | 北京国双科技有限公司 | Voice processing method and device |
CN110473519A (en) * | 2018-05-11 | 2019-11-19 | 北京国双科技有限公司 | Speech processing method and device |
CN109005445A (en) * | 2018-06-26 | 2018-12-14 | 卫军征 | Multimedia playing method, system, storage medium and playback device |
CN109166570A (en) * | 2018-07-24 | 2019-01-08 | 百度在线网络技术(北京)有限公司 | Speech segmentation method, apparatus, device and computer storage medium |
CN109389999A (en) * | 2018-09-28 | 2019-02-26 | 北京亿幕信息技术有限公司 | High-performance automatic audio/video sentence segmentation method and system |
CN110418208A (en) * | 2018-11-14 | 2019-11-05 | 腾讯科技(深圳)有限公司 | Subtitle determination method and apparatus based on artificial intelligence |
CN110381388B (en) * | 2018-11-14 | 2021-04-13 | 腾讯科技(深圳)有限公司 | Subtitle generating method and device based on artificial intelligence |
CN110381389A (en) * | 2018-11-14 | 2019-10-25 | 腾讯科技(深圳)有限公司 | Subtitle generating method and device based on artificial intelligence |
CN110381388A (en) * | 2018-11-14 | 2019-10-25 | 腾讯科技(深圳)有限公司 | Subtitle generating method and device based on artificial intelligence |
CN110381389B (en) * | 2018-11-14 | 2022-02-25 | 腾讯科技(深圳)有限公司 | Subtitle generating method and device based on artificial intelligence |
CN109379641A (en) * | 2018-11-14 | 2019-02-22 | 腾讯科技(深圳)有限公司 | Subtitle generating method and device |
CN110223697A (en) * | 2019-06-13 | 2019-09-10 | 苏州思必驰信息科技有限公司 | Interactive method and system |
CN110263313A (en) * | 2019-06-19 | 2019-09-20 | 安徽声讯信息技术有限公司 | Man-machine collaborative editing method for conference shorthand |
CN110265026A (en) * | 2019-06-19 | 2019-09-20 | 安徽声讯信息技术有限公司 | Conference shorthand system and conference shorthand method |
CN110265027A (en) * | 2019-06-19 | 2019-09-20 | 安徽声讯信息技术有限公司 | Audio transmission method for a conference shorthand system |
CN110263313B (en) * | 2019-06-19 | 2021-08-24 | 安徽声讯信息技术有限公司 | Man-machine collaborative editing method for conference shorthand |
CN110265026B (en) * | 2019-06-19 | 2021-07-27 | 安徽声讯信息技术有限公司 | Conference shorthand system and conference shorthand method |
CN110335612A (en) * | 2019-07-11 | 2019-10-15 | 招商局金融科技有限公司 | Meeting minutes generation method, device and storage medium based on speech recognition |
CN110246500A (en) * | 2019-07-12 | 2019-09-17 | 携程旅游信息技术(上海)有限公司 | Speech recognition method and system based on recorded files |
CN110942764B (en) * | 2019-11-15 | 2022-04-22 | 北京达佳互联信息技术有限公司 | Streaming speech recognition method |
CN110942764A (en) * | 2019-11-15 | 2020-03-31 | 北京达佳互联信息技术有限公司 | Streaming speech recognition method |
WO2022037419A1 (en) * | 2020-08-18 | 2022-02-24 | 北京字节跳动网络技术有限公司 | Audio content recognition method and apparatus, and device and computer-readable medium |
US11783808B2 (en) | 2020-08-18 | 2023-10-10 | Beijing Bytedance Network Technology Co., Ltd. | Audio content recognition method and apparatus, and device and computer-readable medium |
CN111986655B (en) * | 2020-08-18 | 2022-04-01 | 北京字节跳动网络技术有限公司 | Audio content identification method, device, equipment and computer readable medium |
CN111986655A (en) * | 2020-08-18 | 2020-11-24 | 北京字节跳动网络技术有限公司 | Audio content identification method, device, equipment and computer readable medium |
CN111970311A (en) * | 2020-10-23 | 2020-11-20 | 北京世纪好未来教育科技有限公司 | Session segmentation method, electronic device and computer readable medium |
CN112287914A (en) * | 2020-12-27 | 2021-01-29 | 平安科技(深圳)有限公司 | PPT video segment extraction method, device, equipment and medium |
CN112287914B (en) * | 2020-12-27 | 2021-04-02 | 平安科技(深圳)有限公司 | PPT video segment extraction method, device, equipment and medium |
CN112820293A (en) * | 2020-12-31 | 2021-05-18 | 讯飞智元信息科技有限公司 | Voice recognition method and related device |
CN113207032A (en) * | 2021-04-29 | 2021-08-03 | 读书郎教育科技有限公司 | System and method for increasing subtitles by recording videos in intelligent classroom |
CN113225618A (en) * | 2021-05-06 | 2021-08-06 | 阿里巴巴新加坡控股有限公司 | Video editing method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105845129A (en) | Method and system for dividing sentences in audio and automatic caption generation method and system for video files | |
CN103035247B (en) | Method and device for operating on audio/video files based on voiceprint | |
CN105245917B (en) | System and method for multimedia voice subtitle generation | |
CN107039034B (en) | Rhythm prediction method and system | |
KR100828166B1 (en) | Method of extracting metadata from results of speech recognition and character recognition in video, method of searching video using metadata, and record medium thereof | |
Morgan et al. | The meeting project at ICSI | |
CN108962227B (en) | Voice starting point and end point detection method and device, computer equipment and storage medium | |
US9774747B2 (en) | Transcription system | |
JP4600828B2 (en) | Document association apparatus and document association method | |
US5649060A (en) | Automatic indexing and aligning of audio and text using speech recognition | |
CN101751919B (en) | Automatic stress detection method for spoken Chinese | |
Brognaux et al. | HMM-based speech segmentation: Improvements of fully automatic approaches | |
CN107305541A (en) | Speech recognition text segmentation method and device | |
US20130035936A1 (en) | Language transcription | |
CN106878805A (en) | Mixed-language subtitle file generation method and device | |
CN111785275A (en) | Voice recognition method and device | |
CN106328146A (en) | Video subtitle generation method and apparatus | |
CN110691258A (en) | Program material production method and device, computer storage medium and electronic device | |
CN106373598A (en) | Audio replay control method and apparatus | |
Haubold et al. | Alignment of speech to highly imperfect text transcriptions | |
CN110740275A (en) | Nonlinear editing system | |
Kurtic et al. | A Corpus of Spontaneous Multi-party Conversation in Bosnian Serbo-Croatian and British English. | |
Yang et al. | An automated analysis and indexing framework for lecture video portal | |
CN113782026A (en) | Information processing method, device, medium and equipment | |
Álvarez et al. | APyCA: Towards the automatic subtitling of television content in Spanish |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20160810 |