CN105912615A - Human voice content index based audio and video file management method - Google Patents
- Publication number
- CN105912615A (application CN201610212603.7A)
- Authority
- CN
- China
- Prior art keywords
- file
- video
- audio
- audio file
- flesh
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7844—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/685—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transforming into visible information
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Library & Information Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method for managing audio and video files based on an index of their human voice content. The method uses speech recognition to recognize the human voice in an audio or video file and converts the voice content (conversational speech) into text information. Because conversational speech is unique to the setting in which it occurs, the text information serves as important data for identifying the essential content of the audio or video file. Using the text information as the file index marker makes it possible to detect audio or video files with the same essential content efficiently and to quickly distinguish files whose essential content differs.
Description
Technical field
The invention belongs to the field of audio and video file storage and management, and specifically relates to an audio and video file management method based on human voice content indexing.
Background
Audio and video files exist in many different storage formats. The basic principle is to sample real-world audio and video signals at fixed time intervals and to store the samples at a given resolution. Audio file formats fall into two main classes, lossless and lossy. Lossy formats rely on psychoacoustic models to remove sounds that humans find hard or impossible to hear. A video file generally stores the audio signal and the visual signal together in a single file so that video and audio can be played back at the same time.
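The sampling principle described above can be illustrated with a minimal sketch. The 440 Hz tone, the 8 kHz sample rate, the 16-bit depth and the function name `sample_and_quantize` are illustrative assumptions and do not come from the patent.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, duration_s, bit_depth):
    """Sample a sine tone at fixed intervals and quantize each sample to a signed integer."""
    max_level = 2 ** (bit_depth - 1) - 1          # largest positive level, e.g. 32767 for 16 bits
    n_samples = round(sample_rate_hz * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                    # fixed sampling interval of 1 / sample_rate
        value = math.sin(2 * math.pi * freq_hz * t)
        samples.append(round(value * max_level))  # store at the chosen resolution
    return samples

# Assumed example values: a 440 Hz tone, 8 kHz sampling, 10 ms, 16-bit resolution.
pcm = sample_and_quantize(freq_hz=440, sample_rate_hz=8000, duration_s=0.01, bit_depth=16)
print(len(pcm), pcm[:5])
```

Raising the sample rate or the bit depth increases the amount of stored data proportionally, which is why uncompressed captures grow large.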
Depending on capture parameters such as sample rate and resolution, the originally captured audio and video files are often very large, which hinders the management and distribution of their content. Various audio and video encoders and decoders (codecs) have therefore been developed to compress and decompress audio and video signals. Audio files are generally compressed so that the audio content can be transmitted and distributed over the Internet. A video file format is typically a general-purpose container that can hold video data, audio data and other information (for example, subtitles, images or viewing-angle information). A video codec encodes and decodes video files of a specific format, which makes it possible to produce and play them.
Because of how audio and video codecs work, the data actually stored in a file depends on the specific encoding algorithm: when the same original is encoded by different encoders, the resulting files differ entirely in their data. Likewise, with the same codec, a minor change to the original data, such as a small change in audio or video length (for example, 0.1 second less content), produces encoded data that is also entirely different.
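A minimal sketch of why byte-level comparison fails for such files follows. The byte strings stand in for two encodings of the same recording and the transcripts stand in for speech recognizer output. Both are hypothetical placeholders, not real media data.

```python
import hashlib

def byte_fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical stand-ins for two encodings of the same recording.
encoding_a = bytes(range(256)) * 100
encoding_b = encoding_a[:-16]        # slightly shorter, as if a fraction of a second were trimmed

# Hypothetical transcripts that a speech recognizer might return for both files.
transcript_a = "where were you last night"
transcript_b = "where were you last night"

same_bytes = byte_fingerprint(encoding_a) == byte_fingerprint(encoding_b)
same_text = transcript_a == transcript_b
print("byte fingerprints match:", same_bytes)
print("transcripts match:", same_text)
```

The byte fingerprints differ even though the spoken content is the same, while the text comparison still succeeds.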
Consequently, the same audio or video content often exists as multiple audio and video files. A film, for example, often exists as several files at different resolutions, as well as files compressed by different subtitle groups, and so on. For such multimedia files, whose actual content is identical but whose stored data is completely different, detecting that their essential content is the same is a key technology for managing and storing them. This patent proposes an audio and video file management method based on human voice content indexing for automatically detecting the essential content of audio and video files.
Summary of the invention
The present invention proposes a video and audio file management method based on human voice content indexing. The method uses speech recognition to recognize the human voice in audio and video files and converts the voice content (for example, conversational speech) into text information. Exploiting the fact that conversation content is unique to each setting, it uses this text information as important data for identifying the essential content of a video or audio file, so that operations such as querying and comparing multimedia content become operations such as querying and comparing text. If the text information of two video files or two audio files is identical, the two files are marked as having the same essential content; otherwise, they are marked as having different essential content. By using the text information as the file index marker information in a video or audio file library, files with the same essential content can be detected efficiently and files with different essential content can be distinguished quickly.
The video and audio file management method based on human voice content indexing comprises the following steps:
(1) Use speech recognition to convert the human voice content in video and audio files into text information;
(2) Use the text information obtained in (1) to identify the essential content of the video or audio file: if the text information of two video files or two audio files is identical, mark the two files as having the same essential content; otherwise, mark them as having different essential content. The specific sub-steps are:
(2.1) In the video or audio file library, use the text information obtained in (1) as the file index marker information of the video or audio database;
(2.2) For a given video or audio file, use the text information obtained in (1) to search the video or audio file library;
(2.3) If a file with the same file index marker information (that is, the text information obtained in (1)) exists, mark the file of (2.2) and the files retrieved from the library as files having the same essential content; otherwise, mark the file of (2.2) as a file with unique essential content;
(2.4) Add the file referred to in (2.2) and its file index marker information to the index of the video or audio file library.
(3) For the video or audio files marked in (2.3) as having the same essential content, apply a data distribution strategy to store and manage them in a distributed manner across multiple regions or within a single region (management includes file operations such as reading, writing and deleting). A region can be a hardware or software entity such as a server, a rack or a data center.
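Sub-steps (2.1) to (2.4) can be sketched as a small in-memory index. This is only an illustration under stated assumptions: the class name `TranscriptIndex`, the method `register` and the file names are hypothetical, and the transcripts are assumed to have already been produced by some speech recognizer in step (1).

```python
from collections import defaultdict

class TranscriptIndex:
    """In-memory file library index keyed by transcript text (sub-step 2.1)."""

    def __init__(self):
        self._files_by_text = defaultdict(list)

    def register(self, file_id, transcript):
        """Look up, mark and index one file.

        Returns the files already in the library with the same transcript
        (sub-steps 2.2 and 2.3). An empty list means the file has unique
        essential content. The file is then added to the index (sub-step 2.4).
        """
        matches = list(self._files_by_text[transcript])
        self._files_by_text[transcript].append(file_id)
        return matches

index = TranscriptIndex()
first = index.register("film_720p.mp4", "where were you last night")
second = index.register("film_1080p.mkv", "where were you last night")
third = index.register("lecture.mp3", "good morning everyone")
print(first, second, third)
```

A dictionary keyed by the text turns the comparison of two media files into a single exact-match lookup, which is the efficiency argument the method relies on.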
In the above method, step (2) may use either the original text information obtained in step (1) or information derived by processing that original text, provided the processed information corresponds one-to-one with the original text information and can uniquely identify it.
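One possible form of such processed information is a fixed-length digest of the text. The sketch below assumes SHA-256, which the patent does not name. A cryptographic hash is one-to-one only in practice, since collisions are possible in principle, so a strictly reversible transform such as lossless compression would be needed for an exact one-to-one guarantee.

```python
import hashlib

def index_key(transcript: str) -> str:
    """Derive a compact index marker from the transcript text.

    Assumption: SHA-256 over the UTF-8 bytes. Identical text always gives
    the same key; different text gives a different key except for
    collisions, which are negligible in practice but not impossible.
    """
    return hashlib.sha256(transcript.encode("utf-8")).hexdigest()

key_a = index_key("where were you last night")
key_b = index_key("where were you last night")
key_c = index_key("good morning everyone")
print(key_a == key_b, key_a == key_c)
```

A fixed-length key keeps the library index small even when transcripts are long, at the cost of no longer being able to read the text back from the index.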
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall flow of the invention.
Detailed description of the invention
The entity that carries out the invention is a video or audio storage server.
Fig. 1 is a schematic diagram of the overall flow of the invention. The specific steps are:
(1) Read the video or audio file;
(2) Use speech recognition to convert the human voice content in the video or audio file into text information;
(3) Use the text information obtained in (2), or information derived by processing it, as the index marker information of the file;
(4) Use the index marker information obtained in (3) to search the video or audio file library for files with the same index marker information. Every file in the library uses text information (the text obtained by converting the human voice content in the file) as its file index marker information;
(5) If a file with the same index marker information exists, mark the file referred to in (3) and the files retrieved from the library as video or audio files with the same essential content; otherwise, mark the file referred to in (3) as a video or audio file with unique essential content;
(6) Add the file referred to in (3) and its file index marker information to the library index referred to in (4);
(7) For the video or audio files marked in (5) as having the same essential content, apply a data distribution strategy to store and manage them in a distributed manner across multiple regions or within a single region (management includes file operations such as reading, writing and deleting). A region can be a hardware or software entity such as a server, a rack or a data center.
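The patent leaves the data distribution strategy of step (7) open. The sketch below shows one hypothetical strategy, assuming files have already been grouped by index marker: same-content files are either spread across regions or kept in one region. The function name `place_groups`, the region names and the file names are all illustrative.

```python
import hashlib

def place_groups(groups, regions, spread=True):
    """Assign every file to a region.

    groups  -- dict mapping an index marker (transcript text) to the list
               of files that share it
    regions -- list of region names (servers, racks, data centers, ...)
    spread  -- True: distribute same-content files across regions;
               False: keep each same-content group in a single region
    """
    placement = {}
    for marker, files in groups.items():
        digest = hashlib.sha256(marker.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(regions)   # stable starting region per group
        for i, path in enumerate(files):
            offset = i if spread else 0
            placement[path] = regions[(start + offset) % len(regions)]
    return placement

groups = {
    "where were you last night": ["film_720p.mp4", "film_1080p.mkv"],
    "good morning everyone": ["lecture.mp3"],
}
regions = ["dc-east", "dc-west", "dc-north"]
spread_out = place_groups(groups, regions, spread=True)
together = place_groups(groups, regions, spread=False)
print(spread_out)
print(together)
```

Spreading copies favors availability and read locality, while co-locating them makes it easier to deduplicate or delete a whole group in one place. Which trade-off applies depends on the deployment.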
Claims (2)
1. A video and audio file management method based on human voice content indexing, comprising the following steps:
(1) Use speech recognition to convert the human voice content in video and audio files into text information;
(2) Use the text information obtained in (1) to identify the essential content of the video or audio file: if the text information of two video files or two audio files is identical, mark the two files as having the same essential content; otherwise, mark them as having different essential content. The specific sub-steps are:
(2.1) In the video or audio file library, use the text information obtained in (1) as the file index marker information of the video or audio database;
(2.2) For a given video or audio file, use the text information obtained in (1) to search the video or audio file library;
(2.3) If a file with the same file index marker information (that is, the text information obtained in (1)) exists, mark the file of (2.2) and the files retrieved from the library as files having the same essential content; otherwise, mark the file of (2.2) as a file with unique essential content;
(2.4) Add the file referred to in (2.2) and its file index marker information to the index of the video or audio file library.
(3) For the video or audio files marked in (2.3) as having the same essential content, apply a data distribution strategy to store and manage them in a distributed manner across multiple regions or within a single region (management includes file operations such as reading, writing and deleting). A region can be a hardware or software entity such as a server, a rack or a data center.
2. The method of claim 1, wherein step (2) may use either the original text information obtained in step (1) or information derived by processing that original text, provided the processed information corresponds one-to-one with the original text information and can uniquely identify it.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610212603.7A CN105912615A (en) | 2016-04-05 | 2016-04-05 | Human voice content index based audio and video file management method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610212603.7A CN105912615A (en) | 2016-04-05 | 2016-04-05 | Human voice content index based audio and video file management method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105912615A true CN105912615A (en) | 2016-08-31 |
Family
ID=56744612
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610212603.7A Pending CN105912615A (en) | 2016-04-05 | 2016-04-05 | Human voice content index based audio and video file management method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105912615A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107484016A (en) * | 2017-09-05 | 2017-12-15 | 深圳Tcl新技术有限公司 | Video dubs switching method, television set and computer-readable recording medium |
CN109582823A (en) * | 2018-11-21 | 2019-04-05 | 平安科技(深圳)有限公司 | Video information chain type storage method, device, computer equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1932819A (en) * | 2006-09-25 | 2007-03-21 | 北京搜狗科技发展有限公司 | Clustering method, searching method and system for interconnection network audio file |
CN102037465A (en) * | 2008-04-14 | 2011-04-27 | 阿尔卡特朗讯 | Method for aggregating web feed minimizing redundancies |
CN102349087A (en) * | 2009-03-12 | 2012-02-08 | 谷歌公司 | Automatically providing content associated with captured information, such as information captured in real-time |
US20140257995A1 (en) * | 2011-11-23 | 2014-09-11 | Huawei Technologies Co., Ltd. | Method, device, and system for playing video advertisement |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108833973B (en) | Video feature extraction method and device and computer equipment | |
US6915012B2 (en) | System and method of storing data in JPEG files | |
CN106878632B (en) | Video data processing method and device | |
CN101821734B (en) | Detection and classification of matches between time-based media | |
CN103761261B (en) | A kind of media search method and device based on speech recognition | |
US8586847B2 (en) | Musical fingerprinting based on onset intervals | |
US10529340B2 (en) | Voiceprint registration method, server and storage medium | |
CN101673266B (en) | Method for searching audio and video contents | |
EP1760693A1 (en) | Extraction and matching of characteristic fingerprints from audio signals | |
KR102614021B1 (en) | Audio content recognition method and device | |
CN113326387B (en) | Intelligent conference information retrieval method | |
CN104994404A (en) | Method and device for obtaining keywords for video | |
CN114845130A (en) | Method for intelligently examining self-media audio and video contents | |
CN106550268B (en) | Video processing method and video processing device | |
CN105912615A (en) | Human voice content index based audio and video file management method | |
US20050232498A1 (en) | System and method of storing data in JPEG files | |
CN101673262B (en) | Method for searching audio content | |
CN105657575A (en) | Video annotation methods and apparatuses | |
KR101755238B1 (en) | Apparatus for restoring speech of damaged multimedia file and method thereof | |
CN101673267B (en) | Method for searching audio and video content | |
US20080196054A1 (en) | Method and system for facilitating analysis of audience ratings data for content | |
CN106101573A (en) | The grappling of a kind of video labeling and matching process | |
WO2009078613A1 (en) | Index database creating apparatus and index database retrieving apparatus | |
Koenig et al. | Forensic authenticity analyses of the metadata in re-encoded WAV files | |
CN106961626A (en) | The method and apparatus that a kind of video metamessage auto-complete is arranged |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20160831 |