CN105912615A - Human voice content index based audio and video file management method - Google Patents


Info

Publication number
CN105912615A
Authority
CN
China
Prior art keywords
file
video
audio
audio file
flesh
Legal status
Pending
Application number
CN201610212603.7A
Other languages
Chinese (zh)
Inventor
谭玉娟
晏志超
Current Assignee
Chongqing University
Original Assignee
Chongqing University
Priority date
2016-04-05
Filing date
2016-04-05
Publication date
2016-08-31
Application filed by Chongqing University
Priority to CN201610212603.7A
Publication of CN105912615A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/685 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 Transforming into visible information

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an audio and video file management method based on a human speech content index. The method uses speech recognition to identify human speech in audio and video files and converts the speech content (conversational speech) into text. Because conversational speech is distinctive from one setting to another, the method uses this text as key data identifying the essential content of the audio or video file. With the text serving as the file's index marker, audio or video files with the same essential content can be detected efficiently, and files with different essential content can be quickly distinguished.

Description

Audio and video file management method based on human speech content indexing
Technical field
The invention belongs to the field of audio and video file storage and management, and specifically relates to an audio and video file management method based on human speech content indexing.
Background art
Audio and video files exist in many different storage formats. The basic principle is to sample real-world audio and video signals at a fixed time interval and to store the samples at a certain resolution. Audio files fall into two main classes of format: lossless and lossy. Lossy formats are based on a psychoacoustic model and remove sounds that humans find hard to hear or cannot hear at all. Video files usually store the audio signal and the visual signal in a single file so that video and audio content can be played back together.
Because of differences in signal acquisition parameters such as sampling rate and resolution, raw audio and video files are often very large, which hinders the management and distribution of their content. Codecs have therefore been developed for the various audio and video file types to compress and decompress audio and video signals. Audio files are usually compressed so that audio content can be transmitted and distributed over the Internet. A video file format is typically a generic container that can hold video data, audio data, and other information (for example, subtitles, pictures, or viewing-angle information). A video codec encodes and decodes video files of a specific format, which is how video files are produced and played.
Because of how codecs work, the data actually stored in an audio or video file depends on the specific encoding algorithm: the same original file encoded by different encoders yields files whose data content is completely different. Likewise, with the same codec, a small change to the original data, such as a slightly different audio or video length (for example, 0.1 seconds less content), produces encoded data that is completely different.
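The byte-level problem described above can be illustrated with a short Python sketch. The byte strings are hypothetical stand-ins for two encodings of the same recording, and the transcripts are assumed outputs of a speech recognizer, not real data. Hashing the raw bytes, a common byte-level duplicate check, separates the two files, while their transcripts still compare equal.

```python
import hashlib


def byte_digest(data: bytes) -> str:
    """SHA-256 over raw file bytes, i.e. a byte-level identity check."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for two encodings of the same recording:
# the payloads differ (different container, one frame shorter) ...
encoding_a = b"\x00\x01container-A" + b"frame" * 100
encoding_b = b"\x00\x02container-B" + b"frame" * 99

# ... but speech recognition is assumed to return the same dialogue text.
transcript_a = "Where were you last night? At the station."
transcript_b = "Where were you last night? At the station."

print(byte_digest(encoding_a) == byte_digest(encoding_b))  # False
print(transcript_a == transcript_b)                        # True
```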
Consequently, the same audio or video content often exists as several different files. A film, for example, is often available at several resolutions, as well as in versions compressed by different subtitle groups. For such multimedia files, whose actual content is identical but whose stored data are completely different, detecting that their essential content is the same is a key technique for managing and storing them. This patent proposes an audio and video file management method based on human speech content indexing that automatically detects the essential content of audio and video files.
Summary of the invention
The present invention proposes a video and audio file management method based on human speech content indexing. The method uses speech recognition to identify human speech in video and audio files and converts the speech content (for example, conversational speech) into text. Because conversation content is distinctive from one setting to another, this text serves as key data identifying the essential content of the video or audio file, so that querying and comparing multimedia audio and video content is reduced to querying and comparing text. If the text of two video files or of two audio files is identical, the two files are marked as having the same essential content; otherwise, they are marked as having different essential content. By using this text as the file index marker in a video or audio file library, files with identical essential content can be detected efficiently and files with different essential content can be quickly distinguished.
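A minimal Python sketch of the text comparison described above, assuming the speech-recognition step has already produced one transcript per file. The file names, transcripts, and the helper `group_by_transcript` are hypothetical; exact string equality is used, as the text describes, with no normalization.

```python
from collections import defaultdict


def group_by_transcript(transcripts: dict[str, str]) -> list[list[str]]:
    """Group file names whose transcribed speech text is identical.

    Files in the same group are treated as having the same essential
    content; a group of size one is a file with unique content.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name, text in transcripts.items():
        groups[text].append(name)  # exact text equality decides the group
    return [sorted(names) for names in groups.values()]


# Hypothetical transcripts, as a speech recognizer might return them.
transcripts = {
    "movie_1080p.mkv": "We have to leave before dawn.",
    "movie_720p_subs.mp4": "We have to leave before dawn.",
    "interview.mp3": "Thank you for joining us today.",
}
print(group_by_transcript(transcripts))
# [['movie_1080p.mkv', 'movie_720p_subs.mp4'], ['interview.mp3']]
```

Exact equality is strict: a single recognition error places two copies of the same film in different groups, so the recognizer's accuracy bounds how well this grouping works.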
The video and audio file management method based on human speech content indexing comprises the following steps:
(1) Use speech recognition to convert the human speech content in video and audio files into text;
(2) Use the text obtained in (1) to identify the essential content of the video or audio files: if the text of two video files or of two audio files is identical, mark the two files as having the same essential content; otherwise, mark them as having different essential content. The specific steps are:
(2.1) In the video or audio file library, use the text obtained in (1) as the file index marker of the video or audio database;
(2.2) For a given video or audio file, use the text obtained in (1) to search the video or audio file library;
(2.3) If a file with the same index marker (that is, the text obtained in (1)) exists, mark the file of (2.2) and the retrieved file in the library as files with the same essential content; otherwise, mark the file of (2.2) as a file with unique essential content;
(2.4) Add the file of (2.2) and its index marker to the video or audio file library index.
(3) For the video or audio files marked in (2.3) as having the same essential content, use a data distribution strategy to store and manage these files (including file reads, writes, deletions, and other file operations) either across multiple regions or within a single region. A region may be a physical or logical entity such as a server, a rack, or a data center;
In the above method, step (2) may use either the original text obtained in step (1) or information derived by processing that original text, provided that the processed information corresponds one-to-one with the original text and can therefore uniquely identify it.
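Steps (2.1) to (2.4), together with the processed-marker variant just described, could be sketched in Python as an in-memory index. This is an illustration under assumptions, not the patent's implementation: the class `TranscriptIndex`, the helper `index_key`, and the use of a SHA-256 digest as the processed marker are all hypothetical. A cryptographic hash is not strictly one-to-one, but collisions are negligible in practice, so the digest can serve as a fixed-length stand-in for the full transcript.

```python
import hashlib


def index_key(transcript: str) -> str:
    """Hypothetical processed marker: SHA-256 digest of the transcript."""
    return hashlib.sha256(transcript.encode("utf-8")).hexdigest()


class TranscriptIndex:
    """Hypothetical in-memory library index: marker -> list of file names."""

    def __init__(self) -> None:
        self._index: dict[str, list[str]] = {}

    def register(self, name: str, transcript: str) -> str:
        """Label a file "duplicate" or "unique", then add it to the index."""
        key = index_key(transcript)                    # (2.1) index marker
        matches = self._index.get(key, [])             # (2.2) search library
        label = "duplicate" if matches else "unique"   # (2.3) label the file
        self._index.setdefault(key, []).append(name)   # (2.4) update index
        return label

    def same_content(self, name: str) -> list[str]:
        """All files sharing the marker of `name`, including itself."""
        for names in self._index.values():
            if name in names:
                return list(names)
        return []


library = TranscriptIndex()
print(library.register("a.mkv", "We have to leave before dawn."))  # unique
print(library.register("b.mp4", "We have to leave before dawn."))  # duplicate
print(library.register("c.mp3", "Thank you for joining us."))      # unique
print(library.same_content("a.mkv"))  # ['a.mkv', 'b.mp4']
```

A dictionary keyed by the marker makes each lookup in (2.2) a constant-time operation on average, which is what lets text comparison stand in for expensive media comparison.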
Description of the drawings
Fig. 1 is an overall flow diagram of the present invention.
Detailed description of the invention
The method is carried out by a video or audio storage server.
Fig. 1 is the overall flow diagram of the present invention. The specific steps are:
(1) Read the video or audio file;
(2) Use speech recognition to convert the human speech content in the video or audio file into text;
(3) Use the text obtained in (2) (or information derived by processing that text) as the index marker of this file;
(4) Search the video or audio file library with the index marker obtained in (3) to check whether a file with the same index marker exists. Every file in the library uses its text (the text converted from the human speech content in that file) as its index marker;
(5) If a file with the same index marker exists, mark the file referred to in (3) and the retrieved file in the library as video or audio files with the same essential content; otherwise, mark the file referred to in (3) as a video or audio file with unique essential content;
(6) Add the file referred to in (3) and its index marker to the library index referred to in (4);
(7) For the video or audio files marked in (5) as having the same essential content, use a data distribution strategy to store and manage these files (including file reads, writes, deletions, and other file operations) either across multiple regions or within a single region. A region may be a physical or logical entity such as a server, a rack, or a data center.
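Step (7) leaves the data distribution strategy open. One possible strategy, sketched in Python purely as an assumption, derives a starting region from a hash of the group's index marker and then either keeps every file of a same-content group in that region (within a single region) or spreads the group's files round-robin across regions (across multiple regions). The region names and the function `place_group` are hypothetical.

```python
import hashlib

# Hypothetical regions; the text allows servers, racks, or data centers.
REGIONS = ["server-a", "server-b", "server-c"]


def place_group(marker: str, files: list[str], spread: bool) -> dict[str, str]:
    """Assign each file of one same-content group to a region (hypothetical).

    spread=False: co-locate the whole group in one region.
    spread=True:  round-robin the group's files across regions,
                  starting from the region derived from the marker.
    """
    # Stable starting region: the same marker always maps to the same region.
    start = int(hashlib.sha256(marker.encode("utf-8")).hexdigest(), 16) % len(REGIONS)
    placement = {}
    for i, name in enumerate(sorted(files)):
        offset = i if spread else 0
        placement[name] = REGIONS[(start + offset) % len(REGIONS)]
    return placement


group = ["movie_1080p.mkv", "movie_720p_subs.mp4", "movie_480p.avi"]
print(place_group("marker-123", group, spread=False))
print(place_group("marker-123", group, spread=True))
```

Co-locating a group keeps reads, writes, and deletions of equivalent files on one node, while spreading trades that locality for tolerance of a single region failing; which suits a given library is a design choice the text does not fix.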

Claims (2)

1. A video and audio file management method based on human speech content indexing, comprising the following steps:
(1) Use speech recognition to convert the human speech content in video and audio files into text;
(2) Use the text obtained in (1) to identify the essential content of the video or audio files: if the text of two video files or of two audio files is identical, mark the two files as having the same essential content; otherwise, mark them as having different essential content. The specific steps are:
(2.1) In the video or audio file library, use the text obtained in (1) as the file index marker of the video or audio database;
(2.2) For a given video or audio file, use the text obtained in (1) to search the video or audio file library;
(2.3) If a file with the same index marker (that is, the text obtained in (1)) exists, mark the file of (2.2) and the retrieved file in the library as files with the same essential content; otherwise, mark the file of (2.2) as a file with unique essential content;
(2.4) Add the file of (2.2) and its index marker to the video or audio file library index.
(3) For the video or audio files marked in (2.3) as having the same essential content, use a data distribution strategy to store and manage these files (including file reads, writes, deletions, and other file operations) either across multiple regions or within a single region. A region may be a physical or logical entity such as a server, a rack, or a data center.
2. The method according to claim 1, wherein step (2) may use either the original text obtained in step (1) or information derived by processing that original text, provided that the processed information corresponds one-to-one with the original text and can therefore uniquely identify it.
CN201610212603.7A 2016-04-05 2016-04-05 Human voice content index based audio and video file management method Pending CN105912615A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610212603.7A CN105912615A (en) 2016-04-05 2016-04-05 Human voice content index based audio and video file management method

Publications (1)

Publication Number Publication Date
CN105912615A true CN105912615A (en) 2016-08-31

Family

ID=56744612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610212603.7A Pending CN105912615A (en) 2016-04-05 2016-04-05 Human voice content index based audio and video file management method

Country Status (1)

Country Link
CN (1) CN105912615A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107484016A (en) * 2017-09-05 2017-12-15 深圳Tcl新技术有限公司 Video dubs switching method, television set and computer-readable recording medium
CN109582823A (en) * 2018-11-21 2019-04-05 平安科技(深圳)有限公司 Video information chain type storage method, device, computer equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1932819A (en) * 2006-09-25 2007-03-21 北京搜狗科技发展有限公司 Clustering method, searching method and system for interconnection network audio file
CN102037465A (en) * 2008-04-14 2011-04-27 阿尔卡特朗讯 Method for aggregating web feed minimizing redundancies
CN102349087A (en) * 2009-03-12 2012-02-08 谷歌公司 Automatically providing content associated with captured information, such as information captured in real-time
US20140257995A1 (en) * 2011-11-23 2014-09-11 Huawei Technologies Co., Ltd. Method, device, and system for playing video advertisement

Similar Documents

Publication Publication Date Title
CN108833973B (en) Video feature extraction method and device and computer equipment
US6915012B2 (en) System and method of storing data in JPEG files
CN106878632B (en) Video data processing method and device
CN101821734B (en) Detection and classification of matches between time-based media
CN103761261B (en) A kind of media search method and device based on speech recognition
US8586847B2 (en) Musical fingerprinting based on onset intervals
US10529340B2 (en) Voiceprint registration method, server and storage medium
CN101673266B (en) Method for searching audio and video contents
EP1760693A1 (en) Extraction and matching of characteristic fingerprints from audio signals
KR102614021B1 (en) Audio content recognition method and device
CN113326387B (en) Intelligent conference information retrieval method
CN104994404A (en) Method and device for obtaining keywords for video
CN114845130A (en) Method for intelligently examining self-media audio and video contents
CN106550268B (en) Video processing method and video processing device
CN105912615A (en) Human voice content index based audio and video file management method
US20050232498A1 (en) System and method of storing data in JPEG files
CN101673262B (en) Method for searching audio content
CN105657575A (en) Video annotation methods and apparatuses
KR101755238B1 (en) Apparatus for restoring speech of damaged multimedia file and method thereof
CN101673267B (en) Method for searching audio and video content
US20080196054A1 (en) Method and system for facilitating analysis of audience ratings data for content
CN106101573A (en) The grappling of a kind of video labeling and matching process
WO2009078613A1 (en) Index database creating apparatus and index database retrieving apparatus
Koenig et al. Forensic authenticity analyses of the metadata in re-encoded WAV files
CN106961626A (en) The method and apparatus that a kind of video metamessage auto-complete is arranged

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160831