CN105912615A

CN105912615A - Audio and video file management method based on human voice content indexing

Info

Publication number: CN105912615A
Application number: CN201610212603.7A
Authority: CN
Inventors: 谭玉娟; 晏志超
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2016-04-05
Filing date: 2016-04-05
Publication date: 2016-08-31

Abstract

The invention provides an audio and video file management method based on human voice content indexing. The method uses speech recognition to recognize the human voice in an audio or video file and converts the voice content (conversational speech) into text information. Because conversations recorded in different settings are unique, the text information serves as important data for identifying the essential content of the audio or video file. The method uses the text information as the file index marker, so that audio or video files with the same essential content can be detected efficiently and files with different essential content can be quickly distinguished.

Description

An audio and video file management method based on human speech content indexing

Technical field

The invention belongs to the field of audio and video file storage and management, and specifically relates to an audio and video file management method based on human speech content indexing.

Background

Audio and video files exist in many different storage formats. The basic principle is to sample real-world audio and video signals at fixed time intervals and to store the samples at a given resolution. Audio file formats fall into two main classes: lossless and lossy. Lossy formats rely on a psychoacoustic model to discard sounds that humans find hard or impossible to hear. A video file generally stores the audio signal and the visual signal together in one file so that both can be played back simultaneously.

Because audio and video signals are captured at varying sample rates and resolutions, the originally captured files are often very large, which hinders the management and distribution of their content. This has given rise to a variety of audio and video encoders and decoders (codecs) that compress and decompress audio and video signals. Audio files are generally compressed for propagation and distribution over the Internet. A video file format is typically a general-purpose container that can hold video information, audio information, and other information (for example, subtitles, images, or viewing-angle information). A video codec encodes and decodes video files of a specific format, enabling their production and playback.

Because of how audio and video codecs work, the data actually stored depends on the specific encoding and decoding algorithm: the same original file encoded by different encoders yields entirely different file data. Likewise, under the same codec, a minor change to the original data, such as the audio or video being 0.1 second shorter, yields encoded data that is also entirely different.
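The effect described above can be illustrated with a minimal sketch. It is not part of the patent: `zlib` and `bz2` merely stand in for two different codecs, the 8000 Hz sample rate and the synthetic 440 Hz tone are assumptions chosen for illustration, and the helper names `synth_pcm` and `digest` are hypothetical.

```python
import bz2
import hashlib
import math
import struct
import zlib

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)


def synth_pcm(seconds: float) -> bytes:
    """Return 16-bit mono PCM bytes for a 440 Hz tone of the given duration."""
    n = int(SAMPLE_RATE * seconds)
    samples = (
        int(20000 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE)) for i in range(n)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


def digest(data: bytes) -> str:
    """Byte-level fingerprint of the stored file data."""
    return hashlib.sha256(data).hexdigest()


original = synth_pcm(2.0)
# Drop the last 0.1 s of audio (2 bytes per 16-bit sample).
trimmed = original[: -int(SAMPLE_RATE * 0.1) * 2]

# Same stand-in "codec", slightly shorter input: the stored bytes differ.
same_codec_differs = digest(zlib.compress(original)) != digest(zlib.compress(trimmed))
# Same input, different stand-in "codecs": the stored bytes differ as well.
different_codec_differs = digest(zlib.compress(original)) != digest(bz2.compress(original))
```

Both flags evaluate to true, which is why a byte-level comparison cannot group such files together.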

Therefore, the same audio or video content often exists as multiple files. A film, for example, often exists in several resolutions and in versions compressed by different subtitle groups. For multimedia files whose actual content is identical but whose stored data are entirely different, detecting the consistency of their essential content is a key technology for managing and storing them. This patent proposes an audio and video file management method based on human speech content indexing that automatically detects the essential content of audio and video files.

Summary of the invention

The present invention proposes a video and audio file management method based on human speech content indexing. The method uses speech recognition to recognize the human speech in audio and video files and converts the speech content (for example, conversational speech) into text information. Because conversations recorded in different settings are unique, the text information serves as important data for identifying the essential content of a video or audio file. Operations such as querying and comparing multimedia file content are thereby converted into querying and comparing text content. If the text information of two video files or two audio files is identical, the two files are marked as having identical essential content; otherwise, they are marked as having different essential content. By using the text information as the file index marker in a video or audio file library, files with identical essential content can be detected efficiently and files with different essential content can be distinguished rapidly.

A video and audio file management method based on human speech content indexing, comprising the following steps:

(1) Use speech recognition to convert the human speech content in video and audio files into text information;

(2) Use the text information obtained in (1) to identify the essential content of a video or audio file. If the text information of two video files or two audio files is identical, mark the two files as having identical essential content; otherwise, mark them as having different essential content. The specific steps are:

(2.1) In the video or audio file library, use the text information obtained in (1) as the file index marker of the video or audio database;

(2.2) For a given video or audio file, use the text information obtained in (1) to search the video or audio file library;

(2.3) If a file with the same file index marker (that is, the text information obtained in (1)) exists, mark the file of (2.2) and the file retrieved from the library as having identical essential content; otherwise, mark the file of (2.2) as having unique essential content;

(2.4) Add the file of (2.2) and its file index marker to the index of the video or audio file library.

(3) For the video or audio files marked in (2.3) as having identical essential content, use a data distribution strategy to store and manage them in a distributed manner, across multiple regions or within a single region (management includes file operations such as reading, writing, and deletion). A region may be a hardware or software entity such as a server, a rack, or a data center;
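Steps (2.1) to (2.4) above can be sketched as a small in-memory index. This is only an illustration under stated assumptions: the patent does not specify a data structure, the class name `TranscriptIndex` and its methods are hypothetical, and the transcripts are passed in as plain strings because the speech recognition engine of step (1) is outside the scope of the sketch.

```python
from collections import defaultdict
from typing import Dict, List


class TranscriptIndex:
    """Maps a file index marker (the recognized text) to the files that carry it."""

    def __init__(self) -> None:
        self._files_by_marker: Dict[str, List[str]] = defaultdict(list)

    def lookup(self, marker: str) -> List[str]:
        """Step (2.2): return the files already indexed under this marker."""
        return list(self._files_by_marker.get(marker, []))

    def register(self, file_id: str, marker: str) -> List[str]:
        """Steps (2.2) to (2.4): search, then add the file to the index.

        Returns the previously indexed files with identical essential content;
        an empty list means the file is unique so far (step 2.3).
        """
        matches = self.lookup(marker)
        self._files_by_marker[marker].append(file_id)  # step (2.4)
        return matches


index = TranscriptIndex()
first = index.register("movie_1080p.mkv", "hello there how are you")
second = index.register("movie_480p.avi", "hello there how are you")
third = index.register("lecture.mp3", "today we discuss storage")
```

In this usage, `first` and `third` are empty (unique content), while `second` contains `"movie_1080p.mkv"`, so the two encodings of the same film are marked as having identical essential content.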

In the above method, step (2) may use either the original text information obtained in step (1) or information derived by processing that text, provided that the processed information corresponds one-to-one with the original text information and can therefore uniquely identify it.
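One possible form of such processed information is a cryptographic digest of the text. The sketch below is an assumption, not something the patent names: the function `marker_from_transcript` is hypothetical, and a SHA-256 digest is not strictly one-to-one with its input, so the sketch relies on collisions being negligible in practice.

```python
import hashlib


def marker_from_transcript(transcript: str) -> str:
    """Derive a fixed-length index marker from the recognized text.

    Assumption: SHA-256 collisions are negligible, so the digest can stand in
    for the original text when comparing markers.
    """
    return hashlib.sha256(transcript.encode("utf-8")).hexdigest()


marker_a = marker_from_transcript("hello there how are you")
marker_b = marker_from_transcript("hello there how are you")
marker_c = marker_from_transcript("today we discuss storage")
```

Identical transcripts yield identical markers (`marker_a == marker_b`), while a different transcript yields a different one. A fixed-length marker keeps index keys small even when the transcript is long.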

Brief description of the drawings

Fig. 1 is a schematic diagram of the overall flow of the present invention.

Detailed description of the invention

The present invention is carried out by a video or audio storage server.

Fig. 1 is a schematic diagram of the overall flow of the present invention. The specific steps are:

(1) Read the video or audio file;

(2) Use speech recognition to convert the human speech content in the video or audio file into text information;

(3) Use the text information obtained in (2), or information derived by processing that text, as the index marker of the file;

(4) Use the index marker obtained in (3) to search the video or audio file library for a file with an identical index marker. Every file in the library uses its text information (that is, the text obtained by converting the human speech content in the file) as its file index marker;

(5) If a file with the same index marker exists, mark the file of (3) and the file retrieved from the library as video or audio files with identical essential content; otherwise, mark the file of (3) as a video or audio file with unique essential content;

(6) Add the file of (3) and its file index marker to the library index of (4);

(7) For the video or audio files marked in (5) as having identical essential content, use a data distribution strategy to store and manage them in a distributed manner, across multiple regions or within a single region (management includes file operations such as reading, writing, and deletion). A region may be a hardware or software entity such as a server, a rack, or a data center.
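The flow of Fig. 1 can be sketched end to end as follows. Several parts are assumptions rather than the patent's own design: the speech recognizer is injected as a callable (`recognize`) because no engine is specified, the region names are hypothetical, and `choose_region` shows just one possible data distribution strategy (co-locating files with identical content by hashing their marker).

```python
import hashlib
from typing import Callable, Dict, List

REGIONS = ["server-a", "rack-b", "datacenter-c"]  # hypothetical region names


def choose_region(marker: str, regions: List[str]) -> str:
    """One possible strategy: hash the marker so identical content is co-located."""
    bucket = int(hashlib.sha256(marker.encode("utf-8")).hexdigest(), 16) % len(regions)
    return regions[bucket]


def ingest(
    path: str,
    recognize: Callable[[str], str],
    index: Dict[str, List[str]],
    regions: List[str] = REGIONS,
) -> dict:
    transcript = recognize(path)                      # steps (1) and (2)
    marker = transcript                               # step (3): raw text as marker
    duplicates = list(index.get(marker, []))          # step (4): search the library
    status = "identical" if duplicates else "unique"  # step (5): mark the file
    index.setdefault(marker, []).append(path)         # step (6): update the index
    region = choose_region(marker, regions)           # step (7): placement
    return {"path": path, "status": status, "duplicates": duplicates, "region": region}


# A stand-in recognizer backed by fixed transcripts (illustrative data only).
fake_transcripts = {
    "film_1080p.mkv": "where are you going",
    "film_480p.avi": "where are you going",
    "talk.mp3": "welcome to the lecture",
}
index: Dict[str, List[str]] = {}
results = [ingest(p, fake_transcripts.__getitem__, index) for p in fake_transcripts]
```

Here the second film file is reported as `"identical"` to the first and is assigned the same region, while `talk.mp3` is reported as `"unique"`. A real deployment might instead spread replicas across regions; the patent leaves the strategy open.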

Claims

1. A video and audio file management method based on human speech content indexing, comprising the following steps:

(3) For the video or audio files marked in (2.3) as having identical essential content, use a data distribution strategy to store and manage them in a distributed manner, across multiple regions or within a single region (management includes file operations such as reading, writing, and deletion). A region may be a hardware or software entity such as a server, a rack, or a data center.

2. The method according to claim 1, wherein step (2) may use either the original text information obtained in step (1) or information derived by processing that text, provided that the processed information corresponds one-to-one with the original text information and can therefore uniquely identify it.