CN105095316A - Video content marking and searching method

Video content marking and searching method

Info

Publication number
CN105095316A
CN105095316A
Authority
CN
China
Prior art keywords
sound
fragment
audio
file
video content
Prior art date
Legal status (assumed; not a legal conclusion)
Pending
Application number
CN201410219768.8A
Other languages
Chinese (zh)
Inventor
黄又勋
Current Assignee (the listed assignee may be inaccurate)
Ali Corp
Original Assignee
Ali Corp
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Ali Corp filed Critical Ali Corp
Priority to CN201410219768.8A
Publication of CN105095316A

Landscapes

  • Information retrieval; database structures and file system structures therefor (AREA)

Abstract

A method for searching audio and video content in an embodiment of the present invention comprises: (A) dividing an audio track into sound segments; (B) marking each sound segment obtained in step (A) and recording the corresponding start time and end time of each segment in the audio track; (C) generating a correspondence table of sound, subtitles and images, the table containing the text and image recognition features corresponding to the subtitles, together with the corresponding start time and end time of each sound segment in the audio track; and (D) the user searching the correspondence table of sound, subtitles and images generated in step (C) with the desired search string or speech.

Description

Video content marking and searching method
Technical field
The invention relates to a video content retrieval method, and more particularly to a search method that uses audio to assist in searching video.
Background art
As early as the end of the twentieth century, scientists predicted that the 21st century would be the age of information, and the development of computer and network technology has greatly extended the reach and speed of information transmission. The early, laborious practice of consulting large numbers of paper documents in a library is gradually being replaced by electronic retrieval: a single large storage server can hold more documents than a sizable library. At the same time, as technology has developed, information is no longer limited to text; much of it now takes the more intuitive form of audio and video files, a video file being itself a combination of an audio track and image frames. The prior art, however, cannot convert the sound in an audio or video file into text.
Existing search methods are fast and convenient for retrieving text, but helpless when the target is an audio or video file among a large collection of such files. For example, if a user needs to locate a particular sound segment (a sentence or a few words) within an audio or video file several hours long, existing methods offer no help; the user can only waste a great deal of time playing through the entire file. If the user does not even know which audio or video file contains the desired segment, searching a large collection of audio and video files is all the more hopeless. Because many search methods require processing times far beyond what practical applications can accept when handling video content, this time bottleneck has made progress on fast video content retrieval slow. Mature techniques for fast video content retrieval are therefore extremely rare, and products that can be applied directly in production and daily life are rarer still.
Most current video content retrieval techniques are based on classical image processing and pattern recognition, and can be roughly divided into the following categories.
Some techniques extract shots, scenes, shot key frames, scene key frames, key-frame image information, face information and the like from the video, all of it carried in image form. Retrieving video content is then equivalent to retrieving video frames, which is essentially picture retrieval. When the number of frames is small, this approach works well, but videos generally have many frames; surveillance video in particular runs to terabytes, and at 10-20 frames per second the frame count is massive. Compared with text and numeric retrieval, image retrieval takes far longer, so when the frame count is large, video search based on frame processing runs into a severe time bottleneck.
Summary of the invention
The invention provides a method for searching audio and video content that can increase the speed of video search.
A method for marking audio and video content in an embodiment of the invention comprises: (1) dividing the audio track of an audio or video file into sound segments; (2) marking each sound segment obtained in step (1) and obtaining the corresponding start time and end time of each segment in the audio track.
In an embodiment of the invention, step (1) is specifically: using silence detection, each sentence in the audio or video file is divided into one sound segment.
In an embodiment of the invention, after step (2) the method further comprises marking each sound segment at its corresponding time point on the timeline, so that the video file can be made to jump to any marked time point.
A method for searching audio and video content in an embodiment of the invention comprises: (A) dividing an audio track into sound segments; (B) marking each sound segment obtained in step (A) and recording the corresponding start time and end time of each segment in the audio track; (C) generating a correspondence table of sound, subtitles and images, the table containing the text and image recognition features corresponding to the subtitles, together with the corresponding start time and end time of each sound segment in the audio track; (D) the user searching the correspondence table of sound, subtitles and images generated in step (C) with the desired search string or speech.
In an embodiment of the invention, step (A) is specifically: using silence detection, each sentence in the audio or video file is divided into one sound segment.
In an embodiment of the invention, after step (B) the method further comprises marking each sound segment at its corresponding time point on the timeline.
In an embodiment of the invention, step (C) is specifically: generating a correspondence table of sound, subtitles and images, the table containing all sound segments of the audio track, the segment time points and image features corresponding to the subtitles, and the start time and end time of each sound segment in the audio track.
In an embodiment of the invention, step (D) is specifically: after the generated correspondence table of sound, subtitles and images has been searched, the video file can be made to jump to the marked time point that best matches the search result.
To make the above features and advantages of the invention more apparent, embodiments are described in detail below together with the accompanying drawings.
Brief description of the drawings
Fig. 1 is a flowchart of a method for marking audio and video content according to an embodiment of the invention.
Fig. 2 is a flowchart of a method for searching audio and video content according to an embodiment of the invention.
Description of the embodiments
Reference will now be made in detail to the preferred embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
Please refer to Fig. 1, a flowchart of the marking method for audio and video content according to an embodiment of the invention. The invention proposes a marking method 100 for audio and video content, in which, while a film is playing, the playback device uses its speech/subtitle/image recognition functions to mark corresponding time points on the timeline.
In step 110, when the playback device receives or plays film data, it can analyze the obtained film data and perform speech/subtitle/image recognition. The invention is not limited to extraction from feature films; any data such as audio/video files and streams can be collected and searched with the method provided by the invention.
In step 120, speech/subtitle/image recognition is performed on the film data. When recognition succeeds, step 130 stores the recognized subtitles, speech, images and accompanying data (textual information about the video) in a multimedia database and records their time points. When recognition fails, step 130 is still performed: the time point of the failure is recorded, and the flow returns to step 110 to keep trying to recognize the relevant information. Even when recognition fails, the start time of the sound segment can still be recorded; it may simply be that the speech content within that segment cannot be recognized.
In the recognition step, the audio track of the audio or video file is divided into sound segments, each segment is marked, and the corresponding start time and end time of each segment in the audio track are obtained. Step 120 may also be carried out as follows: using silence detection, each sentence in the audio or video file is divided into one sound segment, and the corresponding start time and end time of each segment in the file are recorded. Silence detection is a common existing method for segmenting audio or video files: it detects pauses in the sound, and when a pause exceeds a preset interval, a sentence is considered to have ended. In this way, every sentence in the file can be divided into its own sound segment. The prior art is not limited to silence detection for splitting sound segments; there are many other methods, which are not repeated here one by one.
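The patent gives no implementation, but a minimal sketch of such silence-based segmentation, operating on per-frame RMS energies, might look like the following; the frame length, energy threshold and minimum pause length are illustrative assumptions, not values from the patent.

```python
# Minimal sketch of silence-based sentence segmentation (illustrative, not
# the patent's implementation). Input: mono PCM samples in [-1, 1] and a
# sample rate. Output: (start_sec, end_sec) pairs, one per sound segment.

def split_on_silence(samples, sample_rate,
                     frame_ms=20,            # analysis frame length (assumed)
                     energy_threshold=0.01,  # RMS below this counts as silence (assumed)
                     min_pause_ms=500):      # pause longer than this ends a sentence (assumed)
    frame_len = int(sample_rate * frame_ms / 1000)
    min_pause_frames = min_pause_ms // frame_ms

    segments = []
    seg_start = None   # frame index where the current segment began
    silent_run = 0     # consecutive silent frames seen so far

    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = (sum(s * s for s in frame) / frame_len) ** 0.5
        if rms >= energy_threshold:          # voiced frame
            if seg_start is None:
                seg_start = i
            silent_run = 0
        elif seg_start is not None:          # silent frame inside a segment
            silent_run += 1
            if silent_run >= min_pause_frames:
                # Pause exceeded the preset interval: the sentence has ended.
                end = i - silent_run + 1
                segments.append((seg_start * frame_ms / 1000.0,
                                 end * frame_ms / 1000.0))
                seg_start, silent_run = None, 0

    if seg_start is not None:                # close a segment at end of file
        segments.append((seg_start * frame_ms / 1000.0,
                         n_frames * frame_ms / 1000.0))
    return segments
```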
Please refer to Fig. 2, a flowchart of the searching method for audio and video content according to an embodiment of the invention. A searching method 200 for audio and video content comprises: dividing the audio track into sound segments; marking each resulting sound segment and recording the corresponding start time and end time of each segment in the audio track; generating a correspondence table of sound, subtitles and images, the table containing the text and image recognition features corresponding to the subtitles, together with the corresponding start time and end time of each sound segment in the audio track; and the user searching the generated correspondence table with the desired search string or speech.
In step 210, when the playback device receives or plays film data, it can analyze the obtained film data and perform speech/subtitle/image recognition. The audio track is divided into sound segments, and the corresponding start time and end time of each segment in the audio or video file are recorded. In this way, the exact location of each sound segment within the file is obtained.
In step 220, each sound segment obtained in step 210 is marked, and the corresponding start time and end time of each segment in the audio track are recorded. Silence detection divides each sentence in the audio or video file into one sound segment, and each segment is further marked at its corresponding time point on the timeline. Speech recognition software then converts each sound segment into the corresponding text; alternatively, a stenographic method can be used to transcribe each segment. To ensure accuracy, the recognized text can be proofread afterwards.
In step 230, the successfully recognized subtitles, speech, images and accompanying data (textual information about the video) are stored in a multimedia database, and a correspondence table of sound, subtitles and images is generated. The table contains the text and image recognition features corresponding to the subtitles, all sound segments of the audio track, and the corresponding start time and end time of each segment in the audio track.
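As a concrete, non-authoritative illustration of what one row of such a correspondence table might hold, the following sketch reuses `split_on_silence` from above; the field names and the `recognize_speech`/`extract_image_features` callbacks are hypothetical stand-ins for whatever ASR and image-recognition components are used.

```python
from dataclasses import dataclass, field

@dataclass
class SegmentEntry:
    """One row of the sound/subtitle/image correspondence table (illustrative)."""
    start_sec: float                 # segment start time in the audio track
    end_sec: float                   # segment end time in the audio track
    subtitle_text: str = ""          # recognized text; "" when recognition failed (step 130)
    image_features: list = field(default_factory=list)  # image-recognition features for the span

def build_table(samples, sample_rate, recognize_speech, extract_image_features):
    """Build the correspondence table of sound, subtitles and images.

    `recognize_speech(samples, start_sec, end_sec)` and
    `extract_image_features(start_sec, end_sec)` are hypothetical callbacks.
    """
    table = []
    for start, end in split_on_silence(samples, sample_rate):
        text = recognize_speech(samples, start, end) or ""  # may fail; times still recorded
        table.append(SegmentEntry(start, end, text,
                                  extract_image_features(start, end)))
    return table
```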
In step 240, the user searches the correspondence table of sound, subtitles and images generated in step 230 with the desired search string or speech. Finally, after the table has been searched, the video file is made to jump to the marked time point that best matches the search result.
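A correspondingly simple search over the sketched table might look as follows; substring matching and the earliest-first ordering are assumptions, since the patent does not specify a matching strategy, and a spoken query would first be converted to text by speech recognition.

```python
def search_table(table, query):
    """Return (start_sec, subtitle_text) for every segment whose recognized
    text contains `query`, earliest first. The patent does not specify a
    ranking, so 'best match' is naively the first hit here."""
    query = query.lower()
    return [(e.start_sec, e.subtitle_text)
            for e in table
            if query in e.subtitle_text.lower()]

# Usage sketch: locate a line of dialogue, then seek the player to it.
# hits = search_table(table, "breaking news")
# if hits:
#     player.seek(hits[0][0])  # `player.seek` is a hypothetical playback API
```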
In summary, the invention provides a method for searching audio and video content. With a playback device supporting this method, a user watching the news who hears an unclear sentence can rewind to the start of that sentence with a single key and listen to it again. The user can also enter key lines of dialogue; the playback device compares them with the generated correspondence table of sound, subtitles and images and lists the time points of the matching subtitles for the user to browse quickly. Fast-forwarding and rewinding in units of the recorded time points (for example, the start of each line of dialogue) further improves the efficiency of browsing and searching. A webcam or pictures can also be used to find the time points at which a particular object or frame appears.
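The one-key replay described above reduces to a lookup over the marked segment start times; a hypothetical sketch, again over the table structure assumed earlier:

```python
def rewind_to_sentence_start(table, current_sec):
    """One-key replay: return the start time of the sentence currently
    playing (the latest marked segment start at or before `current_sec`)."""
    starts = [e.start_sec for e in table if e.start_sec <= current_sec]
    return max(starts) if starts else 0.0
```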
Although the invention has been disclosed above by way of embodiments, these are not intended to limit the invention. Anyone with ordinary knowledge in the relevant art may make some changes and refinements without departing from the spirit and scope of the invention; the protection scope of the invention shall therefore be determined by the appended claims.

Claims (8)

1. A method for marking audio and video content, comprising:
(1) dividing the audio track in an audio file or video file into sound segments;
(2) marking each sound segment obtained in step (1), and obtaining the corresponding start time and end time of each sound segment in the audio track.
2. The method for marking audio and video content according to claim 1, wherein step (1) is specifically: using silence detection, each sentence in the audio or video file is divided into one sound segment.
3. The method for marking audio and video content according to claim 1, further comprising, after step (2), marking each sound segment at its corresponding time point on the timeline, so that the video file can be made to jump to any marked time point.
4. A method for searching audio and video content, comprising:
(A) dividing an audio track into sound segments;
(B) marking each sound segment obtained in step (A), and recording the corresponding start time and end time of each sound segment in the audio track;
(C) generating a correspondence table of sound, subtitles and images, the table containing the text and image recognition features corresponding to the subtitles, together with the corresponding start time and end time of each sound segment in the audio track;
(D) the user searching the correspondence table of sound, subtitles and images generated in step (C) with the desired search string or speech.
5. The method for searching audio and video content according to claim 4, wherein step (A) is specifically: using silence detection, each sentence in the audio or video file is divided into one sound segment.
6. The method for searching audio and video content according to claim 4, further comprising, after step (B), marking each sound segment at its corresponding time point on the timeline.
7. The method for searching audio and video content according to claim 4, wherein step (C) is specifically: generating a correspondence table of sound, subtitles and images, the table containing all sound segments of the audio track, the segment time points and image features corresponding to the subtitles, and the start time and end time of each sound segment in the audio track.
8. The method for searching audio and video content according to claim 4, wherein step (D) is specifically: after the generated correspondence table of sound, subtitles and images has been searched, the video file can be made to jump to the marked time point that best matches the search result.
CN201410219768.8A 2014-05-22 2014-05-22 Video content marking and searching method Pending CN105095316A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410219768.8A CN105095316A (en) 2014-05-22 2014-05-22 Video content marking and searching method


Publications (1)

Publication Number Publication Date
CN105095316A 2015-11-25

Family

ID=54575765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410219768.8A Pending CN105095316A (en) 2014-05-22 2014-05-22 Video content marking and searching method

Country Status (1)

Country Link
CN (1) CN105095316A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107124648A (en) * 2017-04-17 2017-09-01 浙江德塔森特数据技术有限公司 The method that advertisement video is originated is recognized by intelligent terminal



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20151125