CN102074235A

CN102074235A - Method of video speech recognition and search

Info

Publication number: CN102074235A
Application number: CN 201010600817
Authority: CN
Inventors: 刘伟奇
Original assignee: Huaqin Telecom Technology Co Ltd
Current assignee: Huaqin Technology Co Ltd
Priority date: 2010-12-20
Filing date: 2010-12-20
Publication date: 2011-05-25
Anticipated expiration: 2030-12-20
Also published as: CN102074235B

Abstract

The invention discloses a method for video speech recognition and search, comprising the following steps: 1) converting the audio of all videos into text by speech recognition; 2) storing each text separately or attaching it to its video; 3) selecting the several words that occur most frequently in the text as the word tags of the video, the word tags being appended to the file name of the video; and 4) searching the word tags of all videos. The method enables both broad and targeted video search, and can also be used for rapid locating in public security work and in searching for personal items.

Description

Method for video speech recognition and retrieval

Technical field

The present invention relates to the field of video production, and in particular to a method for video speech recognition and retrieval.

Background art

Cloud and search technologies are now widely applied across industries, yet video search is still at an exploratory stage. Because video involves large volumes of data and is difficult to describe, search by image content or by video segment has not yet reached a fine level of detail, and the video search in wide use today relies on file names and manually added tags as search keywords. Meanwhile, speech recognition has also been widely applied in many fields, but at present it is used only in isolation, mostly to recognize short speech segments, and has not been studied or exploited in depth. In addition, a video can currently be played from an intermediate segment, or a screenshot can be captured at a given moment, but these capabilities have not yet been applied to search.

In view of this, a method for video speech recognition and retrieval is provided to address the above problems.

Summary of the invention

The invention provides a method for video speech recognition and retrieval that overcomes the difficulties of the prior art. It enables both broad and targeted search of videos, and the same technique can also be used for rapid locating in public safety work and in searching for personal items.

The present invention adopts the following technical solution:

The invention provides a method for video speech recognition and retrieval, comprising the following steps:

(1) converting the audio portion of all videos into text by speech recognition;

(2) storing each text separately or attaching it to its video;

(3) selecting the several words that occur with the highest frequency in the text as the word tags of the video, the word tags being appended after the file name of the video;

(4) retrieving the word tags of all videos.

Preferably, the text in step (2) is saved in Word document format.

Preferably, the text in step (2) is saved in TXT document format.

Preferably, the number of individual words in the word tags in step (3) is set to 3.

Preferably, the number of individual words in the word tags in step (3) is set to 5.

Preferably, the number of individual words in the word tags in step (3) is set to 10.
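Step (3) and its preferred tag counts can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `select_word_tags` and `tagged_filename`, the underscore separator, and the regular-expression tokenizer are all assumptions made for illustration. The tokenizer splits on word characters only, so Chinese transcripts would need a proper word segmenter, and no stop-word filtering is applied.

```python
import re
from collections import Counter
from pathlib import Path


def select_word_tags(transcript: str, n: int = 3) -> list[str]:
    """Return the n most frequent words in a transcript (hypothetical helper).

    n would be 3, 5 or 10 in the preferred variants of step (3).
    """
    # Naive tokenization: lowercase runs of word characters.
    words = re.findall(r"\w+", transcript.lower())
    # most_common sorts by count; ties keep first-seen order.
    return [word for word, _ in Counter(words).most_common(n)]


def tagged_filename(video_name: str, tags: list[str]) -> str:
    """Append the tags after the file name stem, keeping the extension.

    The "_" separator is an assumption; the patent does not specify one.
    """
    path = Path(video_name)
    return f"{path.stem}_{'_'.join(tags)}{path.suffix}"


tags = select_word_tags("shirt here coat here shirt here", n=2)
print(tagged_filename("room.mp4", tags))
```

In practice the most frequent words of a raw transcript are often function words, so a real implementation would probably filter them out before counting; that filtering is outside what the source describes.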

By adopting the above technique, compared with the prior art, the present invention enables both broad and targeted search of videos, and the same technique can also be used for rapid locating in public safety work and in searching for personal items.

The present invention is further described below with reference to the drawings and embodiments.

Description of the drawings

Fig. 1 is a flowchart of the method for video speech recognition and retrieval of the present invention;

Fig. 2 is a flowchart of the method for video speech recognition and retrieval of Embodiment 1;

Fig. 3 is a flowchart of the method for video speech recognition and retrieval of Embodiment 2;

Fig. 4 is a flowchart of the method for video speech recognition and retrieval of Embodiment 3.

Detailed description of the embodiments

Three specific embodiments of the present invention are described below with reference to Figs. 1 to 4.

As shown in Fig. 1, the method for video speech recognition and retrieval of the present invention comprises the following steps:

(1) converting the audio portion of all videos into text by speech recognition;

(2) storing each text separately or attaching it to its video;

(3) selecting the several words that occur with the highest frequency in the text as the word tags of the video, the word tags being appended after the file name of the video;

(4) retrieving the word tags of all videos.

The text in step (2) is saved in Word document format or in TXT document format.

Preferably, the number of individual words in the word tags in step (3) is set to 3, 5 or 10.
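Steps (2) and (4) could look roughly like the following Python sketch, which covers only the TXT variant of step (2). The helper names `store_transcript` and `search_word_tags` are hypothetical, and the naming scheme `<title>_<tag1>_<tag2>...` is an assumption, since the patent says only that the tags follow the file name.

```python
from pathlib import Path


def store_transcript(video_path: Path, transcript: str) -> Path:
    """Step (2), TXT variant: save the text separately, next to the video."""
    text_path = video_path.with_suffix(".txt")
    text_path.write_text(transcript, encoding="utf-8")
    return text_path


def search_word_tags(video_names: list[str], query: str) -> list[str]:
    """Step (4): return the videos whose file name carries the queried tag.

    Assumes names of the form <title>_<tag1>_<tag2>...<ext>, so every
    underscore-separated part after the first is treated as a tag.
    """
    wanted = query.lower()
    matches = []
    for name in video_names:
        tags = Path(name).stem.lower().split("_")[1:]
        if wanted in tags:
            matches.append(name)
    return matches


library = ["room_shirt_coat_pencil.mp4", "garage_tools_paint_ladder.mp4"]
print(search_word_tags(library, "Shirt"))
```

Because the tags live in the file name, this search needs nothing beyond a directory listing; the stored transcript remains available for a finer full-text search.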

Practical applications of the present invention are as follows:

Embodiment 1

As shown in Fig. 2, in public safety applications, the audio is extracted from the video content recorded by cameras and processed with speech recognition. The results are stored in the cloud, or only the text is kept in the cloud while the actual audio and video files are stored in a separate, convenient large-capacity storage. Depending on which of these two cases applies, retrieval can return the text alone, or the text together with screenshots, video segments and the corresponding time segments as search results.

Embodiment 2

As shown in Fig. 3, in personal use, video files can be retrieved in the same way as online video media. As a special application, when tidying up items the user can film each storage location while saying the name of the corresponding item. Entering the item name during a later search then finds where the item is stored, which avoids the difficulty of finding items that have been forgotten or that are being looked for by someone other than the person who put them away. For example, while cleaning a room the user films and says: summer clothes go here, Dad's shirts go here, Mom's overcoat goes here, little brother's pencils go here, big sister's cosmetics all go here. When "shirt" is later entered as a search term, several shirt results are retrieved, and the user identifies the time segment of the target shirt from the screenshots or simply goes straight to the location. This household application can greatly reduce conflicts caused by items that cannot be found or by family members remembering the same thing differently, and it is especially convenient for elderly people with poorer memory.
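The item-finding scenario can be sketched in Python as a keyword search over time-stamped transcript segments. The `Segment` record, the `search_segments` function, the file name and all timestamps below are hypothetical; the patent does not describe how recognized text is aligned with time, so per-sentence start and end times are assumed to come from the speech recognizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One recognized utterance with its position in the video (assumed)."""
    video: str
    start_s: float
    end_s: float
    text: str


def search_segments(segments: list[Segment], keyword: str) -> list[Segment]:
    """Return every segment whose text mentions the keyword."""
    needle = keyword.lower()
    return [seg for seg in segments if needle in seg.text.lower()]


# Hypothetical recording made while tidying a room.
recording = [
    Segment("tidy_room.mp4", 0.0, 4.0, "Summer clothes go here"),
    Segment("tidy_room.mp4", 4.0, 8.5, "Dad's shirts go here"),
    Segment("tidy_room.mp4", 8.5, 13.0, "Mom's overcoat goes here"),
]

for hit in search_segments(recording, "shirt"):
    print(hit.video, hit.start_s, hit.end_s, hit.text)
```

Each hit carries the time segment at which the item was mentioned, which is what a player would need in order to show the matching screenshot or jump to that point in the video.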

Embodiment 3

As shown in Fig. 4, for searching online video media, the cloud analyzes and converts the audio of each video and marks the resulting text along the timeline in a manner similar to subtitles. To search, the user only needs to enter the corresponding text or speak the content to be found (which is likewise converted to text by speech recognition), and the matching subtitle text, the screenshots of the video segments and the corresponding timeline positions are listed. For example, when the user remembers only some of the lines of a certain film, this technique can be used to search for the video by those partial lines.
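A minimal Python sketch of this subtitle-style lookup follows. It assumes the cloud has already produced a list of (start time in seconds, text) cues; the cue data, the function names and the punctuation-insensitive matching are illustrative assumptions rather than the method specified in the patent.

```python
import re


def format_timestamp(seconds: float) -> str:
    """Render a timeline position as HH:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so partial lines match loosely."""
    return " ".join(re.findall(r"\w+", text.lower()))


def find_lines(cues: list[tuple[float, str]], partial_line: str) -> list[tuple[str, str]]:
    """Return (timestamp, subtitle text) for cues containing the partial line."""
    needle = normalize(partial_line)
    if not needle:
        return []
    return [
        (format_timestamp(start), text)
        for start, text in cues
        if needle in normalize(text)
    ]


# Hypothetical cues for one film.
cues = [
    (10.0, "The bridge is closed."),
    (3725.0, "We will meet again at the old bridge."),
]
print(find_lines(cues, "meet again, at the old"))
```

A spoken query would first pass through the same speech recognition step and then be handed to `find_lines` as text, matching the parenthetical remark in the embodiment.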

In summary, by adopting the above technique, the present invention enables both broad and targeted search of videos, and the same technique can also be used for rapid locating in public safety work and in searching for personal items.

The embodiments described above serve only to illustrate the technical idea and features of the present invention. Their purpose is to enable those skilled in the art to understand the content of the invention and implement it accordingly. The claims of the present invention are not limited to these embodiments; all equivalent changes or modifications made in accordance with the spirit disclosed herein still fall within the scope of the claims.

Claims

1. the method for video speech recognition and retrieval is characterized in that: may further comprise the steps:

(1) the sound part with all videos changes text into by speech recognition;

(2) text is stored separately respectively or is attached in its video;

(4) retrieve the word tag of all videos.

2. the method for video speech recognition as claimed in claim 1 and retrieval, it is characterized in that: the text in the described step (2) is preserved with the word file form.

3. the method for video speech recognition as claimed in claim 1 and retrieval, it is characterized in that: the text in the described step (2) is preserved with the TXT document form.

4. the method for video speech recognition as claimed in claim 1 and retrieval, it is characterized in that: the individual character number of the word tag in the described step (3) is defined as 3.

5. the method for video speech recognition as claimed in claim 1 and retrieval, it is characterized in that: the individual character number of the word tag in the described step (3) is defined as 5.

6. the method for video speech recognition as claimed in claim 1 and retrieval, it is characterized in that: the individual character number of the word tag in the described step (3) is defined as 10.