CN103761261B - Media search method and device based on speech recognition - Google Patents

Media search method and device based on speech recognition

Info

Publication number
CN103761261B
CN103761261B (application CN201310752909.8A)
Authority
CN
China
Prior art keywords
media
speech recognition
unit
search
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310752909.8A
Other languages
Chinese (zh)
Other versions
CN103761261A (en)
Inventor
高鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Purple Winter Of Beijing Is Voice Technology Co Ltd With Keen Determination
Original Assignee
Purple Winter Of Beijing Is Voice Technology Co Ltd With Keen Determination
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Purple Winter Of Beijing Is Voice Technology Co Ltd With Keen Determination
Priority to CN201310752909.8A
Publication of CN103761261A
Application granted
Publication of CN103761261B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/432 Query formulation
    • G06F16/433 Query formulation using audio data
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search

Abstract

The present invention provides a media search method and device based on speech recognition. The method comprises the steps of: obtaining a content index and metadata information of media; associating the content index with the metadata information to build a media knowledge base; parsing a collected user voice query to obtain the corresponding speech-recognition text; and searching the media knowledge base according to the speech-recognition text. The disclosed method and device use speech recognition at the front end to provide the user with a more convenient mode of interaction, recognize the media content at the back end, and build a corresponding knowledge base, finally achieving the goal of letting the user search media content by voice. Compared with traditional search, the client provides the user with voice interaction, which makes the interaction more natural and convenient, while the server performs content-based recognition and natural-language search over the media, which makes the user's search of media content more accurate.

Description

Media search method and device based on speech recognition
Technical field
The present invention relates to the technical field of information processing, and in particular to a media search method and device based on speech recognition.
Background technology
With the development of the Internet and digital multimedia content, digital media, and digital video in particular, is growing explosively; retrieving digital media quickly and effectively therefore has great application value. Because digital media is unstructured data, meeting the demand for retrieving digital media content requires recognizing the media content, for example converting the speech in the audio and the subtitles in the video into text, and then retrieving with that text.
On the other hand, the mobile Internet is flourishing, and interaction between people and smart devices has become an important research direction. Voice interaction, as one of the most natural and convenient means of human-computer interaction, has received the attention of enterprises and the favor of users.
Automatic speech recognition (ASR) aims to convert the vocabulary content of human speech into computer-readable input, such as key presses, binary codes, or character strings. It differs from speaker identification and speaker verification, which attempt to identify or confirm the speaker rather than the vocabulary content of the speech.
Applications of speech recognition include voice dialing, voice navigation, indoor device control, voice document retrieval, simple dictation data entry, and so on. Combined with other natural-language processing techniques such as machine translation and speech synthesis, speech recognition can be used to build more complex applications, such as intelligent interactive media search based on media content and speech.
Summary of the invention
The technical problem solved by the present invention is how to provide a media search method and device based on speech recognition, so that a user can search media content by voice more accurately.
To this end, the invention provides a media search method based on speech recognition, the method including the following steps:
Obtaining a content index and metadata information of media;
Associating the content index with the metadata information to build a media knowledge base;
Parsing a collected user voice query to obtain the corresponding speech-recognition text;
Searching the media knowledge base according to the speech-recognition text.
Wherein obtaining the content index of media specifically includes:
Transcoding the received media into a unified coded format;
Marking the start and end of each program in the transcoded media to obtain a program-layer index;
Cutting each program in the program layer into fragments to obtain a fragment-layer index;
Performing speech recognition and subtitle recognition on each fragment in the fragment layer to obtain a character-layer index.
Wherein performing speech recognition and subtitle recognition on each fragment in the fragment layer to obtain the character-layer index specifically includes:
Obtaining the recognition paths of the speech recognition and the speech-recognition text corresponding to each recognition path;
Obtaining the recognition paths of the subtitle recognition and the subtitle-recognition text corresponding to each recognition path;
Merging the speech-recognition text and the subtitle-recognition text to obtain the character-layer index.
Wherein the metadata information includes, but is not limited to, the director, characters, subject, genre, region, and language of the media.
Wherein parsing the collected user voice query to obtain the corresponding speech-recognition text specifically includes:
Receiving the audio signal of the user voice query;
Segmenting the decoded audio signal;
Performing speech recognition on each audio segment separately to obtain a segment recognition text;
Merging the segment recognition texts of all the audio segments to obtain the speech-recognition text.
Wherein searching the media knowledge base according to the speech-recognition text specifically includes:
Extracting the metadata information present in the speech-recognition text according to a preset metadata dictionary;
Performing a metadata search in the media knowledge base according to the extracted metadata information;
Extracting the keyword information present in the speech-recognition text according to a preset keyword database;
Performing a keyword search in the media knowledge base according to the keyword information;
Merging the result of the metadata search with the result of the keyword search to obtain the complete search result.
In addition, the present invention also proposes a media search device based on speech recognition, including:
an acquisition module, an association module, a parsing module, and a search module;
The acquisition module, for obtaining a content index and metadata information of media;
The association module, for associating the content index and metadata information obtained by the acquisition module to build a media knowledge base;
The parsing module, for parsing a collected user voice query to obtain the corresponding speech-recognition text;
The search module, for searching the media knowledge base according to the speech-recognition text.
Wherein the acquisition module includes a transcoding unit, an indexing unit, a cutting unit, and a recognition unit:
The transcoding unit, for transcoding the received media into a unified coded format;
The indexing unit, for marking the start and end of each program in the transcoded media to obtain a program-layer index;
The cutting unit, for cutting the programs in the media into fragments to obtain a fragment-layer index;
The recognition unit, for performing speech recognition and subtitle recognition on the fragments of each program to obtain a character-layer index.
Wherein the parsing module includes a receiving unit, a decoding unit, a segmenting unit, a recognition unit, and a merging unit:
The receiving unit, for receiving the audio signal of the user voice query;
The decoding unit, for decoding the audio signal;
The segmenting unit, for segmenting the decoded audio signal;
The recognition unit, for performing speech recognition on each audio segment separately to obtain a segment recognition text;
The merging unit, for merging the segment recognition texts of all the audio segments to obtain the speech-recognition text.
Wherein the search module includes a first extraction unit, a first search unit, a second extraction unit, a second search unit, and a merging unit:
The first extraction unit, for extracting the metadata information present in the speech-recognition text according to a preset metadata dictionary;
The first search unit, for performing a metadata search in the media knowledge base according to the extracted metadata information;
The second extraction unit, for extracting the keyword information present in the speech-recognition text according to a preset keyword database;
The second search unit, for performing a keyword search in the media knowledge base according to the keyword information;
The merging unit, for merging the metadata search result of the first search unit with the keyword search result of the second search unit to obtain the complete search result.
By using the media search method and device based on speech recognition disclosed in the present invention, voice interaction is used at the front end so as to provide the user with a more convenient mode of interaction, while the media content is recognized at the back end and a corresponding knowledge base is built, finally achieving the goal of letting the user search media content by voice. Compared with traditional search, the method provides the user with voice interaction at the client, which makes the interaction more natural and convenient, and performs content-based recognition and natural-language search over the media at the server, which makes the user's search of media content more accurate.
Brief description of the drawings
In order to describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from these drawings without creative work.
Fig. 1 is a flowchart of a media search method based on speech recognition according to the present invention;
Fig. 2 is a flow block diagram of the media search method based on speech recognition described in Embodiment 1 of the present invention;
Fig. 3 is a module diagram of a media search device based on speech recognition according to the present invention.
Specific embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the scope of protection of the present invention.
Embodiment 1 of the present invention proposes a media search method based on speech recognition which, as shown in Fig. 1, includes the following steps:
Step 101, obtaining a content index and metadata information of media;
Step 102, associating the content index with the metadata information to build a media knowledge base;
Step 103, parsing a collected user voice query to obtain the corresponding speech-recognition text;
Step 104, searching the media knowledge base according to the speech-recognition text.
Wherein obtaining the content index of media specifically includes:
Transcoding the received media into a unified coded format;
Marking the start and end of each program in the transcoded media to obtain a program-layer index;
Cutting each program in the program layer into fragments to obtain a fragment-layer index;
Performing speech recognition and subtitle recognition on each fragment in the fragment layer to obtain a character-layer index.
In the present embodiment, as shown in Fig. 2, content processing is performed on the media obtained from different signal sources to obtain an index of the media content. The specific steps include:
The media obtained from different signal sources are transcoded into a unified format. Media data can be collected in several ways: broadcast-television signals can be captured with a broadcast-television capture card, videos on the network can be crawled by a web crawler, or the media can be obtained in other ways, for example directly from a storage medium. The collected digitized video files of various formats are transcoded into predefined unified formats using ffmpeg or other video-transcoding software. For example, the video files after transcoding are in AVI format and the audio files in WAV format, and the transcoded media files are stored in a temporary storage area of the computer.
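The transcoding step can be sketched as follows. This is a minimal illustration rather than the embodiment's actual tooling: the specific ffmpeg flags, the 16 kHz mono audio parameters, and the `/tmp/media` output directory are assumptions, since the description names only ffmpeg and the AVI/WAV target formats.

```python
from pathlib import Path

def build_transcode_commands(src: str, out_dir: str = "/tmp/media"):
    """Build ffmpeg command lines that normalize one source file to the
    unified formats mentioned in the embodiment (AVI video, WAV audio)."""
    stem = Path(src).stem
    video_out = f"{out_dir}/{stem}.avi"
    audio_out = f"{out_dir}/{stem}.wav"
    # Re-container/re-encode the video to AVI.
    video_cmd = ["ffmpeg", "-y", "-i", src, video_out]
    # Extract the audio track as 16 kHz mono WAV, a common ASR input format
    # (the sample rate and channel count are illustrative assumptions).
    audio_cmd = ["ffmpeg", "-y", "-i", src, "-vn", "-ac", "1", "-ar", "16000", audio_out]
    return video_cmd, audio_cmd
```

The commands would be run with `subprocess.run(cmd, check=True)` in a real pipeline; here only their construction is shown.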
For media containing multiple programs, the start and end of each program are marked to obtain the program-layer index. The start and end of a program can be marked manually or automatically by computer. The automatic computer marking comprises the following steps:
Collecting the media files of all the programs that need to be marked, each file corresponding to one program;
Extracting the fingerprint features of the media-file content and saving them as corresponding templates;
Matching the media file to be marked against the templates. When a part of the media file matches a template, the matched fragment of the media file gives the start and end time, within the media file, of the program corresponding to that template.
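The automatic marking steps above can be sketched as follows, assuming a fingerprint is an abstract sequence of per-frame hash values and that matching is exact subsequence matching; the patent does not specify the fingerprint algorithm or any matching tolerance, so both are hypothetical simplifications.

```python
def match_program(media_fp, template_fp, frame_seconds=1.0):
    """Slide a program's fingerprint template over the media fingerprint
    and return the (start, end) time in seconds of the first match,
    or None if the template never matches."""
    n, m = len(media_fp), len(template_fp)
    if m == 0 or m > n:
        return None
    for i in range(n - m + 1):
        # Exact subsequence match stands in for fuzzy fingerprint matching.
        if media_fp[i:i + m] == template_fp:
            return i * frame_seconds, (i + m) * frame_seconds
    return None
```

A production system would tolerate small fingerprint differences and search all templates, keeping the best-scoring alignment per program.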
For each program, shot-fragment cutting is performed to obtain the fragment-layer index. A shot is the sequence of successive image frames recorded by a camera during the interval from being switched on to being switched off; it is the minimal physical unit of a video. Within a shot, the features of adjacent and nearby video frames are close and vary little, but at a shot transition the features of the video frames often change markedly. The steps of shot segmentation are as follows:
Select features to describe the frame image; preferably, the color RGB-space histogram of each frame is extracted as the feature of that frame.
Compute the frame difference, that is, the difference between the color RGB-space histograms of successive frames; preferably, the Euclidean distance is used as the measure.
Select a strategy to analyze these differences and determine the shot boundaries; preferably, a sliding-window detection method is used. The fragment-layer index consists of the start and end time points of each shot.
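A minimal sketch of the preferred shot-segmentation scheme: per-frame RGB histograms, Euclidean frame differences, and a sliding window that flags a cut where a difference far exceeds its neighbours. The bin count, window size, and ratio threshold are illustrative assumptions, and frames are represented as lists of (r, g, b) pixel tuples rather than decoded video.

```python
import math

def frame_hist(frame, bins=4):
    """Coarse RGB-space histogram of a frame given as (r, g, b) pixels."""
    hist = [0] * (bins * 3)
    for r, g, b in frame:
        hist[r * bins // 256] += 1
        hist[bins + g * bins // 256] += 1
        hist[2 * bins + b * bins // 256] += 1
    return hist

def shot_boundaries(frames, window=5, ratio=3.0):
    """Flag a cut before frame i+1 where the inter-frame histogram distance
    is much larger than the other distances in a sliding window."""
    hists = [frame_hist(f) for f in frames]
    diffs = [math.dist(hists[i], hists[i + 1]) for i in range(len(hists) - 1)]
    cuts = []
    for i, d in enumerate(diffs):
        lo, hi = max(0, i - window), min(len(diffs), i + window + 1)
        neighbours = [diffs[j] for j in range(lo, hi) if j != i]
        mean = sum(neighbours) / len(neighbours) if neighbours else 0.0
        if d > 0 and d > ratio * mean:
            cuts.append(i + 1)
    return cuts
```

Converting the cut indices to start/end time points (the fragment-layer index) only requires multiplying by the frame duration.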
Wherein performing speech recognition and subtitle recognition on each fragment in the fragment layer to obtain the character-layer index specifically includes:
Step 301, obtaining the recognition paths of the speech recognition and the speech-recognition text corresponding to each recognition path;
Step 302, obtaining the recognition paths of the subtitle recognition and the subtitle-recognition text corresponding to each recognition path;
Step 303, merging the speech-recognition text and the subtitle-recognition text to obtain the character-layer index.
In the present embodiment, for video fragments with speech or subtitles, speech recognition and subtitle recognition are carried out separately, and for fragments with both speech and subtitles the speech-recognition results and subtitle-recognition results are fused to obtain the character-layer index. Subtitles and speech are important clues for describing video media content. The specific steps include:
Using a continuous automatic speech recognition method, obtaining the top M best recognition paths of the speech recognition and the recognition result corresponding to each path;
Using a subtitle recognition method, obtaining the top M best recognition paths of the subtitle recognition and the recognition result corresponding to each path;
Merging the top M best recognition paths of the speech recognition and the top M best recognition paths of the subtitle recognition into a candidate-result graph;
For each candidate word set in the candidate-result graph, selecting, according to a voting-score rule, the highest-scoring word as the word for the corresponding node, finally obtaining the fused recognition result. The recognition result, together with the time point at which each word occurs, is saved as the character-layer index.
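The fusion step can be illustrated with a simplified voting scheme. A real candidate-result graph would align hypotheses of different lengths (for example as a confusion network); here the M-best paths are assumed to be already word-aligned, and each path votes with equal weight — the scoring rule and the alignment are assumptions not fixed by the description.

```python
from collections import Counter

def fuse_paths(asr_paths, ocr_paths):
    """Fuse top-M ASR and subtitle-OCR hypotheses by per-position voting.
    Each path is a list of words; all paths are assumed word-aligned."""
    paths = asr_paths + ocr_paths
    length = min(len(p) for p in paths)
    fused = []
    for pos in range(length):
        # Each candidate word set is one column; the most-voted word wins.
        votes = Counter(p[pos] for p in paths)
        fused.append(votes.most_common(1)[0][0])
    return fused
```

With per-path confidence scores available, the `Counter` update would be weighted by score instead of counting each path once.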
Wherein the metadata information includes, but is not limited to, the director, characters, subject, genre, region, and language of the media.
Wherein parsing the collected user voice query to obtain the corresponding speech-recognition text specifically includes:
Step 401, receiving the audio signal of the user voice query;
Step 402, segmenting the decoded audio signal;
Step 403, performing speech recognition on each audio segment separately to obtain a segment recognition text;
Step 404, merging the segment recognition texts of all the audio segments to obtain the speech-recognition text.
In the present embodiment, the user's voice query input about media is collected. The user's voice query input is recorded by the client recording module and, after compression and encoding, is transmitted over the network to the server for processing.
Speech recognition is performed on the user's voice query input to obtain the text result of the speech recognition. The specific steps include: receiving the audio signal from the client and decoding it (preferably, the audio is decoded to PCM format); performing silence-based endpoint detection on the decoded audio signal so as to cut the continuous audio signal into several segments; sending each audio segment separately to a distributed continuous-speech-recognition engine for parallel speech recognition; and collecting the speech-recognition result fragments of all the parallel processes and splicing them into the complete speech-recognition result.
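The server-side flow — silence-based endpoint detection, parallel recognition of the segments, and splicing — can be sketched as below. The energy threshold, the minimum silence run, and the stand-in `recognize` callable (taking the place of the distributed ASR engine) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_on_silence(samples, threshold=500, min_silence=3):
    """Cut a decoded PCM sample stream into segments wherever a run of
    at least min_silence low-energy samples occurs (silence is dropped)."""
    segments, current, silent = [], [], 0
    for s in samples:
        if abs(s) < threshold:
            silent += 1
        else:
            if silent >= min_silence and current:
                segments.append(current)
                current = []
            silent = 0
            current.append(s)
    if current:
        segments.append(current)
    return segments

def recognize_query(samples, recognize):
    """Segment the audio, recognize the segments in parallel, and splice
    the segment texts into the complete speech-recognition result."""
    segments = split_on_silence(samples)
    with ThreadPoolExecutor() as pool:
        texts = list(pool.map(recognize, segments))  # order is preserved
    return "".join(texts)
```

In the embodiment, `recognize` would be a remote call into the distributed continuous-speech-recognition engine rather than a local function.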
Wherein searching the media knowledge base according to the speech-recognition text specifically includes:
Extracting the metadata information present in the speech-recognition text according to a preset metadata dictionary;
Performing a metadata search in the media knowledge base according to the extracted metadata information;
Extracting the keyword information present in the speech-recognition text according to a preset keyword database;
Performing a keyword search in the media knowledge base according to the keyword information;
Merging the result of the metadata search with the result of the keyword search to obtain the complete search result.
In this embodiment of the present invention, semantic understanding is performed on the text result of the speech recognition, a search command against the media knowledge base is triggered, and the search result is returned to the user. The text result of the speech recognition serves as the query text; semantic understanding of the text means extracting the key, meaningful words in the text to use as query words for retrieval. This step provides two methods of extracting query words: one extracts query words based on metadata, the other extracts query words based on entities and concepts. Triggering the search command against the media knowledge base and returning the search result to the user specifically includes:
Extracting the metadata information in the text result of the speech recognition based on a predefined metadata dictionary and user query-grammar rules.
Marking the metadata in the user's new query sentence by means of the collected metadata information of film and television media.
Matching the marked user query sentence against the user query-grammar rules collected in advance to obtain the most suitable metadata marking.
Expanding the metadata information to obtain expanded metadata information. The expansion mainly consists of synonyms, related terms, and the like, derived from a knowledge graph.
Extracting keyword information such as entities and concepts from the text result of the speech recognition. Keyword information such as entities and concepts is learned from a massive corpus using machine-learning methods, and this information is then used to extract keywords such as entities and concepts from the text result of the speech recognition.
Expanding the keyword information to obtain expanded keyword information. The expansion mainly consists of synonyms, related terms, and the like, derived from a knowledge graph.
Performing a metadata search in the media knowledge base using the metadata information to obtain the metadata-based search result.
Performing a keyword search in the media knowledge base using the keyword information to obtain the keyword-based search result.
Merging the metadata-based search result and the keyword-based search result to obtain the final search result, and returning the result to the user.
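Putting the two retrieval branches together, a toy end-to-end sketch: a preset metadata dictionary maps known values to fields, a keyword list stands in for the learned keyword database, and the knowledge base is a list of records. The field names, sample entries, and the de-duplication-by-title merge policy are illustrative assumptions, and the knowledge-graph expansion of query words described above is omitted.

```python
def search_media(query_text, knowledge_base, metadata_dict, keyword_db):
    # Metadata extraction: dictionary values found in the query text.
    meta = {field: value for value, field in metadata_dict.items()
            if value in query_text}
    # Keyword extraction: keyword-database entries found in the query text.
    kws = [k for k in keyword_db if k in query_text]
    # Metadata-based search: records whose metadata matches every field.
    meta_hits = []
    if meta:
        meta_hits = [m for m in knowledge_base
                     if all(m["metadata"].get(f) == v for f, v in meta.items())]
    # Keyword-based search over the content index text.
    kw_hits = [m for m in knowledge_base
               if any(k in m["index_text"] for k in kws)]
    # Merge the two result sets, de-duplicating by title (metadata hits first).
    seen, merged = set(), []
    for m in meta_hits + kw_hits:
        if m["title"] not in seen:
            seen.add(m["title"])
            merged.append(m)
    return merged
```

A real implementation would rank rather than concatenate the two result sets, but the extract-search-merge shape is the same.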
In addition, Embodiment 2 of the present invention also proposes a media search device based on speech recognition which, as shown in Fig. 3, includes:
an acquisition module 1, an association module 2, a parsing module 3, and a search module 4;
The acquisition module 1, for obtaining a content index and metadata information of media;
The association module 2, for associating the content index and metadata information obtained by the acquisition module to build a media knowledge base;
The parsing module 3, for parsing a collected user voice query to obtain the corresponding speech-recognition text;
The search module 4, for searching the media knowledge base according to the speech-recognition text.
Wherein the acquisition module includes a transcoding unit, an indexing unit, a cutting unit, and a recognition unit:
The transcoding unit, for transcoding the received media into a unified coded format;
The indexing unit, for marking the start and end of each program in the transcoded media to obtain a program-layer index;
The cutting unit, for cutting the programs in the media into fragments to obtain a fragment-layer index;
The recognition unit, for performing speech recognition and subtitle recognition on the fragments of each program to obtain a character-layer index.
Wherein the parsing module includes a receiving unit, a decoding unit, a segmenting unit, a recognition unit, and a merging unit:
The receiving unit, for receiving the audio signal of the user voice query;
The decoding unit, for decoding the audio signal;
The segmenting unit, for segmenting the decoded audio signal;
The recognition unit, for performing speech recognition on each audio segment separately to obtain a segment recognition text;
The merging unit, for merging the segment recognition texts of all the audio segments to obtain the speech-recognition text.
Wherein the search module includes a first extraction unit, a first search unit, a second extraction unit, a second search unit, and a merging unit:
The first extraction unit, for extracting the metadata information present in the speech-recognition text according to a preset metadata dictionary;
The first search unit, for performing a metadata search in the media knowledge base according to the extracted metadata information;
The second extraction unit, for extracting the keyword information present in the speech-recognition text according to a preset keyword database;
The second search unit, for performing a keyword search in the media knowledge base according to the keyword information;
The merging unit, for merging the metadata search result of the first search unit with the keyword search result of the second search unit to obtain the complete search result.
By using the media search method and device based on speech recognition disclosed in the present invention, voice interaction is used at the front end so as to provide the user with a more convenient mode of interaction, while the media content is recognized at the back end and a corresponding knowledge base is built, finally achieving the goal of letting the user search media content by voice. Compared with traditional search, the method provides the user with voice interaction at the client, which makes the interaction more natural and convenient, and performs content-based recognition and natural-language search over the media at the server, which makes the user's search of media content more accurate.
The above embodiments are merely illustrative of the technical solutions of the present invention and are not limiting; the present invention has been described in detail with reference only to preferred embodiments. Those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or equivalently substituted without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications shall be covered by the scope of the claims of the present invention.

Claims (8)

1. A media search method based on speech recognition, characterized by including the steps of:
Obtaining a content index and metadata information of media;
Associating the content index with the metadata information to build a media knowledge base;
Parsing a collected user voice query to obtain the corresponding speech-recognition text;
Searching the media knowledge base according to the speech-recognition text;
Wherein obtaining the content index of media specifically includes:
Transcoding the received media into a unified coded format;
Marking the start and end of each program in the transcoded media to obtain a program-layer index;
Cutting each program in the program layer into fragments to obtain a fragment-layer index;
Performing speech recognition and subtitle recognition on each fragment in the fragment layer to obtain a character-layer index;
Wherein marking the start and end of each program in the transcoded media to obtain the program-layer index includes:
Collecting the media files of all the programs that need to be marked, each file corresponding to one program;
Extracting the fingerprint features of the media-file content and saving them as corresponding templates;
Matching the media file to be marked against the templates, wherein, when a part of the media file matches a template, the matched fragment of the media file gives the start and end time, within the media file, of the program corresponding to that template;
For each program, performing shot-fragment cutting to obtain the fragment-layer index, with the following steps:
Selecting features to describe the frame image, extracting the color RGB-space histogram of each frame as the feature of that frame;
Computing the frame difference, that is, the difference between the color RGB-space histograms of successive frames;
Selecting a strategy to analyze these differences and determine the shot boundaries, the fragment-layer index being the start and end time points of each shot.
2. The method according to claim 1, characterized in that performing speech recognition and subtitle recognition on each fragment in the fragment layer to obtain the character-layer index specifically includes:
Obtaining the recognition paths of the speech recognition and the speech-recognition text corresponding to each recognition path;
Obtaining the recognition paths of the subtitle recognition and the subtitle-recognition text corresponding to each recognition path;
Merging the speech-recognition text and the subtitle-recognition text to obtain the character-layer index.
3. The method according to claim 1, characterized in that the metadata information includes, but is not limited to, the director, characters, subject, genre, region, and language of the media.
4. The method according to claim 1, characterized in that parsing the collected user voice query to obtain the corresponding speech-recognition text specifically includes:
Receiving the audio signal of the user voice query;
Segmenting the decoded audio signal;
Performing speech recognition on each audio segment separately to obtain a segment recognition text;
Merging the segment recognition texts of all the audio segments to obtain the speech-recognition text.
5. The method according to claim 1, characterized in that performing a media search on the media knowledge base according to the speech recognition text specifically comprises:
Extracting the metadata information present in the speech recognition text according to a preset metadata dictionary;
Performing a metadata search in the media knowledge base according to the extracted metadata information;
Extracting the keyword information present in the speech recognition text according to a preset keyword database;
Performing a keyword search in the media knowledge base according to the keyword information;
Merging the results of the metadata search and the keyword search to obtain the complete search result.
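A minimal sketch of the two-track search in claim 5, assuming a simple substring match for dictionary extraction and a knowledge base of dicts with hypothetical `title`, `metadata` and `index_text` fields (the claim does not prescribe these structures):

```python
def extract_terms(text, dictionary):
    """Return dictionary entries that appear verbatim in the recognized text."""
    return [term for term in dictionary if term in text]

def search_media(text, knowledge_base, metadata_dict, keyword_db):
    """Metadata hits and keyword hits over the knowledge base, merged without duplicates."""
    meta_terms = extract_terms(text, metadata_dict)
    key_terms = extract_terms(text, keyword_db)
    meta_hits = {m["title"] for m in knowledge_base
                 if any(t in m["metadata"].values() for t in meta_terms)}
    key_hits = {m["title"] for m in knowledge_base
                if any(t in m["index_text"] for t in key_terms)}
    return sorted(meta_hits | key_hits)  # merged, complete search result
```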
6. A media search device based on speech recognition, characterized by comprising an acquisition module, an association module, a parsing module and a search module;
The acquisition module is configured to obtain the content index and metadata information of media;
The association module is configured to associate the content index and metadata information obtained by the acquisition module to build a media knowledge base;
The parsing module is configured to parse the collected user voice query to obtain the corresponding speech recognition text;
The search module is configured to perform a media search on the media knowledge base according to the speech recognition text;
The acquisition module comprises a transcoding unit, an indexing unit, a segmentation unit and a recognition unit;
The transcoding unit is configured to transcode the received media into a unified coding format;
The indexing unit is configured to mark the start and end points of programs in the transcoded media to obtain the index of the program layer;
The segmentation unit is configured to cut the programs in the media into segments to obtain the index of the segment layer;
The recognition unit is configured to perform speech recognition and subtitle recognition on the segments in the programs to obtain the index of the character layer;
The indexing unit is specifically configured to:
Collect the media files of all programs to be marked, each file corresponding to one program;
Extract the fingerprint features of the media file contents and save them as corresponding templates;
Match the media file to be marked against the templates, and when a part of the media file matches a template, take the matched segment of the media file as the start and end times, within the media file, of the program corresponding to that template;
The segmentation unit is specifically configured to perform shot-segment cutting for each program to obtain the index of the segment layer, with the following steps:
Selecting a feature to describe each frame image: extracting the color RGB-space histogram of each frame image as the feature of that frame image;
Calculating frame differences, i.e. computing the differences between the color RGB-space histograms of adjacent frames;
Selecting a strategy to analyze these differences and determine shot boundaries, the index of the segment layer being the start and end time points of each shot.
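The indexing unit's template matching can be illustrated with the toy sketch below. The per-window-mean "fingerprint" and exact equality test are stand-ins; a real system would use a robust audio or video fingerprint and approximate matching, which the patent does not detail.

```python
def fingerprint(signal, win=4):
    """Toy fingerprint: tuple of per-window integer means over the signal."""
    return tuple(sum(signal[i:i + win]) // win
                 for i in range(0, len(signal) - win + 1, win))

def locate_program(media, template, win=4):
    """Slide the template's fingerprint over the media file.

    Returns (start, end) sample indices of the program corresponding to the
    template, or None if no part of the media matches.
    """
    tfp = fingerprint(template, win)
    for start in range(0, len(media) - len(template) + 1):
        if fingerprint(media[start:start + len(template)], win) == tfp:
            return start, start + len(template)
    return None
```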
7. The device according to claim 6, characterized in that the parsing module comprises a receiving unit, a decoding unit, a segmentation unit, a recognition unit and a merging unit;
The receiving unit is configured to receive the audio signal of the user voice query;
The decoding unit is configured to decode the audio signal;
The segmentation unit is configured to segment the decoded audio signal;
The recognition unit is configured to perform speech recognition on each audio segment to obtain a segment recognition text;
The merging unit is configured to merge the segment recognition texts of all the audio segments to obtain the speech recognition text.
8. The device according to claim 6, characterized in that the search module comprises a first extraction unit, a first search unit, a second extraction unit, a second search unit and a merging unit;
The first extraction unit is configured to extract the metadata information present in the speech recognition text according to a preset metadata dictionary;
The first search unit is configured to perform a metadata search in the media knowledge base according to the extracted metadata information;
The second extraction unit is configured to extract the keyword information present in the speech recognition text according to a preset keyword database;
The second search unit is configured to perform a keyword search in the media knowledge base according to the keyword information;
The merging unit is configured to merge the metadata search result of the first search unit and the keyword search result of the second search unit to obtain the complete search result.
CN201310752909.8A 2013-12-31 2013-12-31 A kind of media search method and device based on speech recognition Active CN103761261B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310752909.8A CN103761261B (en) 2013-12-31 2013-12-31 A kind of media search method and device based on speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310752909.8A CN103761261B (en) 2013-12-31 2013-12-31 A kind of media search method and device based on speech recognition

Publications (2)

Publication Number Publication Date
CN103761261A CN103761261A (en) 2014-04-30
CN103761261B true CN103761261B (en) 2017-07-28

Family

ID=50528498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310752909.8A Active CN103761261B (en) 2013-12-31 2013-12-31 A kind of media search method and device based on speech recognition

Country Status (1)

Country Link
CN (1) CN103761261B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105279168B (en) 2014-06-24 2019-04-23 华为技术有限公司 Support data query method, open platform and the user terminal of natural language
CN106162323A (en) * 2015-03-26 2016-11-23 无锡天脉聚源传媒科技有限公司 A kind of video data handling procedure and device
CN104951536B (en) * 2015-06-18 2021-01-22 百度在线网络技术(北京)有限公司 Searching method and device
CN105635849B (en) * 2015-12-25 2018-06-05 网易传媒科技(北京)有限公司 Text display method and device when multimedia file plays
US9848215B1 (en) * 2016-06-21 2017-12-19 Google Inc. Methods, systems, and media for identifying and presenting users with multi-lingual media content items
CN106294637A (en) * 2016-08-03 2017-01-04 王晓光 Realize the method and system of phonetic search
WO2018023482A1 (en) * 2016-08-03 2018-02-08 王晓光 Method and system for implementing voice search
CN106294692A (en) * 2016-08-06 2017-01-04 马岩 Realize the method and system of phonetic search
CN107071542B (en) * 2017-04-18 2020-07-28 百度在线网络技术(北京)有限公司 Video clip playing method and device
CN107424640A (en) * 2017-07-27 2017-12-01 上海与德科技有限公司 A kind of audio frequency playing method and device
CN107919127B (en) * 2017-11-27 2021-04-06 北京地平线机器人技术研发有限公司 Voice processing method and device and electronic equipment
CN108009303B (en) * 2017-12-30 2021-09-14 北京百度网讯科技有限公司 Search method and device based on voice recognition, electronic equipment and storage medium
CN108536672A (en) * 2018-03-12 2018-09-14 平安科技(深圳)有限公司 Intelligent robot Training Methodology, device, computer equipment and storage medium
CN108877781B (en) * 2018-06-13 2021-07-13 东方梦幻文化产业投资有限公司 Method and system for searching film through intelligent voice
CN109040779B (en) * 2018-07-16 2019-11-26 腾讯科技(深圳)有限公司 Caption content generation method, device, computer equipment and storage medium
CN111259897A (en) * 2018-12-03 2020-06-09 杭州翼心信息科技有限公司 Knowledge-aware text recognition method and system
CN109783693B (en) * 2019-01-18 2021-05-18 广东小天才科技有限公司 Method and system for determining video semantics and knowledge points
CN109858427A (en) * 2019-01-24 2019-06-07 广州大学 A kind of corpus extraction method, device and terminal device
CN110287384B (en) * 2019-06-10 2021-08-31 北京百度网讯科技有限公司 Intelligent service method, device and equipment
CN110674316B (en) * 2019-09-27 2022-05-31 腾讯科技(深圳)有限公司 Data conversion method and related device
CN111128183B (en) * 2019-12-19 2023-03-17 北京搜狗科技发展有限公司 Speech recognition method, apparatus and medium
CN111368100A (en) * 2020-02-28 2020-07-03 青岛聚看云科技有限公司 Media asset merging method and device thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102650993A (en) * 2011-02-25 2012-08-29 北大方正集团有限公司 Index establishing and searching methods, devices and systems for audio-video file
CN102740014A (en) * 2011-04-07 2012-10-17 青岛海信电器股份有限公司 Voice controlled television, television system and method for controlling television through voice
CN103455642A (en) * 2013-10-10 2013-12-18 三星电子(中国)研发中心 Method and device for multi-media file retrieval

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070027844A1 (en) * 2005-07-28 2007-02-01 Microsoft Corporation Navigating recorded multimedia content using keywords or phrases


Also Published As

Publication number Publication date
CN103761261A (en) 2014-04-30

Similar Documents

Publication Publication Date Title
CN103761261B (en) A kind of media search method and device based on speech recognition
CN101650958B (en) Extraction method and index establishment method of movie video scene fragment
Qi et al. Integrating visual, audio and text analysis for news video
CN106878632B (en) Video data processing method and device
KR100828166B1 (en) Method of extracting metadata from result of speech recognition and character recognition in video, method of searching video using metadta and record medium thereof
CN111931775B (en) Method, system, computer device and storage medium for automatically acquiring news headlines
KR101516995B1 (en) Context-based VOD Search System And Method of VOD Search Using the Same
CN110012349A (en) A kind of news program structural method and its structuring frame system end to end
CN112163122A (en) Method and device for determining label of target video, computing equipment and storage medium
CN101872346A (en) Method for generating video navigation system automatically
JP2001515634A (en) Multimedia computer system having story segmentation function and its operation program
Dumont et al. Automatic story segmentation for tv news video using multiple modalities
CN113766314B (en) Video segmentation method, device, equipment, system and storage medium
CN103730115A (en) Method and device for detecting keywords in voice
CN110781328A (en) Video generation method, system, device and storage medium based on voice recognition
US7349477B2 (en) Audio-assisted video segmentation and summarization
Husain et al. Multimodal fusion of speech and text using semi-supervised LDA for indexing lecture videos
CN116361510A (en) Method and device for automatically extracting and retrieving scenario segment video established by utilizing film and television works and scenario
TW201039149A (en) Robust algorithms for video text information extraction and question-answer retrieval
CN114547373A (en) Method for intelligently identifying and searching programs based on audio
CN113992944A (en) Video cataloging method, device, equipment, system and medium
Ghosh et al. Multimodal indexing of multilingual news video
KR20030014804A (en) Apparatus and Method for Database Construction of News Video based on Closed Caption and Method of Content-based Retrieval/Serching It
Hauptmann et al. Artificial intelligence techniques in the interface to a digital video library
WO2011039773A2 (en) Tv news analysis system for multilingual broadcast channels

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent for invention or patent application
CB02 Change of applicant information

Address after: Room 5331, 5th Floor, China Resources Building, No. 6 Zhongguancun South Third Street, Haidian District, Beijing 100080

Applicant after: Beijing Zidong Ruiyi Voice Technology Co., Ltd. (rendered in the original translation as "The purple winter of Beijing is voice technology company limited with keen determination")

Address before: Room 409, Nanjing Building, No. 35 Xueyuan Road, Haidian District, Beijing 100191, China

Applicant before: Beijing Zidong Ruiyi Voice Technology Co., Ltd.

CB03 Change of inventor or designer information

Inventor after: Gao Peng

Inventor after: Chen Jiansong

Inventor before: Gao Peng

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: GAO PENG TO: GAO PENG CHEN JIANSONG

C53 Correction of patent for invention or patent application
CB03 Change of inventor or designer information

Inventor after: Gao Peng

Inventor before: Gao Peng

Inventor before: Chen Jiansong

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: GAO PENG CHEN JIANSONG TO: GAO PENG

GR01 Patent grant
GR01 Patent grant