CN109982105A - Content retrieval system and method for broadcast platform - Google Patents


Info

Publication number
CN109982105A
Authority
CN
China
Prior art keywords
content
information
metadata
server
distributed storage
Legal status
Pending
Application number
CN201711440357.1A
Other languages
Chinese (zh)
Inventor
许颖浩
袁政
陆伟
Current Assignee
WENGUANG INTERDYANMIC TV CO Ltd SHANGHAI
Original Assignee
WENGUANG INTERDYANMIC TV CO Ltd SHANGHAI
Priority date
2017-12-27
Filing date
2017-12-27
Publication date
2019-07-05
Application filed by WENGUANG INTERDYANMIC TV CO Ltd SHANGHAI


Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
                        • G06F16/43 Querying
                    • G06F16/90 Details of database functions independent of the retrieved data types
                        • G06F16/95 Retrieval from the web
                            • G06F16/951 Indexing; Web crawling techniques
    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
                    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
                        • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
                            • H04N21/232 Content retrieval operation locally within server, e.g. reading video streams from disk arrays
                        • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
                            • H04N21/254 Management at additional data server, e.g. shopping server, rights management server
                    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
                        • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                            • H04N21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
                    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
                        • H04N21/85 Assembly of content; Generation of multimedia applications
                            • H04N21/854 Content authoring
                                • H04N21/8543 Content authoring using a description language, e.g. Multimedia and Hypermedia information coding Expert Group [MHEG], eXtensible Markup Language [XML]

Abstract

The invention discloses a content retrieval system and method for a broadcast platform. The system includes a full-text search server, a web server, a database and a distributed data storage layer. The database is connected to both the full-text search server and the distributed data storage layer, and the full-text search server and the distributed data storage layer are accessed through the web server. The distributed data storage layer holds message-layer metadata and content-layer metadata; the content-layer metadata further comprises basic content-object information, content-object feature information and content-entity information. The database parses the XML information of the content-layer metadata and stores it. The database submits the 3D content metadata description files to the full-text retrieval service system, which builds an index over the full text of the 3D metadata. The invention adapts the retrieval system to the characteristics of stereoscopic television and thereby improves retrieval efficiency.

Description

Content retrieval system and method for broadcast platform
Technical field
The present invention relates to broadcast platform systems and methods, and more specifically to a content retrieval system and method for a broadcast platform.
Background
The main difference between stereoscopic television and 3D film lies in production cost and broadcast-time requirements. A 3D film focuses on program content, stereoscopic scenes and the like; producing one requires a large investment and a long processing cycle, and a program typically runs about 120 minutes. Television, by contrast, is a medium that must broadcast different programs continuously, so it cannot devote film-level cost and investment to program production, and depending on the content it must also support live broadcasting. The same applies to 3D television: high-cost, high-investment program production is not feasible, while live stereoscopic broadcasting is a business form in urgent demand. Under these conditions, how to guarantee the stereoscopic visual effect, how to keep stereoscopic TV terminals compatible with ordinary TV, and how to deliver stereoscopic TV services over existing networks are problems that research on 3D interactive television system architecture must solve together.
For a broadcast platform that carries stereoscopic television, storing, calling up and playing stereoscopic content requires a content retrieval system suited to stereoscopic television so that such content can be retrieved quickly. However, current broadcast-platform retrieval systems are built for ordinary television. Applied directly to stereoscopic content, such a system cannot reflect what distinguishes that content; moreover, because stereoscopic content files are large, using an existing retrieval system inevitably lengthens retrieval time and lowers retrieval efficiency.
Summary of the invention
In view of the above problems in the prior art, the object of the present invention is to provide a content retrieval system and method for a broadcast platform.
To achieve the above object, the present invention adopts the following technical solution:
A content retrieval system for a broadcast platform, comprising a full-text search server, a web server, a database and a distributed data storage layer. The database is connected to both the full-text search server and the distributed data storage layer, and the full-text search server and the distributed data storage layer are accessed through the web server. The distributed data storage layer includes message-layer metadata and content-layer metadata; the content-layer metadata further comprises basic content-object information, content-object feature information and content-entity information. The database parses the XML information of the content-layer metadata and stores it. The database submits the 3D content metadata description files to the full-text retrieval service system, which builds an index over the full text of the 3D metadata.
Further, the content-object feature information includes audio feature information, video feature information and streaming-media feature information.
Further, the content-entity information includes basic information, audio information, video information, picture information and streaming-media information.
Further, the full-text search server includes a main index server and an incremental index server; the incremental index server receives data updates, and the data updates are synchronized to the distributed data storage layer.
To achieve the above object, the present invention also adopts the following technical solution:
A content retrieval method for a broadcast platform, comprising: building a full-text search server, a web server, a database and a distributed data storage layer, wherein the database is connected to both the full-text search server and the distributed data storage layer, and the full-text search server and the distributed data storage layer are accessed through the web server; building the distributed data storage layer, comprising message-layer metadata and content-layer metadata, wherein the content-layer metadata further comprises basic content-object information, content-object feature information and content-entity information; parsing the XML information of the content-layer metadata and storing it in the database; and submitting the 3D content metadata description files to the full-text retrieval service system, which builds an index over the full text of the 3D metadata.
Further, the content-object feature information includes audio feature information, video feature information and streaming-media feature information.
Further, the content-entity information includes basic information, audio information, video information, picture information and streaming-media information.
Further, a main index server and an incremental index server are built in the full-text search server; data updates are applied to the incremental index server and synchronized to the distributed data storage layer.
With the above technical solution, the content retrieval system and method for a broadcast platform of the invention adapt the retrieval system to the characteristics of stereoscopic television and thereby improve retrieval efficiency.
Brief description of the drawings
Fig. 1 is a diagram of the metadata hierarchy;
Fig. 2 is an architecture diagram of the retrieval system;
Fig. 3 is a flowchart of the retrieval method.
Detailed description of embodiments
The technical solution of the present invention is further described below with reference to the drawings and embodiments.
Referring to Fig. 1, the present invention first discloses a content retrieval system for a broadcast platform, intended for retrieving 3D television (stereoscopic television) program content. The idea of the invention is to make full use of the specific attributes of 3D content metadata (the color, spatial layout, motion and depth-of-field features of the video) to implement catalogue management and workflow configuration management of content, and to use those 3D-specific attributes to perform efficient full-text search over 3D content.
As shown in Fig. 1, 3D content metadata is described with an XML tree structure. The tree structure supports hierarchical organization of the descriptive information, and optional nodes let the metadata schema adapt to different types of content. Based on the overall requirements that the supporting environment places on content metadata, content metadata is divided into three parts: a core set, a general optional set and a category superset.
Core set: the attribute tags that every type of content must have, mainly information such as the content provider and the content identifier.
General optional set: attributes related to the content itself that apply to all types of content, such as the content producer, the synopsis (a brief description), the validity period of the content and whether the content is encrypted, as well as 3D attributes such as the color, spatial layout, motion and depth-of-field features of the video.
Category superset: attributes defined separately for each kind of content according to its characteristics and closely related to them, supplemented by some basic, necessary category-specific attributes. For film content, for example, extended attributes such as cast, director, theme song, poster and film genre are provided. The superset differs considerably from one content type to another.
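The three-part schema above can be illustrated with a small reader that insists on the core set and treats everything else as optional. This is a minimal sketch: the element names (`ContentMetadata`, `CoreSet`, `GeneralOptionalSet`, `Stereo3D`, `CategorySuperset` and their children) are hypothetical, since the patent does not publish its XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record for a 3D film; element names are illustrative only.
SAMPLE = """<ContentMetadata>
  <CoreSet>
    <Provider>Example Provider</Provider>
    <ContentID>3D-000001</ContentID>
  </CoreSet>
  <GeneralOptionalSet>
    <Producer>Example Studio</Producer>
    <Synopsis>A short description.</Synopsis>
    <Encrypted>false</Encrypted>
    <Stereo3D>
      <Color>warm</Color>
      <SpatialLayout>side-by-side</SpatialLayout>
      <Motion>high</Motion>
      <DepthOfField>shallow</DepthOfField>
    </Stereo3D>
  </GeneralOptionalSet>
  <CategorySuperset type="film">
    <Actor>Actor A</Actor>
    <Director>Director B</Director>
  </CategorySuperset>
</ContentMetadata>"""

REQUIRED_CORE = ("Provider", "ContentID")


def read_metadata(xml_text):
    root = ET.fromstring(xml_text)
    core = root.find("CoreSet")
    # The core set is mandatory for every content type.
    if core is None or any(core.findtext(tag) is None for tag in REQUIRED_CORE):
        raise ValueError("core set is incomplete")
    record = {tag: core.findtext(tag) for tag in REQUIRED_CORE}
    # Optional nodes are read only when present, so one schema fits all types.
    general = root.find("GeneralOptionalSet")
    if general is not None:
        stereo = general.find("Stereo3D")
        if stereo is not None:
            record["3d"] = {child.tag: child.text for child in stereo}
    superset = root.find("CategorySuperset")
    if superset is not None:
        record["category"] = superset.get("type")
        # A tag may repeat (e.g. several actors), so values are kept as lists.
        extended = {}
        for child in superset:
            extended.setdefault(child.tag, []).append(child.text)
        record["extended"] = extended
    return record
```

A record that carries only the core set still parses, which is the point of making the other two parts optional nodes.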
3D content metadata items are generally divided into message-layer metadata and content-layer metadata. Content-layer metadata includes basic content-object information, content-object feature information (audio feature information, video feature information and streaming-media feature information) and content-entity information. The hierarchy and the metadata items of each layer are shown in Fig. 1.
Specifically, the basic content-object information is the first independent sub-layer of the content-layer metadata.
The content-object feature information is the second independent sub-layer of the content-layer metadata and further comprises audio feature information, video feature information and streaming-media feature information. The audio feature information includes (audio) basic information; the video feature information further comprises (video) basic information, (video) extension information, (video) marker (cue-point) information and (video) clip-segmentation information; the streaming-media feature information further comprises (streaming-media) basic information and program information.
The content-entity information is the third independent sub-layer of the content-layer metadata and further comprises (content-entity) basic information, (content-entity) audio information, (content-entity) video information, (content-entity) picture information and (content-entity) streaming-media information.
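The layer hierarchy just described maps naturally onto nested record types. The sketch below is one possible in-memory model, assuming free-form name/value dictionaries for each information group and representing cue points and clip segments as times in seconds; none of these field names or representations come from the patent.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Tuple

Info = Dict[str, str]  # assumed: free-form name -> value pairs


@dataclass
class AudioFeatures:
    basic: Info = field(default_factory=dict)


@dataclass
class VideoFeatures:
    basic: Info = field(default_factory=dict)
    extension: Info = field(default_factory=dict)
    markers: List[float] = field(default_factory=list)              # cue points (s), assumed
    clips: List[Tuple[float, float]] = field(default_factory=list)  # (start, end) segments, assumed


@dataclass
class StreamingFeatures:
    basic: Info = field(default_factory=dict)
    program: Info = field(default_factory=dict)


@dataclass
class ContentObjectFeatures:  # second sub-layer
    audio: AudioFeatures = field(default_factory=AudioFeatures)
    video: VideoFeatures = field(default_factory=VideoFeatures)
    streaming: StreamingFeatures = field(default_factory=StreamingFeatures)


@dataclass
class ContentEntity:  # third sub-layer
    basic: Info = field(default_factory=dict)
    audio: Info = field(default_factory=dict)
    video: Info = field(default_factory=dict)
    picture: Info = field(default_factory=dict)
    streaming: Info = field(default_factory=dict)


@dataclass
class ContentLayer:
    object_basic: Info = field(default_factory=dict)  # first sub-layer
    object_features: ContentObjectFeatures = field(default_factory=ContentObjectFeatures)
    entity: ContentEntity = field(default_factory=ContentEntity)


@dataclass
class Metadata3D:
    message_layer: Info = field(default_factory=dict)
    content_layer: ContentLayer = field(default_factory=ContentLayer)
```

`dataclasses.asdict` turns an instance into nested dictionaries whose keys follow the sub-layer order, which is convenient when serializing back to XML or JSON.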
The content service management system first parses the metadata XML of the content and stores it in a relational database management system, which makes it easy for operators to query and modify. When metadata needs to be sent to another system, a conforming metadata XML file is regenerated from the information in the database.
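A minimal sketch of that round trip (parse XML, store rows, let an operator edit them, regenerate XML) is shown below. SQLite stands in for the relational database, and the `metadata` table and the flat `<Metadata id="...">` layout are hypothetical; nested nodes would need a path column, which is omitted here.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_metadata(conn, xml_text):
    """Flatten <Metadata id="..."><Field>value</Field>...</Metadata> into rows."""
    root = ET.fromstring(xml_text)
    content_id = root.get("id")
    conn.executemany(
        "INSERT INTO metadata (content_id, name, value) VALUES (?, ?, ?)",
        [(content_id, child.tag, child.text or "") for child in root],
    )
    return content_id


def export_metadata(conn, content_id, root_tag="Metadata"):
    """Regenerate an XML document for another system from the stored rows."""
    root = ET.Element(root_tag, id=content_id)
    rows = conn.execute(
        "SELECT name, value FROM metadata WHERE content_id = ? ORDER BY rowid",
        (content_id,),
    )
    for name, value in rows:
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (content_id TEXT, name TEXT, value TEXT)")
cid = store_metadata(conn, '<Metadata id="3D-1"><Title>Avatar</Title><Depth>shallow</Depth></Metadata>')
# An operator corrects a value with ordinary SQL before re-export.
conn.execute(
    "UPDATE metadata SET value = ? WHERE content_id = ? AND name = ?",
    ("medium", cid, "Depth"),
)
print(export_metadata(conn, cid))
```

Keeping the database as the source of truth means each downstream system can receive XML in the shape it expects without the operator editing files by hand.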
The 3D content metadata description files are submitted to the full-text retrieval service system, which builds an index over the full text of the 3D metadata and provides full-text retrieval of 3D content metadata.
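One way to submit description files to Sphinx is its xmlpipe2 source format, which Sphinx reads through a source declared with `type = xmlpipe2` and an `xmlpipe_command`. The generator below is a sketch under that assumption; the patent does not say which Sphinx data source it uses, and the field names in the usage are hypothetical (they must be valid XML element names).

```python
from xml.sax.saxutils import escape, quoteattr


def to_xmlpipe2(records, fields):
    """Render metadata records as a Sphinx xmlpipe2 document stream.

    records: iterable of (doc_id, {field_name: text}); doc_id must be a
    positive integer, and Sphinx returns it as the ID of each match.
    """
    out = ['<?xml version="1.0" encoding="utf-8"?>', "<sphinx:docset>", "<sphinx:schema>"]
    # Declare each full-text field once in the schema.
    out += [f"<sphinx:field name={quoteattr(name)}/>" for name in fields]
    out.append("</sphinx:schema>")
    for doc_id, values in records:
        out.append(f'<sphinx:document id="{int(doc_id)}">')
        for name in fields:
            # Escape &, < and > so metadata text cannot break the stream.
            out.append(f"<{name}>{escape(values.get(name, ''))}</{name}>")
        out.append("</sphinx:document>")
    out.append("</sphinx:docset>")
    return "\n".join(out)
```

Using the same integer as the Sphinx document ID and as the storage key is what later lets the web server resolve search hits directly.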
To improve metadata retrieval efficiency, the 3D content metadata management and retrieval subsystem uses Sphinx as the search engine on its full-text search server and uses Tokyo Tyrant to store the search-engine data. The implementation architecture of the 3D content metadata management and retrieval subsystem is shown in Fig. 2. The main architecture of the content retrieval system for a broadcast platform of the invention comprises a full-text search server, a web server, a database and a distributed data storage layer.
Referring to Fig. 2, the database is connected to both the full-text search server and the distributed data storage layer, and the full-text search server and the distributed data storage layer are accessed through the web server. The distributed data storage layer includes message-layer metadata and content-layer metadata; the content-layer metadata further comprises basic content-object information, content-object feature information and content-entity information. The database parses the XML information of the content-layer metadata and stores it. The database submits the 3D content metadata description files to the full-text retrieval service system, which builds an index over the full text of the 3D metadata.
Specifically, the full-text search server uses Sphinx, a distributed index server composed of a main index and an incremental index. New data is first added to the incremental index server and then periodically merged from the incremental index server into the main index server. A single Sphinx index can hold up to 100 million records, and queries over 10 million records return in milliseconds. Sphinx builds indexes quickly: indexing 1 million records takes 3 to 4 minutes, and an incremental index containing the latest 100,000 records can be rebuilt in a few tens of seconds.
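The main-plus-incremental arrangement can be modelled in a few lines. This toy class only illustrates the idea (small cheap delta, periodic fold-in, queries that consult both); it is not Sphinx's API, ignores deletions and updates, and uses whitespace tokenization. In Sphinx itself the fold-in is typically done with `indexer --merge` or by rebuilding the main index.

```python
from collections import defaultdict


class MainDeltaIndex:
    """Toy main index plus a small incremental (delta) index."""

    def __init__(self):
        self.main = defaultdict(set)   # term -> doc IDs (large, rebuilt rarely)
        self.delta = defaultdict(set)  # term -> doc IDs (small, rebuilt often)

    def add(self, doc_id, text):
        # New documents touch only the delta, so updates stay cheap.
        for term in text.lower().split():
            self.delta[term].add(doc_id)

    def merge_delta(self):
        # Scheduled job: fold the delta into the main index, then reset it.
        for term, ids in self.delta.items():
            self.main[term] |= ids
        self.delta.clear()

    def search(self, term):
        # Query both, so new data is searchable before the next merge.
        term = term.lower()
        return self.main.get(term, set()) | self.delta.get(term, set())
```

The cost model matches the figures quoted above: rebuilding a delta of recent records is fast, while the large main index is touched only at merge time.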
Sphinx supports unigram (single-character) segmentation, which is located in the index update module. For CJK (Chinese, Japanese and Korean) text, which must be UTF-8 encoded, the Sphinx indexing engine supports single-character segmentation. Given the text [3D电影阿凡达] ("3D film Avatar"), Sphinx splits it into [3D 电 影 阿 凡 达] and builds an inverted index on each character. As a result, a string that combines characters from this text into something that is not a real word, such as [影阿], is also matched. Queries therefore need to be quoted: searching ["阿凡达"] matches those characters only when they appear consecutively, so text in which they are not adjacent is not returned. Chinese word segmentation, located in the search query module, is also used to process queries; Sphinx supports it as well. When searching "3D电影阿凡达" or "3D电影饥饿游戏" ("3D film The Hunger Games"), an independent Chinese automatic word segmenter is called first and splits them into "3D电影 阿凡达" and "3D电影 饥饿游戏". The space-separated words are then quoted and sent to Sphinx as ["3D电影" "阿凡达"] or ["3D电影" "饥饿游戏"], which finds the matching records. Words can be added to, deleted from or changed in the Chinese segmentation dictionary without rebuilding the entire Sphinx index.
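The difference between unquoted unigram matching, quoted phrase matching and dictionary-based segmentation can be sketched with a toy positional index. Everything here is illustrative: `unigram_tokens` only approximates Sphinx's CJK unigram mode, and forward maximum matching is a swapped-in, commonly used segmentation technique, since the patent does not name its segmenter.

```python
from collections import defaultdict


def unigram_tokens(text):
    """Approximate CJK unigram mode: runs of ASCII letters/digits (e.g. "3D")
    stay whole; every other non-space character becomes its own token."""
    tokens, buf = [], ""
    for ch in text:
        if ch.isascii() and ch.isalnum():
            buf += ch
            continue
        if buf:
            tokens.append(buf.lower())
            buf = ""
        if not ch.isspace():
            tokens.append(ch)
    if buf:
        tokens.append(buf.lower())
    return tokens


class UnigramIndex:
    """Toy positional inverted index: token -> {doc_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, text):
        for pos, tok in enumerate(unigram_tokens(text)):
            self.postings[tok][doc_id].append(pos)

    def bag_search(self, query):
        """Unquoted query: each token must appear somewhere in the document."""
        id_sets = [set(self.postings.get(t, {})) for t in unigram_tokens(query)]
        return set.intersection(*id_sets) if id_sets else set()

    def phrase_search(self, phrase):
        """Quoted query: the tokens must appear at consecutive positions."""
        toks = unigram_tokens(phrase)
        hits = set()
        for doc_id in self.bag_search(phrase):
            for start in self.postings[toks[0]][doc_id]:
                if all(start + k in self.postings[t][doc_id] for k, t in enumerate(toks)):
                    hits.add(doc_id)
                    break
        return hits


def fmm_segment(text, dictionary):
    """Forward maximum matching: take the longest dictionary word at each
    position, falling back to a single character."""
    max_len = max(len(w) for w in dictionary)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def to_sphinx_query(words):
    """Quote each segmented word so it is matched as a contiguous phrase."""
    return " ".join(f'"{w}"' for w in words)
```

With documents "3D电影阿凡达", "3D电影饥饿游戏" and "阿里影视", an unquoted [影阿] matches the first and third, while the quoted form matches only the first. It still matches that one because 影 and 阿 happen to be adjacent in 电影阿凡达, which is why the query is segmented into dictionary words such as "3D电影" and "饥饿游戏" before quoting.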
The search-engine data is stored in Tokyo Tyrant, a distributed data cache and store. Like Memcached, it can keep key-value pairs in memory, speeding up access. Because Tokyo Tyrant is the network interface of Tokyo Cabinet, a high-performance file-based database, the system can easily be scaled out later, and this non-relational database stores the large volume of content to be retrieved. A single Tokyo Tyrant server supports 10,000 requests per second. Its keys are the same as the Sphinx document IDs.
The full-text search server includes a main index server and an incremental index server; the incremental index server receives data updates, and the updates are synchronized to the distributed data storage layer. MySQL stores the data in a main table extended by an incremental table.
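The "distributed" part of the key-value layer can be sketched as several stores addressed by a stable hash of the Sphinx document ID. Each node is modelled as a Python dict rather than a real Tokyo Tyrant connection, and hash-modulo sharding is an assumption; the patent does not describe how keys are spread across servers.

```python
import hashlib


class ShardedKVStore:
    """Stand-in for several Tokyo Tyrant nodes, each modelled as a dict.

    Keys are Sphinx document IDs, so a search hit resolves to a record
    without touching the relational database.
    """

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # Stable hash: every web server maps a given key to the same node.
        digest = hashlib.md5(str(key).encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:4], "big") % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get_many(self, keys):
        # Keep the caller's order (e.g. Sphinx relevance order); skip misses.
        found = []
        for key in keys:
            value = self._node_for(key).get(key)
            if value is not None:
                found.append(value)
        return found
```

Plain modulo sharding remaps most keys when a node is added; consistent hashing is the usual alternative if the cluster is expected to grow.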
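The main-table/incremental-table arrangement on the database side can be sketched in SQL. SQLite stands in for MySQL, and the table names and the move-then-delete merge are assumptions; the patent only states that MySQL extends storage with a main table and an incremental table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content_main  (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE content_delta (id INTEGER PRIMARY KEY, title TEXT);
""")


def insert_content(doc_id, title):
    # New rows land only in the small incremental table.
    conn.execute("INSERT INTO content_delta (id, title) VALUES (?, ?)", (doc_id, title))


def rows_for_delta_index():
    # Source query for the incremental index: just the recent rows.
    return conn.execute("SELECT id, title FROM content_delta ORDER BY id").fetchall()


def merge_delta_into_main():
    # Periodic job, run alongside merging the delta index into the main index;
    # both statements commit together or not at all.
    with conn:
        conn.execute("INSERT OR REPLACE INTO content_main SELECT id, title FROM content_delta")
        conn.execute("DELETE FROM content_delta")


def all_rows():
    # Readers see both tables until the next merge.
    return conn.execute(
        "SELECT id, title FROM content_main UNION ALL "
        "SELECT id, title FROM content_delta ORDER BY id"
    ).fetchall()
```

Because the incremental table stays small, the incremental index can be rebuilt from it frequently, while the main table is only appended to at merge time.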
Correspondingly, the invention also discloses a content retrieval method for a broadcast platform that corresponds to the above system architecture. As shown in Fig. 3, it mainly comprises the following steps:
S1: Build a full-text search server, a web server, a database and a distributed data storage layer, wherein the database is connected to both the full-text search server and the distributed data storage layer, and the full-text search server and the distributed data storage layer are accessed through the web server.
S2: Build the distributed data storage layer, comprising message-layer metadata and content-layer metadata, wherein the content-layer metadata further comprises basic content-object information, content-object feature information and content-entity information.
S3: Parse the XML information of the content-layer metadata and store it in the database.
S4: Submit the 3D content metadata description files to the full-text retrieval service system, and build an index over the full text of the 3D metadata.
Further, the retrieval process is as follows:
When website data is updated in the database, the data is also written to the incremental index of the full-text search server and synchronously updated in the search engine's distributed data store; both use the same index ID. Sphinx builds the index over the data to enable full-text search, and Tokyo Tyrant serves the data quickly.
When a client retrieves content through the web server, the web server first sends a retrieval request to Sphinx. Sphinx retrieves the list of index IDs of the matching data and returns it to the web server. The web server sends the ID list to Tokyo Tyrant, which returns the data for those index IDs to the web server, and the web server returns the data to the client, completing the retrieval.
Those of ordinary skill in the art should appreciate that the above embodiments merely illustrate the present invention and do not limit it; changes and modifications to the embodiments described above that remain within the spirit of the invention all fall within the scope of the claims of the present invention.
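The update and query paths described in the two paragraphs above can be traced end to end with two stand-ins: a term-to-ID index playing Sphinx and a dict playing Tokyo Tyrant. The class and method names are hypothetical, the relational database write is omitted, and tokenization is plain whitespace splitting; the sketch only shows that both stores share one ID and that the search engine returns IDs, not records.

```python
from collections import defaultdict


class SearchEngine:
    """Stand-in for Sphinx: maps terms to document IDs only."""

    def __init__(self):
        self.postings = defaultdict(set)

    def index(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def query(self, text):
        terms = text.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(ids)


class WebServer:
    def __init__(self, engine, kv_store):
        self.engine = engine
        self.kv = kv_store  # stand-in for Tokyo Tyrant: doc ID -> record

    def on_update(self, doc_id, record):
        # Update path: the same ID goes into the index and the key-value store.
        self.engine.index(doc_id, record["title"])
        self.kv[doc_id] = record

    def handle_search(self, text):
        ids = self.engine.query(text)                      # 1. search returns IDs
        return [self.kv[i] for i in ids if i in self.kv]   # 2. fetch records by ID
```

Serving records from the key-value store keeps the search engine's index small and avoids a relational query on every search.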

Claims (8)

1. A content retrieval system for a broadcast platform, characterized by comprising:
a full-text search server, a web server, a database and a distributed data storage layer, wherein the database is connected to both the full-text search server and the distributed data storage layer, and the full-text search server and the distributed data storage layer are accessed through the web server;
the distributed data storage layer comprises message-layer metadata and content-layer metadata, and the content-layer metadata further comprises basic content-object information, content-object feature information and content-entity information;
the database parses the XML information of the content-layer metadata and stores it;
the database submits 3D content metadata description files to the full-text retrieval service system, and an index is built over the full text of the 3D metadata.
2. The content retrieval system for a broadcast platform according to claim 1, characterized in that:
the content-object feature information includes audio feature information, video feature information and streaming-media feature information.
3. The content retrieval system for a broadcast platform according to claim 1, characterized in that:
the content-entity information includes basic information, audio information, video information, picture information and streaming-media information.
4. The content retrieval system for a broadcast platform according to claim 1, characterized in that:
the full-text search server comprises a main index server and an incremental index server, and the incremental index server receives data updates and synchronizes the data updates to the distributed data storage layer.
5. A content retrieval method for a broadcast platform, characterized by comprising:
building a full-text search server, a web server, a database and a distributed data storage layer, wherein the database is connected to both the full-text search server and the distributed data storage layer, and the full-text search server and the distributed data storage layer are accessed through the web server;
building the distributed data storage layer, comprising message-layer metadata and content-layer metadata, wherein the content-layer metadata further comprises basic content-object information, content-object feature information and content-entity information;
parsing the XML information of the content-layer metadata and storing it in the database;
submitting 3D content metadata description files to the full-text retrieval service system, and building an index over the full text of the 3D metadata.
6. The content retrieval method for a broadcast platform according to claim 5, characterized in that:
the content-object feature information includes audio feature information, video feature information and streaming-media feature information.
7. The content retrieval method for a broadcast platform according to claim 5, characterized in that:
the content-entity information includes basic information, audio information, video information, picture information and streaming-media information.
8. The content retrieval method for a broadcast platform according to claim 5, characterized in that:
a main index server and an incremental index server are built in the full-text search server, data updates are applied to the incremental index server, and the data updates are synchronized to the distributed data storage layer.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711440357.1A CN109982105A (en) 2017-12-27 2017-12-27 Content retrieval system and method for broadcast platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711440357.1A CN109982105A (en) 2017-12-27 2017-12-27 Content retrieval system and method for broadcast platform

Publications (1)

Publication Number Publication Date
CN109982105A (en) 2019-07-05

Family

ID=67071365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711440357.1A Pending CN109982105A (en) 2017-12-27 2017-12-27 Content retrieval system and method for broadcast platform

Country Status (1)

Country Link
CN (1) CN109982105A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021855A (en) * 2006-10-11 2007-08-22 鲍东山 Video searching system based on content
CN101520800A (en) * 2009-03-27 2009-09-02 华中科技大学 Cryptogram-based safe full-text indexing and retrieval system
US20100257049A1 (en) * 2009-04-03 2010-10-07 Avichai Flombaum System and method for identifying and retrieving targeted advertisements or other related documents
US20110218997A1 (en) * 2010-03-08 2011-09-08 Oren Boiman Method and system for browsing, searching and sharing of personal video by a non-parametric approach
CN102831253A (en) * 2012-09-25 2012-12-19 北京科东电力控制系统有限责任公司 Distributed full-text retrieval system
US8948515B2 (en) * 2010-03-08 2015-02-03 Sightera Technologies Ltd. Method and system for classifying one or more images
CN107423349A (en) * 2017-05-18 2017-12-01 福建中金在线信息科技有限公司 Method and system for full-text search

Non-Patent Citations (2)

Title
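Zhang Yan (张宴): "Architecture design of a high-concurrency general-purpose search engine for 100-million-record data" (亿级数据的高并发通用搜索引擎架构设计), Zhang Yan's Blog (《张宴的博客》) *
Gu Guoying (顾国颖): "Design and reflections on a content aggregation and intelligent retrieval system for stereoscopic television" (立体电视内容聚合与智能检索系统的设计与思考), Television Engineering (《电视工程》) *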

Similar Documents

Publication Publication Date Title
CN100595765C (en) Medium player based key words content issue method and system
CN110704411B (en) Knowledge graph building method and device suitable for art field and electronic equipment
US9165085B2 (en) System and method for publishing aggregated content on mobile devices
US8862607B2 (en) Content receiving apparatus with search query generator
US8261178B2 (en) Audio data distribution system and method for generating a photo slideshow which automatically selects music
US10938940B2 (en) Caching of metadata objects
CN104516892B (en) It is associated with dissemination method, system and the terminal of the user-generated content of rich media information
CN1692354B (en) Information management system, information processing device, information processing method
CN106331778A (en) Video recommendation method and device
DE102017124876A1 (en) Determine search queries to obtain information during a user experience of an event
CN103092958A (en) Display method and device for search result
US20110119248A1 (en) Topic identification system, topic identification device, client terminal, program, topic identification method, and information processing method
WO2015096609A1 (en) Method and system for creating inverted index file of video resource
US20100077300A1 (en) Computer Method and Apparatus Providing Social Preview in Tag Selection
CN102682036A (en) Non-editing based method and system for searching media assets
CN111104583B (en) Live broadcast room recommendation method, storage medium, electronic equipment and system
CN112307318A (en) Content publishing method, system and device
CN107239568B (en) Distributed index implementation method and device
CN105893640B (en) Favorite merging method and device
US20110276557A1 (en) Method and apparatus for exchanging media service queries
US20090043785A1 (en) Managing structured content stored as a binary large object (blob)
CN105740251B (en) Method and system for integrating different content sources in bus mode
CN109982105A (en) Content retrieval system and method for broadcast platform
CN102158345A (en) Method, device and system for data management
EP3133820A1 (en) Interactive video distribution system with content similarity matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
    Application publication date: 20190705