CN105554590B - Live streaming media identification system based on audio fingerprints - Google Patents

Live streaming media identification system based on audio fingerprints

Info

Publication number
CN105554590B
Authority
CN
China
Prior art keywords
fingerprint
audio
live
frequency
broadcast stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510902809.8A
Other languages
English (en)
Other versions
CN105554590A (zh)
Inventor
李宏元
郭伟伟
孙彦龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dang Hong Polytron Technologies Inc
Original Assignee
Hangzhou Arcvideo Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-12-10
Filing date
2015-12-10
Publication date
2018-12-04
Application filed by Hangzhou Arcvideo Technology Co ltd
Priority to CN201510902809.8A
Publication of CN105554590A
Application granted
Publication of CN105554590B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/835 Generation of protective data, e.g. certificates
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258 Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25866 Management of end-user data
    • H04N21/25891 Management of end-user data being end-user preferences
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/266 Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213 Monitoring of end-user related data
    • H04N21/44222 Analytics of user selections, e.g. selection of programs or purchase activity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/8106 Monomedia components thereof involving special audio data, e.g. different tracks for different languages

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Graphics (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

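The invention discloses a live streaming media identification system based on audio fingerprints, comprising a server side and a user side. The server side comprises an audio fingerprint acquisition module, a fingerprint management module, and a fingerprint comparison module; the user side comprises an on-site fingerprint acquisition module. The system has a simple structure, requires little computation on both the server side and the user side, achieves a high recognition rate, needs no preprocessing of the program signal, and is robust against interference. It can therefore identify the signal played on a terminal in real time in scenarios such as radio and television broadcasting and online live streaming, serve as a bridge between content providers and viewers, and provide an effective means of compiling statistics on the programs viewers prefer.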

Description

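Live streaming media identification system based on audio fingerprints
Technical Field
The present invention relates to digital audio signal processing technology, and in particular to a live streaming media identification system based on audio fingerprints.
Background
Live broadcast operators such as television and radio stations need to identify which station a user is currently watching or listening to, whether they are surveying the ratings of individual stations or running multi-party remote interaction with the audience while a program is on air. This is channel identification.
Identifying multiple channels amounts to identifying the individual live media streams. The user can use a mobile phone or other terminal device to send the relevant information over the network to the operator's server, which processes the signal and responds.
Some existing implementations preprocess each program's media stream and add a distinct logo mark to it, such as a watermark in the video or an ultrasonic signal in the audio. Detecting the logo mark in the signal returned by the user identifies the channel of the live media stream being watched. These methods require the program's media stream to be processed in advance, and their resistance to interference is limited.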
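Summary of the Invention
In view of this, there is a need for a live streaming media identification system based on audio fingerprints that requires no preprocessing of the program signal, can identify the signal played on a terminal in real time in scenarios such as radio and television broadcasting and online live streaming, and is robust against interference.
According to one aspect of the invention, a live streaming media identification system based on audio fingerprints is provided, comprising a server side and a user side. The server side comprises an audio fingerprint acquisition module, a fingerprint management module, and a fingerprint comparison module; the user side comprises an on-site fingerprint acquisition module.
In one embodiment, the audio fingerprint acquisition module is used to acquire the audio signal fingerprints of the live media streams.
In one embodiment, the fingerprint management module is used to store the audio signal fingerprints.
In one embodiment, the on-site fingerprint acquisition module is used to acquire the fingerprint of the on-site audio as the channel is played and received.
This audio-fingerprint-based live streaming media identification system has a simple structure, requires little computation on both the server side and the user side, achieves a high recognition rate, needs no preprocessing of the program signal, and is robust against interference. It can therefore identify the signal played on a terminal in real time in scenarios such as radio and television broadcasting and online live streaming, serve as a bridge between content providers and viewers, and provide an effective means of compiling statistics on the programs viewers prefer.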
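Brief Description of the Drawings
Fig. 1 is a schematic structural diagram of audio-fingerprint-based live streaming media identification according to one embodiment of the invention.
Detailed Description
To facilitate understanding of the invention, it is described more fully below with reference to specific embodiments. The invention can, however, be implemented in many different forms and is not limited to the embodiments described here. Rather, these embodiments are provided so that the disclosure of the invention will be understood more thoroughly and completely.
As shown in Fig. 1, a live streaming media identification system based on audio fingerprints according to one embodiment of the invention comprises a server side 10 and a user side 30. The server side 10 comprises an audio fingerprint acquisition module 110, a fingerprint management module 130, and a fingerprint comparison module 150; the user side 30 comprises an on-site fingerprint acquisition module 310.
The server side 10 acquires the audio fingerprints of each channel's live program stream and updates the channel fingerprint library in real time. The server side 10 also responds to user requests: it compares the fingerprint data returned by the user against the fingerprint library, computing fingerprint similarity to identify the live stream the user is watching. Specifically, the audio fingerprint acquisition module 110 acquires the audio signal fingerprints of the live media streams. The fingerprint management module 130 stores the audio signal fingerprints. The on-site fingerprint acquisition module 310 acquires the fingerprint of the on-site audio as the channel is played and received.
For multi-channel live television as in Fig. 1, each live stream corresponds to a television channel. The same mechanism can also identify live streams in online live streaming and other live media applications, such as combining programs from multiple venues with audience interaction.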
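While transmitting the audio and video signals, the server side 10 acquires the audio signal fingerprints of each live media stream (for example, the program signal each television station is broadcasting in real time) and builds a dynamic fingerprint library. Each stream's fingerprint library is updated in real time and only needs to hold 5 to 10 seconds of audio fingerprints; about 4 KB is enough. The user side 30 captures 500 milliseconds to 1 second of audio fingerprints at the viewing location and sends them over the network to the server. The server side 10 compares the fingerprints from the user against those in the dynamic fingerprint library to identify the media stream channel the user is watching.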
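The dynamic fingerprint library can be pictured as one bounded queue per stream. The following is a minimal sketch under stated assumptions: the class name `ChannelLibrary` and the channel label are hypothetical, while the rate of 100 four-byte fingerprints per second is the figure this embodiment gives for its fingerprint format.

```python
from collections import deque

FINGERPRINTS_PER_SECOND = 100  # one 4-byte fingerprint per 10 ms, per the embodiment
WINDOW_SECONDS = 10            # upper end of the 5 to 10 second range


class ChannelLibrary:
    """Rolling fingerprint queues, one per live stream (hypothetical helper)."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.maxlen = window_seconds * FINGERPRINTS_PER_SECOND
        self.queues = {}

    def push(self, channel, fingerprint):
        # deque(maxlen=...) drops the oldest entry automatically once it is full,
        # which gives the "updated in real time" behaviour with no extra code.
        queue = self.queues.setdefault(channel, deque(maxlen=self.maxlen))
        queue.append(fingerprint & 0xFFFFFFFF)

    def size_bytes(self, channel):
        return 4 * len(self.queues[channel])


library = ChannelLibrary()
for i in range(1500):  # feed 15 s of fingerprints into a 10 s window
    library.push("channel-1", i)
```

After the loop the queue holds only the most recent 1000 fingerprints, or 4000 bytes, which agrees with the "about 4 KB" figure for a 10-second window.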
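This audio-fingerprint-based live streaming media identification system has a simple structure, requires little computation on both the server side 10 and the user side 30, achieves a high recognition rate, needs no preprocessing of the program signal, and is robust against interference. It can therefore identify the signal played on a terminal in real time in scenarios such as radio and television broadcasting and online live streaming, serve as a bridge between content providers and viewers, and provide an effective means of compiling statistics on the programs viewers prefer.
Specifically, the computation on the user side 30 in this embodiment consists of capturing about 1 second of audio data and sending its fingerprints to the server. Fingerprint length is proportional to audio duration; in this embodiment, 1 second of audio yields roughly 100 fingerprints, or 400 bytes.
Specifically, the computation on the server side 10 in this embodiment consists of updating the fingerprint queues of the fingerprint library in real time, using the same fingerprint generation algorithm as the user side 30. The server side 10 must also match the fingerprints sent from the user side 30 against those in the fingerprint queues. In this embodiment, each fingerprint is 4 bytes (32 bits) and corresponds to 10 milliseconds of audio.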
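To make the size arithmetic concrete, here is a small sketch of how a user side might serialize its fingerprints before sending them. The patent does not specify a wire format; the little-endian unsigned 32-bit layout and the function names are assumptions made for illustration only.

```python
import struct


def pack_fingerprints(fingerprints):
    """Serialize 32-bit fingerprints as little-endian unsigned ints (assumed format)."""
    return struct.pack(f"<{len(fingerprints)}I", *fingerprints)


def unpack_fingerprints(payload):
    """Inverse of pack_fingerprints."""
    count = len(payload) // 4
    return list(struct.unpack(f"<{count}I", payload))


one_second = list(range(100))           # placeholder values: 100 fingerprints = 1 s
payload = pack_fingerprints(one_second)  # 100 * 4 bytes
```

One second of audio therefore travels as a 400-byte payload, matching the figure given above.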
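Suppose the server holds N media streams, and each fingerprint library queue has length L (in fingerprints). The audio fingerprint string under test sent from the user side 30 has length d (in fingerprints), where d < L/2 is required.
The complete search proceeds as follows: starting with the first queue, match the fingerprint string of length d against the queue from head to tail, which takes (L-d) matches in total. If no match is found in a queue, continue the search in the next queue, until a match succeeds or all queues have been searched.
Each match accumulates the Hamming distances of d fingerprint pairs, that is, d XOR operations on 4-byte integers and (d-1) integer additions. For convenience the -1 is dropped, which does not affect the rough operation count. The smaller the accumulated distance, the greater the similarity between the two strings; when the similarity exceeds a set threshold (that is, when the accumulated distance is small enough), the match succeeds.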
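A single match can be sketched as follows. The function names are illustrative. Note that XOR only marks the differing bits; counting them (a population count) is a step the operation tally above folds into the XOR.

```python
def hamming_distance(a, b):
    """Number of differing bits between two 32-bit fingerprints."""
    return bin((a ^ b) & 0xFFFFFFFF).count("1")


def window_distance(query, reference, start):
    """Accumulated Hamming distance between query and reference[start:start + d].

    The caller is expected to keep start + len(query) within the reference.
    """
    window = reference[start:start + len(query)]
    return sum(hamming_distance(q, r) for q, r in zip(query, window))
```

For example, `hamming_distance(0b1010, 0b0110)` is 2, and a query that appears verbatim in the reference gives a window distance of 0 at its offset.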
Each search therefore requires at most about N*(L-d)*d XOR and addition operations, plus N*(L-d) integer comparisons; the average is half of that.
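The exhaustive search described above might look like the sketch below. The `max_distance` parameter is an assumed way of expressing the threshold as a distance ceiling, and the dictionary of queues is a hypothetical stand-in for the fingerprint library. The loop visits L-d+1 window positions, one more than the rounded (L-d) used in the operation count.

```python
def window_distance(query, reference, start):
    window = reference[start:start + len(query)]
    return sum(bin(q ^ r).count("1") for q, r in zip(query, window))


def exhaustive_search(queues, query, max_distance):
    """Return (channel, offset, distance) for the first acceptable window, else None."""
    d = len(query)
    for channel, reference in queues.items():        # N queues
        for start in range(len(reference) - d + 1):  # head to tail
            distance = window_distance(query, reference, start)
            if distance <= max_distance:             # one comparison per window
                return channel, start, distance
    return None


queues = {
    "ch-a": [0x0, 0xF, 0xFF, 0xFFF, 0xFFFF],
    "ch-b": [0x1, 0x3, 0x7, 0x1F, 0x3F],
}
query = [0x7, 0x1F]
```

With `max_distance=0` only the exact occurrence in `ch-b` at offset 2 qualifies; loosening the ceiling to 4 makes the search stop earlier, at the first window of `ch-a`, which shows why the threshold has to be chosen with care.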
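Because audio data is stable over short intervals, there is no need to test every fingerprint string from the head of the queue to the tail. A coarse match can first be performed in jumps of half the fingerprint frame width (5 fingerprints in this system). If the coarse result falls within an acceptable threshold, indicating that a successful match is very likely within this stretch of fingerprints, the fine matching described above is then carried out. The actual maximum cost is then about N*((L-d)/5+10)*d XOR and addition operations, plus N*((L-d)/5+10) comparisons; the average is N*((L-d)/5+10)*d/2 XOR and addition operations, and N*((L-d)/5+10) comparisons.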
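One way to read the coarse-to-fine scheme is sketched below for a single queue. Treating the "+10" term as a fine pass over the offsets within one step on either side of a coarse hit is an interpretation, not something the text states; the two thresholds and the synthetic reference data are likewise assumptions.

```python
def window_distance(query, reference, start):
    window = reference[start:start + len(query)]
    return sum(bin(q ^ r).count("1") for q, r in zip(query, window))


def coarse_to_fine_search(reference, query, coarse_max, fine_max, step=5):
    """Return the matching offset in one queue, or None if nothing qualifies."""
    last_start = len(reference) - len(query)
    for coarse in range(0, last_start + 1, step):      # jump 5 fingerprints at a time
        if window_distance(query, reference, coarse) > coarse_max:
            continue                                   # unlikely region, skip it
        lo = max(0, coarse - step)
        hi = min(last_start, coarse + step)
        # Fine pass: test every offset near the coarse hit and keep the closest.
        best = min(range(lo, hi + 1),
                   key=lambda s: window_distance(query, reference, s))
        if window_distance(query, reference, best) <= fine_max:
            return best
    return None


# Synthetic queue of 100 distinct 32-bit values (multiplying by an odd constant
# modulo 2**32 never maps two different indices to the same value).
reference = [(i * 2654435761) & 0xFFFFFFFF for i in range(100)]
query = reference[12:22]
```

With a permissive coarse threshold the search finds offset 12 even though 12 is not a multiple of the step. With `coarse_max=0` it returns `None`, because this synthetic data has none of the short-term stability that real audio fingerprints are said to have; the coarse threshold must be loose enough to tolerate a misalignment of a few fingerprints.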
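Take 18 streams, a queue length of 10 seconds, and a 1-second fingerprint string under test as an example. The maximum cost of each search is then about 18*((1000-100)/5+10)*100 = 342000 XOR and addition operations, plus 3420 comparisons. By the standards of current computers, this is a very small amount of computation.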
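The arithmetic in this example can be checked directly from the two cost formulas. The function names below are illustrative; the formulas are the ones given in the text.

```python
def exhaustive_cost(n_streams, queue_len, query_len):
    """Worst-case (xor_add_ops, comparisons) for the full search: N*(L-d)*d and N*(L-d)."""
    windows = queue_len - query_len
    return n_streams * windows * query_len, n_streams * windows


def coarse_to_fine_cost(n_streams, queue_len, query_len, step=5, fine_windows=10):
    """Worst-case (xor_add_ops, comparisons): N*((L-d)/5+10)*d and N*((L-d)/5+10)."""
    windows = (queue_len - query_len) // step + fine_windows
    return n_streams * windows * query_len, n_streams * windows


N, L, D = 18, 1000, 100  # 18 streams, 10 s queue, 1 s query at 100 fingerprints/s
full = exhaustive_cost(N, L, D)
fast = coarse_to_fine_cost(N, L, D)
```

Here `fast` evaluates to `(342000, 3420)`, the figures quoted above, against `(1620000, 16200)` for the exhaustive search, a reduction of a little under five times.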
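The fingerprint generation algorithm used by the audio fingerprint acquisition module 110 and the on-site fingerprint acquisition module 310 in this embodiment is based on an existing audio fingerprint generation algorithm. Its principle is to apply a Fourier transform to the audio signal, then compare the energies of its frequency bands and encode the result, which yields the fingerprint. In implementing this algorithm, this embodiment adds a front-end noise reduction step before downsampling to filter out high-frequency components, so that resampling does not add musical noise.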
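The text does not name the underlying algorithm or give its parameters, so the following is only a rough illustration of the stated principle (transform, compare band energies, encode), not the patented method. The 33 equal-width bands, the adjacent-band comparison rule, the block-averaging filter, and all function names are assumptions.

```python
import cmath
import math


def lowpass_then_downsample(samples, factor):
    """Average each run of `factor` samples: a crude low-pass filter plus decimation."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples) - factor + 1, factor)]


def band_energies(frame, n_bands=33):
    """Energy in n_bands equal-width bands of the frame's spectrum."""
    n = len(frame)
    # Naive O(n^2) DFT keeps the sketch dependency-free; use an FFT in practice.
    spectrum = [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))) ** 2
        for k in range(1, n // 2 + 1)
    ]
    width = len(spectrum) // n_bands
    if width == 0:
        raise ValueError("frame too short for the requested number of bands")
    return [sum(spectrum[b * width:(b + 1) * width]) for b in range(n_bands)]


def frame_fingerprint(frame):
    """Encode 32 adjacent-band comparisons as one 32-bit integer (assumed rule)."""
    energies = band_energies(frame)
    bits = 0
    for m in range(32):
        bits = (bits << 1) | (1 if energies[m] > energies[m + 1] else 0)
    return bits


# A pure tone in DFT bin 10 of a 132-sample frame lands in band 4.
tone = [math.sin(2 * math.pi * 10 * t / 132) for t in range(132)]
fingerprint = frame_fingerprint(tone)
```

For this tone, the bit comparing band 4 with band 5 is set and the bit comparing band 3 with band 4 is clear; the remaining bits compare bands that hold only rounding noise and carry no meaning in this toy input.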
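In addition, for the same audio data, a shift in the framing time points changes some bits of the fingerprint, and the framing positions of the audio under test cannot coincide exactly with those used to generate the reference fingerprints in the library. Fingerprint searches therefore fail occasionally. One improvement is to use two fingerprint libraries: the first holds fingerprints extracted from the original audio signal, and the second holds fingerprints extracted from the original audio signal shifted by 4 milliseconds. Using the two libraries together raises the hit rate of the fingerprint search.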
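How the two libraries are consulted is not spelled out. One plausible arrangement, sketched here with made-up 4-bit values standing in for real fingerprints, is to search both and keep whichever gives the closer window; the function names and the `max_distance` ceiling are assumptions.

```python
def window_distance(query, reference, start):
    window = reference[start:start + len(query)]
    return sum(bin(q ^ r).count("1") for q, r in zip(query, window))


def best_window(reference, query):
    """Return (distance, offset) of the closest window in one library."""
    last_start = len(reference) - len(query)
    return min((window_distance(query, reference, s), s)
               for s in range(last_start + 1))


def search_two_libraries(primary, shifted, query, max_distance):
    """Search both libraries; return (library, offset, distance) or None."""
    results = {
        "primary": best_window(primary, query),
        "shifted": best_window(shifted, query),
    }
    name = min(results, key=lambda k: results[k])
    distance, offset = results[name]
    if distance > max_distance:
        return None
    return name, offset, distance


# Toy data: the shifted library differs from the primary one by a few bits,
# imitating the effect of a framing offset.
primary = [0b0000, 0b0011, 0b0101, 0b1111, 0b1001]
shifted = [0b0001, 0b0010, 0b0111, 0b1110, 0b1000]
query = [0b0111, 0b1110]  # framed like the shifted library
```

With `max_distance=1` the primary library alone would miss (its best window is 2 bits away), while the shifted library matches exactly at offset 2.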
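The embodiments described above represent only some implementations of the invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the invention, and all of these fall within its scope of protection. The scope of protection of this patent is therefore defined by the appended claims.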

Claims (1)

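1. A live streaming media identification system based on audio fingerprints, characterized in that it comprises a server side and a user side,
the server side comprising
an audio fingerprint acquisition module for acquiring audio signal fingerprints of live media streams, the audio signal fingerprints being 5 to 10 seconds of audio fingerprints, 4 KB in size,
a fingerprint management module for storing the audio signal fingerprints, and
a fingerprint comparison module for identifying live streams by fingerprint similarity comparison,
the user side comprising
an on-site fingerprint acquisition module for acquiring on-site audio fingerprints as the channel is played and received, the on-site audio fingerprints being 500 milliseconds to 1 second of audio fingerprints, the fingerprint count d of the on-site audio fingerprints being required to be less than half the fingerprint count L of the audio signal fingerprints,
the fingerprint comparison module first performing a skipping coarse match at intervals of half the fingerprint frame width and, if the coarse match result is within a preset threshold, then performing a fine match, the fine match starting from the head of each queue and matching fingerprint strings whose length equals the fingerprint count of the on-site audio fingerprints, until a match succeeds or all queues have been searched.
CN201510902809.8A 2015-12-10 2015-12-10 Live streaming media identification system based on audio fingerprints Active CN105554590B (zh)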

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510902809.8A CN105554590B (zh) 2015-12-10 2015-12-10 Live streaming media identification system based on audio fingerprints

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510902809.8A CN105554590B (zh) 2015-12-10 2015-12-10 Live streaming media identification system based on audio fingerprints

Publications (2)

Publication Number Publication Date
CN105554590A (zh) 2016-05-04
CN105554590B (zh) 2018-12-04

Family

ID=55833490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510902809.8A Active CN105554590B (zh) 2015-12-10 2015-12-10 Live streaming media identification system based on audio fingerprints

Country Status (1)

Country Link
CN (1) CN105554590B (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919105B (zh) * 2019-03-11 2022-04-05 四川长虹电器股份有限公司 Object recognition method and system based on smart TV pictures

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572952A (zh) * 2014-12-29 2015-04-29 乐视网信息技术(北京)股份有限公司 Method and device for identifying live multimedia files
CN104598541A (zh) * 2014-12-29 2015-05-06 乐视网信息技术(北京)股份有限公司 Method and device for identifying multimedia files

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150302086A1 (en) * 2014-04-22 2015-10-22 Gracenote, Inc. Audio identification during performance

Also Published As

Publication number Publication date
CN105554590A (zh) 2016-05-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 310000 A Block, 16th Floor, E Building, Paradise Software Park, No. 3 Xidoumen Road, Hangzhou City, Zhejiang Province

Patentee after: Hangzhou Dang Hong Polytron Technologies Inc

Address before: 310000 B2010, two floor, North (two), six and 368 Road, Binjiang District, Hangzhou, Zhejiang.

Patentee before: HANGZHOU DANGHONG TECHNOLOGY CO., LTD.