CN111581333A - Text-CNN-based audiovisual playlist push method and audiovisual playlist push system - Google Patents

Text-CNN-based audiovisual playlist push method and audiovisual playlist push system

Info

Publication number
CN111581333A
Authority
CN
China
Prior art keywords
user
text
audio
information
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010375669.4A
Other languages
English (en)
Other versions
CN111581333B (zh)
Inventor
Song Yongduan (宋永端)
Lai Junfeng (赖俊峰)
Zhou Xin (周鑫)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University
Priority to CN202010375669.4A (CN111581333B, zh)
Publication of CN111581333A (zh)
Priority to US17/141,592 (US11580979B2, en)
Application granted
Publication of CN111581333B (zh)
Legal status: Active
Anticipated expiration

Classifications

    • G06F16/435 Information retrieval of multimedia data; Querying; Filtering based on additional data, e.g. user or group profiles
    • G10L15/22 Speech recognition; Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G06F16/74 Information retrieval of video data; Browsing; Visualisation therefor
    • G06F16/31 Information retrieval of unstructured textual data; Indexing; Data structures therefor; Storage structures
    • G06F16/3329 Information retrieval of unstructured textual data; Query formulation; Natural language query formulation or dialogue systems
    • G06F16/9535 Retrieval from the web; Querying; Search customisation based on user profiles and personalisation
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06F40/205 Handling natural language data; Natural language analysis; Parsing
    • G06F40/30 Handling natural language data; Semantic analysis
    • G06F40/35 Handling natural language data; Semantic analysis; Discourse or dialogue representation
    • G06N3/045 Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
    • G06N3/08 Neural networks; Learning methods
    • G10L13/02 Speech synthesis; Methods for producing synthetic speech; Speech synthesisers
    • G10L13/08 Speech synthesis; Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L15/16 Speech recognition; Speech classification or search using artificial neural networks
    • G10L15/1815 Speech recognition; Speech classification or search using natural language modelling; Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G10L15/26 Speech recognition; Speech to text systems
    • G10L15/30 Speech recognition; Constructional details of speech recognition systems; Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H04L67/55 Network arrangements or protocols for supporting network services or applications; Push-based network services
    • H04N21/233 Selective content distribution; Servers; Processing of audio elementary streams
    • H04N21/251 Selective content distribution; Servers; Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/25866 Selective content distribution; Servers; Client or end-user data management; Management of end-user data
    • H04N21/26258 Selective content distribution; Servers; Content or additional data distribution scheduling for generating a list of items to be played back in a given order, e.g. playlist, or scheduling item distribution according to such list
    • H04N21/42203 Selective content distribution; Client devices; Input-only peripherals; sound input device, e.g. microphone
    • H04N21/4826 Selective content distribution; Client devices; End-user interface for program selection using recommendation lists, e.g. of programs or channels sorted out according to their score
    • H04R1/406 Details of transducers, loudspeakers or microphones; Arrangements for obtaining desired directional characteristic only by combining a number of identical transducers; microphones
    • H04R3/005 Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • G06F40/216 Handling natural language data; Natural language analysis; Parsing using statistical methods
    • G06F40/284 Handling natural language data; Recognition of textual entities; Lexical analysis, e.g. tokenisation or collocates
    • G10L2015/223 Speech recognition; Procedures used during a speech recognition process; Execution procedure of a spoken command
    • Y02D10/00 Climate change mitigation technologies in ICT; Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Otolaryngology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Operations Research (AREA)
  • Algebra (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Graphics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

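The invention discloses a Text-CNN-based audiovisual playlist push method and audiovisual playlist push system. The system comprises a local voice interaction terminal, a dialogue system server and a playlist recommendation engine; the dialogue system server and the playlist recommendation engine are each connected to the local voice interaction terminal. The local voice interaction terminal comprises a microphone array, a host computer and a speech synthesis chip board: the microphone array is connected to the speech synthesis chip board and the host computer, the host computer is connected to the dialogue system server, and the speech synthesis chip board is connected to the playlist recommendation engine. The playlist recommendation engine obtains rating data from a rating predictor built on a neural network, the host computer parses this data into recommended playlist information, and the voice terminal synthesizes the result and pushes it to the user as speech. The invention can provide an intelligent playlist recommendation service for movie-on-demand in smart home scenarios, and offers convenient interaction while meeting users' personalized on-demand needs.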

Description

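Text-CNN-based audiovisual playlist push method and audiovisual playlist push system
Technical Field
The invention relates to the technical fields of intelligent voice interaction and machine learning, and in particular to a method and system for pushing audiovisual playlists to users through voice interaction.
Background
In an era of information explosion, information overload troubles people's daily lives. Even the most important information retrieval systems have obvious shortcomings: the results they return are numerous and cluttered. How to improve retrieval accuracy, filter out redundant information, and give users the information they actually care about is therefore a technical challenge.
In current interactive information recommendation systems, relevant information is generally pushed only after the user has actively subscribed to it; such systems cannot recommend content based on the questions a user cares about or on inferred points of interest. This kind of passive interactive recommendation system is increasingly unable to meet the diverse needs of the many usage scenarios in daily life.
Summary of the Invention
In view of this, the object of the invention is to provide a Text-CNN-based audiovisual playlist push method and audiovisual playlist push system, so as to solve the technical problems of passive interactive recommendation systems, namely that interaction between the user and the viewing system is inconvenient and that personalized playlist push services cannot be provided.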
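The Text-CNN-based audiovisual playlist push method of the invention is characterized by comprising the following steps:
1) Build a basic user information database and an audiovisual information database;
2) Process the synopsis text in the audiovisual information database with a text digitization technique to obtain fully numerical structured data, use this data as the input of the text-CNN neural network, and compute the latent features of the synopsis text by the following formula:
[Formula image BDA0002479792480000021 not reproduced]
where W is the feature extraction weight coefficient of the input layer of the text-CNN neural network, K is the feature extraction weight coefficient of the hidden layer, and
[Formula image BDA0002479792480000022 not reproduced]
$q \in \mathbb{R}^N$; the projection layer $X_w$ is a vector of length $(n-1)m$ formed from the $n-1$ word vectors of the input layer;
After $y_w = \{y_{w,1}, y_{w,2}, \ldots, y_{w,N}\}$ has been computed, let $w_i$ denote a word in the corpus Context($w_i$) built from the synopsis texts; normalizing with the softmax function gives the similarity probability of word $w_i$ for the movies in the user ratings:
[Formula image BDA0002479792480000023 not reproduced]
In the above formula, $i_w$ is the index of word $w$ in the corpus Context($w_i$), and
[Formula image BDA0002479792480000024 not reproduced]
denotes the probability that word $w$ has index $i_w$ in the corpus Context($w_i$) when the corpus is Context($w$);
Over the whole convolution process, let the resulting latent features of the movie synopsis text be $F$, with $F = \{F_1, F_2, \ldots, F_D\}$, and let $F_j$ be the $j$-th latent feature of the movie synopsis; then:
$F_j = \mathrm{text\_cnn}(W, X)$    (3)
where W is the feature extraction weight coefficient of the input layer of the text-CNN neural network and X is the probability matrix obtained by digitizing the synopsis text;
3) The convolutional layer of the text-CNN neural network extracts rating-related features from the probability matrix X, with the convolution window size set to $D \times L$; the pooling layer amplifies the features that affect user ratings after convolution and extracts them into several feature maps, so that the fully connected layer takes $N$ one-dimensional vectors $H_N$ as input; finally, the fully connected layer and the output layer map the one-dimensional numerical vector representing the main feature information of a movie to the movie latent feature matrix V with respect to user ratings, whose dimension D matches that of the user rating matrix;
4) Collect historical initial user rating information from the MovieLens 1M open dataset and convert it with a normalization function into a numerical rating matrix with values in [0, 5]. The user set is N and the movie set is M; $R_{ij}$ denotes the rating of user $u_i$ for movie $m_j$, and the overall initial user rating matrix is $R = [R_{ij}]_{m \times n}$. R is factorized into the user-rating latent feature matrix $U \in \mathbb{R}^{D \times N}$ and the movie latent feature matrix $V \in \mathbb{R}^{D \times N}$, with feature dimension D. At the same time, the user similarity $uSim(u_i, u_j)$ is computed, and users whose similarity exceeds 0.75 are classed as neighbor users;
[Formula image BDA0002479792480000031 not reproduced]
In the above formula, $R_M$ is the set of movies that have ratings, $u_i$ and $u_j$ are users who took part in rating,
[Formula image BDA0002479792480000032 not reproduced]
is the rating of user $u_i$ for movie $m$, and
[Formula image BDA0002479792480000033 not reproduced]
is the mean of that user's ratings;
5) Apply model-based probabilistic factorization to the overall initial user rating matrix R, where $\sigma_U$ is the variance of the user latent feature matrix obtained by factorizing $R_{ij}$ and $\sigma_V$ is the variance of the movie latent feature matrix obtained by factorizing $R_{ij}$; construct the latent user rating matrix
[Formula image BDA0002479792480000034 not reproduced]
that is, the user's rating predictor,
[Formula image BDA0002479792480000035 not reproduced]
The specific process is as follows:
The probability density functions of the overall initial user rating matrix R are, respectively:
[Formula image BDA0002479792480000036 not reproduced]
In the above formula, N is the probability density function of a zero-mean Gaussian distribution and $\sigma$ is the variance of the overall initial user rating matrix; I is an indicator function marking whether a user rated a movie after watching it. To obtain, during fitting, the latent feature matrices that best represent users and movies, U and V are updated iteratively by gradient descent until the loss function E converges;
[Formula image BDA0002479792480000037 not reproduced]
In the above formula, $I_{ij}$ is the indicator function of whether user $i$ rated movie $j$: $I_{ij}$ is 1 if a rating was given and 0 otherwise; $\varphi$, $\varphi_U$ and $\varphi_F$ are regularization parameters that prevent overfitting;
The user latent feature matrix U and the movie latent feature matrix V are obtained from the loss function E by gradient descent, namely:
[Formula image BDA0002479792480000038 not reproduced]
U and V are then updated iteratively until E converges;
[Formula image BDA0002479792480000041 not reproduced]
In the above formula, $\rho$ is the learning rate;
6) Save the algorithm model trained in step 5) as a model file, and call this model file in the service program of the playlist push engine;
7) Define semantic slots for the smart audiovisual playlist scenario in the dialogue server; when a voice question-and-answer exchange with a neighbor user in this scenario triggers an entity defined in the semantic slots that relates to audiovisual playlists, the audiovisual playlist recommendation function is provided to that user.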
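The audiovisual playlist push system for the Text-CNN-based audiovisual playlist push method of the invention comprises a local voice interaction terminal, a dialogue system server and a playlist recommendation engine; the dialogue system server and the playlist recommendation engine are each connected to the local voice interaction terminal;
The local voice interaction terminal comprises a microphone array, a host computer and a speech synthesis chip board; the microphone array is connected to the speech synthesis chip board, the speech synthesis chip board is connected to the host computer, and the host computer is connected to the dialogue system server; the playlist recommendation engine is connected to the speech synthesis chip board.
Further, the microphone array collects the user's voice information and passes it to the host computer, which processes the voice information and sends it to the dialogue system server;
The dialogue system server uses semantic matching on the voice information sent by the host computer to form suitable question-and-answer text, and sends this text to the host computer over TCP/IP; the host computer parses the question-and-answer text sent by the dialogue system server and sends the parsed text to the speech synthesis chip board, which converts it into voice information and sends it to the microphone array to be announced to the user;
The playlist recommendation engine generates audiovisual playlist information for the conversing user based on the user's question-and-answer information, and sends the playlist information to the speech synthesis chip board over TCP/IP; the speech synthesis chip board generates a voice playlist push message from the playlist information and sends it to the microphone array to be announced to the user.
Beneficial effects of the invention:
The Text-CNN-based audiovisual playlist push method and system of the invention give users a convenient way to interact, improving on the low convenience of traditional UIs and manual clicking. In smart home scenarios such as movie-on-demand, the invention can be integrated effectively with other hardware and software services centered on voice control, providing more convenient service while meeting users' personalized movie-on-demand needs, so that a product or service can, on top of its original design, understand user needs more deeply and adjust its output accordingly.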
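Brief Description of the Drawings
Fig. 1 is a framework diagram of the audiovisual playlist push system based on the Text-CNN neural network in the embodiment.
Fig. 2 is a structural diagram of the Text-CNN deep neural network in the embodiment.
Fig. 3 is a diagram of the extraction process for audiovisual text feature information in the embodiment.
Fig. 4 is a diagram of the matrix factorization process for basic user and movie information in the embodiment.
Fig. 5 is a diagram of the probabilistic model introduced into the matrix factorization process in the embodiment.
Fig. 6 is a program flowchart of the audiovisual playlist push system based on the Text-CNN neural network in the embodiment.
Reference numerals in Fig. 1:
101 - interacting user, 102 - local voice interaction terminal, 103 - voice interaction interface, 104 - web UI interaction interface, 105 - dialogue question-answering server, 106 - playlist push engine.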
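Detailed Description
The invention is further described below with reference to the drawings and an embodiment.
The Text-CNN-based audiovisual playlist push method of this embodiment comprises the following steps:
1) Build a basic user information database and an audiovisual information database. Specifically, a user information collection module can record users' basic information into a MySQL database to form the basic user information database, and a PySpider environment can be set up to crawl movie information into a MongoDB database to form the audiovisual information database.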
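2) Process the synopsis text in the audiovisual information database with a text digitization technique to obtain fully numerical structured data, use this data as the input of the text-CNN neural network, and compute the latent features of the synopsis text by the following formula:
[Formula image BDA0002479792480000061 not reproduced]
where W is the feature extraction weight coefficient of the input layer of the text-CNN neural network, K is the feature extraction weight coefficient of the hidden layer, and
[Formula image BDA0002479792480000062 not reproduced]
$q \in \mathbb{R}^N$; the projection layer $X_w$ is a vector of length $(n-1)m$ formed from the $n-1$ word vectors of the input layer.
After $y_w = \{y_{w,1}, y_{w,2}, \ldots, y_{w,N}\}$ has been computed, let $w_i$ denote a word in the corpus Context($w_i$) built from the synopsis texts; normalizing with the softmax function gives the similarity probability of word $w_i$ for the movies in the user ratings:
[Formula image BDA0002479792480000063 not reproduced]
In the above formula, $i_w$ is the index of word $w$ in the corpus Context($w_i$), and
[Formula image BDA0002479792480000064 not reproduced]
denotes the probability that word $w$ has index $i_w$ in the corpus Context($w_i$) when the corpus is Context($w$).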
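The formula images for this step are missing from the text, so their exact form cannot be recovered here. The symbol definitions (input-layer weights W, hidden-layer weights K, an output offset q of size N, a projection vector of length (n-1)m, and a softmax over $y_w$) match the shape of a standard feed-forward neural language model. The sketch below assumes that shape: the tanh hidden layer, the bias `p`, and all sizes are assumptions made for illustration, not details confirmed by the source.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x_w, W, p, K, q):
    """Assumed form: z = tanh(W x_w + p), y = K z + q, probs = softmax(y)."""
    z = [math.tanh(h + b) for h, b in zip(matvec(W, x_w), p)]
    y = [o + b for o, b in zip(matvec(K, z), q)]
    return y, softmax(y)


def random_matrix(rows, cols, rng):
    """Small random weights, only to make the sketch runnable."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: n-1 = 3 context words, m = 4 dims each, 5 hidden units, N = 6 words.
rng = random.Random(0)
context_len, m, hidden, N = 3, 4, 5, 6
x_w = [rng.uniform(-1, 1) for _ in range(context_len * m)]  # projection layer X_w
W = random_matrix(hidden, context_len * m, rng)
p = [0.0] * hidden
K = random_matrix(N, hidden, rng)
q = [0.0] * N

y_w, probs = forward(x_w, W, p, K, q)
```

Here `probs[i]` plays the role of the normalized probability for the word at index `i`; the entries are positive and sum to one, which is the property the softmax step relies on.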
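Over the whole convolution process, let the resulting latent features of the movie synopsis text be $F$, with $F = \{F_1, F_2, \ldots, F_D\}$, and let $F_j$ be the $j$-th latent feature of the movie synopsis; then:
$F_j = \mathrm{text\_cnn}(W, X)$    (3)
where W is the feature extraction weight coefficient of the input layer of the text-CNN neural network and X is the probability matrix obtained by digitizing the synopsis text.
3) The convolutional layer of the text-CNN neural network extracts rating-related features from the probability matrix X, with the convolution window size set to $D \times L$; the pooling layer amplifies the features that affect user ratings after convolution and extracts them into several feature maps, so that the fully connected layer takes $N$ one-dimensional vectors $H_N$ as input; finally, the fully connected layer and the output layer map the one-dimensional numerical vector representing the main feature information of a movie to the movie latent feature matrix V with respect to user ratings, whose dimension D matches that of the user rating matrix.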
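To make the convolution, pooling and fully connected stages concrete, here is a minimal pure-Python sketch of one forward pass. It assumes a common text-CNN layout: each kernel spans the full embedding width and slides over the word axis, ReLU follows the convolution, max-over-time pooling reduces each feature map to one value, and a tanh output layer produces D latent features. The activations, the pooling type, and all sizes are assumptions; the source does not specify them.

```python
import math
import random


def conv1d_over_words(X, kernel, bias):
    """Slide a (window x embed_dim) kernel down the word axis of X; ReLU activation."""
    window = len(kernel)
    feature_map = []
    for start in range(len(X) - window + 1):
        total = bias
        for offset in range(window):
            total += sum(x * k for x, k in zip(X[start + offset], kernel[offset]))
        feature_map.append(max(0.0, total))
    return feature_map


def text_cnn(X, kernels, biases, fc_weights, fc_bias):
    """Return a D-dimensional latent feature vector for one synopsis matrix X."""
    # Max-over-time pooling keeps the strongest response of each feature map.
    pooled = [max(conv1d_over_words(X, k, b)) for k, b in zip(kernels, biases)]
    # Fully connected output layer maps the pooled vector to D latent features.
    return [
        math.tanh(sum(w * h for w, h in zip(row, pooled)) + b)
        for row, b in zip(fc_weights, fc_bias)
    ]


# Hypothetical sizes: T words, E-dim embeddings, window L, 4 kernels, D latent features.
rng = random.Random(1)
T, E, L, n_filters, D = 10, 8, 3, 4, 5
X = [[rng.uniform(-1, 1) for _ in range(E)] for _ in range(T)]
kernels = [[[rng.uniform(-0.5, 0.5) for _ in range(E)] for _ in range(L)]
           for _ in range(n_filters)]
biases = [0.0] * n_filters
fc_weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_filters)] for _ in range(D)]
fc_bias = [0.0] * D

v_j = text_cnn(X, kernels, biases, fc_weights, fc_bias)  # latent features of one movie
```

Stacking one such vector per movie gives a D-row matrix of movie latent features of the kind the step describes; in a trained system the weights would be learned rather than random.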
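4) Collect historical initial user rating information from the MovieLens 1M open dataset and convert it with a normalization function into a numerical rating matrix with values in [0, 5]. The user set is N and the movie set is M; $R_{ij}$ denotes the rating of user $u_i$ for movie $m_j$, and the overall initial user rating matrix is $R = [R_{ij}]_{m \times n}$. R is factorized into the user-rating latent feature matrix $U \in \mathbb{R}^{D \times N}$ and the movie latent feature matrix $V \in \mathbb{R}^{D \times N}$, with feature dimension D. At the same time, the user similarity $uSim(u_i, u_j)$ is computed, and users whose similarity exceeds 0.75 are classed as neighbor users;
[Formula image BDA0002479792480000071 not reproduced]
In the above formula, $R_M$ is the set of movies that have ratings, $u_i$ and $u_j$ are users who took part in rating,
[Formula image BDA0002479792480000072 not reproduced]
is the rating of user $u_i$ for movie $m$, and
[Formula image BDA0002479792480000073 not reproduced]
is the mean of that user's ratings.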
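The similarity formula itself is an image that did not survive extraction. Its stated ingredients (each user's rating for a movie, each user's mean rating, and a set of rated movies) are those of the Pearson correlation coefficient, so the sketch below assumes that form. Summing over co-rated movies and taking each mean over all of a user's ratings are further assumptions, and the sample ratings are invented for illustration.

```python
import math


def user_similarity(ratings_i, ratings_j):
    """Pearson correlation over the movies both users rated (assumed form of uSim)."""
    common = set(ratings_i) & set(ratings_j)
    if not common:
        return 0.0
    mean_i = sum(ratings_i.values()) / len(ratings_i)
    mean_j = sum(ratings_j.values()) / len(ratings_j)
    num = sum((ratings_i[m] - mean_i) * (ratings_j[m] - mean_j) for m in common)
    den_i = math.sqrt(sum((ratings_i[m] - mean_i) ** 2 for m in common))
    den_j = math.sqrt(sum((ratings_j[m] - mean_j) ** 2 for m in common))
    if den_i == 0.0 or den_j == 0.0:
        return 0.0  # a user with constant ratings carries no correlation signal
    return num / (den_i * den_j)


def neighbors(ratings, user, threshold=0.75):
    """Users whose similarity to `user` exceeds the threshold (0.75 in the text)."""
    return sorted(
        other for other in ratings
        if other != user and user_similarity(ratings[user], ratings[other]) > threshold
    )


# Hypothetical ratings: user -> {movie: rating in [0, 5]}.
ratings = {
    "u1": {"m1": 5.0, "m2": 3.0, "m3": 1.0},
    "u2": {"m1": 4.0, "m2": 3.0, "m3": 2.0},
    "u3": {"m1": 1.0, "m2": 3.0, "m3": 5.0},
}
near_u1 = neighbors(ratings, "u1")
```

In this toy data `u2` ranks the three movies in the same order as `u1` and so becomes a neighbor, while `u3` has the opposite taste and is excluded.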
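5) Apply model-based probabilistic factorization to the overall initial user rating matrix R, where $\sigma_U$ is the variance of the user latent feature matrix obtained by factorizing $R_{ij}$ and $\sigma_V$ is the variance of the movie latent feature matrix obtained by factorizing $R_{ij}$; construct the latent user rating matrix
[Formula image BDA0002479792480000074 not reproduced]
that is, the user's rating predictor,
[Formula image BDA0002479792480000075 not reproduced]
The specific process is as follows:
The probability density functions of the overall initial user rating matrix R are, respectively:
[Formula image BDA0002479792480000076 not reproduced]
In the above formula, N is the probability density function of a zero-mean Gaussian distribution and $\sigma$ is the variance of the overall initial user rating matrix; I is an indicator function marking whether a user rated a movie after watching it. To obtain, during fitting, the latent feature matrices that best represent users and movies, U and V are updated iteratively by gradient descent until the loss function E converges;
[Formula image BDA0002479792480000077 not reproduced]
In the above formula, $I_{ij}$ is the indicator function of whether user $i$ rated movie $j$: $I_{ij}$ is 1 if a rating was given and 0 otherwise; $\varphi$, $\varphi_U$ and $\varphi_F$ are regularization parameters that prevent overfitting.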
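Because the density and loss formulas are images that were not carried into the text, the block below gives only the textbook form of probabilistic matrix factorization that is consistent with the definitions above: a Gaussian likelihood over observed ratings, zero-mean Gaussian priors on U and V, and a regularized squared-error loss. The symbols $\lambda_U$ and $\lambda_V$ and the coupling of $V_j$ to a synopsis feature vector $F_j$ are assumptions introduced for illustration; how they correspond to the regularization parameters named in the text cannot be determined from this extraction.

```latex
% Assumed textbook form; \lambda_U, \lambda_V and the V_j - F_j coupling are illustrative.
p(R \mid U, V, \sigma^2)
  = \prod_{i}\prod_{j}
    \left[\mathcal{N}\!\left(R_{ij} \mid U_i^{\top} V_j,\ \sigma^2\right)\right]^{I_{ij}},
\qquad
p(U \mid \sigma_U^2) = \prod_{i} \mathcal{N}\!\left(U_i \mid 0,\ \sigma_U^2 I\right),
\qquad
p(V \mid \sigma_V^2) = \prod_{j} \mathcal{N}\!\left(V_j \mid 0,\ \sigma_V^2 I\right)

E = \frac{1}{2}\sum_{i}\sum_{j} I_{ij}\left(R_{ij} - U_i^{\top} V_j\right)^2
  + \frac{\lambda_U}{2}\sum_{i}\lVert U_i\rVert^2
  + \frac{\lambda_V}{2}\sum_{j}\lVert V_j - F_j\rVert^2
```

Setting every $F_j$ to zero recovers plain probabilistic matrix factorization; a non-zero $F_j$ pulls each movie's latent vector toward features extracted from its synopsis.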
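The user latent feature matrix U and the movie latent feature matrix V are obtained from the loss function E by gradient descent, namely:
[Formula image BDA0002479792480000078 not reproduced]
U and V are then updated iteratively until E converges;
[Formula image BDA0002479792480000081 not reproduced]
In the above formula, $\rho$ is the learning rate; in this embodiment $\rho$ is 0.25.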
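The gradient and update formulas are not reproduced in this text, so the following is a sketch of full-batch gradient descent under an assumed loss: half the squared error on observed ratings, plus L2 penalties `lam_u` on U and `lam_v` on the distance between each movie vector and a feature vector `F[j]`. Those penalty names, the toy ratings, and the rescaling to [0, 1] are assumptions. The example uses a conservative learning rate of 0.05 to keep the toy problem stable; the embodiment's value of 0.25 can be passed as `rho`.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def loss(R, U, V, F, lam_u, lam_v):
    """Assumed loss: squared error on observed ratings plus L2 penalties."""
    err = sum((r - dot(U[i], V[j])) ** 2 for (i, j), r in R.items())
    reg_u = sum(dot(u, u) for u in U)
    reg_v = sum(sum((v - f) ** 2 for v, f in zip(V[j], F[j])) for j in range(len(V)))
    return 0.5 * err + 0.5 * lam_u * reg_u + 0.5 * lam_v * reg_v


def train(R, n_users, n_movies, F, dim=2, rho=0.05, lam_u=0.01, lam_v=0.01,
          epochs=500, seed=0):
    """Full-batch gradient descent on U and V; rho is the learning rate."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_movies)]
    history = [loss(R, U, V, F, lam_u, lam_v)]
    for _ in range(epochs):
        # Regularization part of the gradients.
        grad_u = [[lam_u * x for x in u] for u in U]
        grad_v = [[lam_v * (x - f) for x, f in zip(V[j], F[j])] for j in range(n_movies)]
        # Data part: only observed ratings (I_ij = 1) contribute.
        for (i, j), r in R.items():
            e = r - dot(U[i], V[j])
            for d in range(dim):
                grad_u[i][d] -= e * V[j][d]
                grad_v[j][d] -= e * U[i][d]
        # Simultaneous update: both gradients were computed from the old U and V.
        U = [[x - rho * g for x, g in zip(u, gu)] for u, gu in zip(U, grad_u)]
        V = [[x - rho * g for x, g in zip(v, gv)] for v, gv in zip(V, grad_v)]
        history.append(loss(R, U, V, F, lam_u, lam_v))
    return U, V, history


# Hypothetical observed ratings, rescaled from [0, 5] to [0, 1]: (user, movie) -> rating.
R = {(0, 0): 1.0, (0, 1): 0.6, (1, 0): 0.8, (1, 2): 0.2, (2, 1): 0.4, (2, 2): 1.0}
F = [[0.0, 0.0] for _ in range(3)]  # stand-in for text-CNN synopsis features
U, V, history = train(R, n_users=3, n_movies=3, F=F)
predicted = dot(U[0], V[2])  # predicted (rescaled) rating of user 0 for movie 2
```

The `history` list records the loss after every epoch, which is one simple way to check the convergence condition the step describes; `predicted` fills in a rating the user never gave.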
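6) Save the algorithm model trained in step 5) as a model file. In this embodiment, the TensorFlow deep learning library is used to save the model trained in step 5) as a TensorFlow model file, and this model file is called in the service program of the playlist push engine.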
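The save-then-load pattern of this step can be illustrated without TensorFlow. The sketch below deliberately swaps in a plain JSON file holding the two latent matrices as a stand-in for the TensorFlow model file of the embodiment; the file name, the function names, and the toy factors are all hypothetical. It shows a training side writing the model and a service side reading it back to rank movies for a user.

```python
import json
import os
import tempfile


def save_model(path, U, V):
    """Persist the trained latent matrices (stand-in for a TensorFlow model file)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"U": U, "V": V}, handle)


def load_model(path):
    """Read the latent matrices back, as the push-engine service would at start-up."""
    with open(path, "r", encoding="utf-8") as handle:
        data = json.load(handle)
    return data["U"], data["V"]


def top_movies(U, V, user, k=2):
    """Rank movie indices for one user by predicted rating U_i . V_j."""
    scores = [sum(a * b for a, b in zip(U[user], v)) for v in V]
    return sorted(range(len(V)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical trained factors: 2 users, 3 movies, 2 latent dimensions.
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]

model_path = os.path.join(tempfile.mkdtemp(), "playlist_model.json")
save_model(model_path, U, V)
loaded_U, loaded_V = load_model(model_path)
playlist_for_user0 = top_movies(loaded_U, loaded_V, user=0)
```

Separating training from serving this way means the service program only needs the saved file, not the training data or the training code.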
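7) Define semantic slots for the smart audiovisual playlist scenario in the dialogue server; when a voice question-and-answer exchange with a neighbor user in this scenario triggers an entity defined in the semantic slots that relates to audiovisual playlists, the audiovisual playlist recommendation function is provided to that user.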
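The trigger condition in this step combines two checks: the speaker is a neighbor user, and the utterance hits an entity defined in a playlist-related semantic slot. A minimal sketch of that logic follows; the slot vocabulary, the whitespace tokenization, and the function name are hypothetical, since the source does not describe how slots are represented or matched.

```python
PLAYLIST_SLOT_ENTITIES = {"movie", "film", "playlist", "watch"}  # hypothetical slot values


def should_recommend(utterance, user_id, neighbor_users, entities=PLAYLIST_SLOT_ENTITIES):
    """True when a neighbor user's utterance mentions a playlist-related entity."""
    if user_id not in neighbor_users:
        return False
    # Naive tokenization: split on whitespace and strip trailing punctuation.
    words = {w.strip(".,!?").lower() for w in utterance.split()}
    return bool(words & entities)


neighbor_users = {"u1", "u2"}
hit = should_recommend("Any good movie tonight?", "u1", neighbor_users)
miss_topic = should_recommend("What is the weather?", "u1", neighbor_users)
miss_user = should_recommend("Play a film", "u9", neighbor_users)
```

A real dialogue server would typically match slots on recognized intents and entities rather than raw keywords; the keyword match here only stands in for that step.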
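The audiovisual playlist push system for the Text-CNN-based audiovisual playlist push method of this embodiment comprises a local voice interaction terminal 102, a dialogue system server 105 and a playlist recommendation engine 106; the dialogue system server and the playlist recommendation engine are each connected to the local voice interaction terminal.
The local voice interaction terminal comprises a microphone array, a host computer and a speech synthesis chip board; the speech synthesis chip board is connected to the host computer, which is a Linux host. In this embodiment, the microphone array is connected to the speech synthesis chip board and the host computer; the host computer is connected to the dialogue system server through the voice interaction interface 103; and the speech synthesis chip board is connected to the playlist recommendation engine through the UI interaction interface 104, which is used to display the recommended playlist visually.
The microphone array collects the user's voice information and passes it to the host computer, which processes the voice information and sends it to the dialogue system server.
The dialogue system server uses semantic matching on the voice information sent by the host computer to form suitable question-and-answer text, and sends this text to the host computer over TCP/IP; the host computer parses the question-and-answer text sent by the dialogue system server and sends the parsed text to the speech synthesis chip board, which converts it into voice information and sends it to the microphone array to be announced to the user.
The playlist recommendation engine generates audiovisual playlist information for the conversing user based on the user's question-and-answer information, and sends the playlist information to the speech synthesis chip board over TCP/IP; the speech synthesis chip board generates a voice playlist push message from the playlist information and sends it to the microphone array to be announced to the user.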
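The text states only that playlist information travels over TCP/IP; it does not give a message format. As an illustration, the sketch below assumes a length-prefixed JSON message and uses `socket.socketpair()` as a local stand-in for the TCP link between the recommendation engine and the terminal. The field names `user` and `titles`, the framing, and the announcement wording are all assumptions.

```python
import json
import socket
import struct


def send_message(sock, payload):
    """Send one JSON message with a 4-byte big-endian length prefix."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def recv_exact(sock, size):
    """Read exactly `size` bytes, since a stream socket may return partial chunks."""
    chunks = []
    while size > 0:
        chunk = sock.recv(size)
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Read the length prefix, then the JSON body it announces."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def playlist_to_speech_text(message):
    """Turn a playlist message into the sentence handed to speech synthesis."""
    return "Recommended for you: " + ", ".join(message["titles"]) + "."


# socketpair() stands in for the TCP link between engine and terminal.
engine_end, terminal_end = socket.socketpair()
send_message(engine_end, {"user": "u1", "titles": ["Movie A", "Movie B"]})
received = recv_message(terminal_end)
speech_text = playlist_to_speech_text(received)
engine_end.close()
terminal_end.close()
```

The length prefix matters because TCP is a byte stream with no message boundaries; without some framing the receiver cannot tell where one playlist message ends and the next begins.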
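The Text-CNN-based audiovisual playlist push method and system of this embodiment give users a convenient way to interact, improving on the low convenience of traditional UIs and manual clicking. In smart home scenarios such as movie-on-demand, they can be integrated effectively with other hardware and software services centered on voice control, providing more convenient service while meeting users' personalized movie-on-demand needs, so that a product or service can, on top of its original design, understand user needs more deeply and adjust its output accordingly.
Finally, it should be noted that the above embodiment is intended only to illustrate the technical solution of the invention, not to limit it. Although the invention has been described in detail with reference to a preferred embodiment, those of ordinary skill in the art should understand that the technical solution may be modified or replaced with equivalents without departing from its spirit and scope, and all such modifications fall within the scope of the claims of the invention.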

Claims (3)

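1. A Text-CNN-based audiovisual playlist push method, characterized by comprising the following steps:
1) Build a basic user information database and an audiovisual information database;
2) Process the synopsis text in the audiovisual information database with a text digitization technique to obtain fully numerical structured data, use this data as the input of the text-CNN neural network, and compute the latent features of the synopsis text by the following formula:
[Formula image FDA0002479792470000011 not reproduced]
where W is the feature extraction weight coefficient of the input layer of the text-CNN neural network, K is the feature extraction weight coefficient of the hidden layer, and
[Formula image FDA0002479792470000012 not reproduced]
$q \in \mathbb{R}^N$; the projection layer $X_w$ is a vector of length $(n-1)m$ formed from the $n-1$ word vectors of the input layer;
After $y_w = \{y_{w,1}, y_{w,2}, \ldots, y_{w,N}\}$ has been computed, let $w_i$ denote a word in the corpus Context($w_i$) built from the synopsis texts; normalizing with the softmax function gives the similarity probability of word $w_i$ for the movies in the user ratings:
[Formula image FDA0002479792470000013 not reproduced]
In the above formula, $i_w$ is the index of word $w$ in the corpus Context($w_i$), and
[Formula image FDA0002479792470000014 not reproduced]
denotes the probability that word $w$ has index $i_w$ in the corpus Context($w_i$) when the corpus is Context($w$);
Over the whole convolution process, let the resulting latent features of the movie synopsis text be $F$, with $F = \{F_1, F_2, \ldots, F_D\}$, and let $F_j$ be the $j$-th latent feature of the movie synopsis; then:
$F_j = \mathrm{text\_cnn}(W, X)$    (3)
where W is the feature extraction weight coefficient of the input layer of the text-CNN neural network and X is the probability matrix obtained by digitizing the synopsis text;
3) The convolutional layer of the text-CNN neural network extracts rating-related features from the probability matrix X, with the convolution window size set to $D \times L$; the pooling layer amplifies the features that affect user ratings after convolution and extracts them into several feature maps, so that the fully connected layer takes $N$ one-dimensional vectors $H_N$ as input; finally, the fully connected layer and the output layer map the one-dimensional numerical vector representing the main feature information of a movie to the movie latent feature matrix V with respect to user ratings, whose dimension D matches that of the user rating matrix;
4) Collect historical initial user rating information from the MovieLens 1M open dataset and convert it with a normalization function into a numerical rating matrix with values in [0, 5]. The user set is N and the movie set is M; $R_{ij}$ denotes the rating of user $u_i$ for movie $m_j$, and the overall initial user rating matrix is $R = [R_{ij}]_{m \times n}$. R is factorized into the user-rating latent feature matrix $U \in \mathbb{R}^{D \times N}$ and the movie latent feature matrix $V \in \mathbb{R}^{D \times N}$. At the same time, the user similarity $uSim(u_i, u_j)$ is computed, and users whose similarity exceeds 0.75 are classed as neighbor users;
[Formula image FDA0002479792470000021 not reproduced]
In the above formula, $R_M$ is the set of movies that have ratings, $u_i$ and $u_j$ are users who took part in rating,
[Formula image FDA0002479792470000022 not reproduced]
is the rating of user $u_i$ for movie $m$, and
[Formula image FDA0002479792470000023 not reproduced]
is the mean of that user's ratings;
5) Apply model-based probabilistic factorization to the overall initial user rating matrix R, where $\sigma_U$ is the variance of the user latent feature matrix obtained by factorizing $R_{ij}$ and $\sigma_V$ is the variance of the movie latent feature matrix obtained by factorizing $R_{ij}$; construct the latent user rating matrix
[Formula image FDA0002479792470000024 not reproduced]
that is, the user's rating predictor,
[Formula image FDA0002479792470000025 not reproduced]
The specific process is as follows:
The probability density functions of the overall initial user rating matrix R are, respectively:
[Formula image FDA0002479792470000026 not reproduced]
In the above formula, N is the probability density function of a zero-mean Gaussian distribution, $\sigma$ is the variance of the overall initial user rating matrix, and I is an indicator function marking whether a user rated a movie after watching it; to obtain, during fitting, the latent feature matrices that best represent users and movies, U and V are updated iteratively by gradient descent until the loss function E converges;
[Formula image FDA0002479792470000027 not reproduced]
In the above formula, $I_{ij}$ is the indicator function of whether user $i$ rated movie $j$: $I_{ij}$ is 1 if a rating was given and 0 otherwise; $\varphi$, $\varphi_U$ and $\varphi_F$ are regularization parameters that prevent overfitting;
The user latent feature matrix U and the movie latent feature matrix V are obtained from the loss function E by gradient descent, namely:
[Formula image FDA0002479792470000031 not reproduced]
U and V are then updated iteratively until E converges;
[Formula image FDA0002479792470000032 not reproduced]
In the above formula, $\rho$ is the learning rate;
6) Save the algorithm model trained in step 5) as a model file, and call this model file in the service program of the playlist push engine;
7) Define semantic slots for the smart audiovisual playlist scenario in the dialogue server; when a voice question-and-answer exchange with a neighbor user in this scenario triggers an entity defined in the semantic slots that relates to audiovisual playlists, the audiovisual playlist recommendation function is provided to that user.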
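2. An audiovisual playlist push system for the Text-CNN-based audiovisual playlist push method, characterized by comprising a local voice interaction terminal, a dialogue system server and a playlist recommendation engine, the dialogue system server and the playlist recommendation engine each being connected to the local voice interaction terminal;
The local voice interaction terminal comprises a microphone array, a host computer and a speech synthesis chip board; the microphone array is connected to the speech synthesis chip board, the speech synthesis chip board is connected to the host computer, and the host computer is connected to the dialogue system server; the playlist recommendation engine is connected to the speech synthesis chip board.
3. The audiovisual playlist push system for the Text-CNN-based audiovisual playlist push method according to claim 2, characterized in that: the microphone array collects the user's voice information and passes it to the host computer, which processes the voice information and sends it to the dialogue system server;
The dialogue system server uses semantic matching on the voice information sent by the host computer to form suitable question-and-answer text, and sends this text to the host computer over TCP/IP; the host computer parses the question-and-answer text sent by the dialogue system server and sends the parsed text to the speech synthesis chip board, which converts it into voice information and sends it to the microphone array to be announced to the user;
The playlist recommendation engine generates audiovisual playlist information for the conversing user based on the user's question-and-answer information, and sends the playlist information to the speech synthesis chip board over TCP/IP; the speech synthesis chip board generates a voice playlist push message from the playlist information and sends it to the microphone array to be announced to the user.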
CN202010375669.4A 2020-05-07 2020-05-07 Text-CNN-based audiovisual playlist push method and audiovisual playlist push system Active CN111581333B (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010375669.4A CN111581333B (zh) 2020-05-07 2020-05-07 Text-CNN-based audiovisual playlist push method and audiovisual playlist push system
US17/141,592 US11580979B2 (en) 2020-05-07 2021-01-05 Methods and systems for pushing audiovisual playlist based on text-attentional convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010375669.4A CN111581333B (zh) 2020-05-07 2020-05-07 Text-CNN-based audiovisual playlist push method and audiovisual playlist push system

Publications (2)

Publication Number Publication Date
CN111581333A (zh) 2020-08-25
CN111581333B CN111581333B (zh) 2023-05-26

Family

ID=72124716

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010375669.4A Active CN111581333B (zh) Text-CNN-based audiovisual playlist push method and audiovisual playlist push system 2020-05-07 2020-05-07

Country Status (2)

Country Link
US (1) US11580979B2 (zh)
CN (1) CN111581333B (zh)

Also Published As

Publication number Publication date
US20210350800A1 (en) 2021-11-11
US11580979B2 (en) 2023-02-14
CN111581333B (zh) 2023-05-26

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant