WO2012079254A1 - Program recommending device and program recommending method - Google Patents


Info

Publication number
WO2012079254A1
Authority
WO
WIPO (PCT)
Prior art keywords
electronic program
information
feature
unit
user
Application number
PCT/CN2010/079958
Other languages
French (fr)
Chinese (zh)
Inventor
徐金安
祝真宇
满志远
赵云龙
尹力
刘军
Original Assignee
北京交通大学 (Beijing Jiaotong University)
Application filed by 北京交通大学 (Beijing Jiaotong University)
Priority to PCT/CN2010/079958 (WO2012079254A1)
Priority to CN201080070252.1A (CN103299651B)
Publication of WO2012079254A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466 Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4668 Learning process for intelligent management, e.g. learning user preferences for recommending movies for recommending content, e.g. movies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84 Generation or processing of descriptive data, e.g. content descriptors

Definitions

  • The invention relates to the field of artificial intelligence research, in particular to a program recommendation device and a program recommendation method.
  • BACKGROUND OF THE INVENTION
  • With the rapid development of network technology, digital television and communication technologies, cable digital television, digital television, satellite digital television and wireless digital television have reached the stage of large-scale practical use. Digital technology has greatly increased the number of TV channels; although the EPG (Electronic Program Guide) has brought convenience to people because of the book
  • Existing program recommendation methods mainly include the following: rule-based methods, content-filtering-based methods, collaborative-filtering-based methods, and hybrid-strategy-based methods.
  • The rule-based recommendation method mainly uses various rules to implement the program recommendation function.
  • Rules can either be written manually or be obtained by mining association rules.
  • The advantage of this method is that the rules are simple and straightforward to formulate.
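Mining rules from association rules can be illustrated with a minimal sketch. The genre-level viewing sessions, the restriction to single-item rules A → B, and the support/confidence thresholds below are assumptions for illustration, not details taken from this application.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(sessions, min_support=0.4, min_confidence=0.6):
    """Mine association rules lhs -> rhs between items that co-occur in sessions.

    Returns (lhs, rhs, support, confidence) tuples, sorted.
    """
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        items = set(session)
        item_counts.update(items)
        # Count each unordered pair once per session.
        pair_counts.update(frozenset(p) for p in combinations(sorted(items), 2))
    rules = []
    for pair, count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        a, b = sorted(pair)
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


def recommend_by_rules(watched, rules):
    """Fire every rule whose left-hand side the user has already watched."""
    return sorted({rhs for lhs, rhs, _, _ in rules if lhs in watched and rhs not in watched})


# Hypothetical genre-level viewing sessions.
sessions = [
    ["news", "sports"],
    ["news", "sports", "drama"],
    ["news", "weather"],
    ["sports", "drama"],
    ["news", "sports"],
]
rules = mine_pair_rules(sessions)
print(recommend_by_rules({"news"}, rules))  # ['sports']
```

Rules of this kind are easy to read and audit, which matches the simplicity noted above, but they can only recommend items that already appear in the mined history.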
  • The content-filtering-based method implements recommendation by comparing the description information of programs with that of the user.
  • This method can be implemented with machine learning methods such as the vector space model, Bayesian methods, decision trees, and support vector machines (SVM).
  • The collaborative-filtering-based method recommends programs according to the similarity between users.
  • This method can be implemented with various clustering and classification algorithms, such as K-nearest neighbors (KNN), K-means, fuzzy clustering, naive Bayes, and SVM.
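User-based collaborative filtering with K nearest neighbors might look roughly like this. The rating data, the cosine similarity measure and the similarity-weighted average are illustrative assumptions; KNN is only one of the algorithms listed above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts {program: rating}."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_recommend(ratings, target, k=2, top_n=3):
    """User-based collaborative filtering with k nearest neighbours.

    Unseen programs are scored by the similarity-weighted average rating
    given by the k users most similar to `target`.
    """
    others = [(cosine(ratings[target], r), user)
              for user, r in ratings.items() if user != target]
    neighbours = sorted(others, reverse=True)[:k]
    scores, weights = {}, {}
    for sim, user in neighbours:
        if sim <= 0:
            continue  # ignore neighbours with no overlap
        for program, rating in ratings[user].items():
            if program in ratings[target]:
                continue
            scores[program] = scores.get(program, 0.0) + sim * rating
            weights[program] = weights.get(program, 0.0) + sim
    predicted = {p: scores[p] / weights[p] for p in scores}
    return sorted(predicted, key=lambda p: (-predicted[p], p))[:top_n]


# Hypothetical explicit ratings.
ratings = {
    "alice": {"p1": 5, "p2": 4},
    "bob": {"p1": 5, "p2": 4, "p3": 5},
    "carol": {"p1": 1, "p4": 4},
    "dave": {"p5": 3},
}
print(knn_recommend(ratings, "alice"))  # ['p3', 'p4']
```

Note that the sketch needs other users' rating histories, which is exactly the privacy and cold-start dependence criticized later in this background section.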
  • The hybrid-strategy-based method combines the content-filtering-based method with the collaborative-filtering-based method.
  • It can combine the strengths of the two methods so that they complement each other, which improves the precision and effectiveness of recommendation to a certain extent.
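One simple way to combine the two approaches is a weighted hybrid, sketched below under the assumption that both methods already produce scores on a comparable 0 to 1 scale; the blending weight `alpha` is a hypothetical parameter.

```python
def hybrid_scores(content_scores, cf_scores, alpha=0.5):
    """Weighted hybrid: alpha * content score + (1 - alpha) * collaborative score.

    A program absent from one source contributes 0 from that source, so a brand-new
    program that nobody has rated can still be ranked by its content score.
    """
    programs = set(content_scores) | set(cf_scores)
    combined = {
        p: alpha * content_scores.get(p, 0.0) + (1 - alpha) * cf_scores.get(p, 0.0)
        for p in programs
    }
    # Highest combined score first; ties broken by program name.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = hybrid_scores({"new_show": 0.9, "old_show": 0.2}, {"old_show": 0.8})
print([p for p, _ in ranking])  # ['old_show', 'new_show']
```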
  • The content-filtering-based method recommends new programs poorly and inefficiently, and it suffers from the "cold start" problem.
  • The collaborative-filtering-based method has poor adaptability and low scalability, and it can solve neither the "cold start" problem nor the problem of other users' privacy.
  • The scalability of the hybrid-strategy-based method still needs improvement; such systems resist malicious ratings poorly, and the "cold start" problem remains.
  • In short, existing program recommendation technology cannot adequately solve the "cold start" problem of program recommendation systems without infringing users' personal privacy, which greatly limits the recommendation accuracy and performance of these systems.
  • SUMMARY OF THE INVENTION
  • Embodiments of the present invention provide a program recommendation apparatus and a program recommendation method.
  • The technical solution is as follows:
  • A program recommendation device, comprising:
  • an input unit configured to receive language information input by a user;
  • a program pre-selection unit configured to extract, according to the language information received by the input unit, relevant electronic program information from an electronic program table database in which electronic program information has been stored;
  • a feature extraction unit configured to perform feature selection on the electronic program information extracted by the program pre-selection unit to obtain feature elements, acquire association information of the feature elements from a knowledge base in which language knowledge is stored, and construct a feature set;
  • a machine learning unit configured to construct a statistical model by using the feature set obtained by the feature extraction unit and a machine learning method;
  • a program prediction unit configured to match programs in the electronic program table database by using the statistical model constructed by the machine learning unit; and
  • an output unit configured to output the matching result of the program prediction unit to the user.
  • The program pre-selection unit includes:
  • a first pre-selection sub-unit configured to, when the language information received by the input unit is a keyword set, perform logical operations on the keyword set to extract relevant electronic program information from the electronic program table database.
  • Additionally or alternatively, the program pre-selection unit includes:
  • a second pre-selection sub-unit configured to, when the language information received by the input unit is a phrase or a sentence, perform word segmentation, compute a space model of the user's preferences from the segmentation result, calculate the similarity between the space model and the electronic program information in the electronic program table database, and extract relevant electronic program information based on the similarity.
  • The feature extraction unit further includes:
  • a feedback sub-unit configured to use the feature elements as search keywords to retrieve and evaluate the electronic program information in the electronic program table database, and to feed the processing result back to the program pre-selection unit;
  • the program pre-selection unit is further configured to receive the result fed back by the feedback sub-unit, extract relevant electronic program information from the electronic program table database according to that result, and output it to the feature extraction unit.
  • The knowledge base includes the synonymy, near-synonymy, antonymy and conceptual similarity of words, and any one or more of words, word classes and semantic attributes.
  • A program recommendation method, comprising:
  • The matching result is output to the user.
  • When the received language information is a keyword set, logical operations are performed on the keyword set to extract relevant electronic program information from the electronic program table database.
  • When the received language information is a phrase or a sentence, word segmentation is performed first, a space model of the user's preferences is computed from the segmentation result, the similarity between the space model and the electronic program information in the electronic program table database is calculated, and relevant electronic program information is extracted according to the similarity.
  • The method further includes:
  • obtaining association information of the feature elements from the knowledge base storing language knowledge and constructing the feature set, which includes:
  • obtaining association information of the new feature elements from the knowledge base storing language knowledge, and constructing the feature set.
  • The knowledge base includes the synonymy, near-synonymy, antonymy and conceptual similarity of words, and any one or more of words, word classes and semantic attributes.
  • The technical solution provided by the embodiments of the present invention has the following beneficial effects. Relevant electronic program information is extracted from the electronic program table database according to the language information input by the user; feature selection yields feature elements; information stored in the knowledge base is used to expand the feature elements into a feature set of the user's interest space; and the feature set and a machine learning method are used to construct a statistical model, which is matched against the electronic program table database so that the matching result is output to the user. This realizes program recommendation, solves the "cold start" problem of the prior art, and improves the accuracy, performance and usability of program recommendation. Because the above device is located at the user end and the method is performed on the user side, no personal information of the user is collected at the network server or the user end.
  • The user's private information is therefore fully protected from leakage, and confidentiality is improved.
  • In addition, the feature elements can be used as search keywords to retrieve and evaluate the electronic program table database, and program pre-selection can then be performed according to the processing result, further expanding the user's range of interests and improving the accuracy of program recommendation.
  • FIG. 1 is a structural diagram of a program recommendation apparatus according to Embodiment 1 of the present invention.
  • FIG. 2 is a structural diagram of a program recommendation apparatus according to Embodiment 2 of the present invention.
  • FIG. 3 is a flowchart of a program recommendation method according to Embodiment 3 of the present invention.
  • FIG. 4 is a flowchart of a program recommendation method according to Embodiment 4 of the present invention.
  • DETAILED DESCRIPTION
  • Example 1
  • This embodiment provides a program recommendation apparatus, including:
  • the input unit 100, configured to receive language information input by the user;
  • the program pre-selection unit 110, configured to extract, according to the language information received by the input unit 100, relevant electronic program information from the electronic program table database in which electronic program information has been stored;
  • the feature extraction unit 120, configured to perform feature selection on the electronic program information extracted by the program pre-selection unit 110 to obtain feature elements, acquire association information of the feature elements from the knowledge base storing language knowledge, and construct a feature set;
  • the machine learning unit 130, configured to construct a statistical model by using the feature set obtained by the feature extraction unit 120 and a machine learning method;
  • the program prediction unit 140, configured to match programs in the electronic program table database by using the statistical model constructed by the machine learning unit 130; and
  • the output unit 150, configured to output the result matched by the program prediction unit 140 to the user.
  • The electronic program guide (EPG) involved in the embodiments of the present invention is not limited to the EPG of television programs; any other recommendation system built on an electronic program guide is also applicable.
  • The input unit 100 receives the language information input by the user and can be implemented in various ways, including but not limited to general-purpose input modules such as a remote controller, a keyboard, a pointing device (such as a mouse), handwritten character recognition and an optical character reader. Voice input through a speech recognition system, or reading text files or a database, is also acceptable.
  • The input unit 100 may use any method, as long as the processing ultimately yields language information as input.
  • The user's input can be keywords, or a phrase or sentence describing the user's preferences.
  • The program pre-selection unit 110 may include:
  • a first pre-selection sub-unit configured to, when the language information received by the input unit 100 is a keyword set, perform logical operations on the keyword set to extract relevant electronic program information from the electronic program table database; and/or
  • a second pre-selection sub-unit configured to, when the language information received by the input unit 100 is a phrase or a sentence, perform word segmentation, compute a space model of the user's preferences from the segmentation result, calculate the similarity between the space model and the electronic program information in the electronic program table database, and extract relevant electronic program information based on the similarity.
  • The first pre-selection sub-unit may extract programs directly from the EPG database with the keyword set; the keywords in the set can be combined with logical operations such as logical AND, logical OR, logical NOT and logical NAND.
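A keyword-set query with these logical operators could be evaluated as sketched below. The nested-tuple query format, the case-insensitive substring match and the `id`/`name`/`intro` record fields are assumptions made for the example.

```python
def matches(query, text):
    """Evaluate a nested boolean keyword query against one program's text.

    `query` is either a keyword string or a tuple (op, q1, q2, ...) where op is
    "AND", "OR", "NAND" or "NOT" (NOT takes exactly one operand).
    """
    if isinstance(query, str):
        return query.lower() in text.lower()  # simple substring match
    op, *args = query
    if op == "AND":
        return all(matches(q, text) for q in args)
    if op == "OR":
        return any(matches(q, text) for q in args)
    if op == "NAND":
        return not all(matches(q, text) for q in args)
    if op == "NOT":
        return not matches(args[0], text)
    raise ValueError(f"unknown operator: {op}")


def preselect_by_keywords(epg, query):
    """Return the ids of EPG entries whose name or introduction satisfies `query`."""
    return [p["id"] for p in epg if matches(query, p["name"] + " " + p["intro"])]


epg = [
    {"id": 1, "name": "World Cup Live", "intro": "football match"},
    {"id": 2, "name": "Football History", "intro": "documentary"},
    {"id": 3, "name": "Evening News", "intro": "daily news"},
]
query = ("AND", ("OR", "football", "soccer"), ("NOT", "documentary"))
print(preselect_by_keywords(epg, query))  # [1]
```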
  • The second pre-selection sub-unit can process the input with a word segmentation tool, obtain the user's preference space model by computing term frequencies over the segmentation result, calculate the similarity between that space model and the electronic program information in the EPG database, and then sort the results to obtain the recommendation.
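The term-frequency route can be sketched as follows. The regex tokenizer is a stand-in for a real word segmentation tool (Chinese text would need a proper segmenter), and cosine similarity and the `top_n` cut-off are assumed choices rather than ones the application specifies.

```python
import math
import re
from collections import Counter


def segment(text):
    """Stand-in for a word segmentation tool: lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def tf_vector(text):
    """Raw term-frequency vector of a text."""
    return Counter(segment(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def preselect_by_similarity(user_text, epg, top_n=2):
    """Rank EPG entries by similarity to the user's preference vector."""
    profile = tf_vector(user_text)
    scored = [(cosine(profile, tf_vector(p["name"] + " " + p["intro"])), p["id"])
              for p in epg]
    scored = [s for s in scored if s[0] > 0]  # drop programs with no shared terms
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [pid for _, pid in scored[:top_n]]


epg = [
    {"id": 1, "name": "World Cup Live", "intro": "football match"},
    {"id": 2, "name": "Football History", "intro": "a documentary"},
    {"id": 3, "name": "Evening News", "intro": "daily news"},
]
print(preselect_by_similarity("I like football and football documentaries", epg))  # [2, 1]
```

Program 2 ranks first only because its shorter description gives the shared term "football" more relative weight; "documentaries" and "documentary" do not match without stemming or a knowledge base.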
  • The program pre-selection unit 110 may also present the extracted electronic program information to the user for initial screening, and then output the programs confirmed by the user to the feature extraction unit 120.
  • The EPG database according to the embodiments of the present invention may consist of electronic program tables organized in a structured or semi-structured form.
  • Digital TV services such as IPTV and cable TV can generally provide the programs available for the two weeks starting from the viewing day.
  • The data in the EPG database can be extracted from a digital wireless television receiver or from the Internet.
  • An EPG generally includes the program number, program name, program introduction, channel, start and end times, etc., which can be stored in the EPG database in a certain data format.
  • The program information accessed in the EPG may be past or current; the embodiments of the present invention do not specifically limit this.
  • The EPG database of the present invention allows EPG data from past periods to be accumulated and stored, for example the electronic programs of the past year, six months or three months before the user's viewing date, in order to give the user a sufficiently large space of interests to choose from.
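The EPG fields and the accumulation window can be modelled as below. The field names, the 90-day look-back (roughly the three-month option) and the 14-day look-ahead (the two-week listing) are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class EpgRecord:
    """One EPG entry with the fields listed above (names are illustrative)."""
    program_no: str
    name: str
    intro: str
    channel: str
    start: datetime
    end: datetime


def in_window(records, today, days_back=90, days_ahead=14):
    """Keep records starting between `days_back` days ago and `days_ahead` days ahead."""
    lo = today - timedelta(days=days_back)
    hi = today + timedelta(days=days_ahead)
    return [r for r in records if lo <= r.start < hi]


today = datetime(2010, 12, 1)
starts = [datetime(2010, 11, 20, 20), datetime(2010, 6, 1, 20),
          datetime(2010, 12, 10, 20), datetime(2010, 12, 20, 20)]
records = [EpgRecord(str(i), f"program {i}", "", "CH1", s, s + timedelta(hours=1))
           for i, s in enumerate(starts)]
print([r.program_no for r in in_window(records, today)])  # ['0', '2']
```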
  • The feature selection methods used by the feature extraction unit 120 include, but are not limited to, document-frequency-based feature extraction, information gain, the χ² (chi-square) statistic, mutual information, and the like.
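As an example of the χ² method, the statistic for one term can be computed from a 2x2 contingency table of term presence against a binary label. The labels (programs the user liked or not) and the `select_features` helper are hypothetical.

```python
def chi_square(docs, labels, term):
    """Chi-square statistic between `term` and a binary label.

    docs: list of token sets; labels: list of bools (True = positive class).
    Uses N * (AD - BC)^2 / ((A + C)(B + D)(A + B)(C + D)).
    """
    a = b = c = d = 0
    for tokens, positive in zip(docs, labels):
        has_term = term in tokens
        if has_term and positive:
            a += 1  # term present, positive
        elif has_term:
            b += 1  # term present, negative
        elif positive:
            c += 1  # term absent, positive
        else:
            d += 1  # term absent, negative
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    return sorted(vocab, key=lambda t: (-chi_square(docs, labels, t), t))[:k]


docs = [{"football", "live"}, {"football", "cup"}, {"news", "daily"}, {"news", "live"}]
labels = [True, True, False, False]  # hypothetical: liked / not liked
print(select_features(docs, labels, 2))  # ['football', 'news']
```

Terms that occur equally in both classes (here "live") score 0 and are dropped first.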
  • Feature selection can be based on feature weights, which can be computed as Boolean weights, absolute TF (term frequency), IDF (inverse document frequency), TF-IDF (term frequency-inverse document frequency), TFC, ITC, entropy weights, TF-IWF, etc.; the embodiments of the present invention do not specifically limit this.
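A minimal TF-IDF weighting can be sketched as follows, assuming the unsmoothed variant weight = tf × log(N / df). Many smoothing and normalization variants exist, and the application does not say which one it uses.

```python
import math
from collections import Counter


def tf_idf(docs):
    """TF-IDF weights for a list of token lists.

    weight(t, d) = tf(t, d) * log(N / df(t)), with raw counts for tf and no smoothing.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per doc
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({t: count * math.log(n / df[t]) for t, count in tf.items()})
    return weights


w = tf_idf([["football", "football", "live"], ["news", "live"]])
print(max(w[0], key=w[0].get))  # football
```

A term that appears in every document, such as "live" here, gets weight 0 under this variant.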
  • The association information of the feature elements that the feature extraction unit 120 acquires from the knowledge base includes attribute information such as the semantics and concepts of words. This information can serve as the feature set of the user's interest and preference space, providing the data and the basis for judgment when the machine learning unit 130 builds its model.
  • The knowledge base involved in the embodiments of the present invention includes the synonymy, near-synonymy, antonymy and conceptual similarity of words, and any one or more of words, word classes and semantic attributes.
  • Besides attribute features such as semantics and concepts, the knowledge base can also include organizational information related to those attributes.
  • Organizational information is information for organizing and managing the feature elements appropriately according to the structure of knowledge in the knowledge base, for example establishing conceptual relations and semantic inclusion relations. Organization can follow a conceptual semantic network, and each element can be assigned a different weight according to its level in the network to improve system performance.
  • The knowledge base can be built manually, or existing dictionaries or semantic dictionaries can be used, for example WordNet for English, HowNet for Chinese and the EDR electronic dictionary for Japanese. Various synonym and near-synonym electronic dictionaries can also be used.
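Feature expansion through a knowledge base can be sketched with a toy relation table. The `KB` dictionary and the per-relation weights are hypothetical stand-ins; a real system would query a resource such as WordNet or HowNet instead, and multiplying weights along a path is only one way to weight elements by their distance in the semantic network.

```python
# Hypothetical miniature knowledge base: word -> [(related word, relation type)].
KB = {
    "football": [("soccer", "synonym"), ("sport", "hypernym")],
    "sport": [("athletics", "hyponym")],
}
# Hypothetical weights: closer relations keep more of the source word's weight.
RELATION_WEIGHT = {"synonym": 0.9, "hypernym": 0.6, "hyponym": 0.5}


def expand_features(features, kb=KB, max_depth=2):
    """Expand seed feature words through the knowledge base.

    Weights multiply along each path, so concepts further away in the
    semantic network receive lower weights; each word keeps its best weight.
    """
    expanded = {f: 1.0 for f in features}
    frontier = list(expanded.items())
    for _ in range(max_depth):
        next_frontier = []
        for word, weight in frontier:
            for related, relation in kb.get(word, []):
                w = weight * RELATION_WEIGHT[relation]
                if w > expanded.get(related, 0.0):
                    expanded[related] = w
                    next_frontier.append((related, w))
        frontier = next_frontier
    return expanded


print(expand_features(["football"]))
```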
  • The machine learning unit 130 may use various machine learning methods, such as supervised, unsupervised or semi-supervised methods. Specifically, any one of support vector machines (SVM), decision trees, Bayesian methods, maximum entropy and conditional random fields may be used, or several of them may be combined into a hybrid algorithm; the embodiments of the present invention do not specifically limit this.
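Of the listed learners, a Bayesian classifier is the shortest to sketch. The multinomial model with add-one (Laplace) smoothing and the "like"/"skip" labels below are assumptions for illustration, not the application's model.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over token lists."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        self.prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(t for d in class_docs for t in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def log_scores(self, doc):
        """Log prior plus smoothed log likelihood of each known token, per class."""
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = self.prior[c]
            for t in doc:
                if t in self.vocab:  # ignore words never seen in training
                    s += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
            scores[c] = s
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical training data built from the user's feature set.
docs = [["football", "live"], ["football", "cup", "final"], ["cooking", "show"], ["news", "daily"]]
labels = ["like", "like", "skip", "skip"]
model = NaiveBayes().fit(docs, labels)
print(model.predict(["football", "match"]))  # like
```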
  • The program prediction unit 140 may further sort the matching results and output the sorted results to the output unit 150, which then outputs them to the user.
  • The output unit 150 can output the program recommendation results to the user in various forms, such as file output or display output, presented in a specific format; the final presentation may take any form, such as highlighted recommendations or voice reminders, which the embodiments of the present invention do not specifically limit.
  • The user can then request to play the desired program and receive the corresponding data stream for viewing.
  • The feature extraction unit 120 may also apply a clustering or classification algorithm before or after feature selection.
  • Likewise, the machine learning unit 130 may apply a clustering or classification algorithm before or after constructing the statistical model, to further improve the accuracy of program recommendation; the embodiments of the present invention do not specifically limit this.
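Clustering could be applied to program feature vectors, for instance with plain k-means as sketched here. Seeding the centroids with the first k points, using a fixed iteration count, and the two-dimensional vectors are simplifying assumptions.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on equal-length numeric vectors.

    The first k points seed the centroids; returns (assignment, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centroids[j])),
            )
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical 2-D program feature vectors, e.g. (sports score, news score).
points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0)]
assignment, centroids = kmeans(points, 2)
print(assignment)  # [0, 1, 0, 1, 0]
```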
  • The device provided by this embodiment extracts relevant electronic program information from the electronic program table database according to the language information input by the user, performs feature selection to obtain feature elements, and uses the information stored in the knowledge base to expand the feature elements into a feature set of the user's interest space. The feature set and a machine learning method are used to construct a statistical model, which is matched against the electronic program table database so that the matching result is output to the user. This realizes program recommendation, solves the "cold start" problem of the prior art, and improves the accuracy, performance and usability of program recommendation. Because the above device is located at the user end, no personal information of the user is collected at the network server or the user end, so the user's private information is fully protected from leakage and confidentiality is improved.
  • Furthermore, the feature elements can be used as search keywords to retrieve and evaluate the electronic program table database, and program pre-selection can then be performed again according to the processing result, further expanding the user's range of interests and improving the accuracy of program recommendation.
  • Example 2
  • This embodiment provides a program recommendation apparatus, including: an input unit 100, a program pre-selection unit 110, a feature extraction unit 120, a machine learning unit 130, a program prediction unit 140 and an output unit 150.
  • The function of each unit is the same as described in Embodiment 1; the improvement over Embodiment 1 is that the feature extraction unit 120 may further include:
  • a feedback sub-unit 120a, configured to use the above feature elements as search keywords to retrieve and evaluate the electronic program information in the electronic program table database, and to feed the processing result back to the program pre-selection unit 110;
  • the program pre-selection unit 110 is further configured to receive the result fed back by the feedback sub-unit, extract relevant electronic program information from the electronic program table database according to that result, and output it to the feature extraction unit 120. The feature extraction unit 120 can then perform feature selection on both the electronic program information extracted according to the language information and that extracted according to the feature elements, obtain new feature elements, acquire association information of the new feature elements from the knowledge base storing language knowledge, and construct the feature set.
  • The feature set is thereby expanded, helping the user select favorite programs more accurately and improving the prediction accuracy of the system.
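The feedback loop of this embodiment can be sketched as repeated pre-selection. The OR-keyword search and the frequency-based `top_features` helper are hypothetical stand-ins for the retrieval, evaluation and feature selection steps described above, and the fixed number of rounds is an assumption.

```python
from collections import Counter


def preselect(epg, keywords):
    """Hypothetical retrieval step: keep programs whose text contains any keyword."""
    return [p for p in epg if any(k in p["text"].split() for k in keywords)]


def top_features(programs, k, exclude=()):
    """Hypothetical feature selection: the k most frequent words not already used."""
    counts = Counter(w for p in programs for w in p["text"].split() if w not in exclude)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


def feedback_preselection(epg, user_keywords, rounds=2, k=2):
    """Repeat pre-selection, feeding newly found feature elements back as keywords."""
    keywords = list(user_keywords)
    selected = preselect(epg, keywords)
    for _ in range(rounds):
        new = top_features(selected, k, exclude=set(keywords))
        if not new:
            break  # nothing new to feed back
        keywords += new
        selected = preselect(epg, keywords)
    return keywords, [p["id"] for p in selected]


epg = [
    {"id": 1, "text": "football world cup"},
    {"id": 2, "text": "world cup final"},
    {"id": 3, "text": "cooking show"},
]
print(feedback_preselection(epg, ["football"], rounds=1))  # (['football', 'cup', 'world'], [1, 2])
```

Program 2 never mentions "football", yet it is reached through the fed-back feature elements "cup" and "world", which is the interest-expanding effect this embodiment describes.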
  • The feature extraction unit 120 may further determine whether pre-selection needs to be performed again; if so, it performs the feedback operation described above; otherwise, processing continues as in Embodiment 1.
  • Whether pre-selection is needed can be determined in various ways. For example, a simple question window can be preset to ask the user whether to re-select television programs, or the feature elements can be output at the same time as a dynamic graphic similar to a semantic network for the user to observe and analyze.
  • The embodiments of the present invention do not specifically limit this.
  • The device provided by this embodiment extracts relevant electronic program information from the electronic program table database according to the language information input by the user, performs feature selection to obtain feature elements, and uses the information stored in the knowledge base to expand the feature elements into a feature set of the user's interest space. The feature set and a machine learning method are used to construct a statistical model, which is matched against the electronic program table database so that the matching result is output to the user. This realizes program recommendation, solves the "cold start" problem of the prior art, and improves the accuracy, performance and usability of program recommendation. Because the above device is located at the user end, no personal information of the user is collected at the network server or the user end, so the user's private information is fully protected from leakage and confidentiality is improved.
  • Furthermore, the feature elements can be used as search keywords to retrieve and evaluate the electronic program table database, and program pre-selection can then be performed again according to the processing result, further expanding the user's range of interests and improving the accuracy of program recommendation.
  • Example 3
  • This embodiment provides a program recommendation method, including:
  • S01: receiving language information input by the user;
  • S02: extracting, according to the language information, relevant electronic program information from an electronic program table database in which electronic program information has been stored;
  • The user inputs the programs or the space of interest; the input may be keywords, or a phrase or sentence describing the user's preferences.
  • S02 may specifically include:
  • S02a: when the received language information is a keyword set, performing logical operations on the keyword set and extracting relevant electronic program information from the electronic program table database; and/or
  • Programs can be extracted directly from the EPG database with the keyword set, and the keywords in the set can be combined with logical operations such as logical AND, logical OR, logical NOT and logical NAND.
  • A word segmentation tool can be used for processing; the user's preference space model is then obtained by computing term frequencies over the segmentation result, the similarity between the space model and the electronic program information in the EPG database is calculated, and the results are sorted to obtain the recommendation.
  • The extracted electronic program information may also be presented to the user for initial screening, and the programs confirmed by the user are then used as the extracted electronic program information.
  • The EPG database of the embodiments of the present invention may consist of electronic program tables in a structured or semi-structured form, as described in Embodiment 1; details are not repeated here.
  • The knowledge base involved in this embodiment includes the synonymy, near-synonymy, antonymy and conceptual similarity of words, and any one or more of words, word classes and semantic attributes, as described in Embodiment 1; details are not repeated here.
  • Feature selection may be based on feature weights, which can likewise be computed in many ways, such as Boolean weights, absolute term frequency (TF), IDF, TF-IDF, TFC, ITC, entropy weights and TF-IWF; the embodiments of the present invention do not specifically limit this.
  • The association information of the feature elements obtained from the knowledge base in S04 includes attribute information such as the semantics and concepts of words; this information can serve as the feature set of the user's interest and preference space, providing the data and the basis for judgment for modeling.
  • The machine learning methods used in S05 are various, such as supervised, unsupervised and semi-supervised methods. Specifically, any one of support vector machines (SVM), decision trees, Bayesian methods, maximum entropy and conditional random fields may be used, or several of them may be combined into a hybrid algorithm; the embodiments of the present invention do not specifically limit this.
  • In S06 the matching results may be further sorted, and the sorted results are correspondingly output to the user in S07.
  • In S07 the program recommendation results can be output to the user in various forms, such as file output or display output.
  • The results may be shown to the user on a single screen, or [...]; the embodiments of the present invention do not specifically limit this. After obtaining the recommendation results, the user can request to play the desired program and receive the corresponding data stream for viewing.
  • In S03, a clustering or classification algorithm may be applied before or after feature selection, and in S05, a clustering or classification algorithm may be applied before or after the statistical model is constructed, further improving the accuracy of program recommendation; the embodiments of the present invention do not specifically limit this.
  • The method provided by this embodiment extracts relevant electronic program information from the electronic program table database according to the language information input by the user, performs feature selection to obtain feature elements, and uses the information stored in the knowledge base to expand the feature elements into a feature set of the user's interest space. The feature set and a machine learning method are used to construct a statistical model, which is matched against the electronic program table database so that the matching result is output to the user. This realizes program recommendation, solves the "cold start" problem of the prior art, and improves the accuracy, performance and usability of program recommendation. Because the method is performed on the user side, no personal information of the user is collected at the network server or the user end.
  • The user's private information is therefore fully protected from leakage, and confidentiality is improved.
  • Furthermore, the feature elements can be used as search keywords to retrieve and evaluate the electronic program table database, and program pre-selection can then be performed again according to the processing result, further expanding the user's range of interests and improving the accuracy of program recommendation.
  • This embodiment provides a program recommendation method whose improvement is that electronic program information is extracted from the EPG database again according to the obtained feature elements, and the feature set is then constructed, as shown in FIG.
  • The method specifically includes:
  • S12: Extract relevant electronic program information from the EPG database in which electronic program information is stored, according to the language information.
  • S14: It may first be determined whether pre-selection is needed. If so, S14 and the subsequent steps are performed; otherwise, feature selection is performed directly on the electronic program information extracted according to the language information to obtain feature elements, association information of the feature elements is obtained from the knowledge base in which linguistic knowledge is stored, the feature set is constructed, and the subsequent steps such as S17 are then performed.
  • Whether pre-selection is needed can be determined in various ways. For example, a simple prompt window can be preset to ask whether the user wants to re-select television programs, or the feature elements can be displayed at the same time as a dynamic diagram resembling a semantic network for the user to observe and analyze; this is not specifically limited in the embodiments of the present invention.
  • The method provided by this embodiment extracts relevant electronic program information from the EPG database according to the language information input by the user, performs feature selection to obtain feature elements, and uses the information stored in the knowledge base to expand the feature elements into a feature set describing the user's space of interests. A statistical model is then constructed from this feature set with a machine learning method and matched against the EPG database, and the matching result is output to the user. This achieves program recommendation, solves the "cold start" problem of the prior art, and improves the accuracy, performance and practicality of program recommendation. Because the method is executed on the user side, it does not involve collecting the user's personal information at the network server or the client, so the user's private information is fully protected from leakage and confidentiality is improved.
  • In addition, the EPG database can be searched and evaluated using the feature elements as search keywords, and program pre-selection can then be repeated according to the result, further expanding the user's space of interests and improving the accuracy of program recommendation.
  • All or part of the technical solutions provided by the embodiments of the present invention may be implemented by hardware instructed by a program. The program may be stored in a readable storage medium, which includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk or an optical disc.

Abstract

A program recommendation device and a program recommendation method are provided, belonging to the field of artificial intelligence research. The device includes an input unit (100), a program pre-selection unit (110), a feature extraction unit (120), a machine learning unit (130), a program prediction unit (140) and an output unit (150). The method includes: receiving language information input by a user; extracting, according to the language information, relevant electronic program information from an electronic program guide (EPG) database in which electronic program information is stored; performing feature selection on the extracted electronic program information to obtain feature elements, obtaining association information of the feature elements from a knowledge base in which linguistic knowledge is stored, and constructing a feature set; constructing a statistical model using the feature set and a machine learning method; matching programs in the EPG database using the statistical model; and outputting the matching result to the user. The invention solves the cold-start problem of program recommendation systems, protects the user's private information from leakage, and improves the accuracy, performance and practicality of program recommendation.

Description

Program Recommendation Device and Program Recommendation Method
Technical Field
The present invention relates to the field of artificial intelligence research, and in particular to a program recommendation device and a program recommendation method.
Background
With the rapid development of network, digital television and communication technologies, cable, network, satellite and wireless digital television have all reached large-scale practical deployment. Digital technology has greatly increased the number of TV channels. Although the EPG (Electronic Program Guide) offers some convenience, the ever-growing volume of television program resources makes it hard for viewers to quickly find the programs they actually like.
To address this information overload, a variety of program recommendation systems have emerged. Existing program recommendation methods fall mainly into four categories: rule-based methods, content-based filtering methods, collaborative filtering methods, and hybrid methods.
Rule-based methods recommend programs by applying rules. The rules can be written manually or mined automatically with association-rule mining techniques. The advantage of this approach is that rules are simple and direct to create.
Content-based filtering methods make recommendations by comparing program descriptions with a description of the user. They can be implemented with machine learning methods such as the vector space model, Bayesian methods, decision trees and support vector machines (SVM). Their advantage is simplicity, and they can make reasonable predictions of a user's latent needs.
Collaborative filtering methods recommend programs based on similarity between users. To compute user-to-user similarity, various clustering and classification algorithms can be used, such as k-nearest neighbors (KNN), k-means, fuzzy clustering, naive Bayes and SVM. Their advantage is that they can discover some new programs of interest to the user.
Hybrid methods combine content-based filtering with collaborative filtering. They combine the strengths of both approaches, so that each compensates for the other's weaknesses, and can improve recommendation accuracy and effectiveness to some extent.
In the course of making the present invention, the inventors found that the above prior art has at least the following drawbacks:
In rule-based methods, the rules are highly subjective and their quality is hard to guarantee; as rules are added they come into conflict with one another, and the system becomes difficult to manage and upgrade. Content-based filtering performs poorly and inefficiently on entirely new programs and suffers from the "cold start" problem. Collaborative filtering adapts poorly, scales poorly, does not solve the "cold start" problem well, and also raises concerns about other people's privacy. Hybrid methods still need better scalability, resist malicious ratings poorly, and still suffer from the "cold start" problem. In summary, existing program recommendation techniques cannot satisfactorily solve the "cold start" problem of program recommendation systems without infringing on users' personal privacy, which severely limits the accuracy and performance of such systems.
Summary of the Invention
To solve the problems of the prior art, embodiments of the present invention provide a program recommendation device and a program recommendation method. The technical solutions are as follows:
A program recommendation device, comprising:
an input unit, configured to receive language information input by a user;
a program pre-selection unit, configured to extract, according to the language information received by the input unit, relevant electronic program information from an EPG database in which electronic program information is stored;
a feature extraction unit, configured to perform feature selection on the electronic program information extracted by the program pre-selection unit to obtain feature elements, obtain association information of the feature elements from a knowledge base in which linguistic knowledge is stored, and construct a feature set;
a machine learning unit, configured to construct a statistical model using the feature set obtained by the feature extraction unit and a machine learning method;
a program prediction unit, configured to match programs in the EPG database using the statistical model constructed by the machine learning unit; and
an output unit, configured to output the matching result of the program prediction unit to the user.
The program pre-selection unit comprises:
a first pre-selection subunit, configured to, when the language information received by the input unit is a set of keywords, apply logical operations to the keyword set and then extract relevant electronic program information from the EPG database.
The program pre-selection unit comprises:
a second pre-selection subunit, configured to, when the language information received by the input unit is a phrase or sentence, first perform word segmentation, compute a space model of the user's preferences from the segmentation result, then compute the similarity between the space model and the electronic program information in the EPG database, and extract relevant electronic program information according to the similarity.
The feature extraction unit further comprises:
a feedback subunit, configured to search and evaluate the electronic program information in the EPG database using the feature elements as search keywords, and feed the result back to the program pre-selection unit;
correspondingly, the program pre-selection unit is further configured to receive the result fed back by the feedback subunit, extract relevant electronic program information from the EPG database according to that result, and output it to the feature extraction unit.
The knowledge base includes any one or more of: synonyms, near-synonyms and antonyms of words; similarity between concepts; and words, parts of speech and semantic attributes.
A program recommendation method, comprising:
receiving language information input by a user;
extracting, according to the language information, relevant electronic program information from an EPG database in which electronic program information is stored;
performing feature selection on the extracted electronic program information to obtain feature elements, obtaining association information of the feature elements from a knowledge base in which linguistic knowledge is stored, and constructing a feature set;
constructing a statistical model using the feature set and a machine learning method;
matching programs in the EPG database using the statistical model; and
outputting the matching result to the user.
Extracting, according to the language information, relevant electronic program information from the EPG database comprises:
when the received language information is a set of keywords, applying logical operations to the keyword set and then extracting relevant electronic program information from the EPG database.
Extracting, according to the language information, relevant electronic program information from the EPG database comprises:
when the received language information is a phrase or sentence, first performing word segmentation, computing a space model of the user's preferences from the segmentation result, then computing the similarity between the space model and the electronic program information in the EPG database, and extracting relevant electronic program information according to the similarity.
After feature selection is performed on the extracted electronic program information to obtain the feature elements, the method further comprises:
searching and evaluating the electronic program information in the EPG database using the feature elements as search keywords, extracting relevant electronic program information from the EPG database according to the result, and then performing feature selection on both the electronic program information extracted according to the language information and that extracted according to the feature elements, to obtain new feature elements;
correspondingly, obtaining association information of the feature elements from the knowledge base in which linguistic knowledge is stored and constructing the feature set comprises: obtaining association information of the new feature elements from the knowledge base, and constructing the feature set.
The knowledge base includes any one or more of: synonyms, near-synonyms and antonyms of words; similarity between concepts; and words, parts of speech and semantic attributes.
The technical solutions provided by the embodiments of the present invention have the following beneficial effects. Relevant electronic program information is extracted from the EPG database according to the language information input by the user, feature selection is performed to obtain feature elements, and the information stored in the knowledge base is used to expand the feature elements into a feature set describing the user's space of interests. A statistical model is constructed from this feature set with a machine learning method and matched against the EPG database, and the matching result is output to the user. This achieves program recommendation, solves the "cold start" problem of the prior art, and improves the accuracy, performance and practicality of program recommendation. Because the device is located at the user end and the method is executed on the user side, the user's personal information is not collected at the network server or the client, so the user's private information is fully protected from leakage and confidentiality is improved. In addition, the EPG database can be searched and evaluated using the feature elements as search keywords, and program pre-selection can be repeated according to the result, further expanding the user's space of interests and improving recommendation accuracy.
Brief Description of the Drawings
FIG. 1 is a structural diagram of the program recommendation device provided in Embodiment 1 of the present invention;
FIG. 2 is a structural diagram of the program recommendation device provided in Embodiment 2 of the present invention;
FIG. 3 is a flowchart of the program recommendation method provided in Embodiment 3 of the present invention;
FIG. 4 is a flowchart of the program recommendation method provided in Embodiment 4 of the present invention.
Detailed Description
To make the objectives, technical solutions and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Embodiment 1
Referring to FIG. 1, this embodiment provides a program recommendation device, comprising:
an input unit 100, configured to receive language information input by a user;
a program pre-selection unit 110, configured to extract, according to the language information received by the input unit 100, relevant electronic program information from an EPG database in which electronic program information is stored;
a feature extraction unit 120, configured to perform feature selection on the electronic program information extracted by the program pre-selection unit 110 to obtain feature elements, obtain association information of the feature elements from a knowledge base in which linguistic knowledge is stored, and construct a feature set;
a machine learning unit 130, configured to construct a statistical model using the feature set obtained by the feature extraction unit 120 and a machine learning method;
a program prediction unit 140, configured to match programs in the EPG database using the statistical model constructed by the machine learning unit 130; and
an output unit 150, configured to output the matching result of the program prediction unit 140 to the user.
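The division of labor among units 100 to 150 can be pictured as a small pipeline. The sketch below is a hypothetical illustration rather than the patent's implementation: the class name `ProgramRecommender`, the callable signatures and the `top_n` cut-off are assumptions, and each unit is injected as a plain function so that any of the techniques named in this embodiment could be plugged in.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical types: a program is a dict of EPG fields; a model maps a program to a score.
Program = Dict[str, Any]
FeatureSet = Dict[str, float]
Model = Callable[[Program], float]


@dataclass
class ProgramRecommender:
    """Hypothetical wiring of units 110-150; the input unit 100 supplies `language_info`."""
    preselect: Callable[[str, List[Program]], List[Program]]  # program pre-selection unit 110
    extract_features: Callable[[List[Program]], FeatureSet]   # feature extraction unit 120
    train: Callable[[FeatureSet, List[Program]], Model]       # machine learning unit 130
    top_n: int = 10

    def recommend(self, language_info: str, epg: List[Program]) -> List[Program]:
        selected = self.preselect(language_info, epg)
        feature_set = self.extract_features(selected)
        model = self.train(feature_set, epg)
        # Program prediction unit 140: score every EPG entry and sort by score.
        ranked = sorted(epg, key=model, reverse=True)
        # The output unit 150 would present this list to the user.
        return ranked[: self.top_n]
```

Because each unit is only a callable, swapping, say, keyword pre-selection for similarity-based pre-selection does not touch the rest of the pipeline.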
The electronic program guide (EPG) referred to in the embodiments of the present invention is not limited to the EPG of television programs; any other recommendation system built on an electronic program guide is also applicable.
In this embodiment, the input unit 100 receives language input from the user and can be implemented in many ways, including but not limited to any general-purpose input module such as a remote control, keyboard, pointing device (for example, a mouse), handwriting recognition or optical character reader; speech input through a speech recognition system; or reading a text file or a database. The input unit 100 may use any method, as long as its processing ultimately yields language information as input. The user's input may be keywords, or a phrase or sentence describing the user's preferences.
In this embodiment, the program pre-selection unit 110 may include:
a first pre-selection subunit, configured to, when the language information received by the input unit 100 is a set of keywords, apply logical operations to the keyword set and then extract relevant electronic program information from the EPG database; and/or
a second pre-selection subunit, configured to, when the language information received by the input unit 100 is a phrase or sentence, first perform word segmentation, compute a space model of the user's preferences from the segmentation result, then compute the similarity between the space model and the electronic program information in the EPG database, and extract relevant electronic program information according to the similarity.
Specifically, the first pre-selection subunit may extract programs from the EPG database directly using the keyword set, combining the keywords in the set with logical operations such as AND, OR, NOT and NAND.
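The keyword-set path above can be sketched as a small Boolean query evaluator. This is a minimal illustration under assumptions not in the text: a query is written as a nested tuple such as `("AND", "football", ("NOT", "review"))`, a keyword "matches" when it occurs as a case-insensitive substring of the program's name plus description, and the field names `name`, `intro` and `id` are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# A query is either a keyword or a tuple (operator, operand, ...); this encoding is hypothetical.
Query = Union[str, Tuple]


def matches(query: Query, text: str) -> bool:
    """Evaluate a Boolean keyword query against a program's text (case-insensitive substring match)."""
    if isinstance(query, str):
        return query.lower() in text.lower()
    op, *args = query
    if op == "AND":
        return all(matches(a, text) for a in args)
    if op == "OR":
        return any(matches(a, text) for a in args)
    if op == "NOT":
        (arg,) = args
        return not matches(arg, text)
    if op == "NAND":
        return not all(matches(a, text) for a in args)
    raise ValueError(f"unknown operator: {op}")


def preselect_by_keywords(epg: List[Dict[str, str]], query: Query) -> List[Dict[str, str]]:
    """Return the EPG entries whose name and description satisfy the query."""
    return [p for p in epg if matches(query, p["name"] + " " + p["intro"])]
```

For example, `("AND", "football", ("NOT", "review"))` keeps football programs while excluding review shows. A production system would more likely match against an inverted index than scan every entry.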
Specifically, the second pre-selection subunit may process the input with a word segmentation tool, obtain a space model of the user's preferences from the segmentation result using methods such as term-frequency counting, then compute the similarity between the space model and the electronic program information in the EPG database, and sort by similarity to obtain the recommendation result.
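The sentence path can be sketched as a term-frequency vector compared with each program by cosine similarity. The sketch assumes cosine as the similarity measure (the text does not name one), uses a regex tokenizer as a stand-in for a real word segmenter (Chinese input would need a proper segmenter, since `\w+` would keep a whole Chinese run as one token), and the field names `name`, `intro` and `id` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def segment(text: str) -> List[str]:
    """Stand-in segmenter: lowercase word tokens. Chinese text would need a real segmenter."""
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_preference(sentence: str, epg: List[Dict[str, str]]) -> List[Tuple[str, float]]:
    """Build a term-frequency preference vector and rank EPG entries by similarity to it."""
    user_vector = Counter(segment(sentence))
    scored = []
    for program in epg:
        program_vector = Counter(segment(program["name"] + " " + program["intro"]))
        score = cosine(user_vector, program_vector)
        if score > 0:  # drop programs sharing no terms with the user's description
            scored.append((program["id"], score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

Raw term frequency keeps the sketch short; weighting the vectors with one of the schemes listed below, such as TF-IDF, would usually sharpen the ranking.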
In addition, the program pre-selection unit 110 may present the extracted electronic program information to the user for an initial screening, and then output the programs confirmed by the user to the feature extraction unit 120.
The EPG database in the embodiments of the present invention may consist of structured or semi-structured electronic program guides. For example, current digital television services, including network television and cable television, generally provide listings covering the two weeks starting from the viewing day. The data in the EPG database may be extracted from a digital television receiver or obtained from the Internet. An EPG generally includes information such as the program number, program name, program description, channel, and start and end times, which can be stored in the EPG database in a suitable data format as needed. In the embodiments of the present invention, the program information stored in the EPG may concern past, current or future programs; this is not specifically limited. The EPG database of the present invention allows EPG data from past periods to be accumulated and stored, for example the electronic programs of the past year, six months or three months counted back from the user's viewing day, so as to give the user a sufficiently large data space from which to select interests.
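A record holding the EPG fields listed above, plus a filter that keeps an accumulated past window and the two-week forward listing, might look like the sketch below. The class name `EpgEntry`, the choice of 90 days to stand in for "three months", and filtering on start time are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class EpgEntry:
    """Hypothetical EPG record holding the fields the text lists."""
    program_id: str
    name: str
    intro: str
    channel: str
    start: datetime
    end: datetime


def search_space(entries: List[EpgEntry], viewing_day: datetime,
                 past_days: int = 90, future_days: int = 14) -> List[EpgEntry]:
    """Keep entries starting within [viewing_day - past_days, viewing_day + future_days)."""
    earliest = viewing_day - timedelta(days=past_days)
    latest = viewing_day + timedelta(days=future_days)
    return [e for e in entries if earliest <= e.start < latest]
```

Setting `past_days` to 180 or 365 would correspond to the six-month and one-year windows mentioned above.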
In this embodiment, the feature extraction unit 120 may perform feature selection in many ways, including but not limited to document-frequency-based feature extraction, information gain, the χ² statistic, and mutual information. Feature selection may be based on feature weights, which can also be computed in many ways, such as Boolean weights, absolute TF (term frequency), IDF (inverse document frequency), TF-IDF (term frequency-inverse document frequency), TFC (term frequency count), ITC, entropy weights and TF-IWF; this is not specifically limited in the embodiments of the present invention. The association information of the feature elements that the feature extraction unit 120 obtains from the knowledge base includes attribute information such as word semantics and concepts. This information can serve as the feature set of the user's space of interests and preferences, giving the machine learning unit 130 the data and decision basis for modeling.
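As one concrete instance of weight-based feature selection, the sketch below scores the terms of the pre-selected programs by TF-IDF and keeps the top k as feature elements. Assumptions: documents are already token lists, IDF is computed over the whole EPG database with add-one smoothing (`log((1 + N) / (1 + df)) + 1`, one common variant), scores are summed across the selected programs, and `k` is a free parameter.

```python
import math
from collections import Counter
from typing import List


def select_features_tfidf(selected: List[List[str]], corpus: List[List[str]], k: int = 5) -> List[str]:
    """Score terms of the pre-selected programs by summed TF-IDF and keep the top k.

    `selected` and `corpus` are token lists; IDF is computed over the whole corpus
    (assumed here to be the EPG database) with add-one smoothing.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))  # count each term once per document
    scores = Counter()
    for doc in selected:
        if not doc:
            continue
        for term, count in Counter(doc).items():
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            scores[term] += (count / len(doc)) * idf
    # Sort by descending score, breaking ties alphabetically for reproducibility.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]
```

Terms that recur across the user's pre-selected programs but are rare elsewhere in the database rise to the top, which is the behavior a feature element for this user should have.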
The knowledge base in the embodiments of the present invention includes any one or more of: synonyms, near-synonyms and antonyms of words; similarity between concepts; and words, parts of speech and semantic attributes. Besides such semantic and conceptual attributes, the knowledge base may also contain organizational information related to them, that is, information obtained by organizing and managing the feature elements according to the structure of the knowledge in the knowledge base, such as the membership relations between concepts and the semantic containment relations. This organization can follow a conceptual semantic network, and different weights can be assigned to elements according to their level in the network to improve system performance. The knowledge base may be built manually or may use existing dictionaries or thesauri, for example WordNet for English, HowNet for Chinese, or the EDR electronic dictionary for Japanese. Various synonym and near-synonym electronic dictionaries can also be used.
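The expansion step can be sketched with a tiny hand-written knowledge base. Everything in it is hypothetical: the dictionary contents, the relation names, and the weights, which only loosely mirror the idea that looser or higher-level relations should contribute less. A real system would draw these relations from a resource such as WordNet or HowNet. Antonyms are stored but deliberately not used for expansion in this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical knowledge base: term -> [(related term, relation type)].
KNOWLEDGE_BASE: Dict[str, List[Tuple[str, str]]] = {
    "football": [("soccer", "synonym"), ("sports", "hypernym")],
    "final": [("championship", "near_synonym"), ("opening", "antonym")],
}

# Hypothetical weights: looser relations contribute less to the interest profile.
RELATION_WEIGHTS: Dict[str, float] = {"synonym": 0.9, "near_synonym": 0.7, "hypernym": 0.5}


def expand_features(features: List[str],
                    kb: Dict[str, List[Tuple[str, str]]] = KNOWLEDGE_BASE,
                    weights: Dict[str, float] = RELATION_WEIGHTS) -> Dict[str, float]:
    """Return a weighted feature set: original features get 1.0, related terms get relation weights."""
    feature_set = {f: 1.0 for f in features}
    for f in features:
        for related, relation in kb.get(f, []):
            weight = weights.get(relation)
            if weight is None:
                continue  # relations without a weight (here, antonyms) are not used for expansion
            # Keep the strongest weight if a term is reached through several relations.
            feature_set[related] = max(feature_set.get(related, 0.0), weight)
    return feature_set
```

Expanding `["football", "final"]` adds `soccer`, `sports` and `championship`, which lets the model recognize programs whose descriptions never use the user's original words. This is one way the approach can sidestep the cold-start problem.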
In this embodiment, the machine learning unit 130 may use a wide variety of machine learning methods, such as supervised, unsupervised or semi-supervised methods. Specifically, any one of the support vector machine (SVM), decision tree, Bayesian, maximum entropy and conditional random field algorithms may be used, or several of them may be combined into a hybrid algorithm; this is not specifically limited in the embodiments of the present invention.
In this embodiment, the program prediction unit 140 may further sort the matching results and output the sorted results to the output unit 150, which then outputs them to the user.
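The text leaves the learner open. As one possibility for units 130 and 140, the sketch below trains a multinomial naive Bayes classifier (a Bayesian method, one of those listed) over the feature-set vocabulary and then sorts programs by the probability of the positive class. The labeling scheme, in which pre-selected programs are positive and other EPG entries negative, is an assumption, as are the class and method names.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


class NaiveBayesRanker:
    """Multinomial naive Bayes restricted to the feature-set vocabulary (Laplace smoothing)."""

    def __init__(self, vocabulary: Sequence[str], alpha: float = 1.0):
        self.vocab = set(vocabulary)
        self.alpha = alpha

    def fit(self, docs: List[List[str]], labels: List[int]) -> "NaiveBayesRanker":
        self.classes = sorted(set(labels))
        self.log_prior: Dict[int, float] = {}
        self.log_lik: Dict[int, Dict[str, float]] = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d if t in self.vocab)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {t: math.log((counts[t] + self.alpha) / denom) for t in self.vocab}
        return self

    def prob_positive(self, doc: List[str], positive: int = 1) -> float:
        """Posterior probability of the positive class, computed stably in log space."""
        logs = {c: self.log_prior[c] + sum(self.log_lik[c][t] for t in doc if t in self.vocab)
                for c in self.classes}
        top = max(logs.values())
        exps = {c: math.exp(v - top) for c, v in logs.items()}
        return exps[positive] / sum(exps.values())

    def rank(self, programs: Dict[str, List[str]], top_n: int = 10) -> List[str]:
        """Program prediction step: sort program ids by descending positive-class probability."""
        ranked = sorted(programs, key=lambda pid: self.prob_positive(programs[pid]), reverse=True)
        return ranked[:top_n]
```

Terms outside the feature set are ignored, so the knowledge-base expansion directly controls what the model can see. Any of the other listed learners could be substituted behind the same `rank` interface.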
In this embodiment, the output unit 150 may output the recommendation results to the user in various forms, such as a file or a display, and may present them in a specific format. The final presentation can take any form, such as highlighting the recommended programs or a sound alert; this is not specifically limited in the embodiments of the present invention. After receiving the recommended programs, the user can request playback of the desired program and receive the corresponding data stream for viewing.
In this embodiment, the feature extraction unit 120 may also apply a clustering or classification algorithm before or after feature selection, and the machine learning unit 130 may also apply a clustering or classification algorithm before or after constructing the statistical model, further improving recommendation accuracy; this is not specifically limited in the embodiments of the present invention.
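The text leaves the clustering algorithm open. One possibility is to cluster the term-frequency vectors of the pre-selected programs with k-means before feature selection, so that features can be chosen per cluster of related programs. The plain-Python sketch below is only an illustration. Its deterministic initialization (the first k points become the initial centroids) and its fixed iteration count are simplifying assumptions, not part of the patent.

```python
import math
from typing import List, Sequence, Tuple


def _nearest(point: Sequence[float], centroids: List[List[float]]) -> int:
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points: Sequence[Sequence[float]], k: int,
           iterations: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means on dense vectors, e.g. rows of a program-by-term count matrix."""
    # Deterministic initialization: the first k points become the initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [_nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members (empty clusters keep theirs).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

In practice, k-means++ seeding or a library implementation would be preferable to first-k seeding, which can place two initial centroids in the same group.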
The device provided by this embodiment extracts relevant electronic program information from the EPG database according to the language information input by the user, performs feature selection to obtain feature elements, and uses the information stored in the knowledge base to expand the feature elements into a feature set describing the user's space of interests. A statistical model is constructed from this feature set with a machine learning method and matched against the EPG database, and the matching result is output to the user. This achieves program recommendation, solves the "cold start" problem of the prior art, and improves the accuracy, performance and practicality of program recommendation. Because the device is located at the user end, the user's personal information is not collected at the network server or the client, so the user's private information is fully protected from leakage and confidentiality is improved. In addition, the EPG database can be searched and evaluated using the feature elements as search keywords, and program pre-selection can be repeated according to the result, further expanding the user's space of interests and improving recommendation accuracy.
Embodiment 2
Building on Embodiment 1, this embodiment provides a program recommendation device that includes an input unit 100, a program preselection unit 110, a feature extraction unit 120, a machine learning unit 130, a program prediction unit 140, and an output unit 150. Each of these units has the same function as described in Embodiment 1. The improvement is that the feature extraction unit 120 may further include:
a feedback subunit 120a, configured to search and evaluate the electronic program information in the electronic program table database using the above feature elements as search keywords, and to feed the result back to the program preselection unit 110.
Correspondingly, the program preselection unit 110 is further configured to receive the result fed back by the feedback subunit, extract relevant electronic program information from the electronic program table database according to that result, and output it to the feature extraction unit 120. The feature extraction unit 120 can then perform feature selection on both the electronic program information extracted according to the language information and the electronic program information extracted according to the feature elements, obtain new feature elements, and acquire the association information of the new feature elements from the knowledge base storing linguistic knowledge to construct the feature set. This enlarges the feature set, lets the user select favorite programs more precisely, and improves the prediction accuracy of the system.
Further, the feature extraction unit 120 may first determine whether re-preselection is needed; if so, it performs the above feedback operation, and otherwise it proceeds as in Embodiment 1. This determination can be made in many ways: for example, a simple preset dialog can ask the user whether the television programs should be preselected again, or the feature elements can be presented to the user dynamically, in a form similar to a semantic network graph, for the user to observe and analyze. The embodiments of the present invention do not specifically limit this.
The device provided by this embodiment extracts relevant electronic program information from the electronic program table database according to the language information input by the user, performs feature selection to obtain feature elements, and uses the information stored in the knowledge base to expand the feature elements into a feature set describing the user's interest space. It then constructs a statistical model from this feature set by machine learning, matches it against the electronic program table database, and outputs the matching results to the user. This realizes program recommendation, solves the "cold start" problem of the prior art, and improves the accuracy, performance, and practicality of program recommendation. Because the device is located at the user end, no personal information about the user is collected at either the network server or the user end, so the user's private information can be fully protected from leakage and confidentiality is improved. In addition, the electronic program table database can be searched and evaluated using the feature elements as search keywords, and program preselection can then be performed again according to the results, which further expands the user's interest space and improves the accuracy of program recommendation.

Embodiment 3
Referring to FIG. 3, this embodiment provides a program recommendation method, including:
S01: receive language information input by the user;
S02: according to the language information, extract relevant electronic program information from an electronic program table database storing electronic program information;
S03: perform feature selection on the extracted electronic program information to obtain feature elements;
S04: acquire association information of the feature elements from a knowledge base storing linguistic knowledge, and construct a feature set;
S05: construct a statistical model using the feature set and a machine learning method;
S06: use the statistical model to match programs in the electronic program table database;
S07: output the matching results to the user, completing the program recommendation.
In this embodiment, the user enters the programs he or she is interested in, or describes his or her own interest space; the input may be keywords, or phrases or sentences describing the user's preferences.
In this embodiment, S02 may specifically include:
S02a: when the received language information is a keyword set, perform a logical computation on the keyword set and then extract relevant electronic program information from the electronic program table database; and/or
S02b: when the received language information is a phrase or sentence, first perform word segmentation, compute a space model of the user's preferences from the segmentation result, then compute the similarity between the space model and the electronic program information in the electronic program table database, and extract relevant electronic program information according to the similarity.
Specifically, in S02a, programs can be extracted from the EPG database directly with the keyword set, and the keywords in the set can be combined with logical operations such as AND, OR, NOT, and NAND.
Specifically, in S02b, a word segmentation tool can be used; the user's preference space model can then be obtained from the segmentation result by methods such as computing term frequencies, the similarity between the space model and the electronic program information in the EPG database is computed, and the programs are sorted by similarity to obtain the recommendation result.
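The two preselection paths can be illustrated with a short Python sketch. It is not the patented implementation: the EPG records, the `tokenize` helper (a regular-expression stand-in for a real word segmentation tool, which Chinese text would require), and the use of raw term-frequency vectors compared by cosine similarity are all assumptions made for illustration. `boolean_filter` corresponds to S02a (AND over `must`, OR over `any_of`, NOT over `must_not`) and `rank_by_similarity` to S02b.

```python
import math
import re
from collections import Counter

# Hypothetical EPG records: program id -> program description.
EPG = {
    "p1": "live football match premier league highlights",
    "p2": "cooking show italian pasta recipes",
    "p3": "football documentary history of the world cup",
    "p4": "news weather and traffic report",
}


def tokenize(text):
    # Stand-in for a word segmentation tool: lowercase ASCII words only.
    return re.findall(r"[a-z]+", text.lower())


def boolean_filter(epg, must=(), any_of=(), must_not=()):
    """S02a sketch: AND over `must`, OR over `any_of`, NOT over `must_not`."""
    hits = []
    for pid, text in epg.items():
        words = set(tokenize(text))
        if not all(w in words for w in must):
            continue
        if any_of and not any(w in words for w in any_of):
            continue
        if any(w in words for w in must_not):
            continue
        hits.append(pid)
    return hits


def cosine(a, b):
    # Cosine similarity of two sparse term-frequency vectors (Counters).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(epg, sentence, top_k=3):
    """S02b sketch: compare the user's term-frequency profile with each program."""
    profile = Counter(tokenize(sentence))
    scored = []
    for pid, text in epg.items():
        sim = cosine(profile, Counter(tokenize(text)))
        if sim > 0:
            scored.append((sim, pid))
    # Most similar first; ties broken by program id.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pid for _, pid in scored[:top_k]]
```

For example, `boolean_filter(EPG, must=["football"], must_not=["documentary"])` keeps only `p1`, and `rank_by_similarity(EPG, "I like football match replays")` returns `["p1", "p3"]`, because `p1` shares two terms with the sentence and `p3` only one.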
In addition, in S02 the extracted electronic program information may be presented to the user for initial screening, and the results confirmed by the user are then used as the extracted electronic program information.
The EPG database in the embodiments of the present invention may consist of structured or semi-structured electronic program tables, as described in Embodiment 1; details are not repeated here. The knowledge base in this embodiment includes any one or more of: synonyms, near-synonyms, and antonyms of words; similarity between concepts; and words, word classes, and semantic attributes, as described in Embodiment 1; details are not repeated here.
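One possible in-memory shape for the knowledge base, and how S04 might use it to expand feature elements into a feature set, is sketched below. The entry fields, the `class:` and `not:` feature prefixes, and the option of turning antonyms into negative features are hypothetical choices; the text only states which kinds of linguistic knowledge the knowledge base holds.

```python
# Hypothetical knowledge-base entries; a real system would load a lexical resource.
KNOWLEDGE_BASE = {
    "football": {"synonyms": ["soccer"], "near_synonyms": ["futsal"],
                 "antonyms": [], "semantic_class": "sport"},
    "comedy": {"synonyms": ["humor"], "near_synonyms": ["sitcom"],
               "antonyms": ["tragedy"], "semantic_class": "genre"},
}


def build_feature_set(feature_elements, kb, include_antonyms=False):
    """S04 sketch: expand each feature element with its knowledge-base relations."""
    features = set()
    for word in feature_elements:
        features.add(word)
        entry = kb.get(word)
        if entry is None:
            continue  # words missing from the knowledge base are kept unexpanded
        features.update(entry["synonyms"])
        features.update(entry["near_synonyms"])
        features.add("class:" + entry["semantic_class"])
        if include_antonyms:
            # Antonyms become negative features rather than positive ones.
            features.update("not:" + a for a in entry["antonyms"])
    return features
```

With this data, `build_feature_set(["football", "news"], KNOWLEDGE_BASE)` yields `{"football", "soccer", "futsal", "class:sport", "news"}`: `news` has no entry and is passed through unchanged.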
In this embodiment, feature selection in S03 can be performed by many methods, including but not limited to document-frequency-based feature extraction, information gain, the χ² (chi-square) statistic, and mutual information. Feature selection can be computed from feature weights, which can also be calculated in many ways, such as Boolean weights, absolute term frequency (TF), IDF, TF-IDF, TFC, ITC, entropy weights, and TF-IWF; the embodiments of the present invention do not specifically limit this. The association information of the feature elements acquired from the knowledge base in S04 includes attribute information such as the semantics and concepts of words; this information can serve as the feature set of the user's interest and preference space, providing the data and the decision basis for modeling.
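As an illustration of S03, the sketch below scores candidate terms with the χ² statistic over a 2×2 contingency table and computes TF-IDF weights, two of the methods listed above. It assumes, which the text does not specify, that the preselected programs form the positive class and the remaining EPG programs the negative class. For a term t, with A, B, C, D counting positive programs with t, negative programs with t, positive programs without t, and negative programs without t, χ²(t) = N(AD - CB)² / ((A+C)(B+D)(A+B)(C+D)). Because χ² also rewards terms that are conspicuously absent from the positive class, the sketch keeps only positively associated terms (AD > BC).

```python
import math

# Hypothetical sample: token sets of four EPG programs; True marks a preselected program.
DOCS = [
    {"football", "match", "live"},
    {"football", "cup", "history"},
    {"cooking", "pasta"},
    {"news", "weather"},
]
LABELS = [True, True, False, False]


def contingency(term, docs, labels):
    # a: positive with term, b: negative with term, c: positive without, d: negative without
    a = b = c = d = 0
    for doc, positive in zip(docs, labels):
        if term in doc:
            if positive:
                a += 1
            else:
                b += 1
        elif positive:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(term, docs, labels):
    a, b, c, d = contingency(term, docs, labels)
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Top-k terms by chi-square, keeping only positively associated terms."""
    candidates = []
    for term in set().union(*docs):
        a, b, c, d = contingency(term, docs, labels)
        if a * d > b * c:
            candidates.append((-chi_square(term, docs, labels), term))
    candidates.sort()  # highest score first, ties broken alphabetically
    return [term for _, term in candidates[:k]]


def tf_idf(tokens, docs):
    """TF-IDF weights for one program's token list against the whole collection."""
    n = len(docs)
    weights = {}
    for term in set(tokens):
        tf = tokens.count(term) / len(tokens)
        df = sum(1 for doc in docs if term in doc)
        weights[term] = tf * math.log(n / df) if df else 0.0
    return weights
```

Here `select_features(DOCS, LABELS, 3)` returns `["football", "cup", "history"]`: `football` scores 4.0, and the four terms tied at about 1.33 are ordered alphabetically.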
In this embodiment, many machine learning methods can be used in S05, including supervised, unsupervised, and semi-supervised methods. Specifically, any one of algorithms such as the support vector machine (SVM), decision tree, Bayesian methods, maximum entropy, and conditional random fields can be used, or several of them can be combined into a hybrid algorithm; the embodiments of the present invention do not specifically limit this.
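Bayesian methods are among the options listed for S05. The sketch below shows a multinomial naive Bayes classifier with Laplace smoothing, restricted to a feature set such as the one built in S04. The training labels are an assumption: here programs obtained by preselection are labeled "like" and other EPG programs "other"; the text does not prescribe how training examples are labeled.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing over a fixed feature set."""

    def __init__(self, features, alpha=1.0):
        self.features = set(features)
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior, self.log_lik = {}, {}
        for cls in self.classes:
            cls_docs = [doc for doc, y in zip(docs, labels) if y == cls]
            self.log_prior[cls] = math.log(len(cls_docs) / len(docs))
            # Count only tokens that belong to the feature set.
            counts = Counter(t for doc in cls_docs for t in doc if t in self.features)
            total = sum(counts.values()) + self.alpha * len(self.features)
            self.log_lik[cls] = {
                t: math.log((counts[t] + self.alpha) / total) for t in self.features
            }
        return self

    def predict(self, doc):
        # Tokens outside the feature set are ignored.
        scores = {
            cls: self.log_prior[cls]
            + sum(self.log_lik[cls][t] for t in doc if t in self.features)
            for cls in self.classes
        }
        return max(scores, key=scores.get)


# Hypothetical training data: preselected programs are labeled "like".
FEATURES = {"football", "soccer", "match", "cooking", "pasta", "news"}
TRAIN_DOCS = [["football", "match"], ["soccer", "football"],
              ["cooking", "pasta"], ["news", "weather"]]
TRAIN_LABELS = ["like", "like", "other", "other"]
model = NaiveBayes(FEATURES).fit(TRAIN_DOCS, TRAIN_LABELS)
```

With this data, `model.predict(["soccer", "match", "tonight"])` returns `"like"`; the unknown token `tonight` simply contributes nothing.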
In this embodiment, the matching results may be further sorted in S06, and correspondingly, the sorted results are output to the user in S07.
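S06 with the optional sorting step might look like the following sketch. The scoring function is deliberately pluggable; the overlap count in the usage lines is a hypothetical stand-in for a score derived from the statistical model built in S05, such as a classifier's log-odds.

```python
from typing import Callable, Dict, List, Tuple


def match_and_rank(
    epg: Dict[str, List[str]],
    score: Callable[[List[str]], float],
    top_k: int = 5,
    min_score: float = 0.0,
) -> List[Tuple[str, float]]:
    """S06 sketch: score every EPG program, drop non-matches, sort best first."""
    scored = [(pid, score(tokens)) for pid, tokens in epg.items()]
    matches = [(pid, s) for pid, s in scored if s > min_score]
    # Highest score first; ties broken by program id so the output is deterministic.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return matches[:top_k]


# Usage: a hypothetical score counting overlap with a feature set.
FEATURES = {"football", "soccer", "live"}
EPG = {
    "p1": ["live", "football", "match"],
    "p2": ["cooking", "pasta"],
    "p3": ["soccer", "highlights"],
}
ranked = match_and_rank(EPG, lambda toks: float(len(FEATURES & set(toks))))
```

Here `ranked` is `[("p1", 2.0), ("p3", 1.0)]`; `p2` shares no features and is dropped before sorting.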
In this embodiment, the program recommendation results can be output to the user in S07 in various forms, such as file output or display output. When there are multiple results, they may be displayed to the user on one screen or across multiple screens; the embodiments of the present invention do not specifically limit this. After obtaining the recommendation results, the user can request playback of the desired program and receive the corresponding data stream for viewing.
In this embodiment, a clustering or classification algorithm may also be applied in S03 before or after feature selection, and in S05 before or after the statistical model is constructed, to further improve the accuracy of program recommendation; the embodiments of the present invention do not specifically limit this.
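The text leaves the clustering algorithm open. As one simple possibility, the sketch below groups programs with single-pass leader clustering on the Jaccard similarity of their term sets; both the algorithm and the 0.3 threshold are assumptions chosen for illustration. Clusters produced this way could, for instance, be passed to S03 separately.

```python
def jaccard(a, b):
    # Jaccard similarity of two term sets; two empty sets count as 0.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_cluster(docs, threshold=0.3):
    """Single-pass leader clustering: each program joins the first cluster whose
    leader (first member) is at least `threshold` similar, else starts a new one."""
    leaders, clusters = [], []
    for idx, doc in enumerate(docs):
        for c, leader in enumerate(leaders):
            if jaccard(doc, leader) >= threshold:
                clusters[c].append(idx)
                break
        else:
            leaders.append(doc)
            clusters.append([idx])
    return clusters


# Hypothetical term sets of four preselected programs.
PROGRAMS = [{"football", "live"}, {"football", "match", "live"},
            {"cooking", "pasta"}, {"pasta", "recipe"}]
```

`leader_cluster(PROGRAMS)` returns `[[0, 1], [2, 3]]`. The result depends on input order, which is the usual trade-off of single-pass clustering against its low cost.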
The method provided by this embodiment extracts relevant electronic program information from the electronic program table database according to the language information input by the user, performs feature selection to obtain feature elements, and uses the information stored in the knowledge base to expand the feature elements into a feature set describing the user's interest space. It then constructs a statistical model from this feature set by machine learning, matches it against the electronic program table database, and outputs the matching results to the user. This realizes program recommendation, solves the "cold start" problem of the prior art, and improves the accuracy, performance, and practicality of program recommendation. Because the method is executed at the user end, no personal information about the user is collected at either the network-side server or the user end, so the user's private information can be fully protected from leakage and confidentiality is improved. In addition, the electronic program table database can be searched and evaluated using the feature elements as search keywords, and program preselection can then be performed again according to the results, which further expands the user's interest space and improves the accuracy of program recommendation.

Embodiment 4
Building on Embodiment 3, this embodiment provides a program recommendation method. The improvement is that electronic program information is extracted from the EPG database again according to the obtained feature elements and used to construct the feature set. Referring to FIG. 4, the method specifically includes:
S11: receive language information input by the user;
S12: according to the language information, extract relevant electronic program information from an electronic program table database storing electronic program information;
S13: perform feature selection on the extracted electronic program information to obtain feature elements;
S14: using the feature elements as search keywords, search and evaluate the electronic program information in the electronic program table database, and extract relevant electronic program information from the database according to the result;
S15: perform feature selection on the electronic program information extracted according to the language information in S12 and the electronic program information extracted according to the feature elements in S14, to obtain new feature elements;
S16: acquire association information of the new feature elements from a knowledge base storing linguistic knowledge, and construct a feature set;
S17: construct a statistical model using the feature set and a machine learning method;
S18: use the statistical model to match programs in the electronic program table database;
S19: output the matching results to the user, completing the program recommendation.
Further, in S14 it may first be determined whether re-preselection is needed. If so, S14 and the subsequent steps are executed; otherwise, feature selection is performed directly on the electronic program information extracted according to the language information to obtain feature elements, the association information of these feature elements is acquired from the knowledge base storing linguistic knowledge to construct the feature set, and execution continues with S17 and the subsequent steps.
The above determination of whether re-preselection is needed can be made in many ways: for example, a simple preset dialog can ask the user whether the television programs should be preselected again, or the feature elements can be presented to the user dynamically, in a form similar to a semantic network graph, for the user to observe and analyze. The embodiments of the present invention do not specifically limit this.
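The feedback loop of S13 to S15 can be sketched as follows. Document-frequency feature selection stands in for the feature selection of S13 and S15, and the "evaluation" in S14 is assumed to be a simple rule: a program is retrieved when it contains at least `min_hits` of the current feature elements. Both choices, the parameter values, and the sample EPG are illustrative assumptions rather than the method prescribed by the text.

```python
from collections import Counter


def df_features(docs, k):
    """Document-frequency feature selection: the k terms found in the most programs."""
    df = Counter(term for doc in docs for term in set(doc))
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def feedback_preselect(epg, initial_ids, k=3, min_hits=2):
    """Sketch of S13-S15 (hypothetical evaluation rule: at least `min_hits` matches)."""
    # S13: feature elements from the programs preselected by the user's input.
    features = set(df_features([epg[pid] for pid in initial_ids], k))
    # S14: use the feature elements as search keywords; keep programs matching enough of them.
    extra = [pid for pid, tokens in epg.items()
             if pid not in initial_ids and len(features & set(tokens)) >= min_hits]
    # S15: reselect features over both sets of extracted programs.
    merged = list(initial_ids) + extra
    new_features = df_features([epg[pid] for pid in merged], k)
    return extra, new_features


# Hypothetical EPG: program id -> segmented description.
EPG = {
    "p1": ["football", "match", "live"],
    "p2": ["football", "league", "live"],
    "p3": ["football", "live", "replay"],
    "p4": ["league", "live", "cup"],
    "p5": ["cooking", "pasta"],
}
extra, new_features = feedback_preselect(EPG, ["p1", "p2"])
```

Starting from `p1` and `p2`, the first pass selects `football`, `live`, and `league`; S14 then retrieves `p3` and `p4`, and S15 reselects `["live", "football", "league"]` with `live` now ranked first, since it occurs in all four programs.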
The method provided by this embodiment extracts relevant electronic program information from the electronic program table database according to the language information input by the user, performs feature selection to obtain feature elements, and uses the information stored in the knowledge base to expand the feature elements into a feature set describing the user's interest space. It then constructs a statistical model from this feature set by machine learning, matches it against the electronic program table database, and outputs the matching results to the user. This realizes program recommendation, solves the "cold start" problem of the prior art, and improves the accuracy, performance, and practicality of program recommendation. Because the method is executed at the user end, no personal information about the user is collected at either the network server or the user end, so the user's private information can be fully protected from leakage and confidentiality is improved. In addition, the electronic program table database can be searched and evaluated using the feature elements as search keywords, and program preselection can then be performed again according to the results, which further expands the user's interest space and improves the accuracy of program recommendation.
All or part of the above technical solutions provided by the embodiments of the present invention may be implemented by hardware instructed by a program. The program may be stored in a readable storage medium, which includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
The above descriptions are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A program recommendation device, characterized in that the device comprises:
an input unit, configured to receive language information input by a user;
a program preselection unit, configured to extract, according to the language information received by the input unit, relevant electronic program information from an electronic program table database storing electronic program information;
a feature extraction unit, configured to perform feature selection on the electronic program information extracted by the program preselection unit to obtain feature elements, acquire association information of the feature elements from a knowledge base storing linguistic knowledge, and construct a feature set;
a machine learning unit, configured to construct a statistical model using the feature set obtained by the feature extraction unit and a machine learning method;
a program prediction unit, configured to match programs in the electronic program table database using the statistical model constructed by the machine learning unit; and
an output unit, configured to output the matching result of the program prediction unit to the user.
2. The device according to claim 1, characterized in that the program preselection unit comprises:
a first preselection subunit, configured to, when the language information received by the input unit is a keyword set, perform a logical computation on the keyword set and then extract relevant electronic program information from the electronic program table database.
3. The device according to claim 1, characterized in that the program preselection unit comprises:
a second preselection subunit, configured to, when the language information received by the input unit is a phrase or a sentence, first perform word segmentation, compute a space model of the user's preferences from the segmentation result, then compute the similarity between the space model and the electronic program information in the electronic program table database, and extract relevant electronic program information according to the similarity.
4. The device according to claim 1, characterized in that the feature extraction unit further comprises:
a feedback subunit, configured to search and evaluate the electronic program information in the electronic program table database using the feature elements as search keywords, and to feed the result back to the program preselection unit;
correspondingly, the program preselection unit is further configured to receive the result fed back by the feedback subunit, extract relevant electronic program information from the electronic program table database according to the fed-back result, and output it to the feature extraction unit.
5. The device according to any one of claims 1 to 4, characterized in that the knowledge base comprises any one or more of: synonyms, near-synonyms, and antonyms of words; similarity between concepts; and words, word classes, and semantic attributes.
6. A program recommendation method, characterized in that the method comprises:
receiving language information input by a user;
extracting, according to the language information, relevant electronic program information from an electronic program table database storing electronic program information;
performing feature selection on the extracted electronic program information to obtain feature elements, acquiring association information of the feature elements from a knowledge base storing linguistic knowledge, and constructing a feature set;
constructing a statistical model using the feature set and a machine learning method;
matching programs in the electronic program table database using the statistical model; and
outputting the matching result to the user.
7. The method according to claim 6, characterized in that extracting, according to the language information, relevant electronic program information from the electronic program table database storing electronic program information comprises:
when the received language information is a keyword set, performing a logical computation on the keyword set and then extracting relevant electronic program information from the electronic program table database.
8. The method according to claim 6, characterized in that extracting, according to the language information, relevant electronic program information from the electronic program table database storing electronic program information comprises:
when the received language information is a phrase or a sentence, first performing word segmentation, computing a space model of the user's preferences from the segmentation result, then computing the similarity between the space model and the electronic program information in the electronic program table database, and extracting relevant electronic program information according to the similarity.
9. The method according to claim 6, characterized in that, after performing feature selection on the extracted electronic program information to obtain feature elements, the method further comprises:
searching and evaluating the electronic program information in the electronic program table database using the feature elements as search keywords, extracting relevant electronic program information from the electronic program table database according to the result, and then performing feature selection on both the electronic program information extracted according to the language information and the electronic program information extracted according to the feature elements to obtain new feature elements;
correspondingly, acquiring association information of the feature elements from the knowledge base storing linguistic knowledge and constructing a feature set comprises:
acquiring association information of the new feature elements from the knowledge base storing linguistic knowledge, and constructing the feature set.
10. The method according to any one of claims 6 to 9, characterized in that the knowledge base comprises any one or more of: synonyms, near-synonyms, and antonyms of words; similarity between concepts; and words, word classes, and semantic attributes.

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2010/079958 WO2012079254A1 (en) 2010-12-17 2010-12-17 Program recommending device and program recommending method
CN201080070252.1A CN103299651B (en) 2010-12-17 2010-12-17 Program recommendation apparatus and program commending method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2010/079958 WO2012079254A1 (en) 2010-12-17 2010-12-17 Program recommending device and program recommending method

Publications (1)

Publication Number Publication Date
WO2012079254A1 (en) 2012-06-21

Family

ID=46243996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/079958 WO2012079254A1 (en) 2010-12-17 2010-12-17 Program recommending device and program recommending method

Country Status (2)

Country Link
CN (1) CN103299651B (en)
WO (1) WO2012079254A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970858A (en) * 2014-05-07 2014-08-06 百度在线网络技术(北京)有限公司 Recommended content determining system and method
CN104836720A (en) * 2014-02-12 2015-08-12 北京三星通信技术研究有限公司 Method for performing information recommendation in interactive communication, and device
WO2015188699A1 (en) * 2014-06-10 2015-12-17 华为技术有限公司 Item recommendation method and device
CN108810640A (en) * 2018-06-15 2018-11-13 重庆知遨科技有限公司 A kind of recommendation method of TV programme

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104602040B (en) * 2014-11-28 2017-08-29 中国传媒大学 System and method is formulated in a kind of programme
CN106484810A (en) * 2016-09-23 2017-03-08 广州视源电子科技股份有限公司 A kind of recommendation method and system of multimedia programming
CN107124653B (en) * 2017-05-16 2020-09-29 四川长虹电器股份有限公司 Method for constructing television user portrait
CN109978580A (en) * 2017-12-28 2019-07-05 北京京东尚科信息技术有限公司 Object recommendation method, apparatus and computer readable storage medium
CN108965937A (en) * 2018-06-27 2018-12-07 广东技术师范学院 A kind of dynamic interest model construction method of network-oriented TV family user
CN111599349B (en) * 2020-04-01 2023-04-18 云知声智能科技股份有限公司 Method and system for training language model
US11869015B1 (en) 2022-12-09 2024-01-09 Northern Trust Corporation Computing technologies for benchmarking

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1524236A (en) * 2000-03-29 2004-08-25 �ʼҷ����ֵ������޹�˾ Search user interface providing mechanism for manipulation of explicit and implicit criteria
CN101094335A (en) * 2006-06-20 2007-12-26 株式会社日立制作所 TV program recommender and method thereof
CN101527815A (en) * 2008-03-06 2009-09-09 株式会社东芝 Program recommending apparatus and method
US7685276B2 (en) * 1999-12-28 2010-03-23 Personalized User Model Automatic, personalized online information and product services

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6484164B1 (en) * 2000-03-29 2002-11-19 Koninklijke Philips Electronics N.V. Data search user interface with ergonomic mechanism for user profile definition and manipulation

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104836720A (en) * 2014-02-12 2015-08-12 北京三星通信技术研究有限公司 Method for performing information recommendation in interactive communication, and device
CN103970858A (en) * 2014-05-07 2014-08-06 百度在线网络技术(北京)有限公司 Recommended content determining system and method
WO2015188699A1 (en) * 2014-06-10 2015-12-17 华为技术有限公司 Item recommendation method and device
US20170083965A1 (en) * 2014-06-10 2017-03-23 Huawei Technologies Co., Ltd. Item Recommendation Method and Apparatus
CN108810640A (en) * 2018-06-15 2018-11-13 重庆知遨科技有限公司 A kind of recommendation method of TV programme

Also Published As

Publication number Publication date
CN103299651A (en) 2013-09-11
CN103299651B (en) 2016-08-03

Similar Documents

Publication Publication Date Title
WO2012079254A1 (en) Program recommending device and program recommending method
CN106156204B (en) Text label extraction method and device
Li et al. Filtering out the noise in short text topic modeling
CN109255053B (en) Resource searching method, device, terminal, server and computer readable storage medium
US6556987B1 (en) Automatic text classification system
US9087297B1 (en) Accurate video concept recognition via classifier combination
CN110019794B (en) Text resource classification method and device, storage medium and electronic device
US8150822B2 (en) On-line iterative multistage search engine with text categorization and supervised learning
US20150074112A1 (en) Multimedia Question Answering System and Method
US20020099730A1 (en) Automatic text classification system
CN112417863B (en) Chinese text classification method based on pre-training word vector model and random forest algorithm
CN108009135B (en) Method and device for generating document abstract
JP5161658B2 (en) Keyword input support device, keyword input support method, and program
CN110888990A (en) Text recommendation method, device, equipment and medium
CN110430476A (en) Direct broadcasting room searching method, system, computer equipment and storage medium
JP2011529600A (en) Method and apparatus for relating datasets by using semantic vector and keyword analysis
KR20150096295A (en) System and method for buinding q&as database, and search system and method using the same
KR102373884B1 (en) Image data processing method for searching images by text
CN103384883A (en) Semantic enrichment by exploiting Top-K processing
CN101556596A (en) Input method system and intelligent word making method
CN109460477B (en) Information collection and classification system and method and retrieval and integration method thereof
CN116414968A (en) Information searching method, device, equipment, medium and product
CN110413770B (en) Method and device for classifying group messages into group topics
CN110209765B (en) Method and device for searching keywords according to meanings
CN107357881A (en) A kind of Chinese Text Classification System based on news data

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 10860647

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 02-10-2013)

122 EP: PCT application non-entry in European phase

Ref document number: 10860647

Country of ref document: EP

Kind code of ref document: A1