CN111291191B - Broadcast television knowledge graph construction method and device - Google Patents



Publication number
CN111291191B
Authority
CN
China
Prior art keywords
user
program
word
package
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811495424.4A
Other languages
Chinese (zh)
Other versions
CN111291191A (en)
Inventor
万倩
欧阳峰
朱里越
赵明
牛妍华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Academy of Broadcasting Science Research Institute
Original Assignee
Academy of Broadcasting Science Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Academy of Broadcasting Science Research Institute
Priority to CN201811495424.4A
Publication of CN111291191A
Application granted
Publication of CN111291191B


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for constructing a broadcast television knowledge graph. A preset data source is analyzed, processed and stored in a database. Based on the database, the portrait features of users, programs and packages (user portraits, program labels and package labels) are displayed visually, and the person-person, person-program and person-package relations are obtained from the database and displayed visually as knowledge graphs. Finally, a knowledge graph application for the broadcast television industry is constructed from the portrait features and the knowledge graphs. The resulting broadcast television big-data knowledge graph can present massive knowledge information intuitively and comprehensively in typical application scenarios of the broadcast television field, and can also show the behavior habits and interest preferences of each user from multiple angles, helping broadcast television operators better grasp the characteristics of each user and carry out accurate recommendation, market analysis and the like.

Description

Broadcast television knowledge graph construction method and device
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for constructing a broadcast television knowledge graph.
Background
As the cable digital television network evolves from a one-way broadcast television network into a two-way next-generation broadcast television network, traditional broadcast television operators are changing from single network operators into comprehensive information service operators. In this transition, obtaining the interest preferences, viewing habits and consumption characteristics of users in a timely manner becomes a key factor. Building a broadcast television knowledge graph system by analyzing the characteristics of massive data and mining its value helps operators grasp user behavior characteristics and program resource characteristics, so that they can serve users more promptly and accurately, improve user experience, guide user consumption, increase user stickiness, and support personalized services and intelligent operation decisions.
Although a knowledge graph expresses the dependencies among different knowledge nodes through its graph structure, existing knowledge graphs cannot show the behavior habits and interests of each user from multiple angles, so operators cannot fully grasp the characteristics of each user.
Disclosure of Invention
The invention provides a broadcast television knowledge graph construction method and device to solve the prior-art problem that existing knowledge graphs cannot show the behavior habits and interests of each user from multiple angles, which leaves operators unable to fully grasp the characteristics of each user.
In one aspect, the invention provides a broadcast television knowledge graph construction method, which comprises the following steps:
analyzing and processing a preset data source and storing the result in a database;
based on the database, visually displaying the portrait features of users, programs and packages, namely user portraits, program labels and package labels, and obtaining the person-person, person-program and person-package relations from the database and visually displaying each of them as a knowledge graph;
and constructing a knowledge graph application for the broadcast television industry according to the portrait features and the knowledge graphs.
Preferably, the data sources include user viewing behavior data, user ordering behavior data and media asset system data from within the broadcast television system, and program information data published on the Internet.
Preferably, the analyzing and processing the preset data source includes:
Step one: establishing Hive mapping tables and importing into the Hive data warehouse the preprocessed user viewing behaviors, user ordering behaviors, program information from the media asset system, and program information crawled from the Internet;
Step two: using the distributed computing framework Spark to extract user basic attribute information and package basic attribute information from the user viewing behaviors and user ordering behaviors; fusing the program information from the media asset system and the Internet and labeling the programs, then merging the generated labels with the program basic attribute information and storing the result in a MySQL relational database; labeling the packages by applying natural language processing to the package names in the package basic attribute information, then merging the package basic attribute information with the package labels and storing the result in MySQL;
Step three: using Spark to count the frequency of the labels of the programs watched and the packages ordered in the user viewing behaviors and user ordering behaviors, selecting the top N most frequent labels as the user's program interest labels and package interest labels, merging them with the user basic attribute information, and storing the result in MySQL.
Preferably, the third step further includes:
using Spark to extract the user-watches-program and user-orders-package triplet relations from the user viewing behaviors and user ordering behaviors, and importing them into a Neo4j graph database for storage.
Preferably, the method further comprises constructing program labels.
Preferably, constructing the program label includes:
Step one: converting the program introduction text between traditional and simplified Chinese so that all text is in simplified Chinese, then performing word segmentation;
Step two: counting word frequency (TF) over the word segmentation results of all program introductions, sorting the word-frequency list in descending order, and using the result to filter the segmentation result of each program introduction, retaining only words whose frequency is greater than a preset value;
Step three: removing stop words from the selected high-frequency words;
Step four: after stop-word removal, each program introduction yields a feature word list; the training data for the deep learning model Word2Vec is built by taking the feature word list of each program as one sample;
Step five: computing a clustering result with the Kmeans algorithm, using the word vector space and its property that similar words lie close together;
Step six: for each cluster in the clustering result, selecting the most representative feature word as the representative word of the cluster, and finally counting the representative words corresponding to the feature words of all programs, wherein the more important representative words in the count are used as the labels of the program.
Preferably, the fourth step includes:
training with the continuous bag-of-words (CBOW) model of Word2Vec and the hierarchical softmax framework, using the feature word lists of the programs as input data, to obtain a word vector space sufficient to represent all feature words, in which each feature word has a unique vector representation, wherein the CBOW model comprises three layers: an input layer, a projection layer and an output layer.
Preferably, the fifth step includes:
constructing an initial partition into multiple categories; in each iteration, computing the centroid of each partition and adjusting the cluster assignment of each sample; recomputing the centroids in the next iteration from the new partition; and repeating until the centroids are stable, which yields a reliable clustering result; wherein a Kmeans model is built with the representations of all feature words in the word vector space as input and trained to produce the clustering of the feature words, each feature word belonging to one specific cluster.
The second aspect of the present invention provides a broadcast and television knowledge graph construction device, which comprises:
The storage analysis module is used for analyzing and processing a preset data source and storing the data source in a database;
The visualization module is used for visually displaying, based on the database, the portrait features of users, programs and packages, namely user portraits, program labels and package labels, and for obtaining the person-person, person-program and person-package relations from the database and visually displaying each of them as a knowledge graph;
And the processing module is used for constructing a knowledge graph application for the broadcast television industry according to the portrait features and the knowledge graphs.
A third aspect of the present invention provides a computer-readable storage medium storing a computer program for signal mapping, which when executed by at least one processor, implements any one of the above-described broadcast and television knowledge graph construction methods.
The invention has the following beneficial effects:
The broadcast television big data knowledge graph provided by the invention not only can intuitively and comprehensively show massive knowledge information in a typical application scene in the broadcast television field, but also can display behavior habits and interest preferences of each user at multiple angles, thereby helping broadcast television operators to better grasp the characteristics of each user and further carrying out accurate recommendation, market analysis and the like.
The foregoing is only an overview of the technical solution of the invention, given so that its technical means can be understood more clearly and implemented according to the description, and so that the above and other objects, features and advantages of the invention are more readily apparent.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a schematic flow chart of a method for constructing a broadcast television knowledge graph according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a radio and television knowledge graph system in accordance with an embodiment of the present invention;
FIG. 3 is a flow chart of a broadcast television user portrayal construction system in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of a program label construction process according to an embodiment of the present invention;
FIG. 5 is a diagram of a CBOW-based word2vec model, according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a knowledge base construction flow in accordance with an embodiment of the invention;
FIG. 7 is a schematic diagram of a typical knowledge graph scene in accordance with an embodiment of the invention;
FIG. 8 is a schematic diagram of an attribute map model according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the home page effect of the broadcast television knowledge graph system according to an embodiment of the invention;
FIG. 10a is a basic information effect diagram of the broadcast television knowledge graph system according to an embodiment of the invention;
FIG. 10b is a basic information effect diagram of the broadcast television knowledge graph system according to an embodiment of the invention;
FIG. 10c is a basic information effect diagram of the broadcast television knowledge graph system according to an embodiment of the invention;
FIG. 11 is a single-node relationship graph effect diagram of the broadcast television knowledge graph system according to an embodiment of the invention;
FIG. 12 is a dual-node relationship graph effect diagram of the broadcast television knowledge graph system according to an embodiment of the invention;
FIG. 13 is an intelligent recommendation effect diagram of the broadcast television knowledge graph system according to an embodiment of the invention;
FIG. 14 is a schematic structural diagram of a broadcast television knowledge graph construction device according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The first embodiment of the invention provides a broadcast television knowledge graph construction method, referring to fig. 1, comprising the following steps:
S101, analyzing and processing a preset data source and storing the data source in a database;
Because the main business scenarios in the broadcast television field are users watching programs and users paying to order packages, the data nodes of the knowledge graph can be divided into three categories: users, programs and packages. Association relations exist between users and programs and between users and packages.
The data sources used to construct the broadcast television knowledge graph in this embodiment fall into two parts. The first is user viewing behavior data, user ordering behavior data and media asset system data from within the broadcast television system; this data is huge in volume and real and reliable, but lacks organization and aggregation. The second is publicly available program information data from the Internet, which supplements, enriches and completes the data in the media asset system.
S102, based on the database, visually displaying the portrait features of users, programs and packages, namely user portraits, program labels and package labels, and obtaining the person-person, person-program and person-package relations from the database and visually displaying each of them as a knowledge graph;
S103, constructing a knowledge graph application for the broadcast television industry according to the portrait features and the knowledge graphs.
In general, the core of this embodiment is a broadcast television knowledge graph system designed for the characteristics and requirements of broadcast television services. The system builds user portraits and a knowledge base by constructing labels for the data nodes, and also provides interactive analysis and visualization functions so that analysts can carry out operational analysis conveniently.
Fig. 2 is a schematic diagram of the broadcast television knowledge graph system according to an embodiment of the invention. As shown in fig. 2, the system is divided into four layers from bottom to top: a data source, a storage analysis layer, a visualization layer and an application layer.
The data sources in this embodiment comprise user viewing behavior data, user ordering behavior data and media asset system data from within the broadcast television system, and program information data published on the Internet.
In this embodiment, the storage analysis layer analyzes and processes the preset data source in the following steps:
Step one: establish Hive mapping tables and import into the Hive data warehouse the preprocessed user viewing behaviors, user ordering behaviors, program information from the media asset system, and program information crawled from the Internet;
Step two: use the distributed computing framework Spark to extract user basic attribute information and package basic attribute information from the user viewing behaviors and user ordering behaviors; fuse the program information from the media asset system and the Internet and label the programs, then merge the generated labels with the program basic attribute information and store the result in a MySQL relational database; label the packages by applying natural language processing to the package names in the package basic attribute information, then merge the package basic attribute information with the package labels and store the result in MySQL;
Step three: use Spark to count the frequency of the labels of the programs watched and the packages ordered in the user viewing behaviors and user ordering behaviors, select the top N most frequent labels as the user's program interest labels and package interest labels, merge them with the user basic attribute information, and store the result in MySQL;
Step four: use Spark to extract the user-watches-program and user-orders-package triplet relations from the user viewing behaviors and user ordering behaviors, and import them into a Neo4j graph database for storage.
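Steps three and four can be illustrated on a small scale. The following is a minimal sketch in plain Python rather than Spark; the sample records, the relation name `WATCHED` and the function names are assumptions made for illustration and do not come from the patent.

```python
from collections import Counter

# Hypothetical viewing log: (user_id, program_id) pairs.
views = [("u1", "p1"), ("u1", "p2"), ("u1", "p3"), ("u2", "p2"), ("u1", "p1")]
# Hypothetical program labels, as they might be read back from the relational store.
program_tags = {
    "p1": ["drama", "history"],
    "p2": ["drama", "family"],
    "p3": ["history", "drama"],
}

def interest_tags(user_id, views, program_tags, top_n=2):
    """Step three: count the labels of the distinct programs a user watched, keep the top N."""
    watched = {pid for uid, pid in views if uid == user_id}
    counts = Counter(tag for pid in watched for tag in program_tags.get(pid, []))
    return [tag for tag, _ in counts.most_common(top_n)]

def viewing_triples(views):
    """Step four: turn viewing records into deduplicated (user, relation, program) triples."""
    return sorted({(uid, "WATCHED", pid) for uid, pid in views})

print(interest_tags("u1", views, program_tags))
print(viewing_triples(views))
```

The same two functions would apply unchanged to ordering records and package labels; in a Spark job the counting would typically be expressed as a grouped aggregation instead of an in-memory `Counter`.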
The visualization layer of this embodiment covers the visual display of basic information and of the knowledge base, which are backed by the MySQL and Neo4j databases respectively. Basic information about users, programs and packages is read from the MySQL database through the Java JDBC interface to display user portraits, program labels and package labels; the person-person, person-program and person-package relations are read from the Neo4j database through the Java API or the REST API, and each is displayed visually.
Fig. 3 shows the system flow chart for constructing broadcast television user portraits in this embodiment; it describes the whole process of building big-data user portraits from massive data.
In the visualization layer, the basic data must first be cleaned and disambiguated. Because the data volume is large, this part is processed with Spark so that the resulting data can effectively support subsequent processing and association. Processing the basic data includes building a user library containing user basic attributes, a package library containing package basic attributes, a program library containing program basic attributes, and so on. Typical operations include filtering out invalid information and cleaning attribute values.
A key task of the processing unit of the visualization layer is constructing labels for the data.
1) Package label construction
The packages in the package library come from the business design of broadcast television operators, and the package name in the package basic attribute information carries important information. Segmenting the package name directly with natural language processing techniques yields labels sufficient to characterize the package.
2) Program label construction
Because some programs in the program library have no associated media asset information, the data is incomplete to some degree, so basic program information has to be crawled from the Internet. The unstructured program introduction text is selected from this information as the data source for label construction, and labels are then built with a method based on Word2Vec word embedding and Kmeans clustering, whose flow is shown in fig. 4.
Step one: convert the program introduction text between traditional and simplified Chinese so that all text is in simplified Chinese, then segment it with a Chinese word segmentation technique commonly used in natural language processing.
Step two: count word frequency (TF) over the segmentation results of all program introductions and sort the word-frequency list in descending order of frequency. The result is used to filter the segmentation result of each program introduction, retaining only words whose frequency is greater than a certain value (the value is dynamically adjustable, to ensure that most programs can still be represented by corresponding feature words).
Step three: remove stop words from the selected high-frequency words, eliminating common meaningless words, including all words that are neither nouns nor adjectives and those nouns that cannot express any feature.
Step four: after stop-word removal, each program introduction yields a feature word list, and the training data for the deep learning model Word2Vec is built by taking the feature word list of each program as one sample. Word2Vec, short for Word to Vector, is a word vector training model based on an iterative method. The invention uses the Continuous Bag-of-Words (CBOW) model of Word2Vec with the Hierarchical Softmax framework and trains it on the feature word lists of the programs, obtaining a word vector space sufficient to represent all feature words, in which each feature word has a unique vector representation. The CBOW model contains three layers: an input layer, a projection layer and an output layer, as shown in fig. 5.
Word2Vec builds a neural network. Given a sequence of words, the CBOW model takes the word vectors of the words adjacent to a given word as input and is trained to output that word. By the working principle of neural networks, after many iterations every word used in Word2Vec training can be represented by a vector. Because this vector is learned from the vectors of the words to the left and right of the word, it not only represents the word uniquely; words with similar meanings also end up close together in the word vector space.
Step five: compute a clustering result with the Kmeans algorithm, using the word vector space and its property that similar words lie close together. Kmeans is a cluster analysis algorithm from the field of machine learning. Its basic principle is to construct an initial partition into multiple categories, compute the centroid of each partition (called a cluster), adjust the cluster assignment of each sample in every iteration, and recompute the centroids in the next iteration from the new partition; this is repeated until the centroids are stable, which yields a reliable clustering result. A Kmeans model is built with the representations of all feature words in the word vector space as input and trained to produce the clustering of the feature words, in which each feature word belongs to one specific cluster.
Step six: for each cluster in the clustering result, the most representative feature word is selected as the representative word of the cluster. Finally, the representative words corresponding to the feature words of all programs are counted, and the more important representative words in the count are used as the labels of the program.
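The flow of steps two to six can be sketched end to end on toy data. This is an illustrative sketch only: the two-dimensional vectors below are hand-written stand-ins for trained Word2Vec vectors, the initial centroids are fixed by hand, "most representative" is assumed to mean "closest to the cluster centroid", and labels are counted per program; none of these choices is specified in the patent, and all function names are hypothetical.

```python
import math
from collections import Counter

def filter_features(docs, min_count, stop_words):
    """Steps two and three: keep words whose corpus frequency exceeds min_count, drop stop words."""
    tf = Counter(word for doc in docs for word in doc)
    return [[w for w in doc if tf[w] > min_count and w not in stop_words] for doc in docs]

def kmeans(vectors, initial_centroids, max_iter=100):
    """Step five: assign each word to its nearest centroid, recompute centroids, repeat until stable."""
    centroids = list(initial_centroids)
    assignment = {}
    for _ in range(max_iter):
        updated = {
            word: min(range(len(centroids)), key=lambda k: math.dist(vec, centroids[k]))
            for word, vec in vectors.items()
        }
        if updated == assignment:
            break
        assignment = updated
        for k in range(len(centroids)):
            members = [vectors[w] for w, c in assignment.items() if c == k]
            if members:
                centroids[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids

def representative_words(vectors, assignment, centroids):
    """Step six (assumed rule): the word nearest to each centroid represents the cluster."""
    reps = {}
    for k, centroid in enumerate(centroids):
        members = [w for w, c in assignment.items() if c == k]
        if members:
            reps[k] = min(members, key=lambda w: math.dist(vectors[w], centroid))
    return reps

def program_labels(feature_words, assignment, reps, top_n=1):
    """Step six: map each feature word to its cluster's representative and keep the most frequent."""
    counts = Counter(reps[assignment[w]] for w in feature_words if w in assignment)
    return [word for word, _ in counts.most_common(top_n)]

# Toy segmented program introductions and stand-in word vectors.
docs = [
    ["war", "battle", "soldier", "war", "love", "the", "oddity"],
    ["romance", "love", "wedding", "battle", "the"],
    ["soldier", "romance", "wedding", "the"],
]
vectors = {
    "war": (1.0, 0.0), "battle": (0.0, 0.0), "soldier": (2.0, 0.0),
    "romance": (11.0, 10.0), "love": (10.0, 10.0), "wedding": (12.0, 10.0),
}

features = filter_features(docs, min_count=1, stop_words={"the"})
assignment, centroids = kmeans(vectors, [(0.0, 0.0), (10.0, 10.0)])
reps = representative_words(vectors, assignment, centroids)
labels = [program_labels(words, assignment, reps) for words in features]
print(reps, labels)
```

In a real pipeline the `vectors` dictionary would be replaced by the output of a trained CBOW model, and the number of clusters and the frequency threshold would be tuned on the corpus.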
3) User portrait construction
For each user in the user table, the label sets of the programs the user has watched and the packages the user has ordered are gathered through data association and analyzed statistically. The most important labels are selected to form the user's label representation, which comprises program interest labels and package interest labels. Finally, the label representation and the user attribute information are merged into the user portrait.
The knowledge base is the core of the knowledge graph; the knowledge graph is only one way of presenting the knowledge base, and the invention focuses on knowledge base construction. The knowledge base construction flow is shown in fig. 6 and comprises, from bottom to top, data acquisition, knowledge base construction and knowledge base storage.
Data acquisition draws on unstructured, semi-structured and structured data. The unstructured data is mainly program information crawled from the Internet; the semi-structured data comprises package basic information and media asset data; the structured data comprises user basic attributes, package basic attributes and program basic attributes.
Knowledge base construction comprises information extraction, knowledge fusion and knowledge processing. First, Spark is used to extract user entities, program entities and package entities, together with their attributes and attribute values, as well as the watching and ordering relations between users and programs and between users and packages, from live viewing behaviors, on-demand viewing behaviors and package ordering behaviors. Next, for entities from different sources, entity disambiguation and coreference resolution are performed according to entity IDs and attributes, which fuses the triplet relations from different sources. Finally, duplicate triplets and triplets with missing information are removed, which completes the construction of the knowledge base.
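The final cleaning step, removing duplicate triplets and triplets with missing information, can be sketched as follows. This is a minimal illustration in plain Python; the identifier format and relation names are assumptions, and entity disambiguation and coreference resolution are not shown.

```python
def clean_triples(triples):
    """Drop triples with a missing field, then remove duplicates while keeping first-seen order."""
    seen = set()
    cleaned = []
    for head, relation, tail in triples:
        if not head or not relation or not tail:
            continue  # information lost somewhere upstream
        key = (head, relation, tail)
        if key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned

# Hypothetical triples merged from several sources.
raw = [
    ("user:1", "WATCHED", "program:7"),
    ("user:1", "WATCHED", "program:7"),  # duplicate from a second source
    ("user:1", "ORDERED", None),         # package ID lost
    ("user:2", "ORDERED", "package:3"),
]
print(clean_triples(raw))
```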
Fig. 7 shows the data of three different entity nodes connected into a directed graph. It depicts a typical knowledge graph scenario from the user's perspective, in which the user points to the packages they have ordered and the programs they have watched.
A graph-structured knowledge base is generally stored in a graph database. Neo4j, currently the most popular graph database, provides not only storage but also a concise, visual, interactive query interface, which makes it convenient to verify data quality and analysis models quickly and saves exploration time. Unlike traditional databases, Neo4j stores user-defined nodes and relationships as a graph, so that starting from one node, related nodes can be found efficiently by following the relationships between nodes. The property graph model is shown in fig. 8: a graph records two kinds of data, nodes and relationships; a relationship associates two nodes, and both nodes and relationships can carry their own attributes.
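The property graph model described here can be illustrated with a small in-memory structure. This sketch does not use Neo4j or its API; the class names, labels and property keys are hypothetical and chosen only to show nodes and relationships that each carry their own attributes.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    label: str                                   # e.g. "User", "Program" or "Package"
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    start: str                                   # node_id of the source node
    rel_type: str                                # e.g. "WATCHED" or "ORDERED"
    end: str                                     # node_id of the target node
    properties: dict = field(default_factory=dict)

class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def relate(self, start, rel_type, end, **properties):
        """Add a directed relationship; both endpoints must already exist."""
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must exist before they are related")
        self.relationships.append(Relationship(start, rel_type, end, properties))

    def outgoing(self, node_id):
        """Relationships that start at the given node."""
        return [r for r in self.relationships if r.start == node_id]

graph = PropertyGraph()
graph.add_node(Node("u1", "User", {"gender": "F"}))
graph.add_node(Node("p1", "Program", {"title": "Example Program"}))
graph.add_node(Node("k1", "Package", {"price": 30}))
graph.relate("u1", "WATCHED", "p1", minutes=42)
graph.relate("u1", "ORDERED", "k1")
print([(r.rel_type, r.end) for r in graph.outgoing("u1")])
```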
The knowledge graph system of this embodiment mainly provides three major functions: home page, basic information and data analysis.
The home page presents statistics of the system, such as the number of users, the number of programs, the number of packages and the closing coefficient, as figures and charts, and uses pie charts, bar charts, tree charts, ring charts and the like to analyze user types, program types and so on across different dimensions. The specific effect is shown in fig. 9.
The basic information part presents users, programs and packages in list form, so that the attribute records of these three kinds of entities can be viewed and queried conveniently. In addition to basic attributes such as gender and age, the user information provides personalized program labels and package labels. The program information list shows part of the program attributes and the program labels, and richer program attributes can be viewed by clicking to view details. The package information contains basic attributes such as package name and price, and also provides package labels. Specific examples are shown in figs. 10a, 10b and 10c respectively.
The data analysis part comprises two functions of knowledge graph and intelligent recommendation, and the knowledge graph can be subdivided into single-node relation query and double-node relation query. The single-node query result is that the relationship between the nodes is expanded from a hierarchy with a certain node as a center, for example, all the relationships with the query user id 28089749 degrees are queried, as shown in fig. 11.
A dual-node query can query the relationship between any two of the user, the program and the package, for example querying all relationship types of 3 degrees between user id 123504102 and program id 4058786, as shown in fig. 12.
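One way to picture a dual-node query is as an enumeration of all simple paths between the two nodes that stay within a hop limit. The following sketch is an illustration under that assumption and does not reproduce the query the system actually issues; the adjacency data and the function name `paths_between` are hypothetical.

```python
from typing import Dict, List, Set


def paths_between(
    adj: Dict[str, Set[str]], source: str, target: str, max_hops: int
) -> List[List[str]]:
    """All simple paths from source to target using at most max_hops edges."""
    results: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == target:
            results.append(list(path))
            return
        if len(path) - 1 == max_hops:
            return  # hop budget used up
        for nxt in sorted(adj.get(node, ())):  # sorted for a stable order
            if nxt not in path:                # keep paths simple (no revisits)
                path.append(nxt)
                walk(nxt, path)
                path.pop()

    walk(source, [source])
    return results


# Made-up undirected adjacency between users, programs and packages.
adjacency = {
    "user:A": {"program:X", "package:P"},
    "user:B": {"program:X", "package:P", "program:Y"},
    "program:X": {"user:A", "user:B"},
    "package:P": {"user:A", "user:B"},
    "program:Y": {"user:B"},
}
within_three = paths_between(adjacency, "user:A", "program:Y", 3)
within_two = paths_between(adjacency, "user:A", "program:Y", 2)
```

Here `user:A` reaches `program:Y` only through `user:B`, either via the shared program or via the shared package, so both paths have exactly three hops.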
Intelligent recommendation recommends to a user the other programs watched and the other packages ordered by users who share a viewing or ordering behavior with that user, for example querying all recommendations for user id 27988189, as shown in fig. 13.
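The recommendation rule described above can be sketched as a simple co-occurrence count: find users who share at least one watched program or ordered package with the target user, then collect what they have that the target user does not. Ranking candidates by the number of such users is an assumption added here for illustration, as are the data and the function name `recommend`.

```python
from collections import Counter
from typing import Dict, List, Set


def recommend(history: Dict[str, Set[str]], user: str) -> List[str]:
    """Items watched or ordered by users who share an item with `user`."""
    seen = history.get(user, set())
    scores: Counter = Counter()
    for other, items in history.items():
        if other == user or not (items & seen):
            continue  # skip the user itself and users with nothing in common
        for item in items - seen:
            scores[item] += 1
    # Assumption: rank by how many similar users have the item, then by name.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _count in ranked]


# Made-up viewing and ordering history.
history = {
    "user:A": {"program:X", "package:P"},
    "user:B": {"program:X", "program:Y", "package:Q"},
    "user:C": {"package:P", "program:Y"},
    "user:D": {"program:Z"},
}
suggestions = recommend(history, "user:A")
```

`user:D` shares nothing with `user:A`, so `program:Z` is never suggested, while `program:Y` ranks first because two similar users watched it.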
In general, the key of the invention is that a highly available broadcast television knowledge graph system is built with the support of machine learning and deep learning methods such as Word2Vec word embedding and K-means clustering.
The broadcast television big data knowledge graph system not only can intuitively and comprehensively show massive knowledge information under typical application scenarios in the broadcast television field, but also can display the behavior habits and interest preferences of each user from multiple angles, thereby helping broadcast television operators to better grasp the characteristics of each user and further carry out accurate recommendation, market analysis and the like. The Word2Vec word vector space makes the representation of feature words more accurate. Because similar words lie close together in the word vector space, similar feature words can be clustered, and a unified representative word is then selected to stand for each group of feature words, so that the originally sparse and scattered feature words are unified into a limited set. Different programs can thus be organized in a tree-like manner, and the statistical ranking of labels becomes meaningful when user labels are built from program labels, that is, when user preference labels are built from user viewing records.
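The clustering idea in this paragraph can be sketched in plain Python. The two-dimensional vectors below merely stand in for trained Word2Vec vectors, the centroid initialization is a naive deterministic choice, and picking the most frequent word of a cluster as its representative word is an assumption, since the text does not fix the selection rule. All names and data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Vector = Sequence[float]


def kmeans(vectors: Dict[str, Vector], k: int, iterations: int = 20) -> Dict[str, int]:
    """Assign each word to one of k clusters by iterating centroid updates."""
    words = sorted(vectors)
    # Naive deterministic initialization (assumption): the first k words.
    centroids = [list(vectors[w]) for w in words[:k]]
    assign: Dict[str, int] = {}
    for _ in range(iterations):
        new_assign = {
            w: min(range(k), key=lambda c: math.dist(vectors[w], centroids[c]))
            for w in words
        }
        if new_assign == assign:
            break  # assignments, and therefore centroids, are stable
        assign = new_assign
        for c in range(k):
            members = [vectors[w] for w in words if assign[w] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def representatives(assign: Dict[str, int], freq: Dict[str, int]) -> Dict[int, str]:
    """Assumption: the most frequent word of a cluster represents it."""
    reps: Dict[int, str] = {}
    for word, cluster in assign.items():
        best = reps.get(cluster)
        if best is None or freq[word] > freq[best]:
            reps[cluster] = word
    return reps


def program_labels(feature_words: List[str], assign: Dict[str, int],
                   reps: Dict[int, str], top_n: int = 1) -> List[str]:
    """Map feature words to representative words and keep the most common."""
    counts = Counter(reps[assign[w]] for w in feature_words if w in assign)
    return [word for word, _count in counts.most_common(top_n)]


# Toy 2-D vectors standing in for Word2Vec output; frequencies are made up.
vectors = {
    "comedy": (0.9, 0.1), "funny": (1.0, 0.0), "humor": (0.8, 0.2),
    "war": (0.0, 1.0), "battle": (0.1, 0.9),
}
freq = {"comedy": 5, "funny": 2, "humor": 1, "war": 4, "battle": 3}
assign = kmeans(vectors, k=2)
reps = representatives(assign, freq)
labels = program_labels(["funny", "humor", "battle"], assign, reps)
```

In this toy run the three humor-related words collapse onto the single representative word `comedy`, which is how scattered feature words are reduced to a limited label set.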
A second embodiment of the present invention provides a broadcast and television knowledge graph construction apparatus, referring to fig. 14, including:
The storage analysis module is used for analyzing and processing a preset data source and storing the data source in a database;
The visualization module is used for visually displaying, based on the database, the portrait features of the user, the program and the package, namely the user portrait, the program label and the package label, and for obtaining from the database the relations between people and programs and between people and packages and visually displaying each as a knowledge graph;
And the processing module is used for constructing the knowledge graph application of the broadcasting and television industry according to the portrait features and the knowledge graph.
The storage analysis module of the embodiment of the invention is used for realizing the corresponding function of the storage analysis layer in the first embodiment of the invention, the visualization module is used for realizing the function of the visualization layer in the first embodiment of the invention, and the processing module is used for realizing the function of the application layer in the first embodiment of the invention.
The related content of the embodiment of the present invention can be understood by referring to the first embodiment of the present invention, and will not be described in detail herein.
A third embodiment of the present invention provides a computer readable storage medium storing a computer program for mapping signals, where the computer program is executed by at least one processor to implement the broadcast and television knowledge graph construction method according to the first embodiment of the present invention. The relevant content of the embodiments of the present invention can be understood with reference to the first embodiment of the present invention, and will not be discussed in detail herein.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The required structure for a construction of such a system is apparent from the description above. In addition, the present invention is not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of enablement and best mode of the present invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some or all of the components in a distributed file system data import apparatus according to embodiments of the present invention may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present invention can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present invention may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names.

Claims (3)

1. The broadcast and television knowledge graph construction method is characterized by comprising the following steps of:
analyzing and processing a preset data source and storing the data source in a database;
Based on the database, visually displaying the portrait features of the user, the program and the package, obtaining from the database the relations between people and programs and between people and packages, and visually displaying each as a knowledge graph;
Constructing a knowledge graph application of the broadcasting and television industry according to the portrait features and the knowledge graph;
the data sources comprise, within broadcast television, user viewing behavior data, user ordering behavior data, media resource system data, and program information data published on the Internet;
analyzing and processing a preset data source, including:
Step one, establishing a Hive mapping table, and importing into the Hive data warehouse the preprocessed user viewing behaviors, user ordering behaviors, program information in the media resource system, and program information crawled from the Internet;
Step two, using the distributed computing framework Spark, extracting user basic attribute information and package basic attribute information from the user viewing behaviors and user ordering behaviors; fusing the program information of the media resource system and the Internet, labeling the programs, and finally merging the generated labels with the program basic attribute information and storing the result in the MySQL relational database; labeling the packages using natural language processing technology based on the package names in the package basic attribute information, merging the package basic attribute information with the package labels, and storing the result in MySQL;
Step three, using Spark to perform word frequency statistics on the labels of the programs watched and the packages ordered by the user in the user viewing behaviors and ordering behaviors, selecting the N top-ranked (topN) words by count as the interest labels of the programs watched by the user and the interest labels of the packages ordered by the user respectively, merging the user basic attribute information with these interest labels, and storing the result in MySQL;
Step three further comprises: using Spark to extract the user-watches-program and user-orders-package triple relations from the user viewing behaviors and user ordering behaviors, and importing them into the graph database Neo4j for storage;
further comprises: constructing a program label;
The program label construction comprises the following steps:
Step one: performing Traditional-to-Simplified Chinese conversion on the program introduction text so that all text is in Simplified Chinese, and then performing word segmentation;
Step two: counting word frequency using TF, wherein word frequencies are computed over the word segmentation results of all program introductions and the word frequency list is sorted in descending order, and the word segmentation result of each program introduction is then filtered using the word frequency statistics, retaining only words whose frequency is greater than a preset value;
step three: performing stop-word processing on the selected high-frequency words;
Step four: after stop-word processing, each program introduction yields a feature word list, and the feature word list of each program is taken as one sample to construct training data for the deep learning model Word2Vec;
the fourth step further comprises:
Training with the Word2Vec continuous bag-of-words (CBOW) model and the Hierarchical Softmax framework, using the feature word lists of the programs as input data, so as to obtain a word vector space sufficient to represent all feature words, in which each feature word obtains a unique vector representation, wherein the CBOW model comprises three layers: an input layer, a projection layer and an output layer;
step five: computing a clustering result with the K-means algorithm, using the word vector space and its property that similar words lie close together;
Step six: for each cluster of the clustering result, selecting the most representative feature word as the representative word of the cluster, and finally counting the representative words corresponding to the feature words of all programs, wherein the more important representative words in the statistics are used as labels of the program.
2. The method according to claim 1, wherein the fifth step comprises:
Constructing an initial division into multiple clusters; in each iteration, computing the centroid of each cluster and adjusting the cluster assignment of each sample accordingly; in the next iteration, recomputing the centroids from the new cluster division; and repeating the iteration until the centroids are stable, so that a reliable clustering result is finally obtained; wherein a K-means model is constructed with the representations of all feature words in the word vector space as input and trained to obtain the clustering result of the feature words, each feature word belonging to one specific cluster.
3. A computer readable storage medium, wherein the computer readable storage medium stores a computer program of signal mapping, which when executed by at least one processor, implements the broadcast television knowledge graph construction method of any one of claims 1-2.
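The word frequency filtering and stop-word steps of the program label construction, and the top-N selection of user interest labels, can be sketched without Spark as follows. This is a single-machine illustration under stated assumptions: the tokens are already segmented, the threshold, stop-word set and sample data are made up, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Set


def filter_feature_words(
    docs: List[List[str]], min_freq: int, stop_words: Set[str]
) -> List[List[str]]:
    """Keep words whose corpus frequency exceeds min_freq and that are not stop words."""
    freq = Counter(word for doc in docs for word in doc)
    return [
        [w for w in doc if freq[w] > min_freq and w not in stop_words]
        for doc in docs
    ]


def top_n_tags(label_lists: List[List[str]], n: int) -> List[str]:
    """Most frequent labels across the programs (or packages) of one user."""
    counts = Counter(tag for labels in label_lists for tag in labels)
    # Ties are broken alphabetically so the result is deterministic (assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _count in ranked[:n]]


# Made-up segmented program introductions.
docs = [
    ["the", "spy", "war"],
    ["the", "war", "hero"],
    ["the", "spy", "love"],
]
features = filter_feature_words(docs, min_freq=1, stop_words={"the"})

# Made-up labels of the programs one user has watched.
watched = [["comedy", "family"], ["comedy", "war"], ["war", "comedy"]]
interest_tags = top_n_tags(watched, 2)
```

Each filtered list in `features` corresponds to one sample of Word2Vec training data, and `interest_tags` corresponds to the interest labels merged with the user's basic attributes.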
CN201811495424.4A 2018-12-07 2018-12-07 Broadcast television knowledge graph construction method and device Active CN111291191B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811495424.4A CN111291191B (en) 2018-12-07 2018-12-07 Broadcast television knowledge graph construction method and device

Publications (2)

Publication Number Publication Date
CN111291191A CN111291191A (en) 2020-06-16
CN111291191B true CN111291191B (en) 2024-05-03

Family

ID=71025223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811495424.4A Active CN111291191B (en) 2018-12-07 2018-12-07 Broadcast television knowledge graph construction method and device

Country Status (1)

Country Link
CN (1) CN111291191B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111967897A (en) * 2020-07-29 2020-11-20 哈尔滨商业大学 Agent-based consumer preference analysis system
CN111767440B (en) * 2020-09-03 2021-01-05 平安国际智慧城市科技股份有限公司 Vehicle portrayal method based on knowledge graph, computer equipment and storage medium
CN112650898A (en) * 2020-12-28 2021-04-13 武汉烽火信息集成技术有限公司 Industrial chain visual analysis method, device and equipment
CN112954025B (en) * 2021-01-29 2023-07-18 北京百度网讯科技有限公司 Information pushing method, device, equipment and medium based on hierarchical knowledge graph
CN115080764B (en) * 2022-07-21 2022-11-01 神州医疗科技股份有限公司 Medical similar entity classification method and system based on knowledge graph and clustering algorithm

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101557502A (en) * 2009-05-26 2009-10-14 四川长虹电器股份有限公司 Immediate subscribing method, system and set-top box for digital television program
CN107633075A (en) * 2017-09-22 2018-01-26 吉林大学 A kind of multi-source heterogeneous data fusion platform and fusion method
WO2018036239A1 (en) * 2016-08-24 2018-03-01 慧科讯业有限公司 Method, apparatus and system for monitoring internet media events based on industry knowledge mapping database
CN108694223A (en) * 2018-03-26 2018-10-23 北京奇艺世纪科技有限公司 The construction method and device in a kind of user's portrait library

Also Published As

Publication number Publication date
CN111291191A (en) 2020-06-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant