CN106131582B - A kind of wrong source investigation method based on video text message - Google Patents


Info

Publication number
CN106131582B
Authority
CN
China
Prior art keywords
video
source
program
crawl
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610554564.9A
Other languages
Chinese (zh)
Other versions
CN106131582A (en)
Inventor
刘强
王长福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Casicloud Co ltd
Original Assignee
Space Cloud Network Technology Development LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Space Cloud Network Technology Development LLC
Priority to CN201610554564.9A
Publication of CN106131582A
Application granted
Publication of CN106131582B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/232 Content retrieval operation locally within server, e.g. reading video streams from disk arrays
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/262 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
    • H04N21/26291 Content or additional data distribution scheduling for providing content or additional data updates, e.g. updating software modules, stored at the client
    • H04N21/266 Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N21/2665 Gathering content from different sources, e.g. Internet and satellite
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/454 Content or additional data filtering, e.g. blocking advertisements
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/858 Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot
    • H04N21/8586 Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot by using a URL

Abstract

The present invention relates to a wrong-source screening method based on video text information. Each video website is searched by program name to determine whether it hosts a video source for that program name; video playback links are crawled periodically from the video websites hosting the video sources; the crawl results are stored to form a crawl history; the text information in the crawl history is analyzed to identify wrong sources, and the crawl results belonging to those wrong sources are deleted from the crawl history; finally, based on the resulting crawl history free of wrong sources, the content is aggregated on the TV via a protocol, presented as programs, and played back. By screening crawled videos for errors on the basis of text information, the invention solves the "same name, different source" problem that arises when TV internet video aggregation applications crawl video playback links.

Description

A wrong-source screening method based on video text information
Technical field
The present invention relates to error correction in TV internet video aggregation, and specifically to a wrong-source screening method based on video text information.
Background
Traditional TV programs are carefully edited by TV stations and then presented and played back as television channels.
As the integration of the three networks progresses, TV internet video aggregation application (TV internet video aggregation APP) technology has entered people's lives and developed rapidly. This technology presents multimedia content from the internet (especially video content) on the TV as programs and plays it back.
Each program presented on the TV typically corresponds to more than one video source. The technology therefore first crawls video playback links from the video websites hosting the video sources, then aggregates them on the TV via a protocol (a streaming protocol such as HLS, HTTP Live Streaming, Apple's adaptive-bitrate streaming technology), presents them as programs, and plays them back.
The current mainstream method of crawling video playback links is to match video sources by program name and then crawl the playback links. This method suffers from the "same name, different source" problem, for example:
"Hero Shooting Vulture" is a widely known TV series that has been produced in several versions; chronologically there are the 1983, 1994, 2003, and 2008 versions, among others. When video sources are matched and playback links crawled by program name, the program name is "Hero Shooting Vulture" in every case, so the crawled playback links must still be compared against details of the production before their specific content can be known.
That is, the program name alone cannot correctly distinguish differences in program content. In video aggregation this kind of problem is frequent and unavoidable, and correcting the errors usually incurs a very high labor cost.
Summary of the invention
In view of the deficiencies of the prior art, the purpose of the present invention is to provide a wrong-source screening method based on video text information which, when video playback links are crawled, screens the crawled videos for errors on the basis of text information and thereby solves the "same name, different source" problem in TV internet video aggregation applications.
To achieve the above purpose, the present invention adopts the following technical solution:
A wrong-source screening method based on video text information, characterized in that it comprises the following steps:
Step 1: search each video website by program name and determine whether each video website has a video source corresponding to the program name;
Step 2: crawl video playback links from the video websites hosting the video sources; crawling is performed periodically and captures at least the following:
the video playback link corresponding to the video source,
the text information that identifies the video content of the video source;
Step 3: store the crawl results of step 2 to form a crawl history;
Step 4: analyze the text information in the crawl history, identify wrong sources, and delete the crawl results belonging to the wrong sources from the crawl history;
Step 5: based on the crawl history free of wrong sources produced by step 4, aggregate the content on the TV via a protocol, present it as programs, and play it back.
On the basis of the above technical solution, in step 1 the program name includes but is not limited to a TV series title or a movie title.
On the basis of the above technical solution, step 1 is implemented by a script program.
On the basis of the above technical solution, in step 2 periodic crawling means:
the update cycle and the priority of each program are preset,
a program with high priority has a short update cycle,
a program with low priority has a long update cycle,
the update cycle of a program is selected from the range of 1 hour to 1 week;
crawling is performed periodically according to the update cycle of each program.
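The relationship between priority and update cycle can be illustrated with a minimal sketch. The text only fixes the direction (higher priority, shorter cycle) and the range of 1 hour to 1 week; the priority scale of 0 to 1, the linear mapping, and the name `update_period` are assumptions made for illustration.

```python
from datetime import timedelta

MIN_PERIOD = timedelta(hours=1)   # shortest update cycle named in the text
MAX_PERIOD = timedelta(weeks=1)   # longest update cycle named in the text


def update_period(priority: float) -> timedelta:
    """Map a priority in [0, 1] to an update cycle (linear mapping is an assumption)."""
    p = min(max(priority, 0.0), 1.0)  # clamp out-of-range priorities
    # priority 1.0 -> 1 hour, priority 0.0 -> 1 week
    return MAX_PERIOD - (MAX_PERIOD - MIN_PERIOD) * p
```

A scheduler could then re-crawl each program whenever the time since its last crawl exceeds `update_period(priority)`.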
On the basis of the above technical solution, in step 2 the text information that identifies the video content of the video source includes at least: director, lead actors, year, genre, region, number of episodes, per-episode duration, aliases, and synopsis.
On the basis of the above technical solution, in step 3 the crawl history is formed and stored as a metadata list;
the metadata list contains at least one metadata record;
each metadata record stores at least the following: the program name, and the text information that identifies the video content of the video source.
On the basis of the above technical solution, in step 3 the crawl history is formed as follows:
determine whether the program name of the newly crawled video source already exists in the crawl history;
if it does not exist, create a new metadata record to store the newly crawled video source;
if it already exists, create a new metadata record to store the newly crawled video source, and execute step 4 after the crawl operation completes.
On the basis of the above technical solution, step 4 proceeds as follows:
similarity matching is performed on metadata records in the crawl history that share the same program name:
for two metadata records with the same program name, the text information that identifies the video content of the video source is matched for similarity item by item;
the results of the item-by-item similarity matching are combined;
if the similarity meets or exceeds the criterion, the newly crawled video source and the video source stored in the earlier metadata record are considered to be the same program;
the text information elements in the existing metadata record are then completed from the text information elements of the newly crawled video source;
if the similarity does not reach the criterion, the newly crawled video source is considered a wrong source and is excluded, and the newly crawled video source is treated as a new program.
On the basis of the above technical solution, the similarity criterion is as follows:
each text information element is treated as one parameter of a vector;
similarity is judged by comparing the corresponding parameters of the vectors one by one to obtain a similar value for each, and then adding the similar values to obtain the similarity of the text information that identifies the video content of the video source;
the similar values and the similarity are normalized, and the resulting similarity of the text information that identifies the video content of the video source is expressed as a percentage.
With the wrong-source screening method based on video text information of the present invention, crawled videos are screened for errors on the basis of text information when video playback links are crawled, which solves the "same name, different source" problem in TV internet video aggregation applications.
Description of the drawings
The present invention has the following drawings:
Fig. 1 is a flow chart of wrong-source screening;
Fig. 2 is a flow chart of history record formation;
Fig. 3 is a schematic diagram of the structure of the history record.
Specific embodiment
The invention is described in further detail below with reference to the drawings.
As shown in Figures 1 to 3, the wrong-source screening method based on video text information of the present invention comprises the following steps:
Step 1: search each video website by program name (also called the title) and determine whether each video website has a video source corresponding to the program name;
Step 2: crawl video playback links from the video websites hosting the video sources; crawling is performed periodically and captures at least the following:
the video playback link corresponding to the video source,
the text information that identifies the video content of the video source;
Step 3: store the crawl results of step 2 to form a crawl history (referred to below as the history record);
Step 4: analyze the text information in the crawl history, identify wrong sources, and delete the crawl results belonging to the wrong sources from the crawl history;
Step 5: based on the crawl history free of wrong sources produced by step 4, aggregate the content on the TV via a protocol, present it as programs, and play it back.
On the basis of the above technical solution, in step 1 the program name includes but is not limited to a TV series title or a movie title.
Further, the program name may be one or more keywords from a TV series title or movie title.
Further, the program name may be in Simplified Chinese, Traditional Chinese, Korean, Japanese, or English.
On the basis of the above technical solution, step 1 is implemented by a script program, wherein:
the script program contains a default video website list, and searching each video website means searching the websites in the default video website list one by one;
the default video website list is stored in the script program;
and/or: the script program contains a custom video website list, and searching each video website means searching the websites in the custom video website list one by one;
the custom video website list is stored locally on the device on which the script program runs;
and/or: the script program contains a cloud video website list, and searching each video website means searching the websites in the cloud video website list one by one;
the cloud video website list is stored on one or more cloud servers.
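One way to read the "and/or" combination of the three lists is to merge whichever lists are available into a single search order. The following is a minimal sketch under that assumption; the function name `build_site_list` and the de-duplication step are illustrative and not taken from the text, and loading the custom and cloud lists is left out.

```python
def build_site_list(default_sites, custom_sites=(), cloud_sites=()):
    """Merge default, custom, and cloud site lists in that order, dropping duplicates."""
    merged = []
    for site in (*default_sites, *custom_sites, *cloud_sites):
        if site not in merged:  # keep the first occurrence only
            merged.append(site)
    return merged


# Hypothetical site names, used only to show the call.
sites = build_site_list(["a.example"], ["b.example", "a.example"], ["c.example"])
```

The script would then search the sites in `sites` one by one for the program name.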
On the basis of the above technical solution, in step 2 periodic crawling means:
the update cycle and the priority of each program are preset,
a program with high priority has a short update cycle,
a program with low priority has a long update cycle,
the update cycle of a program is selected from the range of 1 hour to 1 week;
crawling is performed periodically according to the update cycle of each program.
Wherein:
the priority of programs is ranked according to how frequently the program name has been searched recently;
"recently" includes but is not limited to: the current day, the last three days, the last week, or the last month;
and/or: the priority of programs is ranked according to how long ago the program was released;
and/or: the priority of programs is ranked according to the user's viewing history;
the content recorded in the viewing history includes but is not limited to: the duration and the date and time of the user's viewing, the genres the user watches, the production companies the user watches, the directors the user watches, or the lead actors the user watches;
Preferably, the viewing history includes at least the viewing duration and the viewing date and time. From the viewing date and time it is determined whether viewing took place on a working day or at the weekend; then, depending on whether the current day is a working day or a weekend and taking the user's viewing duration into account, programs are ranked by program duration.
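As an illustration of the first ranking option (recent search frequency), the sketch below counts searches inside a recent window and orders program names by that count. The log format of `(program_name, date)` pairs, the 7-day default window, and the function name are assumptions for illustration only.

```python
from collections import Counter
from datetime import date, timedelta


def rank_by_recent_searches(search_log, today, window_days=7):
    """Return program names ordered by search count within the recent window."""
    cutoff = today - timedelta(days=window_days)
    counts = Counter(
        name for name, day in search_log if cutoff < day <= today
    )
    # most_common() yields (name, count) pairs, highest count first
    return [name for name, _ in counts.most_common()]


log = [
    ("A", date(2016, 7, 10)),
    ("B", date(2016, 7, 11)),
    ("B", date(2016, 7, 12)),
    ("C", date(2016, 6, 1)),   # outside the 7-day window, ignored
]
ranking = rank_by_recent_searches(log, today=date(2016, 7, 12))
```

Programs near the front of `ranking` would receive higher priority and therefore shorter update cycles.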
On the basis of the above technical solution, in step 2 the text information that identifies the video content of the video source includes at least: director, lead actors, year, genre, region, number of episodes, per-episode duration, aliases, and synopsis.
Further, "director, lead actors, year, genre, region, number of episodes, per-episode duration, aliases, and synopsis" are the text information elements of the text information that identifies the video content of the video source. If one or more text information elements are missing, the missing elements are left blank, or filled with the word "none", the word "missing", or the like, to mark the difference.
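Handling of missing text information elements can be sketched as a normalization step that always yields all nine elements. The English field names and the choice of `None` as the marker for a missing element are assumptions for illustration; the text equally allows a blank or a placeholder word.

```python
TEXT_ELEMENTS = (
    "director", "lead_actors", "year", "genre", "region",
    "episode_count", "episode_duration", "aliases", "synopsis",
)


def normalize_text_info(raw: dict) -> dict:
    """Return a dict holding all nine elements; absent or empty ones become None."""
    return {key: (raw.get(key) or None) for key in TEXT_ELEMENTS}


info = normalize_text_info({"director": "Li Guoli", "year": 2008, "synopsis": ""})
```

Keeping every record in the same shape means later item-by-item comparison can skip elements that are `None` instead of failing on a missing key.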
On the basis of the above technical solution, in step 3 the crawl history is formed and stored as a metadata list;
the metadata list contains at least one metadata record; that is, a number of metadata records make up the crawl history of the present invention;
each metadata record stores at least the following: the program name (title), and the text information that identifies the video content of the video source.
Definition of metadata: information about the organization of data, data fields, and their relationships; in short, metadata is data about data.
In another embodiment, each metadata record stores the following: the program name (title), the video playback link, and the text information that identifies the video content of the video source. Note that how the video playback link is processed is not the key subject matter the present invention seeks to protect, so content relating to the video playback link is not described further.
On the basis of the above technical solution, in step 3 the crawl history is formed as follows:
determine whether the program name of the newly crawled video source already exists in the crawl history;
if it does not exist, create a new metadata record to store the newly crawled video source;
if it already exists, create a new metadata record to store the newly crawled video source, and execute step 4 after the crawl operation completes.
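The formation of the history record can be sketched as follows. Both branches create a new record; the only difference is whether step 4 needs to run afterwards. Representing the history as a Python list of dicts with a `program_name` key is an assumption for illustration.

```python
def add_to_history(history: list, record: dict) -> bool:
    """Store a newly crawled record; return True if step 4 should run for it."""
    name_exists = any(
        existing["program_name"] == record["program_name"] for existing in history
    )
    history.append(record)  # a new metadata record is created in both branches
    return name_exists      # a same-name record already existed -> run step 4


history = []
first = add_to_history(history, {"program_name": "Hero Shooting Vulture"})
second = add_to_history(history, {"program_name": "Hero Shooting Vulture"})
```

Here `first` is `False` (no earlier record to compare against) and `second` is `True`, so similarity matching would be triggered only for the second record.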
On the basis of the above technical solution, step 4 proceeds as follows:
similarity matching is performed on metadata records in the crawl history that share the same program name:
for two metadata records with the same program name, the text information that identifies the video content of the video source is matched for similarity item by item;
the results of the item-by-item similarity matching are combined;
if the similarity meets or exceeds the criterion, the newly crawled video source and the video source stored in the earlier metadata record are considered to be the same program;
the text information elements in the existing metadata record are then completed from the text information elements of the newly crawled video source;
if the similarity does not reach the criterion, the newly crawled video source is considered a wrong source and is excluded, and the newly crawled video source is treated as a new program.
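The decision at the end of step 4 reduces to a threshold test on the combined similarity. A minimal sketch, assuming a threshold of 0.5 (the value used in the worked example of this description) and illustrative return labels:

```python
def screen(similarity: float, threshold: float = 0.5) -> str:
    """Classify a newly crawled source from its combined similarity."""
    # "meets or exceeds the criterion" -> same program; otherwise wrong source
    return "same_program" if similarity >= threshold else "wrong_source"
```

A result of `"same_program"` would lead to completing the existing record, while `"wrong_source"` would lead to excluding the new source from that record and treating it as a new program.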
On the basis of the above technical solution, the similarity criterion is as follows:
each text information element (director, lead actors, year, genre, region, number of episodes, per-episode duration, aliases, synopsis) is treated as one parameter of a vector;
similarity is judged by comparing the corresponding parameters of the vectors one by one to obtain a similar value for each, and then adding the similar values to obtain the similarity of the text information that identifies the video content of the video source;
the similar values and the similarity are normalized, and the resulting similarity of the text information that identifies the video content of the video source is expressed as a percentage.
On the basis of the above technical solution, the parameters are compared mainly in the following three modes:
Mode 1, also called discrete Boolean comparison: if the parameters being compared can only be identical or different, the similar value takes one of only two values;
Example: if the two programs being compared have the same director, the similar value of the "director" parameter is 1; otherwise it is 0. This is illustrative only: depending on how the algorithm performs in practice, the value for differing parameters is generally not 0 and the value for identical parameters is not necessarily 1, but the comparison result always takes one of exactly two values;
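Mode 1 can be sketched as a two-valued comparison whose two output values are configurable, since the text notes that they need not be exactly 1 and 0. The function name and default values are assumptions for illustration.

```python
def boolean_similarity(a, b, same: float = 1.0, different: float = 0.0) -> float:
    """Two-valued comparison: `same` if the parameters are equal, else `different`."""
    return same if a == b else different


identical = boolean_similarity("Li Guoli", "Li Guoli")
mismatch = boolean_similarity("Li Tiansheng", "Li Guoli", different=0.1)
```

With `different=0.1`, a mismatch still contributes a small similar value rather than zero.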
Mode 2, also called continuous comparison: when the parameters being compared differ, a normalized mapping is applied and the similar value is some value in [0, 1];
commonly used normalization methods include the cosine method, the sigmoid function, and the exponential method;
Example: if the two programs being compared have the same year, the output is 1; if not, a similar value is produced by the cosine method. For instance, if the year in the metadata record in the history record is 2016, then the closer the year in the newly stored information is to 2016, the larger the similar value: the similar value for 2015 might be 0.9 and for 2000 might be 0.2;
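A continuous comparison for the year can be sketched with an exponential decay, one of the normalization families the text lists. The text does not give the exact mapping, so the formula and the decay scale of 10 years are assumptions; the scale was picked here only because it yields values close to the 0.9 and 0.2 of the example.

```python
import math


def year_similarity(year_a: int, year_b: int, scale: float = 10.0) -> float:
    """Map the year difference into (0, 1]; identical years give exactly 1.0."""
    return math.exp(-abs(year_a - year_b) / scale)


near = year_similarity(2015, 2016)   # one year apart
far = year_similarity(2000, 2016)    # sixteen years apart
```

A cosine or sigmoid mapping could be substituted for `math.exp` without changing how the result is used downstream.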
Mode 3, also called simhash comparison: for rich text information, the well-known simhash method is used to obtain the hash values of the two pieces of rich text, the Hamming distance between the hash values is then calculated, and finally the Hamming distance is normalized according to the number of bits in the hash value to obtain the similar value;
Example: hash values are computed for the synopses of two programs A and B; assume they are represented with 6 bits as hashA = 110001 and hashB = 101011. The Hamming distance between the two hash values is then hammingD(hashA, hashB) = count_1(hashA xor hashB) = count_1(011010) = 3. The range of the Hamming distance depends on the number of bits in the hash value, so the distance can be normalized. The normalization can be stated simply: when the Hamming distance is 0 the similar value is 1; when it is 6 the similar value is 0; other values are quantized uniformly over the interval [0, maxbit(hash)]. In this example, the similar value for a distance of 3 is:
1 - hammingD(hashA, hashB)/maxbit(hash) = 1 - 3/6 = 0.5
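The Hamming-distance part of mode 3 can be checked with a short sketch. It takes the hash values as given (computing simhash itself is not shown) and assumes a linear normalization in which identical hashes score 1 and hashes differing in every bit score 0; the function names are illustrative. For the 6-bit values 110001 and 101011, the XOR evaluates to 011010.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions in which the two hash values differ."""
    return bin(hash_a ^ hash_b).count("1")


def hash_similarity(hash_a: int, hash_b: int, bits: int) -> float:
    """Normalize the Hamming distance by the hash width (linear mapping is an assumption)."""
    return 1.0 - hamming_distance(hash_a, hash_b) / bits


distance = hamming_distance(0b110001, 0b101011)
similar_value = hash_similarity(0b110001, 0b101011, bits=6)
```

In practice simhash values are typically much wider than 6 bits; only the `bits` argument changes.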
After the similar value of each parameter has been obtained by comparing the two vectors using the three modes above, each item is multiplied by its weight factor and the products are added to obtain the final similarity. The weight factors derive from experience accumulated through long-term work in the video business;
Example: the weight factor of director is 0.2 and that of lead actors is 0.3. This can be understood as follows: for two programs with the same name, identical lead actors indicate a greater likelihood of similarity than an identical director, because the set of directors contains fewer elements than the set of lead actors. This inference is only one possibility worked backward from the results; the real situation is likely much more complex.
The formation of the history record, including the similarity comparison, is described in detail below by way of example. The example corresponds to steps 3 and 4 of the specific embodiment.
Suppose the history record already contains the following record:
metadata_ORG{
Program name: Hero Shooting Vulture,
Director: Li Guoli,
Lead actors: (actor1: Lin Yichen, actor2: Hu Ge ...),
Year: 2008,
Genre: (tag1: martial arts),
Region: Mainland China,
Number of episodes: null,
Per-episode duration: null,
Aliases: (name1: 08 edition Shooting Vulture),
Synopsis: Southern Song Dynasty period, monarch ...
};
Suppose two newly stored metadata records are crawled:
metadata1{
Program name: Hero Shooting Vulture,
Director: Li Tiansheng,
Lead actors: (actor1: Zhang Zhilin, actor2: Zhu Yin ...),
Year: 1994,
Genre: (tag1: martial arts),
Region: Hong Kong,
Number of episodes: 35,
Per-episode duration: null,
Aliases: (name1: Hero Shooting Vulture),
Synopsis: story occurs ...
};
metadata2{
Program name: Hero Shooting Vulture,
Director: Li Guoli,
Lead actors: (actor1: Hu Ge, actor2: Lin Yichen ...),
Year: 2007,
Genre: (tag1: martial arts, tag2: romance, tag3: costume drama),
Region: Mainland China,
Number of episodes: 50,
Per-episode duration: 43,
Aliases: (name1: new Hero Shooting Vulture, name2: 08 edition Shooting Vulture),
Synopsis: Southern Song Dynasty period, monarch ...
};
The calculation proceeds as follows:
Step 1: a lookup in the history record finds that metadata1 has the same program name as metadata_ORG, so calculation of the similarity between the two records begins.
Step 2: the two records are treated as two vectors containing several parameters. First, the similar value between the value "Li Tiansheng" of the "director" parameter in metadata1 and the value "Li Guoli" of the "director" parameter in metadata_ORG is calculated. Assuming this parameter is calculated using mode 1 (discrete Boolean comparison), the directors differ, so the result is 0.1. The similar value of each remaining parameter is calculated in the same way, choosing one of the three modes according to the parameter type. Assume the following similar values are finally obtained:
simVector1 = (director: 0.1, lead actors: 0.1, year: 0.2, genre: 0.7, region: 0.2, number of episodes: null, per-episode duration: null, aliases: 0.1, synopsis: 0.8)
Step 3: each item in simVector1 is multiplied by its weight factor and the products are added, with null items counted as 0. The weight factors can also be regarded as a vector; assume the weight factor vector is:
weightVector = (director: 0.2, lead actors: 0.3, year: 0.05, genre: 0.1, region: 0.1, number of episodes: 0.05, per-episode duration: 0.05, aliases: 0.05, synopsis: 0.1). The final similarity is then:
simValue1 = simVector1 * weightVector = 0.1*0.2 + 0.1*0.3 + 0.2*0.05 + 0.7*0.1 + 0.2*0.1 + 0*0.05 + 0*0.05 + 0.1*0.05 + 0.8*0.1 = 0.235
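The weighted sum above can be reproduced with a short sketch. The numeric values are those of the example; the English dictionary keys and the function name are assumptions, and a `None` entry is counted as 0, matching the `0*0.05` terms in the calculation.

```python
WEIGHTS = {
    "director": 0.2, "lead_actors": 0.3, "year": 0.05, "genre": 0.1,
    "region": 0.1, "episode_count": 0.05, "episode_duration": 0.05,
    "aliases": 0.05, "synopsis": 0.1,
}

sim_vector_1 = {
    "director": 0.1, "lead_actors": 0.1, "year": 0.2, "genre": 0.7,
    "region": 0.2, "episode_count": None, "episode_duration": None,
    "aliases": 0.1, "synopsis": 0.8,
}


def weighted_similarity(sim_vector: dict, weights: dict) -> float:
    """Sum weight * similar value over all parameters, treating None as 0."""
    total = 0.0
    for name, weight in weights.items():
        value = sim_vector.get(name)
        total += weight * (value if value is not None else 0.0)
    return total


sim_value_1 = weighted_similarity(sim_vector_1, WEIGHTS)
```

Because the weights sum to 1 and each similar value lies in [0, 1], the result also lies in [0, 1] and can be reported as a percentage.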
Step 4: determine whether the similarity reaches the criterion. If the criterion is that a similarity below 0.5 indicates a wrong source, metadata1 is judged to be a wrong source; it is excluded from the record in which metadata_ORG resides, and a new record "metadata1" is created.
Step 5: a lookup in the history record finds that metadata2 has the same program name as metadata_ORG, so calculation of the similarity between the two records begins.
Step 6: repeat steps 2 and 3 above; assume the similarity calculated this time is simValue2 = 0.65.
Step 7: determine whether the similarity reaches the criterion. If the criterion is that a similarity below 0.5 indicates a wrong source, metadata2 is judged to be the same source, and the content of metadata_ORG is completed from the content of metadata2 according to the update rule.
Summary: metadata2 and metadata_ORG describe the same program, yet not all parameters in metadata2 match those in the history record: the year and the order of the actors differ, for example, and some parameters are incomplete. The similarity calculation nevertheless shows that the two records are highly similar, and the information in metadata2 is used to complete metadata_ORG. When the similarity does not reach the criterion (metadata1), the video source identified by the text information is considered a wrong source and is excluded.
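The completion of metadata_ORG from metadata2 can be sketched as filling only the elements that are empty in the existing record. The description does not spell out the update rule, so "fill nulls, never overwrite" is an assumption, as are the English field names and the function name.

```python
def complete_record(existing: dict, new: dict) -> dict:
    """Return a copy of `existing` whose None fields are filled from `new`."""
    return {
        key: (value if value is not None else new.get(key))
        for key, value in existing.items()
    }


metadata_org = {"director": "Li Guoli", "year": 2008,
                "episode_count": None, "episode_duration": None}
metadata_2 = {"director": "Li Guoli", "year": 2007,
              "episode_count": 50, "episode_duration": 43}
completed = complete_record(metadata_org, metadata_2)
```

Under this rule the existing year (2008) is kept, while the number of episodes and the per-episode duration are taken from the newly crawled record.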
Content not described in detail in this specification belongs to the prior art well known to those skilled in the art.

Claims (7)

1. A wrong-source investigation method based on video text information, characterized by comprising the following steps:
Step 1: search each video website by program name, and determine whether each video website has a video source for the corresponding program name;
Step 2: crawl video playback links from the video website corresponding to the video source. Crawling is performed periodically, and at least the following content is crawled: the video playback link corresponding to the video source, and the text information used to label the video content of the video source;
Step 3: store the crawl results of step 2 to form a crawl history. The crawl history is formed as follows: determine whether the program name of the newly crawled video source already exists in the crawl history; if it does not exist, create a new metadata record to store the newly crawled video source; if it exists, create a new metadata record to store the newly crawled video source and, after the crawl operation completes, execute step 4;
Step 4: analyze the text information in the crawl history, find wrong sources, and delete the crawl results corresponding to the wrong sources from the crawl history. Specifically, perform similarity matching on metadata records in the crawl history that have the same program name: for two metadata records with the same program name, match the text information used to label the video content of the video source item by item, and combine the results of the item-by-item similarity matching; if the similarity meets or exceeds the criterion, the newly crawled video source and the video source stored earlier in the metadata record are considered the same program, and the text information elements in the metadata record are completed from the text information elements of the newly crawled video source; if the similarity does not reach the criterion, the newly crawled video source is considered a wrong source and is excluded, and the newly crawled video source is treated as a new program;
Step 5: based on the wrong-source-free crawl history finally formed in step 4, aggregate the sources on the television side via a protocol, present them in the form of programs, and enable playback.
2. The wrong-source investigation method based on video text information according to claim 1, characterized in that: the program name in step 1 includes, but is not limited to, a TV series title or a movie title.
3. The wrong-source investigation method based on video text information according to claim 1, characterized in that: step 1 is implemented by a script program.
4. The wrong-source investigation method based on video text information according to claim 1, characterized in that: the periodic crawling in step 2 means that the update period and the priority of each program are preset; the higher the priority of a program, the shorter its update period, and the lower the priority of a program, the longer its update period; the update period of a program is selected from the range of 1 hour to 1 week;
crawl operations are performed periodically according to the update period of the program.
5. The wrong-source investigation method based on video text information according to claim 1, characterized in that: the text information used in step 2 to label the video content of the video source includes at least: director, lead actors, year, program category, region, number of episodes, duration of a single episode, alias, and synopsis.
6. The wrong-source investigation method based on video text information according to claim 1, characterized in that: in step 3, the crawl history is stored in the form of a metadata list;
the metadata list contains at least one metadata record;
each metadata record stores at least the following content: the program name, and the text information used to label the video content of the video source.
7. The wrong-source investigation method based on video text information according to claim 1, characterized in that: the similarity criterion is as follows: each text information element is treated as a parameter in a vector;
the similarity judgment compares each parameter in the vector individually to obtain a similar value, and the similar values are then summed to obtain the similarity of the text information used to label the video content of the video source;
the similar values and the similarity are normalized, so that the resulting similarity of the text information used to label the video content of the video source is expressed as a percentage.
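The similarity calculation described in claim 7 can be sketched in Python as follows. The claim states only that each text information element is compared individually, that the similar values are summed, and that the result is normalized. The comparison functions (exact match for single values, set overlap for multi-valued elements such as the cast), the equal weighting, and the choice to skip elements missing from either record are assumptions made for this illustration.

```python
from typing import Mapping, Sequence


def element_similarity(a: object, b: object) -> float:
    """Similar value in [0, 1] for one text information element.

    Assumption: set-valued elements (e.g. cast) use set overlap so that
    ordering does not matter; all other elements use exact equality.
    """
    if isinstance(a, (set, frozenset)) and isinstance(b, (set, frozenset)):
        union = a | b
        return len(a & b) / len(union) if union else 0.0
    return 1.0 if a == b else 0.0


def record_similarity(
    rec_a: Mapping[str, object],
    rec_b: Mapping[str, object],
    fields: Sequence[str],
) -> float:
    """Sum the per-element similar values and normalize to [0, 1].

    Assumption: an element missing (None or absent) in either record is
    skipped, so incomplete records are not penalized.
    """
    total = 0.0
    compared = 0
    for name in fields:
        a, b = rec_a.get(name), rec_b.get(name)
        if a is None or b is None:
            continue
        total += element_similarity(a, b)
        compared += 1
    return total / compared if compared else 0.0
```

Multiplying the returned value by 100 gives the percentage form mentioned in the claim. Normalizing by the number of compared elements keeps the result comparable with a fixed criterion even when some elements are missing.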

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610554564.9A CN106131582B (en) 2016-07-14 2016-07-14 A kind of wrong source investigation method based on video text message

Publications (2)

Publication Number Publication Date
CN106131582A CN106131582A (en) 2016-11-16
CN106131582B true CN106131582B (en) 2019-09-03

Family

ID=57282671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610554564.9A Active CN106131582B (en) 2016-07-14 2016-07-14 A kind of wrong source investigation method based on video text message

Country Status (1)

Country Link
CN (1) CN106131582B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108989890B (en) * 2018-09-03 2021-01-26 四川长虹电器股份有限公司 Audio and video source fault troubleshooting method based on GStreamer frame
CN110868619A (en) * 2019-11-28 2020-03-06 湖南快乐阳光互动娱乐传媒有限公司 Global video playing record aggregation method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103685420A (en) * 2012-09-24 2014-03-26 华为技术有限公司 Method, server and system for media file duplication removal
CN103702168A (en) * 2013-12-12 2014-04-02 乐视网信息技术(北京)股份有限公司 Method of displaying video list and video client
CN105718524A (en) * 2016-01-15 2016-06-29 合一网络技术(北京)有限公司 Method and device for determining video originals

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080065693A1 (en) * 2006-09-11 2008-03-13 Bellsouth Intellectual Property Corporation Presenting and linking segments of tagged media files in a media services network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20170814

Address after: 100039, Yongding Road, Beijing, No. 3, floor 51, 303, Haidian District

Applicant after: CASICLOUD-TECH CO.,LTD.

Address before: 100098 No. 1, building 17, building 2, Wanshou temple, Haidian District, Beijing, No. 35

Applicant before: Xu Shan

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221219

Address after: 100144 1206, Floor 12, Building 7, Yard 49, Badachu Road, Shijingshan District, Beijing

Patentee after: BEIJING CASICLOUD CO.,LTD.

Address before: 100039 303, 3 / F, No.51, Yongding Road, Haidian District, Beijing

Patentee before: CASICLOUD-TECH CO.,LTD.