CN101944111B - Method and device for searching news video - Google Patents

Info

Publication number
CN101944111B
Authority
CN
China
Prior art keywords
news video
website
news
video
url
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2010102801754A
Other languages
Chinese (zh)
Other versions
CN101944111A (en
Inventor
朱明�
尹文科
崔昊旻
李自勉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ANHUI GUANGXING COMMUNICATION TECHNOLOGY Co Ltd
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN2010102801754A priority Critical patent/CN101944111B/en
Publication of CN101944111A publication Critical patent/CN101944111A/en
Application granted granted Critical
Publication of CN101944111B publication Critical patent/CN101944111B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a method and a device for searching news video. The method mainly comprises the following steps: constructing, based on semantic association information, ontology knowledge for searching news video websites, and using the ontology knowledge to search out news video websites from the Internet; evaluating the timeliness of each news video website, and using the timeliness evaluation result to set the pick-up time interval of the website; and, according to that pick-up time interval, picking up the content of the news video website in real time by a set search method and obtaining the news videos in the content. The invention addresses the problems of automatic, accurate, and timely searching and integration of Internet news video: it can identify news video websites quickly and accurately, and can discover and integrate news videos automatically and in time.

Description

Method and device for searching news video
Technical field
The present invention relates to the field of computer application technology, and in particular to a method and a device for searching news video.
Background
To support the evolution of services under three-network convergence, it is necessary to study how to deliver more television services on resource-constrained terminal devices. Among television services, news currently attracts viewers the most. How to let television viewers watch news at any time and enjoy personalized, topic-oriented TV news services has therefore become a problem worth attention in the context of three-network convergence.
A prior-art method of web page topic identification and web page information extraction mainly comprises the following. On the basis of web page topic analysis, all web pages of a website are merged into one virtual page, and a term-frequency feature vector is used to classify the website. A vector space model is adopted, and the distance between vectors is used to analyze the website topic; a topic frequency vector describes the topic features of the website, and the weight of each vector element is determined by the number of web pages in the website that contain the corresponding topic. In addition, the internal link structure of a website is usually regarded as a hierarchical tree or graph structure. For example, web page topics are merged according to the physical and logical link structure of the website, thereby determining the website topic.
Then, web page information extraction is carried out using a manually constructed information extraction system, a supervised information extraction system, a semi-supervised information extraction system, or an unsupervised information extraction system.
In the course of realizing the present invention, the inventors found that the above prior-art method of web page topic identification and web page information extraction has at least the following problems: it requires complex statistics and analysis of the entire link structure of a website, and its applicability to a rapidly growing network leaves much to be improved. It cannot identify news video websites quickly and accurately, nor can it discover and integrate news videos automatically and in time.
Summary of the invention
Embodiments of the present invention provide a method and a device for searching news video, so as to discover and integrate news videos automatically, accurately, and in time.
A method for searching news video comprises:
constructing, based on semantic association information, ontology knowledge for searching news video websites, and using the ontology knowledge to search out news video websites from the Internet;
evaluating the timeliness of the news video website, and using the timeliness evaluation result to set the pick-up time interval of the news video website;
according to the pick-up time interval of the news video website, picking up the content of the news video website in real time by a set search method, and obtaining the news videos in the content.
A device for searching news video comprises:
a news video site search module, configured to construct, based on semantic association information, ontology knowledge for searching news video websites, and to use the ontology knowledge to search out news video websites from the Internet;
a pick-up time interval setting module, configured to evaluate the timeliness of the news video websites found by the news video site search module, and to use the timeliness evaluation result to set the pick-up time interval of each news video website;
a news video acquisition module, configured to pick up, according to the pick-up time interval set by the pick-up time interval setting module, the content of the news video website in real time by a set search method, and to obtain the news videos in the content.
As can be seen from the technical solutions provided by the above embodiments, the embodiments of the present invention address the problems of automatic, accurate, and timely searching and integration of Internet news video: they can identify news video websites quickly and accurately, and can discover and integrate news videos automatically and in time.
Description of drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the principle of a news video searching method provided by Embodiment one of the present invention;
Fig. 2 is a processing flowchart of the news video searching method provided by Embodiment one of the present invention;
Fig. 3 is a schematic diagram of the construction principle of the ontology knowledge provided by Embodiment one of the present invention;
Fig. 4 is a processing flowchart of a website topic identification method provided by Embodiment one of the present invention;
Fig. 5 is a detailed processing flowchart, provided by Embodiment one of the present invention, of evaluating the ontology knowledge in terms of new URL generation capability and topic relevance;
Fig. 6 is a processing flowchart, provided by Embodiment one of the present invention, of evaluating the timeliness of the news video websites stored in the news video database;
Fig. 7 is a processing flowchart, provided by Embodiment one of the present invention, of evaluating the novelty of the news video websites stored in the news video database;
Fig. 8 is a processing flowchart, provided by Embodiment one of the present invention, of evaluating the originality of the news video websites stored in the news video database;
Fig. 9 is a processing flowchart of a content-based duplicate detection technique provided by Embodiment one of the present invention;
Fig. 10 is a processing flowchart, provided by Embodiment one of the present invention, of picking up in real time the content of the news video websites stored in the news video database;
Fig. 11 is a schematic structural diagram of a news video searching device provided by Embodiment two of the present invention.
Detailed description of the embodiments
In the embodiments of the present invention, ontology knowledge for searching news video websites is constructed based on semantic association information, and the ontology knowledge is used to search out news video websites from the Internet. The timeliness of each news video website is evaluated, and the timeliness evaluation result is used to set the pick-up time interval of the website. Then, according to the pick-up time interval, the content of the news video website is picked up in real time by a set search method, and the news videos in the content are obtained.
To facilitate understanding of the embodiments of the present invention, several specific embodiments are further explained below with reference to the accompanying drawings; these embodiments do not limit the embodiments of the present invention.
Embodiment one
The principle of the news video searching method provided by this embodiment is shown in Fig. 1, and its processing flow is shown in Fig. 2, comprising the following steps:
Step 21: construct, based on semantic association information, ontology knowledge for searching news video websites; use the ontology knowledge, a meta-search technique, and a website topic identification method to search out news video websites from the Internet; and store the news video websites in the news video site database.
First, a news video database is set up in advance using news video data from a small number of seed websites. The news video database stores each news video and its description information. The seed websites include sites such as "Xinhuanet News" and "Tencent News".
In the embodiments of the present invention, a news video site database is also set up in advance. It stores each news video website together with information such as its evaluation results and pick-up time interval.
Ontology knowledge for searching news video websites is constructed based on semantic association information; its construction principle is shown in Fig. 3. The semantic association information mainly comprises: the search keywords provided by the search engines themselves, the content keywords of the news video websites already discovered, the content organization structure keywords of those websites, and the content description keywords of those websites. The content keywords of a news video website include keywords in the titles of its content, and the content description keywords include hot video titles. The ontology knowledge therefore mainly contains four kinds of keywords: search keywords, content keywords, content organization structure keywords, and content description keywords.
For each keyword in the ontology knowledge, a meta-search technique is used to construct search requests to search engines on the Internet, a set number of the search results returned by the search engines are taken, and the URLs (Uniform Resource Locators) contained in the returned results are extracted. The URLs of news video websites among these URLs are then identified by the website topic identification method.
The processing flow of the website topic identification method provided by this embodiment is shown in Fig. 4, and mainly comprises the following:
First, the pattern information of each URL contained in the returned results, such as its length, depth, and form, is used with a technique such as a decision tree or a rule set to identify whether the URL is a website URL or a web page URL.
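The rule-set variant of this step can be illustrated with a minimal Python sketch. The function name `is_website_url`, the specific rules, and the default limits are assumptions chosen for illustration; the text names only URL length, depth, and form as the pattern information and does not give concrete rules.

```python
from urllib.parse import urlparse


def is_website_url(url, max_length=40, max_depth=0):
    """Guess whether a URL points at a site root rather than a single web page.

    Hypothetical rule set: a website URL is short, has a shallow path,
    does not end in a file name, and carries no query string.
    """
    parsed = urlparse(url)
    segments = [part for part in parsed.path.split("/") if part]
    depth = len(segments)                                    # URL depth
    looks_like_file = bool(segments) and "." in segments[-1]  # URL form
    has_query = bool(parsed.query)                           # URL form
    return (
        len(url) <= max_length                               # URL length
        and depth <= max_depth
        and not looks_like_file
        and not has_query
    )
```

A decision tree trained on labeled URLs could replace the hand-written rules while using the same three kinds of features.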
For each website URL identified, all web pages in the first layer of the website are fetched, and a play page recognition technique is used to calculate the proportion of video play pages among those pages. If this proportion is less than a predefined video play page threshold, the website URL is considered irrelevant to the news video website topic and is excluded; otherwise, the website URL is considered relevant to the news video website topic.
For each website considered relevant, the link text (anchor text) of its video play pages is used to run fuzzy queries against the news video database set up in advance, and the total number of similar results is counted. The average number of similar results per link text is then calculated. If this average is less than a predefined similar-result threshold, the website is considered irrelevant to the news video website topic; otherwise, the website URL is considered relevant, that is, the website is identified as a news video website.
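The two-stage test described above can be sketched as follows. This is an illustrative outline only: the page representation (dicts with `is_play_page` and `anchor_text`), the `fuzzy_match_count` callback standing in for the fuzzy query against the news video database, and both default thresholds are assumptions, since the text gives no concrete values or interfaces.

```python
def is_news_video_site(pages, fuzzy_match_count,
                       play_page_threshold=0.2, match_threshold=1.0):
    """Apply the play-page ratio test, then the average fuzzy-match test.

    pages: first-layer pages, each a dict with 'is_play_page' and 'anchor_text'.
    fuzzy_match_count: callable returning the number of similar results that a
        link text yields in the news video database (hypothetical interface).
    """
    play_pages = [page for page in pages if page["is_play_page"]]
    if not pages or not play_pages:
        return False
    # Stage 1: proportion of video play pages among first-layer pages.
    if len(play_pages) / len(pages) < play_page_threshold:
        return False
    # Stage 2: average number of similar results per link text.
    total_matches = sum(fuzzy_match_count(page["anchor_text"]) for page in play_pages)
    return total_matches / len(play_pages) >= match_threshold
```

Running the cheap ratio test first means the database is queried only for sites that already look like video sites.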
The identified news video websites are then stored in the news video site database set up in advance.
In the embodiments of the present invention, the news video websites identified by the website topic identification method can also be used to evaluate the constructed ontology knowledge in two respects, new URL generation capability and topic relevance. The processing flow of this evaluation is shown in Fig. 5 and mainly comprises the following:
For each keyword in the ontology knowledge, a meta-search technique is used to construct search requests to search engines on the Internet, a set number of the search results returned by the search engines are taken, and the URLs contained in the returned results are extracted.
The URLs of news video websites among these URLs are obtained by the website topic identification method, and the ratio of the number of news video website URLs to the total number of URLs contained in the returned results is calculated. If this ratio is less than a predetermined topic relevance threshold, the keyword is considered irrelevant to the news video website topic and is removed from the ontology knowledge; otherwise, the keyword is considered relevant to the topic, and the evaluation of its new URL generation capability continues.
All the identified news video website URLs are looked up in the news video site database, and, from the lookup results, the ratio of the number of news video website URLs not yet contained in the news video site database to the total number of identified news video website URLs is calculated. If this ratio is less than a predefined new URL generation capability threshold, the keyword is considered to have no new URL generation capability and is removed from the ontology knowledge; otherwise, the keyword is considered to have both topic relevance and new URL generation capability.
In general, setting both the topic relevance threshold and the new URL generation capability threshold to 0.1 works well.
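The keyword evaluation can be summarized in a short Python sketch. The function name `keep_keyword` and its argument layout are assumptions for illustration; the two ratios and the 0.1 defaults follow the description above.

```python
def keep_keyword(result_urls, news_site_urls, known_site_urls,
                 relevance_threshold=0.1, new_url_threshold=0.1):
    """Decide whether a keyword stays in the ontology knowledge.

    result_urls: all URLs extracted from the search results for the keyword.
    news_site_urls: the subset identified as news video websites.
    known_site_urls: URLs already stored in the news video site database.
    """
    results = set(result_urls)
    news_sites = set(news_site_urls)
    if not results or not news_sites:
        return False
    # Topic relevance: share of the returned URLs that are news video websites.
    if len(news_sites) / len(results) < relevance_threshold:
        return False
    # New URL generation capability: share of those sites not yet in the database.
    new_sites = news_sites - set(known_site_urls)
    return len(new_sites) / len(news_sites) >= new_url_threshold
```

Keywords for which the function returns False would be removed from the ontology knowledge.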
Step 22: evaluate the timeliness, novelty, and originality of the news video websites stored in the news video database, and use the timeliness evaluation result of each news video website to set its pick-up time interval.
The news video websites stored in the news video database are evaluated in three respects: timeliness, novelty, and originality.
The processing flow for evaluating the timeliness of the news video websites stored in the news video database, as provided by this embodiment, is shown in Fig. 6 and comprises the following:
A certain number of the current day's news videos are obtained from the seed websites, and fuzzy queries are run against the news video database using those videos. For each news video website in the news video database, the number of its news videos similar to the current day's news videos is counted; when one news video of the day matches several similar news videos belonging to the same news video website, they are counted only once.
All news video websites are sorted in descending order of this number. Websites ranked in the top 10% are scored 5 points, those ranked 10%~30% are scored 4 points, those ranked 30%~70% are scored 3 points, those ranked 70%~90% are scored 2 points, and the last 10% are scored 1 point. In addition, a news video website whose number of similar news videos is 0 is directly scored 0 points.
Finally, the timeliness evaluation result of each news video website is stored in the news website database as the measure of the timeliness of that website's content.
The timeliness evaluation result of each news video website is used to set its pick-up time interval. The interval is set according to the number of the website's news videos that are similar to the current day's news videos: the more such videos a website contains, the shorter its pick-up time interval.
One feasible way of setting the pick-up time interval is: 5 minutes for a news video website with a timeliness score of 5 points, 10 minutes for a score of 4 points, 20 minutes for a score of 3 points, 40 minutes for a score of 2 points, 80 minutes for a score of 1 point, and 1 day for a score of 0 points.
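The percentile scoring and the score-to-interval mapping can be sketched in Python as below. The names `BANDS`, `PICKUP_INTERVAL_MINUTES`, and `timeliness_scores` are illustrative. Two details are assumptions because the text leaves them open: sites with a count of 0 are excluded from the ranking before the percentile bands are applied, and ties keep their input order.

```python
# Score bands from the text: (upper percentile bound, score).
BANDS = ((10, 5), (30, 4), (70, 3), (90, 2), (100, 1))

# Pick-up interval in minutes for each timeliness score (0 points -> 1 day).
PICKUP_INTERVAL_MINUTES = {5: 5, 4: 10, 3: 20, 2: 40, 1: 80, 0: 24 * 60}


def timeliness_scores(similar_counts):
    """Map each site to a 0-5 score from its count of videos similar to today's news."""
    # Sites with no similar videos are scored 0 directly.
    scores = {site: 0 for site, count in similar_counts.items() if count == 0}
    # Assumption: only sites with a non-zero count take part in the ranking.
    ranked = sorted(
        (site for site, count in similar_counts.items() if count > 0),
        key=lambda site: similar_counts[site],
        reverse=True,
    )
    for position, site in enumerate(ranked):
        for upper_percent, score in BANDS:
            # Integer arithmetic avoids floating-point edge cases at band borders.
            if position * 100 < upper_percent * len(ranked):
                scores[site] = score
                break
    return scores
```

A site's pick-up interval is then `PICKUP_INTERVAL_MINUTES[scores[site]]`. The same banding could be reused for the novelty and originality scores by changing the sort key and direction.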
The processing flow for evaluating the novelty of the news video websites stored in the news video database, as provided by this embodiment, is shown in Fig. 7 and comprises the following:
A content-based duplicate detection technique is used to cluster the news videos newly obtained from each news video website, and from each cluster a certain number of news videos with relatively early discovery times are retained. Then, for each news video website, the total number of clicks of all its retained news videos is counted, and the average number of clicks per news video is calculated.
The news video websites are sorted in descending order of the average number of clicks per news video. Websites ranked in the top 10% are scored 5 points, those ranked 10%~30% are scored 4 points, those ranked 30%~70% are scored 3 points, those ranked 70%~90% are scored 2 points, and the last 10% are scored 1 point. In addition, a news video website whose average number of clicks per video is 0 is directly scored 0 points.
Finally, the novelty evaluation result of each news video website is stored in the news website database as the measure of the novelty of that website.
The processing flow for evaluating the originality of the news video websites stored in the news video database, as provided by this embodiment, is shown in Fig. 8 and comprises the following:
A content-based duplicate detection technique is used to cluster the news videos newly obtained from each news video website. From each cluster a certain number of news videos with relatively early discovery times are retained, and the remaining news videos are regarded as follow-up news videos, that is, duplicates. The total number of videos and the number of duplicates contained in each news video website are counted, and the duplicate ratio of each website is calculated. All news video websites are sorted in ascending order of duplicate ratio. Websites ranked in the top 10% are scored 5 points, those ranked 10%~30% are scored 4 points, those ranked 30%~70% are scored 3 points, those ranked 70%~90% are scored 2 points, and the last 10% are scored 1 point. In addition, a news video website whose duplicate ratio is 100% is directly scored 0 points.
Finally, the originality evaluation result of each news video website is stored in the news website database as the measure of the originality of that website.
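The duplicate ratio used for the originality score can be illustrated with a small Python sketch. It assumes the clustering has already been done and that each cluster is a list of `(site, discovery_time)` pairs; the function name `duplicate_ratios` and the `keep` parameter (how many early videos per cluster count as originals) are hypothetical.

```python
from collections import Counter


def duplicate_ratios(clusters, keep=1):
    """Return, per site, the share of its videos that are later copies.

    clusters: iterable of clusters, each a list of (site, discovery_time) pairs
        for videos judged to have the same content.
    keep: number of earliest-discovered videos per cluster treated as originals.
    """
    total = Counter()
    duplicates = Counter()
    for cluster in clusters:
        # Earliest discoveries first; everything after the first `keep` is a duplicate.
        ordered = sorted(cluster, key=lambda video: video[1])
        for rank, (site, _discovered_at) in enumerate(ordered):
            total[site] += 1
            if rank >= keep:
                duplicates[site] += 1
    return {site: duplicates[site] / total[site] for site in total}
```

Sites would then be ranked in ascending order of this ratio, with a ratio of 1.0 mapped directly to 0 points.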
The processing flow of the content-based duplicate detection technique provided by this embodiment is shown in Fig. 9 and comprises the following:
First, a certain number of video key frames are extracted from each news video. For each video key frame, the Harris operator is used to detect corner points, SIFT (scale-invariant feature transform) features are used to construct feature vectors for the corner sub-regions of the key frame, and PCA (principal component analysis) is used to reduce the dimension of the feature vectors. For each pair of video key frames drawn from the two news videos, the KNN (K-nearest-neighbor) algorithm is used to compute the K nearest feature vector pairs. The BIC (Bayesian information criterion) algorithm is then applied to the feature value sequence X = {x1, x2, ..., xN} (N = 2K) formed by these K feature vector pairs. If a change point exists in the sequence X, the two video key frames are judged not to be duplicates; otherwise, they are judged to be duplicates.
The number of duplicate video key frames between the two news videos is counted, and the ratio of duplicate key frames to the total number of key frames is calculated. If this ratio is greater than a set key frame threshold, the two news videos are judged to be duplicates; otherwise, they are judged not to be duplicates.
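The final video-level decision can be sketched as follows. The key-frame comparison itself (Harris, SIFT, PCA, KNN, BIC) is abstracted behind a hypothetical `frames_match` callback and is not implemented here. The text does not say which video's key frames form the denominator, so this sketch assumes the key frames of the first video; the default threshold of 0.5 is likewise an assumption.

```python
def videos_are_duplicates(frames_a, frames_b, frames_match, threshold=0.5):
    """Judge two videos as duplicates from their key-frame matches.

    frames_a, frames_b: key frames of the two news videos (any representation).
    frames_match: callable(frame_a, frame_b) -> bool, the key-frame duplicate test.
    threshold: key frame threshold; the ratio must exceed it.
    """
    if not frames_a:
        return False
    # A key frame of A counts as duplicated if it matches any key frame of B.
    repeated = sum(
        1 for frame_a in frames_a
        if any(frames_match(frame_a, frame_b) for frame_b in frames_b)
    )
    return repeated / len(frames_a) > threshold
```

With integer stand-ins for key frames and equality as the match test, `videos_are_duplicates([1, 2, 3, 4], [2, 3, 4, 9], lambda a, b: a == b)` compares a ratio of 3/4 against the threshold.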
Step 23: according to the pick-up time interval of each news video website, pick up the news videos in the news video website in real time by a set search method, and store the picked-up news videos in the news video database.
The processing flow for picking up in real time the content of the news video websites stored in the news video database, as provided by this embodiment, is shown in Fig. 10 and is as follows:
First, the URL and the timeliness evaluation result of each news video website are obtained from the news video site database, and the timeliness evaluation result is used to set the pick-up time interval of the website. One feasible way of setting the pick-up time interval is: 5 minutes for a news video website with a timeliness score of 5 points, 10 minutes for a score of 4 points, 20 minutes for a score of 3 points, 40 minutes for a score of 2 points, 80 minutes for a score of 1 point, and 1 day for a score of 0 points.
Each news video website in the news video site database is checked in turn, in a certain order, to determine whether the time elapsed since its last pick-up finished exceeds its pick-up time interval. If so, a new round of content pick-up is started for that news video website; otherwise, the next website is checked in the same way.
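The due-check over the site database amounts to a simple comparison per site, sketched below in Python. The record layout (`url`, `last_finished`, `interval_minutes`) and the function name `sites_due` are assumptions for illustration.

```python
from datetime import datetime, timedelta


def sites_due(sites, now):
    """Return the URLs of sites whose pick-up interval has elapsed.

    sites: iterable of dicts with 'url', 'last_finished' (datetime of the end
        of the previous pick-up) and 'interval_minutes' (hypothetical layout).
    now: the current time as a datetime.
    """
    due = []
    for site in sites:
        elapsed = now - site["last_finished"]
        # A new round starts only when the elapsed time exceeds the interval.
        if elapsed > timedelta(minutes=site["interval_minutes"]):
            due.append(site["url"])
    return due
```

A scheduler loop would call this periodically and start a pick-up round for each returned URL, writing the new finish time back when the round ends.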
For each news video website to be picked up, its content is picked up by a set search method; the set search method includes methods such as depth-limited breadth-first search.
The news video website is traversed using depth-limited breadth-first search. The depth limit can be a global constant or can vary from one news video website to another. For each web page encountered during the traversal, a play page recognition technique is first used to judge whether it is a video play page. For a video play page, a web page noise removal technique is used to remove the noise information it contains; the noise here includes background noise, random noise, and residual noise. The information remaining in the video play page is taken as the news video.
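A depth-limited breadth-first traversal of this kind can be sketched in Python as below. The callbacks `get_links` (returning the links found on a page) and `is_play_page` (the play page recognition step) are hypothetical placeholders, and noise removal is left out of the sketch.

```python
from collections import deque


def crawl_play_pages(root_url, get_links, is_play_page, max_depth=2):
    """Collect video play pages reachable from root_url within max_depth links.

    get_links: callable(url) -> iterable of linked URLs (hypothetical fetcher).
    is_play_page: callable(url) -> bool (hypothetical play page recognizer).
    max_depth: depth limit; may be a global constant or chosen per site.
    """
    seen = {root_url}
    queue = deque([(root_url, 0)])
    play_pages = []
    while queue:
        url, depth = queue.popleft()
        if is_play_page(url):
            play_pages.append(url)
        if depth == max_depth:
            continue  # do not expand pages at the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return play_pages
```

Breadth-first order visits pages closest to the site root first, which is where newly published videos are usually linked, and the depth limit bounds the cost of each round.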
The content-based duplicate detection technique described above is applied to this news video. For a news video that passes duplicate detection, an iterative back-projection method in the video compression domain is used to improve its picture quality. After the news video is transcoded with an existing tool, a news video in the MP4 or FLV (a streaming media format) container format is obtained. The news video and its description information are then stored in the news video database. When the pick-up of a news video website finishes, the finish time is stored in the news video site database.
The news videos in the above news video site database can be used by a video-on-demand system for a TV news portal. The description and association information of the news videos can be pushed to a portal website. After a user's STB (set-top box) accesses the portal website, the latest news video list is displayed, and the user can browse, order, and play on demand the news videos in the list.
Embodiment two
The structure of the news video searching device provided by this embodiment is shown in Fig. 11 and comprises the following modules:
a news video site search module 11, configured to construct, based on semantic association information, ontology knowledge for searching news video websites, and to use the ontology knowledge to search out news video websites from the Internet;
a news video website evaluation module 12, configured to evaluate the timeliness of the news video websites found by the news video site search module, and to use the timeliness evaluation result to set the pick-up time interval of each news video website;
a news video acquisition module 13, configured to pick up, according to the pick-up time interval set by the news video website evaluation module, the content of the news video website in real time by a set search method, and to obtain the news videos in the content.
The news video searching device may further comprise:
an ontology knowledge evaluation module 14, configured to, for each keyword in the ontology knowledge, use a meta-search technique to construct search requests to search engines on the Internet, take a set number of the search results returned by the search engines, and extract the URLs contained in the returned results.
The URLs of news video websites among these URLs are obtained by the website topic identification method, and the ratio of the number of news video website URLs to the total number of URLs contained in the returned results is calculated. If this ratio is less than a predetermined topic relevance threshold, the keyword is considered irrelevant to the news video website topic and is removed from the ontology knowledge; otherwise, the keyword is considered relevant to the topic, and the evaluation of its new URL generation capability continues.
All the identified news video website URLs are looked up in the news video site database, and, from the lookup results, the ratio of the number of news video website URLs not yet contained in the news video site database to the total number of identified news video website URLs is calculated. If this ratio is less than a predefined new URL generation capability threshold, the keyword is considered to have no new URL generation capability and is removed from the ontology knowledge; otherwise, the keyword is considered to have both topic relevance and new URL generation capability.
Described news video site search module 11 specifically can comprise:
Search module 111; Be used for each keyword to said ontology knowledge; Utilize the searching request of first search technique structure to the search engine in the internet; The Search Results that the said search engine of extraction setting quantity returns extracts the uniform resource position mark URL that comprises in the return results;
Identification module 112 is used for identifying through the website subject identifying method URL of the news video website that URL that said search module extracts comprises, with the news video web site stores that identifies at the news video site databases of setting up in advance.
The news video website evaluation module 12 may specifically comprise:
Statistics module 121, configured to obtain a certain number of same-day news videos from seed websites, perform fuzzy queries on the news video database according to those same-day news videos, count for each news video website in the news video database the number of its news videos that are similar to the same-day news videos, and store this number in the news video website database as the timeliness evaluation result of the news video website;
Setting module 122, configured to set the pick-up time interval of each news video website according to the number of its news videos similar to the same-day news videos; the more such news videos a website contains, the shorter its pick-up time interval.
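The text fixes only the direction of the mapping (more similar same-day videos, shorter interval), not a formula. The sketch below assumes a simple linear interpolation between a minimum and a maximum interval; the function name `pickup_intervals` and the bounds of 30 and 1440 minutes are hypothetical choices made for illustration.

```python
from typing import Dict


def pickup_intervals(
    similar_counts: Dict[str, int],   # site -> number of videos similar to same-day videos
    min_minutes: float = 30.0,        # assumed shortest pick-up interval
    max_minutes: float = 1440.0,      # assumed longest pick-up interval
) -> Dict[str, float]:
    """Map each site's timeliness count to a pick-up interval in minutes."""
    if not similar_counts:
        return {}
    top = max(similar_counts.values())
    intervals = {}
    for site, count in similar_counts.items():
        # Share of the best site's count; 0.0 when no site has any similar video.
        share = count / top if top > 0 else 0.0
        # Linear interpolation: share 1.0 -> min_minutes, share 0.0 -> max_minutes.
        intervals[site] = max_minutes - share * (max_minutes - min_minutes)
    return intervals
```

Any monotonically decreasing mapping would satisfy the description equally well; the linear form is used here only because it is easy to inspect.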
The news video acquisition module 13 may specifically comprise:
Pick-up module 131, configured to crawl the content of a news video website in the news video website database, through the set search method, once the time elapsed since the end of that website's last crawl exceeds its pick-up time interval;
Identification module 132, configured to use the playback page recognition technique to judge whether each web page crawled from the news video website is a video playback page and, for each page judged to be a video playback page, to remove the noise information it contains and take the remaining information as a news video;
Detection and enhancement module 133, configured to perform duplicate detection on the news videos using a content-based duplicate detection technique, enhance the quality of the news videos that pass duplicate detection using an inverse iterative projection method in the video compressed domain, and then store the news videos and their corresponding description information in the news video database.
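The trigger condition of the pick-up module (time since the last crawl finished exceeds the site's pick-up time interval) can be sketched as below. The `SiteRecord` type, its field names, and the function name are assumptions for illustration; the text does not describe how the news video website database stores these values.

```python
from datetime import datetime, timedelta
from typing import Dict, List, NamedTuple


class SiteRecord(NamedTuple):
    last_finished: datetime      # when the previous crawl of this site ended
    pickup_interval: timedelta   # interval set from the timeliness evaluation


def sites_due_for_pickup(sites: Dict[str, SiteRecord], now: datetime) -> List[str]:
    """Return the URLs whose elapsed time since the last crawl exceeds their interval."""
    return [
        url
        for url, record in sites.items()
        if now - record.last_finished > record.pickup_interval
    ]
```

A scheduler would call this periodically and hand the returned sites to the crawler; playback page recognition, noise removal, and duplicate detection happen afterwards and are not modelled here.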
The news video website evaluation module 12 may further comprise:
Novelty evaluation module 123, configured to cluster the news videos newly obtained from the news video websites using a content-based duplicate detection technique and to keep, from each cluster, a certain number of the news videos with the earliest discovery times. It then counts the total number of clicks on all retained news videos of each news video website and calculates the average number of clicks per news video.
The pick-up time interval of each news video website is set according to the number of its news videos similar to the same-day news videos; the more such news videos a website contains, the shorter its pick-up time interval.
Each news video website is evaluated for novelty according to the above average number of clicks per news video. The novelty evaluation result of each news video website is stored in the news website database as the measure of that website's novelty.
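As a rough sketch of the novelty computation, assume the clustering step has already assigned a cluster label to every newly obtained video. The dictionary keys (`site`, `cluster`, `found_at`, `clicks`), the function name, and the default of one retained video per cluster are all hypothetical; the text says only that "a certain number" of the earliest videos is kept.

```python
from collections import defaultdict
from typing import Dict, List


def novelty_scores(videos: List[dict], keep_per_cluster: int = 1) -> Dict[str, float]:
    """Average clicks per retained video, grouped by site."""
    clusters = defaultdict(list)
    for video in videos:
        clusters[video["cluster"]].append(video)

    # Keep only the earliest-discovered videos of every cluster.
    retained = []
    for members in clusters.values():
        ordered = sorted(members, key=lambda v: v["found_at"])
        retained.extend(ordered[:keep_per_cluster])

    clicks = defaultdict(int)
    counts = defaultdict(int)
    for video in retained:
        clicks[video["site"]] += video["clicks"]
        counts[video["site"]] += 1
    return {site: clicks[site] / counts[site] for site in counts}
```

Sites with no retained video do not appear in the result; how such sites should be scored is not stated in the text.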
Originality evaluation module 124, configured to cluster the news videos newly obtained from the news video websites using a content-based duplicate detection technique, keep from each cluster a certain number of the news videos with the earliest discovery times, and treat the remaining news videos as follow-up (duplicate) news videos. It counts the total number of videos and the number of duplicate videos of each news video website, and calculates the duplication ratio of each news video website.
Each news video website is evaluated for originality according to its duplication ratio. The originality evaluation result of each news video website is stored in the news website database as the measure of that website's originality.
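The duplication ratio can be illustrated in the same spirit, again assuming cluster labels are already available. The record keys, the function name, and the default of one original per cluster are hypothetical, and the sketch reads "duplication ratio" as duplicates divided by total videos for the site, which the text implies but does not spell out.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def duplication_ratios(videos: List[dict], keep_per_cluster: int = 1) -> Dict[str, float]:
    """Share of each site's videos that are later copies within their cluster."""
    clusters = defaultdict(list)
    for video in videos:
        clusters[video["cluster"]].append(video)

    totals = Counter(video["site"] for video in videos)
    duplicates = Counter()
    for members in clusters.values():
        ordered = sorted(members, key=lambda v: v["found_at"])
        # Everything after the earliest keep_per_cluster videos counts as a duplicate.
        for video in ordered[keep_per_cluster:]:
            duplicates[video["site"]] += 1

    return {site: duplicates[site] / total for site, total in totals.items()}
```

A lower ratio suggests a more original site, since fewer of its videos were first seen elsewhere.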
A person of ordinary skill in the art will appreciate that all or part of the procedures of the above method embodiments may be carried out by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the procedures of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
In summary, the embodiments of the present invention effectively solve the problem of automatically, accurately, and promptly searching for and integrating Internet news videos: news video websites can be identified quickly and accurately, and news videos can be discovered and integrated automatically and in a timely manner.
The embodiments of the present invention propose a system and method for searching and integrating Internet news videos oriented toward TV news portals. They can supply rich, high-quality Internet news video resources to video-on-demand systems for TV news portals, and can supply the news video material and description information that TV news portals need.
The above is merely a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.

Claims (6)

1. A method for searching news videos, characterized by comprising:
constructing ontology knowledge for searching news video websites based on semantic association information; for each keyword in the ontology knowledge, constructing a search request to Internet search engines using a meta-search technique, taking a set number of the search results returned by the search engines, extracting the Uniform Resource Locators (URLs) contained in the search results, identifying the URLs of news video websites among said URLs through a website topic identification method, and storing the identified news video websites in a pre-established news video website database;
obtaining a certain number of same-day news videos from seed websites, performing fuzzy queries on the news video database according to the same-day news videos, counting for each news video website in the news video database the number of its news videos that are similar to the same-day news videos, and storing said number in the news video website database as the timeliness evaluation result of the news video website; setting the pick-up time interval of each news video website according to said number, wherein a website containing more such news videos has a shorter pick-up time interval;
crawling the content of the news video websites in a timely manner through a set search method according to the pick-up time intervals of the news video websites, and obtaining the news videos in the content.
2. The method for searching news videos according to claim 1, characterized in that the semantic association information comprises: search keywords provided by the search engine itself, content keywords of news video websites already found by search, content organization structure keywords of news video websites already found by search, and content description keywords of news video websites already found by search.
3. The method for searching news videos according to claim 1, characterized in that identifying the URLs of news video websites among said URLs through the website topic identification method comprises:
using the pattern information of each URL contained in the search results to identify whether the URL is a website URL or a web page URL;
for each identified website URL, fetching all web pages in the first layer of the website and using the playback page recognition technique to calculate the proportion of video playback pages among those web pages; if this proportion is less than a predefined video playback page threshold, the website URL is considered unrelated to the news video website topic and is excluded; otherwise, the website URL is considered related to the news video website topic;
using the link text corresponding to the video playback pages of each website related to the news video website topic to perform fuzzy queries on the pre-established news video database, counting the total number of similar results, and calculating the average number of similar results per link text; if this average is less than a predefined similar-result threshold, the website is considered unrelated to the news video website topic; otherwise, the website is identified as a news video website.
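The last two steps of claim 3 amount to a two-threshold filter, which the following sketch illustrates. It is not the claimed implementation: the playback page recognizer and the fuzzy query are passed in as callbacks because the text does not define them, and the function name, the `link_text` key, and both default thresholds are assumptions.

```python
from typing import Callable, List


def identify_news_video_site(
    first_layer_pages: List[dict],              # pages fetched from the first layer of the website
    is_playback_page: Callable[[dict], bool],   # stand-in for playback page recognition
    count_similar: Callable[[str], int],        # stand-in for the fuzzy query on the news video database
    playback_threshold: float = 0.2,            # assumed video playback page threshold
    similar_threshold: float = 1.0,             # assumed similar-result threshold
) -> bool:
    """Return True if the website is identified as a news video website."""
    if not first_layer_pages:
        return False

    # Step 1: proportion of video playback pages among first-layer pages.
    playback_pages = [p for p in first_layer_pages if is_playback_page(p)]
    if not playback_pages:
        return False
    if len(playback_pages) / len(first_layer_pages) < playback_threshold:
        return False

    # Step 2: average number of similar results per link text.
    total_similar = sum(count_similar(p["link_text"]) for p in playback_pages)
    return total_similar / len(playback_pages) >= similar_threshold
```

The first step filters out sites that are not video sites at all; the second separates news video sites from other video sites by checking the link text against known news videos.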
4. The method for searching news videos according to any one of claims 1 to 3, characterized in that crawling the news videos of the news video websites in a timely manner through the set search method according to the pick-up time intervals of the news video websites comprises:
crawling the content of a news video website in the news video website database through the set search method once the time elapsed since the end of that website's last crawl exceeds its pick-up time interval;
using the playback page recognition technique to judge whether each web page crawled from the news video website is a video playback page and, for each page judged to be a video playback page, removing the noise information it contains and taking the remaining information as a news video;
performing duplicate detection on the news videos using a content-based duplicate detection technique, improving the quality of the news videos that pass duplicate detection using an inverse iterative projection method in the video compressed domain, and then storing the news videos and their corresponding description information in the news video database.
5. A device for searching news videos, characterized by comprising:
a news video website search module, configured to construct ontology knowledge for searching news video websites based on semantic association information and to search out news video websites from the Internet using the ontology knowledge;
a pick-up time interval setting module, configured to evaluate the timeliness of the news video websites found by the news video website search module and to set the pick-up time intervals of the news video websites using the timeliness evaluation results;
a news video acquisition module, configured to crawl the content of the news video websites in a timely manner through a set search method according to the pick-up time intervals set by the pick-up time interval setting module, and to obtain the news videos in the content;
wherein the news video website search module comprises:
a search module, configured to, for each keyword in the ontology knowledge, construct a search request to Internet search engines using a meta-search technique, take a set number of the search results returned by the search engines, and extract the Uniform Resource Locators (URLs) contained in the returned results;
an identification module, configured to identify, through a website topic identification method, the URLs of news video websites among the URLs extracted by the search module, and to store the identified news video websites in a pre-established news video website database;
and the pick-up time interval setting module comprises:
a statistics module, configured to obtain a certain number of same-day news videos from seed websites, perform fuzzy queries on the news video database according to the same-day news videos, count for each news video website in the news video database the number of its news videos that are similar to the same-day news videos, and store this number in the news video website database as the timeliness evaluation result of the news video website;
a setting module, configured to set the pick-up time interval of each news video website according to the number of its news videos similar to the same-day news videos, wherein a website containing more such news videos has a shorter pick-up time interval.
6. The device for searching news videos according to claim 5, characterized in that the news video acquisition module comprises:
a pick-up module, configured to crawl the content of a news video website in the news video website database through the set search method once the time elapsed since the end of that website's last crawl exceeds its pick-up time interval;
an identification module, configured to use the playback page recognition technique to judge whether each web page crawled from the news video website is a video playback page and, for each page judged to be a video playback page, to remove the noise information it contains and take the remaining information as a news video;
a detection and enhancement module, configured to perform duplicate detection on the news videos using a content-based duplicate detection technique, enhance the quality of the news videos that pass duplicate detection using an inverse iterative projection method in the video compressed domain, and then store the news videos and their corresponding description information in the news video database.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102801754A CN101944111B (en) 2010-09-09 2010-09-09 Method and device for searching news video

Publications (2)

Publication Number Publication Date
CN101944111A CN101944111A (en) 2011-01-12
CN101944111B (en) 2012-05-23

Family

ID=43436102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010102801754A Expired - Fee Related CN101944111B (en) 2010-09-09 2010-09-09 Method and device for searching news video

Country Status (1)

Country Link
CN (1) CN101944111B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117267A (en) * 2011-02-25 2011-07-06 汉王科技股份有限公司 Information display method, device and electronic equipment
CN103548017A (en) * 2011-12-26 2014-01-29 华为技术有限公司 Video search method and video search system
CN104216928A (en) * 2013-06-05 2014-12-17 腾讯科技(深圳)有限公司 Site information acquiring method and device
CN103455602B (en) * 2013-09-03 2017-03-29 小米科技有限责任公司 A kind of video URL grasping means, device and terminal device
CN103699661A (en) * 2013-12-26 2014-04-02 乐视网信息技术(北京)股份有限公司 Method and system for acquiring data of video resources
CN106528569B (en) * 2015-09-11 2019-09-17 北京国双科技有限公司 Calculate the method and device of search in Website availability
CN109032906A (en) * 2018-07-17 2018-12-18 郑州升达经贸管理学院 A kind of appraisal procedure and its assessment device of internet news
CN110704603B (en) * 2019-09-12 2022-09-09 武汉灯塔之光科技有限公司 Method and device for discovering current hot event through information

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101065749A (en) * 2004-11-24 2007-10-31 琳达·劳逊 System and method for resource management
CN101599089A (en) * 2009-07-17 2009-12-09 中国科学技术大学 The automatic search of update information on content of video service website and extraction system and method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Ming Zhu et al. Effective Video Content Abstraction by Similar Shots Clustering. Signal Processing, ICSP 2008. 2008, pp. 1445-1448. *
Zhu Ming et al. PMDN resource search strategy based on multiple super nodes. Computer Simulation (《计算机仿真》). 2008, Vol. 25 (No. 8), pp. 131-135. *

Also Published As

Publication number Publication date
CN101944111A (en) 2011-01-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: ANHUI GUANGXING COMMUNICATION TECHNOLOGY CO., LTD.

Free format text: FORMER OWNER: UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA

Effective date: 20130821

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 230026 HEFEI, ANHUI PROVINCE TO: 230001 HEFEI, ANHUI PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20130821

Address after: 800 C4, 12 floor, animation industry park, Wangjiang Road, Anhui, Hefei 230001, China

Patentee after: Anhui Guangxing Communication Technology Co., Ltd.

Address before: 230026 Jinzhai Road, Anhui, China, No. 96, No.

Patentee before: University of Science and Technology of China

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120523

Termination date: 20200909
