CN106708929A - Video program search method and device - Google Patents

Info

Publication number
CN106708929A
Authority
CN
China
Prior art keywords
matrix
vector
description
video program
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611019485.4A
Other languages
Chinese (zh)
Other versions
CN106708929B (en)
Inventor
李贤�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN201611019485.4A (CN106708929B)
Priority to PCT/CN2016/113642 (WO2018090468A1)
Publication of CN106708929A
Application granted
Publication of CN106708929B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a video program search method. The method comprises the following steps: receiving a description entry, input by a user, that describes a video program, together with the video category to which the video program belongs; selecting the latent semantic indexing model corresponding to the video category, and constructing a query vector for the description entry in the same manner as the index matrix of the latent semantic indexing model was constructed; calculating, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector; and sorting the calculated cosine similarities in descending order, selecting the video programs corresponding to the column vectors whose cosine similarity rank falls within a ranking interval, and providing those video programs to the user. Correspondingly, the invention also discloses a video program search device. With the video program search method and device, the latent semantics of documents can be mined, so that the accuracy and efficiency of video program search are improved.

Description

Video program search method and device
Technical field
The present invention relates to the field of computers, and in particular to a video program search method and device.
Background
When recommending variety shows, content-based methods are an important strategy. They cluster and recommend programs according to the similarity of their content descriptions, grouping texts with similar content together. The prevailing approach is the TF-IDF-based Rocchio algorithm, which derives from vector space model theory. The basic idea of the vector space model is to represent a text as a vector, so that subsequent processing reduces to vector operations in space. Training the Rocchio algorithm is in fact the process of building a feature vector for each category. Given an unknown text, a vector of that text is generated, its similarity to the feature vector of each category is calculated, and the text is finally assigned to the category most similar to it.
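The training and classification procedure described above can be sketched as a nearest-centroid classifier. This is a simplified illustration, not the algorithm as used in any particular system: it assumes texts are already split into tokens, uses raw term counts instead of TF-IDF weights, and the names `train_centroids` and `classify` are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def train_centroids(labeled_docs):
    """Average the term-count vectors of each category into one feature vector."""
    sums, counts = {}, Counter()
    for label, tokens in labeled_docs:
        sums.setdefault(label, Counter()).update(tokens)
        counts[label] += 1
    return {label: {t: c / counts[label] for t, c in vec.items()}
            for label, vec in sums.items()}

def classify(tokens, centroids):
    """Assign the text to the category whose feature vector is most similar."""
    doc = Counter(tokens)
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))

# Toy training data (assumed for illustration only).
training = [
    ("music", ["singer", "song", "stage"]),
    ("music", ["song", "band", "stage"]),
    ("travel", ["road", "trip", "food"]),
]
centroids = train_centroids(training)
predicted = classify(["song", "stage", "trip"], centroids)
```

Every training sample contributes equally to its category vector, which illustrates why a mislabeled sample shifts the centroid without any check.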
However, the above algorithm has the following shortcomings: 1. The Rocchio algorithm cannot mine the latent semantics of documents. 2. It assumes that the training data is absolutely correct; because it has no mechanism for quantitatively measuring whether a sample contains noise, it has no resistance to erroneous data.
Summary of the invention
Embodiments of the present invention propose a video program search method and device that can mine the latent semantics of documents and improve the accuracy and efficiency of video program search.
A video program search method provided by an embodiment of the present invention includes:
receiving a description entry, input by a user, that describes a video program, and the video category to which the video program belongs;
selecting the latent semantic indexing model corresponding to the video category, and constructing a query vector for the description entry in the same manner as the index matrix of the latent semantic indexing model was constructed; wherein the latent semantic indexing model is obtained by performing singular value decomposition on an index matrix built from the description documents of video programs of the same video category;
calculating, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector;
sorting the calculated cosine similarities in descending order, and providing to the user the video programs corresponding to the column vectors whose cosine similarity rank falls within a ranking interval.
Further, the process of building the index matrix from the description documents of video programs includes: taking the frequency with which the i-th keyword occurs in the description document of the j-th video program as the value of the i-th element of the j-th column of the index matrix;
the process of constructing the query vector of the description entry includes: setting the keyword represented by the i-th element of the query vector to be the same as the keyword represented by the elements of the i-th row of the index matrix, and taking the frequency with which that keyword occurs in the description entry as the value of the i-th element of the query vector; wherein the query vector is a column vector.
Further, the process of building the index matrix from the description documents of video programs of the same video category is specifically:
for all description documents stored in a database that describe video programs of the same video category, adjusting the format of the entries contained in those description documents according to a standard entry format; wherein the database stores description documents of multiple video categories, each description document describes one video program, and different description documents describe different video programs;
calling a word segmentation tool;
using the word segmentation tool to segment the format-adjusted entries of all the description documents, obtaining a first word set;
extracting keywords from the first word set according to the TF-IDF algorithm;
building the index matrix according to the frequency with which each extracted keyword occurs in each description document; wherein the rows of the index matrix are ordered from high to low by the total frequency with which each keyword occurs across all the description documents, and the columns of the index matrix are ordered from high to low by the frequency with which the keywords occur in each description document.
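The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the documents are already format-adjusted and segmented into token lists, keywords are chosen by their highest TF-IDF score over all documents with a smoothed IDF (one of several common variants), rows are ordered by total frequency with alphabetical tie-breaking, and columns simply keep the database order. The function names are hypothetical.

```python
import math
from collections import Counter

def extract_keywords(docs, top_n):
    """Score each term by its highest TF-IDF over all documents; keep the top_n."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    best = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0  # smoothed IDF
            best[term] = max(best.get(term, 0.0), tf * idf)
    return sorted(best, key=lambda t: (-best[t], t))[:top_n]

def build_index_matrix(docs, keywords):
    """Rows are keywords (highest total frequency first); columns are documents."""
    total = Counter(term for doc in docs for term in doc)
    rows = sorted(keywords, key=lambda t: (-total[t], t))
    matrix = [[doc.count(term) for doc in docs] for term in rows]
    return rows, matrix

# Toy description documents, already segmented (assumed for illustration).
docs = [
    ["singing", "contest", "singing", "judge"],
    ["travel", "food", "travel"],
    ["singing", "travel", "host"],
]
keywords = extract_keywords(docs, top_n=3)
rows, matrix = build_index_matrix(docs, keywords)
# matrix[i][j] is the frequency of keyword rows[i] in description document j.
```

The returned `rows` list records which keyword each row stands for, which is what a query vector has to be aligned with later.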
Further, constructing the query vector of the description entry is specifically:
adjusting the format of the description entry according to the standard entry format;
calling a word segmentation tool;
using the word segmentation tool to segment the format-adjusted description entry, obtaining a second word set;
extracting keywords from the second word set according to the TF-IDF algorithm;
constructing the query vector of the description entry according to the frequency with which each extracted keyword occurs in the description entry.
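A minimal sketch of the final step, assuming the description entry has already been format-adjusted and segmented, and skipping the TF-IDF keyword extraction on the entry for brevity. The row keyword order and the function name `build_query_vector` are hypothetical; the only requirement taken from the text is that element i of the query vector refers to the same keyword as row i of the index matrix.

```python
from collections import Counter

def build_query_vector(entry_tokens, row_keywords):
    """Element i counts how often row keyword i occurs in the description entry."""
    counts = Counter(entry_tokens)
    return [counts[term] for term in row_keywords]

# Hypothetical row order of an index matrix built beforehand.
row_keywords = ["singing", "travel", "food"]
query = build_query_vector(["food", "travel", "show", "travel"], row_keywords)
# query is [0, 2, 1]; "show" is ignored because it is not a row keyword.
```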
Further, with the index matrix denoted H, the latent semantic indexing model obtained by performing singular value decomposition on the index matrix is: H = T*S*D^T; wherein T is an orthogonal matrix whose columns are the left singular vectors of the index matrix H; S is a diagonal matrix whose diagonal elements are the singular values of the index matrix H; D is an orthogonal matrix whose columns are the right singular vectors of the index matrix H; and the query vector is Q;
calculating, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector is specifically:
selecting matrices T_K, S_K and D_K, and revising the latent semantic indexing model to H_K = T_K*S_K*D_K^T; wherein T_K is the matrix formed by the first K columns of matrix T, S_K is the diagonal matrix formed by the first K diagonal elements of matrix S, and D_K is the matrix formed by the first K columns of matrix D; the value of K is greater than the largest rank contained in the ranking interval;
for the index matrix H_K of the revised latent semantic indexing model, calculating the cosine similarity between two row vectors, namely the row vector obtained by multiplying the transpose Q^T of the query vector by the matrix T_K, and the j-th row vector of the matrix obtained by multiplying the matrix D_K by the matrix S_K, and taking it as the cosine similarity between the j-th column vector of the index matrix H_K and the query vector Q.
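The computation above can be read as comparing the query and each document in the K-dimensional latent space. The short derivation below assumes only that T_K has orthonormal columns, which holds for factors produced by singular value decomposition; the symbols h_j, d_j, \hat{q} and \hat{d}_j are notation introduced here for illustration, with d_j^T denoting the j-th row of D_K.

```latex
\begin{aligned}
h_j &= T_K S_K d_j
  && \text{($j$-th column of } H_K = T_K S_K D_K^{T}\text{)} \\
T_K^{T} h_j &= S_K d_j
  && \text{(since } T_K^{T} T_K = I_K\text{)} \\
\hat{q}^{T} &= Q^{T} T_K, \qquad
\hat{d}_j^{T} = d_j^{T} S_K = (D_K S_K)_{j,:} \\
\cos\theta_j &= \frac{\hat{q}^{T}\hat{d}_j}{\lVert \hat{q} \rVert \,\lVert \hat{d}_j \rVert}
\end{aligned}
```

Because multiplication by T_K preserves inner products and norms, this value equals the cosine between h_j and the projection T_K T_K^T Q of the query onto the column space of H_K. It coincides with the cosine against the raw vector Q only when Q already lies in that subspace.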
Further, the search method also includes:
when a description document describing a new video program is added to the database, updating the latent semantic indexing model corresponding to the video category to which the new video program belongs.
Correspondingly, an embodiment of the present invention provides a video program search device, including:
a user information receiving module, configured to receive a description entry, input by a user, that describes a video program, and the video category to which the video program belongs;
a query vector construction module, configured to select the latent semantic indexing model corresponding to the video category, and to construct a query vector for the description entry in the same manner as the index matrix of the latent semantic indexing model was constructed; wherein the latent semantic indexing model is obtained by performing singular value decomposition on an index matrix built from the description documents of video programs of the same video category;
a similarity calculation module, configured to calculate, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector;
a video program selection module, configured to sort the calculated cosine similarities in descending order, and to provide to the user the video programs corresponding to the column vectors whose cosine similarity rank falls within a ranking interval.
Further, the query vector construction module includes a unit for building the index matrix from the description documents of video programs, specifically configured to: take the frequency with which the i-th keyword occurs in the description document of the j-th video program as the value of the i-th element of the j-th column of the index matrix;
the query vector construction module includes a unit for constructing the query vector of the description entry, specifically configured to: set the keyword represented by the i-th element of the query vector to be the same as the keyword represented by the elements of the i-th row of the index matrix, and take the frequency with which that keyword occurs in the description entry as the value of the i-th element of the query vector; wherein the query vector is a column vector.
Further, the query vector construction module includes a unit for building the index matrix from the description documents of video programs of the same video category, which specifically comprises:
a first format adjustment unit, configured to adjust, according to a standard entry format, the format of the entries contained in all description documents stored in a database that describe video programs of the same video category; wherein the database stores description documents of multiple video categories, each description document describes one video program, and different description documents describe different video programs;
a first tool calling unit, configured to call a word segmentation tool;
a first word segmentation unit, configured to use the word segmentation tool to segment the format-adjusted entries of all the description documents, obtaining a first word set;
a first keyword extraction unit, configured to extract keywords from the first word set according to the TF-IDF algorithm;
an index matrix construction unit, configured to build the index matrix according to the frequency with which each extracted keyword occurs in each description document; wherein the rows of the index matrix are ordered from high to low by the total frequency with which each keyword occurs across all the description documents, and the columns of the index matrix are ordered from high to low by the frequency with which the keywords occur in each description document.
Further, the query vector construction module also includes a unit for constructing the query vector of the description entry, which specifically comprises:
a second format adjustment unit, configured to adjust the format of the description entry according to the standard entry format;
a second tool calling unit, configured to call a word segmentation tool;
a second word segmentation unit, configured to use the word segmentation tool to segment the format-adjusted description entry, obtaining a second word set;
a second keyword extraction unit, configured to extract keywords from the second word set according to the TF-IDF algorithm;
a query vector construction unit, configured to construct the query vector of the description entry according to the frequency with which each extracted keyword occurs in the description entry.
Further, with the index matrix denoted H, the latent semantic indexing model obtained by performing singular value decomposition on the index matrix is: H = T*S*D^T; wherein T is an orthogonal matrix whose columns are the left singular vectors of the index matrix H; S is a diagonal matrix whose diagonal elements are the singular values of the index matrix H; D is an orthogonal matrix whose columns are the right singular vectors of the index matrix H; and the query vector is Q;
the similarity calculation module specifically includes:
a model revision unit, configured to select matrices T_K, S_K and D_K, and to revise the latent semantic indexing model to H_K = T_K*S_K*D_K^T; wherein T_K is the matrix formed by the first K columns of matrix T, S_K is the diagonal matrix formed by the first K diagonal elements of matrix S, and D_K is the matrix formed by the first K columns of matrix D; the value of K is greater than the largest rank contained in the ranking interval;
a computing unit, configured to calculate, for the index matrix H_K of the revised latent semantic indexing model, the cosine similarity between two row vectors, namely the row vector obtained by multiplying the transpose Q^T of the query vector by the matrix T_K, and the j-th row vector of the matrix obtained by multiplying the matrix D_K by the matrix S_K, and to take it as the cosine similarity between the j-th column vector of the index matrix H_K and the query vector Q.
Further, the search device also includes:
a model update module, configured to update, when a description document describing a new video program is added to the database, the latent semantic indexing model corresponding to the video category to which the new video program belongs.
Implementing the embodiments of the present invention has the following beneficial effects:
The video program search method and device provided by the embodiments of the present invention calculate the cosine similarity between the query vector of the video to be searched and each column vector of the index matrix of the latent semantic indexing model. This yields the degree of relevance between the description entry of the video to be searched and the description document represented by each column vector of the index matrix: the higher the value, the higher the relevance. The video programs corresponding to the description documents most relevant to the description entry are then recommended to the user. Because the latent semantic indexing model is built (trained) from the description documents of video programs, it can mine the latent semantics of documents and improve the accuracy of video program search. In addition, by taking the video category, input by the user, to which the video program belongs and selecting the latent semantic indexing model corresponding to that category for the calculation, the efficiency of video program search can be further improved.
Brief description of the drawings
Fig. 1 is a schematic flowchart of an embodiment of the video program search method provided by the present invention;
Fig. 2 is a schematic structural diagram of an embodiment of the video program search device provided by the present invention;
Fig. 3 is a schematic structural diagram of an embodiment of the query vector construction module of the video program search device provided by the present invention;
Fig. 4 is a schematic structural diagram of an embodiment of the similarity calculation module of the video program search device provided by the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, which is a schematic flowchart of an embodiment of the video program search method provided by the present invention, the search method includes steps S1 to S4, specifically:
S1: receiving a description entry, input by a user, that describes a video program, and the video category to which the video program belongs;
S2: selecting the latent semantic indexing model corresponding to the video category, and constructing a query vector for the description entry in the same manner as the index matrix of the latent semantic indexing model was constructed; wherein the latent semantic indexing model is obtained by performing singular value decomposition on an index matrix built from the description documents of video programs of the same video category; the value of the i-th element of the j-th column of the index matrix represents the frequency with which the i-th keyword occurs in the description document of the j-th video program; the query vector is a column vector, the keyword represented by the i-th element of the query vector is the same as the keyword represented by the elements of the i-th row of the index matrix, and the value of the i-th element of the query vector represents the frequency with which that keyword occurs in the description entry;
S3: calculating, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector;
S4: sorting the calculated cosine similarities in descending order, and providing to the user the video programs corresponding to the column vectors whose cosine similarity rank falls within a ranking interval.
It should be noted that calculating the cosine similarity between the query vector of the video to be searched and each column vector of the index matrix of the latent semantic indexing model yields the degree of relevance between the description entry of the video to be searched and the description document represented by each column vector of the index matrix: the higher the value, the higher the relevance. The video programs corresponding to the description documents most relevant to the description entry are then recommended to the user. Because the latent semantic indexing model is built (trained) from the description documents of video programs, it can mine the latent semantics of documents and improve the accuracy of video program search. In addition, by taking the video category, input by the user, to which the video program belongs and selecting the latent semantic indexing model corresponding to that category for the calculation, the efficiency of video program search can be further improved. The ranking interval is generally preferably the top 10 ranks.
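Step S4 can be sketched as a sort followed by a slice. This is a minimal illustration: the function name `top_ranked`, the program identifiers and the similarity values are hypothetical, and the default interval of ranks 1 to 10 reflects the preferred setting mentioned above.

```python
def top_ranked(similarities, program_ids, interval=(1, 10)):
    """Return (program_id, similarity) pairs whose 1-based rank lies in interval."""
    first, last = interval
    ranked = sorted(zip(program_ids, similarities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[first - 1:last]

# One similarity per column of the index matrix (values assumed for illustration).
sims = [0.12, 0.87, 0.45, 0.66]
ids = ["prog_a", "prog_b", "prog_c", "prog_d"]
result = top_ranked(sims, ids, interval=(1, 2))
# result is [("prog_b", 0.87), ("prog_d", 0.66)]
```

If fewer programs exist than the interval allows, the slice simply returns all of them in ranked order.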
Further, the process in step S2 of building the index matrix from the description documents of video programs of the same video category is specifically:
for all description documents stored in a database that describe video programs of the same video category, adjusting the format of the entries contained in those description documents according to a standard entry format; wherein the database stores description documents of multiple video categories, each description document describes one video program, and different description documents describe different video programs. The format adjustment of the entries may include, but is not limited to, converting lowercase letters in the entries to uppercase, deleting redundant spaces in the entries, unifying the punctuation marks in the entries, and unifying the full-width and half-width forms of the entries into a single form.
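The format adjustments listed above can be sketched with the Python standard library. This is one possible realization, not the patent's own: it assumes Unicode NFKC normalization is an acceptable way to fold full-width letters, digits and punctuation into their half-width forms (NFKC also rewrites other compatibility characters, and leaves marks such as the ideographic full stop unchanged). The function name `normalize_entry` is hypothetical.

```python
import re
import unicodedata

def normalize_entry(text):
    """Unify width, case, and spacing of an entry before word segmentation."""
    text = unicodedata.normalize("NFKC", text)   # full-width forms -> half-width
    text = text.upper()                          # unify letter case
    text = re.sub(r"\s+", " ", text).strip()     # collapse redundant spaces
    return text

# Mixed full-width letters, an ideographic space, and a full-width comma.
cleaned = normalize_entry("Ｒｕｎｎｉｎｇ　  man，  season 2")
# cleaned is "RUNNING MAN, SEASON 2"
```

Applying the same function to both the description documents and the user's description entry keeps the two sides comparable.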
calling a word segmentation tool; preferably, the word segmentation tool is the jieba word segmentation tool, but it is not limited to this tool.
using the word segmentation tool to segment the format-adjusted entries of all the description documents, obtaining a first word set. The word segmentation tool offers several modes for segmenting the entries: in addition to cutting in the normal segmentation mode, long words can be cut further, which improves recall. For short texts in particular, this yields more words than normal segmentation and helps improve the accuracy of the video programs subsequently output.
extracting keywords from the first word set according to the TF-IDF algorithm;
building the index matrix according to the frequency with which each extracted keyword occurs in each description document; wherein the rows of the index matrix are ordered from high to low by the total frequency with which each keyword occurs across all the description documents, and the columns of the index matrix are ordered from high to low by the frequency with which the keywords occur in each description document.
It should be noted that the index matrix is built in advance from the description documents stored in the database, and the building process must follow this rule: the value of the i-th element of the j-th column of the index matrix represents the frequency with which the i-th keyword occurs in the description document of the j-th video program. All elements of the i-th row of the index matrix represent the same keyword, and elements in different rows represent different keywords. For example, assuming that all elements of row 1 of the index matrix represent keyword A and all elements of column 1 represent description document B, the value of the element at row 1, column 1 of the index matrix represents the frequency with which keyword A occurs in description document B.
Further, constructing the query vector of the description entry in step S2 is specifically:
adjusting the format of the description entry according to the standard entry format; for example, converting lowercase letters in the entry to uppercase, deleting redundant spaces in the entry, unifying the punctuation marks in the entry, and unifying the full-width and half-width forms of the entry into a single form.
calling a word segmentation tool; preferably, the word segmentation tool is the jieba word segmentation tool, but it is not limited to this tool.
using the word segmentation tool to segment the format-adjusted description entry, obtaining a second word set. The word segmentation tool offers several modes for segmenting the description entry: in addition to cutting in the normal segmentation mode, long words can be cut further, which improves recall. For short texts in particular, this yields more words than normal segmentation and helps improve the accuracy of the video programs subsequently output.
extracting keywords from the second word set according to the TF-IDF algorithm;
constructing the query vector of the description entry according to the frequency with which each extracted keyword occurs in the description entry.
It should be noted that, when constructing the query vector of the description entry, the keyword represented by the i-th element of the query vector must be the same as the keyword represented by the elements of the i-th row of the index matrix of the latent semantic indexing model; only then is the cosine similarity between the query vector and each column vector of the index matrix meaningful.
In addition, constructing the query vector must also follow this principle: the keyword represented by the i-th element of the query vector is the same as the keyword represented by the elements of the i-th row of the index matrix, and the value of the i-th element of the query vector represents the frequency with which that keyword occurs in the description entry. For example, assuming that all elements of row 1 of the index matrix represent keyword A, the element in row 1 of the query vector also represents keyword A, and its value represents the frequency with which keyword A occurs in the description entry.
Further, with the index matrix denoted H, the latent semantic indexing model obtained by performing singular value decomposition on the index matrix is: H = T*S*D^T; wherein T is an orthogonal matrix whose columns are the left singular vectors of the index matrix H; S is a diagonal matrix whose diagonal elements are the singular values of the index matrix H; D is an orthogonal matrix whose columns are the right singular vectors of the index matrix H; and the query vector is Q;
Step S3 is specifically implemented as follows:
selecting matrices T_K, S_K and D_K, and revising the latent semantic indexing model to H_K = T_K*S_K*D_K^T; wherein T_K is the matrix formed by the first K columns of matrix T, S_K is the diagonal matrix formed by the first K diagonal elements of matrix S, and D_K is the matrix formed by the first K columns of matrix D; the value of K is greater than the largest rank contained in the ranking interval;
for the index matrix H_K of the revised latent semantic indexing model, calculating the cosine similarity between two row vectors, namely the row vector obtained by multiplying the transpose Q^T of the query vector by the matrix T_K, and the j-th row vector of the matrix obtained by multiplying the matrix D_K by the matrix S_K, and taking it as the cosine similarity between the j-th column vector of the index matrix H_K and the query vector Q.
It should be noted that K here is a threshold that can be chosen according to the actual situation. The decomposition uses the rank-K approximation of H, that is, all singular values of the index matrix H after the K largest ones are set to zero. Revising the latent semantic indexing model in this way can improve retrieval efficiency.
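The similarity computation of step S3 can be sketched in pure Python once the truncated factors are available. This is an illustration under stated assumptions: the factors T_K, S_K and D_K are supplied by hand for a tiny matrix whose decomposition is known exactly, whereas a real system would obtain them from a numerical SVD routine. The helper names `matmul`, `cosine` and `latent_similarities` are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def latent_similarities(q, t_k, s_k, d_k):
    """Cosine between Q^T * T_K and each row of D_K * S_K."""
    q_hat = matmul([q], t_k)[0]      # query mapped into the K-dimensional space
    docs_hat = matmul(d_k, s_k)      # one row per description document
    return [cosine(q_hat, row) for row in docs_hat]

# H = [[3, 0], [0, 2], [0, 0]] factors exactly as T_K * S_K * D_K^T with K = 2.
t_k = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
s_k = [[3.0, 0.0], [0.0, 2.0]]
d_k = [[1.0, 0.0], [0.0, 1.0]]
sims = latent_similarities([2.0, 1.0, 0.0], t_k, s_k, d_k)
```

In this toy case the first document scores 2/sqrt(5) and the second 1/sqrt(5), so the first document would be ranked ahead of the second.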
Further, the searching method also includes:
When database increases the description document of the new video frequency program of description, pair with regarding belonging to the new video frequency program The corresponding Vector Space Model of frequency classification is updated.
It should be noted that because video frequency program can be ever-increasing, and the video frequency program that is newly increased for description is retouched Stating document can also be continuously added in the middle of database, it is therefore desirable to be updated in semantic indexing model to lifting.
The searching method of video frequency program provided in an embodiment of the present invention, the query vector of video will be searched for by calculating and is dived In the cosine similarity of each column vector of the index matrix of semantic indexing model, can obtain the description entry of video to be searched for Degree of correlation between the description document that each column vector of index matrix is represented, numerical value is higher, then degree of correlation is higher, and then Video program recommendation corresponding to entry degree of correlation description document high will be described with this to user, and due to potential applications rope Draw model be according to description video frequency program description document build (training) into, the potential applications of document can be excavated, raising Search for the degree of accuracy of video frequency program.In addition, the video classification belonging to the video frequency program for passing through user input, selects to be regarded with this The corresponding Vector Space Model of frequency classification is calculated, and can further improve the efficiency of search video frequency program.
Refer to Fig. 2, which is a schematic structural diagram of one embodiment of the video program search device provided by the present invention. The search device can execute the entire flow of the video program search method provided by the above embodiment, and comprises:
User information receiving module 10, configured to receive a description entry that describes a video program and the video category to which the video program belongs, both input by a user;
Query vector construction module 20, configured to select the latent semantic indexing model corresponding to the video category, and to construct a query vector of the description entry in the same manner as the index matrix of the latent semantic indexing model was constructed; wherein the latent semantic indexing model is obtained by performing singular value decomposition on an index matrix constructed from description documents that describe video programs of the same video category;
Similarity calculation module 30, configured to calculate, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector;
Video program selection module 40, configured to sort the calculated cosine similarities in descending order, and to provide to the user the video programs corresponding to the column vectors whose cosine similarities have ranks falling within a ranking interval.
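The behaviour of the video program selection module can be illustrated with a short sketch. The function name `select_programs`, the 1-based inclusive `interval` parameter and the sample data are assumptions made for illustration; the patent does not specify how the ranking interval is represented.

```python
def select_programs(similarities, programs, interval=(1, 3)):
    """Return the programs whose similarity rank falls inside a 1-based, inclusive interval."""
    # Indices of the documents, sorted by cosine similarity from largest to smallest.
    order = sorted(range(len(similarities)), key=lambda j: similarities[j], reverse=True)
    first, last = interval
    return [programs[j] for j in order[first - 1:last]]

# Column j of the index matrix corresponds to programs[j].
similarities = [0.2, 0.9, 0.5, 0.7]
programs = ["A", "B", "C", "D"]
top_two = select_programs(similarities, programs, interval=(1, 2))
```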
Further, the unit of the query vector construction module that constructs the index matrix from the description documents describing video programs is specifically configured to: take the word frequency with which the i-th keyword appears in the description document of the j-th video program as the value of the i-th element of the j-th column of the index matrix;
The unit of the query vector construction module that constructs the query vector of the description entry is specifically configured to: set the keyword represented by the i-th element of the query vector to be the same as the keyword represented by the i-th row of the index matrix, and take the word frequency with which the keyword corresponding to the i-th element appears in the description entry as the value of the i-th element of the query vector; wherein the query vector is a column vector.
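The construction rules above amount to counting keyword occurrences, with rows indexed by keyword and columns by description document. A minimal sketch follows; the function names, the keyword list and the pre-tokenized sample documents are hypothetical.

```python
from collections import Counter

def build_index_matrix(keywords, docs_tokens):
    """H[i][j] = number of times keyword i occurs in description document j."""
    counts = [Counter(tokens) for tokens in docs_tokens]
    return [[c[k] for c in counts] for k in keywords]

def build_query_vector(keywords, query_tokens):
    """Q[i] = number of times keyword i occurs in the description entry (same row order as H)."""
    c = Counter(query_tokens)
    return [c[k] for k in keywords]

keywords = ["space", "war", "comedy"]
docs_tokens = [["space", "war", "space"],   # description document of program 1
               ["comedy", "war"]]           # description document of program 2
H = build_index_matrix(keywords, docs_tokens)
Q = build_query_vector(keywords, ["space", "comedy", "space"])
```

Because the query vector reuses the keyword order of the index matrix rows, it can be compared directly with any column of the matrix.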
Further, refer to Fig. 3, which is a schematic structural diagram of one embodiment of the query vector construction module of the video program search device provided by the present invention. The query vector construction module 20 includes units for constructing the index matrix from the description documents describing video programs of the same video category, specifically:
First format adjustment unit 21, configured to, for the description documents stored in the database that describe video programs of the same video category, adjust the format of the entries contained in the description documents according to a standard entry format; wherein the database stores description documents of multiple video categories, each description document describes one video program, and different description documents describe different video programs;
First tool calling unit 22, configured to call a word segmentation tool;
First word segmentation unit 23, configured to use the word segmentation tool to segment the entries of the format-adjusted description documents, obtaining a first word set;
First keyword extraction unit 24, configured to extract keywords from the first word set according to the TF-IDF algorithm;
Index matrix construction unit 25, configured to construct the index matrix according to the word frequency with which each extracted keyword appears in each description document; wherein the rows of the index matrix are ordered from high to low by the total word frequency with which the keywords appear in the description documents, and the columns of the index matrix are ordered from high to low by the word frequency with which the keywords appear in each description document.
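The keyword extraction and row ordering steps can be sketched as follows. The patent names the TF-IDF algorithm but gives no formula, so this sketch assumes the common weighting tf × log(N / df) and scores each word by its best value over all documents; the function names, the `top_n` parameter and the pre-segmented sample documents are likewise assumptions.

```python
import math
from collections import Counter

def extract_keywords(docs_tokens, top_n):
    """Keep the top_n words by their best TF-IDF score over all documents (assumed scoring)."""
    n_docs = len(docs_tokens)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for tokens in docs_tokens for w in set(tokens))
    scores = Counter()
    for tokens in docs_tokens:
        tf = Counter(tokens)
        for w, f in tf.items():
            score = (f / len(tokens)) * math.log(n_docs / df[w])
            scores[w] = max(scores[w], score)
    return [w for w, _ in scores.most_common(top_n)]

def order_rows_by_total_frequency(keywords, docs_tokens):
    """Order keywords (matrix rows) from highest to lowest total frequency in all documents."""
    total = Counter(w for tokens in docs_tokens for w in tokens)
    return sorted(keywords, key=lambda w: total[w], reverse=True)

docs_tokens = [["space", "war", "space", "the"],
               ["comedy", "the", "war"],
               ["space", "the", "robot"]]
keywords = extract_keywords(docs_tokens, top_n=4)
rows = order_rows_by_total_frequency(keywords, docs_tokens)
```

In this sample, "the" occurs in every document, so its inverse document frequency is zero and it is not selected as a keyword.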
Further, the query vector construction module 20 also includes units for constructing the query vector of the description entry, specifically:
Second format adjustment unit 26, configured to adjust the format of the description entry according to the standard entry format;
Second tool calling unit 27, configured to call the word segmentation tool;
Second word segmentation unit 28, configured to use the word segmentation tool to segment the format-adjusted description entry, obtaining a second word set;
Second keyword extraction unit 29, configured to extract keywords from the second word set according to the TF-IDF algorithm;
Query vector construction unit 31, configured to construct the query vector of the description entry according to the word frequency with which each extracted keyword appears in the description entry.
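The query-side preprocessing can be sketched in the same spirit. The patent specifies neither the standard entry format nor the word segmentation tool, so the sketch substitutes lower-casing with punctuation removal for format adjustment and a whitespace split for segmentation; both are stand-ins, and the function names are hypothetical. The TF-IDF extraction step is omitted here: only keywords already present in the index matrix rows are counted.

```python
import re
from collections import Counter

def normalize_entry(text):
    """Stand-in for format adjustment: lower-case and replace punctuation with spaces."""
    return re.sub(r"[^\w\s]", " ", text.lower())

def segment(text):
    """Stand-in for the word segmentation tool: split on whitespace."""
    return text.split()

def query_vector(entry, keywords):
    """Count, in index-matrix row order, how often each keyword occurs in the entry."""
    counts = Counter(segment(normalize_entry(entry)))
    return [counts[k] for k in keywords]

Q = query_vector("Space war, SPACE robots!", ["space", "war", "comedy"])
```

For Chinese-language entries a dedicated segmentation library would replace `segment`, since whitespace does not delimit words in that case.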
Further, refer to Fig. 4, which is a schematic structural diagram of one embodiment of the similarity calculation module of the video program search device provided by the present invention. The index matrix is H, and the latent semantic indexing model obtained by performing singular value decomposition on the index matrix is $H = T S D^{T}$; wherein T is an orthogonal matrix whose columns are the left singular vectors of the index matrix H; S is a diagonal matrix whose diagonal elements are the singular values of the index matrix H; D is an orthogonal matrix whose columns are the right singular vectors of the index matrix H; and the query vector is Q;
The similarity calculation module 30 specifically includes:
Model revision unit 32, configured to select the matrices $T_K$, $S_K$ and $D_K$ and revise the latent semantic indexing model to $H_K = T_K S_K D_K^{T}$; wherein $T_K$ is the matrix formed by the first K columns of matrix T, $S_K$ is the diagonal matrix formed by the first K diagonal elements of matrix S, and $D_K$ is the matrix formed by the first K columns of matrix D; the value of K is greater than the largest rank included in the ranking interval;
Calculation unit 33, configured to, for the index matrix $H_K$ of the revised latent semantic indexing model, calculate the cosine similarity between two row vectors, namely the row vector obtained by multiplying the transpose $Q^{T}$ of the query vector by the matrix $T_K$ and the j-th row vector of the matrix obtained by multiplying the matrix $D_K$ by the matrix $S_K$, and take it as the cosine similarity between the j-th column vector of the index matrix $H_K$ and the query vector Q.
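The calculation performed by units 32 and 33 can be sketched with NumPy as below. The function name `lsi_similarities` and the sample data are assumptions; the sketch also assumes that neither the projected query nor any projected document is the zero vector, since the cosine is undefined in that case.

```python
import numpy as np

def lsi_similarities(H, Q, K):
    """Cosine between Q^T T_K and each row of D_K S_K, one value per description document."""
    T, s, Dt = np.linalg.svd(H, full_matrices=False)
    T_K, S_K, D_K = T[:, :K], np.diag(s[:K]), Dt[:K, :].T
    q_hat = Q @ T_K        # Q^T T_K, a vector of length K
    docs = D_K @ S_K       # row j represents column j of H_K in the reduced space
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    return (docs @ q_hat) / norms

# 4 keywords x 3 description documents; the query repeats document 0 exactly.
H = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
Q = H[:, 0].copy()
sims = lsi_similarities(H, Q, K=2)
best = int(np.argmax(sims))   # index of the most similar description document
```

Comparing K-dimensional vectors instead of full keyword-length columns is what makes the reduced model cheaper to query: each similarity costs on the order of K multiplications rather than one per keyword.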
Further, the search device also includes:
Model update module 50, configured to, when a description document describing a new video program is added to the database, update the latent semantic indexing model corresponding to the video category to which the new video program belongs.
In the video program search device provided by the embodiment of the present invention, the cosine similarity between the query vector of the video to be searched and each column vector of the index matrix of the latent semantic indexing model is calculated. This gives the degree of relevance between the description entry of the video to be searched and the description document represented by each column vector of the index matrix; the higher the value, the higher the relevance. The video programs corresponding to the description documents that are highly relevant to the description entry are then recommended to the user. Because the latent semantic indexing model is built (trained) from the description documents that describe the video programs, it can mine the latent semantics of the documents and improve the accuracy of video program search. In addition, the latent semantic indexing model corresponding to the video category entered by the user is selected for the calculation, which can further improve the efficiency of video program search.
Those of ordinary skill in the art will appreciate that all or part of the flows in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The above are preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make improvements and refinements without departing from the principles of the present invention, and such improvements and refinements are also regarded as falling within the protection scope of the present invention.

Claims (12)

1. A method for searching for video programs, characterized by comprising:
Receiving a description entry that describes a video program and the video category to which the video program belongs, both input by a user;
Selecting the latent semantic indexing model corresponding to the video category, and constructing a query vector of the description entry in the same manner as the index matrix of the latent semantic indexing model was constructed; wherein the latent semantic indexing model is obtained by performing singular value decomposition on an index matrix constructed from description documents that describe video programs of the same video category;
Calculating, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector;
Sorting the calculated cosine similarities in descending order, and providing to the user the video programs corresponding to the column vectors whose cosine similarities have ranks falling within a ranking interval.
2. The method for searching for video programs according to claim 1, characterized in that:
The process of constructing the index matrix from the description documents describing video programs includes: taking the word frequency with which the i-th keyword appears in the description document of the j-th video program as the value of the i-th element of the j-th column of the index matrix;
The process of constructing the query vector of the description entry includes: setting the keyword represented by the i-th element of the query vector to be the same as the keyword represented by the i-th row of the index matrix, and taking the word frequency with which the keyword corresponding to the i-th element appears in the description entry as the value of the i-th element of the query vector; wherein the query vector is a column vector.
3. The method for searching for video programs according to claim 1 or 2, characterized in that the process of constructing the index matrix from the description documents describing video programs of the same video category is specifically:
For the description documents stored in a database that describe video programs of the same video category, adjusting the format of the entries contained in the description documents according to a standard entry format; wherein the database stores description documents of multiple video categories, each description document describes one video program, and different description documents describe different video programs;
Calling a word segmentation tool;
Using the word segmentation tool to segment the entries of the format-adjusted description documents, obtaining a first word set;
Extracting keywords from the first word set according to the TF-IDF algorithm;
Constructing the index matrix according to the word frequency with which each extracted keyword appears in each description document; wherein the rows of the index matrix are ordered from high to low by the total word frequency with which the keywords appear in the description documents, and the columns of the index matrix are ordered from high to low by the word frequency with which the keywords appear in each description document.
4. The method for searching for video programs according to claim 1 or 2, characterized in that constructing the query vector of the description entry is specifically:
Adjusting the format of the description entry according to a standard entry format;
Calling a word segmentation tool;
Using the word segmentation tool to segment the format-adjusted description entry, obtaining a second word set;
Extracting keywords from the second word set according to the TF-IDF algorithm;
Constructing the query vector of the description entry according to the word frequency with which each extracted keyword appears in the description entry.
5. The method for searching for video programs according to claim 3, characterized in that the index matrix is H, and the latent semantic indexing model obtained by performing singular value decomposition on the index matrix is $H = T S D^{T}$; wherein T is an orthogonal matrix whose columns are the left singular vectors of the index matrix H; S is a diagonal matrix whose diagonal elements are the singular values of the index matrix H; D is an orthogonal matrix whose columns are the right singular vectors of the index matrix H; and the query vector is Q;
Calculating, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector is specifically:
Selecting the matrices $T_K$, $S_K$ and $D_K$ and revising the latent semantic indexing model to $H_K = T_K S_K D_K^{T}$; wherein $T_K$ is the matrix formed by the first K columns of matrix T, $S_K$ is the diagonal matrix formed by the first K diagonal elements of matrix S, and $D_K$ is the matrix formed by the first K columns of matrix D; the value of K is greater than the largest rank included in the ranking interval;
For the index matrix $H_K$ of the revised latent semantic indexing model, calculating the cosine similarity between two row vectors, namely the row vector obtained by multiplying the transpose $Q^{T}$ of the query vector by the matrix $T_K$ and the j-th row vector of the matrix obtained by multiplying the matrix $D_K$ by the matrix $S_K$, and taking it as the cosine similarity between the j-th column vector of the index matrix $H_K$ and the query vector Q.
6. The method for searching for video programs according to claim 1, characterized in that the method also includes:
When a description document describing a new video program is added to the database, updating the latent semantic indexing model corresponding to the video category to which the new video program belongs.
7. A device for searching for video programs, characterized by comprising:
A user information receiving module, configured to receive a description entry that describes a video program and the video category to which the video program belongs, both input by a user;
A query vector construction module, configured to select the latent semantic indexing model corresponding to the video category, and to construct a query vector of the description entry in the same manner as the index matrix of the latent semantic indexing model was constructed; wherein the latent semantic indexing model is obtained by performing singular value decomposition on an index matrix constructed from description documents that describe video programs of the same video category;
A similarity calculation module, configured to calculate, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector;
A video program selection module, configured to sort the calculated cosine similarities in descending order, and to provide to the user the video programs corresponding to the column vectors whose cosine similarities have ranks falling within a ranking interval.
8. The device for searching for video programs according to claim 7, characterized in that:
The unit of the query vector construction module that constructs the index matrix from the description documents describing video programs is specifically configured to: take the word frequency with which the i-th keyword appears in the description document of the j-th video program as the value of the i-th element of the j-th column of the index matrix;
The unit of the query vector construction module that constructs the query vector of the description entry is specifically configured to: set the keyword represented by the i-th element of the query vector to be the same as the keyword represented by the i-th row of the index matrix, and take the word frequency with which the keyword corresponding to the i-th element appears in the description entry as the value of the i-th element of the query vector; wherein the query vector is a column vector.
9. The device for searching for video programs according to claim 7 or 8, characterized in that the query vector construction module includes units for constructing the index matrix from the description documents describing video programs of the same video category, specifically:
A first format adjustment unit, configured to, for the description documents stored in a database that describe video programs of the same video category, adjust the format of the entries contained in the description documents according to a standard entry format; wherein the database stores description documents of multiple video categories, each description document describes one video program, and different description documents describe different video programs;
A first tool calling unit, configured to call a word segmentation tool;
A first word segmentation unit, configured to use the word segmentation tool to segment the entries of the format-adjusted description documents, obtaining a first word set;
A first keyword extraction unit, configured to extract keywords from the first word set according to the TF-IDF algorithm;
An index matrix construction unit, configured to construct the index matrix according to the word frequency with which each extracted keyword appears in each description document; wherein the rows of the index matrix are ordered from high to low by the total word frequency with which the keywords appear in the description documents, and the columns of the index matrix are ordered from high to low by the word frequency with which the keywords appear in each description document.
10. The device for searching for video programs according to claim 7 or 8, characterized in that the query vector construction module also includes units for constructing the query vector of the description entry, specifically:
A second format adjustment unit, configured to adjust the format of the description entry according to a standard entry format;
A second tool calling unit, configured to call a word segmentation tool;
A second word segmentation unit, configured to use the word segmentation tool to segment the format-adjusted description entry, obtaining a second word set;
A second keyword extraction unit, configured to extract keywords from the second word set according to the TF-IDF algorithm;
A query vector construction unit, configured to construct the query vector of the description entry according to the word frequency with which each extracted keyword appears in the description entry.
11. The device for searching for video programs according to claim 9, characterized in that the index matrix is H, and the latent semantic indexing model obtained by performing singular value decomposition on the index matrix is $H = T S D^{T}$; wherein T is an orthogonal matrix whose columns are the left singular vectors of the index matrix H; S is a diagonal matrix whose diagonal elements are the singular values of the index matrix H; D is an orthogonal matrix whose columns are the right singular vectors of the index matrix H; and the query vector is Q;
The similarity calculation module specifically includes:
A model revision unit, configured to select the matrices $T_K$, $S_K$ and $D_K$ and revise the latent semantic indexing model to $H_K = T_K S_K D_K^{T}$; wherein $T_K$ is the matrix formed by the first K columns of matrix T, $S_K$ is the diagonal matrix formed by the first K diagonal elements of matrix S, and $D_K$ is the matrix formed by the first K columns of matrix D; the value of K is greater than the largest rank included in the ranking interval;
A calculation unit, configured to, for the index matrix $H_K$ of the revised latent semantic indexing model, calculate the cosine similarity between two row vectors, namely the row vector obtained by multiplying the transpose $Q^{T}$ of the query vector by the matrix $T_K$ and the j-th row vector of the matrix obtained by multiplying the matrix $D_K$ by the matrix $S_K$, and take it as the cosine similarity between the j-th column vector of the index matrix $H_K$ and the query vector Q.
12. The device for searching for video programs according to claim 7, characterized in that the device also includes:
A model update module, configured to, when a description document describing a new video program is added to the database, update the latent semantic indexing model corresponding to the video category to which the new video program belongs.
CN201611019485.4A 2016-11-18 2016-11-18 Video program searching method and device Active CN106708929B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201611019485.4A CN106708929B (en) 2016-11-18 2016-11-18 Video program searching method and device
PCT/CN2016/113642 WO2018090468A1 (en) 2016-11-18 2016-12-30 Method and device for searching for video program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611019485.4A CN106708929B (en) 2016-11-18 2016-11-18 Video program searching method and device

Publications (2)

Publication Number Publication Date
CN106708929A true CN106708929A (en) 2017-05-24
CN106708929B CN106708929B (en) 2020-06-26

Family

ID=58939942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611019485.4A Active CN106708929B (en) 2016-11-18 2016-11-18 Video program searching method and device

Country Status (2)

Country Link
CN (1) CN106708929B (en)
WO (1) WO2018090468A1 (en)

Also Published As

Publication number Publication date
CN106708929B (en) 2020-06-26
WO2018090468A1 (en) 2018-05-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant