CN103491205A - Related resource address push method and device based on video retrieval
- Publication number: CN103491205A (application number CN201310462461.6A / CN201310462461A)
- Authority: CN (China)
- Prior art keywords: participles, resource data, video resource, video, text information
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and device for pushing associated resource addresses based on video retrieval. The method comprises: when a loading or playing request for first video resource data is received, acquiring feature text information of the first video resource data; mapping the feature text information into one or more first participles; searching for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold, the co-occurrence rate being the probability that the current one or more first participles and a second participle appear together in the same video resource data; acquiring network link addresses of second video resource data matched with the one or more first participles and the associated second participles; and pushing the network link addresses of the second video resource data. The invention achieves deep mining of high-quality resources in a video library and improves the efficiency of resource mining. In addition, the index table can keep growing as Internet video content accumulates, which helps to enlarge the recall rate.
Description
Technical Field
The invention relates to the technical field of the Internet, and in particular to a method and a device for pushing an associated resource address based on video search.
Background
A video search engine is a vertical search technology, distinct from comprehensive (general-purpose) search. A video search engine crawls the video results on the Internet and builds an index; because it can provide searchers with pure video results, it greatly reduces the time netizens spend searching for videos.
Statistics on video search show that videos of entertainment, games, movies, news, animation and similar categories are the main objects users search for, which indicates that video search itself largely serves entertainment needs. Users often do not have a strong, fixed purpose: the results need not be exactly what was queried, and can be extended to some degree as long as the target falls within the user's favorite category. Therefore, relevant recommendations are often made to the user in addition to the search results.
However, existing video search engines are deficient in related recommendations: some video search engines offer no relevant recommendations at all, and those that do can produce them only in simple ways, such as obtaining a correlation system by manually sorting users' search history data. Such a recommendation system is based on users' existing search habits, so its recall rate is low; in addition, the range users search over is generally much smaller than the range of resources on the Internet, so high-quality videos on the Internet cannot be fully mined.
Another search recommendation method uses, in the recommendation system, a resource association system obtained by manual sorting or from other knowledge systems. For example, when a certain search engine is queried with "square dance", it returns recommended words such as "ballroom dance", "belly dance" and "aerobics"; when it is queried with "dota", it returns recommended words such as "CrossFire" and "World of Warcraft". However, the recall rate of such a system is low, and it generally cannot give recommendations for long-tail searches.
Disclosure of Invention
In view of the above problems, the present invention is proposed to provide a method for pushing an associated resource address based on video search and a corresponding device for pushing an associated resource address based on video search, which overcome or at least partially solve the above problems.
According to an aspect of the present invention, there is provided a method for pushing an associated resource address based on video search, including:
when a loading or playing request of first video resource data is received, acquiring feature text information of the first video resource data;
mapping the characteristic text information into one or more first word segmentations;
searching for associated second participles with the co-occurrence rate of the one or more first participles higher than a preset threshold; the co-occurrence rate is the probability that one or more first participles and one or more second participles commonly occur in the same video resource data;
acquiring a network link address of second video resource data matched with the one or more first participles and the associated second participles;
and pushing the network link address of the second video resource data.
Optionally, when a request for loading or playing first video resource data is received, the step of obtaining the feature text information of the first video resource data includes:
when a playing request of first video data is received, receiving feature text information of the first video resource data sent by a current terminal;
or,
when a first video data loading request is received, extracting local preset feature text information of the video resource data.
Optionally, the step of mapping the characteristic text information into one or more first participles includes:
extracting a word segmentation mapped by the characteristic text information;
or,
when the received characteristic text information is a compound word, splitting the characteristic text information into a plurality of search subwords; extracting a plurality of segmentation words mapped by the plurality of search sub-words.
Optionally, the step of searching for associated second word segments with a co-occurrence rate higher than a preset threshold with the one or more first word segments comprises:
when the characteristic text information is mapped into a first word segmentation, extracting a preset index table corresponding to the first word segmentation; the index table comprises information of video resource data to which the first word belongs and all words in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and participling the characteristic text information;
calculating the co-occurrence rate of the first participle and each second participle in the index table, wherein the co-occurrence rate is the ratio of the occurrence frequency of each second participle in the index table to the total number of information of the video resource data in the index table; the second participles are participles except the first participles in all participles in the video resource data;
and extracting the second participles with the co-occurrence rate higher than a preset threshold value as associated second participles.
Optionally, the step of searching for associated second word segments with a co-occurrence rate higher than a preset threshold with the one or more first word segments comprises:
when the characteristic text information is mapped into a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the plurality of first participles; each index table comprises information of video resource data to which the first word belongs and all words in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and participling the characteristic text information;
extracting a second participle which commonly appears with the plurality of first participles as a candidate participle; the second participles are participles except the first participles in all participles in the video resource data;
respectively calculating the co-occurrence rate of the first participle and the candidate participle in each index table, wherein the co-occurrence rate is the ratio of the occurrence frequency of the candidate participle in the index table to the total number of information of the video resource data in the index table;
configuring a plurality of corresponding weights for the co-occurrence rates of the plurality of first participles and the candidate participles respectively;
calculating an average value of a plurality of co-occurrence rates configured with weights as the co-occurrence rates of the plurality of first participles and the candidate participles;
and extracting the candidate participles with the co-occurrence rate higher than a preset threshold value as associated second participles.
Optionally, the step of searching for associated second word segments with a co-occurrence rate higher than a preset threshold with the one or more first word segments comprises:
when the characteristic text information is mapped into a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the plurality of first participles; each index table comprises information of video resource data to which the first word belongs and all words in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and participling the characteristic text information;
determining a main participle by adopting the plurality of index tables, wherein the main participle is a first participle corresponding to the index table with the largest total number of information of the video resource data;
calculating the co-occurrence rate of the main participle and each second participle in the index table corresponding to the main participle, wherein the co-occurrence rate is the ratio of the occurrence frequency of each second participle in the index table to the total information number of the video resource data in the index table; the second participles are participles except the first participles in all participles in the video resource data;
and extracting the second participles with the co-occurrence rate higher than a preset threshold value as associated second participles.
Optionally, the characteristic text information includes a video title, a video keyword, and/or a video description.
Optionally, the step of obtaining the network link address of the second video resource data matched with the one or more first participles and the associated second participles includes:
and acquiring the network link address of the second video resource data matched with the main participle and the associated second participle.
According to another aspect of the present invention, there is provided a device for pushing an associated resource address based on video search, including:
the device comprises a characteristic text information acquisition module, a characteristic text information acquisition module and a characteristic text information processing module, wherein the characteristic text information acquisition module is suitable for acquiring characteristic text information of first video resource data when a loading or playing request of the first video resource data is received;
a first segmentation mapping module adapted to map the characteristic text information into one or more first segmentations;
the second word segmentation searching module is suitable for searching for associated second words with the co-occurrence rate of the one or more first words higher than a preset threshold value; the co-occurrence rate is the probability that one or more first participles and one or more second participles commonly occur in the same video resource data;
the network connection address acquisition module is suitable for acquiring a network link address of second video resource data matched with the one or more first participles and the associated second participles;
and the network connection address pushing module is suitable for pushing the network link address of the second video resource data.
Optionally, the characteristic text information obtaining module is further adapted to:
when a playing request of first video data is received, receiving feature text information of the first video resource data sent by a current terminal;
or,
when a first video data loading request is received, extracting local preset feature text information of the video resource data.
Optionally, the first segmentation mapping module is further adapted to:
extracting a word segmentation mapped by the characteristic text information;
or,
when the received characteristic text information is a compound word, splitting the characteristic text information into a plurality of search subwords; extracting a plurality of segmentation words mapped by the plurality of search sub-words.
Optionally, the second participle lookup module is further adapted to:
when the characteristic text information is mapped into a first word segmentation, extracting a preset index table corresponding to the first word segmentation; the index table comprises information of video resource data to which the first word belongs and all words in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and participling the characteristic text information;
calculating the co-occurrence rate of the first participle and each second participle in the index table, wherein the co-occurrence rate is the ratio of the occurrence frequency of each second participle in the index table to the total number of information of the video resource data in the index table; the second participles are participles except the first participles in all participles in the video resource data;
and extracting the second participles with the co-occurrence rate higher than a preset threshold value as associated second participles.
Optionally, the second participle lookup module is further adapted to:
when the characteristic text information is mapped into a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the plurality of first participles; each index table comprises information of video resource data to which the first word belongs and all words in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and participling the characteristic text information;
extracting a second participle which commonly appears with the plurality of first participles as a candidate participle; the second participles are participles except the first participles in all participles in the video resource data;
respectively calculating the co-occurrence rate of the first participle and the candidate participle in each index table, wherein the co-occurrence rate is the ratio of the occurrence frequency of the candidate participle in the index table to the total number of information of the video resource data in the index table;
configuring a plurality of corresponding weights for the co-occurrence rates of the plurality of first participles and the candidate participles respectively;
calculating an average value of a plurality of co-occurrence rates configured with weights as the co-occurrence rates of the plurality of first participles and the candidate participles;
and extracting the candidate participles with the co-occurrence rate higher than a preset threshold value as associated second participles.
Optionally, the second participle lookup module is further adapted to:
when the characteristic text information is mapped into a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the plurality of first participles; each index table comprises information of video resource data to which the first word belongs and all words in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and participling the characteristic text information;
determining a main participle by adopting the plurality of index tables, wherein the main participle is a first participle corresponding to the index table with the largest total number of information of the video resource data;
calculating the co-occurrence rate of the main participle and each second participle in the index table corresponding to the main participle, wherein the co-occurrence rate is the ratio of the occurrence frequency of each second participle in the index table to the total information number of the video resource data in the index table; the second participles are participles except the first participles in all participles in the video resource data;
and extracting the second participles with the co-occurrence rate higher than a preset threshold value as associated second participles.
Optionally, the characteristic text information includes a video title, a video keyword, and/or a video description.
Optionally, the network connection address obtaining module is further adapted to:
and acquiring the network link address of the second video resource data matched with the main participle and the associated second participle.
The method can push recommendations according to already-published content, so that the search engine no longer depends on users' search habits and can push video resource data that few users search for but for which the video library holds many related resources, thereby achieving deep mining of high-quality resources in the video library and improving the efficiency of resource mining. In addition, the index table can keep growing as Internet video content accumulates; the quantity and breadth of content produced by the major video sites can far exceed the number of words users search for, which helps enlarge the recall rate.
By obtaining the network link address of the second video resource data matched with the first participle and the second participle, the user can obtain the video data resource directly from that address; a simple search thus yields more results without the search having to be submitted many times, which reduces the load on the server, reduces the occupation of network resources, and improves the user experience.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart illustrating steps of an embodiment of a method for pushing an associated resource address based on a video search according to an embodiment of the present invention; and
fig. 2 is a block diagram illustrating an embodiment of a pushing apparatus for pushing an associated resource address based on video search according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Referring to fig. 1, a flowchart illustrating the steps of an embodiment of a method for pushing an associated resource address based on video search according to an embodiment of the present invention is shown, which may specifically include the following steps:
101, when a loading or playing request of first video resource data is received, acquiring feature text information of the first video resource data;
It should be noted that the first video resource data may be located on the terminal device or on the network, and the characteristic text information may be information carried by the video resource data.
In a preferred embodiment of the present invention, the step 101 may specifically include the following sub-steps:
substep S11, when receiving a play request of first video data, receiving feature text information of the first video resource data sent by a current terminal;
when the first video resource data is located on the terminal device, the terminal device may extract the characteristic text information of the first video resource data, and then upload the characteristic text information to the corresponding server side.
Or,
and a substep S12, extracting locally preset characteristic text information of the video resource data when receiving the first video data loading request.
When the first video resource data is located on the network, the characteristic text information of the first video resource data can be extracted by the server side.
In a preferred embodiment of the invention, the characteristic text information may comprise a video title, a video keyword and/or a video description.
For example, for a piece of video resource data named "Paike: Dongguan turns into Venice after a rainstorm, thousands of cars flooded and stalled - online playing - XX net - watch the video in high definition online", the characteristic text information may be as follows:
Video Title (Title): Dongguan turns into Venice after a rainstorm, thousands of cars flooded and stalled; online playing; XX net; watch the video in high definition online;
Video Keywords (Keywords): YY records; Dongguan flooded streets; life information;
Video Description (Description): Yesterday morning, a rainstorm made the streets in part of Dongguan feel as if they had instantly turned into Venice. Moving cars were flooded and stalled in the heavy rain, and some streets and houses were also submerged in a vast expanse of water.
In practical application, the characteristic text information may be a single word, that is, one word with independent semantics, such as "Mid-Autumn", "Dragon Boat Festival", "National Day", etc.; the characteristic text information may also be a compound word, that is, two or more words each with independent semantics, such as "Mid-Autumn moon cake", "Dragon Boat Festival rice dumpling", "National Day celebration", "Tibet travel", and the like. Generally, video resource data on terminal equipment often has only a video title (Title), such as the movie titles "Iron Man", "Spider-Man", and the like; video resource data on the network often includes one or more of a video title (Title), video Keywords (Keywords), and a video Description (Description).
102, mapping the characteristic text information into one or more first participles;
It should be noted that the mapped participles may be preset, and may be used to calculate the co-occurrence rate between different participles.
The mapping rules can be one or more preset rules. They can include removing words with no practical meaning from the video search text, such as dirty words, modifiers, auxiliary words and overly generic words; they can include setting stop words, i.e. some very common words that serve as break points when splitting a phrase, e.g. "is", "I", "you", etc.; they can also include association mapping, in which several expressions for the same thing are mapped to one expression, for example "August 15th", "Mid-Autumn Festival", "Moon Cake Festival" and the like are all associated with "Mid-Autumn"; other mapping rules may also be included, which is not limited in the embodiment of the present invention.
In English the word is the basic unit and words are separated by spaces, while in Chinese the character is the basic unit and all the characters of a sentence are written consecutively to express a meaning. For example, the English sentence "I am a student" is, in Chinese, "我是一个学生". Thanks to the spaces, a computer can easily tell that "student" is one word, but it cannot easily understand that the two characters "学" ("learn") and "生" ("give birth to") together form the single word "学生" ("student"). Cutting a sequence of Chinese characters into meaningful words is Chinese word segmentation. For example, "I am a student" is segmented into: I / am / a / student (我 / 是 / 一个 / 学生).
Some common word segmentation methods are presented below:
1. The word segmentation method based on string matching: the Chinese character string to be analyzed is matched against the entries of a preset machine dictionary according to a certain strategy; if a string is found in the dictionary, the match succeeds (a word is recognized). In practical word segmentation systems, such mechanical segmentation is used as the initial segmentation step, and other kinds of linguistic information are then used to further improve the accuracy of the segmentation (a minimal sketch of this method is given after this list).
2. The word segmentation method based on feature scanning or mark segmentation comprises the following steps: the method is characterized in that some words with obvious characteristics are preferentially identified and segmented in a character string to be analyzed, the words are used as breakpoints, an original character string can be segmented into smaller strings, and then mechanical segmentation is carried out, so that the matching error rate is reduced; or combining word segmentation and part of speech tagging, providing help for word decision by utilizing rich part of speech information, and detecting and adjusting word segmentation results in the tagging process, thereby improving the segmentation accuracy.
3. Understanding-based word segmentation method: the method is to enable a computer to simulate the understanding of sentences by a human so as to achieve the effect of recognizing words. The basic idea is to analyze syntax and semantics while segmenting words, and to process ambiguity phenomenon by using syntax information and semantic information. It generally comprises three parts: word segmentation subsystem, syntax semantic subsystem, and master control part. Under the coordination of the master control part, the word segmentation subsystem can obtain syntactic and semantic information of related words, sentences and the like to judge word segmentation ambiguity, namely the word segmentation subsystem simulates the process of understanding sentences by people. This word segmentation method requires the use of a large amount of linguistic knowledge and information.
4. The word segmentation method based on statistics: the frequency or probability with which characters co-occur adjacently in Chinese text reflects how likely they are to form a word. Therefore, the frequency of adjacent character combinations in a Chinese corpus can be counted and their co-occurrence information calculated, for example the probability that the two Chinese characters X and Y appear adjacently. This mutual information reflects how closely the characters are bound to each other; when the degree of binding is above a certain threshold, the character group is considered to possibly constitute a word. This method only needs to count character-group frequencies in the corpus and does not need a segmentation dictionary.
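As an illustration of the string-matching method in item 1 above, the following is a minimal forward-maximum-matching sketch in Python. The dictionary and the sample sentence are hypothetical and chosen only to reproduce the "I am a student" example; a real system would use a full machine dictionary and further disambiguation.

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Greedy forward maximum matching: at each position, take the longest
    dictionary word that matches; otherwise emit a single character."""
    words = []
    i = 0
    while i < len(text):
        matched = None
        # try the longest candidate first
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                matched = candidate
                break
        if matched is None:          # unknown character: keep it as a single-char word
            matched = text[i]
        words.append(matched)
        i += len(matched)
    return words

# Hypothetical mini-dictionary for the sentence "我是一个学生" ("I am a student")
dictionary = {"我", "是", "一个", "学生"}
print(forward_max_match("我是一个学生", dictionary))   # ['我', '是', '一个', '学生']
```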
In a preferred embodiment of the present invention, the step 102 may specifically include the following sub-steps:
substep S21, extracting a word segment mapped by the characteristic text information;
For the case where the characteristic text information is a single word, the corresponding participle can be directly extracted according to a preset mapping rule. For example, if the characteristic text information is "Mid-Autumn", "my Mid-Autumn" or "Mid-Autumn Festival", the mapped first participle may be "Mid-Autumn". Of course, the characteristic text information and the first participle it maps to may also be the same word; for example, if the characteristic text information is "Mid-Autumn", the mapped first participle may also be "Mid-Autumn".
Or,
substep S22, when the received characteristic text information is a compound word, splitting the characteristic text information into a plurality of search subwords;
and a substep S23 of extracting a plurality of participles mapped by the plurality of search subwords.
For the case where the characteristic text information is a compound word, it can be segmented according to a preset mapping rule to obtain search sub-words, and the participle corresponding to each search sub-word is then extracted. For example, if the received characteristic text information is "moon cake in mid-autumn festival", it can be split into two search sub-words, "mid-autumn festival" and "moon cake"; "mid-autumn festival" is then mapped to "mid-autumn" and "moon cake" is mapped to "moon cake", so that two first participles, "mid-autumn" and "moon cake", are obtained.
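The following is a minimal sketch of sub-steps S21-S23 under assumed mapping rules. The stop-word list, the association table and the pre-segmented sub-words are all hypothetical illustrations of the rules described above (removing meaningless words and mapping several expressions of one thing onto one participle), not the patented implementation.

```python
# Hypothetical mapping rules: stop words to drop, plus an association table that
# maps several expressions of the same thing onto one participle.
STOP_WORDS = {"my", "the", "of", "in"}
ASSOCIATIONS = {"mid-autumn festival": "mid-autumn",
                "moon cake festival": "mid-autumn",
                "august 15th": "mid-autumn"}

def map_to_first_participles(search_sub_words):
    """Map the search sub-words produced by segmentation to first participles
    (sub-steps S21-S23): drop stop words, then apply the association table."""
    participles = []
    for sub_word in search_sub_words:
        sub_word = sub_word.lower()
        if sub_word in STOP_WORDS:           # mapping rule: remove meaningless words
            continue
        participles.append(ASSOCIATIONS.get(sub_word, sub_word))  # association mapping
    return participles

# The compound query "moon cake in mid-autumn festival", already segmented:
print(map_to_first_participles(["moon cake", "in", "mid-autumn festival"]))
# ['moon cake', 'mid-autumn']
```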
103, searching for associated second participles with the co-occurrence rate of the one or more first participles higher than a preset threshold;
the co-occurrence rate is the probability that one or more first participles and one or more second participles commonly occur in the same video resource data;
Specifically, the co-occurrence rate may be the probability that the current one or more first participles and a second participle appear together in the feature text information of the same video resource data; it covers both the co-occurrence rate of a single first participle with a second participle and the co-occurrence rate of a plurality of first participles with a second participle.
It should be noted that the second participle may be a participle other than the first participle in all the preset participles. The associated second participle may be a second participle having a co-occurrence rate with the first participle above a preset threshold.
In practical applications, the video resource data may include characteristic text information, and the characteristic text information may be used to record related information of the video resource data, and may also be used to extract word segmentation.
In a preferred embodiment of the present invention, the step 103 may specifically include the following sub-steps:
substep S31, when the characteristic text information is mapped to a first word, extracting a preset index table corresponding to the first word; the index table comprises information of video resource data to which the first word belongs and all words in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and participling the characteristic text information;
in a specific implementation, a search engine can be adopted in advance to capture video resource data on each website platform through a crawler, and then an index library is established: extracting characteristic text information of the video resource data to perform word segmentation processing, and establishing an index table corresponding to each word segmentation, wherein the index table can store information of the video resource data (which can be ID, intranet address, extranet address and other video identifiers, and can also be a record formed by current word segmentation and other word segmentation), and all words in the video resource data (including a first word segmentation and a second word segmentation except the first word segmentation).
In a preferred embodiment of the invention, the characteristic text information may comprise a video title, a video keyword and/or a video description.
For example, an index table for "mid-autumn" may record, for each piece of video resource data containing the participle "mid-autumn", the video identifier together with all the participles of that video resource data.
In such a table the first participle is "mid-autumn", and the information of the video resource data comprises a video identifier. Of course, the information of the video resource data may not include the video identifier, but only the records formed by the first participle and the second participles (i.e. each line of second participles is used as one record).
Of course, the above index table is only used as an example, and when implementing the embodiment of the present invention, other index tables may be set according to actual situations, which is not limited in the embodiment of the present invention. In addition, besides the above index table, a person skilled in the art may also use other index tables according to actual needs, and the embodiment of the present invention is not limited to this.
It should be noted that, the video resource data on each platform can be captured periodically or at irregular time, and then the index database is updated, that is, each index table is updated.
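A minimal sketch of how the index library described above might be organized is given below. The crawled records, identifiers and field layout are assumptions: each index table simply maps a participle to the list of video records (video identifier plus the full participle set) in which that participle occurs.

```python
from collections import defaultdict

def build_index_library(crawled_videos):
    """crawled_videos: iterable of (video_id, participles) pairs, where the participles
    were obtained by segmenting the feature text information of each crawled video.
    Returns one index table per participle: participle -> list of (video_id, participle_set)."""
    index_library = defaultdict(list)
    for video_id, participles in crawled_videos:
        for participle in set(participles):
            index_library[participle].append((video_id, set(participles)))
    return index_library

# Hypothetical crawl results
crawled = [
    ("v1", ["mid-autumn", "moon", "moon cake"]),
    ("v2", ["mid-autumn", "moon"]),
    ("v3", ["moon cake", "recipe"]),
]
index_library = build_index_library(crawled)
print(len(index_library["mid-autumn"]))   # 2 video records in the "mid-autumn" index table
```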
Substep S32, calculating a co-occurrence rate of the first participle and each second participle in the index table, where the co-occurrence rate is a ratio of the occurrence frequency of each second participle in the index table to the total number of information of the video resource data in the index table; the second participles are participles except the first participles in all participles in the video resource data;
since the number of occurrences of each second participle in the index table is the same as the number of video data to which the second participle belongs, the co-occurrence rate can also be expressed as the ratio of the number of video data to which each second participle belongs in the index table to the total number of information of the video resource data in the index table.
For example, there are 100 pieces of information of video resource data in total in the index table of the divisional word "square dance", and there are 200 pieces of information of video resource data in total in the index table of the divisional word "solidago", and 10 pieces of information of video resource data in which "square dance" and "solidago" appear simultaneously in the two index tables, the co-occurrence rate of "square dance" and "solidago" is 10/100=10% for "square dance", and is 10/200=5% for "solidago".
And a substep S33, extracting the second participle with the co-occurrence rate higher than a preset threshold value as the associated second participle.
In a specific implementation, the preset threshold may be set by a person skilled in the art according to an actual situation, and the embodiment of the present invention is not limited thereto. The associated second sub-words extracted in the embodiment of the present invention may be null, or may be one or more.
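A minimal sketch of sub-steps S32-S33 under the index-table layout assumed above: the co-occurrence rate of each second participle is its number of occurrences in the first participle's index table divided by the total number of video records in that table, and participles above the threshold are returned. The figures reproduce the "square dance"/"solidago" example (10 co-occurrences out of 100 records, i.e. 10%).

```python
def associated_second_participles(index_table, first_participle, threshold):
    """index_table: list of (video_id, participle_set) records for the first participle.
    The co-occurrence rate of each second participle is its number of occurrences in the
    table divided by the total number of video records in the table (sub-step S32);
    participles whose rate exceeds the threshold are returned (sub-step S33)."""
    total = len(index_table)
    counts = {}
    for _, participles in index_table:
        for p in participles:
            if p != first_participle:                 # second participles only
                counts[p] = counts.get(p, 0) + 1
    return {p: n / total for p, n in counts.items() if n / total > threshold}

# Hypothetical index table for the first participle "square dance": 100 records in total,
# 10 of which also contain "solidago", giving a co-occurrence rate of 10/100 = 10%.
table = [("v%d" % i, {"square dance", "solidago"}) for i in range(10)] + \
        [("v%d" % i, {"square dance"}) for i in range(10, 100)]
print(associated_second_participles(table, "square dance", 0.05))   # {'solidago': 0.1}
```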
In a preferred embodiment of the present invention, the step 103 may specifically include the following sub-steps:
a substep S41, when the characteristic text information is mapped into a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the plurality of first participles; each index table comprises information of video resource data to which the first word belongs and all words in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and participling the characteristic text information;
in a specific implementation, a search engine can be adopted in advance to capture video resource data on each platform through a crawler, and then an index is established to build a library: extracting characteristic text information of the video resource data to perform word segmentation processing, and establishing an index table corresponding to each word segmentation, wherein the index table can store information of the video resource data (which can be ID, intranet address, extranet address and other video identifiers, and can also be a record formed by current word segmentation and other word segmentation), and all words in the video resource data (including a first word segmentation and a second word segmentation except the first word segmentation).
In a preferred embodiment of the invention, the characteristic text information may comprise a video title, a video keyword and/or a video description.
A substep S42 of extracting a second participle that co-occurs with the plurality of first participles as a candidate participle; the second participles are participles except the first participles in all participles in the video resource data;
specifically, there are a plurality of first participles, that is, there are a plurality of corresponding index tables, and the candidate participles need to appear in each index table, that is, the candidate participles and the current first participles all appear in the same index table.
Substep S43, calculating a co-occurrence rate of the first participle and the candidate participle in each index table, where the co-occurrence rate is a ratio of the number of occurrences of the candidate participle in the index table to the total number of information of the video resource data in the index table;
For example, the characteristic text information "moon cake in mid-autumn festival" may be mapped into the first participles "mid-autumn" and "moon cake", and one of the extracted candidate participles is "moon"; the co-occurrence rates of "mid-autumn" with "moon" (assumed to be 70%) and of "moon cake" with "moon" (assumed to be 60%) can then be calculated respectively.
Substep S44, configuring a plurality of weights corresponding to the co-occurrence rates of the first participles and the candidate participles respectively;
the weight can be determined according to the proportion of the total number of the video resource data in the index table of each first participle, wherein the weight is larger when the total number of the video resource data in the index table is larger. For example, the total number of information of the video resource data in the index table of "mid-autumn" is 900, and the total number of information of the video resource data in the index table of "moon cake" is 100, the weight of the co-occurrence rate of "mid-autumn" and "moon" may be 0.9, and the weight of the co-occurrence rate of "moon cake" and "moon" may be 0.1.
Of course, the above weights are only examples, and when implementing the embodiment of the present invention, other weights may be set according to actual situations, for example, a corresponding weight is set according to a current social hotspot (news ranking, microblog ranking, and the like), a corresponding weight is set according to a local and/or online operation behavior of a user (video playing, news reading, and the like), and the like, which is not limited in this embodiment of the present invention. In addition, besides the above-mentioned weights, those skilled in the art may also adopt other weights according to actual needs, and the embodiment of the present invention is not limited to this.
A substep S45 of calculating an average value of the co-occurrence rates with weights as the co-occurrence rates of the first participles and the candidate participles;
in the embodiment of the present invention, a weighted average of a plurality of co-occurrence rates may be used as the final co-occurrence rate.
For example, the co-occurrence of mid-autumn "," moon cake "and" moon "may be (70% × 0.9+60% × 0.1)/2 = 34.5%.
And a substep S46, extracting the candidate participles with the co-occurrence rate higher than a preset threshold value as associated second participles.
In a specific implementation, the preset threshold may be set by a person skilled in the art according to an actual situation, and the embodiment of the present invention is not limited thereto. The associated second sub-words extracted in the embodiment of the present invention may be null, or may be one or more.
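A minimal sketch of sub-steps S42-S46 under the same assumed layout. The weights follow the example above (each first participle's share of the combined record count, here 0.9 and 0.1), and the weighted rates are averaged as in sub-step S45; the hypothetical tables reproduce the 34.5% result for "moon".

```python
def multi_participle_cooccurrence(index_tables, threshold):
    """index_tables: {first_participle: list of (video_id, participle_set)}.
    Only candidates appearing in every index table are kept (sub-step S42); each table's
    co-occurrence rate is weighted by that table's share of the total record count
    (sub-steps S43-S44), the weighted rates are averaged (sub-step S45), and candidates
    above the threshold are returned (sub-step S46)."""
    totals = {fp: len(recs) for fp, recs in index_tables.items()}
    grand_total = sum(totals.values())
    weights = {fp: totals[fp] / grand_total for fp in index_tables}      # e.g. 0.9 and 0.1

    counts = {fp: {} for fp in index_tables}                             # per-table counts
    for fp, recs in index_tables.items():
        for _, participles in recs:
            for p in participles:
                if p not in index_tables:                                # second participles only
                    counts[fp][p] = counts[fp].get(p, 0) + 1

    candidates = set.intersection(*(set(c) for c in counts.values()))    # co-occur with all

    result = {}
    for cand in candidates:
        weighted = [counts[fp][cand] / totals[fp] * weights[fp] for fp in index_tables]
        rate = sum(weighted) / len(index_tables)
        if rate > threshold:
            result[cand] = rate
    return result

# Hypothetical tables reproducing the example in the text: 900 "mid-autumn" records
# (630 of them contain "moon", i.e. 70%) and 100 "moon cake" records (60 contain "moon").
mid_autumn = [("m%d" % i, {"mid-autumn", "moon"} if i < 630 else {"mid-autumn"}) for i in range(900)]
moon_cake = [("c%d" % i, {"moon cake", "moon"} if i < 60 else {"moon cake"}) for i in range(100)]
rates = multi_participle_cooccurrence({"mid-autumn": mid_autumn, "moon cake": moon_cake}, 0.3)
print(rates)   # {'moon': 0.345}, i.e. (70% * 0.9 + 60% * 0.1) / 2 = 34.5% (up to float rounding)
```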
In a preferred embodiment of the present invention, the step 103 may specifically include the following sub-steps:
a substep S51, when the characteristic text information is mapped into a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the plurality of first participles; each index table comprises information of video resource data to which the first word belongs and all words in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and participling the characteristic text information;
in a specific implementation, a search engine can be adopted in advance to capture video resource data on each platform through a crawler, and then an index is established to build a library: extracting characteristic text information of the video resource data to perform word segmentation processing, and establishing an index table corresponding to each word segmentation, wherein the index table can store information of the video resource data (which can be ID, intranet address, extranet address and other video identifiers, and can also be a record formed by current word segmentation and other word segmentation), and all words in the video resource data (including a first word segmentation and a second word segmentation except the first word segmentation).
In a preferred embodiment of the invention, the characteristic text information may comprise a video title, a video keyword and/or a video description.
Substep S52, determining a main participle by using the plurality of index tables, wherein the main participle is a first participle corresponding to the index table with the largest total information amount of the video resource data;
in order to improve user experience, for a plurality of first participles with very different video resource data, the first participles with less total information of the video resource data can be ignored. For example, for the first participles "mid-autumn" and "moon cake" mapped by the characteristic text information "moon cake in mid-autumn," the total number of information of the video resource data in the index table of "mid-autumn" is 900, and the total number of information of the video resource data in the index table of "moon cake" is 100, then "mid-autumn" may be set as the main participle.
Substep S53, calculating a co-occurrence rate of the main participle and each second participle in the index table corresponding to the main participle, where the co-occurrence rate is a ratio of the occurrence frequency of each second participle in the index table to the total number of information of the video resource data in the index table; the second participles are participles except the first participles in all participles in the video resource data;
in the embodiment of the invention, the co-occurrence rate of the main part words can be used as the final co-occurrence rate.
And a substep S54, extracting the second participle with the co-occurrence rate higher than a preset threshold value as the associated second participle.
In a specific implementation, the preset threshold may be set by a person skilled in the art according to an actual situation, and the embodiment of the present invention is not limited thereto. The associated second sub-words extracted in the embodiment of the present invention may be null, or may be one or more.
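A minimal sketch of sub-steps S52-S54 under the same assumed layout: the main participle is the first participle whose index table contains the most video records, and the single-participle co-occurrence computation is then applied to that table only.

```python
def main_participle_cooccurrence(index_tables, threshold):
    """index_tables: {first_participle: list of (video_id, participle_set)}.
    Sub-step S52 picks the first participle with the largest index table as the main
    participle; sub-steps S53-S54 compute co-occurrence rates in that table only."""
    main = max(index_tables, key=lambda fp: len(index_tables[fp]))
    table = index_tables[main]
    counts = {}
    for _, participles in table:
        for p in participles:
            if p not in index_tables:                  # ignore all first participles
                counts[p] = counts.get(p, 0) + 1
    rates = {p: n / len(table) for p, n in counts.items()}
    return main, {p: r for p, r in rates.items() if r > threshold}

# With the hypothetical tables from the previous sketch (900 "mid-autumn" records and
# 100 "moon cake" records), the main participle is "mid-autumn", and "moon" is returned
# with a co-occurrence rate of 630/900 = 70%.
```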
In particular, after sub-step S33, a combination of the current single first participle and one or more associated second participles can be obtained. For example, if the characteristic text information is "dota" and the participles with higher co-occurrence rates are "efficients", "egg pain", "2009", "billows", "first view" and "classic", with co-occurrence rates of 40%, 35%, 30%, 25%, 20% and 10% respectively, the resulting combinations are, in that order, "dota efficients", "dota egg pain", "dota 2009", "dota billows", "dota first view" and "dota classic".
After sub-step S46, a combination of the current plurality of first participles and one or more associated second participles can be obtained. For example, the characteristic text information "square dancing pawn brother" is mapped into the first participles "square dancing" and "pawn brother"; a second participle that appears together with both first participles, for example "teaching", can be taken as an associated second participle, and the combination "square dancing pawn brother teaching" is finally obtained.
104, acquiring a network link address of second video resource data matched with the one or more first participles and the associated second participles;
In a preferred embodiment of the present invention, step 104 may specifically include the following sub-steps:
and a substep S61, obtaining the network link address of the second video resource data of the main participle and the associated second participle.
After sub-step S54, a combination of the current main participle and one or more associated second participles can be obtained. For example, for the first participles "mid-autumn" and "moon cake" mapped by the characteristic text information "mid-autumn moon cake", "mid-autumn" may be set as the main participle, the associated second participle "moon" is obtained, and the combination "mid-autumn moon" is finally obtained.
In the embodiment of the invention, the matched video data resource can be searched based on the combination of one or more first participles and second participles, and when the matched video data resource is searched, the network connection address of the matched video data resource is recorded, wherein the network connection address can be an internal network address or an external network address.
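A minimal sketch of how step 104 might look under the same assumed layout: the combination of participles is used as a query, and the network link address of every crawled video record whose participle set contains all query words is collected. The addresses and records below are hypothetical.

```python
def match_second_video_addresses(video_library, query_participles):
    """video_library: iterable of (network_link_address, participle_set) records built
    when the videos were crawled. Returns the addresses of all records containing every
    participle in the combination (first participle(s) + associated second participle)."""
    query = set(query_participles)
    return [addr for addr, participles in video_library if query <= set(participles)]

# Hypothetical library: the combination "mid-autumn moon" matches the first record only.
library = [
    ("http://video.example.com/v1", {"mid-autumn", "moon", "moon cake"}),
    ("http://video.example.com/v2", {"mid-autumn"}),
]
print(match_second_video_addresses(library, ["mid-autumn", "moon"]))
# ['http://video.example.com/v1']
```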
105, pushing the network link address of the second video resource data.
In practical application, the network link address of the second video resource data may be placed at any position of the current page, or pushed in a manner of embedding an icon or a button, and the user may further load the video data resource by triggering the network link address of the second video resource data.
The method can push recommendations according to already-published content, so that the search engine no longer depends on users' search habits and can push video resource data that few users search for but for which the video library holds many related resources, thereby achieving deep mining of high-quality resources in the video library and improving the efficiency of resource mining. In addition, the index table can keep growing as Internet video content accumulates; the quantity and breadth of content produced by the major video sites can far exceed the number of words users search for, which helps enlarge the recall rate.
By obtaining the network link address of the second video resource data matched with the first participle and the second participle, the user can obtain the video data resource directly from that address; a simple search thus yields more results without the search having to be submitted many times, which reduces the load on the server, reduces the occupation of network resources, and improves the user experience.
For simplicity of explanation, the method embodiments are described as a series of acts or combinations, but those skilled in the art will appreciate that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently with other steps in accordance with the embodiments of the invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 2, a block diagram of a push apparatus for an associated resource address based on video search according to an embodiment of the present invention is shown, which may specifically include the following modules:
the feature text information obtaining module 201 is adapted to obtain feature text information of first video resource data when a loading or playing request of the first video resource data is received;
a first segmentation mapping module 202 adapted to map the characteristic text information into one or more first segmentations;
the second word segmentation searching module 203 is suitable for searching for associated second words with the co-occurrence rate of the one or more first words higher than a preset threshold; the co-occurrence rate is the probability that one or more first participles and one or more second participles commonly occur in the same video resource data;
a network connection address obtaining module 204, adapted to obtain a network link address of the second video resource data matching the one or more first participles and the associated second participles;
a network connection address pushing module 205, adapted to push the network connection address of the second video resource data.
In a preferred embodiment of the present invention, the characteristic text information obtaining module may be further adapted to:
when a playing request of first video data is received, receiving feature text information of the first video resource data sent by a current terminal;
or,
when a first video data loading request is received, extracting local preset feature text information of the video resource data.
In a preferred embodiment of the present invention, the first segmentation mapping module may be further adapted to:
extracting a word segmentation mapped by the characteristic text information;
or,
when the received characteristic text information is a compound word, splitting the characteristic text information into a plurality of search subwords; extracting a plurality of segmentation words mapped by the plurality of search sub-words.
In a preferred embodiment of the present invention, the second participle lookup module may be further adapted to:
when the characteristic text information is mapped into a first word segmentation, extracting a preset index table corresponding to the first word segmentation; the index table comprises information of video resource data to which the first word belongs and all words in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and participling the characteristic text information;
calculating the co-occurrence rate of the first participle and each second participle in the index table, wherein the co-occurrence rate is the ratio of the occurrence frequency of each second participle in the index table to the total number of information of the video resource data in the index table; the second participles are participles except the first participles in all participles in the video resource data;
and extracting the second participles with the co-occurrence rate higher than a preset threshold value as associated second participles.
In a preferred embodiment of the present invention, the second participle lookup module may be further adapted to:
when the characteristic text information is mapped into a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the plurality of first participles; each index table comprises information of video resource data to which the first word belongs and all words in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and participling the characteristic text information;
extracting a second participle which commonly appears with the plurality of first participles as a candidate participle; the second participles are participles except the first participles in all participles in the video resource data;
respectively calculating the co-occurrence rate of the first participle and the candidate participle in each index table, wherein the co-occurrence rate is the ratio of the occurrence frequency of the candidate participle in the index table to the total number of information of the video resource data in the index table;
configuring a plurality of corresponding weights for the co-occurrence rates of the plurality of first participles and the candidate participles respectively;
calculating an average value of a plurality of co-occurrence rates configured with weights as the co-occurrence rates of the plurality of first participles and the candidate participles;
and extracting the candidate participles with the co-occurrence rate higher than a preset threshold value as associated second participles.
In a preferred embodiment of the present invention, the second participle lookup module may be further adapted to:
when the characteristic text information is mapped into a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the plurality of first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and performing word segmentation on the characteristic text information;
determining a main participle from the plurality of index tables, wherein the main participle is the first participle corresponding to the index table with the largest total number of video resource data entries;
calculating the co-occurrence rate of the main participle and each second participle in the index table corresponding to the main participle, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participles;
and extracting the second participles whose co-occurrence rate is higher than a preset threshold value as the associated second participles.
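The main-participle variant only examines one of the index tables. A minimal sketch, again under the hypothetical table layout assumed above:

```python
from collections import Counter

def associated_via_main_participle(firsts, index_tables, threshold):
    """Main-participle variant (illustrative sketch, not the patented implementation).

    Only the index table of the first participle with the most video resource entries
    (the main participle) is examined, trading some coverage for a single pass.
    """
    # the main participle is the one whose index table has the largest number of entries
    main_i = max(range(len(firsts)), key=lambda i: len(index_tables[i]))
    main, table = firsts[main_i], index_tables[main_i]
    total = len(table)
    counts = Counter(p for entry in table for p in entry if p not in firsts)
    rates = {p: n / total for p, n in counts.items()}
    return main, {p: r for p, r in rates.items() if r > threshold}
```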
In a preferred embodiment of the invention, the characteristic text information may comprise a video title, a video keyword and/or a video description.
In a preferred embodiment of the present invention, the network connection address acquisition module may be further adapted to:
acquiring the network link address of the second video resource data matched with the main participle and the associated second participles.
Since the device embodiment is substantially similar to the method embodiment, its description is relatively brief; for the relevant details, refer to the corresponding parts of the description of the method embodiment.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components of the video-search-based associated resource address push device according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any ordering; these words may be interpreted as names.
The invention discloses A1, a method for pushing associated resource addresses based on video search, comprising the following steps:
when a loading or playing request of first video resource data is received, acquiring feature text information of the first video resource data;
mapping the characteristic text information into one or more first participles;
searching for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold; the co-occurrence rate is the probability that the one or more first participles and a second participle occur together in the same video resource data;
acquiring a network link address of second video resource data matched with the one or more first participles and the associated second participles;
and pushing the network link address of the second video resource data.
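Reading the five steps of A1 together, a compact end-to-end sketch may help. Everything here (the in-memory index, the video library, the trivial splitter, and the threshold) is a hypothetical stand-in chosen so the flow can actually be run; it is not the patented implementation.

```python
# Toy stand-ins for the index and the video library (assumptions, not the patent's data model).
INDEX = {   # participle -> one participle set per video resource containing it
    "goldfish": [{"goldfish", "aquarium", "feeding"}, {"goldfish", "aquarium"}],
}
LIBRARY = { # participle -> network link addresses of matching second video resources
    "goldfish": ["http://example.com/v/1"],
    "aquarium": ["http://example.com/v/2"],
}

def push_related_addresses(feature_text, threshold=0.5):
    """One possible end-to-end shape of the five steps: segment, associate, match, push."""
    firsts = feature_text.lower().split()                  # step 2: map text to first participles
    seconds = []
    for first in firsts:                                    # step 3: associated second participles
        table = INDEX.get(first, [])
        counts = {}
        for entry in table:
            for p in entry - {first}:
                counts[p] = counts.get(p, 0) + 1
        seconds += [p for p, n in counts.items() if n / len(table) > threshold]
    links = []
    for word in firsts + seconds:                           # step 4: link addresses of matching videos
        links += LIBRARY.get(word, [])
    return links                                            # step 5: the addresses to be pushed

print(push_related_addresses("Goldfish"))                   # ['http://example.com/v/1', 'http://example.com/v/2']
```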
A2, the method as in A1, wherein the step of obtaining the feature text information of the first video resource data when receiving a request for loading or playing the first video resource data comprises:
when a playing request of the first video resource data is received, receiving the feature text information of the first video resource data sent by the current terminal;
or,
when a loading request of the first video resource data is received, extracting locally preset feature text information of the video resource data.
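The two acquisition paths reduce to a small branch on the request type. A rough sketch follows; the request shape and the local store are hypothetical placeholders, not structures from the patent.

```python
def get_feature_text(request, local_store):
    """Acquire the characteristic text of the first video resource (illustrative only)."""
    if request["type"] == "play":
        # a playing request: the current terminal sends the feature text along with it
        return request["feature_text"]
    if request["type"] == "load":
        # a loading request: read the locally preset feature text for the video resource
        return local_store[request["video_id"]]
    raise ValueError("unsupported request type")

local_store = {"v42": "goldfish aquarium feeding"}
print(get_feature_text({"type": "load", "video_id": "v42"}, local_store))   # 'goldfish aquarium feeding'
```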
A3, the method of A1, the step of mapping the characteristic text information to one or more first participles comprising:
extracting the single participle mapped from the characteristic text information;
or,
when the received characteristic text information is a compound word, splitting the characteristic text information into a plurality of search subwords, and extracting the plurality of participles mapped from the plurality of search subwords.
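When the characteristic text is a compound word, it is first split into search subwords. The greedy longest-match splitter below is only one way to illustrate this, since the text does not prescribe a particular segmentation algorithm, and the vocabulary is made up.

```python
def split_compound(text, vocabulary):
    """Greedy longest-match split of a compound query into known search subwords (illustrative)."""
    subwords, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):     # try the longest remaining candidate first
            if text[i:j] in vocabulary:
                subwords.append(text[i:j])
                i = j
                break
        else:
            i += 1                            # skip a character with no vocabulary match
    return subwords

vocab = {"funny", "cat", "video"}
print(split_compound("funnycatvideo", vocab))   # ['funny', 'cat', 'video']
```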
A4, the method as in A1, wherein the step of finding associated second participles having a co-occurrence rate with the one or more first participles above a preset threshold comprises:
when the characteristic text information is mapped into a single first participle, extracting a preset index table corresponding to the first participle; the index table comprises information of the video resource data to which the first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and performing word segmentation on the characteristic text information;
calculating the co-occurrence rate of the first participle and each second participle in the index table, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participle;
and extracting the second participles whose co-occurrence rate is higher than a preset threshold value as the associated second participles.
A5, the method as in A1, wherein the step of finding associated second participles having a co-occurrence rate with the one or more first participles above a preset threshold comprises:
when the characteristic text information is mapped into a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the plurality of first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and performing word segmentation on the characteristic text information;
extracting the second participles which co-occur with the plurality of first participles as candidate participles; the second participles are all participles in the video resource data other than the first participles;
respectively calculating, in each index table, the co-occurrence rate of the corresponding first participle and the candidate participle, wherein the co-occurrence rate is the ratio of the number of occurrences of the candidate participle in the index table to the total number of video resource data entries in the index table;
configuring corresponding weights respectively for the co-occurrence rates of the plurality of first participles with the candidate participle;
calculating the weighted average of the plurality of co-occurrence rates as the co-occurrence rate of the plurality of first participles with the candidate participle;
and extracting the candidate participles whose co-occurrence rate is higher than a preset threshold value as the associated second participles.
A6, the method as in A1, wherein the step of finding associated second participles having a co-occurrence rate with the one or more first participles above a preset threshold comprises:
when the characteristic text information is mapped into a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the plurality of first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and performing word segmentation on the characteristic text information;
determining a main participle from the plurality of index tables, wherein the main participle is the first participle corresponding to the index table with the largest total number of video resource data entries;
calculating the co-occurrence rate of the main participle and each second participle in the index table corresponding to the main participle, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participles;
and extracting the second participles whose co-occurrence rate is higher than a preset threshold value as the associated second participles.
A7, the method as described in A1, A4, A5 or A6, wherein the feature text information comprises a video title, a video keyword and/or a video description.
A8, the method as in A6, wherein the step of obtaining a network link address of second video resource data matching the one or more first participles and the associated second participles comprises:
acquiring the network link address of the second video resource data matched with the main participle and the associated second participles.
The invention also discloses B9, a pushing device of associated resource addresses based on video search, comprising:
a characteristic text information acquisition module, adapted to acquire the characteristic text information of first video resource data when a loading or playing request of the first video resource data is received;
a first participle mapping module, adapted to map the characteristic text information into one or more first participles;
a second participle lookup module, adapted to search for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold; the co-occurrence rate is the probability that the one or more first participles and a second participle occur together in the same video resource data;
a network connection address acquisition module, adapted to acquire the network link address of second video resource data matched with the one or more first participles and the associated second participles;
and a network connection address pushing module, adapted to push the network link address of the second video resource data.
B10, the apparatus as in B9, the characteristic text information acquisition module further adapted to:
when a playing request of the first video resource data is received, receiving the feature text information of the first video resource data sent by the current terminal;
or,
when a loading request of the first video resource data is received, extracting locally preset feature text information of the video resource data.
B11, the apparatus of B9, the first participle mapping module further adapted to:
extracting the single participle mapped from the characteristic text information;
or,
when the received characteristic text information is a compound word, splitting the characteristic text information into a plurality of search subwords, and extracting the plurality of participles mapped from the plurality of search subwords.
B12, the apparatus as in B9, the second participle lookup module further adapted to:
when the characteristic text information is mapped into a single first participle, extracting a preset index table corresponding to the first participle; the index table comprises information of the video resource data to which the first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and performing word segmentation on the characteristic text information;
calculating the co-occurrence rate of the first participle and each second participle in the index table, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participle;
and extracting the second participles whose co-occurrence rate is higher than a preset threshold value as the associated second participles.
B13, the apparatus as in B9, the second participle lookup module further adapted to:
when the characteristic text information is mapped into a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the plurality of first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and performing word segmentation on the characteristic text information;
extracting the second participles which co-occur with the plurality of first participles as candidate participles; the second participles are all participles in the video resource data other than the first participles;
respectively calculating, in each index table, the co-occurrence rate of the corresponding first participle and the candidate participle, wherein the co-occurrence rate is the ratio of the number of occurrences of the candidate participle in the index table to the total number of video resource data entries in the index table;
configuring corresponding weights respectively for the co-occurrence rates of the plurality of first participles with the candidate participle;
calculating the weighted average of the plurality of co-occurrence rates as the co-occurrence rate of the plurality of first participles with the candidate participle;
and extracting the candidate participles whose co-occurrence rate is higher than a preset threshold value as the associated second participles.
B14, the apparatus as in B9, the second participle lookup module further adapted to:
when the characteristic text information is mapped into a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the plurality of first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and performing word segmentation on the characteristic text information;
determining a main participle from the plurality of index tables, wherein the main participle is the first participle corresponding to the index table with the largest total number of video resource data entries;
calculating the co-occurrence rate of the main participle and each second participle in the index table corresponding to the main participle, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participles;
and extracting the second participles whose co-occurrence rate is higher than a preset threshold value as the associated second participles.
B15, the apparatus as described in B9, B12, B13 or B14, wherein the feature text information comprises a video title, a video keyword and/or a video description.
B16, the apparatus as in B14, the network connection address acquisition module further adapted to:
acquiring the network link address of the second video resource data matched with the main participle and the associated second participles.
Claims (10)
1. A pushing method of associated resource addresses based on video search comprises the following steps:
when a loading or playing request of first video resource data is received, acquiring feature text information of the first video resource data;
mapping the characteristic text information into one or more first participles;
searching for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold; the co-occurrence rate is the probability that the one or more first participles and a second participle occur together in the same video resource data;
acquiring a network link address of second video resource data matched with the one or more first participles and the associated second participles;
and pushing the network link address of the second video resource data.
2. The method as claimed in claim 1, wherein the step of obtaining the feature text information of the first video resource data when receiving a request for loading or playing the first video resource data comprises:
when a playing request of the first video resource data is received, receiving the feature text information of the first video resource data sent by the current terminal;
or,
when a loading request of the first video resource data is received, extracting locally preset feature text information of the video resource data.
3. The method of claim 1, wherein the step of mapping the characteristic text information into one or more first participles comprises:
extracting the single participle mapped from the characteristic text information;
or,
when the received characteristic text information is a compound word, splitting the characteristic text information into a plurality of search subwords, and extracting the plurality of participles mapped from the plurality of search subwords.
4. The method of claim 1, wherein the step of finding associated second participles having a co-occurrence rate with the one or more first participles above a preset threshold comprises:
when the characteristic text information is mapped into a single first participle, extracting a preset index table corresponding to the first participle; the index table comprises information of the video resource data to which the first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and performing word segmentation on the characteristic text information;
calculating the co-occurrence rate of the first participle and each second participle in the index table, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participle;
and extracting the second participles whose co-occurrence rate is higher than a preset threshold value as the associated second participles.
5. The method of claim 1, wherein the step of finding associated second participles having a co-occurrence rate with the one or more first participles above a preset threshold comprises:
when the characteristic text information is mapped into a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the plurality of first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and performing word segmentation on the characteristic text information;
extracting the second participles which co-occur with the plurality of first participles as candidate participles; the second participles are all participles in the video resource data other than the first participles;
respectively calculating, in each index table, the co-occurrence rate of the corresponding first participle and the candidate participle, wherein the co-occurrence rate is the ratio of the number of occurrences of the candidate participle in the index table to the total number of video resource data entries in the index table;
configuring corresponding weights respectively for the co-occurrence rates of the plurality of first participles with the candidate participle;
calculating the weighted average of the plurality of co-occurrence rates as the co-occurrence rate of the plurality of first participles with the candidate participle;
and extracting the candidate participles whose co-occurrence rate is higher than a preset threshold value as the associated second participles.
6. The method of claim 1, wherein the step of finding associated second participles having a co-occurrence rate with the one or more first participles above a preset threshold comprises:
when the characteristic text information is mapped into a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the plurality of first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by capturing the video resource data, extracting the characteristic text information of the video resource data and performing word segmentation on the characteristic text information;
determining a main participle from the plurality of index tables, wherein the main participle is the first participle corresponding to the index table with the largest total number of video resource data entries;
calculating the co-occurrence rate of the main participle and each second participle in the index table corresponding to the main participle, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participles;
and extracting the second participles whose co-occurrence rate is higher than a preset threshold value as the associated second participles.
7. The method of claim 1 or 4 or 5 or 6, wherein the characteristic text information comprises a video title, a video keyword and/or a video description.
8. The method of claim 6, wherein the step of obtaining a network link address of second video resource data that matches the one or more first participles and the associated second participles comprises:
acquiring the network link address of the second video resource data matched with the main participle and the associated second participles.
9. A push device of associated resource addresses based on video search comprises:
a characteristic text information acquisition module, adapted to acquire the characteristic text information of first video resource data when a loading or playing request of the first video resource data is received;
a first participle mapping module, adapted to map the characteristic text information into one or more first participles;
a second participle lookup module, adapted to search for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold; the co-occurrence rate is the probability that the one or more first participles and a second participle occur together in the same video resource data;
a network connection address acquisition module, adapted to acquire the network link address of second video resource data matched with the one or more first participles and the associated second participles;
and a network connection address pushing module, adapted to push the network link address of the second video resource data.
10. The apparatus of claim 9, wherein the characteristic text information acquisition module is further adapted to:
when a playing request of the first video resource data is received, receiving the feature text information of the first video resource data sent by the current terminal;
or,
when a loading request of the first video resource data is received, extracting locally preset feature text information of the video resource data.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310462461.6A CN103491205B (en) | 2013-09-30 | 2013-09-30 | The method for pushing of a kind of correlated resources address based on video search and device |
PCT/CN2014/086519 WO2015043389A1 (en) | 2013-09-30 | 2014-09-15 | Participle information push method and device based on video search |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310462461.6A CN103491205B (en) | 2013-09-30 | 2013-09-30 | The method for pushing of a kind of correlated resources address based on video search and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103491205A | 2014-01-01 |
CN103491205B CN103491205B (en) | 2016-08-17 |
Family
ID=49831158
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310462461.6A Active CN103491205B (en) | 2013-09-30 | 2013-09-30 | The method for pushing of a kind of correlated resources address based on video search and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103491205B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040064447A1 (en) * | 2002-09-27 | 2004-04-01 | Simske Steven J. | System and method for management of synonymic searching |
CN101236567A (en) * | 2008-02-04 | 2008-08-06 | 上海升岳电子科技有限公司 | Method and terminal apparatus for accomplishing on-line network multimedia application |
WO2010068931A1 (en) * | 2008-12-12 | 2010-06-17 | Atigeo Llc | Providing recommendations using information determined for domains of interest |
CN101599995A (en) * | 2009-07-13 | 2009-12-09 | 中国传媒大学 | The directory distribution method and the network architecture towards high-concurrency retrieval system |
CN101957828A (en) * | 2009-07-20 | 2011-01-26 | 阿里巴巴集团控股有限公司 | Method and device for sequencing search results |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015043389A1 (en) * | 2013-09-30 | 2015-04-02 | 北京奇虎科技有限公司 | Participle information push method and device based on video search |
CN105279172A (en) * | 2014-06-30 | 2016-01-27 | 惠州市伟乐科技股份有限公司 | Video matching method and device |
CN105279172B (en) * | 2014-06-30 | 2019-07-09 | 惠州市伟乐科技股份有限公司 | Video matching method and device |
CN105912600A (en) * | 2016-04-05 | 2016-08-31 | 上海智臻智能网络科技股份有限公司 | Question-answer knowledge base and establishing method thereof, intelligent question-answering method and system |
US10721251B2 (en) | 2016-08-03 | 2020-07-21 | Group Ib, Ltd | Method and system for detecting remote access during activity on the pages of a web resource |
US10581880B2 (en) | 2016-09-19 | 2020-03-03 | Group-Ib Tds Ltd. | System and method for generating rules for attack detection feedback system |
US10778719B2 (en) | 2016-12-29 | 2020-09-15 | Trust Ltd. | System and method for gathering information to detect phishing activity |
US10721271B2 (en) | 2016-12-29 | 2020-07-21 | Trust Ltd. | System and method for detecting phishing web pages |
US11755700B2 (en) | 2017-11-21 | 2023-09-12 | Group Ib, Ltd | Method for classifying user action sequence |
US11503044B2 (en) | 2018-01-17 | 2022-11-15 | Group IB TDS, Ltd | Method computing device for detecting malicious domain names in network traffic |
US10762352B2 (en) | 2018-01-17 | 2020-09-01 | Group Ib, Ltd | Method and system for the automatic identification of fuzzy copies of video content |
US10958684B2 (en) | 2018-01-17 | 2021-03-23 | Group Ib, Ltd | Method and computer device for identifying malicious web resources |
US11122061B2 (en) | 2018-01-17 | 2021-09-14 | Group IB TDS, Ltd | Method and server for determining malicious files in network traffic |
US11475670B2 (en) | 2018-01-17 | 2022-10-18 | Group Ib, Ltd | Method of creating a template of original video content |
US11451580B2 (en) | 2018-01-17 | 2022-09-20 | Trust Ltd. | Method and system of decentralized malware identification |
US11005779B2 (en) | 2018-02-13 | 2021-05-11 | Trust Ltd. | Method of and server for detecting associated web resources |
CN110674386A (en) * | 2018-06-14 | 2020-01-10 | 北京百度网讯科技有限公司 | Resource recommendation method, device and storage medium |
CN110674386B (en) * | 2018-06-14 | 2022-11-01 | 北京百度网讯科技有限公司 | Resource recommendation method, device and storage medium |
US11153351B2 (en) | 2018-12-17 | 2021-10-19 | Trust Ltd. | Method and computing device for identifying suspicious users in message exchange systems |
US11431749B2 (en) | 2018-12-28 | 2022-08-30 | Trust Ltd. | Method and computing device for generating indication of malicious web resources |
US11934498B2 (en) | 2019-02-27 | 2024-03-19 | Group Ib, Ltd | Method and system of user identification |
CN110427381A (en) * | 2019-08-07 | 2019-11-08 | 北京嘉和海森健康科技有限公司 | A kind of data processing method and relevant device |
US11250129B2 (en) | 2019-12-05 | 2022-02-15 | Group IB TDS, Ltd | Method and system for determining affiliation of software to software families |
US11526608B2 (en) | 2019-12-05 | 2022-12-13 | Group IB TDS, Ltd | Method and system for determining affiliation of software to software families |
US11356470B2 (en) | 2019-12-19 | 2022-06-07 | Group IB TDS, Ltd | Method and system for determining network vulnerabilities |
US11151581B2 (en) | 2020-03-04 | 2021-10-19 | Group-Ib Global Private Limited | System and method for brand protection based on search results |
CN111400546B (en) * | 2020-03-18 | 2020-12-01 | 腾讯科技(深圳)有限公司 | Video recall method and video recommendation method and device |
CN111400546A (en) * | 2020-03-18 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Video recall method and video recommendation method and device |
US11847223B2 (en) | 2020-08-06 | 2023-12-19 | Group IB TDS, Ltd | Method and system for generating a list of indicators of compromise |
US11947572B2 (en) | 2021-03-29 | 2024-04-02 | Group IB TDS, Ltd | Method and system for clustering executable files |
US11985147B2 (en) | 2021-06-01 | 2024-05-14 | Trust Ltd. | System and method for detecting a cyberattack |
US12088606B2 (en) | 2021-06-10 | 2024-09-10 | F.A.C.C.T. Network Security Llc | System and method for detection of malicious network resources |
Also Published As
Publication number | Publication date |
---|---|
CN103491205B (en) | 2016-08-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103491205B (en) | The method for pushing of a kind of correlated resources address based on video search and device | |
CN103488787B (en) | A kind of method for pushing and device of the online broadcasting entrance object based on video search | |
CN112015949B (en) | Video generation method and device, storage medium and electronic equipment | |
CN103544266B (en) | A kind of method and device for searching for suggestion word generation | |
CN108733766B (en) | Data query method and device and readable medium | |
CN110569496B (en) | Entity linking method, device and storage medium | |
CN109271518B (en) | Method and equipment for classified display of microblog information | |
US10152478B2 (en) | Apparatus, system and method for string disambiguation and entity ranking | |
CN102279889B (en) | A kind of question pushing method and system based on geography information | |
WO2015175931A1 (en) | Language modeling for conversational understanding domains using semantic web resources | |
CN103793434A (en) | Content-based image search method and device | |
CN111506831A (en) | Collaborative filtering recommendation module and method, electronic device and storage medium | |
CN111783712A (en) | Video processing method, device, equipment and medium | |
Altadmri et al. | A framework for automatic semantic video annotation: Utilizing similarity and commonsense knowledge bases | |
Kaneko et al. | Visual event mining from geo-tweet photos | |
Li et al. | Question answering over community-contributed web videos | |
CN113821592A (en) | Data processing method, device, equipment and storage medium | |
EP3905060A1 (en) | Artificial intelligence for content discovery | |
CN103500214B (en) | Word segmentation information pushing method and device based on video searching | |
KR20120003834A (en) | Entity searching and opinion mining system of hybrid-based using internet and method thereof | |
CN109145261B (en) | Method and device for generating label | |
WO2015043389A1 (en) | Participle information push method and device based on video search | |
Miao et al. | Automatic identifying entity type in linked data | |
KR102279125B1 (en) | Terminal and apparatus for providing recommendation information based on preference filter | |
CN109284364B (en) | Interactive vocabulary updating method and device for voice microphone-connecting interaction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right |
Effective date of registration: 2022-07-07; Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd., Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015; Patentees before: BEIJING QIHOO TECHNOLOGY Co.,Ltd. and Qizhi software (Beijing) Co.,Ltd., 100088 room 112, block D, 28 new street, Xicheng District, Beijing (Desheng Park) |