CN103500214B - Word segmentation information pushing method and device based on video searching - Google Patents


Info

Publication number
CN103500214B
CN103500214B · CN201310462214.6A · CN201310462214A
Authority
CN
China
Prior art keywords
participles
video
resource data
participle
index table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310462214.6A
Other languages
Chinese (zh)
Other versions
CN103500214A (en)
Inventor
崔代超 (Cui Daichao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd, Qizhi Software Beijing Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201310462214.6A priority Critical patent/CN103500214B/en
Publication of CN103500214A publication Critical patent/CN103500214A/en
Priority to PCT/CN2014/086519 priority patent/WO2015043389A1/en
Application granted granted Critical
Publication of CN103500214B publication Critical patent/CN103500214B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval of video data
    • G06F16/71 - Indexing; Data structures therefor; Storage structures
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 - Retrieval using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings


Abstract

The invention discloses a word segmentation information pushing method based on video search. The method includes: receiving a video search string; mapping the video search string into one or more first participles; finding associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold, the co-occurrence rate being the probability that the one or more first participles and a second participle appear together in the same video resource data; and pushing a combination of the one or more first participles and the one or more associated second participles. The method pushes video resource data that users seldom search for but that has many related resources in the video library, so high-quality resources in the library are mined more deeply and mining efficiency improves. Moreover, the index table can keep growing as internet video content accumulates, and since the quantity and breadth of the content produced by video sites far exceed the words users search for, the recall rate is also improved.

Description

Word segmentation information pushing method and device based on video search
Technical Field
The invention relates to the technical field of the internet, and in particular to a word segmentation information pushing method and device based on video search.
Background
A video search engine is a vertical search technology distinct from general-purpose search. It crawls video results on the internet and builds an index over them; because pure video results can be served to searchers, the time netizens spend looking for videos is greatly reduced.
Statistics on video search show that entertainment, game, movie, news, animation, and similar videos are users' main search targets, which indicates that video search is itself largely an entertainment-driven demand. Users often do not have a strong, specific goal: the results need not match the query exactly, and as long as the target falls within a category the user likes, the search has a certain extensibility. For this reason, relevant recommendations are often made to the user beyond the direct search results.
However, existing video search engines are deficient in related recommendations: some offer no recommendations at all, and those that do typically implement them in simple ways, such as building a correlation system by manually sorting users' search history data. Such a recommendation system is tied to users' existing search habits and has a low recall rate; moreover, because the range of what users search for is generally far smaller than the range of resources on the internet, high-quality videos on the internet cannot be fully mined.
Another recommendation approach uses a resource association system obtained by manual sorting, or some other knowledge system, inside the recommender. For example, one search engine returns recommended words such as "ballroom dance", "belly dance", and "aerobics" for the query "square dance", and recommendations such as "CrossFire" and "World of Warcraft" for the query "dota". But the recall rate of such a system is low, and it generally cannot give recommendations for long-tail searches.
Disclosure of Invention
In view of the above problems, the present invention provides a word segmentation information pushing method based on video search, and a corresponding word segmentation information pushing device based on video search, that overcome or at least partially solve the above problems.
According to one aspect of the invention, a word segmentation information pushing method based on video search is provided, which comprises the following steps:
receiving a video search string;
mapping the video search string into one or more first participles;
searching for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold; the co-occurrence rate is the probability that the one or more first participles and a second participle co-occur in the same video resource data;
pushing a combination of the one or more first participles and the one or more associated second participles.
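Read together, these four steps form a simple pipeline. A minimal illustrative sketch in Python follows, assuming a toy in-memory index and an arbitrary 0.3 threshold; all names and data (map_query, find_associated, INDEX_TABLES) are assumptions for illustration, not from the patent text.

```python
# Minimal sketch of the push pipeline described above. All names and data
# (map_query, find_associated, INDEX_TABLES, the 0.3 threshold) are
# illustrative assumptions, not from the patent text.
from typing import Dict, List, Set

# Toy index: a first participle -> the participle set of each video
# resource data entry it appears in.
INDEX_TABLES: Dict[str, List[Set[str]]] = {
    "mid-autumn": [{"mid-autumn", "moon"}, {"mid-autumn", "moon cake"}],
    "moon cake": [{"moon cake", "moon"}],
}

def map_query(query: str) -> List[str]:
    """Map a video search string to one or more first participles."""
    # A real system applies the preset mapping rules of step 102; here we
    # simply split on whitespace.
    return query.split()

def find_associated(first: List[str], threshold: float = 0.3) -> List[str]:
    """Return second participles whose co-occurrence rate with a first
    participle exceeds the threshold. (For several first participles the
    patent weights and averages the rates; this sketch just unions the
    per-participle results.)"""
    result: List[str] = []
    for word in first:
        videos = INDEX_TABLES.get(word, [])
        if not videos:
            continue
        counts: Dict[str, int] = {}
        for participles in videos:
            for p in participles - set(first):
                counts[p] = counts.get(p, 0) + 1
        for p, n in counts.items():
            if n / len(videos) > threshold and p not in result:
                result.append(p)
    return result

def push(query: str) -> List[str]:
    first = map_query(query)
    return [" ".join(first + [second]) for second in find_associated(first)]

print(push("mid-autumn"))  # e.g. ['mid-autumn moon', 'mid-autumn moon cake']
```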
Optionally, the step of mapping the video search string into one or more first participles includes:
extracting the participle to which the video search string is mapped;
or,
when the received video search string is a compound word, splitting the video search string into a plurality of search sub-words, and extracting the participles to which the plurality of search sub-words are mapped.
Optionally, the step of searching for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold comprises:
when the video search string is mapped to one first participle, extracting a preset index table corresponding to the first participle; the index table comprises information of the video resource data to which the first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
calculating the co-occurrence rate of the first participle and each second participle in the index table, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participle;
and extracting the second participles whose co-occurrence rate is higher than the preset threshold as associated second participles.
Optionally, the step of searching for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold comprises:
when the video search string is mapped to a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
extracting second participles that co-occur with all of the plurality of first participles as candidate participles; the second participles are all participles in the video resource data other than the first participles;
respectively calculating the co-occurrence rate of each first participle and the candidate participle in the corresponding index table, wherein the co-occurrence rate is the ratio of the number of occurrences of the candidate participle in the index table to the total number of video resource data entries in the index table;
configuring corresponding weights for the co-occurrence rates of the plurality of first participles and the candidate participle;
calculating the average of the weighted co-occurrence rates as the co-occurrence rate of the plurality of first participles and the candidate participle;
and extracting the candidate participles whose co-occurrence rate is higher than the preset threshold as associated second participles.
Optionally, the step of searching for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold comprises:
when the video search string is mapped to a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
determining a main participle from the plurality of index tables, wherein the main participle is the first participle corresponding to the index table with the largest total number of video resource data entries;
calculating the co-occurrence rate of the main participle and each second participle in the index table corresponding to the main participle, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participles;
and extracting the second participles whose co-occurrence rate is higher than the preset threshold as associated second participles.
Optionally, the characteristic text information includes a video title, a video keyword, and/or a video description.
Optionally, the step of pushing a combination of the one or more first participles and the one or more associated second participles comprises:
pushing a combination of the main participle and the associated second participle.
According to another aspect of the present invention, there is provided a word segmentation information pushing apparatus based on video search, including:
a video search string receiving module, adapted to receive a video search string;
a first segmentation mapping module, adapted to map the video search string into one or more first participles;
a second participle searching module, adapted to search for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold; the co-occurrence rate is the probability that the one or more first participles and a second participle co-occur in the same video resource data;
a combination pushing module adapted to push a combination of the one or more first participles and the one or more associated second participles.
Optionally, the first segmentation mapping module is further adapted to:
extracting the participle to which the video search string is mapped;
or,
when the received video search string is a compound word, splitting the video search string into a plurality of search sub-words, and extracting the participles to which the plurality of search sub-words are mapped.
Optionally, the second participle searching module is further adapted to:
when the video search string is mapped to one first participle, extracting a preset index table corresponding to the first participle; the index table comprises information of the video resource data to which the first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
calculating the co-occurrence rate of the first participle and each second participle in the index table, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participle;
and extracting the second participles whose co-occurrence rate is higher than the preset threshold as associated second participles.
Optionally, the second participle searching module is further adapted to:
when the video search string is mapped to a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
extracting second participles that co-occur with all of the plurality of first participles as candidate participles; the second participles are all participles in the video resource data other than the first participles;
respectively calculating the co-occurrence rate of each first participle and the candidate participle in the corresponding index table, wherein the co-occurrence rate is the ratio of the number of occurrences of the candidate participle in the index table to the total number of video resource data entries in the index table;
configuring corresponding weights for the co-occurrence rates of the plurality of first participles and the candidate participle;
calculating the average of the weighted co-occurrence rates as the co-occurrence rate of the plurality of first participles and the candidate participle;
and extracting the candidate participles whose co-occurrence rate is higher than the preset threshold as associated second participles.
Optionally, the second participle searching module is further adapted to:
when the video search string is mapped to a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
determining a main participle from the plurality of index tables, wherein the main participle is the first participle corresponding to the index table with the largest total number of video resource data entries;
calculating the co-occurrence rate of the main participle and each second participle in the index table corresponding to the main participle, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participles;
and extracting the second participles whose co-occurrence rate is higher than the preset threshold as associated second participles.
Optionally, the characteristic text information includes a video title, a video keyword, and/or a video description.
Optionally, the combination pushing module is further adapted to:
pushing a combination of the main participle and the associated second participle.
The method pushes on the basis of already-published content, freeing the search engine from dependence on users' search habits and pushing video resource data that few users search for but that has many related resources in the video library. This enables deep mining of high-quality resources in the video library and improves the efficiency of resource mining. In addition, the index table keeps growing as internet video content accumulates, and the quantity and breadth of the content produced by the major video sites far exceed the words users search for, which helps increase the recall rate.
By pushing the combination of the first participle and the second participle, the method and device let the user directly run deeper searches based on the combination, so the user obtains more results from a simple search without submitting multiple queries; this reduces the load on the server, reduces the occupation of network resources, and improves the user experience.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a flowchart illustrating steps of an embodiment of a method for pushing word segmentation information based on video search according to an embodiment of the present invention; and
fig. 2 is a block diagram illustrating an embodiment of a device for pushing word segmentation information based on video search according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Referring to fig. 1, a flowchart illustrating steps of an embodiment of a method for pushing word segmentation information based on video search according to an embodiment of the present invention is shown, which may specifically include the following steps:
step 101, receiving a video search character string;
It should be noted that the video search string may be video search information input by a user, used to request a search for video resource data related to that information.
In practical applications, the video search string may be a single word, i.e., one semantically independent word, such as "mid-autumn", "dragon boat festival", or "national day"; it may also be a compound word, i.e., two or more semantically independent words, such as "mid-autumn moon cake", "dragon boat festival rice dumpling", or "national day Tibet travel".
Step 102, mapping the video search string into one or more first participles;
It should be noted that the mapped participles may be preset, and may be used to calculate the co-occurrence rate between different participles.
The mapping rules may be one or more preset rules. They may include removing words without practical meaning, such as profanity, modifiers, auxiliary words, and overly generic words, from the video search string; they may include setting stop words, i.e., common words such as "is", "I", and "you", as boundaries when splitting a phrase; they may also include alias association, mapping multiple expressions of the same thing to one expression, for example associating "the fifteenth of the eighth lunar month", "mid-autumn festival", and "moon cake festival" into "mid-autumn". Other mapping rules may also be included, which is not limited in this embodiment of the present invention.
English takes the word as its unit, with words separated by spaces; Chinese takes the character as its unit, and all the characters in a sentence are run together to express a meaning. For example, the English sentence "I am a student" is, in Chinese, "我是一个学生". Thanks to the spaces, a computer can easily tell that "student" is one word, but it cannot easily tell that the characters "学" (study) and "生" (life) together form the word "学生" (student). Cutting a sequence of Chinese characters into meaningful words is Chinese word segmentation. For example, "我是一个学生" ("I am a student") segments into "我 / 是 / 一个 / 学生" ("I / am / a / student").
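For a concrete illustration, an off-the-shelf open-source tokenizer such as jieba (an external tool, not part of the invention) performs exactly this kind of segmentation:

```python
# Illustrative only: segmenting the example sentence with the open-source
# jieba tokenizer (pip install jieba); output may vary with the dictionary
# and jieba version.
import jieba

print(list(jieba.cut("我是一个学生")))  # typically ['我', '是', '一个', '学生']
```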
Some common word segmentation methods are presented below:
1. The word segmentation method based on character string matching: the Chinese character string to be analyzed is matched against the entries of a preset machine dictionary according to some strategy; if a string is found in the dictionary, the match succeeds (a word is identified). In practical word segmentation systems, such mechanical segmentation is used as an initial pass, and various other kinds of linguistic information are used to further improve segmentation accuracy.
2. The word segmentation method based on feature scanning or token segmentation: words with obvious features are identified and cut out of the string to be analyzed first; with these words as breakpoints, the original string is divided into smaller strings that then undergo mechanical segmentation, reducing the matching error rate. Alternatively, word segmentation is combined with part-of-speech tagging, using the rich part-of-speech information to help decide words, and checking and adjusting the segmentation result during tagging, thereby improving segmentation accuracy.
3. The understanding-based word segmentation method: the computer simulates a human's understanding of the sentence in order to recognize words. The basic idea is to perform syntactic and semantic analysis while segmenting, and to use syntactic and semantic information to resolve ambiguity. Such a system generally comprises three parts: a word segmentation subsystem, a syntax-semantics subsystem, and a master control part. Under the coordination of the master control part, the word segmentation subsystem obtains syntactic and semantic information about the relevant words and sentences to judge segmentation ambiguity, i.e., it simulates the process by which a person understands a sentence. This method requires a large amount of linguistic knowledge and information.
4. The word segmentation method based on statistics: the frequency or probability of adjacent characters co-occurring in Chinese text reflects the credibility of their forming a word. The frequency of adjacent character combinations in a corpus can therefore be counted and their mutual information computed, e.g., the adjacency co-occurrence probability of two Chinese characters X and Y. Mutual information reflects how tightly the characters are bound to each other; when it exceeds a certain threshold, the character group may be deemed to constitute a word. This method only requires counting character-group frequencies in the corpus and does not need a segmentation dictionary.
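As a sketch of this statistical idea, pointwise mutual information (PMI) is one common closeness measure for adjacent characters; the patent text does not prescribe a specific formula, so the corpus, measure, and threshold below are illustrative assumptions.

```python
# Sketch of statistics-based segmentation: score adjacent character pairs
# by pointwise mutual information (PMI); pairs above a threshold are
# treated as candidate words. Corpus and measure are illustrative.
import math
from collections import Counter

corpus = "学生学习学生学习学生"  # toy corpus

chars = Counter(corpus)
pairs = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
total_chars = sum(chars.values())
total_pairs = sum(pairs.values())

def pmi(x: str, y: str) -> float:
    """PMI of characters x, y appearing adjacently: log P(xy) / (P(x)P(y))."""
    p_xy = pairs[x + y] / total_pairs
    p_x = chars[x] / total_chars
    p_y = chars[y] / total_chars
    return math.log(p_xy / (p_x * p_y))

print(pmi("学", "生"))  # high value -> "学生" is a plausible word
```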
In a preferred embodiment of the present invention, the step 102 may specifically include the following sub-steps:
substep S11, extracting a participle mapped by the video search character string;
When the video search string is a single word, the corresponding participle can be extracted directly according to the preset mapping rules. For example, if the video search string is "mid-autumn", "my mid-autumn", or "mid-autumn festival", the mapped first participle may be "mid-autumn". Of course, the video search string may also be identical to the first participle it maps to; for example, the video search string "mid-autumn" may map to the first participle "mid-autumn".
Or,
substep S12, when the received video search character string is a compound word, splitting the video search character string into a plurality of search subwords;
and a substep S13 of extracting a plurality of participles mapped by the plurality of search subwords.
When the video search string is a compound word, it can be segmented according to the preset mapping rules to obtain search sub-words, and the participle corresponding to each search sub-word is then extracted. For example, the received video search string "mid-autumn festival moon cake" can be split into the two search sub-words "mid-autumn festival" and "moon cake"; "mid-autumn festival" is then mapped to "mid-autumn", and "moon cake" to "moon cake", yielding the two first participles "mid-autumn" and "moon cake".
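A minimal sketch of this mapping follows, with an assumed alias table standing in for the preset mapping rules; the table entries and function name are illustrative.

```python
# Sketch of sub-steps S12/S13: map each search sub-word to its preset
# first participle via an alias table (illustrative data).
ALIASES = {
    "mid-autumn festival": "mid-autumn",
    "moon cake festival": "mid-autumn",
    "moon cake": "moon cake",
}

def map_to_first_participles(sub_words):
    """Map each search sub-word to its preset first participle."""
    return [ALIASES.get(w, w) for w in sub_words]

# "mid-autumn festival moon cake" split into two sub-words:
print(map_to_first_participles(["mid-autumn festival", "moon cake"]))
# -> ['mid-autumn', 'moon cake']
```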
Step 103, searching for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold;
the co-occurrence rate is the probability that the one or more first participles and a second participle co-occur in the same video resource data;
It should be noted that a second participle may be any of the preset participles other than the first participles. An associated second participle is a second participle whose co-occurrence rate with the first participle(s) is above the preset threshold.
In practical applications, the video resource data may include characteristic text information, and the characteristic text information may be used to record related information of the video resource data, and may also be used to extract word segmentation.
In a preferred embodiment of the invention, the characteristic text information may comprise a video title, a video keyword and/or a video description.
For example, for a piece of video resource data titled "[Paike] Dongguan turns into Venice after a rainstorm, thousands of cars flooded and stalled - online play - XX site - watch the video in HD online", the characteristic text information may be as follows:
Video Title (Title): Dongguan turns into Venice after a rainstorm, thousands of cars flooded and stalled - online play - XX site - watch the video in HD online;
Video Keywords (Keywords): YY records Dongguan flooding, life information;
Video Description (Description): Yesterday morning, a rainstorm made the streets in parts of Dongguan feel as if they had instantly turned into Venice. Cars on the road were flooded and stalled in the heavy rain, and some streets and houses were submerged in a vast expanse of water.
Specifically, the co-occurrence rate may be the probability that the one or more first participles and a second participle co-occur in the characteristic text information of the same video resource data; it covers both the co-occurrence rate of a single first participle with a second participle and the co-occurrence rate of a plurality of first participles with a second participle.
In a preferred embodiment of the present invention, the step 103 may specifically include the following sub-steps:
Substep S21, when the video search string is mapped to one first participle, extracting a preset index table corresponding to the first participle; the index table comprises information of the video resource data to which the first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
In a specific implementation, a search engine may crawl the video resource data on each website platform in advance and then build the index library: the characteristic text information of the video resource data is extracted and segmented, and an index table is built for each participle. The index table may store information of the video resource data (such as an ID, an internal or external address, or another video identifier; alternatively, records formed by the current participle and the other participles) together with all the participles in the video resource data (including the first participle, and the second participles other than the first participle).
In a preferred embodiment of the invention, the characteristic text information may comprise a video title, a video keyword and/or a video description.
For example, an index table for "mid-autumn" may be organized as follows: the first participle is "mid-autumn", and the information of the video resource data includes a video identifier together with the participles of that video. Of course, the information of the video resource data may instead omit the video identifier and keep only the records formed by the first participle and the second participles (i.e., each row of second participles is one record).
Of course, the above index table is only used as an example, and when implementing the embodiment of the present invention, other index tables may be set according to actual situations, which is not limited in the embodiment of the present invention. In addition, besides the above index table, a person skilled in the art may also use other index tables according to actual needs, and the embodiment of the present invention is not limited to this.
It should be noted that the video resource data on each platform can be crawled periodically or at irregular times, and the index library, i.e., each index table, updated accordingly.
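Putting the crawl-and-index description together, a toy sketch of index-library construction might look as follows; segment() and the sample data are stand-ins, not the patent's actual tokenizer or schema.

```python
# Sketch of index-library construction: for each crawled video, segment its
# characteristic text (title, keywords, description) and add the video's
# participle set to the index table of every participle it contains.
from collections import defaultdict

def segment(text):
    return text.lower().split()  # placeholder for a real word segmenter

def build_index(videos):
    """videos: list of (video_id, feature_text) pairs."""
    index = defaultdict(list)  # participle -> [(video_id, participle_set)]
    for video_id, feature_text in videos:
        participles = set(segment(feature_text))
        for p in participles:
            index[p].append((video_id, participles))
    return index

index = build_index([
    ("v1", "mid-autumn moon cake"),
    ("v2", "mid-autumn moon"),
])
print(len(index["mid-autumn"]))  # 2 video resource data entries
```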
Substep S22, calculating the co-occurrence rate of the first participle and each second participle in the index table, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participle;
Since the number of occurrences of a second participle in the index table equals the number of videos it belongs to within that table, the co-occurrence rate can also be expressed as the ratio of the number of videos containing the second participle in the index table to the total number of video resource data entries in the index table.
For example, suppose the index table of the participle "square dance" has 100 video resource data entries in total, the index table of the participle "solidago" has 200 entries in total, and "square dance" and "solidago" appear together in 10 entries of the two index tables. Then the co-occurrence rate of "square dance" and "solidago" is 10/100 = 10% with respect to "square dance", and 10/200 = 5% with respect to "solidago".
And a substep S23, extracting the second participle with the co-occurrence rate higher than a preset threshold value as the associated second participle.
In a specific implementation, the preset threshold may be set by a person skilled in the art according to the actual situation, and the embodiment of the present invention is not limited thereto. The associated second participles extracted in the embodiment of the present invention may be null, or may be one or more.
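A minimal sketch of sub-steps S21-S23 under these definitions, using the numbers from the "square dance" example above; the function name and threshold are illustrative.

```python
# Sketch of sub-steps S21-S23: co-occurrence rate = occurrences of the
# second participle in the first participle's index table divided by the
# total number of video entries in that table. Data mirrors the
# "square dance" example (10 co-occurrences out of 100 entries -> 10%).
def associated_second_participles(index_table, first, threshold):
    total = len(index_table)           # total video entries in the table
    counts = {}
    for participles in index_table:    # one participle set per video
        for p in participles:
            if p != first:
                counts[p] = counts.get(p, 0) + 1
    return {p: n / total for p, n in counts.items() if n / total > threshold}

# 100 videos containing "square dance", 10 of which also contain "solidago":
table = [{"square dance", "solidago"}] * 10 + [{"square dance"}] * 90
print(associated_second_participles(table, "square dance", 0.05))
# -> {'solidago': 0.1}
```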
In a preferred embodiment of the present invention, the step 103 may specifically include the following sub-steps:
Substep S31, when the video search string is mapped to a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
In a specific implementation, a search engine may crawl the video resource data on each platform in advance and then build the index library: the characteristic text information of the video resource data is extracted and segmented, and an index table is built for each participle. The index table may store information of the video resource data (such as an ID, an internal or external address, or another video identifier; alternatively, records formed by the current participle and the other participles) together with all the participles in the video resource data (including the first participle, and the second participles other than the first participle).
In a preferred embodiment of the invention, the characteristic text information may comprise a video title, a video keyword and/or a video description.
Substep S32, extracting second participles that co-occur with the plurality of first participles as candidate participles; the second participles are all participles in the video resource data other than the first participles;
Specifically, with a plurality of first participles there is a corresponding plurality of index tables, and a candidate participle must appear in every one of them; that is, the candidate participle appears in the same index table as each of the current first participles.
Substep S33, respectively calculating the co-occurrence rate of each first participle and the candidate participle in the corresponding index table, wherein the co-occurrence rate is the ratio of the number of occurrences of the candidate participle in the index table to the total number of video resource data entries in the index table;
For example, the video search string "mid-autumn moon cake" may be mapped to the first participles "mid-autumn" and "moon cake"; if one extracted candidate participle is "moon", the co-occurrence rate of "mid-autumn" and "moon" (say 70%) and the co-occurrence rate of "moon cake" and "moon" (say 60%) can each be calculated.
Substep S34, configuring corresponding weights for the co-occurrence rates of the plurality of first participles and the candidate participle;
The weight can be determined from the proportion of video resource data entries in each first participle's index table: the larger the total number of entries in the index table, the larger the weight. For example, if the index table of "mid-autumn" holds 900 video resource data entries and the index table of "moon cake" holds 100, the weight of the co-occurrence rate of "mid-autumn" and "moon" may be 0.9, and the weight of the co-occurrence rate of "moon cake" and "moon" may be 0.1.
Of course, the above weights are only examples, and when implementing the embodiment of the present invention, other weights may be set according to actual situations, for example, a corresponding weight is set according to a current social hotspot (news ranking, microblog ranking, and the like), a corresponding weight is set according to a local and/or online operation behavior of a user (video playing, news reading, and the like), and the like, which is not limited in this embodiment of the present invention. In addition, besides the above-mentioned weights, those skilled in the art may also adopt other weights according to actual needs, and the embodiment of the present invention is not limited to this.
Substep S35, calculating the average of the weighted co-occurrence rates as the co-occurrence rate of the plurality of first participles and the candidate participle;
In the embodiment of the present invention, a weighted average of the several co-occurrence rates may be used as the final co-occurrence rate.
For example, the co-occurrence rate of "mid-autumn", "moon cake", and "moon" may be (70% × 0.9 + 60% × 0.1) / 2 = 34.5%.
And a substep S36, extracting the candidate participles with the co-occurrence rate higher than a preset threshold value as associated second participles.
In a specific implementation, the preset threshold may be set by a person skilled in the art according to the actual situation, and the embodiment of the present invention is not limited thereto. The associated second participles extracted in the embodiment of the present invention may be null, or may be one or more.
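The weighted calculation of sub-steps S33-S35 can be sketched directly from the worked example above; the variable names are illustrative.

```python
# Sketch of sub-steps S33-S35, reproducing the worked example: weights are
# proportional to each index table's size (900 vs 100 entries -> 0.9 / 0.1),
# and the weighted rates are averaged over the number of first participles.
rates = {"mid-autumn": 0.70, "moon cake": 0.60}  # co-occurrence with "moon"
sizes = {"mid-autumn": 900, "moon cake": 100}    # index table entry counts

total = sum(sizes.values())
weights = {p: sizes[p] / total for p in sizes}   # 0.9 and 0.1

combined = sum(rates[p] * weights[p] for p in rates) / len(rates)
print(f"{combined:.1%}")  # (70% * 0.9 + 60% * 0.1) / 2 = 34.5%
```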
In a preferred embodiment of the present invention, the step 103 may specifically include the following sub-steps:
Substep S41, when the video search string is mapped to a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
In a specific implementation, a search engine may crawl the video resource data on each platform in advance and then build the index library: the characteristic text information of the video resource data is extracted and segmented, and an index table is built for each participle. The index table may store information of the video resource data (such as an ID, an internal or external address, or another video identifier; alternatively, records formed by the current participle and the other participles) together with all the participles in the video resource data (including the first participle, and the second participles other than the first participle).
In a preferred embodiment of the invention, the characteristic text information may comprise a video title, a video keyword and/or a video description.
Substep S42, determining a main participle from the plurality of index tables, wherein the main participle is the first participle corresponding to the index table with the largest total number of video resource data entries;
To improve the user experience, when the first participles' index tables differ greatly in size, the first participles whose index tables hold few video resource data entries can be ignored. For example, for the first participles "mid-autumn" and "moon cake" mapped from the video search string "mid-autumn moon cake", if the index table of "mid-autumn" holds 900 video resource data entries and that of "moon cake" holds 100, "mid-autumn" may be set as the main participle.
Substep S43, calculating the co-occurrence rate of the main participle and each second participle in the index table corresponding to the main participle, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participles;
In the embodiment of the invention, the co-occurrence rate of the main participle may be used as the final co-occurrence rate.
And a substep S44, extracting the second participle with the co-occurrence rate higher than a preset threshold value as the associated second participle.
In a specific implementation, the preset threshold may be set by a person skilled in the art according to the actual situation, and the embodiment of the present invention is not limited thereto. The associated second participles extracted in the embodiment of the present invention may be null, or may be one or more.
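Sub-step S42 reduces to an argmax over index-table sizes; a one-line illustrative sketch (sizes taken from the example above):

```python
# Sketch of sub-step S42: the main participle is the first participle whose
# index table holds the most video resource data entries; only its table is
# then used for the co-occurrence calculation.
table_sizes = {"mid-autumn": 900, "moon cake": 100}
main_participle = max(table_sizes, key=table_sizes.get)
print(main_participle)  # 'mid-autumn'
```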
Step 104, pushing a combination of the one or more first participles and the one or more associated second participles.
Specifically, after sub-step S23, the combination of the current first participle and one or more associated second participles may be pushed in a location such as the drop-down menu of the web page's input box. For example, if the video search string is "dota" and the words with the highest co-occurrence rates are "efficients", "egg pain", "2009", "billows", "first view", and "classic", with co-occurrence rates of 40%, 35%, 30%, 25%, 20%, and 10% respectively, then the combinations "dota efficients", "dota egg pain", "dota 2009", "dota billows", "dota first view", and "dota classic" are pushed in that order.
After sub-step S36, the combination of the current plurality of first participles and one or more associated second participles may be pushed in a location such as the drop-down menu of the web page's input box. For example, the video search string "square dancing pawn brother" is mapped to the first participles "square dancing" and "pawn brother"; a second participle that appears together with both first participles, such as "teaching", is extracted as an associated second participle, and the combination "square dancing pawn brother teaching" is finally pushed.
In a preferred embodiment of the present invention, step 104 may specifically include the following sub-steps:
And a substep S51 of pushing a combination of the main participle and the associated second participle.
After sub-step S44, the combination of the current main participle and one or more associated second participles may be pushed in a location such as the drop-down menu of the web page's input box. For example, for the first participles "mid-autumn" and "moon cake" mapped from the video search string "mid-autumn moon cake", "mid-autumn" may be set as the main participle; if the associated second participle "moon" is obtained, the combination "mid-autumn moon" may then be pushed.
The user can then search for new video resource data by clicking a pushed combination in the pull-down menu.
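A short illustrative sketch of this push step, ranking associated second participles by co-occurrence rate and forming the drop-down combinations from the "dota" example above (names and rates as given there):

```python
# Sketch of the push step: sort associated second participles by
# co-occurrence rate and emit query combinations for the drop-down menu.
rates = {"efficients": 0.40, "egg pain": 0.35, "2009": 0.30,
         "billows": 0.25, "first view": 0.20, "classic": 0.10}

def push_combinations(first, rates):
    ranked = sorted(rates, key=rates.get, reverse=True)
    return [f"{first} {second}" for second in ranked]

print(push_combinations("dota", rates))
# ['dota efficients', 'dota egg pain', 'dota 2009',
#  'dota billows', 'dota first view', 'dota classic']
```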
The method pushes on the basis of already-published content, freeing the search engine from dependence on users' search habits and pushing video resource data that few users search for but that has many related resources in the video library. This enables deep mining of high-quality resources in the video library and improves the efficiency of resource mining. In addition, the index table keeps growing as internet video content accumulates, and the quantity and breadth of the content produced by the major video sites far exceed the words users search for, which helps increase the recall rate.
By pushing the combination of the first participle and the second participle, the method and device let the user directly run deeper searches based on the combination, so the user obtains more results from a simple search without submitting multiple queries; this reduces the load on the server, reduces the occupation of network resources, and improves the user experience.
For simplicity of explanation, the method embodiments are described as a series of acts or combinations, but those skilled in the art will appreciate that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently with other steps in accordance with the embodiments of the invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 2, a block diagram of a structure of an embodiment of a device for pushing word segmentation information based on video search according to an embodiment of the present invention is shown, which may specifically include the following modules:
a video search string receiving module 201 adapted to receive a video search string;
a first segmentation mapping module 202 adapted to map the video search string into one or more first segmentations;
a second participle searching module 203, adapted to search for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold; the co-occurrence rate is the probability that the one or more first participles and a second participle co-occur in the same video resource data;
a combination pushing module 204, adapted to push a combination of the one or more first participles and the one or more associated second participles.
In a preferred embodiment of the present invention, the first segmentation mapping module may be further adapted to:
extracting the participle to which the video search string is mapped;
or,
when the received video search string is a compound word, splitting the video search string into a plurality of search sub-words, and extracting the participles to which the plurality of search sub-words are mapped.
In a preferred embodiment of the present invention, the second participle searching module may be further adapted to:
when the video search string is mapped to one first participle, extracting a preset index table corresponding to the first participle; the index table comprises information of the video resource data to which the first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
calculating the co-occurrence rate of the first participle and each second participle in the index table, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participle;
and extracting the second participles whose co-occurrence rate is higher than the preset threshold as associated second participles.
In a preferred embodiment of the present invention, the second participle searching module may be further adapted to:
when the video search string is mapped to a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
extracting second participles that co-occur with all of the plurality of first participles as candidate participles; the second participles are all participles in the video resource data other than the first participles;
respectively calculating the co-occurrence rate of each first participle and the candidate participle in the corresponding index table, wherein the co-occurrence rate is the ratio of the number of occurrences of the candidate participle in the index table to the total number of video resource data entries in the index table;
configuring corresponding weights for the co-occurrence rates of the plurality of first participles and the candidate participle;
calculating the average of the weighted co-occurrence rates as the co-occurrence rate of the plurality of first participles and the candidate participle;
and extracting the candidate participles whose co-occurrence rate is higher than the preset threshold as associated second participles.
In a preferred embodiment of the present invention, the second participle searching module may be further adapted to:
when the video search string is mapped to a plurality of first participles, respectively extracting a plurality of preset index tables corresponding to the first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
determining a main participle from the plurality of index tables, wherein the main participle is the first participle corresponding to the index table with the largest total number of video resource data entries;
calculating the co-occurrence rate of the main participle and each second participle in the index table corresponding to the main participle, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participles;
and extracting the second participles whose co-occurrence rate is higher than the preset threshold as associated second participles.
In a preferred embodiment of the invention, the characteristic text information comprises a video title, a video keyword and/or a video description.
In a preferred embodiment of the present invention, the combination pushing module may be further adapted to:
pushing a combination of the main participle and the associated second participle.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some, but not other, features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be understood by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the video search based participle information push apparatus according to an embodiment of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any ordering; these words may be interpreted as names.
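Before turning to the claims, the overall push flow can be summarized in one hypothetical end-to-end sketch. It reuses the associated_second_participles() helper sketched earlier; the segment() function and the index_tables mapping are assumed stand-ins for a real word segmenter and the preset index tables, neither of which is prescribed by the disclosure.

```python
from typing import Callable, Dict, List, Set

def push_combinations(query: str,
                      segment: Callable[[str], List[str]],
                      index_tables: Dict[str, List[Set[str]]],
                      threshold: float) -> List[str]:
    # Map the video search string to one or more first participles.
    first = segment(query)
    # Main participle: the first participle whose index table covers the most videos.
    main = max(first, key=lambda p: len(index_tables.get(p, ())))
    # Find associated second participles from the main participle's index table.
    associated = associated_second_participles(index_tables[main], set(first), threshold)
    # Push each combination of the first participles with an associated second participle.
    return [" ".join(first + [second]) for second in associated]
```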

Claims (14)

1. A word segmentation information pushing method based on video search, comprising the following steps:
receiving a video search string;
mapping the video search string into one or more first participles;
extracting a preset index table corresponding to the one or more first participles, and searching, according to the preset index table, for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold; the co-occurrence rate is the probability that the one or more first participles and a second participle occur together in the same video resource data; wherein the preset index table is updated with video resource data;
pushing a combination of the one or more first participles and the one or more associated second participles.
2. The method of claim 1, wherein the step of mapping the video search string into one or more first participles comprises:
extracting the participle to which the video search string maps;
or,
when the received video search string is a compound word, splitting the video search string into a plurality of search sub-words, and extracting the plurality of participles to which the plurality of search sub-words map.
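Illustrative sketch (not part of the claim): one way to realize the splitting in claim 2 is greedy longest-match against a participle dictionary. The dictionary and the fall-back to single characters are assumptions made here for illustration, since the claim does not fix a segmentation algorithm.

```python
from typing import List, Set

def map_to_first_participles(query: str, dictionary: Set[str]) -> List[str]:
    # The whole string maps to a single known first participle.
    if query in dictionary:
        return [query]
    # Compound word: greedily split into the longest known search sub-words,
    # falling back to single characters for unknown spans.
    participles, i = [], 0
    while i < len(query):
        for j in range(len(query), i, -1):
            if query[i:j] in dictionary or j == i + 1:
                participles.append(query[i:j])
                i = j
                break
    return participles

# e.g. map_to_first_participles("abcd", {"ab", "cd"}) -> ["ab", "cd"]
```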
3. The method of claim 1, wherein the step of extracting a preset index table corresponding to the one or more first participles and searching, according to the preset index table, for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold comprises:
when the video search string is mapped to a single first participle, extracting the preset index table corresponding to the first participle; the index table comprises information of the video resource data to which the first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
calculating the co-occurrence rate of the first participle and each second participle in the index table, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participle;
and extracting the second participles whose co-occurrence rate is higher than a preset threshold as associated second participles.
4. The method of claim 1, wherein the step of extracting a preset index table corresponding to the one or more first participles and searching, according to the preset index table, for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold comprises:
when the video search string is mapped to a plurality of first participles, respectively extracting the preset index tables corresponding to the plurality of first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
extracting second participles that occur together with the plurality of first participles as candidate participles; the second participles are all participles in the video resource data other than the first participles;
respectively calculating the co-occurrence rate of each first participle and a candidate participle in the corresponding index table, wherein the co-occurrence rate is the ratio of the number of occurrences of the candidate participle in the index table to the total number of video resource data entries in the index table;
configuring corresponding weights for the co-occurrence rates of the plurality of first participles and the candidate participle respectively;
calculating the weighted average of the plurality of co-occurrence rates as the co-occurrence rate of the plurality of first participles and the candidate participle;
and extracting the candidate participles whose co-occurrence rate is higher than a preset threshold as associated second participles.
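Illustrative sketch (not part of the claim): the weighted combination of per-table co-occurrence rates in claim 4 might be computed as follows. The list layout and the normalization by the weight sum are assumptions for illustration.

```python
from typing import List, Set

def weighted_co_occurrence_rate(index_tables: List[List[Set[str]]],
                                candidate: str,
                                weights: List[float]) -> float:
    # One co-occurrence rate per first participle's index table.
    rates = []
    for table in index_tables:
        hits = sum(1 for record in table if candidate in record)
        rates.append(hits / len(table) if table else 0.0)
    # Weighted average of the per-table rates.
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)
```

A weight could, for instance, favor the index table of a rarer, more specific first participle; the claim leaves the weighting policy open.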
5. The method of claim 1, wherein the step of extracting a preset index table corresponding to the one or more first participles and searching, according to the preset index table, for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold comprises:
when the video search string is mapped to a plurality of first participles, respectively extracting the preset index tables corresponding to the plurality of first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
determining a main participle from the plurality of index tables, wherein the main participle is the first participle corresponding to the index table with the largest total number of video resource data entries;
calculating the co-occurrence rate of the main participle and each second participle in the index table corresponding to the main participle, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participles;
and extracting the second participles whose co-occurrence rate is higher than a preset threshold as associated second participles.
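Illustrative sketch (not part of the claim): the main-participle rule in claim 5 reduces to an argmax over index-table sizes, assuming the dictionary layout used in the earlier sketches.

```python
from typing import Dict, List, Set

def select_main_participle(index_tables: Dict[str, List[Set[str]]]) -> str:
    # The main participle owns the index table with the most video records.
    return max(index_tables, key=lambda p: len(index_tables[p]))
```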
6. The method according to claim 3, 4 or 5, wherein the characteristic text information comprises a video title, video keywords and/or a video description.
7. The method of claim 5, wherein the step of pushing a combination of the one or more first participles and the one or more associated second participles comprises:
pushing a combination of the main participle and the associated second participles.
8. A word segmentation information pushing device based on video search, comprising:
a video search string receiving module adapted to receive a video search string;
a first participle mapping module adapted to map the video search string into one or more first participles;
a second participle lookup module adapted to extract a preset index table corresponding to the one or more first participles and to search, according to the preset index table, for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold; the co-occurrence rate is the probability that the one or more first participles and a second participle occur together in the same video resource data; wherein the preset index table is updated with video resource data;
a combination pushing module adapted to push a combination of the one or more first participles and the one or more associated second participles.
9. The apparatus of claim 8, wherein the first participle mapping module is further adapted to:
extracting the participle to which the video search string maps;
or,
when the received video search string is a compound word, splitting the video search string into a plurality of search sub-words, and extracting the plurality of participles to which the plurality of search sub-words map.
10. The apparatus of claim 8, wherein the second participle lookup module is further adapted to:
when the video search string is mapped to a single first participle, extracting the preset index table corresponding to the first participle; the index table comprises information of the video resource data to which the first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
calculating the co-occurrence rate of the first participle and each second participle in the index table, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participle;
and extracting the second participles whose co-occurrence rate is higher than a preset threshold as associated second participles.
11. The apparatus of claim 8, wherein the second participle lookup module is further adapted to:
when the video search string is mapped to a plurality of first participles, respectively extracting the preset index tables corresponding to the plurality of first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
extracting second participles that occur together with the plurality of first participles as candidate participles; the second participles are all participles in the video resource data other than the first participles;
respectively calculating the co-occurrence rate of each first participle and a candidate participle in the corresponding index table, wherein the co-occurrence rate is the ratio of the number of occurrences of the candidate participle in the index table to the total number of video resource data entries in the index table;
configuring corresponding weights for the co-occurrence rates of the plurality of first participles and the candidate participle respectively;
calculating the weighted average of the plurality of co-occurrence rates as the co-occurrence rate of the plurality of first participles and the candidate participle;
and extracting the candidate participles whose co-occurrence rate is higher than a preset threshold as associated second participles.
12. The apparatus of claim 8, wherein the second participle lookup module is further adapted to:
when the video search string is mapped to a plurality of first participles, respectively extracting the preset index tables corresponding to the plurality of first participles; each index table comprises information of the video resource data to which the corresponding first participle belongs and all participles in the video resource data; all the participles in the video resource data are generated by crawling the video resource data, extracting the characteristic text information of the video resource data, and performing word segmentation on the characteristic text information;
determining a main participle from the plurality of index tables, wherein the main participle is the first participle corresponding to the index table with the largest total number of video resource data entries;
calculating the co-occurrence rate of the main participle and each second participle in the index table corresponding to the main participle, wherein the co-occurrence rate is the ratio of the number of occurrences of each second participle in the index table to the total number of video resource data entries in the index table; the second participles are all participles in the video resource data other than the first participles;
and extracting the second participles whose co-occurrence rate is higher than a preset threshold as associated second participles.
13. The apparatus according to claim 10, 11 or 12, wherein the characteristic text information comprises a video title, video keywords and/or a video description.
14. The apparatus of claim 12, wherein the combination pushing module is further adapted to:
push a combination of the main participle and the associated second participles.
CN201310462214.6A 2013-09-30 2013-09-30 Word segmentation information pushing method and device based on video searching Active CN103500214B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201310462214.6A CN103500214B (en) 2013-09-30 2013-09-30 Word segmentation information pushing method and device based on video searching
PCT/CN2014/086519 WO2015043389A1 (en) 2013-09-30 2014-09-15 Participle information push method and device based on video search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310462214.6A CN103500214B (en) 2013-09-30 2013-09-30 Word segmentation information pushing method and device based on video searching

Publications (2)

Publication Number Publication Date
CN103500214A CN103500214A (en) 2014-01-08
CN103500214B true CN103500214B (en) 2017-04-19

Family

ID=49865425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310462214.6A Active CN103500214B (en) 2013-09-30 2013-09-30 Word segmentation information pushing method and device based on video searching

Country Status (1)

Country Link
CN (1) CN103500214B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015043389A1 (en) * 2013-09-30 2015-04-02 北京奇虎科技有限公司 Participle information push method and device based on video search
CN105279172B (en) * 2014-06-30 2019-07-09 惠州市伟乐科技股份有限公司 Video matching method and device
CN105721900B (en) * 2014-12-05 2019-02-22 鹏博士电信传媒集团股份有限公司 Video switching method, apparatus and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101364239A (en) * 2008-10-13 2009-02-11 中国科学院计算技术研究所 Method for auto constructing classified catalogue and relevant system
CN102360358A (en) * 2011-09-28 2012-02-22 百度在线网络技术(北京)有限公司 Keyword recommendation method and system
CN102629257A (en) * 2012-02-29 2012-08-08 南京大学 Commodity recommending method of e-commerce website based on keywords
CN103136191A (en) * 2013-03-14 2013-06-05 姚明东 Automatic extracting method of word with single character in electronic commerce dictionary

Also Published As

Publication number Publication date
CN103500214A (en) 2014-01-08

Similar Documents

Publication Publication Date Title
CN103491205B (en) The method for pushing of a kind of correlated resources address based on video search and device
CN103488787B (en) A kind of method for pushing and device of the online broadcasting entrance object based on video search
CN109189942B (en) Construction method and device of patent data knowledge graph
CN107066621B (en) Similar video retrieval method and device and storage medium
CN107862027B (en) Retrieve intension recognizing method, device, electronic equipment and readable storage medium storing program for executing
CN109657054B (en) Abstract generation method, device, server and storage medium
CN103544266B (en) A kind of method and device for searching for suggestion word generation
CN104199933B (en) The football video event detection and semanteme marking method of a kind of multimodal information fusion
CN110569496B (en) Entity linking method, device and storage medium
US20150332670A1 (en) Language Modeling For Conversational Understanding Domains Using Semantic Web Resources
US10152478B2 (en) Apparatus, system and method for string disambiguation and entity ranking
CN112836487B (en) Automatic comment method and device, computer equipment and storage medium
CN103793434A (en) Content-based image search method and device
CN113722478B (en) Multi-dimensional feature fusion similar event calculation method and system and electronic equipment
CN111506831A (en) Collaborative filtering recommendation module and method, electronic device and storage medium
CN110888991A (en) Sectional semantic annotation method in weak annotation environment
CN104142995A (en) Social event recognition method based on visual attributes
CN106649823A (en) Webpage classification recognition method based on comprehensive subject term vertical search and focused crawler
Kaneko et al. Visual event mining from geo-tweet photos
CN113641707B (en) Knowledge graph disambiguation method, device, equipment and storage medium
CN103500214B (en) Word segmentation information pushing method and device based on video searching
CN102306182A (en) Method for excavating user interest based on conceptual semantic background image
CN117473078A (en) Visual reading system of long literature based on cross-domain named entity recognition
CN107609094B (en) Data disambiguation method and device and computer equipment
WO2015043389A1 (en) Participle information push method and device based on video search

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220707

Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015

Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi software (Beijing) Co.,Ltd.
