CN105787078B - Multimedia title display method and device - Google Patents

Publication number
CN105787078B
Authority
CN
China
Prior art keywords
word
words
inter
title
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610118441.0A
Other languages
Chinese (zh)
Other versions
CN105787078A (en)
Inventor
甘润生
刘云剑
王旭
尹玉宗
姚键
潘柏宇
王冀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Youku Network Technology Beijing Co Ltd
Original Assignee
Youku Network Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Youku Network Technology Beijing Co Ltd
Priority to CN201610118441.0A
Publication of CN105787078A
Application granted
Publication of CN105787078B
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/738 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/438 Presentation of query results
    • G06F16/4387 Presentation of query results by the use of playlists
    • G06F16/4393 Multimedia presentations, e.g. slide shows, multimedia albums
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a multimedia title display method and device. The method comprises: performing word segmentation on each sample title in a multimedia title data set to obtain a plurality of words; establishing a statistical model from the obtained words; calculating, according to the established statistical model, the inter-word association weight and the inter-word association degree factor corresponding to each obtained word; determining the inter-word association degree corresponding to each obtained word according to the calculated inter-word association weight and inter-word association degree factor; and displaying each sample title in the multimedia title data set in abbreviated form according to the inter-word association degree. With this method and device, the core theme of a title and the content to be displayed are determined without changing the title, which makes it easier for a user to locate the theme of a long video title and improves information acquisition efficiency and user experience.

Description

Multimedia title display method and device
Technical Field
The present invention relates to the field of multimedia processing, and in particular, to a method and an apparatus for displaying a multimedia title.
Background
Because users author the titles of their own multimedia data, such as user-uploaded videos, title length cannot be fully controlled, and titles with many characters occur. A title is rendered differently on different devices: the number of characters that fits on the screen varies between terminal devices, and some screens cannot show a long title in full. This compromises the completeness of the displayed information and makes it harder for the user to grasp the theme of the video.
Meanwhile, in applications such as video data aggregation, titles of widely differing lengths and word orders are displayed on the same page. The result is visually jarring: the page looks cluttered and unattractive, and the browsing experience suffers. The layout of video titles therefore needs to be unified for each terminal device, so as to improve the user experience and the efficiency with which users identify the video theme.
In the prior art, there are several schemes for handling overlong multimedia titles. In a first scheme, when the length of the title exceeds a limit, the title is truncated from left to right and the excess part is replaced by an ellipsis. In a second scheme, when the length of the title exceeds the limit, the characters around the search keyword contained in the title are retained, and the excess parts on the left and right are replaced by ellipses. In a third scheme, according to the method of patent document 1 (Chinese patent publication No. CN1860454A), a second, shorter title is provided for the title, and the second title is selected for use depending on the number of characters that can be accommodated. In a fourth scheme, according to the method of patent document 2 (Chinese patent publication No. CN104008115A), a preset floating title bar is provided for titles in a WAP page that are not within the device screen, so that such titles can be shown in full in a floating window.
These techniques largely solve the display problem for overlong titles in general, but they do not work well in applications such as aggregated user video data. For example, in such data some videos belong to a series and some titles are merely a pile of keywords, and the titles within a series are almost identical except for the episode number or the subject. If only the characters that overflow the screen are cut off, the user gets the false impression that all the videos are the same. The titles no longer reflect the theme of each video, the videos cannot be told apart, the user's choice among them is hindered, the user cannot go directly to the video to be watched, and user experience suffers. In addition, generating multiple titles, including a second title, wastes storage, which becomes harder to bear as the number of video titles grows; and because the screens of different terminals accommodate different numbers of characters, many title variants would have to be generated. Furthermore, a floating window makes the user wait on every long title and keep staring at the screen to obtain the video theme, which lengthens the time needed to identify the theme, lowers the efficiency of obtaining theme information, and degrades the user experience to some extent.
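The weakness of the first prior-art scheme for series titles can be illustrated with a minimal Python sketch. The function name, the 24-character limit, and the two sample titles are hypothetical and chosen only for illustration; they do not come from the patent.

```python
def truncate_left_to_right(title: str, max_chars: int, ellipsis: str = "...") -> str:
    """First prior-art scheme: keep the leading characters and
    replace the overflowing part with an ellipsis."""
    if len(title) <= max_chars:
        return title
    keep = max(max_chars - len(ellipsis), 0)
    return title[:keep] + ellipsis


# Hypothetical series titles that differ only near the end.
titles = [
    "Weekly cooking show season 2 episode 11 braised pork",
    "Weekly cooking show season 2 episode 12 steamed fish",
]
shortened = [truncate_left_to_right(t, 24) for t in titles]

# Both shortened titles are now the same string, so the two
# episodes can no longer be told apart on the page.
print(shortened)
```

Because the distinguishing words sit at the end of each title, plain truncation removes exactly the part that the user needs.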
Disclosure of Invention
Technical problem
In view of the above, the technical problem to be solved by the present invention is how to properly display a multimedia title, particularly a long title, so as to improve user experience.
Solution scheme
In order to solve the above technical problem, according to an embodiment of the present invention, there is provided a multimedia title display method including: performing word segmentation on each sample title included in a multimedia title data set to obtain a plurality of words; establishing a statistical model from the obtained words; calculating, according to the established statistical model, the inter-word association weight and the inter-word association degree factor corresponding to each obtained word; determining the inter-word association degree corresponding to each obtained word according to the calculated inter-word association weight and inter-word association degree factor; and displaying each sample title in the multimedia title data set in abbreviated form according to the inter-word association degree.
For the above multimedia title display method, in one possible implementation manner, determining an inter-word association degree corresponding to each obtained word according to the calculated inter-word association weight and the inter-word association degree factor includes: calculating a word weight corresponding to each obtained word according to the inter-word association degree factor; and determining the inter-word association degree corresponding to each obtained word according to the inter-word association weight and the word weight.
For the above multimedia title display method, in one possible implementation manner, determining an inter-word association degree corresponding to each obtained word according to the inter-word association weight and the word weight includes:
calculating the inter-word association degree by using the following Formula 1,
[Formula 1: Figure BDA0000933257350000031]
where Co(x, y) represents the inter-word association degree between the word x and the word y, X(x, y) represents the inter-word association weight between the word x and the word y, and w(x), w(y), and w(xy) represent the word weights corresponding to the words x, y, and xy, respectively.
For the above multimedia title display method, in one possible implementation, the inter-word association degree factor includes a word frequency and an inverse document frequency,
calculating a word weight corresponding to each of the obtained words according to the inter-word association degree factor includes:
calculating the word weight corresponding to each of the obtained words from the word frequency and the inverse document frequency by using the following Formula 2,
[Formula 2: Figure BDA0000933257350000032]
where TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, and IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy.
With respect to the above multimedia title display method, in one possible implementation, the inter-word association degree factor includes a word frequency, an inverse document frequency, and a word activity,
calculating a word weight corresponding to each of the obtained words according to the inter-word association degree factor includes:
calculating the word weight corresponding to each of the obtained words from the word frequency, the inverse document frequency, and the word activity by using the following Formula 3,
[Formula 3: Figure BDA0000933257350000041]
where TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy, and H(x), H(y), and H(xy) respectively represent the word activities corresponding to the words x, y, and xy.
For the above multimedia title display method, in one possible implementation, the inter-word association degree factor includes a word frequency, an inverse document frequency, and a part-of-speech weight,
calculating a word weight corresponding to each of the obtained words according to the inter-word association degree factor includes:
calculating the word weight corresponding to each of the obtained words from the word frequency, the inverse document frequency, and the part-of-speech weight by using the following Formula 4,
[Formula 4]
where TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy, TN(x), TN(y), and TN(xy) respectively represent the part-of-speech weights corresponding to the words x, y, and xy, and α represents a part-of-speech weight parameter used to increase or decrease the part-of-speech weight.
With respect to the above multimedia title display method, in one possible implementation, the inter-word association degree factor includes a word frequency, an inverse document frequency, a word activity, and a part-of-speech weight,
calculating a word weight corresponding to each of the obtained words according to the inter-word association degree factor includes:
calculating the word weight corresponding to each of the obtained words from the word frequency, the inverse document frequency, the word activity, and the part-of-speech weight by using the following Formula 5,
[Formula 5: Figure BDA0000933257350000051]
where TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy, H(x), H(y), and H(xy) respectively represent the word activities corresponding to the words x, y, and xy, TN(x), TN(y), and TN(xy) respectively represent the part-of-speech weights corresponding to the words x, y, and xy, and α represents a part-of-speech weight parameter used to increase or decrease the part-of-speech weight.
As to the above multimedia title display method, in a possible implementation manner, the multimedia title display method further includes:
displaying multimedia titles outside the multimedia title data set in abbreviated form according to the inter-word association degree.
As for the above multimedia title display method, in a possible implementation manner, before performing the word segmentation, the multimedia title display method further includes preprocessing each sample title, specifically including:
normalizing each sample title; and
cleaning each normalized sample title.
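A minimal Python sketch of such a two-stage preprocessing follows. This passage does not state what normalization and cleaning consist of, so the concrete choices here (Unicode NFKC folding plus lowercasing as "normalization", and removal of punctuation and extra whitespace as "cleaning") are assumptions made for illustration, as are all function names.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Assumed normalization: fold full-width letters, digits and
    punctuation to their half-width forms (NFKC) and lowercase."""
    return unicodedata.normalize("NFKC", title).lower()


def clean_title(title: str) -> str:
    """Assumed cleaning: replace every character that is neither a
    word character nor whitespace with a space, then collapse
    runs of whitespace."""
    no_symbols = re.sub(r"[^\w\s]", " ", title)
    return re.sub(r"\s+", " ", no_symbols).strip()


def preprocess(title: str) -> str:
    # Normalization runs first, cleaning second, as in the text.
    return clean_title(normalize_title(title))


example = preprocess("ＨＤ【Cooking】 Show！！  EP０１")
print(example)
```

Running normalization before cleaning means that full-width punctuation is already folded to ASCII when the cleaning pattern is applied.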
For the above multimedia title display method, in a possible implementation manner, displaying each sample title in the multimedia title data set in abbreviated form according to the inter-word association degree includes:
layering the words obtained by segmenting each sample title according to the inter-word association degree; and
displaying each sample title in differentiated abbreviated form according to the layering result.
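The layering and differentiated display steps might be sketched as follows. This is only an illustration under stated assumptions: each word is assumed to already carry a single score in [0, 1] derived from the inter-word association degree, and the thresholds, the function names, and the sample scores are hypothetical. The passage does not specify how layers are formed.

```python
def layer_words(words, scores, thresholds=(0.66, 0.33)):
    """Assign each word to layer 0 (core), 1, or 2 (least important).
    A word's layer is the number of thresholds its score falls below."""
    return [sum(1 for t in thresholds if scores.get(w, 0.0) < t) for w in words]


def abbreviate(words, layers, max_chars, sep=" ", marker="..."):
    """Drop the least important layer first, keeping word order,
    until the text fits. If even the core layer is too long,
    the core layer is returned as is."""
    keep_up_to = max(layers, default=0)
    while True:
        kept = [w for w, layer in zip(words, layers) if layer <= keep_up_to]
        text = sep.join(kept)
        if len(kept) < len(words):
            text += marker  # signal that something was hidden
        if len(text) <= max_chars or keep_up_to == 0:
            return text
        keep_up_to -= 1


# Hypothetical segmented title and scores.
words = ["weekly", "cooking", "show", "episode", "12", "steamed", "fish"]
scores = {"weekly": 0.2, "cooking": 0.5, "show": 0.4, "episode": 0.1,
          "12": 0.9, "steamed": 0.5, "fish": 0.8}
layers = layer_words(words, scores)

for width in (50, 31, 30):
    print(width, abbreviate(words, layers, width))
```

The same layering result serves every screen width; only the number of layers that are kept changes, which matches the goal of adapting one stored title to many terminals.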
In order to solve the above technical problem, according to another embodiment of the present invention, there is provided a multimedia title display apparatus including: a word segmentation unit for performing word segmentation on each sample title included in a multimedia title data set to obtain a plurality of words; a statistical model establishing unit, connected to the word segmentation unit, for establishing a statistical model from the obtained words; a calculation unit, connected to the word segmentation unit and the statistical model establishing unit, for calculating, according to the established statistical model, the inter-word association weight and the inter-word association degree factor corresponding to each obtained word; a determining unit, connected to the calculation unit, for determining the inter-word association degree corresponding to each obtained word according to the calculated inter-word association weight and inter-word association degree factor; and an abbreviated display unit, connected to the determining unit, for displaying each sample title in the multimedia title data set in abbreviated form according to the inter-word association degree.
With regard to the above multimedia title display apparatus, in one possible implementation, the determining unit includes:
a calculation module for calculating the word weight corresponding to each obtained word according to the inter-word association degree factor; and
a determining module, connected to the calculation module, for determining the inter-word association degree corresponding to each obtained word according to the inter-word association weight and the word weight.
For the above multimedia title display apparatus, in one possible implementation manner, the determining module calculates the inter-word association degree by using the following Formula 1,
[Formula 1]
where Co(x, y) represents the inter-word association degree between the word x and the word y, X(x, y) represents the inter-word association weight between the word x and the word y, and w(x), w(y), and w(xy) represent the word weights corresponding to the words x, y, and xy, respectively.
With the above multimedia title display apparatus, in one possible implementation, the inter-word association degree factor includes a word frequency and an inverse document frequency,
the calculation module calculates the word weight corresponding to each obtained word by using the following Formula 2,
[Formula 2: Figure BDA0000933257350000071]
where TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, and IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy.
With the above multimedia title display apparatus, in one possible implementation, the inter-word association degree factor includes a word frequency, an inverse document frequency, and a word activity,
the calculation module calculates the word weight corresponding to each obtained word by using the following Formula 3,
[Formula 3: Figure BDA0000933257350000072]
where TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy, and H(x), H(y), and H(xy) respectively represent the word activities corresponding to the words x, y, and xy.
With the above multimedia title display apparatus, in one possible implementation, the inter-word association degree factor includes a word frequency, an inverse document frequency, and a part-of-speech weight,
the calculation module calculates the word weight corresponding to each obtained word by using the following Formula 4,
[Formula 4: Figure BDA0000933257350000073]
where TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy, TN(x), TN(y), and TN(xy) respectively represent the part-of-speech weights corresponding to the words x, y, and xy, and α represents a part-of-speech weight parameter used to increase or decrease the part-of-speech weight.
With the above multimedia title display apparatus, in one possible implementation, the inter-word association degree factor includes a word frequency, an inverse document frequency, a word activity, and a part-of-speech weight,
the calculation module calculates the word weight corresponding to each obtained word by using the following Formula 5,
[Formula 5: Figure BDA0000933257350000081]
where TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy, H(x), H(y), and H(xy) respectively represent the word activities corresponding to the words x, y, and xy, TN(x), TN(y), and TN(xy) respectively represent the part-of-speech weights corresponding to the words x, y, and xy, and α represents a part-of-speech weight parameter used to increase or decrease the part-of-speech weight.
With respect to the above multimedia title display apparatus, in one possible implementation, the abbreviated display unit is further configured to:
display multimedia titles outside the multimedia title data set in abbreviated form according to the inter-word association degree.
For the above multimedia title display apparatus, in a possible implementation manner, the multimedia title display apparatus further includes a preprocessing unit, connected to the word segmentation unit, for preprocessing each sample title,
wherein the preprocessing unit is specifically configured to:
normalize each sample title; and
clean each normalized sample title.
With regard to the above multimedia title display apparatus, in one possible implementation, the abbreviated display unit is configured to:
layer the words obtained by segmenting each sample title according to the inter-word association degree; and
display each sample title in differentiated abbreviated form according to the layering result.
Advantageous effects
With the multimedia title display method and device provided by the embodiments of the invention, natural language processing technology is used to construct a word association network (inter-word association degrees) based on word segmentation, the part-of-speech labels provided by the word segmentation, and the part-of-speech weights corresponding to those labels, in combination with a known statistical model. The core subject words of a title are displayed preferentially according to the inter-word association degree, and redundant modifiers related to the core subject words, as well as words with lower weights, are hidden, so that the title adapts dynamically to the screen of the terminal device. Therefore, without changing the title, the core theme of the title and the content to be displayed are made clear, which makes it easier for the user to locate the theme of a long video title and improves information acquisition efficiency and user experience.
Other features and aspects of the present invention will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the invention and, together with the description, serve to explain the principles of the invention.
Fig. 1 illustrates a flowchart of a multimedia title display method according to an embodiment of the present invention;
Fig. 2 illustrates a flowchart of a multimedia title display method according to another embodiment of the present invention;
Fig. 3 is a diagram illustrating the determination of the inter-word association degree based on the inter-word association weight and four inter-word association degree factors: word frequency, inverse document frequency, part of speech, and word activity;
Fig. 4 illustrates a flowchart of a multimedia title display method according to still another embodiment of the present invention;
Fig. 5 shows a schematic diagram of the preprocessing of sample titles;
Fig. 6 is a block diagram illustrating the structure of a multimedia title display apparatus according to an embodiment of the present invention;
Fig. 7 is a block diagram illustrating the structure of a multimedia title display apparatus according to another embodiment of the present invention.
Detailed Description
Various exemplary embodiments, features and aspects of the present invention will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present invention. It will be understood by those skilled in the art that the present invention may be practiced without some of these specific details. In some instances, methods, procedures, components, and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present invention.
In view of the above problems in the background art, the present invention addresses the display strategy for multimedia titles, especially long titles, and provides a multimedia title display method based on inter-word association degrees. The method uses NLP (Natural Language Processing) technology to construct a word association network (inter-word association degrees) based on word segmentation, the part-of-speech labels provided by the word segmentation (whether a word is an entity word), and the part-of-speech weights corresponding to those labels, in combination with a known statistical model. The core subject words of a title are displayed preferentially according to the inter-word association degree, and redundant modifiers related to the core words, as well as words with lower weights, are hidden, so that the title adapts dynamically to the screen of the terminal device. Therefore, without changing the title, the core theme of the title and the content to be displayed are made clear, which makes it easier for the user to locate the theme of a long video title and improves information acquisition efficiency and user experience.
It should be noted that, in the present invention, the multimedia title display method and apparatus of the present invention have been described mainly by taking video as an example of multimedia, but the present invention is not limited thereto. Those skilled in the art will appreciate that the multimedia title display method and apparatus of the present invention are also applicable to the display of titles of other multimedia such as audio and electronic books.
Hereinafter, the multimedia title display method and apparatus of the present invention will be described in detail by the following embodiments.
Example 1
Fig. 1 illustrates a flowchart of a multimedia title display method according to an embodiment of the present invention. As shown in fig. 1, the multimedia title display method mainly includes the following steps S110 to S150.
Step S110, performing word segmentation on each sample title included in the multimedia title data set to obtain a plurality of words.
Specifically, a data set (sample data set) including a plurality of sample titles, for example 10,000 titles, is first acquired. A natural language processing technique is then used to segment each sample title in the acquired data set with an existing word segmentation method, for example a method based on character string matching or a word segmenter that combines a dictionary with statistics.
It should be noted that the number of sample titles given above is merely an example, and the present invention is not limited thereto. Those skilled in the art can select an appropriate number of titles according to actual needs. Further, those skilled in the art know that the larger the number of titles selected, the more accurate the result, but also the greater the amount of computation for statistics and the like.
Through the word segmentation, each sample title is divided into a plurality of words, and the type of each word and its corresponding part-of-speech weight can be obtained from an existing dictionary and statistics. Here, words may be classified into entity words and non-entity words. For example, the word "western" may be regarded as a TV series or a movie, and both TV series and movies are categories of entity words. As another example, a word that has no actual meaning is a non-entity word. Here, the part-of-speech weight refers to the probability that a word is an entity word or a non-entity word in the title. For example, the part-of-speech weight of the word "western" indicates the probability that the word, in the title, is an entity word such as a TV series or a movie.
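As an illustration of segmentation by character string matching, the following Python sketch uses forward maximum matching, one common dictionary-based technique. The text does not name a specific matching algorithm, so this choice, the tiny dictionary, the sample title, and the function name are all assumptions made for the example.

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Scan left to right; at each position take the longest
    dictionary word that matches, falling back to one character."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan advances.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical dictionary and title.
dictionary = {"搞笑", "视频", "合集"}
segments = forward_max_match("搞笑视频合集第1期", dictionary)
print(segments)
```

A production segmenter would also attach part-of-speech labels to each word; this sketch covers only the splitting step.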
Step S120, establishing a statistical model according to the obtained words.
Specifically, after a plurality of words are obtained by the word segmentation of step S110 described above, a statistical model, for example a trigram model, may be established based on the obtained words.
After the word segmentation, words such as A, B, C, D, and so on are obtained. While establishing the model, common single-word stop words such as "and" can be removed as needed, and then data such as the number of occurrences and the probability of each word in the data set are calculated.
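The counting behind such a model can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the stop-word set, the function names, and the sample titles are hypothetical, and only raw n-gram counts and unigram probabilities are computed.

```python
from collections import Counter

STOP_WORDS = {"and"}  # hypothetical stop-word list


def count_ngrams(segmented_titles, n_max=3, stop_words=STOP_WORDS):
    """Count 1-grams up to n_max-grams over titles that have
    already been segmented into word lists."""
    counts = {n: Counter() for n in range(1, n_max + 1)}
    for words in segmented_titles:
        kept = [w for w in words if w not in stop_words]
        for n in range(1, n_max + 1):
            for i in range(len(kept) - n + 1):
                counts[n][tuple(kept[i:i + n])] += 1
    return counts


def unigram_probabilities(counts):
    """Relative frequency of each single word in the data set."""
    total = sum(counts[1].values())
    return {gram[0]: c / total for gram, c in counts[1].items()} if total else {}


titles = [["A", "B", "C"], ["A", "B", "and", "D"]]
counts = count_ngrams(titles)
probs = unigram_probabilities(counts)
print(counts[2].most_common(1), probs["A"])
```

Removing stop words before forming n-grams makes words that were separated only by a stop word count as adjacent, which is a design choice of this sketch.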
In addition, in a possible implementation manner, in the process of establishing the model, parameters required for calculating the later-described inter-word association weight and inter-word association degree factor may be counted.
For example, in the process of establishing the model, the number of occurrences of each word and the total word frequency, i.e. the sum of the occurrences of all words, may be counted. In other words, the parameters required for calculating the word frequency of each word described later can be counted in the process of establishing the model.
As another example, the number of times the word x appears in the titles and the total number of titles may be counted. In other words, the parameters required for calculating the inverse document frequency of each word described later can be counted.
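How these counts could feed the two factors is sketched below in Python. The definitions used are the common textbook ones (word frequency as a share of all word occurrences, inverse document frequency as the logarithm of total titles over titles containing the word) and are an assumption here, since the patent's own formulas are given only as images; the function names and sample data are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(segmented_titles):
    """Occurrences of each word divided by the total word count."""
    counts = Counter(w for words in segmented_titles for w in words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def inverse_document_frequencies(segmented_titles):
    """log(total titles / titles containing the word); each title
    counts at most once per word."""
    n_titles = len(segmented_titles)
    doc_freq = Counter(w for words in segmented_titles for w in set(words))
    return {w: math.log(n_titles / df) for w, df in doc_freq.items()}


titles = [["A", "B"], ["A", "C"], ["A", "B", "B"]]
tf = term_frequencies(titles)
idf = inverse_document_frequencies(titles)
print(tf["A"], idf["A"], idf["C"])
```

A word such as "A" that appears in every title gets an inverse document frequency of zero, which reflects that it does not help distinguish one title from another.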
As another example, the number of titles that contain both the word x and the word y associated with x, the number of titles that contain no word x but the word y associated with x, the number of titles that contain the word x but no word y associated with x, the number of titles that contain neither the word x nor the word y associated with x, and the like may be counted. In other words, parameters required for calculating the association weight between words described later can be counted in the process of establishing the model.
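The four title counts just listed form a 2x2 contingency table for the pair (x, y). The Python sketch below gathers them and, purely as an assumed example of a statistic that can be built on such a table, computes Pearson's chi-square. The passage does not state which statistic the inter-word association weight uses, so `chi_square` is illustrative only, and all names and sample data are hypothetical.

```python
def contingency_counts(segmented_titles, x, y):
    """Return (both, only_y, only_x, neither) title counts for x and y."""
    both = only_y = only_x = neither = 0
    for words in segmented_titles:
        has_x, has_y = x in words, y in words
        if has_x and has_y:
            both += 1
        elif has_y:
            only_y += 1
        elif has_x:
            only_x += 1
        else:
            neither += 1
    return both, only_y, only_x, neither


def chi_square(both, only_y, only_x, neither):
    """Pearson chi-square for a 2x2 table (assumed statistic)."""
    n = both + only_y + only_x + neither
    denom = ((both + only_x) * (only_y + neither)
             * (both + only_y) * (only_x + neither))
    if denom == 0:
        return 0.0  # a row or column is empty; no association measurable
    return n * (both * neither - only_x * only_y) ** 2 / denom


titles = [["A", "B"], ["A", "B"], ["A"], ["B"], ["C"], ["C"]]
table = contingency_counts(titles, "A", "B")
score = chi_square(*table)
print(table, score)
```

Larger values indicate that x and y co-occur more (or less) often than independence would predict; whether and how such a value enters Formula 1 is not specified in this passage.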
As another example, the probability of n words associated with x may also be counted. In other words, the parameters required for calculating the word liveness described later can be counted in the process of establishing the model.
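The counting described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `collect_statistics` and `contingency`, the dictionary keys, and the choice to count each unordered word pair once per title are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def collect_statistics(segmented_titles):
    """Count the parameters named in step S120.

    segmented_titles: list of word lists, one list per sample title
    (assumed to be the output of the word segmentation in step S110).
    """
    term_count = Counter()        # occurrences of each word
    title_count = Counter()       # number of titles containing each word
    pair_title_count = Counter()  # number of titles containing both words of a pair
    for words in segmented_titles:
        term_count.update(words)
        unique = sorted(set(words))
        title_count.update(unique)
        pair_title_count.update(combinations(unique, 2))
    return {
        "term_count": term_count,
        "total_terms": sum(term_count.values()),
        "title_count": title_count,
        "total_titles": len(segmented_titles),
        "pair_title_count": pair_title_count,
    }


def contingency(stats, x, y):
    """Return (A, B, C, D): titles with x and y, with y only, with x only, with neither."""
    a = stats["pair_title_count"][tuple(sorted((x, y)))]
    b = stats["title_count"][y] - a
    c = stats["title_count"][x] - a
    d = stats["total_titles"] - a - b - c
    return a, b, c, d
```

For instance, `contingency(collect_statistics(titles), "a", "b")` yields the four title counts for the pair of words "a" and "b" in a list of segmented titles `titles`.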
Step S130: calculate the inter-word association weight and the inter-word association degree factors corresponding to each obtained word according to the established statistical model.
Specifically, the inter-word association weight and the inter-word association degree factors, such as the word frequency and the inverse document frequency, corresponding to each word obtained by the word segmentation process may be calculated from the parameters and data counted while establishing the statistical model in step S120.
Step S140: determine the inter-word association degree corresponding to each obtained word according to the calculated inter-word association weight and inter-word association degree factors.
After the inter-word association weight and the inter-word association degree factor are calculated in step S130, the inter-word association degree corresponding to each word may be calculated according to the two parameters.
In this way, based on the acquired data set and the established statistical model, a relationship network between words can be established, and it can then be decided which words in a title to display and which to hide in order to perform the thumbnail display described later.
Step S150: display each sample title in the multimedia title data set in thumbnail form according to the inter-word association degree.
Specifically, the relationship network between words, which is established according to the degree of association between words determined in step S140, is applied to each sample title in the data set, so that each sample title in the data set can be displayed in a thumbnail manner.
Therefore, when the multimedia title display method is applied on different terminal devices, title text that exceeds the screen width is not simply truncated. Instead, the core subject words of the title are displayed preferentially, and redundant descriptive words and low-weight words are omitted until the title fits the device screen, without changing the meaning of the title, which improves the user's information acquisition experience.
Thus, the multimedia title display method of the embodiment of the invention can use natural language processing technology to construct a word association network (inter-word association degree) based on word segmentation, the part-of-speech tags provided by the word segmentation and their corresponding part-of-speech weights, in combination with a known statistical model; preferentially display the core subject words of a title according to the inter-word association degree; and hide overlapping modifier words or lower-weight words associated with the core subject words, so as to adapt dynamically to the screen of the terminal device. Therefore, without changing the title itself, the core subject of the title and the content to be displayed are made clear, the difficulty users have in locating the subject of a long video title is addressed, and information acquisition efficiency and user experience are improved.
Example 2
Fig. 2 illustrates a flowchart of a multimedia title display method according to another embodiment of the present invention. The steps in fig. 2, which are numbered the same as those in fig. 1, have the same functions, and detailed descriptions of the steps are omitted for the sake of brevity.
As shown in fig. 2, the main difference between the multimedia title display method shown in fig. 2 and the multimedia title display method shown in fig. 1 is that the step S140 may specifically include steps S1401 to S1402.
Step S1401: calculate the word weight corresponding to each obtained word according to the inter-word association degree factors.
Specifically, in step S130, the factors that affect the inter-word association degree, that is, the inter-word association degree factors, may be calculated from the various data and parameters in the established statistical model. The word frequency and the inverse document frequency can be calculated from the corresponding parameters counted in the statistical model, and the word weight corresponding to each word obtained by the word segmentation is then calculated from the calculated word frequency and inverse document frequency. The word weight represents the importance of the word in the sample data set.
In addition to the word frequency and the inverse document frequency, the inter-word association degree factors may include, for example, the word liveness. Accordingly, the word liveness may be calculated from the corresponding parameters counted in the statistical model, and the word weight may then be calculated from the calculated word frequency, inverse document frequency, and word liveness.
Alternatively, in addition to the word frequency and the inverse document frequency, the part of speech may also be taken into account. Accordingly, the inter-word association degree factors may further include a part-of-speech weight, and the word weight is then calculated from the calculated word frequency, the inverse document frequency, and the part-of-speech weight obtained in the word segmentation process.
The inter-word association degree factors may also include both the word liveness and the part-of-speech weight in addition to the word frequency and the inverse document frequency, in which case the word weight is calculated from all four factors: word frequency, inverse document frequency, word liveness, and part-of-speech weight.
Step S1402: determine the inter-word association degree corresponding to each obtained word according to the inter-word association weight and the word weight.
In one possible implementation, the inter-word association degree corresponding to each of the obtained words in step S1402 can be calculated by the following formula 1.
[Formula 1; equation image not reproduced]
where Co(x, y) represents the inter-word association degree between the word x and the word y, and X(x, y) represents the inter-word association weight between the word x and the word y. The inter-word association weight X(x, y) can be measured using the chi-square statistic: the greater the chi-square value, the stronger the correlation between the words x and y. The specific calculation is shown in formula 6 below:
[Formula 6; equation image not reproduced]
where X(x, y) represents the inter-word association weight, x is a given word, y is a word associated with x, A represents the number of titles containing both x and y, B represents the number of titles containing y but not x, C represents the number of titles containing x but not y, and D represents the number of titles containing neither x nor y.
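As a hedged illustration of a chi-square measure over the four counts A, B, C, and D, the sketch below uses the textbook chi-square statistic for a 2x2 contingency table. Whether formula 6 of the patent has exactly this form (for example, whether it includes the factor N) is an assumption, and the function name `chi_square` is hypothetical.

```python
def chi_square(a, b, c, d):
    """Textbook 2x2 chi-square statistic (assumed form, not confirmed by the source).

    a: titles containing x and y     b: titles containing y but not x
    c: titles containing x but not y d: titles containing neither
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # One of the marginal totals is zero, so no association can be measured.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator
```

A larger return value indicates a stronger association between x and y; when the two words occur independently of each other (a * d equals b * c) the value is zero.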
w(x), w(y), w(xy) represent the word weights corresponding to x, y, and xy, respectively. In some titles only the word x appears; in some only the word y appears; and in some the words x and y appear together. Accordingly, in formula 1 above, w(x) is calculated from the data of the titles in which x appears, w(y) from the data of the titles in which y appears, and w(xy) from the data of the titles in which x and y appear together.
After the inter-word association degree Co(x, y) corresponding to each word obtained by the word segmentation has been calculated using formula 1 above, a relationship network between the words in the acquired data set can be established. This relationship network, that is, the inter-word association degree, is then used to display the sample titles in the data set in thumbnail form.
In a possible implementation manner, based on the inter-word association degrees between the words of the sample titles in the acquired data set, multimedia titles outside the data set can also be displayed in thumbnail form, in addition to the sample titles themselves.
Example 3
The main difference between the present embodiment and the above embodiments is that the inter-word association degree factors specifically include the word frequency and the inverse document frequency. In this case, the word weight corresponding to each obtained word can be calculated using formula 2 below:
[Formula 2; equation image not reproduced]
where TF(x), TF(y), TF(xy) represent the word frequencies corresponding to x, y, and xy, respectively. TF(x) is calculated from the data of the titles in which the word x appears, TF(y) from the data of the titles in which the word y appears, and TF(xy) from the data of the titles in which x and y appear together. IDF(x), IDF(y), IDF(xy) represent the inverse document frequencies corresponding to x, y, and xy, respectively, and are calculated from the same three groups of titles.
The term frequency (TF) measures the importance and prevalence of a word in a title. The inverse document frequency (IDF) measures the discriminative power of a word: the more common a word is, the lower its discriminative power. In natural language processing, these two factors are usually considered together, and they are calculated by formulas 7 and 8 below:
[Formula 7; equation image not reproduced]
[Formula 8; equation image not reproduced]
where TF(x) represents the word frequency, t_x represents the number of occurrences of a given word x in the titles, and T represents the total word frequency; IDF(x) represents the inverse document frequency, n_x represents the number of titles in which x appears, and N represents the total number of titles.
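Because the images for formulas 7 and 8 are not reproduced in this text, the sketch below falls back on the conventional definitions that match the symbols just listed: TF as t_x divided by T, and IDF as the logarithm of N divided by n_x. The natural logarithm, the absence of smoothing, and both function names are assumptions.

```python
import math


def term_frequency(t_x, total_terms):
    """TF(x) = t_x / T, with t_x the count of word x and T the total word count."""
    return t_x / total_terms


def inverse_document_frequency(n_x, total_titles):
    """IDF(x) = log(N / n_x) (assumed conventional form, natural logarithm).

    n_x: number of titles containing x; total_titles: N, the number of titles.
    """
    if n_x <= 0:
        raise ValueError("n_x must be positive: the word must occur in at least one title")
    return math.log(total_titles / n_x)
```

Under these definitions a word that appears in every title has an IDF of zero, which matches the statement that very common words have low discriminative power.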
Thus, the word weight corresponding to each word can be calculated by taking into account two factors, i.e., the word frequency and the document inversion frequency, by the above formula 2, and then the corresponding inter-word association degree can be calculated by the above formula 1.
In a possible implementation manner, the inter-word association degree factors may specifically include the word frequency, the inverse document frequency, and the word liveness. In this case, the word weight corresponding to each obtained word can be calculated using formula 3 below:
[Formula 3; equation not reproduced]
where, as described above, TF(x), TF(y), TF(xy) represent the word frequencies corresponding to x, y, and xy, respectively, and IDF(x), IDF(y), IDF(xy) represent the corresponding inverse document frequencies. H(x), H(y), H(xy) represent the word liveness corresponding to x, y, and xy, respectively. H(x) is calculated from the data of the titles in which the word x appears, H(y) from the data of the titles in which the word y appears, and H(xy) from the data of the titles in which x and y appear together.
Word liveness is considered from the perspective of information theory: information entropy can be used to measure the amount of information a word carries and how active it is, which yields the word liveness. The specific calculation formula is formula 8 below:
[Formula 8; equation not reproduced]
where H(x) represents the word liveness, x represents the information source, i.e. the specified word, n represents the number of words associated with the word x, and p(x_i) represents the probability of the i-th associated word.
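Since the liveness formula itself is not reproduced in this text, the following sketch assumes the standard Shannon entropy over the probabilities p(x_i) of the n associated words. The base-2 logarithm and the function name `word_liveness` are assumptions.

```python
import math


def word_liveness(probabilities):
    """Shannon entropy H(x) = -sum(p(x_i) * log2(p(x_i))) over the associated words.

    probabilities: iterable of p(x_i) for the n words associated with x.
    Terms with zero probability contribute nothing and are skipped.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)
```

With this definition, a word whose associated words are spread evenly (for example four words with probability 0.25 each) gets a higher liveness than a word that almost always co-occurs with the same single word.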
Thus, by the above formula 3, it is possible to calculate the word weight corresponding to each word by considering the word activity in addition to the two factors of the word frequency and the document inversion frequency, and then calculate the corresponding inter-word association degree by the above formula 1.
Therefore, the calculated association degree between words can be more accurate and reliable.
In one possible implementation, the inter-word association degree factors include the word frequency, the inverse document frequency, and the part-of-speech weight. In this case, the word weight corresponding to each obtained word can be calculated using formula 4 below:
[Formula 4; equation image not reproduced]
where, as mentioned above, TF(x), TF(y), TF(xy) represent the word frequencies corresponding to x, y, and xy, respectively, and IDF(x), IDF(y), IDF(xy) represent the corresponding inverse document frequencies. TN(x), TN(y), TN(xy) represent the part-of-speech weights corresponding to x, y, and xy, respectively: TN(x) is the part-of-speech weight of the word x appearing in a title, obtained during word segmentation from, for example, dictionaries and statistics; TN(y) is the corresponding weight for the word y; and TN(xy) is the part-of-speech weight for the words x and y appearing together in a title, obtained in the same way. α represents a part-of-speech weight parameter used to increase or decrease the part-of-speech weight; for example, entity words may be given a higher weight and non-entity words a lower weight, so that entity words can be distinguished from non-entity words.
Thus, by the above formula 4, it is possible to calculate the word weight corresponding to each word by considering the part of speech and the weight thereof in addition to the two factors of the word frequency and the document inversion frequency, and then calculate the corresponding inter-word association degree by the above formula 1.
Using the word frequency and the inverse document frequency alone may overemphasize high-frequency words that have some discriminative power, while some of the words in a title that best distinguish its subject may be low-frequency words. By also taking the part of speech and its weight TN into account, both the high-frequency words with some discriminative power and the low-frequency words that distinguish the subject can be considered, so that the calculated inter-word association degree is more accurate and reliable.
In one possible implementation, the inter-word association degree factors include the word frequency, the inverse document frequency, the word liveness, and the part-of-speech weight; that is, they include both the word liveness and the part-of-speech weight in addition to the word frequency and the inverse document frequency. In this case, the word weight corresponding to each obtained word can be calculated using formula 5 below:
[Formula 5; equation image not reproduced]
where, as described above, TF(x), TF(y), TF(xy) represent the word frequencies corresponding to x, y, and xy, respectively; IDF(x), IDF(y), IDF(xy) represent the corresponding inverse document frequencies; H(x), H(y), H(xy) represent the corresponding word liveness; TN(x), TN(y), TN(xy) represent the corresponding part-of-speech weights; and α represents the part-of-speech weight parameter used to increase or decrease the part-of-speech weights.
Thus, by the above formula 5, it is possible to calculate the word weight corresponding to each word by considering both the part of speech and the word activity in addition to the two factors of the word frequency and the document inversion frequency, and then calculate the corresponding inter-word association degree by the above formula 1.
FIG. 3 is a diagram illustrating how to determine the inter-word association degree according to four inter-word association degree factors, i.e., word frequency, document inversion frequency, part of speech, and word activity, and inter-word association weight.
Depending on which inter-word association degree factors are considered, the corresponding one of formulas 2 to 5 above can be used to calculate the word weight of each word, and the corresponding inter-word association degree is then calculated by formula 1 above.
Example 4
Fig. 4 illustrates a flowchart of a multimedia title display method according to still another embodiment of the present invention. The steps in fig. 4 that are labeled the same as those in fig. 1 and 2 have the same functions, and detailed descriptions of these steps are omitted for the sake of brevity.
As shown in fig. 4, the main difference between the multimedia title display method shown in fig. 4 and the multimedia title display method shown in fig. 2 is that, before performing the word segmentation process, the multimedia title display method may further include the following step S100 (steps S1001 and S1002), and the step S150 may specifically include steps S1501 to S1502.
The following is a specific description of each of the above steps.
Step S100: preprocess each sample title.
Step S100 may specifically include the following steps:
Step S1001: normalize each sample title; and
Step S1002: clean each normalized sample title.
For example, the above steps S1001, S1002 may be performed according to the schematic diagram shown in fig. 5.
Specifically, as shown in fig. 5, each sample title in the multimedia title data set is first normalized as follows: special characters that do not normally belong to a title, such as the symbols & and #, are deleted, and the multimedia title data is then filtered against a preset title length threshold so that titles shorter than the threshold are ignored. The preset title length threshold may differ for different types of multimedia.
Next, after the normalization, each sample title is cleaned; specifically, junk data such as advertisements embedded in the titles (QQ numbers, etc.) is removed. The remaining data is then analyzed, and the cleaning is repeated until the data is confirmed to meet a predetermined quality standard (meeting the standard corresponds to "pass" in fig. 5, and failing to meet it corresponds to "fail").
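The normalization and cleaning steps can be sketched roughly as below. The regular expressions, the `passes_quality` callback, and the `max_rounds` safeguard are all hypothetical: the text names only the symbols & and # and QQ-number advertisements as examples, and does not specify how the quality standard is checked.

```python
import re

# Only "&" and "#" are named in the text; a real character list would be longer.
SPECIAL_CHARS = re.compile(r"[&#]")
# Hypothetical pattern for an embedded QQ-number advertisement.
QQ_AD = re.compile(r"QQ[:：\s]*\d{5,}", re.IGNORECASE)


def normalize(titles, min_length):
    """Step S1001: delete special characters, then drop titles shorter than min_length."""
    result = []
    for title in titles:
        title = SPECIAL_CHARS.sub("", title).strip()
        if len(title) >= min_length:
            result.append(title)
    return result


def clean(titles):
    """Step S1002: remove junk data such as embedded advertisements."""
    return [QQ_AD.sub("", title).strip() for title in titles]


def preprocess(titles, min_length, passes_quality, max_rounds=10):
    """Normalize once, then repeat cleaning until the quality check passes."""
    titles = normalize(titles, min_length)
    for _ in range(max_rounds):
        titles = clean(titles)
        if passes_quality(titles):
            break
    return titles
```

The `min_length` argument corresponds to the preset title length threshold, which could be chosen per multimedia type as the text notes.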
It should be noted that the execution order of steps S1001 and S1002 may be changed, that is, step S1002 may be executed first, and then step S1001 may be executed.
Through this preprocessing, the sample titles in the multimedia title data set are of higher quality, which facilitates the processing in the subsequent steps S110 to S150.
In a possible implementation manner, step S150 may specifically include the following steps:
Step S1501: layer the words obtained by segmenting each sample title according to the inter-word association degree; and
Step S1502: display each sample title in differentiated thumbnail form according to the layering result.
Specifically, take for example a title that, after word segmentation and removal of common stop words, is segmented into the four words "Song Xiaobao", "skit", "complete collection", and "mad relative". According to the calculation in step S140, "Song Xiaobao" and "mad relative" have the highest inter-word association degree, "skit" has the next highest, and "complete collection" has the lowest.
Then, according to step S1501, "Song Xiaobao" and "mad relative" can be assigned to the first layer, "skit" to the second layer, and "complete collection" to the third layer.
Then, in step S1502, the sample title is displayed in differentiated form according to the layering result, so as to fit the screen of the terminal device and highlight the core subject. For example, when the screen of the terminal device is only wide enough to display the two words "Song Xiaobao" and "mad relative", only the first layer is displayed, and the second and third layers are hidden. When the screen is wide enough to display the three words "Song Xiaobao", "mad relative", and "skit", the first and second layers may be displayed while only the third layer is hidden, and the first layer may be highlighted using, for example, a different color.
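One possible reading of steps S1501 and S1502 is sketched below. The layering rule (one layer per distinct association value), the use of character counts as the display length, the numeric scores, and the English word strings standing in for the four words of the example are all illustrative assumptions; highlighting of the first layer is not modeled.

```python
def layer_words(scores):
    """Step S1501 (assumed rule): words with equal scores share a layer, highest first."""
    levels = sorted(set(scores.values()), reverse=True)
    return [[word for word in scores if scores[word] == level] for level in levels]


def thumbnail(title_words, scores, max_chars):
    """Step S1502: show whole layers, from the top down, while they fit in max_chars."""
    shown = set()
    used = 0
    for layer in layer_words(scores):
        size = sum(len(word) for word in layer)
        if used + size > max_chars:
            break  # this layer and all lower layers stay hidden
        shown.update(layer)
        used += size
    # Keep the original word order of the title for the words that are shown.
    return [word for word in title_words if word in shown]


# Hypothetical scores for the four words of the example title.
words = ["Song Xiaobao", "skit", "complete collection", "mad relative"]
scores = {"Song Xiaobao": 0.9, "skit": 0.5, "complete collection": 0.2, "mad relative": 0.9}
```

With these inputs, `thumbnail(words, scores, 24)` keeps only the first layer, while a budget of 28 characters also admits the second layer.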
In this way, through the above steps S1501 and S1502, the sample titles can be displayed in a differentiated and abbreviated manner, so as to obtain a better display effect.
Example 5
Fig. 6 illustrates a structural block diagram of a multimedia title display apparatus according to an embodiment of the present invention. As shown in fig. 6, the multimedia title display apparatus 60 includes: a word segmentation unit 61, configured to perform word segmentation on each sample title included in the multimedia title data set to obtain a plurality of words; a statistical model establishing unit 62, connected to the word segmentation unit 61 and configured to establish a statistical model according to the obtained plurality of words; a calculating unit 63, connected to the word segmentation unit 61 and the statistical model establishing unit 62 and configured to calculate the inter-word association weight and the inter-word association degree factors corresponding to each obtained word according to the established statistical model; a determining unit 64, connected to the calculating unit 63 and configured to determine the inter-word association degree corresponding to each obtained word according to the calculated inter-word association weight and inter-word association degree factors; and a thumbnail display unit 65, connected to the determining unit 64 and configured to display each sample title in the multimedia title data set in thumbnail form according to the inter-word association degree.
The multimedia title display apparatus 60 according to the embodiment of the present invention can execute the multimedia title display method described in any one of the embodiments 1 to 4, and the specific flow of the multimedia title display method is described in detail in the embodiments.
The multimedia title display device provided by the embodiment of the invention can use natural language processing technology to construct a word association network (inter-word association degree) based on word segmentation, the part-of-speech tags provided by the word segmentation and their corresponding part-of-speech weights, in combination with a known statistical model; preferentially display the core subject words of a title according to the inter-word association degree; and hide overlapping modifier words or lower-weight words associated with the core subject words, so as to adapt dynamically to the screen of the terminal device. Therefore, without changing the title itself, the core subject of the title and the content to be displayed are made clear, the difficulty users have in locating the subject of a long video title is addressed, and information acquisition efficiency and user experience are improved.
Example 6
Fig. 7 illustrates a structural block diagram of a multimedia title display apparatus according to an embodiment of the present invention. Components in fig. 7 that are numbered the same as those in fig. 6 have the same functions, and detailed descriptions of these components are omitted for the sake of brevity.
As shown in fig. 7, the multimedia title display apparatus 70 according to the embodiment of the present invention is mainly different from the multimedia title display apparatus 60 according to the previous embodiment in that the determining unit 64 mainly comprises: a calculating module 641, configured to calculate word weights corresponding to the obtained words according to the inter-word association factor; a determining module 642, connected to the calculating module 641, for determining an inter-word association degree corresponding to each of the obtained words according to the inter-word association weight and the word weight.
In one possible implementation, the determining module 642 calculates the inter-word association degree by using the following formula 1,
[Formula 1; equation image not reproduced]
where Co(x, y) represents the inter-word association degree between the word x and the word y, X(x, y) represents the inter-word association weight between the word x and the word y, and w(x), w(y), w(xy) represent the word weights corresponding to x, y, and xy, respectively.
In one possible implementation, the factors of the degree of association between words include word frequency and document inversion frequency,
the calculating module 641 calculates a word weight corresponding to each of the obtained words using the following formula 2,
[Formula 2; equation image not reproduced]
where TF(x), TF(y), TF(xy) represent the word frequencies corresponding to x, y, and xy, respectively, and IDF(x), IDF(y), IDF(xy) represent the inverse document frequencies corresponding to x, y, and xy, respectively.
In one possible implementation, the factors of the degree of association between words include word frequency, document inversion frequency and word activity,
the calculating module 641 calculates a word weight corresponding to each of the obtained words using the following equation 3,
[Formula 3; equation image not reproduced]
where TF(x), TF(y), TF(xy) represent the word frequencies corresponding to x, y, and xy, respectively; IDF(x), IDF(y), IDF(xy) represent the corresponding inverse document frequencies; and H(x), H(y), H(xy) represent the corresponding word liveness.
In one possible implementation, the inter-word association factor includes word frequency, document inversion frequency and part-of-speech weight,
the calculating module 641 calculates a word weight corresponding to each of the obtained words using the following equation 4,
[Formula 4; equation image not reproduced]
where TF(x), TF(y), TF(xy) represent the word frequencies corresponding to x, y, and xy, respectively; IDF(x), IDF(y), IDF(xy) represent the corresponding inverse document frequencies; TN(x), TN(y), TN(xy) represent the corresponding part-of-speech weights; and α represents the part-of-speech weight parameter used to increase or decrease the part-of-speech weights.
In one possible implementation, the association degree factors include word frequency, document inversion frequency, word activity degree and part-of-speech weight,
the calculating module 641 calculates a word weight corresponding to each of the obtained words using the following equation 5,
[Formula 5; equation image not reproduced]
where TF(x), TF(y), TF(xy) represent the word frequencies corresponding to x, y, and xy, respectively; IDF(x), IDF(y), IDF(xy) represent the corresponding inverse document frequencies; H(x), H(y), H(xy) represent the corresponding word liveness; TN(x), TN(y), TN(xy) represent the corresponding part-of-speech weights; and α represents the part-of-speech weight parameter used to increase or decrease the part-of-speech weights.
In one possible implementation, the thumbnail display unit 65 is further configured to:
carry out thumbnail display, according to the inter-word association degree, on multimedia titles other than those in the multimedia title data set.
In one possible implementation, the multimedia title display apparatus 70 may further include a preprocessing unit 66, which is connected to the word segmentation unit 61 and is configured to preprocess each sample title,
wherein the preprocessing unit 66 is specifically configured to: normalize each sample title; and clean each normalized sample title.
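The two preprocessing steps can be sketched as follows. This passage does not say what normalization or cleaning consist of, so everything here is an assumption made for illustration: Unicode NFKC folding plus lowercasing stands in for normalization, punctuation removal plus whitespace collapsing stands in for cleaning, and the function names are hypothetical.

```python
import re
import unicodedata


def normalize_title(title):
    # Assumed normalization: NFKC folds full-width letters, digits and
    # punctuation to their ordinary forms; then lowercase the result.
    return unicodedata.normalize("NFKC", title).lower()


def clean_title(title):
    # Assumed cleaning: replace every run of characters that is neither a
    # word character nor whitespace with one space, then collapse spaces.
    text = re.sub(r"[^\w\s]+", " ", title)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(title):
    # Normalization first, cleaning second, matching the order in the text.
    return clean_title(normalize_title(title))


if __name__ == "__main__":
    print(preprocess("ＡＢＣ  Demo!!"))
```

Because `\w` is Unicode-aware for Python `str` patterns, Chinese characters in a title are kept by the cleaning step.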
In one possible implementation, the thumbnail display unit 65 is configured to: layer the words obtained by segmenting each sample title according to the inter-word association degree; and carry out differentiated thumbnail display on each sample title according to the layering result.
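One way to picture layering followed by differentiated thumbnail display is sketched below. It is only an illustration under assumptions: each word is given a single precomputed score standing in for its inter-word association degree, the three layers and their thresholds (0.6 and 0.3) are hypothetical, and the names `layer_words` and `abbreviate` do not come from this document.

```python
def layer_words(words, scores, thresholds=(0.6, 0.3)):
    """Assign a layer to each word: 0 = core, 1 = secondary, 2 = droppable."""
    high, low = thresholds
    layers = []
    for word in words:
        score = scores.get(word, 0.0)
        layers.append(0 if score >= high else 1 if score >= low else 2)
    return layers


def abbreviate(words, scores, max_chars, sep=" "):
    """Drop low-layer words until the title fits within max_chars.

    Core words (layer 0) are never dropped, so the result can still be
    longer than max_chars when the core words alone do not fit.
    """
    layers = layer_words(words, scores)
    keep = [True] * len(words)

    def render():
        return sep.join(w for w, k in zip(words, keep) if k)

    for layer in (2, 1):  # least important layer first
        candidates = sorted(
            (i for i, l in enumerate(layers) if l == layer),
            key=lambda i: scores.get(words[i], 0.0),
        )
        for i in candidates:  # lowest score within the layer first
            if len(render()) <= max_chars:
                return render()
            keep[i] = False
    return render()


if __name__ == "__main__":
    words = ["official", "trailer", "dune", "part", "two", "hd"]
    scores = {"dune": 0.9, "trailer": 0.7, "part": 0.5,
              "two": 0.5, "official": 0.2, "hd": 0.1}
    print(abbreviate(words, scores, max_chars=24))
```

Word order is preserved in the shortened title, so the result remains a subsequence of the original title rather than a rewritten one.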
The multimedia title display apparatus 70 of the embodiment of the present invention can execute the multimedia title display method described in any one of Embodiments 1 to 4; for the specific flow of the method, see the detailed description in those embodiments.
The multimedia title display apparatus provided by the embodiment of the present invention uses natural language processing technology to construct a word association network (the inter-word association degree) based on word segmentation, the part-of-speech tags provided by the word segmentation, and the part-of-speech weights corresponding to those tags, in combination with a known statistical model. According to the inter-word association degree, it preferentially displays the core subject words of a title and hides modifying or overlapping words and lower-weight words related to the core subject words, so that the title dynamically adapts to the screen of the terminal device. In this way, without changing the title itself, the core topic of the title and the content to be displayed are made clear, the difficulty users have in locating the topic of a long video title is resolved, and both information acquisition efficiency and user experience are improved.
The above describes only specific embodiments of the present invention, and the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (18)

1. A method for displaying a multimedia title, comprising:
performing word segmentation processing on each sample title included in the multimedia title data set to obtain a plurality of words;
establishing a statistical model according to the obtained words;
calculating the inter-word association weight and the inter-word association degree factor respectively corresponding to each obtained word according to the established statistical model;
determining the inter-word association degree corresponding to each obtained word according to the calculated inter-word association weight and the inter-word association degree factor; and
carrying out thumbnail display on each sample title in the multimedia title data set according to the inter-word association degree, so that the length of the title after the thumbnail display fits the screen of a terminal device, wherein the length of the title after the thumbnail display is smaller than the original length of the sample title;
wherein determining the inter-word association degree corresponding to each obtained word according to the calculated inter-word association weight and the inter-word association degree factor includes:
calculating the word weight corresponding to each obtained word according to the inter-word association degree factor; and
determining the inter-word association degree corresponding to each obtained word according to the inter-word association weight and the word weight.
2. The method of claim 1, wherein determining an inter-word association degree corresponding to each of the obtained words according to the inter-word association weight and the word weight comprises:
calculating the inter-word association degree using the following Equation 1,
(Equation 1)
wherein
Figure 950294DEST_PATH_IMAGE003
represents the inter-word association degree between word x and word y,
Figure 995611DEST_PATH_IMAGE005
represents the inter-word association weight between word x and word y, and
Figure 656399DEST_PATH_IMAGE007
respectively represent the word weights corresponding to the words x, y, and xy.
3. The multimedia title display method of claim 2, wherein the inter-word association degree factors include word frequency and inverse document frequency,
calculating the word weight corresponding to each obtained word according to the inter-word association degree factor includes:
calculating the word weight corresponding to each obtained word based on the word frequency and the inverse document frequency using the following Equation 2,
Figure 470771DEST_PATH_IMAGE009
(Equation 2)
wherein TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, and IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy.
4. The multimedia title display method of claim 2, wherein the inter-word association degree factors include word frequency, inverse document frequency, and word activity,
calculating the word weight corresponding to each obtained word according to the inter-word association degree factor includes:
calculating the word weight corresponding to each obtained word based on the word frequency, the inverse document frequency, and the word activity using the following Equation 3,
(Equation 3)
wherein TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy; IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy; and H(x), H(y), and H(xy) respectively represent the word activity corresponding to the words x, y, and xy.
5. The multimedia title display method of claim 2, wherein the inter-word association degree factors include word frequency, inverse document frequency, and part-of-speech weight,
calculating the word weight corresponding to each obtained word according to the inter-word association degree factor includes:
calculating the word weight corresponding to each obtained word based on the word frequency, the inverse document frequency, and the part-of-speech weight using the following Equation 4,
(Equation 4)
wherein TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy; IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy; TN(x), TN(y), and TN(xy) respectively represent the part-of-speech weights corresponding to the words x, y, and xy; and α represents a part-of-speech weight parameter used to increase or decrease the part-of-speech weight.
6. The multimedia title display method of claim 2, wherein the inter-word association degree factors include word frequency, inverse document frequency, word activity, and part-of-speech weight,
calculating the word weight corresponding to each obtained word according to the inter-word association degree factor includes:
calculating the word weight corresponding to each obtained word based on the word frequency, the inverse document frequency, the word activity, and the part-of-speech weight using the following Equation 5,
Figure 150834DEST_PATH_IMAGE025
(Equation 5)
wherein TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy; IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy; H(x), H(y), and H(xy) respectively represent the word activity corresponding to the words x, y, and xy; TN(x), TN(y), and TN(xy) respectively represent the part-of-speech weights corresponding to the words x, y, and xy; and α represents a part-of-speech weight parameter used to increase or decrease the part-of-speech weight.
7. The multimedia title display method according to any one of claims 1-6, further comprising:
carrying out thumbnail display, according to the inter-word association degree, on multimedia titles other than those in the multimedia title data set.
8. The method according to any one of claims 1 to 6, wherein before performing the word segmentation, the method further comprises preprocessing each of the sample titles, specifically comprising:
carrying out normalization processing on each sample title; and
cleaning each sample title subjected to the normalization processing.
9. The method of any of claims 1-6, wherein carrying out thumbnail display on each sample title in the multimedia title data set according to the inter-word association degree comprises:
layering the words obtained by segmenting each sample title according to the inter-word association degree; and
carrying out differentiated thumbnail display on each sample title according to the layering result.
10. A multimedia title display apparatus, comprising:
a word segmentation unit, configured to perform word segmentation processing on each sample title included in the multimedia title data set to obtain a plurality of words;
a statistical model establishing unit, connected to the word segmentation unit and configured to establish a statistical model according to the obtained words;
a calculation unit, connected to the word segmentation unit and the statistical model establishing unit and configured to calculate, according to the established statistical model, the inter-word association weight and the inter-word association degree factor respectively corresponding to each obtained word;
a determining unit, connected to the calculation unit and configured to determine the inter-word association degree corresponding to each obtained word according to the calculated inter-word association weight and inter-word association degree factor; and
a thumbnail display unit, connected to the determining unit and configured to carry out thumbnail display on each sample title in the multimedia title data set according to the inter-word association degree, so that the length of the title after the thumbnail display fits the screen of a terminal device, wherein the length of the title after the thumbnail display is smaller than the original length of the sample title;
wherein the determining unit includes:
a calculation module, configured to calculate the word weight corresponding to each obtained word according to the inter-word association degree factor; and
a determining module, connected to the calculation module and configured to determine the inter-word association degree corresponding to each obtained word according to the inter-word association weight and the word weight.
11. The multimedia title display apparatus of claim 10, wherein the determining module calculates the inter-word association degree using the following Equation 1,
Figure 841896DEST_PATH_IMAGE001
(Equation 1)
wherein
Figure 989980DEST_PATH_IMAGE003
represents the inter-word association degree between word x and word y,
represents the inter-word association weight between word x and word y, and
Figure 550592DEST_PATH_IMAGE007
respectively represent the word weights corresponding to the words x, y, and xy.
12. The multimedia title display apparatus of claim 11, wherein the inter-word association degree factors include word frequency and inverse document frequency,
the calculation module calculates the word weight corresponding to each obtained word using the following Equation 2,
Figure 254105DEST_PATH_IMAGE009
(Equation 2)
wherein TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, and IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy.
13. The multimedia title display apparatus of claim 11, wherein the inter-word association degree factors include word frequency, inverse document frequency, and word activity,
the calculation module calculates the word weight corresponding to each obtained word using the following Equation 3,
Figure 791900DEST_PATH_IMAGE015
(Equation 3)
wherein TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy; IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy; and H(x), H(y), and H(xy) respectively represent the word activity corresponding to the words x, y, and xy.
14. The multimedia title display apparatus of claim 11, wherein the inter-word association degree factors include word frequency, inverse document frequency, and part-of-speech weight,
the calculation module calculates the word weight corresponding to each obtained word using the following Equation 4,
(Equation 4)
wherein TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy; IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy; TN(x), TN(y), and TN(xy) respectively represent the part-of-speech weights corresponding to the words x, y, and xy; and α represents a part-of-speech weight parameter used to increase or decrease the part-of-speech weight.
15. The multimedia title display apparatus of claim 11, wherein the inter-word association degree factors include word frequency, inverse document frequency, word activity, and part-of-speech weight,
the calculation module calculates the word weight corresponding to each obtained word using the following Equation 5,
Figure 326787DEST_PATH_IMAGE025
(Equation 5)
wherein TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy; IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy; H(x), H(y), and H(xy) respectively represent the word activity corresponding to the words x, y, and xy; TN(x), TN(y), and TN(xy) respectively represent the part-of-speech weights corresponding to the words x, y, and xy; and α represents a part-of-speech weight parameter used to increase or decrease the part-of-speech weight.
16. The multimedia title display apparatus of any of claims 10-15, wherein the thumbnail display unit is further configured to:
carry out thumbnail display, according to the inter-word association degree, on multimedia titles other than those in the multimedia title data set.
17. The multimedia title display apparatus according to any one of claims 10-15, further comprising a preprocessing unit connected to the word segmentation unit and configured to preprocess each of the sample titles,
wherein the preprocessing unit is specifically configured to:
carry out normalization processing on each sample title; and
clean each sample title subjected to the normalization processing.
18. The multimedia title display apparatus of any of claims 10-15, wherein the thumbnail display unit is configured to:
layer the words obtained by segmenting each sample title according to the inter-word association degree; and
carry out differentiated thumbnail display on each sample title according to the layering result.
CN201610118441.0A 2016-03-02 2016-03-02 Multimedia title display method and device Active CN105787078B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610118441.0A CN105787078B (en) 2016-03-02 2016-03-02 Multimedia title display method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610118441.0A CN105787078B (en) 2016-03-02 2016-03-02 Multimedia title display method and device

Publications (2)

Publication Number Publication Date
CN105787078A CN105787078A (en) 2016-07-20
CN105787078B true CN105787078B (en) 2020-02-14

Family

ID=56386904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610118441.0A Active CN105787078B (en) 2016-03-02 2016-03-02 Multimedia title display method and device

Country Status (1)

Country Link
CN (1) CN105787078B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106681983A (en) * 2016-11-25 2017-05-17 北京掌行通信息技术有限公司 Station name participle display method and device
CN108460150A (en) * 2018-03-23 2018-08-28 北京奇虎科技有限公司 The processing method and processing device of headline
CN109815499B (en) * 2019-01-25 2023-05-23 杭州凡闻科技有限公司 Information association method and system
CN111581952B (en) * 2020-05-20 2023-10-03 长沙理工大学 Large-scale replaceable word library construction method for natural language information hiding

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101315624A (en) * 2007-05-29 2008-12-03 阿里巴巴集团控股有限公司 Text subject recommending method and device
CN104391963A (en) * 2014-12-01 2015-03-04 北京中科创益科技有限公司 Method for constructing correlation networks of keywords of natural language texts
CN105260359A (en) * 2015-10-16 2016-01-20 晶赞广告(上海)有限公司 Semantic keyword extraction method and apparatus

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101115024A (en) * 2006-07-28 2008-01-30 阿里巴巴公司 Method and system for displaying web page contents related information
UY33576A (en) * 2010-08-31 2012-03-30 Directv Group Inc METHOD AND SYSTEM TO LOOK FOR THE CONTENT OF A USER'S DEVICE D
CN103744954B (en) * 2014-01-06 2017-02-01 同济大学 Word relevancy network model establishing method and establishing device thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
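"Research and Implementation of News Web Page Extraction Technology" (新闻网页抽取技术的研究与实现); Wang Xing (王星); China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 2012-07-15 (No. 07); Section 4.2 *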

Also Published As

Publication number Publication date
CN105787078A (en) 2016-07-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Patentee after: Youku network technology (Beijing) Co.,Ltd.

Address before: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Patentee before: 1VERGE INTERNET TECHNOLOGY (BEIJING) Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20200522

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee after: Alibaba (China) Co.,Ltd.

Address before: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Patentee before: Youku network technology (Beijing) Co.,Ltd.