Multimedia title display method and device
Technical Field
The present invention relates to the field of multimedia processing, and in particular, to a method and an apparatus for displaying a multimedia title.
Background
Because the authoring of titles for user multimedia data, such as user video data, cannot be fully controlled, titles with many characters are common. The same title is rendered differently on different devices, and the number of characters that can be displayed varies from one terminal screen to another. Some screens cannot display a long title in full, which impairs the completeness of the displayed information and makes it harder for the user to understand the subject of the video.
Meanwhile, in applications such as video data aggregation, titles of widely varying length and in no consistent order are displayed on the same page. The result is visually jarring, makes the page look cluttered and unattractive, and degrades the user's browsing experience. The layout style of video titles therefore needs to be unified according to the terminal device, so as to improve the user experience and the efficiency with which the user grasps the subject of a video.
In the prior art, there are several solutions to the problem of overly long multimedia titles. In a first scheme, when the length of the title exceeds a limit, the title is truncated from left to right and the excess is replaced by an ellipsis. In a second scheme, when the length of the title exceeds the limit, the characters before and after the search keyword contained in the title are retained, and the excess on the left and right is replaced by ellipses. In a third scheme, according to the method of patent document 1 (Chinese patent publication No. CN1860454A), a second title with fewer characters is provided for the title, and the second title is selected for use according to the number of characters that can be accommodated. In a fourth scheme, according to the method of patent document 2 (Chinese patent publication No. CN104008115A), a preset floating title bar is provided for titles in a WAP page that are not within the device screen, so that the title can be displayed in full in a floating window.
These technologies basically solve the display problem for ordinary overlong titles, but they do not work well in applications such as user video aggregation data. For example, in user video aggregation data, some videos belong to a series and some titles are merely a pile of keywords; the titles within a series are almost identical except for the episode number or the topic. If the characters that do not fit on the screen are simply cut off, the user gets the false impression that all the videos are the same. The titles no longer reflect the topics accurately, the individual videos cannot be told apart, the user's choice of video is hindered, and the user cannot go directly to the desired video, all of which harms the user experience. In addition, generating multiple titles, including a second title, wastes storage, and the cost becomes harder to bear when the number of video titles is large. Because the screens of different terminals accommodate different numbers of characters, many title variants would have to be generated to fit them. Finally, a floating frame lengthens the time the user waits on each long title, and the user can obtain the topic of the video only by watching the screen continuously. This increases the time needed to determine the topic of a title, lowers the efficiency of obtaining topic information, and degrades the user experience to some extent.
Disclosure of Invention
Technical problem
In view of the above, the technical problem to be solved by the present invention is how to properly display a multimedia title, particularly a long title, so as to improve user experience.
Solution scheme
In order to solve the above technical problem, according to an embodiment of the present invention, there is provided a multimedia title display method including: performing word segmentation processing on each sample title included in the multimedia title data set to obtain a plurality of words; establishing a statistical model according to the obtained words; calculating, according to the established statistical model, the inter-word association weight and the inter-word association degree factor respectively corresponding to each obtained word; determining the inter-word association degree corresponding to each obtained word according to the calculated inter-word association weight and inter-word association degree factor; and displaying each sample title in the multimedia title data set in abbreviated (thumbnail) form according to the inter-word association degree.
For the above multimedia title display method, in one possible implementation manner, determining an inter-word association degree corresponding to each obtained word according to the calculated inter-word association weight and the inter-word association degree factor includes: calculating a word weight corresponding to each obtained word according to the inter-word association degree factor; and determining the inter-word association degree corresponding to each obtained word according to the inter-word association weight and the word weight.
For the above multimedia title display method, in one possible implementation manner, determining an inter-word association degree corresponding to each obtained word according to the inter-word association weight and the word weight includes:
calculating the inter-word association degree by using the following Formula 1,
[Formula 1]
wherein Co(x, y) represents the inter-word association degree between the word x and the word y, X(x, y) represents the inter-word association weight between the word x and the word y, and w(x), w(y), and w(xy) represent the word weights corresponding to the words x, y, and xy, respectively.
For the above multimedia title display method, in one possible implementation, the inter-word association degree factor includes a word frequency and an inverse document frequency,
calculating a word weight corresponding to each of the obtained words according to the inter-word association degree factor includes:
calculating the word weight corresponding to each of the obtained words based on the word frequency and the inverse document frequency using the following Formula 2,
[Formula 2]
wherein TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, and IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy.
For the above multimedia title display method, in one possible implementation, the inter-word association degree factor includes a word frequency, an inverse document frequency, and a word liveness,
calculating a word weight corresponding to each of the obtained words according to the inter-word association degree factor includes:
calculating the word weight corresponding to each of the obtained words based on the word frequency, the inverse document frequency, and the word liveness using the following Formula 3,
[Formula 3]
wherein TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy, and H(x), H(y), and H(xy) respectively represent the word liveness corresponding to the words x, y, and xy.
For the above multimedia title display method, in one possible implementation, the inter-word association degree factor includes a word frequency, an inverse document frequency, and a part-of-speech weight,
calculating a word weight corresponding to each of the obtained words according to the inter-word association degree factor includes:
calculating the word weight corresponding to each of the obtained words based on the word frequency, the inverse document frequency, and the part-of-speech weight using the following Formula 4,
[Formula 4]
wherein TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy, TN(x), TN(y), and TN(xy) respectively represent the part-of-speech weights corresponding to the words x, y, and xy, and α represents a part-of-speech weight parameter used to increase or decrease the part-of-speech weight.
For the above multimedia title display method, in one possible implementation, the inter-word association degree factor includes a word frequency, an inverse document frequency, a word liveness, and a part-of-speech weight,
calculating a word weight corresponding to each of the obtained words according to the inter-word association degree factor includes:
calculating the word weight corresponding to each of the obtained words based on the word frequency, the inverse document frequency, the word liveness, and the part-of-speech weight using the following Formula 5,
[Formula 5]
wherein TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy, H(x), H(y), and H(xy) respectively represent the word liveness corresponding to the words x, y, and xy, TN(x), TN(y), and TN(xy) respectively represent the part-of-speech weights corresponding to the words x, y, and xy, and α represents a part-of-speech weight parameter used to increase or decrease the part-of-speech weight.
For the above multimedia title display method, in a possible implementation manner, the multimedia title display method further includes:
displaying multimedia titles outside the multimedia title data set in a thumbnail manner according to the inter-word association degree.
For the above multimedia title display method, in a possible implementation manner, before performing the word segmentation process, the multimedia title display method further includes preprocessing each sample title, specifically including:
carrying out normalization processing on each sample title; and
cleaning each sample title subjected to normalization processing.
For the method for displaying a multimedia title, in a possible implementation manner, the displaying each sample title in the multimedia title data set in a thumbnail manner according to the inter-word association degree includes:
layering each word obtained by segmenting each sample title according to the inter-word association degree;
and performing differentiated thumbnail display of each sample title according to the layering result.
In order to solve the above technical problem, according to another embodiment of the present invention, there is provided a multimedia title display apparatus including: a word segmentation unit, configured to perform word segmentation processing on each sample title included in the multimedia title data set to obtain a plurality of words; a statistical model establishing unit, connected to the word segmentation unit and configured to establish a statistical model according to the obtained words; a calculation unit, connected to the word segmentation unit and the statistical model establishing unit and configured to calculate, according to the established statistical model, the inter-word association weight and the inter-word association degree factor respectively corresponding to each obtained word; a determining unit, connected to the calculation unit and configured to determine an inter-word association degree corresponding to each of the obtained words according to the calculated inter-word association weight and inter-word association degree factor; and a thumbnail display unit, connected to the determining unit and configured to display each sample title in the multimedia title data set in a thumbnail manner according to the inter-word association degree.
With regard to the above multimedia title display apparatus, in one possible implementation, the determining unit includes:
the calculation module is used for calculating the word weight corresponding to each obtained word according to the inter-word association degree factor;
and a determining module, connected to the calculation module and configured to determine the inter-word association degree corresponding to each obtained word according to the inter-word association weight and the word weight.
For the above multimedia title display apparatus, in one possible implementation manner, the determining module calculates the inter-word association degree by using the following Formula 1,
[Formula 1]
wherein Co(x, y) represents the inter-word association degree between the word x and the word y, X(x, y) represents the inter-word association weight between the word x and the word y, and w(x), w(y), and w(xy) represent the word weights corresponding to the words x, y, and xy, respectively.
For the above multimedia title display apparatus, in one possible implementation, the inter-word association degree factor includes a word frequency and an inverse document frequency,
the calculation module calculates the word weight corresponding to each of the obtained words using the following Formula 2,
[Formula 2]
wherein TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, and IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy.
For the above multimedia title display apparatus, in one possible implementation, the inter-word association degree factor includes a word frequency, an inverse document frequency, and a word liveness,
the calculation module calculates the word weight corresponding to each of the obtained words using the following Formula 3,
[Formula 3]
wherein TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy, and H(x), H(y), and H(xy) respectively represent the word liveness corresponding to the words x, y, and xy.
For the above multimedia title display apparatus, in one possible implementation, the inter-word association degree factor includes a word frequency, an inverse document frequency, and a part-of-speech weight,
the calculation module calculates the word weight corresponding to each of the obtained words using the following Formula 4,
[Formula 4]
wherein TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy, TN(x), TN(y), and TN(xy) respectively represent the part-of-speech weights corresponding to the words x, y, and xy, and α represents a part-of-speech weight parameter used to increase or decrease the part-of-speech weight.
For the above multimedia title display apparatus, in one possible implementation, the inter-word association degree factor includes a word frequency, an inverse document frequency, a word liveness, and a part-of-speech weight,
the calculation module calculates the word weight corresponding to each of the obtained words using the following Formula 5,
[Formula 5]
wherein TF(x), TF(y), and TF(xy) respectively represent the word frequencies corresponding to the words x, y, and xy, IDF(x), IDF(y), and IDF(xy) respectively represent the inverse document frequencies corresponding to the words x, y, and xy, H(x), H(y), and H(xy) respectively represent the word liveness corresponding to the words x, y, and xy, TN(x), TN(y), and TN(xy) respectively represent the part-of-speech weights corresponding to the words x, y, and xy, and α represents a part-of-speech weight parameter used to increase or decrease the part-of-speech weight.
With respect to the above multimedia title display apparatus, in one possible implementation, the thumbnail display unit is further configured to:
display multimedia titles outside the multimedia title data set in a thumbnail manner according to the inter-word association degree.
For the above multimedia title display apparatus, in a possible implementation manner, the multimedia title display apparatus further includes a preprocessing unit, connected to the word segmentation unit, for preprocessing each sample title,
wherein the preprocessing unit is specifically configured to:
carry out normalization processing on each sample title; and
clean each sample title subjected to normalization processing.
With regard to the above multimedia title display apparatus, in one possible implementation, the thumbnail display unit is configured to:
layer the words obtained by segmenting each sample title according to the inter-word association degree; and
perform differentiated thumbnail display of each sample title according to the layering result.
Advantageous effects
With the multimedia title display method and device provided by the embodiments of the invention, natural language processing technology can be used to construct a word association network (inter-word association degrees) based on word segmentation processing, the part-of-speech tags provided by the word segmentation processing, and the part-of-speech weights corresponding to those tags, in combination with a known statistical model. The core subject words of a title are displayed preferentially according to the inter-word association degree, while modifiers that overlap with the core subject words, or associated words with lower weights, are hidden, so that the title adapts dynamically to the screen of the terminal device. Thus, without changing the title itself, the core subject of the title and the content to be displayed are made clear, the difficulty a user has in locating the subject of a long video title is resolved, and both the efficiency of information acquisition and the user experience are improved.
Other features and aspects of the present invention will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the invention and, together with the description, serve to explain the principles of the invention.
Fig. 1 illustrates a flowchart of a multimedia title display method according to an embodiment of the present invention;
Fig. 2 illustrates a flowchart of a multimedia title display method according to another embodiment of the present invention;
Fig. 3 is a diagram illustrating the determination of the inter-word association degree based on the inter-word association weight and four inter-word association degree factors: word frequency, inverse document frequency, part of speech, and word liveness;
Fig. 4 illustrates a flowchart of a multimedia title display method according to still another embodiment of the present invention;
Fig. 5 shows a schematic diagram of the preprocessing of sample titles;
Fig. 6 is a block diagram illustrating a structure of a multimedia title display apparatus according to an embodiment of the present invention;
Fig. 7 is a block diagram illustrating a structure of a multimedia title display apparatus according to another embodiment of the present invention.
Detailed Description
Various exemplary embodiments, features and aspects of the present invention will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present invention. It will be understood by those skilled in the art that the present invention may be practiced without some of these specific details. In some instances, methods, procedures, components, and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present invention.
In view of the above problems in the background art, the present invention provides a multimedia title display method that incorporates inter-word association degrees, addressing the display strategy for multimedia titles and for long titles in particular. The method uses NLP (Natural Language Processing) technology. Based on word segmentation processing, the part-of-speech tags provided by the word segmentation processing (indicating whether a word is an entity word), and the part-of-speech weights corresponding to those tags, it combines a known statistical model to construct a word association network (inter-word association degrees). The core subject words of a title are displayed preferentially according to the inter-word association degree, while modifiers that overlap with the core words, or associated words with lower weights, are hidden, so that the title adapts dynamically to the screen of the terminal device. Thus, without changing the title itself, the core subject of the title and the content to be displayed are made clear, the difficulty a user has in locating the subject of a long video title is resolved, and both the efficiency of information acquisition and the user experience are improved.
It should be noted that, in the present invention, the multimedia title display method and apparatus of the present invention have been described mainly by taking video as an example of multimedia, but the present invention is not limited thereto. Those skilled in the art will appreciate that the multimedia title display method and apparatus of the present invention are also applicable to the display of titles of other multimedia such as audio and electronic books.
Hereinafter, the multimedia title display method and apparatus of the present invention will be described in detail by the following embodiments.
Example 1
Fig. 1 illustrates a flowchart of a multimedia title display method according to an embodiment of the present invention. As shown in fig. 1, the multimedia title display method mainly includes the following steps S110 to S150.
Step S110, performing word segmentation on each sample title included in the multimedia title data set to obtain a plurality of words.
Specifically, a data set (sample data set) including a plurality of sample titles, for example 10,000, is first acquired. Natural language processing technology is then used to segment each sample title in the acquired data set by an existing word segmentation method, for example a method based on character string matching or a segmenter that combines a dictionary with statistics.
It should be noted that the number of sample titles given above is merely an example, and the present invention is not limited thereto. Those skilled in the art can select an appropriate number of titles according to actual needs. Those skilled in the art also know that the larger the number of titles selected, the more accurate the result, but also the greater the amount of computation required for the statistics and the like.
By the word segmentation processing, each sample title can be divided into a plurality of words. The type of each word and its corresponding part-of-speech weight can be obtained from an existing dictionary and statistics. Here, words may be classified into entity words and non-entity words. For example, the word "western" may be regarded as the name of a TV series or a movie, and both TV series and movies fall into the category of entity words. As another example, a word that carries no actual meaning of its own is a non-entity word. Here, the part-of-speech weight refers to the probability that a word is an entity word or a non-entity word in the title. For example, the part-of-speech weight of the word "western" indicates the probability that the word is an entity word, such as a TV series or a movie, in the title.
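As an illustration of the string-matching style of segmentation mentioned above, the following is a minimal sketch of forward maximum matching. The dictionary contents, the word types and part-of-speech weights stored in it, the `segment` function, and the `max_len` parameter are all hypothetical and chosen only for the example; a practical system would more likely rely on an existing segmenter.

```python
# Hypothetical dictionary: word -> (word type, part-of-speech weight).
DICTIONARY = {
    "功夫熊猫": ("entity", 0.9),
    "熊猫": ("entity", 0.6),
    "预告片": ("non-entity", 0.2),
}


def segment(title, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word."""
    words = []
    i = 0
    while i < len(title):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(title) - i), 0, -1):
            piece = title[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


words = segment("功夫熊猫预告片", DICTIONARY)
# Look up the (assumed) type and part-of-speech weight of each segmented word.
tags = [DICTIONARY.get(w, ("unknown", 0.0)) for w in words]
```

Greedy longest-match is used here only because it is short to write down; it does not resolve ambiguous segmentations, which is one reason dictionary-plus-statistics segmenters are commonly preferred.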
Step S120, establishing a statistical model according to the obtained words.
Specifically, after a plurality of words are obtained according to the word segmentation process of step S110 described above, a statistical model, for example, a trigram model, may be established based on the obtained plurality of words.
After the word segmentation process, words such as A, B, C, D, and so on are obtained. In the process of establishing the model, common single-character stop words such as "and" can be removed as needed, and data such as the number of occurrences and the probability of each word in the data set are then calculated.
In addition, in a possible implementation manner, in the process of establishing the model, parameters required for calculating the later-described inter-word association weight and inter-word association degree factor may be counted.
For example, in the process of establishing the model, the number of occurrences of each word and the total word frequency, i.e., the sum of the numbers of occurrences of all words, may be counted. In other words, the parameters required for calculating the word frequency of each word, described later, can be counted in the process of establishing the model.
As another example, the number of titles in which the word x appears and the total number of titles may be counted. In other words, the parameters required for calculating the inverse document frequency of each word, described later, can be counted.
As another example, the following may be counted: the number of titles that contain both the word x and a word y associated with x, the number of titles that contain the word y but not the word x, the number of titles that contain the word x but not the word y, and the number of titles that contain neither the word x nor the word y. In other words, the parameters required for calculating the inter-word association weight, described later, can be counted in the process of establishing the model.
As another example, the probabilities of the n words associated with x may also be counted. In other words, the parameters required for calculating the word liveness, described later, can be counted in the process of establishing the model.
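The counts described above can be gathered in a single pass over the segmented titles. The following is a minimal sketch under assumed names: `collect_statistics`, `contingency`, and the sample data are hypothetical, and the word-liveness statistics are left out because their exact definition is not given at this point.

```python
from collections import Counter


def collect_statistics(segmented_titles, stop_words=frozenset()):
    """Return (word occurrence counts, per-word title counts, total number of titles)."""
    word_counts = Counter()   # occurrences of each word over the whole data set
    title_counts = Counter()  # number of titles that contain each word
    for words in segmented_titles:
        kept = [w for w in words if w not in stop_words]
        word_counts.update(kept)
        title_counts.update(set(kept))
    return word_counts, title_counts, len(segmented_titles)


def contingency(segmented_titles, x, y):
    """Return (a, b, c, d): titles with both x and y, with y only, with x only, with neither."""
    a = b = c = d = 0
    for words in segmented_titles:
        has_x, has_y = x in words, y in words
        if has_x and has_y:
            a += 1
        elif has_y:
            b += 1
        elif has_x:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Hypothetical, already segmented sample titles.
titles = [
    ["panda", "trailer"],
    ["panda", "episode", "1"],
    ["cat", "trailer"],
    ["and", "cat"],
]
word_counts, title_counts, total_titles = collect_statistics(titles, {"and"})
total_word_frequency = sum(word_counts.values())
pair_counts = contingency(titles, "panda", "trailer")
```

Keeping the occurrence counts and the per-title counts in separate counters mirrors the distinction drawn above between the parameters for the word frequency and those for the inverse document frequency.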
Step S130, calculating, according to the established statistical model, the inter-word association weight and the inter-word association degree factor corresponding to each obtained word.
Specifically, the inter-word association weight, together with inter-word association degree factors such as the word frequency and the inverse document frequency, may be calculated for each word obtained by the word segmentation process from the various parameters and data counted while establishing the statistical model in step S120.
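For the inter-word association weight, one common choice is the Pearson chi-square statistic of the 2x2 table built from the four title counts listed earlier. The sketch below assumes that standard textbook form; the function name `chi_square` and the argument order are hypothetical, and the document's own formula for this weight is stated separately and may differ in detail.

```python
def chi_square(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 co-occurrence table.

    a: titles containing both x and y
    b: titles containing y but not x
    c: titles containing x but not y
    d: titles containing neither x nor y
    """
    n = a + b + c + d
    # Product of the four marginal totals of the table.
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # A word that occurs in every title or in none carries no association signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


strongly_associated = chi_square(10, 0, 0, 10)  # x and y always appear together
independent = chi_square(5, 5, 5, 5)            # x and y co-occur at chance level
```

A larger value indicates that the two words co-occur more often (or less often) than independence would predict, which matches the intent of using the weight as a measure of correlation between words.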
Step S140, determining an inter-word association degree corresponding to each of the obtained words according to the calculated inter-word association weight and inter-word association degree factor.
After the inter-word association weight and the inter-word association degree factor are calculated in step S130, the inter-word association degree corresponding to each word may be calculated from these two parameters.
In this way, based on the acquired data set and the established statistical model, a relationship network between words can be established, and it can then be decided which characters in a title to display and which to hide for the thumbnail display described later.
Step S150, displaying each sample title in the multimedia title data set in a thumbnail manner according to the inter-word association degree.
Specifically, the relationship network between words, established according to the inter-word association degrees determined in step S140, is applied to each sample title in the data set, so that each sample title can be displayed in thumbnail form.
Therefore, when the multimedia title display method is applied on different terminal devices, the title characters that do not fit on the screen are not simply cut off. Instead, the core subject words of the title are displayed preferentially, and repeated descriptive words and words with low weight are omitted until the title fits the device screen. The meaning of the title is not changed, so the user's experience of acquiring information is improved.
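The idea of omitting low-weight words until the title fits the screen can be sketched as follows. This is only an illustration under stated assumptions: each word is assumed to already carry a single importance score derived from the inter-word association degree, the title is assumed to be written without spaces between words, and the names `abbreviate`, `scores`, and `max_chars` are hypothetical.

```python
def abbreviate(words, scores, max_chars):
    """Drop the lowest-scoring words, keeping the original order, until the title fits."""
    kept = list(range(len(words)))
    while kept and sum(len(words[i]) for i in kept) > max_chars:
        # Remove the remaining word with the smallest score.
        kept.remove(min(kept, key=lambda i: scores[i]))
    return "".join(words[i] for i in kept)


# Hypothetical segmented title with hypothetical per-word scores.
title_words = ["功夫熊猫", "官方", "高清", "预告片", "第2版"]
title_scores = [0.9, 0.2, 0.1, 0.7, 0.5]

wide_screen = abbreviate(title_words, title_scores, 10)
narrow_screen = abbreviate(title_words, title_scores, 7)
```

Because the same scores are reused for every screen width, only `max_chars` changes from one terminal to another, and no second title has to be stored.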
Thus, the multimedia title display method of the embodiment of the invention can use natural language processing technology to construct a word association network (inter-word association degrees) based on word segmentation processing, the part-of-speech tags provided by the word segmentation processing, the part-of-speech weights corresponding to those tags, and a known statistical model. The core subject words of a title are displayed preferentially according to the inter-word association degree, while modifiers that overlap with the core subject words, or associated words with lower weights, are hidden, so that the title adapts dynamically to the screen of the terminal device. Thus, without changing the title itself, the core subject of the title and the content to be displayed are made clear, the difficulty a user has in locating the subject of a long video title is resolved, and both the efficiency of information acquisition and the user experience are improved.
Example 2
Fig. 2 illustrates a flowchart of a multimedia title display method according to another embodiment of the present invention. The steps in fig. 2, which are numbered the same as those in fig. 1, have the same functions, and detailed descriptions of the steps are omitted for the sake of brevity.
As shown in fig. 2, the main difference between the multimedia title display method shown in fig. 2 and the multimedia title display method shown in fig. 1 is that the step S140 may specifically include steps S1401 to S1402.
Step S1401, calculating the word weight corresponding to each obtained word according to the inter-word association degree factor.
Specifically, in step S130, the factors that affect the inter-word association degree, that is, the inter-word association degree factors, may be calculated from the various data and parameters in the established statistical model. The word frequency and the inverse document frequency can be calculated from the corresponding parameters counted in the statistical model, and the word weight of each word obtained by the word segmentation is then calculated from the calculated word frequency and inverse document frequency. The word weight represents the importance of the word in the sample data set.
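As a rough illustration of a word weight built from these two factors, the sketch below assumes the conventional forms TF = count / total, IDF = log(N / n_x), and their product as the weight. These forms are assumptions made for the example and are not taken from Formula 2; the function and parameter names are likewise hypothetical.

```python
import math


def word_weight(word_count, total_count, titles_with_word, total_titles):
    """TF-IDF style weight of one word (assumed form, for illustration only)."""
    tf = word_count / total_count                    # word frequency
    idf = math.log(total_titles / titles_with_word)  # inverse document frequency
    return tf * idf


# A word seen twice among 8 word occurrences, appearing in 1 of 4 titles.
rare_word = word_weight(2, 8, 1, 4)
# The same frequency, but the word appears in every title.
ubiquitous_word = word_weight(2, 8, 4, 4)
```

Under this assumed form, a word that appears in every title receives a weight of zero, which agrees with the intuition that such a word does not help distinguish one title from another.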
In addition, besides the two factors of the word frequency and the document reversal frequency, the association degree factor between words may include, for example, the word activity. Accordingly, the word liveness may be calculated using the parameters required to calculate the word liveness counted in the statistical model, and then the word weight may be calculated through the calculated word frequency, document inversion frequency, and word liveness.
In addition, besides two factors of word frequency and document reversal frequency, the association degree between words can also consider the part of speech. Accordingly, the inter-word association degree factor may further include a part-of-speech weight, and then the word weight is calculated from the calculated word frequency, the document inversion frequency, and the part-of-speech weight obtained in the word segmentation process.
In addition, besides the two factors of the word frequency and the document reversal frequency, the association degree factor between words can also comprise the word activity degree and the part-of-speech weight at the same time, and then the word weight is calculated through the four factors of the word frequency, the document reversal frequency, the word activity degree and the part-of-speech weight.
S1402 determines the inter-word association degree corresponding to each of the obtained words according to the inter-word association weight and the word weight.
In one possible implementation, the inter-word association degree corresponding to each of the obtained words in step S1402 can be calculated by the following formula 1.
[Formula 1: equation not reproduced in the source text]
where Co(x, y) represents the inter-word association degree between the word x and the word y, and χ(x, y) represents the inter-word association weight between the word x and the word y. The inter-word association weight χ(x, y) can be measured using the chi-square distribution: the greater the chi-square value, the stronger the correlation between the words x and y. The specific calculation is shown in formula 6 below:
[Formula 6: equation not reproduced in the source text]
where χ(x, y) represents the inter-word association weight, x is a given word, y is a related word, A represents the number of titles containing both x and y, B represents the number of titles containing y but not x, C represents the number of titles containing x but not y, and D represents the number of titles containing neither x nor y.
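Formula 6 is not reproduced in this text. As a minimal sketch, assuming it takes the standard Pearson chi-square form for a 2x2 contingency table, that is, N(AD - BC)^2 / ((A + B)(C + D)(A + C)(B + D)) with N = A + B + C + D, the counts A to D and the association weight could be computed as follows. The function names and the representation of each segmented title as a set of words are illustrative assumptions, not part of the described method.

```python
from typing import Iterable, Set, Tuple


def cooccurrence_counts(titles: Iterable[Set[str]], x: str, y: str) -> Tuple[int, int, int, int]:
    """Return (A, B, C, D) for words x and y, as defined in the text."""
    a = b = c = d = 0
    for words in titles:
        has_x, has_y = x in words, y in words
        if has_x and has_y:
            a += 1  # A: titles containing both x and y
        elif has_y:
            b += 1  # B: titles containing y but not x
        elif has_x:
            c += 1  # C: titles containing x but not y
        else:
            d += 1  # D: titles containing neither x nor y
    return a, b, c, d


def chi_square_weight(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for a 2x2 table (assumed form of formula 6)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table, e.g. a word occurs in every title or in none
    return n * (a * d - b * c) ** 2 / denom
```

Under this assumed form, two words that always appear together score high, while two words whose occurrences are independent score zero.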
w(x), w(y), and w(xy) represent the word weights corresponding to x, y, and xy, respectively. In some titles only the word x appears, in some only the word y appears, and in some both x and y appear. Accordingly, in formula 1, w(x) is calculated from the data of the titles in which x appears, w(y) from the data of the titles in which y appears, and w(xy) from the data of the titles in which x and y appear together.
After the inter-word association degree Co(x, y) has been calculated with formula 1 for each word obtained by word segmentation, a relationship network between the words in the acquired data set can be established. This relationship network, that is, the inter-word association degree, is then used to display the sample titles in the data set in abbreviated form.
In a possible implementation, the inter-word association degrees obtained from the sample titles in the acquired data set can be used to abbreviate not only those sample titles but also any other multimedia title outside the data set.
Example 3
The main difference between this embodiment and the above embodiments is that the inter-word association degree factors specifically include the word frequency and the inverse document frequency. In this case, the word weight corresponding to each obtained word can be calculated using the following formula 2.
[Formula 2: equation not reproduced in the source text]
where TF(x), TF(y), and TF(xy) represent the word frequencies corresponding to x, y, and xy, respectively: TF(x) is calculated from the data of the titles in which x appears, TF(y) from the data of the titles in which y appears, and TF(xy) from the data of the titles in which x and y appear together. IDF(x), IDF(y), and IDF(xy) represent the inverse document frequencies corresponding to x, y, and xy, respectively, and are calculated from the same respective sets of titles. The term frequency (TF) measures the importance and prevalence of a word in a title. The inverse document frequency (IDF) measures the discriminative power of a word: the more common a word is, the lower its discriminative power. In natural language processing these two factors are usually considered together, and they are calculated by formulas 7 and 8 below:
[Formula 7: equation not reproduced in the source text]
[Formula 8: equation not reproduced in the source text]
where TF(x) represents the word frequency, t_x represents the number of times the word x appears in the title, and T represents the total word frequency; IDF(x) represents the inverse document frequency, n_x represents the number of titles in which x appears, and N represents the total number of titles.
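Formulas 7 and 8 are not reproduced in this text. The following is a minimal sketch, assuming the conventional forms TF(x) = t_x / T and IDF(x) = log(N / n_x) and assuming that the counts are taken over the whole set of segmented titles; the function names, the natural logarithm, and the absence of smoothing are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def term_frequencies(titles: List[List[str]]) -> Dict[str, float]:
    """TF(x) = t_x / T: occurrences of x over the total number of word occurrences."""
    counts = Counter(word for title in titles for word in title)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def inverse_document_frequencies(titles: List[List[str]]) -> Dict[str, float]:
    """IDF(x) = log(N / n_x): N titles in total, n_x of them contain x."""
    n_titles = len(titles)
    # set(title) ensures each title is counted at most once per word
    containing = Counter(word for title in titles for word in set(title))
    return {word: math.log(n_titles / n) for word, n in containing.items()}
```

Under these assumptions, a word that appears in every title receives an IDF of zero, which matches the statement that common words have low discriminative power.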
Thus, formula 2 yields the word weight of each word from two factors, the word frequency and the inverse document frequency, and the corresponding inter-word association degree can then be calculated by formula 1.
In a possible implementation, the inter-word association degree factors may specifically include the word frequency, the inverse document frequency, and the word activity. In this case, the word weight corresponding to each obtained word can be calculated using the following formula 3.
[Formula 3: equation not reproduced in the source text]
where, as described above, TF(x), TF(y), and TF(xy) represent the word frequencies corresponding to x, y, and xy, respectively, and IDF(x), IDF(y), and IDF(xy) represent the corresponding inverse document frequencies. H(x), H(y), and H(xy) represent the word activities corresponding to x, y, and xy, respectively: H(x) is calculated from the data of the titles in which x appears, H(y) from the data of the titles in which y appears, and H(xy) from the data of the titles in which x and y appear together.
From the perspective of information theory, information entropy can be used to measure the information content and the degree of activity of a word, which gives the word activity. The specific calculation is shown in formula 8 below:
[Formula 8: equation not reproduced in the source text]
where H(x) represents the word activity, x represents the information source, that is, the specified word, n represents the number of words associated with the word x, and p(x_i) represents the probability of the i-th related word.
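The word activity formula is not reproduced in this text. As a minimal sketch, assuming it is the Shannon entropy H(x) = -sum over i of p(x_i) log p(x_i) taken over the n words related to x, it could be computed as follows. Estimating p(x_i) from co-occurrence counts, the base-2 logarithm, and the function name are assumptions made for illustration.

```python
import math
from typing import Iterable


def word_activity(related_counts: Iterable[int]) -> float:
    """Shannon entropy, in bits, over the words related to a word x.

    related_counts holds, for each related word i, how often it co-occurs
    with x; p(x_i) is estimated as that count divided by the total.
    """
    counts = [c for c in related_counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0  # no related words observed
    return -sum((c / total) * math.log2(c / total) for c in counts)
```

A word that co-occurs evenly with many different words gets a high value, while a word that always appears next to the same single word gets zero.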
Thus, formula 3 yields the word weight of each word by considering the word activity in addition to the word frequency and the inverse document frequency, and the corresponding inter-word association degree can then be calculated by formula 1.
The calculated inter-word association degree can therefore be more accurate and reliable.
In one possible implementation, the inter-word association degree factors include the word frequency, the inverse document frequency, and the part-of-speech weight. In this case, the word weight corresponding to each obtained word can be calculated using the following formula 4.
[Formula 4: equation not reproduced in the source text]
where, as described above, TF(x), TF(y), and TF(xy) represent the word frequencies corresponding to x, y, and xy, respectively, and IDF(x), IDF(y), and IDF(xy) represent the corresponding inverse document frequencies. TN(x), TN(y), and TN(xy) represent the part-of-speech weights corresponding to x, y, and xy, respectively: TN(x) is the part-of-speech weight of a word x appearing in a title, obtained during word segmentation from, for example, dictionaries and statistics; TN(y) is obtained in the same way for a word y appearing in a title; and TN(xy) is obtained in the same way for words x and y appearing together in a title. α represents a part-of-speech weight parameter used to increase or decrease the part-of-speech weight; for example, entity words may be given a higher weight and non-entity words a lower weight, so that entity words can be distinguished from non-entity words.
Thus, formula 4 yields the word weight of each word by considering the part of speech and its weight in addition to the word frequency and the inverse document frequency, and the corresponding inter-word association degree can then be calculated by formula 1.
Using the word frequency and the inverse document frequency alone tends to emphasize high-frequency words that have some discriminative power, whereas some words that distinguish the subject of a title may be low-frequency words. Combining the part of speech and its weight TN makes it possible to take into account both high-frequency words with some discriminative power and low-frequency words that distinguish the subject, so that the calculated inter-word association degree is more accurate and reliable.
In one possible implementation, the inter-word association degree factors include the word frequency, the inverse document frequency, the word activity, and the part-of-speech weight. In this case, the word weight corresponding to each obtained word can be calculated using the following formula 5.
[Formula 5: equation not reproduced in the source text]
where, as described above, TF(x), TF(y), and TF(xy) represent the word frequencies corresponding to x, y, and xy, respectively; IDF(x), IDF(y), and IDF(xy) represent the corresponding inverse document frequencies; H(x), H(y), and H(xy) represent the corresponding word activities; TN(x), TN(y), and TN(xy) represent the corresponding part-of-speech weights; and α represents the part-of-speech weight parameter used to increase or decrease the part-of-speech weight.
Thus, formula 5 yields the word weight of each word by considering both the part of speech and the word activity in addition to the word frequency and the inverse document frequency, and the corresponding inter-word association degree can then be calculated by formula 1.
Fig. 3 is a diagram illustrating how the inter-word association degree is determined from the four inter-word association degree factors (word frequency, inverse document frequency, part of speech, and word activity) and the inter-word association weight.
Depending on which inter-word association degree factors are considered, one of formulas 2 to 5 is used to calculate the word weight of each word, and the corresponding inter-word association degree is then calculated by formula 1.
Example 4
Fig. 4 illustrates a flowchart of a multimedia title display method according to still another embodiment of the present invention. The steps in fig. 4 that are labeled the same as those in fig. 1 and 2 have the same functions, and detailed descriptions of these steps are omitted for the sake of brevity.
As shown in fig. 4, the main difference between the multimedia title display method shown in fig. 4 and the multimedia title display method shown in fig. 2 is that, before performing the word segmentation process, the multimedia title display method may further include the following step S100 (steps S1001 and S1002), and the step S150 may specifically include steps S1501 to S1502.
The following is a specific description of each of the above steps.
Step S100: preprocess each sample title.
Step S100 may specifically include the following steps:
Step S1001: normalize each sample title; and
Step S1002: clean each normalized sample title.
For example, steps S1001 and S1002 may be performed according to the schematic diagram shown in Fig. 5.
Specifically, as shown in Fig. 5, each sample title in the multimedia title data set is first normalized as follows: special characters that do not generally form part of a title, such as the symbols & and #, are deleted, and the multimedia title data is then filtered according to a preset title length threshold so that titles shorter than the threshold are ignored. The preset title length threshold may differ according to the type of multimedia.
Next, each normalized sample title is cleaned. Specifically, junk data such as advertisements embedded in the titles (QQ numbers and the like) is removed; the remaining data is then analyzed and the cleaning is repeated until the data is confirmed to meet a predetermined quality standard (meeting the standard corresponds to "pass" in Fig. 5, and not meeting it corresponds to "fail").
It should be noted that the execution order of steps S1001 and S1002 may be changed, that is, step S1002 may be executed first, followed by step S1001.
This preprocessing improves the quality of the sample titles in the multimedia title data set, which facilitates the processing in the subsequent steps S110 to S150.
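The preprocessing of step S100 can be sketched as follows. This is only an illustration under stated assumptions: the advertisement pattern, the length threshold of 4 characters, and all function names are hypothetical, and the repeated analyze-and-clean loop against a quality standard is not modeled.

```python
import re
from typing import Iterable, List

SPECIAL_CHARS = re.compile(r"[&#]")  # symbols named in the text
# Hypothetical pattern for an embedded QQ-number advertisement
AD_PATTERN = re.compile(r"QQ\s*[:：]?\s*\d{5,}", re.IGNORECASE)
MIN_TITLE_LENGTH = 4  # hypothetical threshold; may differ per multimedia type


def normalize(title: str) -> str:
    """Step S1001: delete special characters and trim surrounding whitespace."""
    return SPECIAL_CHARS.sub("", title).strip()


def clean(title: str) -> str:
    """Step S1002: remove embedded junk data such as advertisements."""
    return AD_PATTERN.sub("", title).strip()


def preprocess(titles: Iterable[str], min_length: int = MIN_TITLE_LENGTH) -> List[str]:
    """Normalize, drop titles shorter than the threshold, then clean."""
    result = []
    for title in titles:
        normalized = normalize(title)
        if len(normalized) < min_length:
            continue  # titles below the length threshold are ignored
        cleaned = clean(normalized)
        if cleaned:
            result.append(cleaned)
    return result
```

Because the two steps are separate functions, their order can be swapped, as the text allows.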
In a possible implementation, step S150 may specifically include the following steps:
Step S1501: layer the words obtained by segmenting each sample title according to the inter-word association degree; and
Step S1502: display each sample title in differentiated abbreviated form according to the layering result.
For example, after word segmentation and removal of common stop words, the title "Song Xiaobao sketch complete collection Crazy Blind Date" can be segmented into four words: "Song Xiaobao", "sketch", "complete collection", and "Crazy Blind Date". According to the calculation in step S140, "Song Xiaobao" and "Crazy Blind Date" have the highest inter-word association value, "sketch" has the next highest, and "complete collection" has the lowest.
Then, according to step S1501, "Song Xiaobao" and "Crazy Blind Date" can be assigned to the first layer, "sketch" to the second layer, and "complete collection" to the third layer.
Then, in step S1502, the sample title is displayed in a differentiated manner according to the layering result, so as to adapt to the screen of the terminal device and highlight the core theme. For example, when the screen of the terminal device is only long enough to display the two words "Song Xiaobao" and "Crazy Blind Date", only the first layer is displayed and the second and third layers are hidden. When the screen is long enough to display the three words "Song Xiaobao", "Crazy Blind Date", and "sketch", the first and second layers may be displayed while only the third layer is hidden, and the first layer may be highlighted, for example by using a different color.
In this way, through steps S1501 and S1502, the sample titles can be displayed in a differentiated and abbreviated manner, which gives a better display effect.
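Steps S1501 and S1502 can be sketched as follows. This is a minimal illustration, assuming that each word already has a single association score, that words with equal scores share a layer, and that screen capacity is measured as a character count; these choices and the function names are assumptions, not details given in the text.

```python
from typing import Dict, List


def layer_words(scores: Dict[str, float]) -> List[List[str]]:
    """Step S1501: group words into layers, highest score first; ties share a layer."""
    layers: List[List[str]] = []
    last_score = None
    for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score != last_score:
            layers.append([])  # a new, lower score starts a new layer
            last_score = score
        layers[-1].append(word)
    return layers


def abbreviate(title_words: List[str], layers: List[List[str]], max_chars: int) -> List[str]:
    """Step S1502: show as many whole layers as fit, keeping the original word order."""
    shown = set()
    used = 0
    for layer in layers:
        needed = sum(len(word) for word in layer)
        if used + needed > max_chars:
            break  # this layer and all lower layers stay hidden
        shown.update(layer)
        used += needed
    return [word for word in title_words if word in shown]
```

Keeping the original word order in the output means the abbreviated title is a subsequence of the original, so the title itself is never rewritten. A real display layer would also need the layer index of each shown word in order to highlight the first layer.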
Example 5
Fig. 6 illustrates a structural block diagram of a multimedia title display apparatus according to an embodiment of the present invention. As shown in Fig. 6, the multimedia title display apparatus 60 includes: a word segmentation unit 61, configured to perform word segmentation on each sample title included in the multimedia title data set to obtain a plurality of words; a statistical model establishing unit 62, connected to the word segmentation unit 61 and configured to establish a statistical model from the obtained words; a calculating unit 63, connected to the word segmentation unit 61 and the statistical model establishing unit 62 and configured to calculate, according to the established statistical model, the inter-word association weight and the inter-word association degree factors corresponding to each obtained word; a determining unit 64, connected to the calculating unit 63 and configured to determine the inter-word association degree corresponding to each obtained word according to the calculated inter-word association weight and inter-word association degree factors; and a thumbnail display unit 65, connected to the determining unit 64 and configured to display each sample title in the multimedia title data set in abbreviated form according to the inter-word association degree.
The multimedia title display apparatus 60 of this embodiment can execute the multimedia title display method described in any one of embodiments 1 to 4; for the specific flow of the method, see the detailed description in those embodiments.
The multimedia title display apparatus of this embodiment uses natural language processing to construct a word association network (the inter-word association degree) from the word segmentation results, the part-of-speech tags and corresponding part-of-speech weights provided by the word segmentation, and a known statistical model. According to the inter-word association degree, it preferentially displays the core subject words of a title and hides modifying or redundant words and lower-weight words associated with the core subject words, so that the title dynamically adapts to the screen of the terminal device. The title itself is left unchanged while its core theme and the content to be displayed are made clear, which addresses the difficulty users have in locating the theme of a long video title and improves information acquisition efficiency and user experience.
Example 6
Fig. 7 illustrates a structural block diagram of a multimedia title display apparatus according to an embodiment of the present invention. Components in Fig. 7 that bear the same reference numbers as those in Fig. 6 have the same functions, and detailed descriptions of these components are omitted for brevity.
As shown in Fig. 7, the multimedia title display apparatus 70 of this embodiment differs from the multimedia title display apparatus 60 of the previous embodiment mainly in that the determining unit 64 comprises: a calculating module 641, configured to calculate the word weight corresponding to each obtained word according to the inter-word association degree factors; and a determining module 642, connected to the calculating module 641 and configured to determine the inter-word association degree corresponding to each obtained word according to the inter-word association weight and the word weight.
In one possible implementation, the determining module 642 calculates the inter-word association degree using the following formula 1.
[Formula 1: equation not reproduced in the source text]
where Co(x, y) represents the inter-word association degree between the word x and the word y, χ(x, y) represents the inter-word association weight between the word x and the word y, and w(x), w(y), and w(xy) represent the word weights corresponding to x, y, and xy, respectively.
In one possible implementation, the inter-word association degree factors include the word frequency and the inverse document frequency, and the calculating module 641 calculates the word weight corresponding to each obtained word using the following formula 2.
[Formula 2: equation not reproduced in the source text]
where TF(x), TF(y), and TF(xy) represent the word frequencies corresponding to x, y, and xy, respectively, and IDF(x), IDF(y), and IDF(xy) represent the inverse document frequencies corresponding to x, y, and xy, respectively.
In one possible implementation, the inter-word association degree factors include the word frequency, the inverse document frequency, and the word activity, and the calculating module 641 calculates the word weight corresponding to each obtained word using the following formula 3.
[Formula 3: equation not reproduced in the source text]
where TF(x), TF(y), and TF(xy) represent the word frequencies corresponding to x, y, and xy, respectively; IDF(x), IDF(y), and IDF(xy) represent the corresponding inverse document frequencies; and H(x), H(y), and H(xy) represent the corresponding word activities.
In one possible implementation, the inter-word association degree factors include the word frequency, the inverse document frequency, and the part-of-speech weight, and the calculating module 641 calculates the word weight corresponding to each obtained word using the following formula 4.
[Formula 4: equation not reproduced in the source text]
where TF(x), TF(y), and TF(xy) represent the word frequencies corresponding to x, y, and xy, respectively; IDF(x), IDF(y), and IDF(xy) represent the corresponding inverse document frequencies; TN(x), TN(y), and TN(xy) represent the corresponding part-of-speech weights; and α represents the part-of-speech weight parameter used to increase or decrease the part-of-speech weight.
In one possible implementation, the inter-word association degree factors include the word frequency, the inverse document frequency, the word activity, and the part-of-speech weight, and the calculating module 641 calculates the word weight corresponding to each obtained word using the following formula 5.
[Formula 5: equation not reproduced in the source text]
where TF(x), TF(y), and TF(xy) represent the word frequencies corresponding to x, y, and xy, respectively; IDF(x), IDF(y), and IDF(xy) represent the corresponding inverse document frequencies; H(x), H(y), and H(xy) represent the corresponding word activities; TN(x), TN(y), and TN(xy) represent the corresponding part-of-speech weights; and α represents the part-of-speech weight parameter used to increase or decrease the part-of-speech weight.
In one possible implementation, the thumbnail display unit 65 is further configured to display multimedia titles outside the multimedia title data set in abbreviated form according to the inter-word association degree.
In a possible implementation, the multimedia title display apparatus 70 may further include a preprocessing unit 66, which is connected to the word segmentation unit 61 and configured to preprocess each sample title. The preprocessing unit 66 is specifically configured to normalize each sample title and to clean each normalized sample title.
In one possible implementation, the thumbnail display unit 65 is configured to layer the words obtained by segmenting each sample title according to the inter-word association degree, and to display each sample title in differentiated abbreviated form according to the layering result.
The multimedia title display apparatus 70 of this embodiment can execute the multimedia title display method described in any one of embodiments 1 to 4; for the specific flow of the method, see the detailed description in those embodiments.
The above describes only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the appended claims.