CN106844410B - Determining quality of a summary of multimedia content - Google Patents


Info

Publication number
CN106844410B
Authority
CN
China
Prior art keywords
content
text
metric
determining
multimedia content
Prior art date
Legal status
Active
Application number
CN201610877283.7A
Other languages
Chinese (zh)
Other versions
CN106844410A
Inventor
N. Modani
V. Subramanian
S. Gupta
P. R. Maneriker
G. Hiranandani
A. R. Sinha
Utpal
Current Assignee
Adobe Inc
Original Assignee
Adobe Systems Inc
Priority date
Filing date
Publication date
Application filed by Adobe Systems Inc filed Critical Adobe Systems Inc
Publication of CN106844410A
Application granted
Publication of CN106844410B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 - Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 - Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/41 - Indexing; Data structures therefor; Storage structures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 - Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 - Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 - Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/98 - Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G06V10/993 - Evaluation of the quality of the acquired pattern
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/47 - Detecting features for summarising video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to determining the quality of a summary of multimedia content. A quality metric for a multimedia summary of a multimedia content item is determined based in part on the semantic similarity between the summary and the content item, rather than solely on word frequency. This is accomplished in some embodiments by identifying the semantic meaning of both the summary and the multimedia content item using vector analysis. The vectors of the summary and the vectors of the multimedia content item are compared to determine semantic similarity. In other examples, the quality metric of the multimedia summary is determined based in part on the coherence between the image portion of the summary and the text portion of the summary.

Description

Determining quality of a summary of multimedia content
Technical Field
The present disclosure relates generally to characterizing multimedia content. In particular, the present disclosure relates to determining the quality of a summary of multimedia content, where both the summary and the multimedia content include text and images.
Background
Multimedia content refers primarily to digital content that includes some combination of different content forms, such as text and images (video, animation, graphics, etc.). Such multimedia content is so pervasive and inexpensive that users are often overwhelmed by the process of selecting multimedia content items for consumption. Because of this, users of multimedia content often rely on summaries of multimedia content items. These summaries may be consumed in place of the multimedia content items themselves or may facilitate the selection of multimedia content items to be consumed. Thus, the quality of a multimedia summary can have a significant impact on an intended reader's decision to consume a given content item. However, there is currently no suitable method for assessing the quality of a multimedia summary.
Drawings
Fig. 1 is a high-level flow diagram illustrating a method for determining a quality metric for a summary corresponding to a multimedia content item according to one embodiment of the present disclosure.
Fig. 2 is a detailed flow diagram illustrating a method for determining a quality metric of a summary corresponding to a multimedia content item according to one embodiment of the present disclosure.
Fig. 3 is a block diagram of a distributed processing environment including a quality metric determination system remotely coupled to a given user's computing device by a communication network according to one embodiment of the present disclosure.
Fig. 4 is a block diagram of a quality metric determination system for determining the quality of a multimedia summary of a multimedia content item according to one embodiment of the present disclosure.
The figures depict various embodiments of the present disclosure for purposes of example only. Many variations, configurations, and other embodiments will become apparent from the following detailed discussion.
Detailed Description
As previously indicated, there is no technique for evaluating the quality of a given multimedia summary. However, such summaries may have a significant impact on the intended user's decisions, including whether to consume the full version of the summarized digital content item. Therefore, techniques for assessing the quality of a summary of a multimedia content item are desirable from a market development perspective. Consider, for example, a digital article having both image and text portions. As will be appreciated in light of this disclosure, a summary of the article with a high degree of coherence between the image and text portions may help the reader reach a better understanding of the article faster than a summary lacking coherence between the image and text portions. In a more general sense, the degree to which the summary represents the corresponding multimedia content item may be quantified as a quality metric. The quality metric of the summary may then be used, for example, to gauge the likelihood that the summary will be effective in causing consumption of the content item itself. Although some available algorithms may be used to evaluate the textual portion of a given multimedia summary (or simply a "summary" herein, for brevity) of a multimedia content item, such algorithms fail to account for the non-textual portion of the summary. In particular, an algorithm used to evaluate the content will likely operate by comparing the word frequencies in the text portion of the multimedia content with the word frequencies in the corresponding summary. The more similar the word frequencies of the summary are to the word frequencies in the multimedia content item, the higher the quality score. Examples of such algorithms include retention rate (which may operate, for example, by dividing the number of unique words in the summary by the number of unique words in the multimedia content item), KL divergence (which may operate, for example, by comparing the distributions of word frequencies in the content and the corresponding summary), bilingual evaluation understudy ("BLEU") (which determines the quality of text machine-translated from one language to another), and recall-oriented understudy for gisting evaluation ("ROUGE") (which uses a human-generated summary as a reference to determine the quality of the summary).
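For illustration, the following minimal sketch (not part of the patented method; the naive whitespace tokenization is an assumption made here) shows how the retention rate and KL divergence baselines described above operate purely on word frequencies:

```python
import math
from collections import Counter

def retention_rate(summary: str, document: str) -> float:
    """Unique words in the summary divided by unique words in the document."""
    return len(set(summary.lower().split())) / len(set(document.lower().split()))

def kl_divergence(summary: str, document: str, eps: float = 1e-12) -> float:
    """KL divergence of the summary's word distribution from the document's."""
    q = Counter(summary.lower().split())   # word frequencies in the summary
    p = Counter(document.lower().split())  # word frequencies in the document
    qn, pn = sum(q.values()), sum(p.values())
    return sum((c / qn) * math.log((c / qn) / (p.get(w, 0) / pn + eps))
               for w, c in q.items())

document = "this girl dislikes cheese . this girl dislikes cheese ."
print(retention_rate("this girl likes cheese", document))  # high despite flipped meaning
```

Note that the flipped negation ("likes" versus "dislikes") barely affects either score, which is precisely the failure mode discussed below.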
However, as will be appreciated in light of this disclosure, the above and similar algorithms are insufficient if used to determine the quality of a summary of a multimedia content item. One reason is that, because these algorithms rely primarily on word frequency, the semantic meaning of the summary is not compared to the semantic meaning of the multimedia (non-text) content item. This word-frequency approach may thus problematically generate a high quality metric value even for summaries having a semantic meaning that is very different from that of the corresponding multimedia content item. Consider, for example, a multimedia content item whose text portion states "this girl dislikes cheese". A corresponding summary with a text portion stating "this girl likes cheese" would score well using a word frequency algorithm, but would be inaccurate given the absence of the negation in the summary. In another example scenario, a multimedia content item that includes a text portion using pronouns to reference an accompanying image portion may yield a high-scoring summary that carries no information. Consider, for example, a multimedia content item that includes a picture of a shirt accompanied by the text caption "this is good". Without analysis of the image portion (the shirt), a summary stating "this is good" may be given a high quality metric because it exactly matches the text portion of the multimedia content item (i.e., there is a high degree of correlation between the text of the summary and the full text). However, if the image is actually considered, the summary could instead be "this shirt is good", which is a much more accurate summary and therefore should score higher than a score based on text alone. Thus, using currently available algorithms, a summary may be misleadingly determined to have a high quality score while failing to accurately reflect the semantic meaning of the multimedia content item.
To this end, techniques are provided herein for determining a quality metric for a multimedia summary of a multimedia content item by considering both the textual and non-textual components of the summary. In some embodiments, the quality metric is based in part on the semantic similarity between the summary and the content item rather than just word frequency. This is accomplished in some embodiments by identifying the semantic meaning of both the summary and the multimedia content using vector analysis. The vectors of the summary and the vectors of the multimedia content item are compared to determine semantic similarity. Note that both textual and non-textual items can readily be represented by vectors, thereby facilitating vector-based comparisons.
In addition to assessing the semantic similarity between a given multimedia content item and its multimedia summary, the present techniques may also include determining the degree of correlation between the textual and non-textual portions of the summary itself. As will be appreciated in light of this disclosure, a high degree of correlation or "coherence" between the text and non-text portions of the summary tends to indicate a higher quality summary. Accordingly, some embodiments of the present disclosure provide methods for determining a quality metric of a multimedia summary of a multimedia content item based in part on the coherence between the image portion of the summary and the text portion of the summary. "Coherence" refers to the semantic similarity between the text portion of the multimedia summary and the image portion of the multimedia summary and is determined according to the methods described below. At a high level, determining coherence is achieved by generating vectors from both the segments of the text portion and the segments of the image portion and projecting the vectors onto a common unit space. The projected vectors are then compared. Vectors that are adjacent to each other in the common unit space correspond to semantically similar information across both the text portion and the image portion of the summary, and thus correspond to a high degree of coherence between those portions. Note that if a given multimedia summary includes video instead of (or in addition to) still images, the video may be treated as a collection of still images (or frames), where each image is evaluated separately against the text portion of the summary in the same manner as a still image. An average or other suitable statistical representation of the individual comparisons can then be calculated to provide a degree of overall coherence between the text portion and the video. For purposes herein, reference to an "image" is intended to include a frame of video content.
One benefit of some embodiments of the present disclosure is improved accuracy of the quality metric. There are several reasons for the improved accuracy. One reason is that some embodiments of the present disclosure analyze both the text portion and the image portion of a multimedia content item and its corresponding summary. This improves the accuracy of the quality metric because the metric thus reflects the semantic meaning conveyed by both the text portion and the image portion of the multimedia content item and the corresponding summary. Another reason for the increased accuracy is that some embodiments analyze and incorporate the coherence between the text portion of the summary and the image portion of the summary. This improves accuracy because summaries whose text and image portions are semantically similar will yield a high quality metric when using embodiments of the present disclosure.
Another benefit of some embodiments of the present disclosure is the ability to customize the weights of the three different contributions to the multimedia quality metric. In particular, by means of user-selectable coefficients, according to some embodiments, the individual contributions of the following may be weighted according to user preferences: (1) the information content of the text portion of the summary relative to the text portion of the multimedia content ("text coverage"); (2) the information content of the image portion of the summary relative to the image portion of the multimedia content item ("image coverage"); and (3) the coherence between the text and the images of the summary. Some embodiments are customized to evaluate a summary for consistency with a set of topics or with user-selected topics and interests. Some embodiments may be customized to improve the accuracy of the comparison between semantic meanings of image portions, text portions, or both.
As used herein, the term multimedia content item refers to a content item that includes a text portion and an image portion. The image portion may be a still image of any format in any type of digital resource (e.g., e-book, web page, mobile application, digital photograph) or a frame of video, as previously explained. The text portion and the image portion comprise text segments and image segments, respectively. A text segment is a sentence, a clause of a sentence, or a word or character (i.e., number, symbol, letter) in a sentence. An image segment is a frame of an image, a portion of a frame, or an object within a frame. The information content of a text portion or text segment refers to the number of words (e.g., nouns, verbs, and adjectives) that can convey meaning, as opposed to words (e.g., conjunctions and articles) that generally do not themselves convey meaning. The information content of an image portion or image segment refers to a frame, portion of a frame, or object within a frame that may convey meaning (e.g., an image of a face as compared to an unfocused background). As indicated above, "coherence" refers to the semantic similarity between the text portion of the summary and the image portion of the summary. The term "quality" as used herein refers to the degree of similarity between the semantic meaning of the summary and the semantic meaning of the corresponding multimedia content item. The higher the value of the quality metric, the closer the summary and the corresponding multimedia content item are in semantic meaning.
Method for determining a quality metric
Fig. 1 is a high-level flow diagram illustrating a method 100 for determining a quality metric for a multimedia summary corresponding to a multimedia content item, according to one embodiment of the present disclosure. The method 100 begins by receiving 104 a multimedia content item and also receiving 108 a multimedia summary corresponding to the multimedia content item. As presented above, the application of the method 100 to a multimedia content item and a multimedia summary is only one embodiment. Other embodiments of the present disclosure are applicable to content items and summaries containing only one or the other of a text portion and an image portion.
Some embodiments of the present disclosure then analyze 112 both the multimedia content item and the multimedia summary. The analysis 112 is described in more detail below in the context of fig. 2. Based on the analysis 112, a quality metric of the multimedia summary is determined 116. The quality metric and its determination 116 are also described in more detail below in the context of fig. 2.
Fig. 2 is a detailed flow diagram illustrating a method 200 for determining a quality metric for a multimedia summary corresponding to a multimedia content item according to one embodiment of the present disclosure. For ease of illustration, the method is illustrated as including three meta-steps (not presented in a particular order): (1) analyzing 204 semantic similarities between sentences of the text portion of the multimedia content item and sentences of the text portion of the summary; (2) analyzing 208 semantic similarities between sentences of the text portion of the summary and images of the image portion of the summary; and (3) analyzing 212 semantic similarity between the image of the image portion of the multimedia content item and the image of the image portion of the summary. For ease of illustration, elements of method 100 related to accepting a multimedia content item and a multimedia summary are omitted from fig. 2.
Meta-step 204 of the method 200 illustrates operations for analyzing similarities between sentences (or sentence fragments) of the text portion of the multimedia content item and sentences (or sentence fragments) of the text portion of the summary. The function and benefit of this analysis 204 operation is to determine the degree to which semantic meaning is comparable between the text portion of the multimedia content item and the text portion of the corresponding summary. This analysis 204 is accomplished by first generating 216 vectors for sentences in the text portions of the multimedia content item and the summary, respectively, to determine whether the text portion of the summary conveys the same (or similar) semantic meaning as that conveyed by the text portion of the multimedia content item. The more similar the semantic meanings conveyed, the higher the contribution of the text portion of the summary to the quality metric.
The vectors are generated 216 by first processing the text portions of both the multimedia content item and the summary using a recursive auto-encoder. An encoding matrix W_e is first trained. Once trained, W_e is used to analyze the sentences of the multimedia content item and the corresponding summary so as to extract their respective semantic meanings and compare them in a common unit space (described in more detail below).
To train the encoding matrix W_e, the recursive auto-encoder first generates a syntax parse tree for at least one training sentence. Semantic vectors are generated for each word and clause within each training sentence. Each non-terminal (i.e., non-leaf) node of the parse tree is generated according to equation 1 below.
s = f(W_e [c_1; c_2] + b)    (Equation 1)
In equation 1, s represents a non-leaf node, W_e is the trained encoding matrix, and c_1 and c_2 (more generally, c_i) are word-to-vector representations. Specifically, each c_i corresponds to a sentence fragment, which is an element of the parse tree. A sentence fragment is a subset of one or more of the training sentences. The term b in equation 1 is a constant. The function f is, in one example, a sigmoid function, which produces values between 0 and 1 when applied to its argument.
To train the matrix W_e, the recursive auto-encoder reconstructs the elements under each node in the parse tree for each sentence of the multimedia content item and the corresponding summary according to equation 2 below.
[x_1'; y_1'] = f(W_d y_2 + b)    (Equation 2)
Equation 2 describes how, based on a decoding matrix W_d, the vector y_2 for a node is decoded into output vectors (x_1' and y_1'), with that output subsequently processed by the sigmoid function f.
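A minimal numerical sketch of equations 1 and 2 follows, assuming toy dimensions and randomly initialized matrices; in the method above, W_e and W_d would instead be trained to minimize the reconstruction error:

```python
import numpy as np

d = 4                                  # toy word-vector dimension (assumption)
rng = np.random.default_rng(0)
W_e = rng.normal(size=(d, 2 * d))      # encoding matrix of equation 1
W_d = rng.normal(size=(2 * d, d))      # decoding matrix of equation 2
b_e, b_d = np.zeros(d), np.zeros(2 * d)

def f(x):
    return 1.0 / (1.0 + np.exp(-x))    # sigmoid, producing values in (0, 1)

def encode(c1, c2):
    """Equation 1: parent node s = f(W_e [c1; c2] + b)."""
    return f(W_e @ np.concatenate([c1, c2]) + b_e)

def decode(s):
    """Equation 2: reconstruction [x1'; y1'] = f(W_d s + b)."""
    return f(W_d @ s + b_d)

c1, c2 = rng.normal(size=d), rng.normal(size=d)   # two child vectors
s = encode(c1, c2)                                # vector for the non-leaf node
error = float(np.sum((decode(s) - np.concatenate([c1, c2])) ** 2))
print(s, error)   # training would adjust W_e, W_d to drive this error down
```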
After training of the matrix W_e is complete, the trained matrix W_e is used to generate a vector representation of the root of the parse tree, which is used as the representation vector for the sentence. The vector generated for each sentence is then used to calculate the cosine similarity between a sentence of the multimedia content item and a corresponding sentence of the summary. The similarity S_T(u, v) between sentences of the text portion of the multimedia content item and sentences of the text portion of the summary is determined based on cosine similarity according to equation 3 below.
S_T(u, v) = (x_u · x_v) / (‖x_u‖ ‖x_v‖)    (Equation 3)

In equation 3, x_u and x_v are the vector representations of a text segment u of the text portion of the summary and a text segment v of the text portion of the multimedia content item, respectively. The cosine similarity quantifies the semantic similarity between sentences of the multimedia content item and of the summary; this similarity later serves as a contribution to the multimedia summary quality metric, as described in more detail below.
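The cosine similarity of equation 3 (and, after projection onto the common unit space, of equations 4 and 5 as well) can be computed as in the following sketch; the example vectors are placeholders:

```python
import numpy as np

def cosine_similarity(x_u, x_v):
    """Equation 3: S_T(u, v) = (x_u . x_v) / (||x_u|| ||x_v||)."""
    return float(np.dot(x_u, x_v) / (np.linalg.norm(x_u) * np.linalg.norm(x_v)))

# e.g. root vectors produced by the trained recursive auto-encoder
x_u = np.array([0.20, 0.90, 0.10])   # summary sentence u
x_v = np.array([0.25, 0.85, 0.05])   # content-item sentence v
print(cosine_similarity(x_u, x_v))   # close to 1.0 for semantically similar sentences
```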
Meta-step 208 of the method 200 illustrates operations for analyzing similarities between sentences of the text portion of the summary and the accompanying image portion of the summary. The function and benefit of this analysis 208 operation is to determine the extent to which the semantic meanings of the text portion of the summary and of the accompanying image portion of the summary correspond to each other. The more semantic similarity between the text and the accompanying images, the higher the quality of the multimedia summary.
In a process similar to that described above, vectors corresponding to the image content and text content of the summary are generated 224 in a manner similar to that described by Karpathy et al. ("Deep Fragment Embeddings for Bidirectional Image Sentence Mapping", 2014, pp. 1889-1897), which is incorporated herein by reference in its entirety. First, the process for generating vectors for the image portion of the summary is described.
The process for generating 224 a vector corresponding to an image portion of a summary includes first identifying a segment of the image portion that may be relevant to the summary. The segments are identified by training a deep neural network auto-encoder, which is then applied to the image to extract relevant image portions. At a high level, this process is accomplished by extracting pixel values from the image and using the pixel values individually or in associated groups to identify higher levels of organization within the image that correspond to objects in the image.
Once the image segments are identified, a regional convolutional neural network (RCNN) is used to generate a vector corresponding to each of the identified image segments. In one embodiment, the RCNN, as described by Girshick et al. (see "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation", Computer Vision and Pattern Recognition, 2014), incorporated herein by reference in its entirety, generates a 4096-dimensional vector corresponding to each identified segment. A 4096-dimensional vector represents a convenient trade-off between the consumption of computational resources and output quality. Since 4096 is equal to 2^12, it is conveniently applied to binary data bits. A lower-dimensional space may be used, but with less discrimination between features. A higher-dimensional space may also be used, but the consumption of computational resources increases.
An intersection between any two vectors is identified. A subset of segments for which vectors are generated is selected based on a likelihood of one of the image segments corresponding to a portion of the image that is semantically related to the summary. In some embodiments, the identified segments are further limited based on the classification determined using the vectors to reduce the risk of over-representation of any image segment in subsequent steps of the analysis.
The vector corresponding to the text portion of the summary is generated 224 using the procedure described above in the context of element 216 of meta-step 204.
The image vector and sentence vector are then projected onto a common unit space by matrix transformation. The matrices used to transform the vectors onto the common unit space have been trained so that semantically similar elements, whether in image parts or text parts, are correspondingly projected onto regions of the common unit space that reflect the semantic similarity.
One benefit of projecting the vectors onto a common unit space is to reduce the impact of extraneous information on the determination of semantic similarity. For example, the vectors as generated may comprise extraneous information (e.g., color, texture, shape) that is not relevant to the semantic meaning of the image or text portion. The effect of this extraneous information is reduced by mapping the vectors to the common unit space.
The cosine similarity between the vectors of the image and text portions of the summary is then determined according to equation 4 below.

S_{T,I}(u, p) = (x_u · x_p) / (‖x_u‖ ‖x_p‖)    (Equation 4)

In equation 4, x_u and x_p are the vector representations, obtained using the methods described above, of a text segment of the text portion u of the summary and an image segment of the image portion p of the summary, respectively.
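The following sketch illustrates the common-unit-space comparison behind equation 4. The projection matrices M_text and M_img, their dimensions, and the random inputs are all assumptions for illustration; in the method above, these matrices are trained so that semantically related fragments project near one another:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 8                                  # size of the common unit space (assumption)
M_text = rng.normal(size=(k, 4))       # projects sentence vectors (toy 4-d here)
M_img = rng.normal(size=(k, 4096))     # projects 4096-d RCNN image-segment vectors

def to_unit_space(M, v):
    z = M @ v
    return z / np.linalg.norm(z)       # unit-normalize after projection

x_u = to_unit_space(M_text, rng.normal(size=4))     # text segment of the summary
x_p = to_unit_space(M_img, rng.normal(size=4096))   # image segment of the summary
print(float(x_u @ x_p))  # equation 4: cosine similarity of unit vectors is a dot product
```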
Meta-step 212 of method 200 illustrates operations for analyzing similarities between image portions of the summary and image portions of the multimedia content item in one embodiment. As explained above in the context of meta-step 208, vectors are determined for the images and projected onto the common unit space. The cosine similarity between the images based on the generated vector is determined according to equation 5 below.
S_I(p, q) = (x_p · x_q) / (‖x_p‖ ‖x_q‖)    (Equation 5)

In equation 5, x_p and x_q are the vector representations of an image segment p of the summary and an image segment q of the image portion of the multimedia content item, respectively.
Having generated similarity scores for various elements of the multimedia content item and corresponding summary as described above in method 200, a multimedia quality metric is determined 116 as shown in fig. 1 and as described in more detail below.
Determining multimedia summary metrics
Referring again to fig. 1, a process for determining 116 a quality metric that quantifies a degree of similarity between a summary and a semantic meaning of a multimedia content item using information determined in the analysis 112 (and corresponding method 200) is described below.
The multimedia summary quality metric is determined according to equation 6 below.

MuSQ = f(IC_text, IC_image, Coh_total)    (Equation 6)

In equation 6, MuSQ is the multimedia summary quality metric; IC_text is a metric describing the proportional amount of information in the text portion of the summary relative to the text portion of the multimedia content item; and IC_image is the proportional amount of information in the image portion of the summary relative to the image portion of the multimedia content item. The term "f" in equation 6, and as used elsewhere in this disclosure, represents a generic function rather than a specific function. Coh_total is the "coherence" between the text portion of the summary and the image portion of the summary. Coherence reflects the degree of semantic similarity between the text portion of the summary and the image portion of the summary, with higher values reflecting more semantic similarity between the text and the images of the summary. In one embodiment, equation 6 is a non-decreasing summation of its variables, as shown below in equation 7.

MuSQ = A·IC_text + B·IC_image + C·Coh_total    (Equation 7)

In equation 7, A, B, and C are positive constants used to change the relative contribution of each variable to MuSQ.
IC_text is defined in equation 8 below.

IC_text = Σ_v R_v · max_u S_T(u, v)    (Equation 8)

In equation 8, S_T is defined above in equation 3, and R_v is the number of terms or words that may contribute to the semantic meaning of the text portion of the multimedia content item (referred to above as "information content"). That is, R_v is the count of nouns, verbs, adjectives, adverbs, and pronouns in a text segment of the text portion. Articles, conjunctions, and the like are omitted in the determination of R_v.
For a given text segment v of the multimedia content item, the "max" function is taken over the text segments u present in the text portion of the summary. The result of the "max" function is the maximum representation of the text segment v present in the summary S. The "max" function also prevents redundant sentences in the summary from increasing the quality metric score, since only the summary sentence or segment that is most relevant to the multimedia content item contributes to the metric. In other words, using this function facilitates selecting, from among a plurality of summary sentences, the sentence with the most information content with respect to a particular semantic. This increases the score of a summary with more extensive coverage of the multimedia content, since repeated sentences contribute nothing (or less) to the score, while sentences and images representing diverse topics are scored as contributing more information content.
The result of the "max" function is multiplied by the information content R_v. Including the information content R_v in equation 8 favors the selection of segments that convey more information (in terms of the number of nouns, adjectives, etc.) over less informative sentences having a lower count of "informative" words of the identified types. Summing this quantity over all text segments v present in the multimedia content item yields a quality indicator of the text portion of the summary relative to the multimedia content item as a whole.
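A sketch of equation 8 follows. The cosine function stands in for S_T, and each document segment is paired with a precomputed content-word count R_v; both pairings are illustrative assumptions:

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def ic_text(summary_vecs, doc_segments):
    """Equation 8: IC_text = sum over v of R_v * max over u of S_T(u, v)."""
    return sum(R_v * max(cos(u, v) for u in summary_vecs)   # max over summary segments u
               for v, R_v in doc_segments)                  # sum over document segments v

# Each document sentence vector is paired with its information content R_v.
doc_segments = [(np.array([1.0, 0.0]), 5), (np.array([0.0, 1.0]), 6)]
summary_vecs = [np.array([0.9, 0.1]), np.array([0.9, 0.1])]  # a redundant second sentence
print(ic_text(summary_vecs, doc_segments))  # the duplicate adds nothing, per the max function
```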
IC_image is defined below in equation 9.

IC_image = Σ_q R_q^I · max_p S_I(p, q)    (Equation 9)

In equation 9, S_I(p, q), defined above in equation 5, represents the information content of an image segment p (in the summary) in relation to an image segment q (in the multimedia content item). In one embodiment, S_I quantifies the similarity between the image segment p in the summary and the corresponding image segment q in the multimedia content item. The quantification of S_I is determined based on representations of the image segments produced by the regional convolutional neural network (RCNN) analysis, optionally projected onto a common unit space as described above. The term R_q^I is the information content of the image segment q of the multimedia content item. In one embodiment, this term is determined by converting the image segment q into text (specifically, by generating 224 a vector) as described above in the context of meta-step 208, and then measuring the information content of that text using the method described above; R_q^I is thus analogous to the term R_v described above.
In equation 9, for a given image segment q of the multimedia content item, the max function is taken over the image segments p present in the image portion of the summary. The result is the maximum representation of the image segment q present in the image portion of the summary S. Summing over all image segments q present in the multimedia content item provides an indication of how well the image portion of the summary represents the multimedia content item.
Coh_total is defined below in equation 10.

Coh_total = Σ_u Σ_p C_{T,I}(u, p) · R_u · R_p^I    (Equation 10)

In equation 10, C_{T,I}(u, p) represents the coherence between a sentence (or text segment) u of the text portion of the summary S and an image segment p of the image portion I of the summary. As described above in the context of equation 4, C_{T,I} is determined by projecting vectors extracted from the text portion and the image portion of the summary onto a common unit space and comparing them. R_u and R_p^I are the information content of the text and image segments, respectively, as defined above.
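The following sketch assembles MuSQ per equation 7 from the quantities of equations 8 through 10. The pairing of each segment vector with its information content, and the reuse of one helper for both IC_text and IC_image, are simplifying assumptions; this is an illustrative reading of the equations, not the patented implementation itself:

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def ic(summary_segments, doc_segments):
    """Shared shape of equations 8 and 9: sum over v of R_v * max over u of sim(u, v)."""
    return sum(R_v * max(cos(u, v) for u, _ in summary_segments)
               for v, R_v in doc_segments)

def coh_total(summary_text, summary_images):
    """Equation 10: coherence C_T,I(u, p) weighted by both information contents."""
    return sum(cos(u, p) * R_u * R_p
               for u, R_u in summary_text
               for p, R_p in summary_images)

def musq(summary_text, summary_images, doc_text, doc_images, A=1.0, B=1.0, C=1.0):
    """Equation 7: MuSQ = A*IC_text + B*IC_image + C*Coh_total."""
    return (A * ic(summary_text, doc_text)
            + B * ic(summary_images, doc_images)
            + C * coh_total(summary_text, summary_images))
```

Here every segment is a (vector, information content) pair, with all vectors assumed to lie already in the common unit space.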
Example System
Fig. 3 is a block diagram of a distributed processing environment including a quality metric determination system remotely coupled to a given user's computing device by a communication network according to one embodiment of the present disclosure. The distributed processing environment 300 shown in fig. 3 includes a user device 304, a network 308, and a digest quality determination system 312. In other embodiments, system environment 300 includes different and/or additional components than those shown in FIG. 3.
The user device 304 is a computing device capable of receiving user input as well as transmitting and/or receiving data via the network 308. In one embodiment, the user device 304 is a computer system, such as a desktop or laptop computer. In another embodiment, the user device 304 may be a computer-enabled device, such as a personal digital assistant (PDA), mobile phone, tablet computer, smart phone, or similar device. In some embodiments, the user device 304 is a mobile computing device used for consuming a multimedia content item, a summary corresponding to the multimedia content item, and the output of the methods described herein for determining a summary quality metric for the summary corresponding to the multimedia content item. The user device 304 is configured to communicate with the summary quality determination system 312 via the network 308. In one embodiment, the user device 304 executes an application that allows a user of the user device 304 to interact with the summary quality determination system 312, thus becoming a specialized computing machine. For example, the user device 304 executes a browser application to enable interaction between the user device 304 and the summary quality determination system 312 via the network 308. In another embodiment, the user device 304 interacts with the summary quality determination system 312 through an application programming interface (API) that runs on the native operating system of the user device 304 (e.g., IOS® or ANDROID™).
The user device 304 is configured to communicate via the network 308, which may include any combination of local and/or wide area networks, using both wired and wireless communication systems. In one embodiment, network 308 uses standard communication technologies and/or protocols. Thus, the network 308 may include links using technologies such as the Internet, 802.11, Worldwide Interoperability for Microwave Access (WiMAX), 3G, 4G, CDMA, Digital Subscriber Line (DSL), and so forth. Similarly, networking protocols used on network 308 may include multiprotocol label switching (MPLS), transmission control protocol/internet protocol (TCP/IP), User Datagram Protocol (UDP), hypertext transfer protocol (HTTP), Simple Mail Transfer Protocol (SMTP), and File Transfer Protocol (FTP). Data exchanged over network 308 may be represented using techniques and/or formats including hypertext markup language (HTML) or extensible markup language (XML). Further, all or some of the links may be encrypted using encryption techniques such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), and internet protocol security (IPsec).
Fig. 4 is a block diagram of the system architecture of the summary quality determination system 312 shown in fig. 3. The summary quality determination system 312 is configured to perform some or all of the above-described embodiments upon receiving multimedia content and a corresponding summary, to determine a quality metric indicating the degree of similarity between the overall semantic meaning of the summary and the semantic meaning of the corresponding multimedia content item. The summary quality determination system 312 includes a non-transitory memory 416 and a quality metric determination module 432, the sub-components of which are described below.
The non-transitory memory 416 is depicted as including two different memory elements: a multimedia content item store 420 and a summary store 424. The multimedia content item store 420 stores multimedia content items (and, optionally, content items comprising only one of a text portion or an image portion) for analysis and, optionally, for display or transmission. The summary store 424 stores summaries corresponding to multimedia content items. As with the multimedia content item store 420, the summary store 424 may store any one or more of text summaries, image summaries, and multimedia summaries that include both text portions and image portions. Regardless of the nature of the stored content and summaries, the multimedia content item store 420 and the summary store 424 are in communication with the quality metric determination module 432.
The non-transitory memory 416 may include a computer system memory or random access memory for storing data and computer readable instructions and/or software implementing various embodiments as taught in the present disclosure, such as a persistent disk storage (which may include any suitable optical or magnetic persistent storage device, e.g., RAM, ROM, flash memory, USB devices, or other semiconductor-based storage media), a hard drive, a CD-ROM, or other computer readable medium. The non-transitory memory 416 may also include other types of memory or combinations thereof. The non-transitory memory 416 may be provided as a physical element of the system 312 or the non-transitory memory 416 may be provided separately or remotely from the system 312. The non-transitory memory 416 of the system 312 may store computer-readable and computer-executable instructions or software for implementing various embodiments, including a multimedia content item store 420 and a summary store 424.
In use, the quality metric determination module 432 communicates with the non-transitory memory 416 including the multimedia content item store 420 and the summary store 424 in order to receive and subsequently analyze multimedia content items and corresponding summaries. The quality metric determination module 432 includes a sentence-to-sentence analyzer 432, a sentence-to-image analyzer 436, and an image-to-image analyzer 440. The sentence-to-sentence analyzer analyzes the quality of the sentences (or sentence fragments) in the text portion of the summary relative to the sentences in the text portion of the multimedia content item as described above in the context of fig. 1 and 2. The sentence-to-image analyzer analyzes the quality of the sentences in the text portion of the summary relative to the accompanying image portion of the summary as described above in the context of fig. 1 and 2. The image-to-image analyzer analyzes the quality of the image portions of the summary with respect to the image portions of the corresponding multimedia content item as described above in the context of fig. 1 and 2. Once each of these analyzers 432, 436, 440 completes the analysis, the quality metric determination module receives the output of the respective analysis to determine a summary quality metric as described above.
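A structural sketch of this wiring is shown below; the class and parameter names are assumptions rather than the names used in the actual system 312:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class QualityMetricModule:
    sentence_to_sentence: Callable  # produces IC_text   (meta-step 204)
    image_to_image: Callable        # produces IC_image  (meta-step 212)
    sentence_to_image: Callable     # produces Coh_total (meta-step 208)
    A: float = 1.0                  # user-selectable weights of equation 7
    B: float = 1.0
    C: float = 1.0

    def quality_metric(self, content_item, summary) -> float:
        """Combine the three analyzer outputs into the MuSQ of equation 7."""
        return (self.A * self.sentence_to_sentence(content_item, summary)
                + self.B * self.image_to_image(content_item, summary)
                + self.C * self.sentence_to_image(summary))
```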
The web server 444 links the summary quality determination system 312 to the user device 304 via the network 308. The web server 444 serves web pages, as well as other web-related content such as JAVA®, FLASH®, XML, and so forth. The web server 444 may provide functionality to receive and transmit content items and summaries from and to the user device 304, to receive and transmit summary quality metrics from and to user devices, and to otherwise facilitate consumption of content items. Additionally, the web server 444 may transmit data directly to native client device operating systems (such as IOS®, ANDROID™, or RIM®). The web server 444 also provides API functionality for exchanging data with the user device 304.
Summary quality determination system 312 also includes at least one processor 448 for executing computer-readable and computer-executable instructions or software stored in non-transitory memory 416 and other programs for controlling system hardware. Virtualization may be employed so that infrastructure and resources in the digest quality determination system 312 may be dynamically shared. For example, a virtual machine may be provided to handle a process running on multiple processors so that the process appears to use only one computing resource rather than multiple computing resources. Multiple virtual machines may also be used with a processor.
Example applications
The following two examples qualitatively describe the application of the embodiments described herein. In a first example, a multimedia content item contains two distinct sentences. The first sentence, Str_1, comprises a set of unique words w_1 and is repeated n_1 times in the multimedia content item. The second sentence, Str_2, comprises a set of unique words w_2 and is repeated n_2 times. For convenience of explanation, assume w_1 and w_2 have no words in common; this last assumption is expressed mathematically as w_1 ∩ w_2 = ∅. Further, assume for this example that the word counts are |w_1| = 5 and |w_2| = 6, and that the repetition counts are n_1 = 10 and n_2 = 2.
If a summary of only a single sentence is requested, two options are possible: a summary S_1 containing only Str_1, or a summary S_2 containing only Str_2. Because Str_1 is repeated 10 times, five times more frequently than Str_2, summary S_1 is preferred because it captures the information that is dominant in the original multimedia content item. Because w_1 and w_2 have no words in common, the total number of unique words in the multimedia content item is |w_1| + |w_2|. The retention rates of summaries S_1 and S_2 relative to the multimedia content item follow equations 11 and 12:
Retention(S_1) = |w_1| / (|w_1| + |w_2|) = 5/11    (Equation 11)

Retention(S_2) = |w_2| / (|w_1| + |w_2|) = 6/11    (Equation 12)
A retention rate algorithm, such as that presented above, will preferentially select S_2 because it has the highest number of unique words among the analyzed summaries. The retention rate algorithm bases this selection criterion on the assumption that a summary comprising more unique words describes more of the content in the multimedia content item. However, because these methods focus only on word counts, significant semantic differences are ignored. In this example, retention rate would select the summary S_2 with more unique words, even though it represents less of the entire content of the multimedia content item.
According to embodiments of the present disclosure, a summary having more information content and broader coverage of the multimedia content item as a whole (i.e., reflecting the different topics throughout the multimedia content item) is preferred. In contrast to the retention rate example above, consider applying an embodiment of the present disclosure to select between summary 1 (S_1) and summary 2 (S_2). Equations 13 and 14 apply embodiments of the present disclosure to the above scenario.
MuSQ(S_1) = n_1 · |w_1| = 10 · 5 = 50    (Equation 13)

MuSQ(S_2) = n_2 · |w_2| = 2 · 6 = 12    (Equation 14)
In the above example, equation 7 reduces to the form of equations 13 and 14: because this example includes only a text portion, the image-analysis terms of equation 7 (i.e., IC_image and Coh_total) reduce to zero. Thus, the only term remaining from equation 7 is the IC_text term. In this case, IC_text reduces to the number of words contributing to semantic meaning (R_v), because the "max" term is 1. Based on the foregoing, embodiments of the present disclosure select S_1 because it is more representative of the multimedia content item (i.e., selecting the summary S_1, which includes the sentence Str_1 repeated five times more frequently than Str_2).
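The arithmetic of this example can be checked directly; the sketch below assumes, per the text, that only the IC_text term of equation 7 survives and that the max term equals 1:

```python
n1, w1 = 10, 5   # Str_1: repeated 10 times, 5 unique content words
n2, w2 = 2, 6    # Str_2: repeated 2 times, 6 unique content words

print("retention(S1) =", w1 / (w1 + w2))   # 5/11, so retention rate prefers S2
print("retention(S2) =", w2 / (w1 + w2))   # 6/11
print("MuSQ(S1) =", n1 * w1)               # 50, so MuSQ prefers S1 (equation 13)
print("MuSQ(S2) =", n2 * w2)               # 12 (equation 14)
```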
In another example, consider the advantages of embodiments of the present disclosure over KL divergence. Adapting the preceding example, define the summaries S_1 and S_2 as S_1 = {Str_1, Str_2} and S_2 = {Str_1, Str_1}, with |w_1| = 5, |w_2| = 6, and w_1 ∩ w_2 = ∅. Because S_1 includes more information (i.e., both Str_1 and Str_2) than S_2, which comprises only two repetitions of Str_1, S_1 is the preferred summary.
Recall that KL divergence is defined in equation 15 below.

KL(S) = Σ_i q_i · log(q_i / p_i)    (Equation 15)
In equation 15, q_i is the probability of occurrence of the i-th word in the summary, and p_i is the probability of occurrence of the i-th word in the original document. If KL(S_2) < KL(S_1), then summary S_2 will be selected according to KL divergence. Applying known mathematics, the ratio in equation 16 determines the selection criterion.
KL(S_2) < KL(S_1) when n_1 / n_2 > (11/5)^(11/6) ≈ 4.3    (Equation 16)
In this example, n_1 = 10 and n_2 = 2, so n_1 > 4.3 · n_2. For this reason, even though S_2 has less information than S_1, S_2 will still be selected as the preferred summary according to KL divergence.
In contrast, applying an example of the present disclosure, MuSQ(S_1) = n_1 · |w_1| + n_2 · |w_2| = 10 · 5 + 2 · 6 = 62 and MuSQ(S_2) = n_1 · |w_1| = 10 · 5 = 50. Applying this model, S_1 is appropriately selected as the preferred summary due to the diversity of its information.
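A numeric check of this comparison is sketched below, under the assumption that p_i and q_i are relative word frequencies in the document and summary, respectively:

```python
import math

n1, w1, n2, w2 = 10, 5, 2, 6
doc_total = n1 * w1 + n2 * w2          # 62 word occurrences in the document

def kl(pairs):
    """Equation 15 over (q_i, p_i) pairs for the unique words of a summary."""
    return sum(q * math.log(q / p) for q, p in pairs)

# S1 = {Str_1, Str_2}: each of the 11 unique words appears once (q = 1/11)
kl_s1 = kl([(1 / 11, n1 / doc_total)] * w1 + [(1 / 11, n2 / doc_total)] * w2)
# S2 = {Str_1, Str_1}: each of Str_1's 5 unique words appears twice (q = 2/10)
kl_s2 = kl([(2 / 10, n1 / doc_total)] * w1)

print(kl_s1, kl_s2)   # kl_s2 < kl_s1: KL divergence picks S2,
                      # while MuSQ(S1) = 62 > MuSQ(S2) = 50 picks the diverse S1
```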
Further considerations
As will be appreciated in light of this disclosure, the various modules and components of the systems shown in figs. 3 and 4, such as the sentence-to-sentence analyzer 432, the sentence-to-image analyzer 436, and the image-to-image analyzer 440, may be implemented in software, such as a set of instructions (e.g., HTML, XML, C++, Objective-C, JavaScript, Java, BASIC, etc.) encoded on any computer-readable medium or computer program product (e.g., a hard drive, a server, a disk, or other suitable non-transitory memory or collection of memories) that, when executed by one or more processors, causes the various methods provided in this disclosure to be performed. It will be appreciated that, in some embodiments, various functions performed by the user computing system as described in this disclosure may be performed by similar processors and/or databases in different configurations and arrangements, and the depicted embodiments are not intended to be limiting. The various components of this example embodiment may be integrated into, for example, one or more desktop or laptop computers, workstations, tablets, smart phones, game consoles, set-top boxes, or other such computing devices. Other typical components and modules of a computing system, such as a processor (e.g., a central processing unit and co-processor, a graphics processor, etc.), input devices (e.g., keyboard, mouse, touchpad, touchscreen, etc.), and an operating system, are not shown but will be readily apparent.
The foregoing description of embodiments of the disclosure has been presented for purposes of illustration; it is not intended to be exhaustive or to limit the claims to the precise form disclosed. One skilled in the relevant art will recognize that many modifications and variations are possible in light of the above disclosure.
Some portions of the present description describe embodiments in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, when described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. The described operations may be embodied in software, firmware, hardware, or any combination thereof.
Any of the steps, operations, or processes described herein may be performed or implemented by one or more hardware or software modules, either alone or in combination with other devices. In one embodiment, the software modules are implemented with a computer program product comprising a non-transitory computer-readable medium containing computer program code executable by a computer processor for performing any or all of the steps, operations, or processes described.
Example embodiments
In one example, a computer-implemented method for evaluating a summary of a digital multimedia content item includes receiving a multimedia content item including a text portion and an image portion, receiving a summary of the multimedia content, the summary including a text portion and an image portion, and determining a quality metric of the summary relative to the multimedia content item. The determining includes determining at least two of the following content metrics: the method further includes determining a first content metric quantifying an amount of information content in the text portion of the summary that is common with the text portion of the multimedia content item, determining a second content metric quantifying an amount of information content in the image portion of the summary that is common with the image portion of the multimedia content item, and determining a third content metric quantifying an information coherence between the text portion of the summary and the image portion of the summary. The quality metric is based at least in part on the at least two determined content metrics. In one embodiment of this example, determining the quality metric further comprises determining a product of the first content metric, the second content metric, and the third content metric. In one embodiment of this example, determining the first content metric includes determining a cosine similarity between at least one text segment of the text portion of the multimedia summary and a vector representation of at least one text segment of the multimedia content item. A max function may be applied to the cosine similarity determination. In one embodiment of this example, determining the second content metric includes generating a first image vector from the image portion of the summary and generating a second image vector from the image portion of the multimedia content item. In one embodiment of this example, determining the third content metric includes projecting a first text content vector from the text portion of the summary and a second text content vector from the image portion of the summary onto a common unit space. In one embodiment of this example, determining the third content metric includes determining a product of a first content of the text portion of the summary and a second content of the image portion of the summary.
In another example, a computer program product is stored on at least one non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the above computer-implemented method to be performed.
In another example, a system for evaluating a summary of a digital multimedia content item includes various modules, at least one processor, and at least one non-transitory storage medium for determining a quality metric according to the example methods described above.

Claims (20)

1. A computer-implemented method for evaluating a summary of a digital multimedia content item, the method comprising:
receiving the multimedia content item comprising a text portion and an image portion;
receiving the summary of the multimedia content item, the summary comprising a text portion and an image portion;
determining a quality metric of the summary relative to the multimedia content item, the determining comprising:
determining a first content metric quantifying an amount of information content in the text portion of the summary that is common with the text portion of the multimedia content item;
determining a second content metric quantifying an amount of information content in the image portion of the summary that is common with the image portion of the multimedia content item; and
determining a third content metric that quantifies a coherence of information between the text portion of the summary and the image portion of the summary;
wherein the quality metric is based at least in part on the determined first, second, and third content metrics.
2. The method of claim 1, wherein determining the quality metric further comprises determining a product of the first content metric, the second content metric, and the third content metric.
3. The method of claim 1, wherein determining the first content metric comprises determining a cosine similarity between a vector representation of at least one text segment of the text portion of the summary and a vector representation of at least one text segment of the multimedia content item.
4. The method of claim 3, further comprising applying a max function to the cosine similarity.
5. The method of claim 1, wherein determining the second content metric comprises generating a first image vector from the image portion of the summary and a second image vector from the image portion of the multimedia content item.
6. The method of claim 1, wherein determining the third content metric comprises projecting a first text content vector from the text portion of the summary and a second text content vector from the image portion of the summary onto a common unit space.
7. The method of claim 1, wherein determining the third content metric comprises determining a product of a first content of the text portion of the summary and a second content of the image portion of the summary.
8. A computer program product, wherein the computer program product is stored on at least one non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause a process to be performed, the process comprising:
receiving a multimedia content item comprising a text portion and an image portion;
receiving a summary of the multimedia content item, the summary comprising a text portion and an image portion;
determining a quality metric of the summary relative to the multimedia content item, the determining comprising:
determining a first content metric quantifying an amount of information content in the text portion of the summary that is common with the text portion of the multimedia content item;
determining a second content metric quantifying an amount of information content in the image portion of the summary that is common with the image portion of the multimedia content item; and
determining a third content metric that quantifies a coherence of information between the text portion of the summary and the image portion of the summary;
wherein the quality metric is based at least in part on the determined first, second, and third content metrics.
9. The computer program product of claim 8, wherein determining the quality metric further comprises determining a product of the first content metric, the second content metric, and the third content metric.
10. The computer program product of claim 8, wherein determining the first content metric comprises determining a cosine similarity between a vector representation of at least one text segment of the text portion of the summary and a vector representation of at least one text segment of the multimedia content item.
11. The computer program product of claim 10, wherein the process further comprises applying a max function to the cosine similarity.
12. The computer program product of claim 8, wherein determining the second content metric comprises generating a first image vector from the image portion of the summary and a second image vector from the image portion of the multimedia content item.
13. The computer program product of claim 8, wherein determining the third content metric comprises projecting a first text content vector from the text portion of the summary and a second text content vector from the image portion of the summary onto a common unit space.
14. The computer program product of claim 8, wherein determining the third content metric comprises determining a product of a first content of the text portion of the summary and a second content of the image portion of the summary.
15. A system for evaluating a summary of a digital multimedia content item, the system comprising:
a multimedia content item repository configured to receive a multimedia content item comprising a text portion and an image portion;
a summary repository configured to receive a summary comprising a text portion and an image portion;
a quality metric determination module configured to determine a quality metric of the summary relative to the multimedia content item, the determination comprising:
determining a first content metric quantifying an amount of information content in the text portion of the summary that is common with the text portion of the multimedia content item;
determining a second content metric quantifying an amount of information content in the image portion of the summary that is common with the image portion of the multimedia content item; and
determining a third content metric that quantifies a coherence of information between the text portion of the summary and the image portion of the summary;
wherein the quality metric is based at least in part on the determined first, second, and third content metrics.
16. The system of claim 15, wherein the quality metric determination module is further configured to determine the quality metric by determining a product of the first content metric, the second content metric, and the third content metric.
17. The system of claim 15, wherein the quality metric determination module is further configured to determine the first content metric by determining a cosine similarity between a vector representation of at least one text segment of the text portion of the summary and a vector representation of at least one text segment of the multimedia content item.
18. The system of claim 17, wherein the quality metric determination module is further configured to determine the first content metric by applying a max function to the cosine similarity.
19. The system of claim 15, wherein the quality metric determination module is further configured to determine the second content metric by generating a first image vector from the image portion of the summary and a second image vector from the image portion of the multimedia content item.
20. The system of claim 15, wherein the quality metric determination module is further configured to determine the third content metric by projecting a first text content vector from the text portion of the summary and a second text content vector from the image portion of the summary onto a common unit space.
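To relate the system claims to the earlier sketch, the following is one possible arrangement of the claimed repositories and quality metric determination module; the class names, data shapes, and storage choices are illustrative assumptions, and the metric functions are the ones defined in the sketch above.

    # Illustrative structure for the system of claims 15-20: two repositories
    # feeding a quality metric determination module. Class names are assumed.
    from dataclasses import dataclass, field
    import numpy as np

    @dataclass
    class MultimediaItem:
        text_segments: list       # vector representations of text segments
        image_vector: np.ndarray  # vector representation of the image portion

    @dataclass
    class Repository:
        # Minimal stand-in for the content item and summary repositories.
        items: dict = field(default_factory=dict)

        def put(self, key, item):
            self.items[key] = item

        def get(self, key):
            return self.items[key]

    class QualityMetricDeterminationModule:
        # Determines the quality metric of a summary relative to its source
        # item, reusing the metric functions from the earlier sketch.
        def __init__(self, W_text, W_image):
            self.W_text, self.W_image = W_text, W_image

        def determine(self, source, summary):
            m1 = first_content_metric(summary.text_segments,
                                      source.text_segments)
            m2 = second_content_metric(summary.image_vector,
                                       source.image_vector)
            m3 = third_content_metric(np.mean(summary.text_segments, axis=0),
                                      summary.image_vector,
                                      self.W_text, self.W_image)
            return quality_metric(m1, m2, m3)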
CN201610877283.7A 2015-12-04 2016-09-30 Determining quality of a summary of multimedia content Active CN106844410B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14/959,219 US9454524B1 (en) 2015-12-04 2015-12-04 Determining quality of a summary of multimedia content
US14/959,219 2015-12-04

Publications (2)

Publication Number Publication Date
CN106844410A (en) 2017-06-13
CN106844410B (en) 2022-02-08

Family

ID=56939505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610877283.7A Active CN106844410B (en) 2015-12-04 2016-09-30 Determining quality of a summary of multimedia content

Country Status (5)

Country Link
US (1) US9454524B1 (en)
CN (1) CN106844410B (en)
AU (1) AU2016238832B2 (en)
DE (1) DE102016011905A1 (en)
GB (1) GB2545051A (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11182414B2 (en) * 2017-03-20 2021-11-23 International Business Machines Corporation Search queries of multi-datatype databases
CN109492213B (en) * 2017-09-11 2023-04-07 阿里巴巴集团控股有限公司 Sentence similarity calculation method and device
US10587669B2 (en) * 2017-12-20 2020-03-10 Facebook, Inc. Visual quality metrics
CN110020169A (en) * 2017-12-28 2019-07-16 北京京东尚科信息技术有限公司 Method and apparatus for determining object dependencies
CN108829659B (en) * 2018-05-04 2021-02-09 北京中科闻歌科技股份有限公司 Reference identification method, reference identification equipment and computer-storable medium
CN108985370B (en) * 2018-07-10 2021-04-16 中国人民解放军国防科技大学 Automatic generation method of image annotation sentences
US11093560B2 (en) * 2018-09-21 2021-08-17 Microsoft Technology Licensing, Llc Stacked cross-modal matching
CN109543512A (en) * 2018-10-09 2019-03-29 中国科学院自动化研究所 Evaluation method for image-text summaries
CN109299746A (en) * 2018-10-22 2019-02-01 广州星唯信息科技有限公司 Segment chord similarity calculation method
CN109658938B (en) * 2018-12-07 2020-03-17 百度在线网络技术(北京)有限公司 Method, device and equipment for matching voice and text and computer readable medium
CN111428032B (en) * 2020-03-20 2024-03-29 北京小米松果电子有限公司 Content quality evaluation method and device, electronic equipment and storage medium
US11687514B2 (en) * 2020-07-15 2023-06-27 International Business Machines Corporation Multimodal table encoding for information retrieval systems
US11675822B2 (en) * 2020-07-27 2023-06-13 International Business Machines Corporation Computer generated data analysis and learning to derive multimedia factoids
US20220027578A1 (en) * 2020-07-27 2022-01-27 Nvidia Corporation Text string summarization
CN112528598B (en) * 2020-12-07 2022-04-05 上海交通大学 Automatic text abstract evaluation method based on pre-training language model and information theory
CN112800745A (en) * 2021-02-01 2021-05-14 北京明略昭辉科技有限公司 Method, device and equipment for text generation quality evaluation
WO2022213313A1 (en) 2021-04-08 2022-10-13 Citrix Systems, Inc. Intelligent collection of meeting background information
WO2023039698A1 (en) * 2021-09-14 2023-03-23 Citrix Systems, Inc. Systems and methods for accessing online meeting materials

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030154072A1 (en) * 1998-03-31 2003-08-14 Scansoft, Inc., A Delaware Corporation Call analysis
US20030061022A1 (en) * 2001-09-21 2003-03-27 Reinders James R. Display of translations in an interleaved fashion with variable spacing
US7209875B2 (en) * 2002-12-04 2007-04-24 Microsoft Corporation System and method for machine learning a confidence metric for machine translation
WO2005004370A2 (en) * 2003-06-28 2005-01-13 Geopacket Corporation Quality determination for packetized information
US7778632B2 (en) * 2005-10-28 2010-08-17 Microsoft Corporation Multi-modal device capable of automated actions
KR100995839B1 (en) * 2008-08-08 2010-11-22 주식회사 아이토비 Multi contents display system and method thereof

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7555431B2 (en) * 1999-11-12 2009-06-30 Phoenix Solutions, Inc. Method for processing speech using dynamic grammars
CN101634996A (en) * 2009-08-13 2010-01-27 浙江大学 Individualized video sequencing method based on comprehensive consideration
CN103699591A (en) * 2013-12-11 2014-04-02 湖南大学 Page body extraction method based on sample page
CN103617158A (en) * 2013-12-17 2014-03-05 苏州大学张家港工业技术研究院 Method for generating emotion abstract of dialogue text
CN104199826A (en) * 2014-07-24 2014-12-10 北京大学 Heterogeneous media similarity calculation method and retrieval method based on correlation analysis
CN104699763A (en) * 2015-02-11 2015-06-10 中国科学院新疆理化技术研究所 Text similarity measuring system based on multi-feature fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"面向Web图片检索的文本和图片信息融合技术研究";尹湘舟;《中国优秀硕士学位论文全文数据库》;20111215(第S2期);正文第9-15、19-21页 *
Deep Fragment Embeddings for Bidirectional Image Sentence Mapping; A. Karpathy, A. Joulin, F.-F. Li; Advances in Neural Information Processing Systems; 2014-06-22; pp. 1-4 *

Also Published As

Publication number Publication date
GB201616833D0 (en) 2016-11-16
AU2016238832B2 (en) 2021-02-25
GB2545051A (en) 2017-06-07
US9454524B1 (en) 2016-09-27
CN106844410A (en) 2017-06-13
AU2016238832A1 (en) 2017-06-22
DE102016011905A1 (en) 2017-06-08

Similar Documents

Publication Publication Date Title
CN106844410B (en) Determining quality of a summary of multimedia content
JP6916383B2 (en) Image question answering methods, devices, systems and storage media
CN110162593B (en) Search result processing and similarity model training method and device
US9858264B2 (en) Converting a text sentence to a series of images
WO2019200806A1 (en) Device for generating text classification model, method, and computer readable storage medium
US10783395B2 (en) Method and apparatus for detecting abnormal traffic based on convolutional autoencoder
EP3117369B1 (en) Detecting and extracting image document components to create flow document
US9275307B2 (en) Method and system for automatic selection of one or more image processing algorithm
US7711673B1 (en) Automatic charset detection using SIM algorithm with charset grouping
US20220383108A1 (en) Information-aware graph contrastive learning
WO2019028990A1 (en) Code element naming method, device, electronic equipment and medium
US11915500B2 (en) Neural network based scene text recognition
US10417578B2 (en) Method and system for predicting requirements of a user for resources over a computer network
CN110955750A (en) Combined identification method and device for comment area and emotion polarity, and electronic equipment
CN112805715A (en) Identifying entity attribute relationships
US20180005248A1 (en) Product, operating system and topic based
Wang et al. Classification with unstructured predictors and an application to sentiment analysis
Boillet et al. Confidence estimation for object detection in document images
US20140372090A1 (en) Incremental response modeling
CN116680401A (en) Document processing method, document processing device, apparatus and storage medium
WO2023155304A1 (en) Keyword recommendation model training method and apparatus, keyword recommendation method and apparatus, device, and medium
CN111488452A (en) Webpage tampering detection method, detection system and related equipment
US20230126022A1 (en) Automatically determining table locations and table cell types
US9122705B1 (en) Scoring hash functions
CN112148902A (en) Data processing method, device, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant