GB2552969A - Image processing system - Google Patents

Image processing system

Info

Publication number
GB2552969A
GB2552969A (application GB1613987.5A / GB201613987A; also published as GB2552969B)
Authority
GB
United Kingdom
Prior art keywords
image
images
pixels
summary image
series
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1613987.5A
Other versions
GB2552969B (en)
GB201613987D0 (en)
Inventor
Chow Peter
Georgescu Serban
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to GB1613987.5A priority Critical patent/GB2552969B/en
Publication of GB201613987D0 publication Critical patent/GB201613987D0/en
Publication of GB2552969A publication Critical patent/GB2552969A/en
Application granted granted Critical
Publication of GB2552969B publication Critical patent/GB2552969B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73Querying
    • G06F16/738Presentation of query results
    • G06F16/739Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/60Editing figures and text; Combining figures or text
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20216Image averaging

Abstract

Producing summary image from series of images, comprising: extracting series of images X from input file(s); processing images from series to obtain rectangles R of pixels, having dimensions AxB (A and B are numbers of pixels), each rectangle R representing one image X from series; arranging 7 rectangles R of pixels to form summary image representing series of images. Summary image may be analysed by neural network to classify input file, e.g. determining that summary image comprises rectangles in close proximity which are different colours, indicating that original file contained rapid scene changes, further indicating that input was an ‘action’ film. Processing may comprise averaging colour or brightness of image. Summary image may be rectangular having dimensions CxD (C and D are numbers of rectangles). ‘Rectangle’ does not exclude ‘square’ in the case where A=B and C=D. Each image of series may be single frame of video file. Sample frequency of video file may be predetermined. Rectangles of summary image may be arranged chronologically or by average brightness or colour. Two summary images may be produced, which are then classified and compared to determine how similar they are.

Description

(54) Title of the Invention: Image processing system
Abstract Title: Producing a summary image composed of multiple processed extracted images from the input file
[Drawing sheets 1/8 to 8/8: Figures 1, 2A, 2B, 3, 4, 5, 6, 7 and 8 (images GB2552969A_D0001 to GB2552969A_D0013). The figure content is not recoverable from the text extraction.]
Intellectual Property Office
Application No. GB1613987.5
RTM Date: 14 February 2017
The following terms are registered trade marks and should be read as such wherever they occur in this document: Youtube (Page 2)
Intellectual Property Office is an operating name of the Patent Office. www.gov.uk/ipo
Image Processing System
[001]. The invention relates to an image processing method, apparatus and program.
[002]. Existing image comparison mechanisms for single images often rely on a simultaneous consideration of a pair of images and a qualitative judgement by a skilled expert as to the degree of similarity between the pair of images. An example of an existing image comparison mechanism of this type would be an art expert considering a pair of paintings. However, while existing mechanisms may provide some form of similarity measurement, there are several issues with the widespread use of existing techniques. Existing techniques, such as the technique discussed above, are typically labour intensive as an expert is required to consider each pair of images individually before forming a judgement.
[003]. The increase in the use of the internet over recent years in conjunction with advances in digital image capture and storage techniques has resulted in a significant increase in both the number of images available in a digitised form and the ease with which these images may be accessed. However, without some form of categorisation, it may be challenging to locate an image related to a specific subject.
[004]. Accordingly, there is a need for a system to automate the process of recognising images, to facilitate categorisation and retrieval of the images. One such mechanism is the use of neural networks.
[005]. “ImageNet Classification with Deep Convolutional Neural Networks” by Krizhevsky, A., Sutskever, I. and Hinton, G. E. discloses the use of large deep convolutional neural networks to classify 1.2 million images (specifically photographs) from a previously obtained database into 1000 different classes. The neural network consisted of eight learned layers, of which five were convolutional. The neural network was trained using two GPUs in parallel.
[006]. The present invention produces summary images for series of images. One example of the applicability of this technique is the production of summary images of video files. Video files may be equated to a series of still images, such as frames of film, potentially with a linked audio component. As discussed above in the context of single still images, the increase in the number of files available on the internet may make it difficult to locate a video file related to a specific topic. Further, the ubiquitous nature of camera phones means that the number of individuals generating video files is rapidly increasing, as is the average number of files each individual generates.
[007]. “Large-scale Video Classification with Convolutional Neural Networks” by Karpathy, A. et al. discusses the use of convolutional neural networks for the classification of a dataset of 1 million YouTube videos into 487 classes. The methodology used involves identifying local motion in the videos (that is, frame-to-frame variation in identified object positioning), and classifying this local motion as signifying a particular action, described using a specific term (or “word”). The repetition of this process for all local motions in the video provides a “bag of words”, which may then be used to categorise the video.
[008]. Existing classification methods for video files suffer from various limitations. Typically, although video files (such as commercial films) released by companies may be individually classified by the companies (that is, classification information for these videos may already exist), the same is often not true for video files released by small companies or individuals, for which classification information may not be available. Further, a user that has a large number of video files stored on a personal storage unit (such as a hard drive) may wish to be able to classify these video files without personally reviewing all of the video files; this is not practical using existing techniques. This is particularly true if the user does not wish to upload the files onto the internet, or it is not possible to upload the files to the internet.
[009]. Existing video file classification mechanisms typically require a significant degree of human input, for example, to watch a video and provide a review or key terms indicating the category or categories into which the video file falls. This process is both labour intensive (as a human is required to view and review the files) and subjective (as the review will largely amount to the opinion of the reviewer, and will be limited by the knowledge and objectivity of the viewer). Further, all actions that may be displayed in a video are unlikely to be classified. Accordingly, it may be difficult to locate video files of a particular type.
[0010]. Attempts have been made to provide a greater degree of objectivity in the categorisation of video files through the use of review aggregation, wherein a selection of reviews by various reviewers are compared and combined to produce an amalgamated review that is intended to present the majority opinion. However, while this approach may ameliorate the impact of outlying review opinions, an element of subjectivity persists. Further, in order for review aggregation to be successful, it is necessary for a plurality of reviews to be available for a given video file. This is unlikely for some files, particularly for files produced by individuals and potentially not uploaded onto the internet as discussed above.
[0011]. It is desirable to address some of the above issues.
[0012]. According to an aspect of an embodiment of the present invention, there is provided a method for producing a summary image from a series of images, the method comprising: extracting a series of images from one or more input files; processing images from the series of images to obtain rectangles of pixels of dimensions AxB, each rectangle of pixels representing one image from the series of images, wherein: A and B are numbers of pixels, and A and B are positive integers; and arranging the rectangles of pixels to form a summary image representing the series of images. The method provides a simple and effective way to summarise a series of input images, and the summary images may also be used to further analyse overall trends and patterns within the series of images.
[0013]. A further aspect of an embodiment of the present invention provides an image summariser configured to obtain a summary image from a series of images, the image summariser comprising: an extractor configured to extract a series of images from one or more input files; an image processor configured to process the images from the series of images to obtain rectangles of pixels of dimensions AxB, each rectangle of pixels representing one image from the series of images, wherein: A and B are numbers of pixels, and A and B are positive integers; and an image preparator configured to arrange the rectangles of pixels to form a summary image representing the series of images. The image summariser provides a simple and effective way to summarise a series of input images, and the summary images may also be used to further analyse overall trends and patterns within the series of images.
[0014]. Each rectangle of pixels may be formed by compressing an image from the series of images. This provides an advantage of reducing the size of the summary image relative to a summary image using uncompressed images, thereby increasing the ease of storing and analysing the images.
[0015]. Each rectangle of pixels may display a uniform colour that is the average colour of the one image from which the rectangle of pixels is formed, and/or display a uniform brightness that is the average brightness of the one image from which the rectangle of pixels is formed. This provides an advantage of improving the ease with which the summary image may be analysed, and also ensures that the summary image may remain representative of the series of images as a whole while requiring less storage space than if each image from the series of images was included in full in the summary image.
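The averaging step described above can be sketched in a few lines. This is an illustrative sketch only, not code from the patent: images are assumed to be nested lists of (r, g, b) tuples, and the helper names (`average_colour`, `uniform_rectangle`) are invented for this example.

```python
def average_colour(image):
    # Flatten the image (rows of (r, g, b) tuples) and take the per-channel mean.
    pixels = [p for row in image for p in row]
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) // n for c in range(3))

def uniform_rectangle(image, a, b):
    # An A x B rectangle of pixels, every pixel set to the image's average colour.
    colour = average_colour(image)
    return [[colour] * a for _ in range(b)]
```

Averaging brightness instead of colour would follow the same pattern with a single luminance channel in place of the three colour channels.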
[0016]. The summary image may be a rectangular array of the rectangles of pixels, the dimensions of the rectangular array being CxD, wherein: C and D are numbers of rectangles of pixels, and C and D are positive integers larger than 1. This provides the advantage of improving the ease with which the summary images may be quantified, and also allowing plural summary images of equal dimensions to be produced (which may be useful for some applications of the summary images).
[0017]. Each of the one or more input files may be a video file, and each image from the series of images extracted from the one or more input files may be a single frame of the video file. This provides the advantage of allowing the method to be used to analyse video files (which may be equated to a series of image files) in addition to still image files.
[0018]. The single frames may be extracted from the one or more video files at a predetermined frequency, or the single frames may be extracted from the one or more video files such that a predetermined total number of single frames are extracted. This provides the advantage of ensuring that the summary image is representative of the contents across the whole of the input video file or files, and may also be useful if it is desired to produce a series of summary images that may easily be compared with one another.
[0019]. When extracting the single frames from the one or more video files, a portion of the video file may be excluded from the extraction process before the single frames are extracted. This provides the advantage of allowing a portion of a video file that is not relevant, or that is not representative of the overall content of the video file, to be excluded prior to extracting the series of images, resulting in a more useful and representative summary image.
[0020]. The rectangles of pixels may be arranged in the summary image in accordance with the chronological order of the single frames in the one or more video files. This provides the advantage of generating a summary image which allows the progression through the video file to be analysed.
[0021]. The rectangles of pixels may be arranged in horizontal rows in the summary image, wherein the rows are filled from left to right, and wherein the horizontal rows are arranged from top to bottom with respect to the orientation of the summary image. This provides the advantage of generating a summary image which is easier to analyse and, in the event that the rectangles of pixels are arranged in chronological order, may also allow the summary image to be prepared using fewer computing resources.
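The row-filling rule above amounts to simple index arithmetic: rectangle number i lands in tile row i // C, tile column i % C. A minimal sketch, assuming each rectangle is stored as B rows of A pixels (the function name is illustrative, not from the patent):

```python
def arrange_summary(rectangles, c, d):
    # Fill horizontal rows of C rectangles left to right; stack D rows top to bottom.
    assert len(rectangles) == c * d, "need exactly C x D rectangles"
    b = len(rectangles[0])                      # pixel rows per rectangle
    summary = []
    for tile_row in range(d):
        tiles = rectangles[tile_row * c:(tile_row + 1) * c]
        for pixel_row in range(b):
            merged = []
            for tile in tiles:
                merged.extend(tile[pixel_row])  # place A-pixel rows side by side
            summary.append(merged)
    return summary
```

The result is a single (C*A) x (D*B) pixel array; feeding the rectangles in chronological order yields the chronological summary image of paragraph [0020].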
[0022]. The method may further comprise displaying the generated summary image on a display screen. This provides the advantage of providing immediate feedback to a user once the summary image has been prepared.
[0023]. According to a further aspect of an embodiment of the present invention, there is provided a method for calculating a similarity value representing the similarity between a first summary image and a second summary image, the method comprising: obtaining a first summary image by: extracting a series of images from one or more input files; processing images from the series of images to obtain rectangles of pixels of dimensions AxB, each rectangle of pixels representing one image from the series of images, wherein: A and B are numbers of pixels, and A and B are positive integers; and arranging the rectangles of pixels to form a summary image representing the series of images; analysing the first summary image; classifying the first summary image on the basis of the analysis; performing a comparison between the classifications for the first summary image and equivalent classifications for the second summary image; deriving a similarity value on the basis of the comparison; and outputting the similarity value. The method provides an efficient and objective way of comparing two series of images without requiring substantial amounts of user input.
[0024]. A further aspect of an embodiment of the present invention provides a comparer configured to calculate a similarity value representing the similarity between a first summary image and a second summary image, the comparer comprising: an image summariser configured to obtain the first summary image from a series of images; an analyser configured to analyse the first summary image; a classifier configured to classify the first summary image on the basis of the analysis produced by the analyser; a juxtaposer configured to perform a comparison between the classifications for the first summary image generated by the classifier and equivalent classifications for the second summary image; a calculator configured to derive a similarity value on the basis of the results of the comparison performed by the juxtaposer; and an outputter configured to output the similarity value derived by the calculator. The comparer provides an efficient and objective way of comparing two series of images without requiring substantial amounts of user input.
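The patent does not fix a formula for deriving the similarity value from the two sets of classifications. One plausible realisation, shown here purely as an illustration, is the cosine similarity of the classification score vectors produced for the two summary images (one score per class, same class ordering):

```python
import math

def similarity_value(scores_1, scores_2):
    # Cosine similarity of two classification score vectors.
    # 1.0 means identical classifications; 0.0 means no overlap.
    dot = sum(a * b for a, b in zip(scores_1, scores_2))
    n1 = math.sqrt(sum(a * a for a in scores_1))
    n2 = math.sqrt(sum(b * b for b in scores_2))
    return dot / (n1 * n2) if n1 and n2 else 0.0
```

Any metric over the classification vectors (Euclidean distance, overlap of top-ranked classes, and so on) could serve the same role in the juxtaposer/calculator.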
[0025]. The second summary image may be selected from a plurality of stored summary images, the second summary image being selected on the basis of the previously obtained classifications for the plurality of stored summary images. This provides the advantage of allowing a rapid and objective comparison between the first summary image and a plurality of second summary images, and may be used to quickly locate a second series of images (corresponding to the second summary image) that is similar to a first series of images (corresponding to the first summary image).
[0026]. The second summary image may be obtained, analysed and classified in the same way as the first summary image, prior to performing the comparison between the classifications. This provides the advantage of allowing a rapid comparison between two series of images to be performed in isolation.
[0027]. In the obtaining step, the rectangles of pixels may be obtained in such a way that the values of A and B are the same for all of the rectangles of pixels in the first summary image and the rectangles of pixels in the second summary image. This provides the advantage of improving the accuracy of the comparison between the summary images, because the comparison is performed using equivalent information.
[0028]. Each summary image may be a rectangular array of the rectangles of pixels, the dimensions of the rectangular array being CxD, wherein: C and D are numbers of rectangles of pixels, and C and D are positive integers larger than 1; and the values of C and D may be the same for the first summary image and the second summary image. This also provides the advantage of improving the accuracy of the comparison between the summary images, because the comparison is performed using equivalent information. Also, some analysis techniques, in particular those using neural networks, are more efficient if provided with summary images of equal dimensions.
[0029]. A neural network may be used to perform the analysis and classification. This provides more accurate results than some other analysis techniques.
[0030]. When each of the one or more input files is a video file, each image from the series of images extracted from the one or more input files is a single frame of the video file; and when the single frames are extracted from the one or more video files at a predetermined frequency, or such that a predetermined total number of single frames are extracted, the predetermined total number of single frames may be the same for the first summary image and the second summary image. This also provides the advantage of improving the accuracy of the comparison between the summary images, because the comparison is performed using equivalent information.
[0031]. The similarity value may be used as the basis for a search of a database of further video files. This provides the advantages associated with an objective search for a video file that is similar to an input video file. By locating a second video file which has a high similarity value when the respective summary images for the first and second video files are compared, it is possible to locate a second video file having similar properties to a first video file without relying on subjective information provided through written reviews (for example).
[0032], The database may be an online database. This provides the advantage of allowing a broader range of input files to be used to generate summary images, and subsequently used in comparisons.
[0033]. According to an aspect of an embodiment of the present invention, there is provided a computer program which, when executed by a computing apparatus, causes the computing apparatus to execute a method as described above. This provides the advantage of an easy and efficient way to implement a method of the present invention.
Description of Figures
[0034]. The invention will now be further described, by way of example only, with reference to the following figures, in which:
[0035]. Figure 1 is a schematic representation of an image summariser, configured to generate a summary image, according to an aspect of an embodiment of the invention.
[0036]. Figure 2A illustrates an example of the sampling of a video file with a predetermined frequency according to an aspect of an embodiment of the invention.
[0037]. Figure 2B illustrates the exclusion of a portion of a video file prior to image extraction according to an aspect of an embodiment of the invention.
[0038]. Figure 3 illustrates the formation of a rectangle of pixels according to an aspect of an embodiment of the invention.
[0039]. Figure 4 illustrates how a predetermined extraction frequency may be varied according to an aspect of an embodiment of the invention.
[0040]. Figure 5 provides an example of a summary image generated by an aspect of an embodiment of the invention.
[0041]. Figure 6 is a schematic representation of a comparer configured to calculate a similarity value representing the similarity between a first summary image and a second summary image according to an aspect of an embodiment of the invention.
[0042]. Figure 7 is a diagrammatic overview of the similarity value generation process according to an aspect of an embodiment of the invention.
[0043]. Figure 8 is a schematic representation of a computing device that may be used to implement aspects of embodiments of the present invention.
Detailed Description
[0044]. The schematic representation in Figure 1 illustrates the image summariser 1 configured to obtain a summary image from a series of images, in accordance with an aspect of an embodiment of the invention. In order to generate a summary image, the image summariser 1 must first be provided with a series of images to be used as the basis for the summary image. Typically the series of images are provided electronically in a suitable image file format. Examples of suitable image file formats include JPEG, TIFF, PNG, GIF and BMP files, but other image file formats may also be used.
[0045]. As discussed in greater detail below, the image summariser 1 may also accept video files as input files, wherein the images forming a series of images are single frames extracted from the video files. The frames may be extracted from the input video file using any suitable technique, as will be familiar to those skilled in the art. As is the case with input image files, any suitable video file format may be used. Examples of suitable video file formats include Flash Video files, Windows Media Video files, MPEG-4 files, and so on.
[0046]. If connected to a suitable input device, the image summariser 1 may also accept input images directly from a recording device, such as a video or still image camera, or images scanned in using a scanner.
[0047]. A summary image may be formed from a single input file, or from a plurality of input files. The determination as to whether to use a single input file or several input files may be made by the user or may be made automatically by the image summariser 1. Typical considerations when making this determination include the size of the input file, the resolution of the images that may be extracted from the input file, the desired size and level of detail in the generated summary image, and so on.
In the event that the input file is a video file, the duration of this video file may also be a factor in the determination. Normally a plurality of image files or a single video file would be used to form a summary image, but the apparatus is not limited to this input.
[0048]. The series of images is extracted from the one or more input files using an extractor 3. The extractor 3 may be set to extract every image in the input file or input files for use as an image in the series of images, or alternatively may be set to extract a selection of images from the input file or files for use in the series of images.
[0049]. An example of a situation where the extractor 3 may extract a selection of the images in the input file or input files for use in the series of images is where there are a number of images in the input file or files that are similar or identical to one another, such as a series of photographs of the same object taken in close succession or duplicates of a given image file. In this instance, the system may be configured to automatically exclude images that are detected as being similar or identical to an image that has already been extracted for use in the series of images. Alternatively, the image summariser 1 may be configured to identify the similar or identical images to a user and request that the user identify one or more of the similar or identical images to be used in the series of images.
[0050]. In the event that the input file or files are video files, the extractor 3 may be configured to sample the files at a predetermined frequency. An example of the sampling of the files at a predetermined frequency is described below in Example 1, with reference to figures 2A and 2B.
[0051]. Example 1
[0052]. In Example 1, a video file having an intended frame rate of 20 frames per second (that is, a display frequency of 20 Hz) is used as an input file. The extractor 3 in this example is configured to recognise the frame rate of a video file from the file metadata (the frame rate may also be manually input by a user). The extractor 3 in this example is configured to extract one image in five, such that the series of images contains one image for every quarter-second of video. In figure 2A, the extracted frames are marked with the symbol "X". The extraction frequency of Example 1 is therefore 4 Hz.
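The arithmetic of Example 1 can be sketched as follows. This is an illustrative sketch, not an implementation prescribed by the patent; the function names are invented for the example. A 20 Hz frame rate sampled at a 4 Hz extraction frequency gives a stride of 5, i.e. one frame in five:

```python
def extraction_stride(frame_rate_hz, extraction_hz):
    # Extract every Nth frame so that frames come out at roughly extraction_hz.
    return max(1, round(frame_rate_hz / extraction_hz))

def sample_frames(frames, stride):
    # Keep every stride-th frame, starting with the first (the "X" frames of figure 2A).
    return frames[::stride]
```

The alternative of paragraph [0018], extracting a predetermined total number of frames, corresponds to choosing the stride as the total frame count divided by the desired number of samples.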
[0053]. The extractor 3 may also be configured to exclude a portion of the input file or input files before extracting images for use in the series of images. This is illustrated in figure 2B, wherein the frames that are excluded prior to the extraction of the frames are marked with the symbol "E". This may be done in response to a specific instruction from a user (for example, identifying a series of input image files to be omitted from a summary image), or automatically by the extractor 3. This feature is particularly useful if the input file or files is/are video file(s) that have been professionally produced.
[0054]. For professionally produced video files, it is common for the main content of the file (a film, a television program, and the like) to be preceded and/or succeeded by a credit sequence. Typically, the credit sequences will consist of a list of names of those who worked on the production of the video file. In terms of the images to be displayed, these credit sequences may not be representative of the main content of the video file. For example, the credit sequences may consist of simple text against a plain background, while the main content involves a series of rapidly changing images. Accordingly, inclusion of the credit sequences in the series of images may introduce inaccuracies into an analysis of a summary image (discussed in greater detail below).
[0055]. In order to avoid the introduction of inaccuracies due to the inclusion of the credit sequences, the extractor 3 may be configured to identify the credit sequences as portions to be excluded before extracting the images for use in the series of images. If the credit sequences are not manually identified by a user, the extractor 3 may be set to either monitor the change in the input video file to identify where credit sequences start and end (for example, when changes in the frames of the video file indicate that the main content is being displayed because the variation between consecutive images has significantly increased), or to simply exclude a portion of the video file equating to a set percentage of the file or a set amount of time from the start and/or end of the video file.
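The simpler of the two exclusion strategies, trimming a set percentage of frames from the start and end of the file, might be sketched as follows (the function name and the default fractions are hypothetical, not taken from the patent):

```python
def exclude_credits(frames, head_fraction=0.05, tail_fraction=0.05):
    """Drop a set percentage of frames from the start and end of the
    input file before extraction, as one simple way of excluding
    opening and closing credit sequences."""
    n = len(frames)
    start = int(n * head_fraction)
    stop = n - int(n * tail_fraction)
    return frames[start:stop]

# Trim the first 5% and last 10% of a 100-frame input
trimmed = exclude_credits(list(range(100)), 0.05, 0.10)
```

The content-monitoring alternative (detecting where inter-frame variation increases) would replace the fixed fractions with a measured change point.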
[0056]. Once the series of images has been extracted by the extractor 3, the series of images is passed from the extractor 3 to an image processor 5. The image processor 5 is configured to process the images from the series of images, and thus to form rectangles of pixels from the images. Each of the rectangles of pixels is formed (via the processing) from one of the images from the series of images. Of course, it is not necessary for all of the images in the series of images to be extracted by the extractor 3 before any images are passed to the image processor 5. Instead, images from the series of images may be passed to the image processor 5 as soon as they have been extracted from the input file or input files by the extractor 3, such that the image processing may be performed concurrently with the image extraction.
[0057]. Each rectangle of pixels formed by the image processor 5 has dimensions of AxB, wherein A and B are positive integers that represent numbers of pixels. At an arbitrary orientation, the outputs from the image processor 5 are rectangular, with the horizontal side of the rectangle being A pixels long, and the vertical side of the rectangle being B pixels long. The use of the terms “vertical” and “horizontal” should not be understood as limiting the orientation of the rectangle of pixels; instead these terms should be understood to mean that the rectangles of pixels have two pairs of sides, wherein the two sides in each pair are parallel to one another, and perpendicular to the sides in the other of the two pairs.
[0058]. The term “rectangle” should be interpreted broadly, and should not be interpreted to exclude the possibility where A=B (and therefore the rectangle of pixels is a square). For some uses of the rectangles of pixels, it is useful if a square is generated. In particular, it is often useful if the rectangle of pixels consists of only a single pixel, that is, if A=B=1, particularly if the initial series of images contains a large number of images that each occupy a large amount of memory. In such a case, it may be necessary to compress each of the initial series of images to a single pixel. [0059]. A diagram of a rectangle of pixels in accordance with an aspect of an embodiment of the invention is shown in figure 3. Figure 3 shows the frames extracted from a video file (which are identified using the “X” symbol) that are processed by the image processor 5 to produce the rectangles of pixels (identified using the symbol “R”). In figure 3, the values of A and B are both 2, so rectangle of pixels R has sides that are two pixels long. Both A and B are labelled in figure 3.
[0060]. The processing of the images from the series of images to form the rectangles of pixels often involves the compression of the images. Although compression is not always used in the image processing, it is frequently necessary in order that the summary image ultimately generated from the series of images is of a manageable size. This reason for the use of compression is illustrated by example 2 below.
[0061]. Example 2 [0062]. If the initial series of images contains only 4 images, then it would not be unreasonable to form a summary image without compressing any of the series of images, because the resulting summary image would contain the data of only 4 images from the series of images and this is likely to be a manageable amount of data. By contrast, if the initial series of images is taken from a video file wherein each frame of the video file is approximately 100KB in size, and the video file is 90 minutes long then, even if a sampling rate of a frame per second is used, processing the rectangles of pixels without compression would result in a summary image containing the uncompressed data of 5400 images; approximately 540MB. The resulting summary image could therefore be too large for easy use.
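The arithmetic behind example 2 can be checked directly: 90 minutes sampled at one frame per second gives 5400 frames, and at roughly 100KB per uncompressed frame the summary image would hold about 540MB of data.

```python
minutes = 90
frames_per_second_sampled = 1   # one frame extracted per second of video
frame_size_kb = 100             # approximate uncompressed frame size

n_frames = minutes * 60 * frames_per_second_sampled  # 5400 images
total_kb = n_frames * frame_size_kb                  # 540000 KB, i.e. ~540 MB
```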
[0063]. The image processor 5 may also process the images from the series of images to form rectangles of pixels that approximate the images in other ways, in addition to or alternatively to compressing the images. Examples of alternative types of image processing that may be used include producing a rectangle of pixels that is of a uniform colour, wherein the uniform colour is the average colour of the image that was processed to form the rectangle of pixels. The average colour of the image may be derived using a numeric representation of the image colour, for example, using an RGB colour model which indicates how much red, green and blue is included in a colour using a number between zero and a defined maximum value. The numeric representation may be used to obtain numeric values for the colour of each of the pixels forming the image from the series, then these numeric values may be averaged to generate an average colour of the image. The average colour used may be any of the mean, median or modal colour of the pixels forming the image. In the present text, each instance of the term “average” should be interpreted as meaning that the mean, modal or median average may be used, unless otherwise stated.
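The mean-colour variant of this processing might be sketched as follows, assuming each pixel is an (R, G, B) tuple with channel values between 0 and 255 (the function name is illustrative, not from the patent):

```python
def average_colour(pixels):
    """Mean RGB colour of an image, computed channel by channel from
    the numeric colour values of the pixels forming the image."""
    n = len(pixels)
    r = sum(p[0] for p in pixels) / n
    g = sum(p[1] for p in pixels) / n
    b = sum(p[2] for p in pixels) / n
    return (r, g, b)

# A four-pixel image: red, blue, magenta and black
image = [(255, 0, 0), (0, 0, 255), (255, 0, 255), (0, 0, 0)]
colour = average_colour(image)
```

A median or modal average, as the text permits, would replace the per-channel mean accordingly.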
[0064]. Similarly, the rectangle of pixels may be generated to have a brightness that is equal to an average brightness of the pixels forming the image from the series of images. Again, this average brightness may be obtained by using a numeric representation of the image brightness. The RGB colour model may also be used to indicate the image brightness, wherein the brightness for any given pixel is the mean of the red, green and blue colour values for the pixel. The overall average brightness for the image may then be taken as any of the mean, median or modal brightness values of the pixels forming the image.
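The brightness calculation described here (per-pixel brightness as the mean of the red, green and blue values, then the mean over all pixels) might be sketched as:

```python
def pixel_brightness(p):
    """Brightness of one pixel: the mean of its red, green and blue values."""
    return sum(p) / 3

def average_brightness(pixels):
    """Overall mean brightness of the image, averaged over all pixels."""
    return sum(pixel_brightness(p) for p in pixels) / len(pixels)

# White, black and a mid-tone pixel
image = [(255, 255, 255), (0, 0, 0), (90, 120, 150)]
b = average_brightness(image)
```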
[0065]. The generated rectangles of pixels are arranged to form a summary image by the image preparator 7. Typically the rectangles of pixels are arranged in the summary image in the same order that the images from which the rectangles of pixels were obtained were ordered in the series of images. This allows the summary image to present a more accurate representation of the progression of the images in the series, particularly in the event that the series of images is extracted chronologically from one or more video files.
[0066]. It is also possible to arrange the rectangles of pixels forming the summary image in another way, for example, in order of the average colour value (which may be a uniform colour value as discussed above) of the rectangle of pixels, in order of the average brightness of the rectangle of pixels, and so on. Ordering the images by average colour or brightness may also provide useful information on the nature of the series of images when interpreting the summary images. For example, a series of images containing a large number of green-dominated images could indicate that the series of images depict natural scenes, such as woodland or fields. Similarly, in the event that the series of images were extracted from a video file containing a film, a predominance of darker images (that is, images having a low brightness rating) could indicate that the film is from the horror genre.
[0067]. The rectangles of pixels generated by the image processor 5 may be sent to the image preparator 7 and used to populate the summary image as soon as the rectangles of pixels are generated, or the entirety of the rectangles of pixels for a particular series of images may be generated before any are used to populate the summary image. If the rectangles of pixels are to be arranged in the summary image in the order in which the images appeared in the series of images, then the rectangles of pixels may be added to the summary image as they are generated with little calculation required. However, even in the event that the rectangles of pixels are arranged in the summary image in a different way, such as by average brightness, the rectangles of pixels may be added to the summary image as they are generated with movements of other rectangles of pixels made as necessary.
[0068]. Each summary image generated by the image preparator 7 is a rectangular array, the rectangular array itself being an arrangement of the rectangles of pixels generated by the image processor 5. The rectangular array of the summary image may be defined in terms of the number of rectangles of pixels on each side of the array. That is, the summary image may be defined as a rectangular array of the rectangles of pixels having dimensions of CxD, wherein C and D are numbers of rectangles of pixels, and both C and D are positive integers.
[0069]. It is possible to generate a summary image having only one rectangle of pixels on one side (that is, wherein either of C and D is equal to 1), thereby generating a summary image that is essentially a linear one-dimensional array of rectangles of pixels. However, generally the values of both C and D are larger than 1, such that the rectangular array is a two-dimensional array of the rectangles of pixels. It is also possible that the values of A and B are taken into account when determining the values of C and D, such that the total size of the summary image (in terms of the number of pixels in the image) may be set.
[0070]. In the event that the values of C and D are set in such a way that all of the rectangles of pixels cannot fit into a single summary image, several summary images are used to represent a single series of images. However, typically the values of the extraction frequency, A, B, C and D are set such that all of the rectangles of pixels may be included in a single summary image.
[0071]. Figure 3 shows a diagram of a summary image in accordance with an aspect of an embodiment of the invention. In the figure, the rectangle of pixels, identified by the symbol “R” and discussed above, is incorporated by the image preparator 7 into the summary image. The example of a summary image shown in figure 3 has a C value of 5, and a D value of 4. Accordingly the summary image has dimensions of 5x4 rectangles of pixels. As the rectangles of pixels in question are each 2x2 (that is, the values of A and B are both 2), the dimensions of the summary image in pixels are 10x8, that is (CxA)x(DxB).
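The pixel dimensions of the summary image follow directly from the (CxA)x(DxB) relation; a quick check, using the values given for figure 3:

```python
A, B = 2, 2   # each rectangle of pixels is 2x2 pixels
C, D = 5, 4   # the summary image is 5x4 rectangles of pixels

width_px = C * A       # (CxA) = 10 pixels wide
height_px = D * B      # (DxB) = 8 pixels high
total_px = width_px * height_px
```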
[0072]. The summary image provides an overview of all of the images in the initial series of images, from which general variations in image colour, brightness, etc. throughout the series of images may be deduced.
[0073]. The description of the extraction of the images from the input file or files above discusses a situation wherein one or more video files is used as an input file, and wherein the extractor 3 is configured to sample the files at a predetermined frequency. The extractor 3 may also be configured to extract a predetermined total number of images. It may be desirable for some applications of the summary image for the extractor 3 to provide a series of images containing a predetermined number of images so that the output summary image arranged by the image preparator 7 may be a rectangular array having a known total number of pixels, in a known arrangement. This may be particularly useful when the summary image is to be analysed using techniques of some aspects of embodiments of the invention, as discussed below.
[0074]. Example 3 discusses an aspect of an embodiment of the invention, and includes a discussion of the process for obtaining the summary image from the extraction of the series of images to the output of the summary image.
[0075]. Example 3 [0076]. For some analysis mechanisms that may be used in aspects of embodiments of the invention, it is useful if the summary image is of a known size and shape. As an example of this, it is useful for some analysis mechanisms if the summary image has 256 pixels per side, for a total of 65536 pixels in the image.
[0077]. In example 3, an input video file configured to have a 128 minute duration when played at an intended frame rate of 30 frames per second is provided, therefore the total number of frames in the video file is 128x60x30=230400 frames. It is necessary to process and compress the images from the series of images to form the rectangles of pixels, in order that the generated summary image is of a useable size. Therefore, the values of A and B are set to 1, and the image processor 5 is configured such that each rectangle of pixels is generated with a uniform colour that is the mean average colour of the pixels forming the image from the series of images from which the rectangle of pixels is generated.
[0078]. The entire 128 minute duration of the video file is to be used in the extraction (without excluding any portions of the video from the extraction process), so a total of 230400 pixels would be generated if every frame from the video file was included in the series of images. The desired total number of pixels is 65536. Accordingly, the extractor 3 is set to extract frames with a rate of 3.5, that is, extracting on average one image for every 3.5 frames in the video file. For a 30 frame per second video, this would equate to an extraction frequency of 8.6Hz.
[0079]. A fractional extraction frequency is provided by varying the frequency with which a frame is extracted. In the present example, this would be achieved by extracting frames 1, 4, 8, 11, 15, etc. That is, the extractor 3 is set to extract a frame, miss two frames, extract a frame, miss three frames, and repeat this pattern to extract the required number of frames so that the average extraction frequency for the process is one image for every 3.5 frames of the original video file.
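One way to realise a fractional extraction rate such as 3.5 is to accumulate a running position and truncate it to a frame number; this reproduces the 1, 4, 8, 11, 15, ... pattern described above (a hypothetical sketch, not the patent's implementation):

```python
def fractional_indices(rate, count, start=1):
    """Frame numbers for a fractional extraction rate. With rate=3.5 the
    extractor alternately misses two and three frames, averaging one
    extraction per 3.5 frames."""
    indices = []
    position = float(start)
    for _ in range(count):
        indices.append(int(position))  # truncate the running position
        position += rate
    return indices

frames = fractional_indices(3.5, 5)  # first five extracted frame numbers
```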
[0080]. If desired, the extraction frequency could be reduced further and the values of A and B increased accordingly, such that the same total number of pixels was generated. In the present example, the values of both A and B could be set to 2, such that each rectangle of pixels contained 4 pixels, and the value of the extraction frequency reduced such that the same total number of pixels was obtained. That is, the extraction frequency could be reduced from 8.6Hz to 2.1Hz, such that the extractor 3 extracts one frame for every 14.1 frames in the video file.
[0081]. Once the series of images has been extracted and processed to form the rectangles of pixels, the summary image is formed by arranging the rectangles of pixels. Taking the case where A=B=1 and the extraction frequency is 3.5 (as discussed above), the total number of rectangles of pixels is the same as the total number of pixels; 65536.
[0082]. The image preparator 7 in the present example is configured to arrange the rectangles of pixels chronologically, that is, in the order that the extracted frames appeared in the video file. The rectangles of pixels are first used to fill the top row of the rectangular array from left to right. Once the top row is filled, the second row is then filled, again from left to right. In this way, the summary image is formed.
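The row-by-row, left-to-right arrangement can be sketched as follows, treating each rectangle of pixels as an opaque item in a chronologically ordered list (names are illustrative only):

```python
def arrange_summary(rects, C):
    """Arrange rectangles of pixels into rows of C, filling the top row
    left to right first, then the second row, and so on."""
    return [rects[i:i + C] for i in range(0, len(rects), C)]

# 20 rectangles arranged into a C=5 by D=4 summary image
rows = arrange_summary(list(range(20)), 5)
```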
[0083]. The extraction frequency need not be set such that precisely the correct number of images are extracted; it is acceptable to vary the frequency when extracting the last few images for the series of images such that the desired total number of images is reached. For example, the precise value of the extraction rate required to obtain the desired number of frames in this instance is 230400/65536, that is, 3.515625. However, it is not necessary to set the extraction rate to this degree of precision. Instead, it is acceptable if the correct rate is used for the majority of the extraction process, then varied as necessary to complete the extraction with the desired number of frames. An example of this is described below, with reference to figure 4.
[0084]. Example 4 [0085]. In example 4, an extraction rate of 1 frame in 5 is used, and the rectangles of pixels are arranged chronologically and fill the summary image in rows from the top left corner of the summary image. The values of C and D for the summary image mean that a total of 20 rectangles of pixels are required to complete the summary image. As shown in figure 4, 19 of the rectangles of pixels R have been formed from images X extracted in accordance with the extraction rate as described above. In order to complete the summary image, a further image Y is extracted outside the extraction rate. The image Y is the final image in the input file, and is used to form the rectangle of pixels S. The rectangle of pixels S is then included in the final position chronologically in the summary image.
[0086]. The generated summary image may be displayed to the user of the image summariser 1 on a display screen 995. An example of a summary image is shown in figure 5. Alternatively, and particularly if the summary image is intended for further analysis, the summary image may simply be retained pending this analysis. Typically the summary image is retained in an electronic format in a memory 994; however, the image summariser 1 may be linked to a printer to provide a hard copy of the summary image if desired.
[0087]. A further aspect of an embodiment of the invention provides a comparer 11 and method for calculating a similarity value representing the similarity between a first summary image and a second summary image. An example of the comparer 11 is illustrated in figure 6.
[0088]. The comparer 11 is configured to calculate a similarity value representing the similarity between first and second summary images and incorporates an image summariser 1 configured to obtain a summary image, as discussed above. The image summariser 1 that is incorporated in the comparer 11 may be configured using any combination of the features discussed above, such as the ability to extract frames from video files for use as the images in the series of images, the ability to use predetermined extraction frequencies or to extract predetermined numbers of images, and so on. If the image summariser 1 includes a display 995 for displaying a summary image, this display 995 may also be used for other functions of the comparer 11 such as displaying comparisons between summary images, as discussed below.
[0089]. The comparer 11 is configured to obtain at least one summary image (first summary image) from a series of images using the image summariser 1. A further summary image (second summary image) may also be obtained from another series of images using the image summariser 1. Alternatively, in some aspects of embodiments of the invention, the second summary image is taken from a database of stored summary images, as discussed in greater detail below.
[0090]. The terms “first” and “second” are used to distinguish between summary images, and should not be understood to imply that one of the summary images is obtained before the other, or that one of the summary images takes precedence over the other. In the following text, where it is not important to distinguish between the first and second summary images, the general term “a summary image” is used.
[0091]. When a summary image has been generated using the image summariser 1, the comparer 11 then analyses the summary image using an analyser 13. The analyser 13 takes the input summary image and, through analysis of the summary image, produces information regarding the characteristics of the summary image that may subsequently be used by a classifier 15 to classify the summary image. The classifier 15 is discussed in greater detail below.
[0092]. The analyser 13 may use several different systems to analyse the summary image. A common image analysis technique that may be used by the analyser 13 relies upon colour histograms of images. A colour histogram of an image indicates the distribution of colours in the image. In the context of the summary images, the colour histograms indicate how many pixels forming the summary images have RGB values in each of a selection of ranges that span the entirety of the colour space (that is, the set of possible colours that the summary image is capable of containing), thereby providing a numerical representation of the colour distribution in the summary image that may subsequently be used by the classifier 15.
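A colour histogram of the kind described might be sketched as follows, assuming each pixel is an (R, G, B) tuple in the 0 to 255 range and using a hypothetical bin count of 4 per channel (so each range spans 64 values):

```python
def colour_histogram(pixels, bins=4):
    """Count how many pixels fall into each of bins^3 colour ranges
    spanning the whole RGB colour space."""
    width = 256 // bins
    hist = {}
    for r, g, b in pixels:
        key = (r // width, g // width, b // width)  # which range each channel falls in
        hist[key] = hist.get(key, 0) + 1
    return hist

# Two dark pixels and one near-white pixel
pixels = [(10, 10, 10), (20, 5, 0), (250, 250, 250)]
hist = colour_histogram(pixels)
```

The resulting counts form the numerical representation that the classifier 15 could consume.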
[0093]. An additional, related analysis technique that may be used in addition to or as an alternative to the use of colour histograms relies upon the generation of an image histogram (also referred to as a brightness or tonal histogram). Similarly to a colour histogram, an image histogram indicates how many of the pixels forming a summary image fall into specified ranges. In an image histogram, the ranges span the entire brightness range (from black to white). As with a colour histogram, a numerical representation of the distribution of the pixels in the summary image is generated; the image histogram provides a representation of the brightness distribution.
[0094]. A preferred technique for use in the analysis of the summary images, particularly in cases where the initial series of images from which the summary image was generated were frames from one or more video files, is to use an artificial neural network. An artificial neural network is a computational model that may be used to perform analysis and classification. The neural network may therefore act in both the analyser 13 and the classifier 15. The neural network may be implemented using a series of hardware units, but the present invention typically utilises a software implemented neural network.
[0095]. The neural network analyses the summary image as a whole, rather than quantifying each pixel individually before performing an analysis on the quantified results (as is the case with the colour histogram and image histogram techniques).
As such, the neural network analysis may complement the histogram-based analysis, and the two different types of analysis may usefully be applied in conjunction.
[0096]. The results from the analyser 13 are then passed on to the classifier 15, which uses the analysis to classify the summary image, and consequentially also classifies the series of images that were used to generate the summary image and the original input file or files that provided the series of images. Although the classifier 15 primarily uses the results from the analyser 13, other information, such as the size of the input file or input files, the duration of any video files, and any titles or other information (such as age restriction information) included in the metadata or manually inputted by the user, may also be taken into consideration when classifying the summary image.
[0097]. The neural network analysis is based on the identification of features in the summary image. The neural network is intended to capture key characteristics of the image, without giving undue weight to small differences. This is explained further in example 5 below, which builds upon example 3.
[0098]. Example 5 [0099]. In example 5, a series of images is taken from a single video file of 128 minute duration. The rectangles of pixels are generated using A=B=1, with the extraction frequency set to 3.5. The rectangles of pixels are set to display a uniform colour that is the mean colour of the image from the series of images, and the rectangles of pixels are arranged chronologically. The values of C and D used in the summary image are both 256. All of these values are as detailed in example 3.
[00100]. The generated summary image is passed to the analyser 13, which uses a neural network to analyse the summary image. The neural network analyses the summary image and, based on the patterns created by the rectangles of pixels, recognises that the original series of images (frames from the video file) contained frequent and significant colour changes between neighbouring rectangles of pixels. The proximity of the rectangles of pixels to one another in the summary image indicates, in this example, that the extracted frames were close to one another chronologically in the original video file. The neural network identifies this feature as being indicative of a video file having lots of scene changes; the colour changes being caused by switches between scenes. This is as would be expected for an action film due to the film containing rapid cuts, changes of camera focus and so on. By contrast, in a different genre of film, such as a biography containing lengthier scenes (that is, fewer changes of scene per unit time), the number of significant colour changes between neighbouring rectangles of pixels would be reduced. Other indicators of video file content type, such as a large number of darker images indicating a horror film or brighter images indicating a film directed at children, could also be detected by the neural network.
[00101]. The classifier 15 uses the analysis from the analyser 13 to classify the summary image. Additional classification information may also be provided depending upon the information provided by the summary image. For example, and as discussed above, a large amount of greener shades in the summary image may indicate that the input files show natural images, such as woodland shots. The summary image classification is then passed on to the juxtaposer 17. The classification information classifies the summary image in a variety of ways, using discrete properties (that is, if the summary image does or does not have a given property) and also ranking the degree to which certain attributes are found in the summary image using a continuous scale. If it is not possible to derive a certain classification value for a given summary image, for example, because the information required to make this determination is not present in the summary image, then this classification value may be omitted from the classification information.
[00102]. The results from the classification may be outputted by the classifier 15 in the form of a vector having a plurality of real value elements. Each of the real value elements relates to a classification value of the summary image. The number of classification values that the classifier 15 may output varies with the precision of the system; in the present example 1000 classification values are outputted by the classifier 15 for each summary image (assuming that all of the necessary information is available, as discussed above).
[00103]. The juxtaposer 17 takes the classification information, in the form of the vector of real value elements from the classifier 15, and compares this information to classification information for a further (second) summary image. The second summary image may be generated by the comparer 11 using the image summariser 1 as discussed above, either before or after the first summary image is generated. Alternatively, the second summary image may be taken from a database of summary images. This database of summary images may be stored locally in a memory unit 18 connected to or contained within the comparer 11, or may be stored remotely and accessed via a network such as the internet. The memory unit 18 may be incorporated within the memory 994, or vice versa. If, for a given summary image, a particular classification value is absent from the classification information, then this comparison is omitted.
[00104], Although the juxtaposer 17 may be used to perform only a single comparison between two summary images, typically the juxtaposer 17 will be used to compare a plurality of summary image classifications.
[00105]. When the classifier 15 outputs a vector, the juxtaposer 17 performs the similarity calculation by calculating a cosine similarity between the two vectors. The cosine similarity is a measure of similarity between two vectors, and is calculated using equation 1, wherein A and B are the vectors obtained from the classifier 15, each having n components. The symbols Aᵢ and Bᵢ indicate the ith components of A and B respectively.
[00106]. Equation 1

similarity(A, B) = (Σᵢ AᵢBᵢ) / (√(Σᵢ Aᵢ²) × √(Σᵢ Bᵢ²)), where each sum runs over i = 1 to n
[00107]. The output from the juxtaposer 17 is a raw similarity value, such as the cosine similarity value, which is a value between 0 and 1; this raw similarity value is passed to the calculator 19.
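The cosine similarity calculation of equation 1 might be sketched as follows (a minimal illustration; the classifier's real vectors would have 1000 components, and with non-negative classification values the result lies between 0 and 1):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two classification vectors:
    (sum of A_i * B_i) divided by the product of the vector magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

s_same = cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # identical vectors
s_orth = cosine_similarity([1.0, 0.0], [0.0, 1.0])            # no shared components
```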
[00108]. The calculator 19 produces a processed similarity value using the comparison results. The processed similarity value is typically, though not necessarily, a single numerical value indicating a degree of similarity between two summary images. A numerical scale using integers from 0 to 100 is considered to be sufficiently precise to measure similarity for most applications, and this scale is also convenient for users. More or less precise scales are also used if more appropriate. For example, a more precise scale may be used to sort a large number of similar summary images, and a less precise scale may be used if a user desires to have the summary images broadly categorised and is less concerned with the granularity of the similarity measurement. A non-numerical scale (for example, using letters from A to F or a colour spectrum) may also be used.
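Mapping the raw similarity onto the 0 to 100 integer scale might be as simple as the following (a hypothetical sketch; the patent does not specify the calculator 19's mapping):

```python
def processed_similarity(raw, scale=100):
    """Map a raw similarity value in [0, 1] onto an integer scale from
    0 to scale (100 by default), higher meaning more similar."""
    return round(raw * scale)

score = processed_similarity(0.874)
```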
[00109]. If a numerical scale from 0 to 100 is used, higher numbers indicate a greater degree of similarity between the images. A processed similarity value of 100 would result if the same summary image were input into the juxtaposer 17 twice, that is, if a summary image were to be compared with itself.
[00110]. Once the calculator 19 has obtained the processed similarity value, this processed similarity value is then output by an outputter 21. The outputter 21 may be connected to a display 995 of the image summariser 1, such that the processed similarity value is shown to a user, optionally in conjunction with a display of one or both summary images and/or the classification information. Alternatively or additionally, the outputter 21 may be connected to a local or remote database (which may be stored in the memory 994), such that the processed similarity value is returned to the database.
[00111]. Figure 7 provides an overview of the process for obtaining a similarity value for first and second input video files (generating first and second summary images respectively) according to an aspect of an embodiment of the present invention.
[00112]. The second summary image, if not obtained from a series of images in the same way as the first summary image, is obtained from a database of summary images. The database may be stored remotely, for example on a remote server, and accessed by the comparer 11 using a network connection via a network I/F 997.
[00113]. Connecting the comparer 11 to the internet may be particularly useful if the comparer 11 is being used to search an online database. An example of this would be a user who has identified an input file (such as a video file of a film) and would like to identify further files that are similar in content to the input file. The comparer 11 could therefore be configured in conjunction with an online database to search the online database until an input file from the database that generates a similarity value above a user set or automatic threshold is identified, at which point the search may be stopped. Alternatively, the entire database could be searched and all input files that generate a similarity value above a threshold could then be presented to the user, so that the user could select a file.
[00114]. In the event that the comparer 11 is used to search a database, it is useful if the generated summary images are in the same format as the summary images and classification information already present in the database. The comparer 11 may therefore be set to generate and use a summary image that is the same size (that is, a summary image containing the same number of pixels and having the same dimensions) as the existing summary images in the database. Some or all of the values of A, B, C, D and the total number of extracted frames for the first summary image may therefore be set to be the same as the equivalent values for the database summary images.
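To illustrate how A, B, C and D fix the summary image format, the Python sketch below builds a C×D grid of uniform A×B rectangles from per-frame average colours (the function name and the plain list-of-lists pixel layout are hypothetical conveniences, not taken from the description). Holding A, B, C and D equal to the database's values guarantees the generated image has the same pixel dimensions as the stored summary images.

```python
def build_summary_image(frame_colours, A, B, C, D):
    """Arrange C*D uniform A-by-B rectangles into one summary image.

    frame_colours: one average colour per extracted frame, in
                   chronological order; exactly C*D frames expected.
    A, B: rectangle width and height in pixels.
    C, D: number of rectangles per row and number of rows.
    Returns a row-major pixel grid of D*B rows by C*A columns.
    """
    assert len(frame_colours) == C * D, "frame count must match grid size"
    width, height = C * A, D * B
    image = [[None] * width for _ in range(height)]
    for idx, colour in enumerate(frame_colours):
        # Rows are filled left to right, top to bottom (chronological order).
        col, row = idx % C, idx // C
        for y in range(row * B, (row + 1) * B):
            for x in range(col * A, (col + 1) * A):
                image[y][x] = colour
    return image
```

Matching the predetermined total number of extracted frames to C*D ensures the grid is exactly filled, which is why that value is also kept equal to the database's.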
[00115]. Figure 8 is a block diagram of a computing device, such as a personal computer, which embodies the present invention, and which may be used to implement an embodiment of the method for producing a summary image from a series of images, and may also be used to implement an embodiment of the method for calculating a similarity value representing the similarity between a first summary image and a second summary image. The computing device comprises a processor 993 and memory 994. Optionally, the computing device also includes a network interface 997 for communication with other computing devices, for example with other computing devices of invention embodiments, or for communicating with remote databases.
[00116]. For example, an aspect of an embodiment of the invention may be composed of a network of such computing devices, such that components of the apparatus configured to obtain a summary image from a series of images or components of the apparatus configured to calculate a similarity value representing the similarity between a first summary image and a second summary image are split across a plurality of computing devices. In particular, where the analyser 13 and classifier 15 utilise a neural network, this neural network may be implemented by a plurality of interconnected computing devices. Optionally, the computing device also includes one or more input mechanisms such as keyboard and mouse or touchscreen interface 996, and a display unit such as one or more monitors 995. The components are connectable to one another via a bus 992.
[00117]. The memory 994 may include a computer readable medium, which term may refer to a single medium or multiple media (e.g., a centralized or distributed database and/or associated caches and servers) configured to carry computer-executable instructions or have data structures stored thereon. The memory 994 may be incorporated within the memory unit 18 (as shown in Figure 6), or vice versa. Computer-executable instructions may include, for example, instructions and data accessible by and causing a general purpose computer, special purpose computer, or special purpose processing device (e.g., one or more processors) to perform one or more functions or operations. Thus, the term “computer-readable storage medium” may also include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methods of the present disclosure. The term “computer-readable storage medium” may accordingly be taken to include, but not be limited to, solid-state memories, optical media and magnetic media. By way of example, and not limitation, such computer-readable media may include non-transitory computer-readable storage media, including Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, and flash memory devices (e.g., solid state memory devices).
[00118]. The processor 993 is configured to control the computing device and execute processing operations, for example executing code stored in the memory to implement the various different functions of the extractor 3, image processor 5, image preparator 7, analyser 13, classifier 15, juxtaposer 17, calculator 19 and outputter 21 described here and in the claims. The memory 994 stores data being read and written by the processor 993. As referred to herein, a processor may include one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. The processor may include a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or a processor implementing other instruction sets or processors implementing a combination of instruction sets. The processor may also include one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. In one or more embodiments, a processor is configured to execute instructions for performing the operations and steps discussed herein.
[00119]. The display unit 995 may display a representation of data stored by the computing device and may also display a cursor and dialog boxes and screens enabling interaction between a user and the programs and data stored on the computing device. The display unit may also comprise a touchscreen interface. The input mechanisms 996 may enable a user to input data and instructions to the computing device.
[00120]. The network interface (network I/F) 997 may be connected to a network, such as the Internet, and is connectable to other such computing devices via the network. The network I/F 997 may control data input/output from/to other apparatus via the network.
[00121]. Other peripheral devices such as microphone, speakers, printer, power supply unit, fan, case, scanner, trackerball etc. may be included in the computing device.
[00122]. The extractor 3 of Figures 1 and 6 may be a processor 993 (or plurality thereof) executing processing instructions (a program) stored on a memory 994 and exchanging data via a network I/F 997 or bus 992. In particular, the processor 993 may execute processing instructions to receive, via the network I/F 997 or bus 992, input image files from a network, such as the internet, and extract the series of images from the one or more image files, as discussed above. Furthermore, the processor 993 may execute processing instructions to store the series of images on a connected storage unit and/or to transmit, via the network I/F 997 or bus 992, the series of images to an image processor 5 for processing if the image processor 5 is located remotely from the extractor 3.
[00123]. The image processor 5 of Figures 1, 3 and 6 may be a processor 993 (or plurality thereof) executing processing instructions (a program) stored on a memory 994 and exchanging data via a network I/F 997 or bus 992. In particular, the processor 993 may execute processing instructions to receive, via the network I/F 997 or bus 992, a series of images from the extractor 3 and process the images from the series of images to obtain the rectangles of pixels. Furthermore, the processor 993 may execute processing instructions to store the rectangles of pixels on a connected storage unit and/or to transmit, via the network I/F 997 or bus 992, the rectangles of pixels to the image preparator 7 for use in the population of a summary image.
[00124]. The image preparator 7 of Figures 1, 3 and 6 may be a processor 993 (or plurality thereof) executing processing instructions (a program) stored on a memory 994 and exchanging data via a network I/F 997 or bus 992. In particular, the processor 993 may execute processing instructions to receive, via the network I/F 997 or bus 992, the rectangles of pixels from the image processor 5 and arrange the rectangles of pixels to form the summary image. Furthermore, the processor 993 may execute processing instructions to store the summary image on a connected storage unit (such as the memory 994) and/or to transmit, via the network I/F 997 or bus 992, the summary image to a display 995.
[00125]. The analyser 13 of Figure 6 may be a processor 993 (or plurality thereof) executing processing instructions (a program) stored on a memory 994 and exchanging data via a network I/F 997 or bus 992. In particular, the processor 993 may execute processing instructions to receive, via the network I/F 997 or bus 992, one or more summary images from the image preparator 7 or the memory 994 and analyse the summary image or summary images. Furthermore, the processor 993 may execute processing instructions to store the analysis results on a connected storage unit and/or to transmit, via the network I/F 997 or bus 992, the results of the analysis to the classifier 15. The analyser 13 and the classifier 15 may both utilise a neural network.
[00126]. The classifier 15 of Figure 6 may be a processor 993 (or plurality thereof) executing processing instructions (a program) stored on a memory 994 and exchanging data via a network I/F 997 or bus 992. In particular, the processor 993 may execute processing instructions to receive, via the network I/F 997 or bus 992, the results of the analysis from the analyser 13 and classify the summary image or summary images on the basis of the analysis. Furthermore, the processor 993 may execute processing instructions to store the classification results on a connected storage unit and/or to transmit, via the network I/F 997 or bus 992, the classification results to the juxtaposer 17.
[00127]. The juxtaposer 17 of Figure 6 may be a processor 993 (or plurality thereof) executing processing instructions (a program) stored on a memory 994 and exchanging data via a network I/F 997 or bus 992. In particular, the processor 993 may execute processing instructions to receive, via the network I/F 997 or bus 992, the classification results from the classifier 15 or from the memory unit 18 and compare the classification results for two summary images, outputting a raw similarity value. Furthermore, the processor 993 may execute processing instructions to store the raw similarity value on a connected storage unit and/or to transmit, via the network I/F 997 or bus 992, the raw similarity value to the calculator 19.
[00128]. The calculator 19 of Figure 6 may be a processor 993 (or plurality thereof) executing processing instructions (a program) stored on a memory 994 and exchanging data via a network I/F 997 or bus 992. In particular, the processor 993 may execute processing instructions to receive, via the network I/F 997 or bus 992, the raw similarity value from the juxtaposer 17 and output the processed similarity value. Furthermore, the processor 993 may execute processing instructions to store the processed similarity value on a connected storage unit and/or to transmit, via the network I/F 997 or bus 992, the processed similarity value to the outputter 21. It is possible to implement the invention without the calculator 19, that is, to pass the raw similarity value from the juxtaposer 17 directly to the outputter 21. However, this raw similarity value may be more difficult for a user to understand.
[00129]. The outputter 21 of Figure 6 may be a processor 993 (or plurality thereof) executing processing instructions (a program) stored on a memory 994 and exchanging data via a network I/F 997 or bus 992. In particular, the processor 993 may execute processing instructions to receive, via the network I/F 997 or bus 992, the processed similarity value from the calculator 19 (or the raw similarity value from the juxtaposer 17) and to output the similarity value (raw or processed) to a connected storage unit, to transmit it, via the network I/F 997 or bus 992, to a database (local or remote), or to display it, for example using the display 995 of the image summariser 1.
[00130]. Methods embodying the present invention may be carried out on a computing device such as that illustrated in Figure 8. Such a computing device need not have every component illustrated in Figure 8, and may be composed of a subset of those components. A method embodying the present invention may be carried out by a single computing device in communication with one or more data storage servers via a network, as discussed above.
[00131]. As the present invention is configured to analyse and classify the summary image without human intervention, the results of the classification are less subjective than those produced using other methods.
[00132]. The summary images are generated based on the input image files and, through the use of an online database, may be used to classify the input image files and locate similar existing files. Accordingly, it is possible for a user to input files that the user has generated, and to have these files classified in an objective way. Also, the similarity between two files may be calculated without any reference to external sources (such as review information), which makes it easier for a user to search their own library of files in a localised manner, and also to classify their own files.
[00133]. For the avoidance of doubt, the scope of the invention is defined by the claims.

Claims (26)

1. A method for producing a summary image from a series of images, the method comprising:
extracting a series of images from one or more input files;
processing images from the series of images to obtain rectangles of pixels of dimensions AxB, each rectangle of pixels representing one image from the series of images, wherein: A and B are numbers of pixels, and A and B are positive integers; and arranging the rectangles of pixels to form a summary image representing the series of images.
2. The method of claim 1, wherein each rectangle of pixels is formed by compressing an image from the series of images.
3. The method of any of claims 1 and 2, wherein each rectangle of pixels displays a uniform colour that is the average colour of the one image from which the rectangle of pixels is formed, and/or wherein each rectangle of pixels displays a uniform brightness that is the average brightness of the one image from which the rectangle of pixels is formed.
4. The method of any of claims 1 to 3, wherein the summary image is a rectangular array of the rectangles of pixels, the dimensions of the rectangular array being CxD, wherein: C and D are numbers of rectangles of pixels, and C and D are positive integers larger than 1.
5. The method of any of claims 1 to 4, wherein each of the one or more input files is a video file, and wherein each image from the series of images extracted from the one or more input files is a single frame of the video file.
6. The method of claim 5, wherein single frames are extracted from the one or more video files at a predetermined frequency, or wherein the single frames are extracted from the one or more video files such that a predetermined total number of single frames are extracted.
7. The method of claim 6 wherein, when extracting the single frames from the one or more video files, a portion of the video file is excluded from the extraction process before the single frames are extracted.
8. The method of any of claims 5 to 7, wherein the rectangles of pixels are arranged in the summary image in accordance with the chronological order of the single frames in the one or more video files.
9. The method of claim 8, wherein the rectangles of pixels are arranged in horizontal rows in the summary image, wherein the rows are filled from left to right, and wherein the horizontal rows are arranged from top to bottom with respect to the orientation of the summary image.
10. The method of any of claims 1 to 9, further comprising displaying the generated summary image on a display screen.
11. A method for calculating a similarity value representing the similarity between a first summary image and a second summary image, the method comprising:
obtaining a first summary image using the method according to claim 1; analysing the first summary image;
classifying the first summary image on the basis of the analysis; performing a comparison between the classifications for the first summary image and equivalent classifications for the second summary image; deriving a similarity value on the basis of the comparison; and outputting the similarity value.
12. The method of claim 11, wherein the second summary image is selected from a plurality of stored summary images, the second summary image being selected on the basis of the previously obtained classifications for the plurality of stored summary images.
13. The method of claim 11, wherein the second summary image is obtained, analysed and classified in the same way as the first summary image, prior to performing the comparison between the classifications.
14. The method of any of claims 11 to 13 wherein, in the obtaining step, the rectangles of pixels are obtained in such a way that the values of A and B are the same for all of the rectangles of pixels in the first summary image and the rectangles of pixels in the second summary image.
15. The method of any of claims 11 to 14 wherein:
each summary image is a rectangular array of the rectangles of pixels, the dimensions of the rectangular array being CxD, wherein: C and D are numbers of rectangles of pixels, and C and D are positive integers larger than 1; and wherein the values of C and D are the same for the first summary image and the second summary image.
16. The method of any of claims 11 to 15, wherein a neural network is used to perform the analysis and classification.
17. The method of any of claims 11 to 16, wherein:
each of the one or more input files is a video file, each image from the series of images extracted from the one or more input files is a single frame of the video file; and the single frames are extracted from the one or more video files at a predetermined frequency, or wherein the single frames are extracted from the one or more video files such that a predetermined total number of single frames are extracted, the predetermined total number of single frames being the same for the first summary image and the second summary image.
18. The method of claim 17, further comprising using the similarity value as the basis for a search of a database of further video files.
19. The method of claim 18, wherein the database is an online database.
20. A computer program which, when executed by a computing apparatus, causes the computing apparatus to execute the method of any of claims 1 to 19.
21. An image summariser configured to obtain a summary image from a series of images, the image summariser comprising:
an extractor configured to extract a series of images from one or more input files;
an image processor configured to process the images from the series of images to obtain rectangles of pixels of dimensions AxB, each rectangle of pixels representing one image from the series of images, wherein: A and B are numbers of pixels, and A and B are positive integers; and an image preparator configured to arrange the rectangles of pixels to form a summary image representing the series of images.
22. A comparer configured to calculate a similarity value representing the similarity between a first summary image and a second summary image, the comparer comprising:
an image summariser according to claim 21 configured to obtain the first summary image from a series of images;
an analyser configured to analyse the first summary image; a classifier configured to classify the first summary image on the basis of the analysis produced by the analyser;
a juxtaposer configured to perform a comparison between the classifications for the first summary image generated by the classifier and equivalent classifications for the second summary image;
a calculator configured to derive a similarity value on the basis of the results of the comparison performed by the juxtaposer; and an outputter configured to output the similarity value derived by the calculator.
23. A method as hereinbefore described and/or with reference to any of the accompanying figures.
24. An image summariser as hereinbefore described and/or with reference to any of the accompanying figures.
25. A comparer as hereinbefore described and/or with reference to any of the accompanying figures.
26. A computer program as hereinbefore described and/or with reference to any of the accompanying figures.
GB1613987.5A 2016-08-16 2016-08-16 Image processing system Active GB2552969B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1613987.5A GB2552969B (en) 2016-08-16 2016-08-16 Image processing system

Publications (3)

Publication Number Publication Date
GB201613987D0 GB201613987D0 (en) 2016-09-28
GB2552969A true GB2552969A (en) 2018-02-21
GB2552969B GB2552969B (en) 2022-04-13

Family

ID=56985806


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110148077B (en) * 2018-02-12 2023-08-29 江苏洪旭德生科技有限公司 Method for accelerating ELBP-IP core and MR intelligent glasses
CN112958313B (en) * 2021-02-04 2022-03-04 深圳市邦建科技有限公司 Intelligent area compensation paint spraying parameter control method using distance matrix weighting characteristics

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5805733A (en) * 1994-12-12 1998-09-08 Apple Computer, Inc. Method and system for detecting scenes and summarizing video sequences
US6123362A (en) * 1998-10-26 2000-09-26 Eastman Kodak Company System and method of constructing a photo collage
US20070089152A1 (en) * 2005-10-14 2007-04-19 Microsoft Corporation Photo and video collage effects
CN202771488U (en) * 2012-07-12 2013-03-06 长安大学 Image automatic mosaic device based on halcon algorithm
US20140204125A1 (en) * 2013-01-18 2014-07-24 UDC Software LLC Systems and methods for creating photo collages
US20140211065A1 (en) * 2013-01-30 2014-07-31 Samsung Electronics Co., Ltd. Method and system for creating a context based camera collage
CN104537607A (en) * 2014-12-23 2015-04-22 天津大学 Automatic efficient photo collage method based on binary tree and layer sequencing
US20150254807A1 (en) * 2014-03-07 2015-09-10 Fractograf, LLC Systems and Methods for Generating an Interactive Mosaic Comprising User Visual Content Data on a Portable Terminal and an Image Sharing Platform

