WO2013180590A1 - Method and system for detecting frame-compatible 3D content - Google Patents

Method and system for detecting frame-compatible 3D content

Info

Publication number
WO2013180590A1
Authority
WO
WIPO (PCT)
Prior art keywords
content
processing
frame compatible
compatible
frames
Prior art date
Application number
PCT/RO2012/000010
Other languages
English (en)
Inventor
Cristian-Gavril OLAR
Andrei-Claudiu COSMA
Demis DIACONESCU
Cormac Brick
Mihai MICEA
Valentin Muresan
Original Assignee
Sc Movidius Srl
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sc Movidius Srl filed Critical Sc Movidius Srl
Priority to PCT/RO2012/000010 priority Critical patent/WO2013180590A1/fr
Publication of WO2013180590A1 publication Critical patent/WO2013180590A1/fr

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/139Format conversion, e.g. of frame-rate or size
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N2213/00Details of stereoscopic systems
    • H04N2213/007Aspects relating to detection of stereoscopic image format, e.g. for adaptation to the display format

Definitions

  • This invention relates to a system and method for analyzing video frames to improve the stereographic image and video processing used mostly in television and 3D devices.
  • US20120038744 uses a combination of methods consisting of analyzing motion vectors over multiple frames and histogram matching to obtain stream information.
  • WO2011098936A2 is a system and method employing depth maps, computed from disparity images obtained from received frames, in order to detect frame-compatible 3D content.
  • KR20090025934A claims that frame-compatible 3D content may be determined by comparison with a reference 3D image.
  • US2010053306A1 is another patent application employing motion vectors over multiple frames in order to detect frame-compatible 3D content.
  • US2011267360A1 is a patent application which proposes a method to detect frame-compatible 3D content but does not describe the inner workings of the method; instead, only a generic "collection of 3D algorithm detection" block is used.
  • WO2011071467A1 describes a system and method that uses image feature detection in order to determine whether content is frame-compatible 3D.
  • WO2011071473A1 claims a system and method which creates a map of differences between the image areas under comparison for frame-compatible 3D detection.
  • That method relies on the observation that the obtained maps will have regions of differences which, for frame-compatible 3D content, look like thin edges, while for 2D content the "edges", if present at all, would be much thicker; there may be only spots on the difference map instead of lines.
  • An analysis of the thickness of these "edges" determines the nature of the content from the 3D point of view. It is important to note that although the term "edge" is used, that patent clearly defines it differently from the canonical image-edge term, so it bears no similarity to the method described in this invention even if similar terminology is used.
  • WO2011162737A1 is another system and method using motion vectors in multiple frames in order to test and determine 3D frame compatible content.
  • The general drawback of the state of the art briefly presented above is that all approaches treat the content classification problem theoretically and in a general fashion, without attempting to handle the specific difficult cases present in real streams when making detection decisions.
  • The method and system of this invention were designed with the aim of making specific decisions based on learning from real streams, and hence succeed even on difficult-to-detect cases.
  • Because the method described here uses adaptive thresholds, it also presents the opportunity for future improvement simply by enlarging the test base of streams.
  • The problem solved by the present invention is that of giving display devices the capability to automatically detect 3D content in a video stream and display that content in a 3D stereographic viewable format.
  • The method and system for detecting frame-compatible 3D content are built around a frame receiving circuit.
  • This module is needed in order to minimize the processing power required by the system. It achieves this by operating in conjunction with a frame memory block, which stores received frames, and with a data memory block.
  • The frame receiving circuit features registers that configure it to stream frame data directly into memory without requiring CPU (central processing unit) intervention in the process.
  • The frame memory stores frames, while the processing data memory stores the "locked" frame information sampled for processing as well as the program associated with the method of the invention.
  • A CPU executes the steps required to carry out the method of this invention.
  • The method associated with the system consists of three main steps.
  • The first step generates an edge image from the locked frame information in order to eliminate errors caused by the chromatic variations typical of real-life images.
  • The second step calculates projection series from the edge-image values, discarding misleading frame data found at the borders of frames, and applies different thresholds to determine a similarity factor.
  • The thresholds are chosen adaptively by running the method on several video streams.
  • In the final step the method employs an early-exit technique, limiting the use of processing power and electrical power by statistically requiring only a minimum number of comparisons.
  • The method then produces a complete result regarding the nature of the processed video stream: 3D Side-by-Side (SBS), 3D Top-and-Bottom (TAB) or 2D, which in turn enables the capability of displaying viewable images.
  • The present invention has certain advantages, which we highlight below.
  • 1) The invention offers the possibility of improved display of video frames by generating the capability of displaying viewable images.
  • 2) This invention uses edge filtering and projection normalization steps that are capable of filtering out false decisions.
  • 3) This invention uses sub-sampling in order to accelerate processing and minimize memory consumption.
  • 4) This invention discards possibly misleading lines and columns from around the frame border, focusing the analysis only on the significant part of real-life content.
  • 5) This invention uses adaptively tuned thresholds, allowing the thresholds to be easily upgraded when applied to new video streams.
  • 6) This invention employs the early-exit (cascading) technique, which limits processing when a good level of confidence is achieved early.
  • 7) This invention can be applied generically to any kind of video content that is 3D SBS, 3D TAB or 2D.
  • - Fig. 1 is a functional plan of how the frame-compatible 3D method is associated with the hardware system that provides the operational platform.
  • - Fig. 2 is a block diagram of the frame receiving circuit used to stream video frames into the frame memory system.
  • - Fig. 3 is an explanatory image of the transmission types of 3D frames treated by the invention.
  • - Fig. 4 is an explanatory image of the transmission type of 3D frames not treated by the invention.
  • - Fig. 5 is an explanatory image showing the comparison between the received image and the subsampled one.
  • - Fig. 6 is a functional schematic of the way the edge image is obtained from the subsampled image, with a visual explanation of the result.
  • - Fig. 7 is an explanatory image of the way the projection series overlap processed frames.
  • - Fig. 8 is a graphical representation showing absolute differences between corresponding projection series.
  • - Fig. 9 is a graphical representation of the correlation sub-step executed to minimize differences.
  • - Fig. 10 is a program flowchart of the method depicting the sub-steps of comparing projection series.
  • - Fig. 11 is a program flowchart of the method depicting the sub-steps of determining whether content is 3D SBS, 3D TAB or 2D.
  • The video signal comes from various sources such as a 3D TV broadcaster, a BluRay player, a multimedia device box and so on.
  • The method and system enable a technical effect of detection capability which previous systems do not have.
  • This capability enables devices to display viewable images.
  • We define viewable images as images perceived as meaningful by the human mind. 2D images are always viewable, whereas 3D images require display systems to apply different stereographic techniques in order to create viewable images.
  • Display systems need to be endowed with a capability of detecting 3D images in order to determine whether they need to apply stereographic techniques and how to apply them.
  • This is the technical capability enabled by the present invention.
  • This capability is provided by this invention's embedded system through capturing video frames with specific circuitry and through a method implemented in software running on the same embedded system.
  • Figure 1 explains the functional blocks of our method and the system in which it is embedded.
  • The hardware platform used requires a frame receiving circuit (1).
  • This circuit handles the streaming of video frames into the system without consuming processing power from any of the other elements of the system.
  • The frame memory (2) block required by our platform offers adequate support for storing the frames required at processing time.
  • The processing data memory (3) block is used to keep the edge image and the subsampled image. In a practical implementation, blocks (2) and (3) do not have to be separate circuits if their operations are carried out appropriately.
  • Block (4), the CPU (Central Processing Unit), is a microprocessor able to run the method of determining 3D-compatible frames.
  • Block (5), the edge processing block, represents the method step of obtaining the edge image from the input frames.
  • The projections determination block (6) represents the method step dealing with projections extracted from the edge image explained previously. The method goes through a final step (7) of comparing the similarity factors obtained for both 3D side-by-side (SBS) and 3D top-and-bottom (TAB) content and presents the 3D SBS, 3D TAB or 2D content classification decision.
  • SBS: 3D side-by-side
  • TAB: 3D top-and-bottom
  • Video sources provide content in a complete colour space format, usually RGB or YUV. Any colour component may be used for processing.
  • The method is exemplified here on the luminance component of images, using the YUYV 4:2:2 colour space.
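  • As an illustration only, here is a minimal Python sketch of extracting the luminance plane from packed YUYV 4:2:2 data; it assumes the common Y0-U-Y1-V byte order, and the function and variable names are ours, not the patent's:

```python
import numpy as np

def luma_from_yuyv(frame_bytes: np.ndarray, width: int, height: int) -> np.ndarray:
    """Extract the luminance (Y) plane from packed YUYV 4:2:2 data.

    In YUYV 4:2:2 each pair of pixels is stored as [Y0, U, Y1, V],
    so the luma samples are simply every second byte of a row.
    """
    packed = frame_bytes.reshape(height, width * 2)  # 2 bytes per pixel
    return packed[:, 0::2]                           # keep Y0, Y1, Y2, ...
```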
  • The frames are streamed into the frame memory of the system using the frame receiving circuit detailed in Figure 2.
  • The frame receiving circuit can accept horizontal and vertical synchronization signals, defined as HSYNC and VSYNC respectively, and has a timing generator required to process input pixel data. These signals are linked through an input interface to a specialized Direct Memory Access (DMA) controller which can accept various memory locations as output, as specified by the configuration registers block.
  • DMA: Direct Memory Access
  • Using the configuration registers, a double-buffering technique is implemented such that the frame that has just been streamed into the system can be left untouched for the time the system needs to lock on its relevant data, which is processed later. Locking on the data of one frame is implemented by simply copying the relevant data part into a safe memory location; a toy software model of this scheme is sketched below.
  • The relevant data part is either the complete frame or the subsampled version of the frame.
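  • The following toy Python model only illustrates the double-buffering behaviour; in the invention the mechanism is implemented by the DMA controller and configuration registers, so this class mirrors the idea, not the circuit:

```python
import threading
import numpy as np

class DoubleBuffer:
    """Software model of double buffering: the receiving side fills one
    buffer while the CPU locks (copies) relevant data from the other."""

    def __init__(self):
        self.buffers = [None, None]
        self.write_index = 0
        self._lock = threading.Lock()

    def dma_write(self, frame: np.ndarray) -> None:
        # The DMA side fills the current write buffer, then flips buffers.
        self.buffers[self.write_index] = frame
        with self._lock:
            self.write_index = 1 - self.write_index

    def lock_frame(self):
        # "Locking" a frame = copying the relevant data to a safe location
        # so the next incoming frame cannot overwrite it mid-processing.
        with self._lock:
            idx = 1 - self.write_index  # most recently completed buffer
        frame = self.buffers[idx]
        return None if frame is None else frame.copy()
```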
  • Frame-compatible 3D content may be presented as SBS, TAB or frame-packed (FP) content. These are represented in figures 3 and 4 for a better understanding.
  • FP content is usually very specific and may be detected through intrinsic methods available to FP-enabled devices; SBS and TAB content is not, and in some cases may need to be detected through content analysis.
  • The method and system described in this invention provide a way of detecting SBS and TAB content based on the contents of a 3D-compatible frame.
  • A large set of test video streams was used in order to fine-tune the detection parameters and achieve a high degree of confidence in automatic 3D detection.
  • The method provides a way of detecting whether a clip is in 2D, 3D SBS or 3D TAB format. It does so in multiple steps, each step filtering out various causes of false positives in 3D detection.
  • The analysis is run on multiple frames over a period of time, and an averaged result gives the final decision. Experiments showed 3 seconds to be sufficient for robust results.
  • Step 1: Sub-sampling
  • The method works without sub-sampled frames, but there are advantages to using sub-sampling.
  • Sub-sampling is useful for two reasons: the need to lock to a frame during algorithm processing and the need to minimize the memory footprint. Another benefit of sub-sampling, explained later on, is increased speed when applying edge filters.
  • Let A be the area of the original image and A' the area of the subsampled image.
  • If the image is sub-sampled by a factor n in each dimension, the relation between the two will then be A' = A / n².
  • The sub-sampling algorithm needs to be fast, running in less than the time required for processing one frame.
  • The resulting image may then be processed independently of the stream frame rate.
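  • A minimal sketch of such a sub-sampling step, assuming simple decimation; the patent does not fix the factor, so n = 4 is a placeholder:

```python
import numpy as np

def subsample(luma: np.ndarray, n: int = 4) -> np.ndarray:
    """Keep every n-th pixel in each dimension.

    Both dimensions shrink by a factor of n, so the subsampled
    area is A' = A / n**2. Plain slicing is fast enough to run
    well within one frame time on typical hardware.
    """
    return luma[::n, ::n]
```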
  • Step 2: Generating the Edge Image
  • The invention works with an edge image.
  • The edge image is created using an edge operator.
  • The invention uses the Sobel operator for both vertical edges and horizontal edges.
  • The edge value used in our example is the Sobel magnitude.
  • Figure 6 explains the steps used to obtain the edge image in our example.
  • The vertical Sobel operator used is

    [ -1  0  1 ]
    [ -2  0  2 ]
    [ -1  0  1 ]

    and the horizontal Sobel operator is

    [ -1 -2 -1 ]
    [  0  0  0 ]
    [  1  2  1 ]
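  • A self-contained sketch of computing the Sobel magnitude image with the kernels above (pure NumPy); the patent does not spell out the magnitude formula, so the Euclidean magnitude used here is an assumption:

```python
import numpy as np

SOBEL_V = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.int32)  # responds to vertical edges
SOBEL_H = SOBEL_V.T                               # responds to horizontal edges

def sobel_magnitude(img: np.ndarray) -> np.ndarray:
    """Return the Sobel gradient magnitude over the valid interior."""
    img = img.astype(np.int32)
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2), dtype=np.int32)
    gy = np.zeros_like(gx)
    for dy in range(3):          # accumulate the 3x3 neighbourhood products
        for dx in range(3):
            patch = img[dy:dy + h - 2, dx:dx + w - 2]
            gx += SOBEL_V[dy, dx] * patch
            gy += SOBEL_H[dy, dx] * patch
    return np.hypot(gx, gy)      # Euclidean magnitude (our assumption)
```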
  • Step 3: Determining SBS and TAB values
  • Ordinary 2D content can naturally resemble 3D SBS or 3D TAB content. For example, if a clip is shot with two trees in front of the camera, the resulting image is similar to a 3D SBS image. The problem resides in putting enough cues in place to filter out such false positives.
  • The method works on projections made on the image for both TAB and SBS and figures out whether the image has enough 3D SBS or 3D TAB similarity in it to be detected as 3D SBS, 3D TAB or 2D content.
  • The values chosen for the cues set in place were found using adaptive tuning: the method was run over multiple streams and the best parameter values were determined.
  • the "Y projection” is defined as the series computed by using edge values across the Y axis.
  • One element of an Y projection is defined as the sum of all edge values found on the column corresponding to that element.
  • the "X projection” is defined as the series computed by using edge values on the X axis.
  • One element of an X projection is defined as the sum of all edge values found on the line corresponding to that element.
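  • A sketch of both projections computed from an edge image, following the definitions above:

```python
import numpy as np

def x_projection(edge: np.ndarray) -> np.ndarray:
    """One element per line: the sum of all edge values on that line."""
    return edge.sum(axis=1).astype(np.float64)

def y_projection(edge: np.ndarray) -> np.ndarray:
    """One element per column: the sum of all edge values on that column."""
    return edge.sum(axis=0).astype(np.float64)
```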
  • Step 3 consists of several sub-steps detailed in the following paragraphs. First, the main projections are defined, as well as the secondary projections.
  • The main projection is the most representative projection for the type of 3D content (SBS/TAB) under analysis, and the secondary projection is the least representative. It was determined that many real-life streams contain misleading information on some of the first or last lines or columns.
  • The DISCARDED_LIMIT parameter is defined as the number of lines or columns (depending on the case) that are not used when computing the normalized projection values. Instead, these lines or columns are simply copied from the closest line or column that is considered valid.
  • SBS X Projections are the main projections for 3D SBS testing and SBS Y Projections are the secondary projections. SBS Projections are computed by duplicating the first non-misleading column over the first DISCARDED_LIMIT columns, as indicated by markings A and C in Figure 7. The last misleading columns in SBS projections are replaced by their closest non-misleading columns, as indicated by markings B and D in Figure 7.
  • TAB Y Projections are the main projections for 3D TAB testing and TAB X Projections are the secondary projections.
  • TAB Projections are computed by duplicating the first non-misleading line over the first DISCARDED_LIMIT lines, as indicated by markings E and G in Figure 7. The last misleading lines in TAB projections are replaced by their closest non-misleading lines, as indicated by markings F and H in Figure 7.
  • The series are normalized against the maximum value in the series. This further eliminates wrong decisions caused by images rich in content, which might yield different edge values even though the same edges are detected. Using series normalization avoids this source of wrong decisions; a sketch follows below.
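  • A sketch of the border duplication and normalization applied to one projection series; the DISCARDED_LIMIT value is a placeholder, not the patent's tuned value:

```python
import numpy as np

DISCARDED_LIMIT = 8  # placeholder; the patent tunes this adaptively

def clean_and_normalize(series: np.ndarray,
                        discarded: int = DISCARDED_LIMIT) -> np.ndarray:
    """Replace possibly misleading border elements, then normalize to max 1.

    Assumes len(series) > 2 * discarded.
    """
    s = series.astype(np.float64).copy()
    s[:discarded] = s[discarded]          # duplicate first valid element
    s[-discarded:] = s[-discarded - 1]    # duplicate last valid element
    peak = s.max()
    return s / peak if peak > 0 else s
```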
  • Step 3.2: Main projection comparisons
  • A comparison example of two main projections may be seen in Fig. 7. It is noticeable that there are sometimes local maximum points in the absolute difference of the two projections being compared, as is visible around value 42 of the graph in Figure 8. It can also be noticed in this graph that the projections show a slight shift relative to one another.
  • The SEARCH_RANGE parameter is defined as the maximum shift applied in order to achieve the best correlation between the main projection series. For example, in Figure 9 the left projection is scrolled over the right projection. The best match is considered to be the one where the sum of absolute differences between the projection series values is the minimum; a sketch of this shift search follows below.
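  • A sketch of the shift search over the main projection series, using the sum of absolute differences (SAD) as the correlation measure described above; taking the mean so overlaps of different lengths stay comparable is our choice, and SEARCH_RANGE is a placeholder:

```python
import numpy as np

SEARCH_RANGE = 4  # placeholder; the patent tunes this adaptively

def best_shift_sad(a: np.ndarray, b: np.ndarray,
                   search_range: int = SEARCH_RANGE):
    """Scroll series a over series b within +/- search_range samples and
    return (shift, mean absolute difference) of the best alignment."""
    n = len(a)
    best_shift, best_sad = 0, np.inf
    for s in range(-search_range, search_range + 1):
        if s >= 0:
            d = np.abs(a[s:] - b[:n - s])
        else:
            d = np.abs(a[:n + s] - b[-s:])
        sad = d.mean()
        if sad < best_sad:
            best_shift, best_sad = s, sad
    return best_shift, best_sad
```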
  • Three counts are computed over the corresponding samples of the two series: the number of samples with an absolute difference smaller than a threshold named PROJECTION_3D_THRESHOLD (these are called Definite3D values because they indicate high 3D similarity); the number of samples with an absolute difference higher than a threshold called PROJECTION_2D_THRESHOLD (these are called Definite2D values because they indicate higher similarity to 2D than to 3D);
  • and the number of samples whose absolute difference is neither under PROJECTION_3D_THRESHOLD nor over PROJECTION_2D_THRESHOLD. We call these UnsureVals.
  • PROJECTION_2D_DEALBREAKER, PROJECTION_3D_THRESHOLD and PROJECTION_2D_THRESHOLD are adaptively tuned across large numbers of test streams.
  • Another threshold, DEFINITE3D_ACCEPTED_THRESHOLD, is introduced. It is also adaptively tuned and refers to the number of Definite3D values considered sufficient to force a 3D similarity conclusion. A sketch of this sample classification follows below.
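  • A sketch of the three counts over the aligned, normalized series; the threshold values are placeholders, since the tuned values belong to the patent's Table 1:

```python
import numpy as np

PROJECTION_3D_THRESHOLD = 0.05  # placeholder values; the patent tunes
PROJECTION_2D_THRESHOLD = 0.20  # these adaptively over many test streams

def count_samples(a: np.ndarray, b: np.ndarray):
    """Classify corresponding samples of two normalized, aligned series."""
    diff = np.abs(a - b)
    definite3d = int((diff < PROJECTION_3D_THRESHOLD).sum())
    definite2d = int((diff > PROJECTION_2D_THRESHOLD).sum())
    unsure = diff.size - definite3d - definite2d   # UnsureVals
    return definite3d, definite2d, unsure
```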
  • Table 1 shows recommended value ranges for the parameters defined in this section as well as in other sections.
  • This method interprets the results in a cascading way, employing the early-exit concept: when enough confidence in the decision is achieved, there is no reason to continue with the rest of the decision refinement steps. Refer to Figure 10; a sketch of such a cascade is given below.
  • UnsureGo2D is defined as a variable deciding whether to jump to a 2D conclusion because of a high UnsureVals count.
  • The number of unsure values is considered too high based on a comparison with the UNSUREVALUES_ACCEPTED_THRESHOLD parameter.
  • The secondary projection series is compared against a secondary 3D threshold value, which is smaller than PROJECTION_3D_THRESHOLD because the method searches for a hint in this case, not a final answer.
  • Recommended values for this secondary threshold are likewise given in Table 1.
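  • The sketch below illustrates such an early-exit cascade. The exact ordering and conditions are those of Figure 10, which is not reproduced in this text, so the ordering here and all numeric values are assumptions:

```python
# Placeholder parameter values; the patent's tuned values are in its Table 1.
DEFINITE3D_ACCEPTED_THRESHOLD = 40
PROJECTION_2D_DEALBREAKER = 30
UNSUREVALUES_ACCEPTED_THRESHOLD = 60

def similarity_decision(definite3d: int, definite2d: int, unsure: int,
                        secondary_hint_3d: bool) -> bool:
    """Return True for 3D similarity, exiting as early as possible."""
    if definite3d >= DEFINITE3D_ACCEPTED_THRESHOLD:
        return True          # early exit: enough Definite3D values
    if definite2d >= PROJECTION_2D_DEALBREAKER:
        return False         # early exit: dealbreaker amount of 2D evidence
    if unsure > UNSUREVALUES_ACCEPTED_THRESHOLD:
        return False         # UnsureGo2D: too many unsure values
    return secondary_hint_3d  # fall back to the secondary-projection hint
```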
  • Step 4: Using SBS and TAB values at stream level
  • The SBS factor is defined as
  • The method is accelerated by a SIMD architecture capable of running it on every 5th frame.
  • A multicore architecture is preferred because this algorithm needs to run in parallel with the stream coming into the architecture's memory, a process handled by a different processing core or a different circuit.
  • The 3D matching is run for a period that is adaptively tuned. It has been noticed experimentally that too short a period can produce decisions with a low degree of confidence, while too long a period may include content events (scene cuts, transitions etc.) that are not relevant to classifying the content.
  • Table 1 lists recommended value ranges for the parameters required to implement the method on the system.
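  • As an illustration of Step 4, here is a sketch of stream-level aggregation using a simple majority vote over per-frame decisions; classify_frame stands in for the per-frame pipeline sketched earlier, and the sampling step and window length are placeholder assumptions:

```python
from collections import Counter
from typing import Callable, Iterable

def classify_stream(frames: Iterable,
                    classify_frame: Callable[[object], str],
                    every_nth: int = 5,
                    max_frames: int = 15) -> str:
    """Run the per-frame decision on every n-th frame and vote.

    With a 25 fps stream, every 5th frame over 3 seconds gives
    max_frames = 15 decisions to average.
    """
    votes = Counter()
    for i, frame in enumerate(frames):
        if i % every_nth:
            continue
        votes[classify_frame(frame)] += 1  # "SBS", "TAB" or "2D"
        if sum(votes.values()) >= max_frames:
            break
    return votes.most_common(1)[0][0] if votes else "2D"
```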

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention, whose field of application is video processing, concerns a system and method used to determine content compatible with a 3D side-by-side frame or a 3D top-and-bottom frame. The method consists of employing an associated system to read video frames, apply filters to them and analyze the results. An edge filter is applied to input frames. The significant part of the edge values is then used to determine image projections over the required areas of the image. Multiple checks are performed, each check contributing to the analysis of similarity between the processed frames and the characteristics of 3D side-by-side (SBS) compatible frames or 3D top-and-bottom (TAB) compatible frames. The checks consist of comparisons with specific parameters based on learning obtained by processing a large number of video streams. The method is tuned to handle a very large number of content types. The use of adaptively tuned comparison values means that the method can easily be trained with more effective parameter values from new content, an aspect that makes this method preferable to other methods.
PCT/RO2012/000010 2012-05-29 2012-05-29 Method and system for detecting frame-compatible 3D content WO2013180590A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/RO2012/000010 WO2013180590A1 (fr) 2012-05-29 2012-05-29 Method and system for detecting frame-compatible 3D content


Publications (1)

Publication Number Publication Date
WO2013180590A1 (fr) 2013-12-05

Family

ID=47891878

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/RO2012/000010 WO2013180590A1 (fr) 2012-05-29 2012-05-29 Procédé et système pour détecter un contenu 3d compatible avec une trame

Country Status (1)

Country Link
WO (1) WO2013180590A1 (fr)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100321390A1 (en) * 2009-06-23 2010-12-23 Samsung Electronics Co., Ltd. Method and apparatus for automatic transformation of three-dimensional video

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tao Zhang, "3D image format identification by image difference", Multimedia and Expo (ICME), 2010 IEEE International Conference on, IEEE, Piscataway, NJ, USA, 19 July 2010, pages 1415-1420, XP031761509, ISBN: 978-1-4244-7491-2 *


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12832746

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the EP bulletin as the address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 13-03-2015)

122 Ep: pct application non-entry in european phase

Ref document number: 12832746

Country of ref document: EP

Kind code of ref document: A1