Bruno Zatt, Muhammad Shafique, Sergio Bampi and Jörg Henkel 3D Video Coding for Embedded Devices 2013 Energy Efficient Algorithms and Architectures 10.1007/978-1-4614-6759-5_1 Springer Science+Business Media New York 2013
1. Introduction
Bruno Zatt 1, 2, Muhammad Shafique 3, Sergio Bampi 2 and Jörg Henkel 1
(1)
Department of Computer Science, Karlsruhe Institute of Technology, Karlsruhe, Germany
(2)
Institute of Informatics, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
(3)
Karlsruhe Institute of Technology, Karlsruhe, Germany
Abstract
The growing interest in novel and immersive multimedia formats has led to the popularization of 3D videos along with their capture, coding, transmission, and display technologies. This media format was initially popularized in cinema theaters but is now available in televisions and mobile devices such as state-of-the-art camcorders, tablets, and smartphones. 3D video technology is based on multiple video sequences of the same 3D scene captured by distinct cameras and is used in a wide range of applications such as 3D personal recording, telepresence, telemedicine, and surveillance. To reduce the huge amount of data needed to represent multiview videos, the Multiview Video Coding (MVC) standard was defined. While MVC provides high coding efficiency, its real-time realization poses serious challenges related to high coding complexity and energy consumption, mainly for battery-powered embedded devices. This leads to the main goal of this monograph: proposing energy-efficient algorithms and architectures that enable real-time encoding of high-definition multiview videos. This chapter presents an overview of 3D video applications and multimedia embedded systems along with the requirements and trends for the next generations of 3D video technologies. It then discusses the problems and challenges related to realizing MVC encoding on embedded systems, followed by an overview of the main contributions of this work and the outline of the book.
Consumers' thirst for new and more immersive multimedia technologies, allied to the industry's interest in boosting the entertainment market, has driven the fast popularization of 3D-video content generation, 3D-capable devices, and 3D applications. Although the first 3D-video device was developed in 1833 and the first 3D film demonstration dates from 1915, the format became known worldwide only in the 1980s through IMAX technology. The real 3D-video hype, however, arrived in the late 2000s with the massive popularization and availability of 3D movies, followed by 3D-capable televisions dedicated to home cinema. To put this popularization in perspective, more than 10 % of the televisions sold in the USA in 2011 were 3D capable. The latest field to be affected by the popularization of 3D video is precisely the one responsible for the biggest growth of the integrated circuit (IC) industry since the popularization of personal computers: mobile embedded systems. Shipments of smartphones, tablets, personal camcorders, and other mobile devices have already surpassed PC shipments. For instance, more than 650 million smartphones are expected to be shipped in 2013, compared to 430 million PCs in the same year. Together, the popularization of 3D videos and of mobile devices is leading to a scenario where a large number of 3D-capable smart devices reach users every day, resulting in a large amount of 3D-video content being generated, encoded, stored, transmitted, and displayed. According to Cisco, video content already represents 51 % of current Internet traffic and is expected to reach the 90 % mark by 2014. It is also important to consider that mobile traffic, at 0.6 Exabytes per month in 2011, is expected to reach 10.8 Exabytes per month in 2016.
To close the gap between 3D-video content generation and network and storage capabilities, 3D videos must be encoded efficiently to reduce the amount of data required for their representation. Multiview Video Coding (MVC), an extension of H.264/AVC, is the state of the art in 3D-video coding. Based on the multiple-views paradigm, like the majority of current 3D-video technology, MVC reduces the size of the 3D-video representation by 20–50 % compared to H.264/AVC simulcast. This efficiency improvement comes at the cost of increased coding complexity and increased energy consumption, mainly at the encoder side. The energy consumption arises from multiple processing units (processors, DSPs, GPUs, ASICs) working in parallel to meet throughput constraints and from intense memory access. In a scenario dominated by mobile devices, the increase in energy consumption conflicts with the battery restrictions of these mobile embedded systems. This conflict between coding efficiency and energy constraints defines the main challenge of realizing 3D video on embedded systems: jointly designing energy-efficient algorithmic and architectural solutions that enable real-time high-definition 3D-video coding while maintaining high video quality under severe energy constraints. The main goal of this monograph is to address this challenge by presenting novel algorithms and hardware architectures designed to show the feasibility of 3D-video encoding on battery-powered embedded devices.
The following sections first give an overview of the 3D-video applications that make the 3D-video field so promising. A brief introduction to the trends in 3D-video coding and multimedia embedded systems follows, along with the related issues and research challenges. The chapter closes with a summary of the contributions of this work.
1.1 3D-Video Applications
The adoption of 3D videos is directly associated with the existence of new applications that require the sensation of depth in order to improve the user's immersion experience. This section presents an overview of the main 3D-video applications. These applications share the same concept of capturing multiple views of the same 3D scene. To give the illusion of depth, distinct views are shown to each eye by displays that employ technologies based on parallax barriers, lenticular sheets, color polarization, directional polarization, or time interleaving; more details on this phenomenon are provided in .
Three-dimensional video personal recording: Popularized by 3D-capable mobile devices and 3D-video sharing services, 3D-video personal recording is the largest 3D-video service in terms of video content availability. With a 3D-video recording device, users are free to create and publish their own video content.
Three-dimensional television (3DTV): 3DTV is an extension of traditional 2D television that adds depth perception. In this kind of application, two or more views are decoded and displayed simultaneously, and each viewer sees two views, one for the right eye and the other for the left eye. The simplest 3D displays, the stereoscopic displays, show two simultaneous views and require special glasses (polarized or active shutter glasses) to provide the 3D sensation. The evolution of the stereoscopic display is the auto-stereoscopic display, which eliminates the need for glasses. In this case, parallax barriers and lenticular sheets are the most common solutions. Multiview displays are able to show a higher number of views at the same time, increasing the observer's freedom by supporting head parallax, i.e., the viewpoint changes when the observer changes position.
Free-viewpoint television (FTV): In this application, the user is able to select the desired viewpoint in a 3D scene. It provides realism and interactivity to the user, i.e., the focus of attention can be controlled. The display technology used may vary from 2D televisions to multiview displays.