This edition first published 2011
© 2011 John Wiley & Sons Ltd.
Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloging-in-Publication Data
Troncy, Raphaël.
Multimedia semantics : metadata, analysis and interaction / Raphaël Troncy, Benoit Huet, Simon Schenk.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-74700-1 (cloth)
1. Multimedia systems. 2. Semantic computing. 3. Information retrieval. 4. Database searching. 5. Metadata. I. Huet, Benoit. II. Schenk, Simon. III. Title.
QA76.575.T76 2011
006.7--dc22
2011001669
A catalogue record for this book is available from the British Library.
ISBN: 9780470747001 (H/B)
ISBN: 9781119970224 (ePDF)
ISBN: 9781119970231 (oBook)
ISBN: 9781119970620 (ePub)
ISBN: 9781119970637 (mobi)
Foreword
I am delighted to see a book on multimedia semantics covering metadata, analysis, and interaction, edited by three very active researchers in the field: Troncy, Huet, and Schenk. This is one of those projects that are very difficult to complete because the field is advancing rapidly in many different dimensions. At any time, you feel that many important emerging areas may not be covered well unless you wait for the next important conference in the field. A state-of-the-art book remains a moving, often elusive, target. But this is only a part of the dilemma. There are two more difficult problems. First, multimedia itself is like the famous fable of the elephant and the blind men. Each person can experience only one aspect of the elephant and hence has only a partial understanding of the problem. Interestingly, in the context of the whole problem, this is not merely a partial perspective but often a wrong one. The second problem is the notorious semantic gap. The concepts and abstractions in computing are based on bits, bytes, lists, arrays, images, metadata and such; but the abstractions and concepts used by human users are based on objects and events. The gap between the concepts used by computers and those used by humans is termed the semantic gap. It has been exceedingly difficult to bridge this gap. This ambitious book aims to cover this important, but difficult and rapidly advancing, topic. And I am impressed that it is successful in capturing a good picture of the state of the art as it exists in early 2011. On the one hand I am impressed, and on the other hand I am sure that many researchers in this field will be thankful to the editors and authors for providing all this material in a compact yet comprehensible form, in one book.
The book covers aspects of multimedia from feature extraction to ontological representations to semantic search. This encyclopedic coverage of semantic multimedia is appearing at the right time. Just when we thought that it was almost impossible to find all related topics for understanding emerging multimedia systems, as discussed in use cases, this book appears. Of course, such a book can only provide breadth in a reasonable size. And I find that in covering the breadth, the authors have taken care not to become so superficial that the coverage of a topic becomes meaningless. This book is an excellent reference source for anybody working in this area. As is natural, to keep such a book current, a new edition will have to be prepared in a few years. Hopefully, all the electronic tools will make this feasible. I would definitely love to see a new edition in a few years.
I want to particularly emphasize the closing sentence of the book: "There is no single standard or format that satisfactorily covers all aspects of audiovisual content descriptions; the ideal choice depends on type of application, process and required complexity." I hope that serious efforts will start to develop such a single standard, considering all the rich metadata in smart phones that can be used to generate meaningful extractable, rather than human-generated, tags. We, in academia, often ignore the obvious and usable in favor of the obscure and complex. We seem to enjoy creating new problems more than solving challenging existing ones. Semantic multimedia is definitely a field where there is a need for simple tools that use available data and information to cope with rapidly growing multimedia data volumes. I hope that by pulling together all relevant material, this book will facilitate the solution of such real problems.
Ramesh Jain
Donald Bren Professor in Information & Computer Sciences,
Department of Computer Science, Bren School of Information and Computer Sciences,
University of California, Irvine.
List of Figures
Artist recommendations based on information related to a specific user's interest
Recommended events based on artists mentioned in a user profile and geolocation
Management of a personal music collection using aggregated Semantic Web data by GNAT and GNARQL
Metadata flows in the professional audiovisual media production process
Color layout descriptor extraction
Color structure descriptor structuring element
HTD frequency space partition (6 frequency times, 5 orientation channels)
Real parts of the ART basis functions (12 angular and 3 radial functions)
CSS representation for the fish contour: (a) original image, (b) initialized points on the contour, (c) contour after t iterations, (d) final convex contour
Camera operations
Motion trajectory representation (one dimension)
Schematic diagram of instantaneous feature vector extraction
Zero crossing rate for a speech signal and a music signal. The ZCR tends to be higher for music signals
Spectral centroid variation for trumpet and clarinet excerpts. The trumpet produces brilliant sounds and therefore tends to have higher spectral centroid values
Frequency response of a mel triangular filterbank with 24 subbands
Schematic architecture for an automatic classification system (supervised case)