Technical program

Wednesday 3rd

08:00 – 08:30   Welcome desk
08:30 – 09:00   Opening plenary
09:00 – 10:00   Keynote: F. Pachet
10:00 – 10:30   Coffee break
10:30 – 12:30   Oral session: SS4
12:30 – 14:15   Lunch break
14:15 – 15:45   Poster session: SS2 + RS2-RS3
15:45 – 16:15   Coffee break
16:15 – 17:55   Oral session: SS8
18:00 – 20:00   Jazz, Wine & Cheese reception

Thursday 4th

09:00 – 10:00   Keynote: C. Schmid
10:00 – 10:30   Coffee break
10:30 – 12:30   Oral session: SS6
12:30 – 14:15   Lunch break
14:15 – 15:15   Keynote: H. Trettenbrein
15:15 – 15:30   Coffee break
15:30 – 17:00   Poster session: EU demos + SS1
18:00 – 23:00   Social event: Orsay museum

Friday 5th

09:00 – 10:00   Keynote: A. Vinciarelli
10:00 – 10:30   Coffee break
10:30 – 12:30   Oral session: SS7
12:30 – 14:15   Lunch break
14:15 – 15:15   Oral session: RS1
15:15 – 15:30   Coffee break
15:30 – 17:00   Poster session: SS3 + SS5

RS1: Regular session – Human activity/action/gesture recognition;
RS2: Regular session – Audio and music analysis;
RS3: Regular session – Multimedia content analysis and computer vision;
SS1: Special session – 3D reconstruction, coding and transmission for audiovisual interactive services;
SS2: Special session – Automatic categorization of multimedia web data, mono-modal and multi-modal approaches;
SS3: Special session – Content Enhancement for Improved Multimedia Applications;
SS4: Special session – Informed Music Audio Processing;
SS5: Special session – Real World Sound Scene Analysis;
SS6: Special session – Semantic Media – Time-Based Navigation in Large Collections of Media Documents;
SS7: Special session – Social stance analysis;
SS8: Special session – Visual attention, a multidisciplinary topic: from behavioral studies to computer vision applications.

Detailed technical program

Wednesday 3rd

10:30 – 12:30 – Oral session – SS4

  • 10:30 – 10:50: Real-time guitar string detection for music education software (Christian Dittmar*, Andreas Männchen and Jakob Abesser)
  • 10:50 – 11:10: Looking beyond sound: unsupervised analysis of musician videos (Cynthia Liem*, Alessio Bazzica and Alan Hanjalic)
  • 11:10 – 11:30: An overview of informed audio source separation (Antoine Liutkus*, Jean-Louis Durrieu, Laurent Daudet and Gaël Richard)
  • 11:30 – 11:50: A generic classification system for multi-channel audio indexing: application to speech and music detection (Elie-Laurent Benaroya* and Geoffroy Peeters)
  • 11:50 – 12:10: Freischütz digital: a multimodal scenario for informed music processing (Meinard Mueller*, Thomas Prätzlich, Benjamin Bohl and Joachim Veit)
  • 12:10 – 12:30: Query-by-example retrieval of sound events using an integrated similarity measure of content and label (Annamaria Mesaros*, Toni Heittola and Kalle Palomäki)


14:15 – 15:45 – Poster session – SS2 + RS2 and RS3

  • An LDA-based method for automatic tagging of YouTube videos (Mohamed Morchid* and Georges Linarès)
  • Searching segments of interest in single story web-videos (Mickael Rouvier, Georges Linarès*, Benoit Favre and Bernard Merialdo)
  • Introducing Motion Information in Dense Feature Classifiers (Claudiu Tanase* and Bernard Merialdo)
  • Fusion methods for multimodal indexing of web data (Usman Niaz* and Bernard Merialdo)
  • Exploring intra-bow statistics for improving visual categorization (Usman Niaz* and Bernard Merialdo)
  • Exploring new features for music classification (Rémi Foucard*, Slim Essid, Gaël Richard and Mathieu Lagrange)
  • A heuristic for distance fusion in cover song identification (Alessio Degani*, Marco Dalai, Riccardo Leonardi and Pierangelo Migliorati)
  • Ultra-low latency audio coding based on DPCM and block companding (Gediminas Simkus*, Martin Holters and Udo Zölzer)
  • Infrared ship target segmentation based on region and shape features (Zhaoying Liu, Fugen Zhou and Xiangzhi Bai*)
  • Large-scale Semi-supervised Learning by Approximate Laplacian Eigenmaps, VLAD and Pyramids (Eleni Mantziou*, Symeon Papadopoulos and Yiannis Kompatsiaris)
  • Identification of moving objects in visual surveillance data (Jogile Kuklyte*, Kevin Mc Guinness, Ramya Hebbalaguppe, Cem Direkoglu, Leonardo Gualano and Noel O’Connor)
  • Vision-based maritime surveillance system using fused visual attention maps and online adaptable tracker (Konstantinos Makantasis*, Anastasios Doulamis and Nikolaos Doulamis)
  • Densely sampled local visual features on 3D mesh for retrieval (Yuya Ohishi and Ryutarou Ohbuchi*)


16:15 – 17:55 – Oral session – SS8

  • 16:15 – 16:35: An application framework for implicit sentiment human-centered tagging using attributed affect (Konstantinos Apostolakis* and Petros Daras)
  • 16:35 – 16:55: Superpixel-based saliency detection (Zhi Liu*, Olivier Le Meur and Shuhua Luo)
  • 16:55 – 17:15: Sample Specific Late Fusion for Saliency Detection (Jie Sun and Congyan Lang*)
  • 17:15 – 17:35: Affine invariant salient patch descriptors for image retrieval (Furkan Isikdogan* and Albert Salah)
  • 17:35 – 17:55: Toward the introduction of auditory information in dynamic visual attention models (Antoine Coutrot* and Nathalie Guyader)


Thursday 4th

10:30 – 12:30 – Oral session – SS6

  • 10:30 – 10:50: Event-driven Retrieval in Collaborative Photo Collections (Markus Brenner* and Ebroul Izquierdo)
  • 10:50 – 11:10: Recent advances in affective and semantic media applications at the BBC (Jana Eggink* and Yves Raimond)
  • 11:10 – 11:30: Hello Cleveland! Linked Data Publication of Live Music Archives (Sean Bechhofer*, Kevin Page and David De Roure)
  • 11:30 – 11:50: Semi-Automated Video Logging by Incremental and Transfer Learning (Jongdae Kim and John Collomosse*)
  • 11:50 – 12:10: Challenges of Finding Aesthetically Pleasing Images (João Faria, Stanislav Bagley*, Stefan Rüger and Toby Breckon)
  • 12:10 – 12:30: Describing audio production workflows on the Semantic Web (Gyorgy Fazekas* and Mark Sandler (QMUL))


15:30 – 17:00 – Poster session – SS1 + EU project demos

  • Sound field reproduction for consumer and professional audio applications (Etienne Corteel* and Khoa-Van Nguyen)
  • Blending real with virtual in 3Dlife (Konstantinos Apostolakis*, Dimitrios Alexiadis, Petros Daras, David Monaghan, Noel O’Connor, Benjamin Prestele, Peter Eisert, Gaël Richard, Qianni Zhang, Ebroul Izquierdo, Maher Ben Moussa and Nadia Magnenat)
  • A concise survey for 3D reconstruction of building facades (Patrycia Klavdianos*, Qianni Zhang and Ebroul Izquierdo)

EU project demos

  • ALICE – Assistance for better mobility and improved cognition of elderly blind and visually impaired (Titus Zaharia)
  • AXES – The AXES pro video search system (Kevin McGuinness)
  • QUAERO – Audio oriented annotation of audiovisual content: a professional prototype (Félicien Vallet)
  • QUAERO II (Gregory Grefenstette)
  • REVERIE – Real and virtual engagement in realistic immersive environments (Noel O’Connor)
  • REWIND (Christian Dittmar)
  • SAVASA – Standards-based approach to video archive search and analysis (Suzanne Little)
  • SOCIALSENSOR – Sensing user generated input for improved media discovery and experience (Nikos Sarris)
  • TOSCA-MP – Task-oriented search and content annotation for media production (Werner Bailer)
  • VENTURI – ImmersiVe ENhancemenT of User-woRld Interactions (Paul Chippendale)


Friday 5th

10:30 – 12:30 – Oral session – SS7

  • 10:30 – 10:50: The expressivity of turn-taking: understanding children pragmatics by hybrid classifiers (Cristina Segalin*, Anna Pesarin, Alessandro Vinciarelli and Marco Cristani)
  • 10:50 – 11:10: Group detection in still images by F-formation modeling: a comparative study (Francesco Setti*, Marco Cristani and Hayley Hung)
  • 11:10 – 11:30: Likability of human voices: A feature analysis and a neural network regression approach to automatic likability estimation (Florian Eyben*, Felix Weninger, Erik Marchi and Björn Schuller)
  • 11:30 – 11:50: Getting rid of pain-related behaviour to improve social and self perception: A Technology-Based Perspective (Min Aung, Bernardino Romera-Paredes, Aneesha Singh, Soo Ling Lim, Natalie Kanakam, Amanda Williams and Nadia Bianchi-Berthouze*)
  • 11:50 – 12:10: Social stances by virtual smiles (Magalie Ochs*, Catherine Pélachaud and Ken Prepin)
  • 12:10 – 12:30: Automatic Recognition of Personality and Conflict Handling Style in Mobile Phone Conversations (Alessandro Vinciarelli*, Hugues Salamin and Anna Polychroniou)


14:15 – 15:15 – Oral session – RS1

  • 14:15 – 14:35: Real-Time Head Nod and Shake Detection for Continuous Human Affect Recognition (Haolin Wei*, Patricia Scanlon, Yingbo Li, David Monaghan and Noel O’Connor)
  • 14:35 – 14:55: Tapped delay multiclass support vector machines for industrial workflow recognition (Eftychios Protopapadakis*, Anastasios Doulamis and Nikolaos Doulamis)
  • 14:55 – 15:15: Multimodal classification of dance movements using body joint trajectories and step sounds (Aymeric Masurelle*, Slim Essid and Gaël Richard)


15:30 – 17:00 – Poster session – SS3 + SS5

  • JPEG backward compatible format for 3D content representation (Philippe Hanhart*, Pavel Korshunov, Martin Rerabek and Touradj Ebrahimi)
  • On coding and resampling of video in 4:2:2 chroma format for cascaded coding applications (Andrea Gabriellini and Marta Mrak*)
  • Optimized tone mapping with flickering constraint for backward-compatible high dynamic range video coding (Alper Koz* and Frederic Dufaux)
  • Versatile layered depth video coding based on distributed video coding (Giovanni Petrazzuoli, Corina Macovei, Irina-Emilia Nicolae, Marco Cagnazzo*, Frédéric Dufaux and Béatrice Pesquet)
  • Acoustic recursive bayesian estimation for non-field-of-view targets (Makoto Kumon* and Tomonari Furukawa)
  • Robust Localization and Tracking of Multiple Speakers in Real Environments for Binaural Robot Audition (Ui-Hyun Kim* and Hiroshi Okuno)
  • Robust Spectro-Temporal Speech Features with Model-Based Distribution Equalization (Samuel Kevin Ngouoko Mboungueng*, Martin Heckmann and Britta Wrede)
  • A Nested Infinite Gaussian Mixture Model for Recognizing Known and Unknown Audio Events (Yoko Sasaki*, Kazuyoshi Yoshii and Satoshi Kagami)
  • Saliency-based modeling of acoustic scenes using sparse non-negative matrix factorization (Benjamin Cauchi, Mathieu Lagrange*, Nicolas Misdariis and Arshia Cont)
  • Footstep Detection and Classification Using Distributed Microphones (Kazuhiro Nakadai*, Yuta Fujii and Shigeki Sugano)