Page not found. Your pixels are in another canvas.
A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.
This is a page not in the main menu
Inspecting the training data and the results is a usually underrated task that can help us understand how our models work. Today I will show how apparently bad-quality samples generated for lip reading revealed issues with the Grid Audio-Visual Speech Corpus dataset.
Transformers are a trendy architecture nowadays. Their parallelism and their ability to work with sequences of different lengths have allowed this architecture to achieve awesome results in different fields. Today we are gonna learn how to visualize the attention probabilities when using PyTorch’s official transformer modules.
After the COVID-19 pandemic, remote and hybrid work is becoming really common. In this post I show some tools for working remotely as a data scientist, ML engineer, Python developer or similar.
We usually claim that audio-visual methods perform better than audio-only ones in blind sound source separation. We are gonna check the performance of audio-only methods with a simple experiment: coding a U-Net to perform source separation using identity embeddings.
In this blog post I explain how masking works in sound source separation. It addresses binary masks and complex masks. An ablation study on their performance for the two-source case is carried out.
Goal: Audio-visual sound source separation
- Only the source separation functions have been ported, although the project is open source and open to growth.
Tutorial: Loading AudioVisual Content with Nvidia DALI
Goal: Weakly supervised pose detection
Summary: Self-supervised Learning of Interpretable Keypoints from Unlabelled Videos
Goal: Aligning two images depicting objects of the same category
Let’s show how to preprocess audio data to be used in Deep Learning. To do so we are going to use two very standard libraries.
The experience of working remotely in Ecuador!
Published in IEEE MMSP 2020, 2020
Audiovisual dataset of musicians playing different instruments. OpenPose skeletons are provided frame by frame.
Recommended citation: Juan F. Montesinos, Olga Slizovskaia, Gloria Haro (2020). "Solos: A Dataset for Audio-Visual Music Source Separation and Localization" IEEE MMSP 2020 1. https://arxiv.org/pdf/2006.07931.pdf
Published in IEEE MMSP 2020, 2020
Weighted losses applied to a Multi-channel U-Net
Recommended citation: Venkatesh Shenoy, Juan F. Montesinos, Gloria Haro, Emilia Gómez (2020). "Multi-channel U-Net for Music Source Separation." IEEE MMSP 2020 1. https://arxiv.org/pdf/2003.10414.pdf
Published in S&S CVPR21, 2021
Audiovisual singing voice separation
Recommended citation: Venkatesh S. Kadandale, Juan F. Montesinos, Gloria Haro (2021). "Estimating Individual A Cappella Voices in Music Videos with Singing Faces." S&S CVPR21. https://sightsound.org/papers/2021/Venkatesh_Shenoy_Kadandale_Estimating_Individual_A_Cappella_Voices_in_Music_Videos_with_Singing_Faces.pdf
Published in BMVC 2021, 2021
Graph CNN for singing voice separation
Recommended citation: Juan F. Montesinos, Venkatesh S. Kadandale, Gloria Haro (2021). "A cappella: Audio-visual Singing Voice Separation." BMVC 2021. https://arxiv.org/abs/2104.09946
Under review, 2022
Transformer for AV synchronization
Recommended citation: Venkatesh S. Kadandale, Juan F. Montesinos, Gloria Haro (2022). "VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices." Under review. https://arxiv.org/abs/2204.02090
Under review, 2022
AV Transformer for voice separation
Recommended citation: Juan F. Montesinos, Venkatesh S. Kadandale, Gloria Haro (2022). "VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer." Under review. https://arxiv.org/abs/2203.04099
This is a description of your talk, which is a markdown file that can be all markdown-ified like any other post. Yay markdown!
This is a description of your conference proceedings talk; note the different field in type. You can put anything in this field.
Undergraduate course, Pompeu Fabra University, DTIC, 2019
Degree's Dissertation, Pompeu Fabra University, DTIC, 2019
Summary (EN): This project proposes a method for audio source separation of a signal, based on the movements of the players related to that signal. The process is composed of three blocks. The first block computes a frequency analysis of the original signal by Non-negative Matrix Factorization (NMF). The video processing block estimates the velocity signal of each player's movements using two types of video segmentation: the first is based on motion trajectories of the objects in the scene, while the second uses optical flow and Principal Component Analysis. The last processing block correlates the frequency information with the velocity signals, using four variants of a method based on NMF and Non-Negative Least Squares. Finally, some experiments show the efficacy of the different variants of the audio source separation method.