A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.



Visualizing PyTorch’s transformer activations

4 minute read


Transformers are a trendy architecture nowadays. Their parallelism and their ability to work with sequences of different lengths have allowed this architecture to achieve awesome results in different fields. Today we are going to learn how to visualize the attention probabilities when using PyTorch’s official transformer modules.

Optimal Remote working

4 minute read


Since the COVID-19 pandemic, remote and hybrid work has become really common. In this post I show some tools for working remotely as a data scientist, ML engineer, Python developer or similar.

Speech separation with voice identity

1 minute read


We usually claim that audio-visual methods perform better than audio-only ones in blind sound source separation. We are going to check the performance of audio-only methods with a simple experiment: coding a U-Net that performs source separation using identity embeddings.

Masks in sound source separation: An ablation

7 minute read


In this blog post I explain how masking works in sound source separation. It covers binary masks and complex masks, and includes an ablation study of their performance in the two-source case.

Tutorial: Audio Preprocessing

5 minute read


Let’s show how to preprocess audio data for use in deep learning. To do so we are going to use two very standard libraries: numpy and librosa.





Calculus II

Undergraduate course, Pompeu Fabra University, DTIC, 2019

Source separation in musical videos via motion analysis

Bachelor's Dissertation, Pompeu Fabra University, DTIC, 2019

Abstract (EN): This project proposes a method for audio source separation of a signal, based on the movements of the players related to that signal. The process is composed of three blocks. The first block computes a frequency analysis of the original signal by Non-negative Matrix Factorization (NMF). The video processing block estimates the velocity signal of each player's movements using two types of video segmentation: the first is based on the motion trajectories of the objects in the scene, while the second uses optical flow and Principal Component Analysis. The last processing block correlates the frequency information with the velocity signals, using four variants of a method based on NMF and Non-Negative Least Squares. Finally, some experiments show the efficacy of the different variants of the audio source separation method.