Using squeezed wav2vec2 to automatically detect owl calls
https://www.seangoedecke.com/animal-call-audio-recognition/
By gfysfm
refibrillator | 1 comment | 3 weeks ago
There are a couple relative advantages of your approach that I feel are notable though:
The squeezed wav2vec2 (SEW) architecture uses Transformer layers and operates directly on the raw waveform, whereas BirdNET first converts audio to a spectrogram and then applies 2D convolution layers (a ResNet-like backbone).
Because the spectrogram step expands the input representation BirdNET has to process, SEW should be considerably more computationally efficient for a given audio classification task (all else held equal).
Plus, simply taking a pre-trained SEW model and training a linear classifier on its embeddings would almost certainly produce a strong baseline. No GPU would be necessary for that.
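That linear-probe baseline might look like the sketch below. It assumes fixed-length embeddings have already been extracted from a frozen pre-trained SEW model (e.g. by mean-pooling its hidden states over time); the synthetic clusters here are a stand-in for those embeddings, not anything from the original post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_linear_probe(embeddings: np.ndarray, labels: np.ndarray) -> LogisticRegression:
    """Fit a linear classifier on frozen audio embeddings (CPU-only)."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(embeddings, labels)
    return clf

# Synthetic stand-in for pooled SEW embeddings: two separable clusters.
rng = np.random.default_rng(0)
pos = rng.normal(loc=1.0, scale=0.3, size=(100, 64))   # "owl call" clips
neg = rng.normal(loc=-1.0, scale=0.3, size=(100, 64))  # background clips
X = np.vstack([pos, neg])
y = np.array([1] * 100 + [0] * 100)

clf = train_linear_probe(X, y)
accuracy = clf.score(X, y)
```

On real data the probe's quality depends entirely on how well the frozen embeddings separate calls from background, which is exactly what makes it a useful baseline.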
P.S. Minor typo - precision and recall are confused here:
> “precision” (how many of the animal calls it notices) and its “recall” (the rate at which it makes accurate predictions).
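For reference, the corrected definitions: precision is the fraction of flagged detections that are real calls, and recall is the fraction of real calls the detector notices. A minimal sketch with made-up counts:

```python
# Hypothetical detector counts on a labeled test set.
true_positives = 80   # real owl calls correctly flagged
false_positives = 20  # background noise wrongly flagged as calls
false_negatives = 40  # real owl calls the detector missed

# Precision: of everything flagged, how much was a real call?
precision = true_positives / (true_positives + false_positives)  # 0.8

# Recall: of all real calls, how many did the detector notice?
recall = true_positives / (true_positives + false_negatives)     # ~0.667
```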
sdenton4 | 1 comment | 3 weeks ago
The gold standard for input features is a PCEN melspectrogram, largely because it gives useful generalizable features, through compression, normalization, and approximate log scaling of frequency features. Learned frontends tend to overfit training distributions badly - someone finally wrote this up recently, but I'm struggling to find the paper on my phone...
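For the curious, PCEN (per-channel energy normalization) replaces the usual log compression with an adaptive gain followed by root compression. A minimal numpy sketch of the standard formulation, assuming a non-negative mel spectrogram `E` has already been computed (in practice `librosa.pcen` implements this, and the parameter values below are illustrative defaults, not tuned):

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a mel spectrogram.

    E: array of shape (n_mels, n_frames), non-negative energies.
    """
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]
    # First-order IIR smoother per frequency channel (the "per-channel" part).
    for t in range(1, E.shape[1]):
        M[:, t] = (1 - s) * M[:, t - 1] + s * E[:, t]
    # Adaptive gain control, then root compression.
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r

# Toy input: 40 mel bands, 100 frames of random energy.
rng = np.random.default_rng(0)
E = rng.random((40, 100))
P = pcen(E)
```

The smoother `M` tracks slowly varying background energy, so dividing by it normalizes away channel-level gain differences before the root compression, which is one reason the features generalize across recording conditions.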
sdenton4 | 1 comment | 3 weeks ago
Lots more I could say here - I've been working on problems in bioacoustics for years - but for now I'll just leave a link to some work on using bird song embeddings from last year.