---
title: "Audio Classification"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Audio Classification}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = FALSE)
```
## Intro
First, we need to install the `fastaudio` module:
```{r}
reticulate::py_install('fastaudio', pip = TRUE)
```
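To verify that the module is visible before going further, we can use reticulate's standard helper:

```{r}
# should return TRUE once the installation has finished
reticulate::py_module_available('fastaudio')
```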
## Dataset
Download the SPEAKERS10 dataset and set the path to the extracted folder:
```{r}
URLs_SPEAKERS10()
path_dig = 'SPEAKERS10'
```
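Assuming `URLs_SPEAKERS10()` extracts the archive into the `SPEAKERS10` folder of the working directory (which is why `path_dig` is set as above), a quick base-R sanity check:

```{r}
# a few of the extracted .wav files
list.files(path_dig)[1:3]
```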
List the audio extensions that fastaudio recognizes:
```{r}
audio_extensions()[1:6]
#[1] ".aif" ".aifc" ".aiff" ".au" ".m3u" ".mp2"
```
Collect all audio files from the dataset folder:
```{r}
fnames = get_files(path_dig, extensions = audio_extensions())
# (#3842) [Path('SPEAKERS10/f0004_us_f0004_00414.wav')...]
```
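Since reticulate lets `length()` work on the returned list object, we can confirm the file count reported above:

```{r}
# should match the (#3842) shown in the printout
length(fnames)
```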
## Visualize
Load one file as an AudioTensor, inspect its shape, and plot it:
```{r}
at = AudioTensor_create(fnames[0])
at; at$shape
at %>% show() %>% plot(dpi = 200)
```
## Preparing the dataset
fastaudio has an AudioConfig class which allows us to prepare different settings for our dataset. Currently it provides:

- BasicMelSpectrogram
- BasicMFCC
- BasicSpectrogram
- Voice

The Voice config is the most suitable one here, because our dataset contains human voices.
```{r}
cfg = Voice()
cfg$f_max; cfg$sample_rate
#[1] 8000  # upper frequency bound (Hz)
#[1] 16000 # sampling rate (Hz)
```
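The remaining fields follow fastaudio's Voice defaults; the attribute names below are assumptions based on that config, and can be inspected in the same way:

```{r}
# lower frequency bound and number of mel bins (assumed field names)
cfg$f_min; cfg$n_mels
```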
Turn the audio into a spectrogram and crop the signal to 1 second (1000 ms):
```{r}
aud2spec = AudioToSpec_from_cfg(cfg)
crop1s = ResizeSignal(1000)
```
Create a pipeline and see the result:
```{r}
pipe = Pipeline(list(AudioTensor_create, crop1s, aud2spec))
pipe(fnames[0]) %>% show() %>% plot(dpi = 200)
```
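Because reticulate makes the wrapped transforms callable like ordinary R functions (just as `pipe()` is called above), the stages can also be applied one at a time to see what each one does, reusing the objects defined above:

```{r}
sig  = AudioTensor_create(fnames[0])  # raw waveform tensor
crop = crop1s(sig)                    # cropped to 1 second
spec = aud2spec(crop)                 # mel spectrogram
spec$shape                            # channels x mel bins x time steps
```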
## Dataloader
As usual, prepare a dataloader:
```{r}
item_tfms = list(ResizeSignal(1000), aud2spec)

# label = first character of the file name
get_y = function(x) substring(x$name[1], 1, 1)

aud_digit = DataBlock(blocks = list(AudioBlock(), CategoryBlock()),
                      get_items = get_audio_files,
                      splitter = RandomSplitter(),
                      item_tfms = item_tfms,
                      get_y = get_y)
dls = aud_digit %>% dataloaders(source = path_dig, bs = 64)
dls %>% show_batch(figsize = c(15, 8.5), nrows = 3, ncols = 3, max_n = 9, dpi = 180)
```
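It is worth checking which labels the `DataBlock` actually produced; the class vocabulary is exposed on the `DataLoaders` object:

```{r}
# distinct labels extracted by get_y
dls$vocab
```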
## Pretrained model
We will use an xresnet18 model (here with `pretrained = FALSE`). Because our spectrograms have a single channel instead of the three the architecture expects, the first convolution's channel count and its weight tensor have to be changed from 3 to 1:
```{r}
torch = torch()
nn = nn()
learn = Learner(dls, xresnet18(pretrained = FALSE), nn$CrossEntropyLoss(), metrics=accuracy)
# channel from 3 to 1
learn$model[0][0][['in_channels']] %f% 1L
# reshape
new_weight_shape <- torch$nn$parameter$Parameter(
  (learn$model[0][0]$weight %>% narrow('[:,1,:,:]'))$unsqueeze(1L))
# assign with %f%
learn$model[0][0][['weight']] %f% new_weight_shape
```
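A quick way to confirm the surgery worked is to print the first convolution again; it should now report a single input channel:

```{r}
# first conv layer of the stem; in_channels should be 1
learn$model[0][0]
```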
Find a suitable learning rate with `lr_find`:
```{r}
lrs = learn %>% lr_find()
#SuggestedLRs(lr_min=0.03019951581954956, lr_steep=0.0030199517495930195)
```
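Assuming `lr_find()` returns the `SuggestedLRs` object shown above, its fields can be read directly and passed to the fitting call if desired:

```{r}
# e.g. take the steepest-slope suggestion
lrs$lr_steep
```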
## Conclusion
And `fit` with one-cycle scheduling:
```{r}
learn %>% fit_one_cycle(10, 1e-3)
```
```
epoch train_loss valid_loss accuracy time
0 5.494162 3.295561 0.632812 00:06
1 1.962470 0.236809 0.877604 00:06
2 0.801965 0.174774 0.917969 00:06
3 0.391742 0.208425 0.881510 00:06
4 0.243276 0.149436 0.914062 00:06
5 0.174708 0.134832 0.929688 00:07
6 0.142626 0.127814 0.910156 00:06
7 0.131042 0.120308 0.924479 00:07
8 0.121679 0.126913 0.919271 00:06
9 0.118215 0.114659 0.924479 00:06
```
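As a final check, the underlying Learner's `validate()` method recomputes the loss and metric on the validation set; it should roughly reproduce the last row of the table above:

```{r}
# validation loss and accuracy after training
learn$validate()
```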