---
title: "Speech Recognition"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Speech Recognition}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = FALSE)
```

## Intro

First, install the `fastaudio` module:

```
reticulate::py_install('fastaudio', pip = TRUE)
```

## Dataset

Use the TensorFlow Speech Commands dataset (2.3 GB). The code below expects it to be extracted into the `SPEECHCOMMANDS` directory:

```{r}
library(magrittr)
library(fastai)

commands_path = "SPEECHCOMMANDS"
audio_files = get_audio_files(commands_path)
length(audio_files$items)
# [1] 105835
```

## Preprocess

Prepare the dataset and load it into a data loader. Each clip is resized to 4000 ms and converted to a decibel-scaled mel spectrogram:

```{r}
DBMelSpec = SpectrogramTransformer(mel = TRUE, to_db = TRUE)
a2s = DBMelSpec()
crop_4000ms = ResizeSignal(4000)
tfms = list(crop_4000ms, a2s)
```

```{r}
auds = DataBlock(blocks = list(AudioBlock(), CategoryBlock()),
                 get_items = get_audio_files,
                 splitter = RandomSplitter(),
                 item_tfms = tfms,
                 get_y = parent_label)

audio_dbunch = auds %>% dataloaders(commands_path, item_tfms = tfms, bs = 20)
```

Display a batch:

```{r}
audio_dbunch %>% show_batch(figsize = c(15, 8.5), nrows = 3, ncols = 3, max_n = 9, dpi = 180)
```

## Model

Before fitting, change the model's first layer from 3 input channels to 1:

```{r}
torch = torch()
nn = nn()

# use the data loaders created above
learn = Learner(audio_dbunch, xresnet18(pretrained = FALSE), nn$CrossEntropyLoss(), metrics = accuracy)

# set the first layer's input channels from 3 to 1
learn$model[0][0][['in_channels']] %f% 1L

# keep a single input channel of the weights and restore the channel axis
new_weight_shape <- torch$nn$parameter$Parameter(
  (learn$model[0][0]$weight %>% narrow('[:,1,:,:]'))$unsqueeze(1L))

# assign with %f%
learn$model[0][0][['weight']] %f% new_weight_shape
```

## Add callbacks

Weights and biases can be saved and visualized on [wandb.ai](https://wandb.ai/):

```{r}
# log in once, then remove this line
login("API_key_from_wandb_dot_ai")
init(project='R')
```

```
wandb: Currently logged in as: henry090 (use `wandb login --relogin` to force relogin)
wandb: Tracking run with wandb version 0.10.8
wandb: Syncing run macabre-zombie-2
wandb: ⭐️ View project at https://wandb.ai/henry090/speech_recognition_from_R
wandb: 🚀 View run at https://wandb.ai/henry090/speech_recognition_from_R/runs/2sjw3juv
wandb: Run data is saved locally in wandb/run-20201030_224503-2sjw3juv
wandb: Run `wandb off` to turn off syncing.
```

## Conclusion

Now we can train the model:

```{r}
learn %>% fit_one_cycle(3, lr_max = slice(1e-2), cbs = list(WandbCallback()))
```

```
epoch   train_loss   valid_loss   accuracy   time
------  -----------  -----------  ---------  -----
epoch   train_loss   valid_loss   accuracy   time
------  -----------  -----------  ---------  -----
WandbCallback requires use of "SaveModelCallback" to log best model
0       0.590236     0.728817     0.787121   04:18
WandbCallback was not able to get prediction samples -> wandb.log must be passed a dictionary
1       0.288492     0.310335     0.908490   04:19
2       0.182899     0.196792     0.941088   04:10
```

View the dashboard here:

```
https://wandb.ai/henry090/speech_recognition_from_R/runs/2sjw3juv?workspace=user-henry090
```
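
As a side note on the Model step, the weight slicing can be hard to picture. The sketch below reproduces only the shape change with a plain base R array; it does not call fastai or torch. The filter count (32) and kernel size (3x3) are hypothetical values chosen for illustration, and `w` and `w_single` are made-up names.

```{r}
# Base R illustration only: plain arrays, not fastai or torch tensors.
set.seed(1)

# Hypothetical first-layer weights: 32 filters, 3 input channels, 3x3 kernels.
w = array(rnorm(32 * 3 * 3 * 3), dim = c(32, 3, 3, 3))

# Python's channel index 1 corresponds to R's index 2 (R counts from 1).
# drop = FALSE keeps the channel axis, which plays the role of unsqueeze(1).
w_single = w[, 2, , , drop = FALSE]

dim(w)
# [1] 32  3  3  3
dim(w_single)
# [1] 32  1  3  3
```

The result keeps four dimensions with a channel axis of length 1, which is the weight shape a layer with a single input channel expects.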