Using machine learning for audio file recognition

Rob Wareing, May 12, 2019


Altissimo has implemented and tested a tool for detecting and classifying modulated traffic noise using a machine learning algorithm. Audio samples are initially selected using the Australian In-service Standard for Engine Brake Noise, which detects engine-braking-like events based on the RMS modulation of a recorded sound. In field testing, this resulted in hundreds of “engine-braking” events being detected every day. Manual review of these audio files and the associated images showed that the majority of events were due to extraneous environmental noise, such as:

  • Wind and rain against the enclosure of the detector
  • Dogs barking
  • People talking
  • Birds chirping

In addition, a large proportion of the detected events were other forms of traffic noise, including:

  • Cars with loud or poorly muffled exhausts
  • Motorbikes
  • Trucks accelerating/decelerating
  • Body slap/rattles from trucks

The large volume of detected events meant that manual review was not practical.

Machine learning for classification

A machine learning algorithm was tested for classifying the detected events. It was implemented with pyAudioAnalysis, an open-source Python library for audio feature extraction, classification, segmentation, and applications, published under the Apache License 2.0.

Training the machine learning tools required a large set of classified events. Detected events were manually classified, and the resulting dataset was used for training.

The inbuilt pyAudioAnalysis train/test tools were used to train classifiers on this data-set. The following classifiers were evaluated (with and without beat extraction):

The SVM_rbf classifier yielded the highest accuracy for the classification of traffic noise events and the exclusion of extraneous environmental noise.
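
As an illustration of this training step, the sketch below shows how an SVM with an RBF kernel might be trained with pyAudioAnalysis. It is a minimal sketch, not the configuration used in this work: the directory layout (one folder of WAV files per manually assigned category), the model name, and the 1-second mid-term window and step are all assumptions. The function name extract_features_and_train is taken from recent pyAudioAnalysis releases; older releases expose the same functionality under different names, so check the installed version.

```python
from pathlib import Path


def class_directories(root):
    """Return one directory per category, sorted by name.

    Assumes a hypothetical layout such as root/cars, root/trucks, ...,
    each holding the WAV files manually assigned to that category.
    """
    return sorted(str(p) for p in Path(root).iterdir() if p.is_dir())


def train_svm_rbf(root, model_name="traffic_svm_rbf", compute_beat=False):
    """Train an SVM (RBF kernel) on the categories found under root."""
    # Imported lazily so class_directories() works without the library.
    from pyAudioAnalysis import audioTrainTest as aT

    aT.extract_features_and_train(
        class_directories(root),
        1.0,                  # mid-term window in seconds (assumed value)
        1.0,                  # mid-term step in seconds (assumed value)
        aT.shortTermWindow,   # library default short-term window
        aT.shortTermStep,     # library default short-term step
        "svm_rbf",            # classifier type
        model_name,           # file name the trained model is saved under
        compute_beat,         # True to add beat extraction features
    )


# Example call (hypothetical path):
# train_svm_rbf("classified_events", compute_beat=True)
```

Running the training once with compute_beat=False and once with compute_beat=True corresponds to the "with and without beat extraction" comparison described above.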


Several months of monitoring were performed and all events classified as engine braking were manually reviewed. This yielded the following data-set:

This was used to retrain the model; again, the SVM_rbf classifier was found to be the most accurate. The correct prediction rates for the two data-sets are shown below:

Category          Data Set 1    Data Set 2
Engine braking    47%           98%
Motorbikes        85%           83%
Trucks            67%           63%
Cars              37%           40%
Inclusion         86%           90%
Exclusion         90%           87%
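
For readers who want to reproduce this kind of figure, the sketch below shows one way a per-category correct prediction rate could be computed from manually reviewed events. It assumes the rate is the fraction of events with a given manual label that the classifier assigned the same label. The label names and the sample data are made up for illustration and are not taken from the data-sets above.

```python
from collections import Counter


def correct_prediction_rates(manual_labels, predicted_labels):
    """Return {category: fraction of its events predicted correctly}."""
    totals = Counter(manual_labels)
    correct = Counter(
        manual
        for manual, predicted in zip(manual_labels, predicted_labels)
        if manual == predicted
    )
    return {label: correct[label] / totals[label] for label in totals}


# Illustrative data only: six hypothetical reviewed events.
manual = ["truck", "truck", "truck", "car", "car", "motorbike"]
predicted = ["truck", "truck", "car", "car", "truck", "motorbike"]

rates = correct_prediction_rates(manual, predicted)
for label, rate in sorted(rates.items()):
    print(f"{label}: {rate:.0%}")
```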

Installation and use

We run pyAudioAnalysis on a Linode server running Ubuntu 16.04 LTS. The library has also been used on macOS and Ubuntu 18.04. Installation is described in the GitHub source repository, and the documentation describes the use of the library in detail.
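
Once a model has been trained, classifying a newly detected event could look like the sketch below. It is an assumption-laden example rather than our production code: the model name and WAV path are hypothetical, and file_classification is the function name in recent pyAudioAnalysis releases (older releases name it differently).

```python
def best_label(probabilities, class_names):
    """Return the most probable category and its probability."""
    index = max(range(len(class_names)), key=lambda i: probabilities[i])
    return class_names[index], float(probabilities[index])


def classify_event(wav_path, model_name="traffic_svm_rbf"):
    """Classify one recorded event with a previously trained SVM_rbf model."""
    # Imported lazily so best_label() works without the library.
    from pyAudioAnalysis import audioTrainTest as aT

    # Returns the winning class index, per-class probabilities, class names.
    _, probabilities, class_names = aT.file_classification(
        wav_path, model_name, "svm_rbf"
    )
    return best_label(probabilities, class_names)


# Example call (hypothetical file):
# label, probability = classify_event("events/event_0001.wav")
```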