Abstract
Understanding the behaviours of free-ranging animals over biologically meaningful time scales (e.g., diel, tidal, lunar, seasonal, annual) gives unique insight into their ecology. Bio-logging tools such as accelerometers allow the remote study of elusive or inaccessible animals by recording high-resolution movement data. Machine learning (ML) is becoming a common tool for automatic classification of behaviours from these types of large data sets. These classifiers often perform best with high sampling frequencies; however, high frequencies also limit the recording duration of archival devices through elevated battery and memory use. In this study we assess the effect of sampling frequency on an ML algorithm's ability to correctly classify behaviours from accelerometer data and present a framework for programming bio-logging devices that maintains classifier performance while optimizing data collection duration. Accelerometer data (30 Hz) were collected from juvenile lemon sharks (Negaprion brevirostris) during semi-captive trials at Bimini, Bahamas, and were ground-truthed to a discrete catalogue of behaviours through direct observation of sharks during trials. The ground-truthed data were re-sampled to a range of sampling frequencies (30, 15, 10, 5, 3 and 1 Hz) and behaviours (swim, rest, burst, chafe, headshake) were classified using a random forest ML algorithm. We demonstrate that as sampling frequency decreases, classifier performance decreases. Best overall classification was achieved at 30 Hz (F-score > 0.790), although 5 Hz was appropriate for classification of swim and rest (F-score > 0.964). For fine-scale behaviours characterised by faster kinematics (headshake, burst and chafe), classification performance was lower across the entire range of sampling frequencies (F-score 0.535-0.846 across 1-30 Hz), though it did not decrease significantly until sampling frequency was < 5 Hz.
We discuss the effects of signal aliasing and recommend that for best classification of fine-scale behaviours, frequencies > 5 Hz are required. However, when seeking to maximise the available device memory and battery capacity and therefore extend deployment duration, 5 Hz is an appropriate sampling frequency for classifying behaviours in similar-sized animals.
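The re-sampling step and the aliasing concern mentioned above can be illustrated with a minimal sketch. It assumes plain decimation (keeping every k-th sample) with no anti-aliasing filter; the abstract does not state which re-sampling procedure the study used, so this is an illustration only. The function names `subsample`, `nyquist_hz` and `apparent_frequency` are hypothetical and not taken from the study.

```python
SOURCE_HZ = 30
TARGET_HZ = [30, 15, 10, 5, 3, 1]  # sampling frequencies listed in the abstract


def subsample(samples, source_hz, target_hz):
    """Keep every k-th sample, where k = source_hz / target_hz.

    Assumption: plain decimation without low-pass filtering.
    """
    if source_hz % target_hz != 0:
        raise ValueError("target_hz must divide source_hz evenly")
    step = source_hz // target_hz
    return samples[::step]


def nyquist_hz(sampling_hz):
    """Highest signal frequency that can be represented without aliasing."""
    return sampling_hz / 2


def apparent_frequency(signal_hz, sampling_hz):
    """Frequency at which a pure tone appears once sampled (folded into 0..Nyquist)."""
    folded = signal_hz % sampling_hz
    return min(folded, sampling_hz - folded)


if __name__ == "__main__":
    two_seconds = list(range(2 * SOURCE_HZ))  # stand-in for 2 s of 30 Hz samples
    for hz in TARGET_HZ:
        kept = subsample(two_seconds, SOURCE_HZ, hz)
        print(hz, "Hz:", len(kept), "samples, Nyquist", nyquist_hz(hz), "Hz")
```

Under these assumptions, a movement component at 4 Hz recorded at 5 Hz lies above the 2.5 Hz Nyquist limit and `apparent_frequency(4, 5)` returns 1, meaning it would be indistinguishable from a 1 Hz motion. This is one way to see why behaviours with faster kinematics may be harder to classify at low sampling frequencies.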