
The authors have declared that no competing interests exist.

Conceived and designed the experiments: AK. Performed the experiments: AK. Analyzed the data: AK. Contributed reagents/materials/analysis tools: LS VJ. Wrote the paper: AK LS VJ.

Bottlenose dolphins (Tursiops truncatus) produce individually distinctive signature whistles.

The complexity of dolphin vocalisations has long fascinated scientists, and inspired numerous attempts to classify and decode them. Dolphins and other cetaceans produce a wide range of vocalisations, including tonal whistles, clicks, and burst pulses. Several species, including the bottlenose dolphin (Tursiops truncatus), produce individually distinctive signature whistles.

Several studies have shown that human observers can reliably identify individual dolphins from a spectrographic representation of their signature whistles.

Several studies have tried to find methods that classify whistles by other means. A variety of methods have used computer algorithms to classify whistles in the absence of any information on the underlying categories used by dolphins themselves. Examples include the correlation of fixed-point samples and dynamic time warping.

A fruitful field to use as a basis for this effort might be human musical recognition and encoding. “Expert” recognition of musical tunes appears to involve a “lossy” representation of the original signal, i.e., one where much of the data has been discarded. One such representation is the Parsons code, which encodes a melody simply as the sequence of pitch changes (up, down, or constant) between successive notes, discarding absolute pitch and timing.

If the individual information in dolphin signature whistles is preserved under a Parsons-type encoding technique, then we would expect good clustering performance of Parsons-encoded whistles, since most machine-learning algorithms can be expected to benefit when the input data are pre-processed to include only relevant features.

We reanalysed the same data described in Sayigh et al.: a set of 400 signature whistles recorded from the free-ranging Sarasota dolphin population.

We also reused the visual clustering from 10 inexperienced human observers (i.e., unfamiliar with the data set), as described in the same study. Each observer was asked to group spectrograms of all 400 whistles into classes by frequency profile similarity, without having any information indicating how many individual dolphins were represented, how many whistles there were for each dolphin, or what guidelines should be used for grouping similar whistles. For the automatic clustering, we extracted the whistle frequency profiles obtained by sketching the course of the dominant frequency manually on the spectrogram, using custom visualisation software to assist manual whistle tracking. This provided a set of time-frequency points of variable length, depending on the duration of the whistle. We then filtered these data using a cubic-spline technique.
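The smoothing step can be sketched as follows. This is an illustrative Python analogue, not the authors' original Matlab code; the function name, the smoothing parameter, and the choice of 60 output points are our assumptions.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def smooth_contour(times, freqs, n_points=60, smoothing=None):
    """Fit a smoothing cubic spline to a manually traced whistle contour
    (irregular time-frequency points) and resample it at n_points
    equally spaced times."""
    times = np.asarray(times, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    # k=3 gives a cubic spline; `smoothing` trades fidelity for smoothness
    spline = UnivariateSpline(times, freqs, k=3, s=smoothing)
    t_new = np.linspace(times[0], times[-1], n_points)
    return t_new, spline(t_new)
```

Resampling every whistle to the same number of points also simplifies the fixed-length comparisons used by the metrics below.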

We examined three separate metrics for whistle similarity: (1) the correlation metric (CM) suggested by McCowan & Reiss; (2) a dynamic time-warping (DTW) metric; and (3) a metric based on the Parsons code.

For each of the metrics, and each of the clustering algorithms, we measured the success of the clustering assignment using the Normalised Mutual Information (NMI). Let N be the total number of whistles, N_{c} the number of whistles from dolphin c, N_{k} the number of whistles assigned to cluster k, and N_{k,c} the number of whistles from dolphin c assigned to cluster k. Then

NMI = [Σ_{k,c} N_{k,c} log(N·N_{k,c}/(N_{k}N_{c}))] / √[(Σ_{k} N_{k} log(N_{k}/N))(Σ_{c} N_{c} log(N_{c}/N))]

NMI takes the value 1 for a clustering that perfectly recovers the true classes, and 0 for a clustering that is independent of them.
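The NMI can be computed directly from the identity and cluster labels. A minimal sketch follows, using the common square-root normalisation I(C;K)/√(H(C)H(K)); the exact normalisation variant used in the study is an assumption here.

```python
import numpy as np

def normalised_mutual_information(true_labels, cluster_labels):
    """NMI between true classes (dolphin identities) and a clustering."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    n = len(true_labels)
    classes = np.unique(true_labels)
    clusters = np.unique(cluster_labels)
    # Mutual information from the contingency counts N_{k,c}
    mi = 0.0
    for k in clusters:
        in_k = cluster_labels == k
        n_k = in_k.sum()
        for c in classes:
            n_kc = np.logical_and(in_k, true_labels == c).sum()
            n_c = (true_labels == c).sum()
            if n_kc > 0:
                mi += (n_kc / n) * np.log(n * n_kc / (n_k * n_c))
    # Entropies of the two labellings
    h_c = -sum((np.sum(true_labels == c) / n) * np.log(np.sum(true_labels == c) / n)
               for c in classes)
    h_k = -sum((np.sum(cluster_labels == k) / n) * np.log(np.sum(cluster_labels == k) / n)
               for k in clusters)
    if h_c == 0 or h_k == 0:
        return 1.0 if h_c == h_k else 0.0
    return mi / np.sqrt(h_c * h_k)
```

A perfect relabelling of the true classes scores 1.0; a clustering independent of identity scores 0.0.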

For the correlation metric (CM), we followed the technique suggested by McCowan & Reiss, in which each whistle contour is sampled at a fixed number of equally spaced points and pairs of whistles are compared by correlating the resulting vectors.
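A minimal sketch of such a fixed-point-sampling correlation; the choice of 20 sample points and linear interpolation are our assumptions, not necessarily the original parameters.

```python
import numpy as np

def correlation_similarity(contour_a, contour_b, n_points=20):
    """Pearson correlation between two whistle frequency contours after
    resampling each to n_points equally spaced samples."""
    def resample(contour):
        contour = np.asarray(contour, dtype=float)
        old = np.linspace(0.0, 1.0, len(contour))
        new = np.linspace(0.0, 1.0, n_points)
        return np.interp(new, old, contour)  # linear interpolation
    a, b = resample(contour_a), resample(contour_b)
    return np.corrcoef(a, b)[0, 1]
```

Note that correlation is invariant to overall frequency offset and scale, so two whistles with the same shape at different absolute frequencies score as identical.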

The dynamic time-warping (DTW) metric measures the minimum distance between individual whistles, when the x-axis (time) spacing between data points is allowed to vary freely (see Buck & Tyack).

The left frame shows the original signals on arbitrary time and frequency axes. The right frame shows the red sample having undergone a dynamic time-warping transformation to produce the minimum least-squares distance from the blue sample. Note how the spacing of the points in the curve has been varied.
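The warping illustrated in the figure corresponds to the standard dynamic-programming recurrence. A minimal quadratic-time sketch follows; the squared-difference local cost and unconstrained monotonic warping are our assumptions, and the original implementation may have used a windowed or otherwise constrained variant.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time-warping distance between two frequency contours:
    the minimum summed squared frequency difference over all monotonic
    alignments of the two time axes."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of the three admissible alignments
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Unlike the correlation metric, DTW tolerates whistles of different lengths and locally stretched or compressed contours.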

To calculate the Parsons code metric, we resampled each whistle into 10 equally spaced segments, and recorded whether the mean frequency of each segment was higher (“up”), lower (“down”), or within a tolerance of one pixel (“constant”), relative to the previous segment. We chose 10 segments since this provided a compromise between loss of information (few segments) and convergence on the continuous-time analysis (many segments). This produced a nine-digit, base-3 code for each whistle. In a preliminary investigation, we verified the choice of a 10-segment code by measuring the clustering success (as measured by the Normalised Mutual Information using the k-means clustering algorithm) when the number of segments was varied between one and 25.
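The encoding described above can be sketched as follows. The symbol names and the frequency tolerance parameter are illustrative; the study used a tolerance of one spectrogram pixel.

```python
import numpy as np

def parsons_code(contour, n_segments=10, tolerance=1.0):
    """Encode a whistle frequency contour as a Parsons-type string:
    split the contour into n_segments equal-duration segments and record
    whether each segment's mean frequency is Up, Down, or Constant
    (within `tolerance`) relative to the previous segment.
    Yields n_segments - 1 symbols (9 for the default 10 segments)."""
    contour = np.asarray(contour, dtype=float)
    means = [seg.mean() for seg in np.array_split(contour, n_segments)]
    code = []
    for prev, cur in zip(means, means[1:]):
        if cur - prev > tolerance:
            code.append("U")
        elif prev - cur > tolerance:
            code.append("D")
        else:
            code.append("C")
    return "".join(code)
```

Whistles can then be compared by, for example, the Hamming or edit distance between their codes.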

A number of authors

We used three separate and very different clustering algorithms to exclude the possibility of our results arising from the idiosyncrasy of a particular clustering algorithm; different clustering algorithms may produce different results when applied to the same data.

For hierarchical clustering, we used Matlab's built-in hierarchical clustering functions.
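An equivalent agglomerative clustering step can be sketched in Python with SciPy; the average-linkage method and the maxclust cut are our assumptions rather than the authors' documented settings.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_from_distances(dist_matrix, n_clusters):
    """Agglomerative clustering from a precomputed symmetric whistle
    distance matrix, cut to a fixed number of clusters."""
    # Convert the square matrix to the condensed form linkage expects
    condensed = squareform(np.asarray(dist_matrix), checks=False)
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Because the input is a distance matrix rather than raw feature vectors, the same routine works unchanged for any of the three metrics.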

We also used an unsupervised neural network clustering algorithm based on the Adaptive Resonance Theory (ART) approach.
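The following is not the ART network used in the study, but a heavily simplified illustration of its core loop: each input is compared against existing category prototypes, and either updates the best-matching prototype (resonance) or founds a new category when no prototype passes the vigilance test. Real ART networks use complement coding and separate choice and match functions.

```python
import numpy as np

def art_like_clustering(samples, vigilance):
    """Simplified ART-style online clustering. `vigilance` here acts as a
    maximum prototype-to-sample distance (illustrative only)."""
    prototypes, labels = [], []
    for x in np.asarray(samples, dtype=float):
        if prototypes:
            dists = [np.linalg.norm(x - p) for p in prototypes]
            best = int(np.argmin(dists))
        if not prototypes or dists[best] > vigilance:
            # vigilance test fails for every prototype: new category
            prototypes.append(x.copy())
            labels.append(len(prototypes) - 1)
        else:
            # resonance: move the winning prototype toward the sample
            prototypes[best] = 0.5 * (prototypes[best] + x)
            labels.append(best)
    return np.array(labels)
```

Unlike k-means or hierarchical clustering, this family of algorithms does not require the number of clusters in advance; the vigilance parameter controls cluster granularity.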

Having generated cluster assignments for all 400 whistles using each of the five metrics (CM, DTW, and three variants of the Parsons code) and each of the three clustering algorithms, we computed the NMI of each assignment against the known dolphin identities.

For analysing the human visual clustering taken from Sayigh et al., we used the same NMI measure.

The success of retrieving identity information from the encoded whistles.

Error bars indicate the standard error of the 100 bootstrapped iterations.

Visual clustering produced near-perfect allocation of whistles to individual dolphins, with NMI values between 0.90 and 0.99 (mean 0.96). All of the automatic metrics produced much lower NMI values.

The panels show ART clustering (left), k-means clustering (middle) and hierarchical clustering (right), using the different proximity metrics. Each box shows the 25^{th} and 75^{th} centiles, with the median indicated as a red line. Whiskers show the extreme values (±2.7σ) using the Matlab boxplot default.

The automatic algorithms produced NMI values between 0.52 and 0.79, and all provided better clustering than the random control matrix. For each of the three clustering algorithms, analysis of variance (ANOVA) showed a significant difference between the encoding techniques (ART: F(5,510) = 20758, p < 0.001), with the other two clustering algorithms showing similarly significant differences.

As noted in previous studies

Although we do not propose that dolphins make use of a Parsons-like comparison of whistles to identify individuals, our bottom-up, or model-based, approach to call categorisation discards the great majority of the information in the signal: the Parsons-type encoding can differentiate at most around 2^{25} combinations (25 bits), whereas a 60-point frequency profile on a spectrogram with a frequency resolution of 128 can differentiate 128^{60} = 2^{420} combinations (420 bits). When reduced by principal component analysis (PCA) to 16 dimensions, this falls to 128^{16} = 2^{112} combinations (112 bits), still far more than the information contained in the Parsons code. As Beyer et al. observed, distance measures become less discriminative as the dimensionality of the data grows, so such drastic dimensionality reduction can itself aid clustering.
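The spectrogram information figures quoted above are simple arithmetic and can be checked directly (shown here for the two profile representations):

```python
from math import log2

# A 60-point frequency profile at 128 (= 2^7) discrete frequency levels:
full_profile_bits = 60 * log2(128)   # 128**60 = 2**420, i.e. 420 bits
# After PCA reduction to 16 dimensions at the same resolution:
pca_bits = 16 * log2(128)            # 128**16 = 2**112, i.e. 112 bits
print(full_profile_bits, pca_bits)   # 420.0 112.0
```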

Fripp et al

Both the ability of dolphins to recognize the modulation pattern of the fundamental frequency and our finding that individual identity information survives a Parsons-type encoding suggest that the overall frequency modulation pattern is a key carrier of identity information in signature whistles.

Cetacean vocalisations are highly varied and presumably also of varying function. To analyse these vocalisations and to determine their significance, it is vital to be able to classify them and distinguish calls with biologically distinct origins or functions. Such distinction is necessary to correlate call types with their associated ethological function. This process is unlikely to be possible unless we can identify elements of the signals that contain information relevant to the animals. To develop and test classification methods we need representative data sets of animal vocalisations. In this study, we used a data set balanced for sample size that came from a very specific but artificial context in which dolphins only produce signature whistles. However, in free-swimming dolphins signature whistles only account for around 50% of whistle production, so classification methods must ultimately also cope with the more variable non-signature whistle types.

We would like to thank Livio Favaro for comments on an earlier version of this manuscript, and Carter Esch for assembling the whistle sample used in this study. We would also like to thank Randall Wells and the Sarasota Dolphin Research Program for the opportunity to record the Sarasota dolphins. Data were collected under a series of National Marine Fisheries Service Scientific Research Permits issued to Randall Wells.