Wednesday, 16 December 2015

SMM 2015

Welcome to my 2015 Marine Mammal Biennial page. This page contains explanations of some of the work from the poster presented at the 21st Biennial Conference on the Biology of Marine Mammals. I've added some extra material here pertaining to the extraordinarily geeky components of my poster that wouldn't fit in the allotted space, as well as some of the future directions the research will take.


This will be "published" as a regular blog post shortly after the manuscript pertaining to this information is revised. As a general warning to anybody who hasn't had the (mis)fortune of visiting this website, it's very casual.

Some Background...

The goal of my PhD is to look at patterns in the acoustic behaviours of bottlenose dolphins at 30 different locations on the eastern Scottish coast. By understanding baseline behaviour we can eventually investigate large-scale, sub-lethal effects of the planned offshore construction. Do the animals move? Do they stay put? Do they make more foraging attempts (buzzing), or fewer?

The data for my project come from 30 different sites on the eastern Scottish coast. At each site an echolocation click detector (C-POD) has been deployed. At 10 of these locations an acoustic recorder (SM2M, fs=96 kHz, 10/10min duty cycle) has also been deployed. From these data we can look at trends in clicking behaviour from the local dolphin species.



However, there are several dolphin species sharing the same habitat, so we have a problem, particularly with the C-PODs, in determining what species are producing the echolocation clicks. The work I've presented on the poster represents one method for getting at that question.

As is hopefully evident from the map, the area covered by this acoustic survey is MASSIVE. While that's, you know, great and all, it does pose some serious issues on the processing end. One of the biggest is telling which species of dolphin produced the clicks picked up by the C-PODs.

Testing the Model

GAM Validation

For anybody super keen to know about the extensive bits and bobs that went into testing and validating the model, it's below. I'm going to leave the jargon (scientific slang) in on the presumption that anybody that interested will already know what's going on.

Residual plots, below. Looks pretty good to me. Not perfect, but a good step in the right direction.



ROC Performance

The ROC curve below measures the proportion of correctly and incorrectly identified click trains. The blue dots represent the frequency banded click trains and the red dots represent the broadband click trains. The vertical axis is the predicted value for each training point. The threshold (black line) refers to the value above and below which clicks are classified as broadband and frequency banded respectively. In this case the blue dots above the upper black line were incorrectly classified and the red dots below the lower black line were incorrectly classified. All points between the black lines are classified as "unknown". As you can see, the further from the centre (0.5) the threshold values get, the lower the false classification rate. However, greater thresholds also result in higher rates of unclassified clicks. The relationship between these values is shown in the second panel below.

A) The proportion of the "known" training data that has been classified as broadband (above the upper threshold), frequency banded (below the lower threshold) and unknown (between the two lines). B) As the classification threshold increases (the lines move toward the edges), the proportion of click trains that meet the classification decreases. As the classification thresholds decrease (move toward the centre of image A), the proportion of classified click trains increases, but so does the proportion of incorrectly classified click trains (blue click trains classified as red and vice versa).
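The symmetric-threshold rule above can be sketched in a few lines. This is a minimal sketch, not the actual analysis code; the function name and threshold value here are mine:

```python
def classify_train(p, threshold=0.75):
    """Classify a click train from its predicted value p in [0, 1].

    Predictions above `threshold` are called broadband, predictions
    below 1 - threshold are called frequency banded, and anything in
    between is left as unknown.
    """
    if p > threshold:
        return "broadband"
    if p < 1 - threshold:
        return "frequency banded"
    return "unknown"

preds = [0.05, 0.40, 0.60, 0.92]
print([classify_train(p) for p in preds])
# ['frequency banded', 'unknown', 'unknown', 'broadband']
```

Pushing `threshold` toward 1.0 widens the "unknown" band (fewer false classifications, more unclassified trains), which is exactly the trade-off panel B shows.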


I have also run k-fold cross validation with 1/5 of the data held out and 400 replicates. The classifier got a median correct classification rate of ~0.96 when the threshold was set to 0.42. That's pretty good if you ask me, which nobody did. 

Likelihood Analysis for Dolphin Encounters

The above analysis and classification was done assuming that each click train was independent from all the others. In one way this is correct: animals can adjust many aspects of their echolocation systems on the fly. Of particular interest to me is the inter-click interval, which is indicative of the animal's behaviour. So, in that sense the analysis I've done is accurate. However, the assumption fails when we look to include encounter analysis. Let me 'splain.

Dolphins on the eastern Scottish coast don't generally form mixed-group associations. In fact, there is overwhelming support for the hypothesis that bottlenose dolphins are complete, unassailable, assholes. In Scottish waters they are known to kill harbour porpoises, practice infanticide and even take out juvenile pilot whales. Seriously, google it. Thus, it is not surprising that bottlenose dolphins are rarely seen at the same time and place as other local cetacean species. Therefore, when click trains are recorded in quick succession by one of the C-PODs, those click trains were most likely produced by the same individual or group of animals, and all click trains found within 15 minutes of each other (a somewhat arbitrary threshold) comprise an "encounter".
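The 15-minute grouping rule can be sketched like so (a minimal sketch; the function name is mine and the times are made up for illustration):

```python
from datetime import datetime, timedelta

def group_encounters(detection_times, gap=timedelta(minutes=15)):
    """Group click-train detection times into encounters: consecutive
    detections within `gap` of each other belong to the same encounter."""
    encounters = []
    for t in sorted(detection_times):
        if encounters and t - encounters[-1][-1] <= gap:
            encounters[-1].append(t)  # within the gap: same encounter
        else:
            encounters.append([t])    # gap exceeded: start a new one
    return encounters

times = [datetime(2013, 6, 1, 10, 0),
         datetime(2013, 6, 1, 10, 5),
         datetime(2013, 6, 1, 10, 12),
         datetime(2013, 6, 1, 11, 30)]
print(len(group_encounters(times)))  # 2 encounters
```

The first three detections chain together (each within 15 minutes of the previous one), while the 11:30 detection starts a new encounter.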

Using the data from the encounters we can pool the click train classifications to estimate the likelihood that the entire encounter was made of broadband or frequency banded clicks. Here is an example.

Say there was an encounter comprised of 4 click trains. The GAM classifier returns the following values for each click train in the encounter: 0.2, 0.5, 0.9, 0.9.

These values might suggest that within the encounter there was one frequency banded click train, one unknown and two broadband click trains. From a biological standpoint (see above) this is just silly. 

Therefore assuming that these click trains were produced by the same species, what is the likelihood that the encounter was comprised entirely of broadband click trains (1.0)  vs frequency banded click trains (0.0)?



P(Broadband) = 0.2 * 0.5 * 0.9 * 0.9 = 0.081
P(Frequency Banded) = (1-0.2) * (1-0.5) * (1-0.9) * (1-0.9) = 0.004

We then look at the ratio between these two values:

Likelihood Ratio = P(Broadband)/P(Frequency Banded) = 0.081/0.004 = 20.25

This likelihood ratio means that this encounter was 20.25 times more likely to have been produced by a broadband clicking species than a frequency banded clicking species.

Again, we can set minimum likelihood ratio thresholds for classifying encounters as broadband, frequency banded or unknown. For the work that's presented on the poster I chose a minimum likelihood ratio of 5 for broadband clicks and 1/5 for frequency banded clicks. Any encounters with likelihood ratios between those values were left unclassified. This value can, of course, be changed based on the research questions one is asking.
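Putting the worked example and the ratio thresholds together gives something like this (a sketch of the calculation described above; function names are mine):

```python
import math

def encounter_likelihood_ratio(preds):
    """Likelihood ratio P(broadband) / P(frequency banded) for an
    encounter, treating each click-train prediction as the probability
    that the train is broadband."""
    p_bb = math.prod(preds)               # all trains broadband
    p_fb = math.prod(1 - p for p in preds)  # all trains frequency banded
    return p_bb / p_fb

def classify_encounter(preds, ratio_threshold=5.0):
    """Classify a whole encounter from its click-train predictions,
    leaving it unknown when the ratio falls between the thresholds."""
    lr = encounter_likelihood_ratio(preds)
    if lr >= ratio_threshold:
        return "broadband"
    if lr <= 1 / ratio_threshold:
        return "frequency banded"
    return "unknown"

lr = encounter_likelihood_ratio([0.2, 0.5, 0.9, 0.9])
print(round(lr, 2))                              # 20.25
print(classify_encounter([0.2, 0.5, 0.9, 0.9]))  # broadband
```

With the ratio of 20.25 comfortably above the threshold of 5, the example encounter comes out as broadband, even though one of its four click trains would have been called frequency banded on its own.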

What's REALLY cool about this method is that it allows us to classify a larger proportion of the click trains detected by the C-POD units. Where before 50-60% of the click trains were left as "unknown", now it's down to ~5%. It also homogenizes the encounters, such that the detector no longer suggests that an encounter was made of multiple species, which is, again, unlikely.

So hopefully some of you are asking why I didn't include this information when I 1) built the classifier and 2) characterized it. The simple answer is that I couldn't because there were not enough data to do so. For the training data set there were approximately 6 encounters for frequency banded clicks and 2 for broadband. The number of click trains in each encounter also varied wildly. The largest encounter was made up of over 100 click trains, but the majority of encounters contained between 3-5 click trains. This is a statistical nightmare. In the future I plan on adding more training data and building on the classification and verification, but for now this is the best that can be done.

Results, Again

Below is one of the graphs that I wasn't able to fit on the poster, which is a shame because it's important. It shows the number of click trains at each C-POD location. This is what would be seen if the data were left in their unprocessed state.


Original, barely processed C-POD data. Can pick out peaks in dolphin habitat use, but which dolphins?


From this alone you can see that some locations have more click trains than others. Great, cool. But what you can't easily see is that the Latheron (Lat), Helmsdale (Hel), Fraserburgh (Fra) and Cruden Bay units are dominated by frequency banded clicks, while the Cromarty units are completely dominated by broadband clicks. Thus, if you were only interested in one species (or group of species) you could easily mis-identify important habitat.

Data treated with the GAM for encounters. Salmon (it's not pink!) represents probable bottlenose dolphin/common dolphin click trains and light periwinkle (not purple!) represents probable Risso's dolphin/white-beaked dolphin click trains.


Lastly, many researchers choose to report C-POD data in terms of "dolphin positive days" or hours. This gives readers a general idea of what proportion of time animals are around a given sensor. The above graph may be a bit misleading if there were only a few days where dolphins were present but lots of clicks were recorded.
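For the curious, a "dolphin positive days" tally is just a count of distinct detection days (a minimal sketch; the function name and example times are mine):

```python
from datetime import datetime

def dolphin_positive_days(detection_times):
    """Count 'dolphin positive days': the number of distinct calendar
    days with at least one detection, regardless of how many click
    trains were recorded on that day."""
    return len({t.date() for t in detection_times})

times = [datetime(2013, 6, 1, 10, 0),
         datetime(2013, 6, 1, 22, 15),  # same day, counts once
         datetime(2013, 6, 3, 4, 30)]
print(dolphin_positive_days(times))  # 2
```

A day with one click train and a day with a thousand both count once, which is exactly why raw click-train counts and positive-day counts can tell different stories.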

Here is what the processed data look like for broadband and frequency banded clicks. I've also compared them to the raw data (black) to again show how the results of a study like this differ if you are only interested in one type of animal.


 A Time/Space/Species Animation

The next steps in this whole mess. I'm/my funding agency is interested in how bottlenose dolphin acoustic behavior changes over time and space. The work that I've done here is a first step in identifying which species produced the click trains recorded by the C-PODs. Unfortunately, it is unlikely that I'll ever be able to say that the broadband clicks are bottlenose dolphins, because the clicks of bottlenose dolphins and common dolphins are just too similar. But I may be able to use historical data and visual sightings to add some certainty to the data and/or exclude periods when common dolphins were seen.

In the meantime here is an animation showing the relative number and type of click trains recorded by the C-PODs for the 2013 deployment season. This animation should be taken with an extraordinarily large grain of salt as the data have been smoothed and scaled to avoid causing seizures in unsuspecting people. I haven't begun to analyze these time series yet but I'm very much looking forward to it. 
Animation of the proportion of click trains found at each location. These data have been smoothed so as to prevent seizures in unsuspecting viewers.
Lastly, and I've been harping on this for years, we need to consider the noise levels at these locations. In exceptionally loud areas many echolocation clicks will be masked by ambient noise conditions. So if, perhaps, one of these units is placed in the middle of a shipping channel that is also home to a salmon run (e.g. Aberdeen harbour), it may record significantly fewer dolphins than were actually there. To account for this I'm currently building a Bayesian model to estimate occupancy while accounting for ambient noise conditions.








2 comments:

  1. Very interesting reading. In the lovely animation (with anti-seizure smoothing), what does the white in the pie charts represent?
  2. Oh bother, the legend didn't transfer. Sorry about that. The white bit is the proportion of unclassified clicks. This image was made prior to the likelihood analysis.

