Can AI help us speak to animals? Karen Bakker interview
This is an audio transcript of the Tech Tonic podcast: ‘Can AI help us speak to animals? Karen Bakker interview’
Karen Bakker
The ability to record animal sounds and use this to try to communicate with them seems very tantalising. You know, a lot of people have a deep interest in nature and a sort of deep yearning to maybe understand what other species might be saying. But there are great risks that I think we should be wary of.
John Thornhill
That’s Karen Bakker, the eminent Canadian academic and author of a book called The Sounds of Life, about how digital technology is helping us to understand the non-human world. Karen was a leading voice in the field known as bioacoustics. In fact, her book was what inspired us in the first place to make this latest series about using AI to one day speak with animals. But while we were putting the series together, we learned the tragic news that Karen had passed away. We used extracts from her interview in the previous two episodes, but the whole conversation was so interesting, we wanted to share it with you.
[MUSIC PLAYING]
John Thornhill
I’m John Thornhill, the Financial Times innovation editor, and this is Tech Tonic. The interview you’re going to hear with Karen wasn’t my first. I spoke with her on stage at the FT Weekend Festival in Washington, DC in May this year, where she mesmerised the audience with her astonishing expertise, rare eloquence and some spine-tingling recordings of bats and whales. As her colleagues at the University of British Columbia have said, Karen’s immense energy and passionate intellect will be dearly missed. This interview, broadcast with the blessing of Karen’s family, is our own tribute to a remarkable woman.
John Thornhill
In your book, you start off with this wonderful image of a planetary conversation that has been going on all around us for a very, very long time, which we haven’t been able to listen in on. Could you just tell us a bit about that planetary conversation and how we are now able to listen in, thanks to these machine tools?
Karen Bakker
So if you pick up your smartphone, we think of this as a device for communicating with other humans, and it is. But the same technology that’s in your smartphone, including microphones and accelerometers, is now being used to track and monitor the lives of other creatures on this planet. So if you think of the internet of things being ubiquitous in our lives, imagine the internet of earthlings, in which we extend monitoring and sensing capacity to all living things on the planet. What this is doing is revealing a remarkable set of capacities amongst other species. And one of those capacities is complex communication, much of which occurs at frequencies which we are not able to hear. So much communication in nature occurs in the high ultrasound above our hearing range or in the deep infrasound below our hearing range. At the high frequencies, moths, mice, beetles, rats, tarsiers, these little primate cousins of ours; at the low frequencies, whales, elephants, tigers, even peacocks: all are communicating in sound which we simply cannot hear. And because humans tend to believe that what we cannot perceive does not exist, these sounds have largely passed us by until the advent of digital bioacoustics, which uses pretty much the same tech in your smartphone to listen to these hidden sounds of nature. And we’re uncovering some marvellous things.
John Thornhill
And you have this kind of wonderful analogy in your book that it’s as though our radio frequency has only just been set on one dial and now we can move along the dial. Is that right? Could you just explain that to our listeners?
Karen Bakker
Yes, so Leroy Little Bear, who’s a Blackfoot philosopher, has this wonderful insight that the human mind is fixed and narrow, as if we were just listening to one station on the radio dial, inattentive or unaware of all of the other species communicating in a range of other frequencies. And if we could just turn the dial and tune into those other stations, we’d be able to access remarkable types of communication across the acoustic spectrum and across the tree of life. And that is what digital technology allows us to do.
John Thornhill
This revolution in bioacoustics has been fuelled by two changes. One is the ubiquity of microphones, which are collecting data on a scale we’ve never seen before. And two is the application of machine learning systems to that data. So could you just walk us through both of those? First of all, how do we harness the data in the first place? And then secondly, how are we getting cleverer at analysing that data?
Karen Bakker
Yes, so the use of digital bioacoustics simply accelerates something humans have long done. Listening to nature is, of course, an ancient art. It doesn’t require digital tech. But what digital listening devices do is enable the automated, low-cost, ubiquitous and continuous recording of nature sounds, even in remote environments. And that creates an abundance of digital acoustic data. So scientists are installing large arrays in West African forests to listen for forest elephants, or deep in the ocean to listen for whales. With these very large datasets comes a new challenge, a wonderful new challenge. There’s simply too much data to analyse manually, but with the aid of machine learning, a specific form of artificial intelligence, we can automate the analysis of this data and search for patterns, acoustic patterns, communicative patterns, and build dictionaries.
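To make that detection step concrete, here is a minimal Python sketch of the kind of automated pipeline described above: scanning a long field recording for energy in the infrasound band where forest elephant rumbles sit. The file name, frequency band and threshold are illustrative assumptions, not details from the interview.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

# Hypothetical recording from a forest monitoring array (mono WAV assumed).
rate, audio = wavfile.read("forest_array_recording.wav")

# Compute a spectrogram: acoustic power at each frequency over time.
freqs, times, power = spectrogram(audio, fs=rate, nperseg=4096)

# Forest elephant rumbles sit largely in the infrasound, roughly 10-40 Hz.
band = (freqs >= 10) & (freqs <= 40)
band_energy = power[band].sum(axis=0)

# Flag time windows where low-frequency energy spikes well above the median:
# candidate calls for a human annotator (or a trained model) to label.
threshold = 5 * np.median(band_energy)
candidates = times[band_energy > threshold]
print(f"{len(candidates)} candidate call windows flagged for review")
```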
Karen Bakker
So scientists are building a dictionary of sperm whale bioacoustics. They sound a lot like Morse code. You can use the same principle to study lots of other species. So there’s now an elephant dictionary with thousands of elephant sounds. And once you begin to parse those patterns and associate them with behaviour, you start building up an understanding of what the communicative regimes of other creatures look like. So we know, for example, that elephants have a specific signal for honeybee. Elephants will make very specific sounds when in oestrus, or when in a bonding ceremony at the birth of a new baby. So this is pulling back the curtain, if you like, on the complex social lives of these animals, as we begin to understand the patterns in their communication, many of which escaped us before the advent of these automated tools that essentially allow us to see the landscape of communication of another creature at scale.
John Thornhill
Now we’ve been arguing for a very long time about the nature of language, what it is and how we communicate and so on. And I thought it was very interesting the phrase you used just now that you were talking about the communicative regimes of different species. Are they using language?
Karen Bakker
So the question of the definition of language is a thorny one. And philosophers, linguists, psychologists will all weigh in. Many will argue from those disciplines that there is something unique about human language. Be it the symbolic content, the ability to abstract concepts, you know, sort of combinatorial capacities. Scientists who study bioacoustics don’t actually talk about this definition of language very much. And that is in distinction to where the debate was 20 or 30 years ago when we sought in a very anthropocentric way to assess the communicative abilities of other species by seeing if they could learn human language or if they had some kind of language that was like ours. This new generation of scientists is interested in the communication of other species on their own terms, from their own worldview, or umwelt, the sort of embodied experience of being in the world. So one could argue that the question of language is actually slightly irrelevant. That is, we can deeply appreciate and learn much about the lives of creatures like bats or elephants by immersing ourselves in their communicative worlds.
Karen Bakker
Now it turns out that as we do so, we’re realising that their communication is far more complex than we realised, and that the ability to sense complex ecological information conveyed through sound extends much more widely across the tree of life, to insects and even plants, than we realised. So we may one day arrive at a moment where we redefine the concept of language in less human-centred terms, but in a way we’ve transcended the debate. We can appreciate the marvellous, complex communication of other species without really needing to, if you like, hit the wall of whether or not we would include them in the human perspective of language. Now, one thing I’ll say just about that: there’s a wonderful quote from Wittgenstein, who says, you know, if a lion could speak, we could not understand him. And of course, Nagel argued that even if bats could communicate, we’d never understand them, because we do not live like a bat. We don’t fly or roost upside down. We’re not embodied like a bat. But they wrote those arguments before the advent of digital technology. You and I could never echolocate like a bat, trumpet like an elephant, buzz like a bee, but our computers can. And they offer us these translation devices that were not available when many of these arguments about the uniqueness of human language were formulated. So I argue in the book that those debates are being destabilised by these new digital technologies.
John Thornhill
And you’re rightly warning against the dangers of anthropomorphising, of thinking it’s all about us. But are there examples we know of where interspecies communication is already happening, between plants and animals or between animals themselves?
Karen Bakker
Yes, there are examples of interspecies communication. So in nature, silence is an illusion, because sound is everywhere. If you could hear at all these frequencies, you would hear the sounds being generated by plants, which many insects can hear but we cannot. You would hear the sounds of our planet, which makes infrasound thanks to volcanoes and earthquakes. So there are all sorts of low sounds that you and I cannot hear, but some animals can. Just as we cannot see in the ultraviolet, we cannot hear in a lot of these frequencies. So there’s an enormous amount of acoustic information circulating, and some of it allows communication within species, some of it between species. One example would be the acoustic tuning between predators and prey. Bats will use sound to echolocate when they hunt moths. But moths can also jam bat sonar. There’s acoustic tuning between pollinators and pollinated plants. There’s also really interesting evidence of acoustic sensing between plants and insects. Heidi Appel at the University of Toledo has done some wonderful, very ingenious experiments showing the degree to which plants are sensitive to the sounds of insects. They can discern, with a high degree of acuity, non-threatening environmental sounds like rainfall from threatening ones, like the sound of a predatory insect chewing on plant leaves, and they only release defensive biochemicals when needed. So all of these are examples of the transmission of complex ecological information between species that we’re only just waking up to.
John Thornhill
One of the things that comes out in your book as well is that our ancestors were far more attuned to sound than we are, and that’s also true of a lot of indigenous peoples. I mean, we have moved, as you describe, from relying on oral communication to visual communication, by reading and so on. But you talk about a resurgence of an oral culture. Could you just explain that to us?
Karen Bakker
So western science and western culture tend to privilege sight over hearing and the visual over the acoustic. To some extent, that can be attributed to a long evolutionary process of us as terrestrial creatures, but more recently to a cultural process whereby the advent of the printing press and the channelling of culture into written texts led us to privilege textuality over orality. With that, what may be latent capacities to listen have been attenuated. We can really get a sense of this when indigenous knowledge reveals that human listening capacities can be cultivated to be far more sensitive than we experience in western societies. So in the book, I talk about some wonderful examples in Brazil, where specific tribes cultivate the art of listening from an early age, and their sense of hearing allows individuals to hear the sounds of, let’s say, fish in an environment where those sounds would be completely inaudible to western anthropologists.
Karen Bakker
Similarly, there is evidence of humans being able to echo-range, particularly humans born blind. Contemporary humans in western society do not cultivate these capacities, which may be dormant or may have simply vanished. If we look back further in evolutionary time and along the tree of life, we see that some of our primate cousins, like tarsiers, these tiny creatures with pointy ears and very cute, large round eyes, can listen and communicate using ultrasound. So it may be that our human ancestors once had this capacity, which we have since lost. So digital listening devices function like a hearing aid or a prosthetic that extends our hearing capacity beyond the sensory limitations of our bodies. And at the same time, as I note in the book, digital listening is not the only way to access these sounds, as indigenous communities show us. There are other forms of deep listening that are also very powerful ways of revealing the hidden sounds of nature.
John Thornhill
When I interviewed you on the phone a while back, you came up with this wonderful phrase that sonics is the new optics, but in a double sense. I mean, one is that you’re in a way saying that sonics is doing for us now what optics once did. And do you think we’re moving into a world where we combine both sonics and optics in a way that we haven’t been able to before?
Karen Bakker
Yeah, let me explain that a little bit. So the importance of digital bioacoustics can only be understood in a historical context. And let me give you a historical analogy. About 400 years ago, the inventors of the microscope were astonished to discover the microbial world. Van Leeuwenhoek originally kept his discovery secret for fear of being thought crazy. And for a time, he was. At that time, the early inventors of the microscope had no idea their invention would eventually lead to the discovery of DNA and the ability to decode and manipulate the code of life. At around the same time, the inventors of the telescope were gazing up at the stars, not realising that their device would one day allow humanity to look back in time nearly to the origins of the universe. Optics was profoundly important for the scientific revolution, but also for decentring the concept of the human within the solar system and within the cosmos, with a number of, of course, cultural and religious implications.
Karen Bakker
In The Sounds of Life, I argue that sonics offers the equivalent of optics insofar as it decentres humanity within the tree of life. We’re only at the beginning of exploring this vast new scientific terrain, but it’s already revealing some hugely surprising insights about the large number of species that communicate which we thought were mute and deaf. It’s also revealing the existence of complex ecological information, and much more complex social lives in other creatures, than we previously realised. And as we discussed, it’s also challenging notions of human uniqueness about language. All of this decentres humanity from our supposedly privileged position within the tree of life. And sonics techniques can be applied to many other fields. Your listeners might be aware of the growth of sonification of data, the very beautiful work that’s being done to sonify data, to listen to the universe in physics. So there are many, many applications of sonics that go beyond biology or ecology and apply across the sciences, given the sort of primordial importance of not only acoustics but also tremology and biotremology, the study of vibrations. So this is leading to, I think, a very exciting new set of scientific advances that may turn out to be as important as optics was several centuries ago.
John Thornhill
I’d like to move on to the AI side of this. Could you explain to us how are you using machine learning tools to deepen our understanding of what’s going on in bioacoustics?
Karen Bakker
So imagine using a set of digital recorders out in the ocean to record whale sounds. You’re probably familiar with humpback whale songs, these long, beautiful songs that they sing. But many other whales make sounds. For example, sperm whale bioacoustics sounds a lot more like Morse code. Now, if you were to listen in on these recordings, it would be quite confusing, in part because the whales are all singing or clicking at once. How do you know which whale is making which sound? How do you know what they’re doing when they make the sounds? It’s all happening deep underwater. But with the advent of these digital listening tools, combined with machine learning or artificial intelligence to analyse the data, we can do a few things. First of all, the machines can identify which animal is making which sounds. They each have a voiceprint, much like you do. Algorithms can also associate the specific sounds being made with specific movements, if the whales are also carrying biologgers. Those devices, little suction-cup devices you can put on the whale, are remarkable. They can detect the whale’s depth, pitch and roll. They’re so sensitive, they can sense whether the whale is moving its flukes. So you have this amazing picture, which wouldn’t be accessible to humans, of exactly how the whale is moving and where it is when it’s making different sounds. And then you can start associating these patterns. So what exactly is the whale doing when it makes this set of sounds? Oh, those are greetings. What exactly is the dolphin doing when it’s making those sounds? Oh, those are echolocation clicks used for hunting. We can also decode individual vocal signatures that function much like names, and then scale out and identify dialects. Because each family of cetaceans, let’s say in the case of orcas or humpback whales, will have a specific dialect, a set of patterns that is unique to its family group, that is passed down from one generation to the next, that does evolve over time, but that is a marker of culture. And the algorithms can also detect those patterns.
Karen Bakker
So because the algorithms are very powerful pattern recognition machines, they can then track those patterns in real time and give you a sense of what the whales are doing and where. All of this would be something that would take humans hundreds of years. The humans can label and interpret the data, which the AI of course cannot do; but once they sort of inject meaning into the data by labelling it, the AI can then automate the pattern recognition. And what we get is this beautiful sense of the movescape, the sonic movescape, of these creatures, which helps us understand what they’re saying, to whom, in what circumstance and when. And that’s a building block for something scientists hope to achieve with AI, which is decoding animal communication. And if there is complex communication, something akin to human language, the breakthrough will likely come from using AI to study species that are large-brained, highly social and long-lived, like whales or elephants. And indeed there are a number of teams, really interdisciplinary teams bringing together computer scientists, machine learning experts, linguists and biologists, trying to decode other species’ communication patterns. And I think we’re going to see some amazing breakthroughs in the next 5 to 10 years on this front.
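As a toy illustration of that label-then-automate workflow, here is a sketch in which human-annotated whale codas train a classifier that can then sort unlabelled recordings at scale. The features and labels are random stand-ins, and none of this is the actual tooling the research teams use.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data: each row pretends to be an acoustic feature vector for one
# sperm-whale coda (e.g. inter-click intervals), with a human-assigned label.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 12))            # placeholder features
labels = rng.choice(["clan_A", "clan_B"], 500)   # placeholder human labels

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, random_state=0
)

# Train on the human-labelled sample; the model then automates the pattern
# recognition across archives that would take people centuries to hand-sort.
model = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```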
John Thornhill
You’re talking about AI systems as being pattern recognition machines, but as you say, they can also be pattern replication machines. So that gives us the ability, we think, to communicate back. How close do you think we are to communicating with sperm whales, using the dictionary that we’ve been able to build?
Karen Bakker
So for context for your listeners, because this is a little abstract: imagine a zoological version of Google Translate where, instead of options for Spanish or Cree or Inuktitut, you have an option for sperm whalish or bottlenose dolphin or east African elephant. That is, you could type something in in English, and then sounds would be produced in the other language. Now, we have not yet invented that, but scientists hope that one day soon we will, and that would allow automated playback experiments. Right now, the methods we use to determine whether we understand the different vocalisations of other species are manual playback experiments. A good example: we think we’ve figured out what sounds female elephants make when they’re in oestrus, and in some species they’re only in oestrus once every four years. So the males, which don’t live with the females, are very attentive to these sounds and will come very quickly across long distances if they hear them. And the way you test that is you play that particular sound from a speaker in the savanna and see if the male elephants show up. You can do similar things for, let’s say, the honeybee alarm call, to which elephants exhibit a very specific behaviour, bunching all together and showering each other with dust. But those playback experiments are very difficult to conduct, very, very slow and painstaking.
Karen Bakker
Imagine if you had an automated device that could not only respond to human commands in real time to make sounds, but automatically pick up the sounds being made by these creatures and then play the appropriate sound back to them. We are on the brink of being able to do that with bats and elephants and some species of whales. The problem is that we run the risk of exactly the trap that generative AI is posing for human societies: essentially, deepfakes of animal sound, sounds that seem plausible but don’t actually carry the right meaning in the actual situation or context. Generative AI, as anyone who’s used ChatGPT knows, is prone to hallucination. It will confabulate; without appropriate guardrails, it will simply mislead the reader, because it cannot distinguish fact from fiction. So we would need all of those guardrails in place for these non-human communication systems, and those guardrails are going to be much, much harder to figure out. It may be that we build these automated playback systems and then do harm to these creatures; we may mislead them, or it may just sound like gibberish. Hard to know, actually. So I don’t think we should overestimate our capacity to have complex conversations in real time with other species, mediated by digital tech. We may be able to do simple things, like issue better alarm calls, or better interpret the sounds of other species so that we can ensure less interference at certain critical moments like nesting or mating. But I don’t think we’re going to have a zoological version of Google Translate available in the next decade. I think that’s much further off.
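One way to picture the guardrail being described here is a playback loop that only responds to calls whose meaning has passed independent validation, and stays silent otherwise. Every name and the dictionary below are hypothetical; this is a schematic, not a real system.

```python
# Vetted call -> response dictionary. In the proposal above, entries would
# only appear here after independent validation of the underlying dictionary.
VALIDATED_RESPONSES = {
    "contact_rumble": "contact_rumble_reply.wav",
    "honeybee_alarm": None,  # call understood, but no safe response defined
}

def respond_to(call_type: str) -> str | None:
    """Return a playback file only if the mapping passed validation."""
    response = VALIDATED_RESPONSES.get(call_type)
    if response is None:
        # Guardrail: an unknown or unvetted call gets silence, not a
        # plausible-sounding "deepfake" that might mislead the animal.
        return None
    return response

for detected in ["contact_rumble", "honeybee_alarm", "unknown_buzz"]:
    print(detected, "->", respond_to(detected))
```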
John Thornhill
As you are saying, there are a lot of considerations to be borne in mind on this. What do you think are the most important guardrails to make sure that we don’t mess this up?
Karen Bakker
Thank you for asking that. So the ability to record animal sounds and use this to try to communicate with them seems very tantalising. You know, a lot of people have a deep interest in nature and a sort of deep yearning to maybe understand what other species might be saying. But there are great risks that I think we should be wary of, including the potential for deepfakes and for precision hunting or poaching, which would accelerate biodiversity loss, particularly of endangered species. So one of the guardrails I would like to suggest is that this is a technology that should simply not be used except, for the moment, by academic researchers who are subject to ethics review. There’s too much potential for this to be used by poachers to lure elephants or rhinos, for example. There’s too much potential to lead animals astray, particularly as we’ve got this huge problem of climate refugees at the moment, with climate change causing a lot of species to change their habitats and their range distributions. So my personal belief is that the first ethical guardrail applies to who should use it: no one but academic scientists subject to ethics review, not hunters. And the second thing I think is really, really important is that we refrain from doing widespread generative AI playback experiments until we have a much better system in place for ensuring, essentially, an independent validation of the underlying dictionaries.
Karen Bakker
A lot of these tech innovations are kept private. A lot of the harvesting of the data is being done by big tech and private companies. It would be, I think, preferable to treat this data as a sort of commons subject to scrutiny. And I don’t mean open access; I mean a closed commons in which researchers can vet one another’s data to ensure the animals are not being harmed. The third thing that we need to take into account is indigenous data sovereignty. Much of this data is being harvested from lands which are the traditional territories of indigenous communities that have not given their consent for the harvesting of acoustic data, and which already have strong ethical guardrails in place, deeply associated with their indigenous laws and relationships to territory. So their oversight of any experiments that might be undertaken is a third and very important ethical guardrail that I think we need to implement universally.
John Thornhill
This debate reminds me a bit of a parallel debate that’s going on at the moment amongst astrophysicists. You have the search for extraterrestrial intelligence, Seti. But you also have messaging extraterrestrial intelligence, Meti. And some of the people in the Meti world are debating who has the right to communicate on behalf of humanity. If we’re going to send messages out to the rest of the universe, who gets to send those messages? What should they be saying? Is that a similar consideration, do you think, for interspecies communication on Earth?
Karen Bakker
That’s a great analogy, and it’s a much harder problem on Earth, because the advent of this technology means many more people could be attempting these types of communication than could be attempting Meti. So what I think will occur is a much needed overhaul of environmental regulations. For example, hunters are already forbidden from using some kinds of bird-calling devices during hunting season. Those regulations will simply get a refresh, an update, on the one hand. And then I think, more broadly, we need to take a step back and ask a general question about the implications of digital tech for environmental conservation. Because the issue you’ve pointed out with respect to the data actually applies across many fields, not just acoustics: it applies to satellite data, lidar data. We now have a situation where the two big constraints on environmental conservation of the 20th century are being reversed by digital tech. We now have an abundance of data rather than a scarcity of data, and we have many people having access to that data rather than a few experts. But our environmental regulations basically date from the 1970s, and we need to rethink our approach to environmental data in a digital world. There are some really good initiatives under way. The United Nations Environment Programme is putting forward an agenda for digital environmental data that I think will create a broader context in which we can systematically answer these questions for acoustic data, but also for lots of other environmental data.
John Thornhill
I’m very intrigued by the commercial applications of this. In your book, and when I’ve heard you speak as well, you’ve talked about people taking some of the information that we can gain from bioacoustics and applying it in the commercial world, whether it’s the honeybee algorithms that computer coders have used, or attempts to use bionic sounds for cryptography, and so on. Could you just tell our listeners a bit about that? What do you see people doing, and in what ways are people imaginatively using this data for commercial purposes?
Karen Bakker
So there are many, many ways right now in which low-cost automated monitoring of non-human sound could rapidly enhance biodiversity conservation, which is really important because of the massive, unprecedented extinction event we’re experiencing on the planet. One of the examples I love is currently taking place off the east coast of North America in the Gulf of St Lawrence, where highly endangered right whales have been suffering from ship strikes, traffic accidents. Using bioacoustics to triangulate their location, simply on the basis of listening to the whales, we can figure out where they are at any moment in time and transmit that information to ships’ captains in real time. The ship captains then have to slow down, stop or move out of the way. That bioacoustic protection system has led to a remarkable result: not a single North Atlantic right whale has died of a ship strike in that zone since the programme was launched. And it may be the thing that actually saves this species. So if we generalise from that, we can use bioacoustics to create mobile protected areas that sense where an endangered species is on the basis of the sounds it’s making, and then constrain human action to make sure that we’re not disturbing, harming, injuring or killing it. And this new commitment to preserving biodiversity across the oceans means that we’re probably going to see lots of these mobile protected areas in the oceans. They sort of follow the fish, as it were, these protected areas, just hovering over the species and diverting ships as they do so. So imagine a future...
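The triangulation behind that kind of protection system can be sketched in a few lines: hydrophones at known positions hear the same call at slightly different times, and the arrival-time differences pin down the whale. The positions, sound speed and solver below are illustrative, not the actual deployed system.

```python
import numpy as np
from scipy.optimize import least_squares

SOUND_SPEED = 1500.0  # approximate speed of sound in seawater, m/s

# Known hydrophone positions (metres) and a simulated whale call.
hydrophones = np.array([[0.0, 0.0], [4000.0, 0.0], [0.0, 4000.0], [4000.0, 4000.0]])
true_whale = np.array([1200.0, 2600.0])
arrival_times = np.linalg.norm(hydrophones - true_whale, axis=1) / SOUND_SPEED

def residuals(pos):
    predicted = np.linalg.norm(hydrophones - pos, axis=1) / SOUND_SPEED
    # Compare time *differences* so the unknown emission time cancels out.
    return (predicted - predicted[0]) - (arrival_times - arrival_times[0])

# Solve for the position that best explains the observed arrival-time gaps.
estimate = least_squares(residuals, x0=np.array([2000.0, 2000.0])).x
print("estimated whale position (m):", estimate.round(1))
```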