I had the privilege of speaking at AWS re:Invent, Amazon’s annual user conference dedicated to all things Amazon Web Services. It was a fantastic opportunity not only to present what we at Trinity Audio do and are striving to do, but also to get the inside scoop on various emerging technologies that will soon find their place in the tech stacks of businesses worldwide.
Naturally, an added bonus for me was the opportunity to make a significant leap into the speaking arena, so to speak (hah!). Along for the ride was our CTO Shlomi Sutton, and we received positive feedback, so who knows – maybe there’s a career in it for me if this audio thing doesn’t pan out?
Just kidding – it was very evident that audio is in full force. Specifically, text-to-speech (TTS) AI and its monetization are getting more and more traction. Businesses and brands want to get more out of audio and see it as another way to create monetized content.
Audio content has grown up but there’s still a lot to figure out with respect to audio advertising
Naturally, not everything is smooth sailing and there are certain challenges to overcome. For all the accomplishments voice assistants and smart speakers have made, the industry is still missing a platform where you can buy ads or upload them programmatically, not to mention buy audio traffic. On top of that, while creating audio content is considered easier than creating video content, it’s still more demanding than typing out a few coherent sentences.
Therefore, any kind of AI-powered TTS solution is highly appreciated and sought after on the content creation side, regardless of whether the creator is a large publisher or an individual blogger.
Monetization-wise, the audio advertising process is very similar to the video one, which makes it familiar and therefore easier to execute. That’s a pretty big advantage for audio, but the more important point for every content creator looking to monetize is that there is not enough audio traffic to go around. So, if you are able to generate or sell it – there are plenty of advertisers out there who would like to have a word with you.
This isn’t limited to the US market; it applies all across the globe, minus Antarctica (probably). Demand is larger than supply, and digital audio advertising is definitely one area where I expect to see a significant increase in investment.
Audio as a commodity
As for text-to-speech developments, the technology is improving all the time across the landscape. There are different formats that are relevant for specific needs, such as the Newscaster reading style tailored to news narration or the Conversational speaking style for a wide variety of use cases. The artificial intelligence behind these services is getting closer to the real thing inch by inch, which is super important because it raises human tolerance for synthesized/mechanical voices.
All of this leads me to believe that full audio content solutions such as Trinity Audio will soon be commodities. In his keynote speech to a crammed venue (the entire conference was jam-packed), AWS CEO Andy Jassy mentioned one very interesting thing:
There are a lot of companies that are looking to update, figuratively speaking, and take their business to the cloud. Moving away from all the robust and largely unnecessary legacy systems is a relief of sorts as it means moving away from skills that are no longer needed. Most of all – it means moving away from the out-of-control costs.
The message didn’t particularly resonate with me at that moment, not until after my second speaking session when someone actually drew a parallel between that modernization trend and the current status of digital audio advances, and in particular – what we at Trinity Audio do.
No false modesty or smug sense of superiority here – people were really digging our presentations. More importantly, they were digging what we are doing, because our solution addresses the demands of all sides of the ecosystem: publishers and content creators, advertisers, and users. You know – the holy triangle of this industry. Fun fact: now you know why we’re called Trinity Audio.
Machine learning as a highlight
With such a huge conference, it’s tough to single out a specific announcement, service, or technology that deserves special mention. I work closely with Amazon and even I was surprised at the sheer volume of the technical foundations of AWS services and the architectural design investments the company is making. These guys never fail to amaze me, and this was no exception.
My particular pick is quantum computing technologies as arguably the next stage of computing. However, I have to say machine learning (ML) was the focus of a lot of conversations. ML is a fantastic thing that can learn from data, recognize patterns, and even make decisions on its own with minimal human intervention.
We get all kinds of feedback from users about our solution, and this helps us deliver a more attuned experience. I expect enhanced machine learning will mostly impact SSML tags, which are a way to control how the audio output is rendered. For instance, they can add pauses between or within sentences so that conversational responses and narration sound more like natural speech.
While the current developments are largely aimed at improving ML on a general basis, looking ahead, there will be improvements aimed at specific areas. For instance, what’s relatable to people in the US isn’t necessarily relatable to people in the UK, even though it’s the same language. Even more, what’s relatable for people in New York and the East Coast isn’t necessarily the same for the West Coast, and so on. While this is not the next phase of ML development, I see it as the next next phase – the one after.
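To make the SSML point a bit more concrete, here is a minimal sketch in Python of inserting pauses between sentences. It is an illustration only, not how Trinity Audio does it: the function name `build_ssml` and the default pause length are assumptions made up for this example, while `<speak>`, `<s>`, and `<break time="..."/>` are standard SSML elements.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences, pause_ms=400):
    """Wrap sentences in SSML and put a pause between each pair.

    `build_ssml` and `pause_ms` are hypothetical names for this sketch.
    """
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")

    # Standard SSML pause element, e.g. <break time="400ms"/>
    pause = f'<break time="{pause_ms}ms"/>'

    # Escape &, < and > so plain text cannot break the SSML markup,
    # and skip sentences that are empty after trimming whitespace.
    marked_up = [
        f"<s>{escape(sentence.strip())}</s>"
        for sentence in sentences
        if sentence.strip()
    ]

    return f"<speak>{pause.join(marked_up)}</speak>"


if __name__ == "__main__":
    print(build_ssml(["Audio is growing up.", "Ads & traffic are next."], pause_ms=300))
```

The resulting string can be passed to any TTS engine that accepts SSML input. A smarter version would vary the pause length by context, which is the kind of decision I expect ML to take over.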
There is a far bigger fish to fry
In terms of long-term strategy, audio is not just about creating a podcast, in the same way that voice is not just about creating skills or actions. These come together and are relevant to every facet of media and almost every business for one simple reason:
people want to engage with this content.
Two and a half years ago, around the time I started Trinity Audio, I attended one of those publisher-centric Digiday conferences. The vast majority of people present looked at me like I had escaped from a mental institution when I mentioned my vision of voice technology and audio content. Today, that same group of people is approaching me and asking for anything and everything.
The shift that started over a year ago has produced an increasing number of people from the “holy triangle” who want to engage in some sort of audio/voice solution. If there’s one grand takeaway from AWS re:Invent 2019 regarding voice and audio, it’s this:
we are past the early adopter stage.
Make sure you’re following me on Twitter for ongoing updates, tips, and industry takeaways!