If you’re listening to this and you’re hearing the sound of a slightly old Englishman reading as if he is following his finger along the words on a page, then you are listening to a clone of my voice.
That is one of the latest and rather beautiful things from a company I’ve been using for a couple of years now.
It first started when I revamped this blog and really, really wanted to do an audio version of every blog post. I had a good search round, and this one had the best mix of audio quality and the cleanest WordPress integration. It delivered everything I wanted, so I paid a couple of hundred pounds to get the posh version.
Since then, I have been very happy with it. It offers a kind of fidelity and text-to-audio behaviour that I’ve never met before. It is obviously very Artificial Intelligence driven, and there are a lot of competitors in this market, but there’s a lot of good tech underneath it.
So we’ll split this review up into the experience and any pluses and minuses.
On the experience side, it’s obviously a very fast-moving company. They are keeping themselves cutting edge by constantly adding lots of new features. The text to speech is absolutely phenomenal and more than I could have ever wanted.
Every time I am happy with their service, they bring out a new feature which offers enough value that I’m willing to pay out the money to move up their pricing levels. Currently I’m on their studio Unlimited, which lets me do full conversions with my own voice. I’m very, very happy with it, and I couldn’t recommend it more.
They have a good website and nice integrations with other platforms (I first found them via a WordPress plugin search). The voices themselves run the full spectrum from stock generic voices all the way to custom clones of your own voice. Obviously the pre-done voices are far more polished and sound so close to human that it’s only the fact they don’t auto-adjust to your bad grammar and punctuation that gives them away.
The process of recording and cloning your own voice is an interesting one, and while Play H.T. make the process very easy, there are some lessons learnt that I would give to anyone who is trying to clone their voice.
They get you to read out and record a great big chunk of your own text and upload it, from which they clone your voice. However, once you’ve done that it becomes quite obvious that you have different voices. I have a narration voice, as if I was reading a story to somebody, and then I have my chatter voice for when I’m talking to another human or am on a phone call.
So if I was to add advice, it would be to read out the type of source material that you are going to generate audio from (in this case blog posts) and do a couple of full practice runs before you do your recording, because your ums, ahs and pauses get incorporated into your speech pattern.
The Good 1
1) Their customer service. I don’t know what they’re paying their customer service chat people, but it’s not enough. It is a genuine 24/7 service: I pestered them on New Year’s Day at about nine in the morning and they answered within 30 seconds. They are knowledgeable, consistent, helpful and have the power to make account changes at the financial level.
This has meant that every bad item I have hit they have fixed within 24 hours.
2) You can keep making your voice better by adding extra data to it. Obviously this only applies to their clone service, and currently you can’t append more audio to an existing voice, but you can delete the existing one and re-upload it with more data for free.
That, while a bit long-winded, works just fine. 2
The Bad
1) Their User Interface consistency. The company on the whole feels very driven by its speech engine, which to be fair is completely as you want it to be: that is their big differentiator and the core of their business.
However, I get the slight feeling that they will release a new part of the voice engine or a new feature update, and then give the UI people like 30 seconds to adapt to it and jam it out to live. A bit more Quality Assurance, guys! Consistency between features would help too. Things like the WordPress integration are massively behind the website.
That all needs a bit of polish. It has the feeling of either a very young company or a company where the marketing people have got a death grip on the schedule.
2) Invoicing. Their invoicing is mad. Again, I think this is driven by the fast movement of their internal processes. They double invoice, they lose track of plans, and a feature or two will be half accessible.
And while their customer service is absolutely fabulous, has the power to address these things and does so at high speed, it is something that makes you go “Why?”
The Niggles 3
1) Long Paragraphs. While it has a hard limit of 250 words in a paragraph, there is also a soft limit of about 4 lines. If you go over that, it seems to have to glue the pieces together, and it nearly always repeats a few words in the middle.
2) Haunted recordings. Sometimes the engine has a little moment: as if it is listening to a noise in the background, it goes silent, stopping talking while it listens to ghostly voices. Then it restarts where it left off.
3) Acronyms. It REALLY hates acronyms. It’s fine with the common ones like countries and weights, but give it a computer one or a website and it will do a variety of strange things, which can be different each time.
I’ve tried spaces, full stops, spaces AND full stops, speech marks, everything to get them to behave, but it really sulks with them, which is strange, as the previous version of the editor without the voice cloning had no problem at all.
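For anyone preparing their text by script rather than by hand, the long-paragraph niggle above is the easiest one to dodge before the text ever reaches the engine. What follows is a minimal Python sketch, not anything Play H.T. provide: the function name `split_paragraph` and the 60-word default are my own assumptions (60 words is only a guess at what "about 4 lines" works out to), so tune the number to taste.

```python
import re


def split_paragraph(paragraph: str, max_words: int = 60) -> list[str]:
    """Break a paragraph into shorter chunks, cutting only between sentences.

    max_words is a hypothetical stand-in for the "about 4 lines" soft limit;
    it is not a documented value. A single sentence longer than max_words
    is kept whole rather than cut mid-sentence.
    """
    text = paragraph.strip()
    if not text:
        return []

    # Split after a full stop, question mark or exclamation mark
    # that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)

    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        words = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five six. Seven eight nine."
    for chunk in split_paragraph(sample, max_words=6):
        print(chunk)
```

Each chunk can then go in as its own paragraph. Whether that actually stops the repeated words is something you would need to check by ear, as I have not tested it against the service.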
Conclusion
For the text to speech situation generally, services like this will mean that all but the very best audio narrators should be genuinely worried. But for the rest of us, this latest revolution really starts to deliver on the promise of computer speech. You still have to put an awful lot of effort into getting a very good human-sounding voice, but it is now at a level that is totally practical to use.
More specifically on Play H.T.: they are constantly evolving and improving, and I am very satisfied with them as a platform. Once they get over their slight growing pains and get their consistency sorted, they will be perfect.
- Other than the audio quality and features already mentioned above.
- They actually give you a little bit of conflicting information about how much audio they need: parts of the website say a minimum of 10 minutes and then one to two hours for best results, while other places say a minimum of 30 minutes for best results.
- Niggles are issues that the site currently has that either they don’t have in the standard offering or that I am sure they will fix, so these may be out of date by the time you read them.