Are you using Amazon Polly for Text to Speech?

Jan 05, 2020

I’ve been working on an application to make it easier (and faster!) to get your audio files back from Amazon Polly. Instead of processing one script at a time, the application lets you send multiple script files at once.

I’m looking for a few people to test it out. All you need is an AWS account with access to Amazon Polly, and Windows 10. Would you mind letting me know how it goes and what could make it better? I’ll be making the application available in the Windows App Store soon.

Here are the key features:

Select your script files from multiple locations
Easily select the voice and language from all Amazon Polly voices, including neural and standard voices
Add conversational or newscaster voice style to Joanna or Matthew without having to write any SSML
Support for SSML, too!

If you’re interested, please let me know and then I’ll send you the details on how you can test the application.

Thor

5 Replies

Thor Melicher
Author

05/11/20 at 10:29 pm (UTC)

HeroVoice TTS is available now so you can try it out. Download it from the Microsoft Windows Store - https://www.microsoft.com/store/apps/9P7Q072TRWMJ

Daniel Bolia

06/02/20 at 7:44 pm (UTC)

This post was removed by the author

Thor Melicher
Author

06/02/20 at 8:01 pm (UTC)

Daniel,

Here are some things that may help - if it's still not working, feel free to send me a private message.

1) Check that the credentials file doesn't have a .txt extension. If you created and saved the file in Notepad, the .txt is very likely there. To see it, open File Explorer, click View, and select File name extensions.

In the screenshot here, you can see that the credentials file has no extension.

2) Is your folder named .aws? Make sure there is a period in front of aws.

3) If your credentials file contains [default] (which is correct!), enter default without the brackets in HeroVoice TTS.

4) In the credentials file you typically don't need aws_session_token unless you're using an AWS corporate account.
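If you'd rather have a script run the checks in the list above, here is a minimal sketch. It is not part of HeroVoice TTS; the function name `check_aws_credentials` is hypothetical, and it only assumes the standard shared-credentials layout (`.aws\credentials` under your user profile folder, which is where `Path.home()` points on Windows).

```python
import configparser
from pathlib import Path
from typing import List, Optional

REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def check_aws_credentials(profile: str = "default", home: Optional[Path] = None) -> List[str]:
    """Hypothetical helper: return a list of problems; an empty list means the checks passed."""
    home = Path.home() if home is None else home
    problems: List[str] = []

    # Check 3: the profile name is entered without brackets.
    if profile.startswith("[") or profile.endswith("]"):
        problems.append("enter the profile name without brackets, e.g. default")
        profile = profile.strip("[]")

    # Check 2: the folder must be named .aws, with the leading period.
    aws_dir = home / ".aws"
    if not aws_dir.is_dir():
        problems.append(f"missing folder {aws_dir}")
        return problems

    # Check 1: the file must be named 'credentials' with no .txt extension.
    creds = aws_dir / "credentials"
    if not creds.is_file():
        if (aws_dir / "credentials.txt").is_file():
            problems.append("rename credentials.txt to credentials")
        else:
            problems.append(f"missing file {creds}")
        return problems

    parser = configparser.ConfigParser()
    try:
        # utf-8-sig also accepts a file that starts with a byte-order mark.
        parser.read(creds, encoding="utf-8-sig")
    except configparser.Error as exc:
        problems.append(f"could not parse {creds}: {exc}")
        return problems

    if not parser.has_section(profile):
        problems.append(f"no [{profile}] section in {creds}")
        return problems

    for key in REQUIRED_KEYS:
        if not parser.get(profile, key, fallback="").strip():
            problems.append(f"[{profile}] is missing {key}")
    # Check 4: aws_session_token is optional, so it is not required here.
    return problems
```

Calling `check_aws_credentials()` with no arguments checks the `default` profile in your own user folder.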

Let me know how it goes.

Daniel Bolia

06/02/20 at 8:40 pm (UTC)

Hi Thor,

I downloaded your app for testing and have some questions:

How do I apply a lexicon file? I'm dealing with a lot of acronyms and need to use the lexicon list for pronunciation references.
The Remove, Edit and Preview buttons don't seem to be working.
Using your app, is clicking Start Encoding the same as clicking the Synthesize to S3 or Download MP3 button in the Polly console? If so, it doesn't let users preview the audio before generating it, which could use up the account's character count unnecessarily.
As it is designed now, users can't do many things that are possible in the Polly console; for example, customizing the voice quality with standard and Amazon SSML tags.

I may not be using your app correctly; if that's the case, please let me know.

Thank you,

Dan

06/02/20 at 8:46 pm (UTC)

I recommend that you update your instructions to let people know that they will need to use the CLI to create a folder whose name starts with a period (.aws). I haven't used the CLI in a while and was frustrated trying to create the .aws folder using File Explorer or other GUI methods.
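For reference, the folder can also be created without File Explorer. From Command Prompt, `mkdir %USERPROFILE%\.aws` creates it, and if the AWS CLI is installed, `aws configure` writes both the config and credentials files for you. The minimal Python sketch below does the same job; the function name and the placeholder key values are hypothetical, and you would replace the placeholders with your own keys.

```python
from pathlib import Path
from typing import Optional

# Placeholder values only; these are not real keys.
TEMPLATE = """[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
"""


def create_aws_credentials_template(home: Optional[Path] = None) -> Path:
    """Hypothetical helper: create .aws and a credentials template if they are missing."""
    home = Path.home() if home is None else home
    aws_dir = home / ".aws"          # leading period in the folder name
    aws_dir.mkdir(exist_ok=True)     # no error if the folder already exists
    creds = aws_dir / "credentials"  # no .txt extension
    if not creds.exists():
        # Never overwrite a credentials file that is already there.
        creds.write_text(TEMPLATE, encoding="utf-8")
    return creds
```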

Thor Melicher
Author

06/02/20 at 10:16 pm (UTC)

Dan,

Excellent feedback! If I miss responding to any of it, please let me know.

How do I apply a lexicon file?

This is not a feature that is currently supported. I'll add it to the list of future improvements.
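Outside the app, Amazon Polly itself handles acronyms through pronunciation lexicons in the W3C PLS format: you upload one with the `PutLexicon` API and name it in `LexiconNames` when synthesizing. The sketch below builds a lexicon of `<alias>` entries (an acronym spoken as its expansion) and uploads it with boto3. It's a minimal illustration, not a HeroVoice TTS feature; the function names, the region default, and the lexicon name are assumptions.

```python
from typing import Dict
from xml.sax.saxutils import escape

# Header laid out like the PLS 1.0 examples in the Amazon Polly documentation.
PLS_OPEN = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<lexicon version="1.0"\n'
    '    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"\n'
    '    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n'
    '    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon\n'
    '        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"\n'
    '    alphabet="ipa" xml:lang="{lang}">\n'
)


def build_alias_lexicon(aliases: Dict[str, str], lang: str = "en-US") -> str:
    """Hypothetical helper: map each written form (grapheme) to a spoken alias."""
    parts = [PLS_OPEN.format(lang=lang)]
    for grapheme, alias in aliases.items():
        # escape() handles &, < and > so acronyms like R&D stay valid XML.
        parts.append(
            "  <lexeme>\n"
            f"    <grapheme>{escape(grapheme)}</grapheme>\n"
            f"    <alias>{escape(alias)}</alias>\n"
            "  </lexeme>\n"
        )
    parts.append("</lexicon>\n")
    return "".join(parts)


def upload_lexicon(name: str, content: str, region: str = "us-east-1") -> None:
    """Store the lexicon in Polly (per region) under a short alphanumeric name."""
    import boto3  # imported here so build_alias_lexicon works without boto3

    polly = boto3.client("polly", region_name=region)
    polly.put_lexicon(Name=name, Content=content)
```

After uploading, for example as `acronyms`, pass `LexiconNames=["acronyms"]` to `synthesize_speech` in the same region so the aliases apply to that request.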

The Remove, Edit, and Preview buttons don't seem to be working

I'll have to look into this. Here's what should happen: the Remove button drops the selected file from the list, the Edit button opens the selected file in its default application, and the Preview button plays the selected file. Clicking Preview does count against your character usage, because the text has to be sent to Amazon to be encoded. When using the AWS console, I honestly don't know whether Amazon counts a preview toward your character usage.

What happens when you click 'Start Encoding' - is it the same as clicking Synthesize to S3 or Download MP3 button?

It's closer to the latter, clicking the Download MP3 button. My application differs slightly here: it gets around the 3000-character limit in the Amazon console by splitting longer scripts and combining the resulting MP3 files. At this time I don't support S3 buckets because they add more complexity. It's as you say: clicking the 'Start to Encode' button counts toward your character usage.
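The general idea of working around the per-request limit can be sketched as follows: split the script into pieces of at most 3000 characters at sentence boundaries, synthesize each piece, and join the audio. This is not necessarily how HeroVoice TTS does it; the function names and defaults are hypothetical, and the sketch handles plain text only (SSML would need each piece wrapped in its own `<speak>` element). Every piece is a separate billed request.

```python
import re
from typing import List


def split_into_chunks(text: str, limit: int = 3000) -> List[str]:
    """Hypothetical helper: split text into pieces of at most `limit` characters,
    breaking after sentence-ending punctuation where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit has to be cut mid-sentence.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_long_text(text: str, voice_id: str = "Joanna", engine: str = "neural",
                         region: str = "us-east-1") -> bytes:
    """Synthesize each chunk with Polly and join the MP3 data."""
    import boto3  # imported here so split_into_chunks works without boto3

    polly = boto3.client("polly", region_name=region)
    audio = bytearray()
    for chunk in split_into_chunks(text):
        response = polly.synthesize_speech(
            Text=chunk, OutputFormat="mp3", VoiceId=voice_id, Engine=engine
        )
        # Naive byte concatenation; most players accept it, but an audio
        # library would produce a cleaner merged file.
        audio += response["AudioStream"].read()
    return bytes(audio)
```

Splitting at sentence ends keeps each request's audio from stopping mid-phrase, which makes the joins less noticeable.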

Is it possible to customize the voice (type, quality, and SSML tags)?

Yes, it's possible to do so and I provide a couple of different ways.
- If you're not experienced with SSML, you can use the app to apply some of the most common requests I've seen in the forums here: pause duration for commas, volume, and speed (rate of speech). These options are available when you set your file type to 'Text'. It's a global setting, meaning that it's applied to all of your files when you click 'Start to Encode'.
- If you're experienced with SSML, then I assume you have written the markup yourself. I wouldn't want to add my 'global settings' on top of it, as that would likely create errors. On another note, in this version of the app I decided not to create a full-blown SSML editor. That's on my list of future enhancements, as is the ability to assign different voices to different parts of a file (I'm not quite sure how to do that yet, but if you have any ideas, let me know!)
- With 'Text', I'm also assuming that you want consistency without extra work. For example, let's say you want to use a Neural voice (which isn't available in Storyline today) such as Joanna, Matthew, or Lupe, and you want to add the Newscaster style (these are the only Neural voices that support this style as of this writing). If your course has 30+ files, it's time-consuming to add <speak><amazon:domain name="news"> at the start of each file and </amazon:domain></speak> at the end. When doing this by hand in the Amazon console, you also have to contend with the 3000-character limit, as all SSML tags count against it. This is where I think my application shines: it adds the tags to each file and ensures the 3000-character limit won't get in the way.
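To make the 'Text' options above concrete, here is a minimal sketch of how global settings could be turned into SSML for each plain-text file: a `<break>` after every comma, a `<prosody>` element for volume and rate, and an optional `<amazon:domain>` wrapper for the newscaster or conversational style. It is an illustration under assumptions, not the app's code; the function name `build_ssml` and its parameters are hypothetical, and tag support differs between standard and neural voices, so check the Polly SSML tag reference for the voice you pick.

```python
from typing import List, Optional
from xml.sax.saxutils import escape

ALLOWED_DOMAINS = ("news", "conversational")


def build_ssml(text: str, comma_pause_ms: Optional[int] = None,
               volume: Optional[str] = None, rate: Optional[str] = None,
               domain: Optional[str] = None) -> str:
    """Hypothetical helper: wrap plain text in SSML using 'global' settings."""
    body = escape(text)  # &, < and > would otherwise break the SSML
    if comma_pause_ms is not None:
        # Insert a pause of the given length after every comma.
        body = body.replace(",", f',<break time="{comma_pause_ms}ms"/>')
    attrs: List[str] = []
    if volume:
        attrs.append(f'volume="{volume}"')  # e.g. "loud" or "+6dB"
    if rate:
        attrs.append(f'rate="{rate}"')  # e.g. "90%" or "slow"
    if attrs:
        body = f"<prosody {' '.join(attrs)}>{body}</prosody>"
    if domain is not None:
        if domain not in ALLOWED_DOMAINS:
            raise ValueError(f"domain must be one of {ALLOWED_DOMAINS}")
        # Speaking styles for supported neural voices (e.g. Joanna, Matthew).
        body = f'<amazon:domain name="{domain}">{body}</amazon:domain>'
    return f"<speak>{body}</speak>"
```

The result is sent with `TextType="ssml"`. Because the tags add length, a script close to the limit would be split first and each piece wrapped separately.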

What about adding instructions for using Amazon CLI (Command Line Interface)?

Excellent suggestion! Someone else recently suggested that, too, and I'll be adding it soon. I didn't go this route initially because I felt it would add a different kind of complexity for some users (another app to install, more steps to take, and so on). My instructions already feel a bit heavy (at least to me), so I need to think about how to make them as clear as possible.

Again, thank you for your thoughtful questions and for pointing out the challenges you see in the current implementation. My goal is to make the program as useful as possible. We all have different needs in our course development, so that sweet spot is hard to hit, but it's something I want to achieve.
