BookStudyDigest

Saturday, 30 April 2022

[New post] Gave Google Books’ Publisher service for AI Audiobooks a Try – It has Pros and Cons

multimindpublishing

Apr 30

I got an email inviting me to beta test the Google AI audiobook service. It basically converts the ebooks you have uploaded to Google Books as a Publisher into audiobooks. They have many different voices to choose from: women and men, different accents/nationalities, etc. By the way, through the nature of metadata, Google themselves will probably see this, so: Screw You, Google. Learn how to tell the difference between a dialect and a disability. (For everyone else: I'm Baltimorean, this is why I said that.)

Admittedly, some of the voices sound really, really human-like, such as "Madison". As long as there's no character dialogue, I would most likely believe at first listen that it was a human being – a bit of a bored one, but a human being all the same. With character dialogue, it becomes really obvious.

Now, I have actual human narrators for my works. All my works that are in audio are narrated by living people. I like tech (tho I should probably like it a little less, because I spent so much time trying to edit out the breaths of my poor narrator, Soraya Butler, on Dreamer … because I was used to AI voices and thought "Oh, noes, is it okay if people can hear a narrator breathe normally?" Yes, yes it is. It very is.) but human narration is not going away any time soon. The thing about human narrators is that they can inflect, carry tone and express emotion in ways that an AI simply can't grasp. I've worked with AI, and human emotions are hard to replicate, especially emotional tone. (And the AI doesn't get emotion much either, especially when the emotion plays a role in decision making.) And it showed here in my samplings of the audiobook AIs. The AI got the narration down fantastically … as long as you want a calm, mostly unfeeling voice.

This audiobook idea might be good for people who are putting up literary works that don't really require much emotion and inflection, like a non-fiction work perhaps. For fiction, especially speculative fiction, which is what I do, it might not be too helpful, outside of helping catch remaining sneaky typos as you read your book along with the spoken word. Self-interruptions, multi-character interruptions, trailing off, things like that are not really handled well by the AI. If a character is winding up with anger, the AI will 1000% not convey that. Everything is pretty, welp, flat for the most part. If you have calm prose, this AI route is the route for you.

Unless you use Findaway Voices for distro. Findaway Voices explicitly says in its contract that it will not distribute any AI-narrated works whatsoever, and that it does indeed check. So that means the AI audiobook would only live on Google Audiobooks … which Findaway already distributes to. That's for Findaway and Google to sort out.

Speaking of contracts, Google's AI contract isn't 100% "free and clear". I would have to read it more, but basically they own the voice while you own the words. So if there is any dispute, they can snatch the voice back, pretty much taking down your book. What if Google wants to censor the book somehow? AI uses deep machine learning to say the correct words correctly, so it "knows" what it's saying. What if you write a book and there's a passage about Uighur people? Even if it is just a plain ol' Uighur character sitting in a park eating ice cream, with no mention whatsoever of genocide or oppression? Google has already been caught being sneaky about this stuff. The example that stands out in my mind is from when Hong Kong was taken over by China: if you put "I'm sad for Hong Kong" into Google Translate to translate into Chinese, it would literally say "I'm happy for Hong Kong". The two words "happy" and "sad" are not similar in Pinyin (romanized Chinese) or in either traditional or simplified Chinese characters. But there would have been no way for a person to spot that unless they knew both English and Chinese. Someone at Google had put into the coding, "when someone types in this, put out that". It was only changed back to the accurate Chinese phrase once people pointed it out en masse.

All AI, algorithms, deep machine learning, all code everywhere is just plain 1s and 0s. All they do is execute orders, with no independent thinking whatsoever. If you type in the code, "every time someone says 'hello', jump three times and chirp", that's exactly what the code is going to pump out. Unless you made an error in the code somewhere, the tech is going to jump three times and chirp when someone says "hello". It isn't because the tech is an English speaker by nature but because someone told the tech, "when you hear this particular string of sound, this is how you react". It could be Japanese, it could be Swahili, whatever. It is up to the person coding, not the technology itself.
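
For the code-curious, here is a minimal sketch of that "hello" idea in Python. Everything in it is made up for illustration (the RULES table, the react function, the choice of greetings); it is a toy, not anything taken from Google or any real voice product. The point it tries to show is that the reaction comes from a lookup table a person typed in, and the trigger could just as easily be Japanese or Swahili as English.

```python
from typing import Dict, List

# Hypothetical rule table: a person decides which sounds trigger which reaction.
# The greetings and actions are illustrative assumptions, not a real product's rules.
JUMP_AND_CHIRP = ["jump", "jump", "jump", "chirp"]

RULES: Dict[str, List[str]] = {
    "hello": JUMP_AND_CHIRP,       # English
    "こんにちは": JUMP_AND_CHIRP,   # Japanese
    "hujambo": JUMP_AND_CHIRP,     # Swahili
}


def react(heard: str) -> List[str]:
    """Return the scripted reaction for what was heard, or nothing at all."""
    key = heard.strip().lower()
    # No rule, no reaction: the code never decides anything on its own.
    return list(RULES.get(key, []))


if __name__ == "__main__":
    print(react("Hello"))
    print(react("goodbye"))
```

Swap the entries in RULES and the same code reacts to completely different words, which is the whole argument: the behavior lives with whoever writes the table.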

This means that if you write a book about Black issues and Google feels like suppressing it because, who knows why (maybe because clearly no one at Google really likes reading Algorithms of Oppression by Safiya Noble, or whatever random reason floats across their brain), their AI voice is going to be told either "don't say these parts", "skip these passages/chapters" or "say something else" and, most importantly, "make it look seamless". The average audiobook listener is not reading along with the book in hand. Just like the English/Chinese example above, it only works if you don't know you're being tricked. It wouldn't be difficult to mod things up from Google's side. As long as you're not checking what they're doing, it flies.

Also, who knows, it could get the account, publisher or author flagged/shadow-banned without them knowing it. Or passed over to governments and institutions who are being nosy for really nefarious (and usually oppressive) reasons. Because, remember, the AI "knows" what it is saying. That can be useful: if people try to upload, say, "The Beautyful Ones Are Not Yet Born" as an audiobook and thus steal royalties, the AI can point it out. But it can also flag the book on the back-end as "Talks about Black issues in a way that makes anti-Black people moody, caution alert".

AI is very good at deciphering human words, but it still kind of screws up when it comes to our inflections, accents (again, screw you, Google. Accents are not disabilities (sorry everyone, just gotta throw that in there)), etc. It's not as easy to figure out which works could be a "problem" work when humans are doing the narrating, but when you feed a literal text into the AI and let the AI pump out whatever it pumps out? Wow, so much easier, looking at it from a "nefarious coder" perspective.

Will I be using AI for my works? Maybe just to run a last check on the print versions, listening to the words while reading along with the book so I can catch remaining typos. But I'm not publishing the AI versions, and I'm not going to give the AI that much help where I don't need to. It's just going to be so that I can give my human narrator a cleaner script and get rid of the last typos that somehow eluded me. A faceless tool that will be replaced the second I find a better alternative, in other words.
