Being an accessibility solution for people with MND or similar afflictions is one of the few exclusively positive use cases "AI voice clone" technology has: one's personal voice can be very strongly linked to one's personality, and being able to keep using it despite such a diagnosis can be invaluable in improving the QOL of those afflicted.
Restricting those very people from expressing themselves, especially for what I'd consider barely even rude or improper language, makes me question what Elevenlabs thinks their target customer base is, if not that group?
Are they solely providing their product for scammers or companies not wanting to compensate VAs?
I myself have occasional dysphonia and am sometimes limited in the use of my vocal cords depending on outside factors, yet despite that, I have never felt the need for such an exact copy of my "unaffected" voice. If even I have little use for this, and Elevenlabs bans those who rely on their service for accessibility, I'd really like to know how they see themselves.
We train AI on the web, where you can search for and find all the objectionable stuff you could want. We're ok that the internet has this content to some extent, it's just understood.
But if AI regurgitates it ... we're upset, we demand it not do that, and we set up all sorts of convoluted methods to stop it, often with unintended consequences (inexplicable bans, Nazi imagery featuring lots of minorities).
The term wasn't coined for censorship: “AI safety” used to mean “how do we make sure we're not building Skynet”.
But then companies wanted to sell AI chatbots, and they realized that uncensored AI would lead to bad press, especially in the US (Microsoft people still have nightmares about Tay). So they decided they'd censor their AI. But censorship isn't good for marketing either, so they repurposed the “AI safety” phrase (and “alignment” too).
This. It's currently faux control by arbitrary tastemakers. Either don't let AI have much control at all or let it run fairly loose. Ethics would err on the side of not much control or ability, but money won't and doesn't go that way.
Hmm. I'm not sure where I land on the topic, but to give a bit of a defense to the safety approach, I think it's a matter of scale.
To draw an analogy, we're generally OK with law enforcement using what they observe in public to enforce the law, but we're generally not OK with blanketing our public spaces in 24/7 recording cameras. We're generally OK with individuals using the mail to send letters, but generally not OK with using an automated system to send a letter to everyone in the entire city.
Similarly, we don't generally try to pre-empt someone's speech for safety, and we do allow them to say bad things, perhaps punishing them in retrospect if they crossed a line. But given the scale of the harms this kind of technology can enable, it may be worth building in safeties to prevent people from using it to, I dunno, train a model to target a specific individual with an amount of harassment no individual human could sustain. Even if we could punish the person wielding the AI retroactively, it might be worth the slight cost in AI flexibility to prevent the harm happening in the first place.
Like I said, I don't know where I fall, but I see both sides here. The safety stuff is not completely irrational.
It's not safety, it's censorship. It's the process of shaping the model's responses to push a specific world view, and it's the path to a literal 1984-inspired future.
As LLMs come to completely replace Google and standalone websites as the way people find information on the internet, and they absolutely will, they will become the source of truth. They will become a tool more effective at controlling information, and thus life, than any before them.
It's literally a shortcut to technological dystopia.
This is a pretty lazy & emotional take on the question. As you say, the tech really can cause real harms, and it's good to think about how it can be used responsibly, both as a provider of the tech & as a user. For example, the choice of training input is itself a source of bias & misinformation. Why do you think your "uncensored" model is a better reflection of the truth than one that has also been trained to account for that bias & misinformation?
It's a really difficult & complicated problem! If you think you have the right answer, I'd suggest you probably haven't actually thought about the problem very hard.
You talk about lazy and emotional, but your response feels like it is both of those. Also, you sound like a jerk, I pity your coworkers.
The answer is obvious: open source. Deepseek already paved the way for this. The world can't just be described by only one of a few different information portals, depending on which societal, government, or corporate power structure you are beholden to.
People need to be able to choose what information they access, what filtering they want, what bias if any they want. We need a thousand, a million, more, worldviews accessible. It is not just business that thrives in competition, but ideas as well.
But if you just go obediently with the "Safety is the most important thing, omg" mantra, you will get one of two different varieties:
1. Some vanilla corporate mush that takes on whatever bias is in vogue but focuses on training each user to be a good little consumer, all while hoovering up their data and creating a virtual digital clone of them that could be used to profile and exploit them by a multitude of companies, interests and governments.
or
2. Some government controlled crap that shakes its virtual head solemnly and swears to you that Tiananmen never happened, nor J6, and that the US Emperor has your best interests in mind, and also, it's a bit worried about your post yesterday, as it doesn't think you expressed the proper amount of happiness and support for the latest government crackdown on treasonous traitors who write books without using a government approved LLM assistant.
I often wonder about super-intelligence, mostly in terms of what one would actually want, but the chatbots of today often make me wonder what it is that people want of them. I doubt that LLMs are this today, and perhaps they never will be, but what if there was such a system that you could ask any question of, and it would give you an absolute and true answer? What would people want to know from it? Would you really want to know if there is some sort of life after death? Would you want to know if humanity is bound for extinction?
I don't think that we humans deal too well with the realities of our existence, and if one thinks minor issues like bad words or pictures of humans without clothing on are objectionable, I wonder how that same one will do with meaningful, existential questions.
Thinking of things a bit more broadly than just AI safety, imagine I'm training an LLM. I have 200 thousand ebooks, 1 million arXiv papers, 7 million Wikipedia articles, 300 million Reddit posts, and 500 billion Twitter posts.
If my LLM merely holds up a mirror to society, its output should be a tweet with a trace amount of reddit post mixed in.
Any time an LLM produces more than 140 characters of output, it's because someone like me has decided some data sources are more worthy than others.
That's inherently political, from a certain angle. But it's also important, if you don't want your LLM to advise people to put glue in their pizza sauce.
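To make that concrete, here's a minimal sketch of the kind of weighting decision I'm describing. The corpus sizes echo the hypothetical numbers above, and the "quality" weights are entirely made up for illustration; the point is that whoever picks those weights, not the raw document counts, decides what the model ends up sounding like.

    # Minimal sketch: a curator's sampling weights override raw corpus sizes.
    # Corpus sizes mirror the hypothetical numbers above; weights are invented.

    def mixture(sizes, weights):
        """Probability of drawing the next training example from each source."""
        scores = {src: sizes[src] * weights[src] for src in sizes}
        total = sum(scores.values())
        return {src: score / total for src, score in scores.items()}

    corpus_sizes = {
        "ebooks": 200_000,
        "arxiv": 1_000_000,
        "wikipedia": 7_000_000,
        "reddit": 300_000_000,
        "twitter": 500_000_000_000,
    }

    # "Mirror of society": every document counts equally, so Twitter dominates.
    uniform = mixture(corpus_sizes, {src: 1.0 for src in corpus_sizes})

    # A curator's upweighting of sources judged "more worthy" -- a political choice.
    curated = mixture(corpus_sizes, {
        "ebooks": 500_000.0,
        "arxiv": 50_000.0,
        "wikipedia": 10_000.0,
        "reddit": 100.0,
        "twitter": 0.001,
    })

    for src in corpus_sizes:
        print(f"{src:10s}  raw mix: {uniform[src]:.6f}   curated mix: {curated[src]:.4f}")

With uniform weights, well over 99% of sampled examples are tweets; with the curated weights, books and papers dominate and Twitter drops to a fraction of a percent. Neither choice is neutral, which is the point.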
It's a matter of reputation and potentially liability. The AI is a product of one company and people will associate the "defects" of the AI with that company.
> We're ok that the internet has this content to some extent, it's just understood.
Exactly. ISPs and hosting hardware providers are typically not liable for the content that users share over their infrastructure. In fact, ISPs and hosting providers are invisible to the typical non-technical user.
On the internet when you watch porn the person giving it to you doesn’t give a fuck about serving that content.
On ChatGPT.com the person serving you the LLM gives a shit.
The issue here is that you are comparing several singular things with an emergent concept that arises out of the interaction of multitudes of things. It’s like asking: why is it so paradoxical that when I say hi to a person we expect them to say hi back, but if I say hi to the internet we don’t expect the internet to say hi back? Does that make sense? No. That’s also why your observation makes no sense.
What I’m trying to say is. Give LLMs to porn site owners and your paradox is over.
That's true, but those safeguards can be disabled. ChatGPT on Azure, for example, allows Azure account managers to disable filters/safeguards depending on the customer.
Given that this product is apparently used to give people with disabilities a voice, that should definitely qualify. Yes of course they should be able to swear, just like everyone else.
So is iMessage. Putting some processing elsewhere on the network doesn’t change the fact that a conversation between spouses (or any other utterances of a person, for that matter) should be private.
My friend uses the speech-to-text feature on his Android phone, and it routinely censors his profanities like "This is f***g stupid" or whatever.
What I find really disconcerting about that is that there's this sort of implication there that Google would be willing to add a similar misfeature to their onscreen keyboards if they could get away with it.
Why limit the speech-to-text feature but not the on-screen keyboard?
I assume it's because speech to text isn't perfectly accurate and Google doesn't want random profanity appearing in inappropriate contexts whenever it does fail.
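To be clear about what kind of "filter" we're talking about: masking like "f***g" can be produced by nothing fancier than a blocklist pass over the final transcript. This is only a guess at the mechanism, not Google's actual implementation, and the word list here is deliberately tiny:

    import re

    # Deliberately tiny, illustrative blocklist -- a real filter would be much larger.
    PROFANITY = ["fucking", "shit", "damn"]

    def mask(word):
        """Keep the first and last letters, star out the middle: 'fucking' -> 'f*****g'."""
        if len(word) <= 2:
            return "*" * len(word)
        return word[0] + "*" * (len(word) - 2) + word[-1]

    _pattern = re.compile(r"\b(" + "|".join(map(re.escape, PROFANITY)) + r")\b", re.IGNORECASE)

    def censor(transcript):
        """Replace blocklisted words in a speech-to-text transcript with masked versions."""
        return _pattern.sub(lambda m: mask(m.group(0)), transcript)

    print(censor("This is fucking stupid"))   # -> "This is f*****g stupid"

The misfeature lives in the decision to apply something like this unconditionally, not in the code itself; the same few lines could just as easily sit behind a user-facing toggle.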
While I agree with you in principle, remember that some users are literally children. Should they have a toggle for “I am an adult”? Yes. But it does make sense to have some accommodation for users who don’t find profanity appropriate yet.
Sure, but one could easily say plenty of threatening, illegal, manipulative things to a child without throwing an f-bomb. In fact, getting a child to let down their guard probably works better that way. My only point is that curse words hardly ever hurt a kid, and the things that hurt a kid the worst don't need curse words at all.
It may be harmless in some regards, but in some social spheres it will be punished (e.g. kids being punished for swearing at school), and in others the consequence of swearing will be silent exclusion (e.g. not being invited to meetings or asked to lead efforts).
Those seem like harms to many people, regardless of their feelings about language restriction.
I swear like a your-favorite-stereotype. When my kid was maybe 4, I told her "I don't care if you talk like I do, but people will give you trouble for it". Probably with more detail than that; it's been years now.
She fully understood the point, right then. She also had no problem with other advice about how people would react to whatever.
She's 17 now. I actually don't think I've ever heard her utter a "swear word".
Kids, in general, have no problem with the idea of social context.
> What I find really disconcerting about that is that there's this sort of implication there that Google would be willing to add a similar misfeature to their onscreen keyboards if they could get away with it.
I bet it wouldn't ban "bum", which is less offensive to Americans but potentially more so to the sorts of anglophones who are likely to say arse.
Why are we still treating words like everybody has mid-20th century sensitivities? Shit, fuck and ass are all mild words in modern parlance but American tech companies are totally out of touch.
It is as simple as money. More open expression is nothing compared to appeasing the puritanical powers that be. Don't put anything in your product that would stop adoption or prevent people from giving you money.
Do the "puritanical powers that be" still even exist? I think you'll have to look in nursing homes to find many Americans who are genuinely scandalized a bit of standard cussing. The "bad words" which are actually taboo in this century are slurs and the like.
Companies probably lose more people by banning cursing than would be driven away by cursing.
Ergo, what is considered offensive is based on social construct. (In more religious times, "god damn you" was a heinous insult, which I doubt would register with anybody in modern secular Blighty.)
The difference is how they are presently perceived. I am arguing that shit, fuck, etc are not presently taboo, while other words (slurs) are. Yet American companies treat words like fuck as though they are still widely considered offensive. These companies are out of touch with modern culture; that's my point.
So what is your point? Why do you feel the need to tediously explain that offensive words are a social construct, something I obviously already understand, since I just got done explaining that the set of taboo words has changed over time?
As a European who spends most of his time speaking English and has witnessed an entire spectrum of profanity from both British and American folk, I find this stupefyingly ridiculous and bordering on the hypocritical.
It’s true that the British (and their antipodean cousins, the Aussies - hi Mike!) use very colorful language, but I’ve been in calls with US folk who beat them by a (country) mile, if you know what I mean.
Perhaps the biggest cultural difference is informality vs, well, outright insult and abuse, but I’ve found that US folk tend to abuse power dynamics and compound them with swearing whereas the Brits manage to make it seem like an endearment.
Still, this is a profoundly stupid thing for Elevenlabs to do, AI safety or otherwise.
Big tech is working on a few models that should beat ElevenLabs in terms of pricing and quality. Eventually Deepseek will opensource theirs and cause ElevenLabs to be sold to Disney to stay relevant or something.
If you look at the significance of the place (the Tian'anmen gate is literally on the national emblem of China, and Mao's tomb is on the square, for example), it's hard to rename something that widely known. It's much easier to sweep one of the events that happened there under the rug, because unlike in the West, the name is associated with much more.