How OpenAI Sells GPT-2 as an NLP Killer App

Originally published in Language Tech Modern anthology, December 2019.

Part of the cover of “Crazy Jazz” by Blind Mr. Jones

So, the thing we’ve been waiting for for the better part of the year just happened: OpenAI has released the full, unabridged version of its infamous natural language processing model, GPT-2.

You know, the one known for being capable of generating texts that look and read just like the ones written by actual human beings. The one OpenAI deemed “too dangerous” for public release. The “Pandora’s Box” that, if misused, would probably flood the world with fake news and otherwise manipulative synthetic content. The one that would make regular copywriters and journalists utterly obsolete. The one that fueled the AI Weirdness blog, after all. You know, that one.
Well, it’s out now, and that’s a good reason to talk about one of its lesser-known aspects.

One of OpenAI’s competitive advantages is that it is a “capped-profit” limited partnership, which means that OpenAI doesn’t have to generate profit per se, but its limited partners (folks like Microsoft) can use OpenAI’s research for their own purposes (products, or solutions within products). At the same time, OpenAI’s primary performance indicators are, more or less, the spread of its technological solutions and the maintenance of its notoriety.

Let’s talk about the latter.

The thing you need to know about OpenAI’s GPT-2 is that its marketing campaign was pitch-perfect. This is how you market a technological product to the unwitting masses. This is how you present a value proposition to those in the know.

The secret is that you mostly don’t talk about the product itself but instead concentrate on its impending impact, the way it will shake things up, “make a difference,” and so on. Hence the “too dangerous” narrative.

This approach creates a mild case of hysteria perpetuated by an eerie mystery, which is mostly built out of a bunch of “um”s and “uh”s and a whole lotta sweet technobabble. The goal is to subtly exemplify what the thing is capable of, leaving enough space for speculation. On the one hand, this approach created a stir in the media, thus guaranteeing a higher degree of visibility than competing products. On the other hand, it signaled to the right people that there is a multi-faceted solution waiting for proper implementation.

Just look at the launch sequence and how it establishes the product. It kicked off with a press release that in essence stated, “oh well, we guess we just made some kind of doomsday device. And we are really concerned about it. And we’re definitely not going to do anything with it, because it would be ridiculous to put it out in the open just like that. So we are just going to keep it under wraps for now.”
What happened next was a tsunami of news pieces that retold the story over and over again and added a few meandering thoughts full of piss and vinegar to the conversation. The story was quickly incorporated into the “fake news” and “deepfake” narratives and firmly contextualized as something definitely “threatening.” As a result, the burning mystery was perpetuated and elaborated.

The next step was to provide a proper showcase of the model’s capabilities while avoiding spoiling the big thing. It is a great sleight of hand: the whole thing is out in the open, but you claim it isn’t. The showcase was an abridged version of the product, presented as a technical paper and a tool for researchers.
It played out perfectly: the presentation was expanded and diversified by multiple points of view on the model, and as a result, more opportunities became apparent.

There were a couple of NLP projects that gave GPT-2 a lot of publicity. For example, Talk to Transformer presented the original use case of GPT-2: text prediction. The way it was presented neutralized any emerging criticism regarding its clumsiness (for example, it couldn’t predict “Austin 3:16”). It was simply freestyling a bunch of stuff; sometimes it was more coherent, sometimes it wasn’t.

Then there was the Giant Language Model Test Room (GLTR), which visualized text analytics and showed the mechanics behind computer-generated and human-generated texts. And then there was the Allen Institute’s Grover, which was designed to expose deepfake texts and also showcase how easily the model can recreate a text with a couple of settings tweaked (except when you ask it to write an essay on Tawny Kitaen, for some reason).

Also, there were numerous blogs that applied GPT-2 to creative purposes, like writing unicorn fiction or generating incongruous names. Blogs like AI Weirdness basically celebrated the model’s bumbling nature.

In a way, this kind of presentation operates on a framework similar to Marvel Comics’ build-up to the arrival of Galactus back in 1966. First, the Watcher shows up and warns about the coming “doom,” and everybody goes, “ah, we’ve gotta do something!” Then you get a harbinger in the form of the Silver Surfer, who wreaks havoc but at the same time continually reminds us that “the worst is yet to come.” The threat is right around the corner now.
And then we get the main event: Galactus, who was just another supervillain in a long line of “ultimate threats,” only ten times bigger, and therefore characterized as ten times more imposing and menacing and dangerous, because the comic’s authors said so.

And that’s what happened. After months of build-up and speculation, in early November the big 1.5-billion-parameter version of GPT-2, trained on 40GB of text, was released to the public. And it was exactly what it said on the tin from the very start, just ten times bigger. However, the momentum generated by the build-up made that less apparent. But there is a catch that makes things even more interesting.

The thing is that “ten times bigger” doesn’t translate into “ten times better,” because the technology doesn’t really work that way. What matters is how it accomplishes specific use cases and how it correlates with solutions for actual practical needs.

GPT-2 is presented as a tool that generates texts so natural-seeming that an actual human being might have written them. That is cool, but you need to understand that the model operates on a finite dataset, and all it really does is derive text out of probabilities and context approximations. Text generation is not really generation. It is more of a regurgitation in sophisticated combinations. GPT-2’s output looks and reads just like the real thing in the sense that the real thing can be bland and passable and not really say anything, just like loads and loads of content on the internet, especially its corporate parts.
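To make “deriving text out of probabilities” concrete, here is a toy sketch of the core sampling loop. This is emphatically not GPT-2 (which conditions a 1.5-billion-parameter Transformer on subword tokens); the hand-written bigram table below is a stand-in for the model’s learned conditional distribution, purely for illustration.

```python
import random

# Hypothetical conditional probabilities: given the previous word,
# how likely is each candidate next word? A real language model
# learns a far richer version of this table from its training data.
BIGRAMS = {
    "hello": {"how": 0.7, "there": 0.3},
    "how":   {"are": 0.9, "is": 0.1},
    "are":   {"you": 1.0},
    "you":   {"?": 1.0},
}

def sample_next(context_word, rng):
    """Sample the next token from the conditional distribution."""
    dist = BIGRAMS[context_word]
    words, probs = zip(*dist.items())
    return rng.choices(words, weights=probs, k=1)[0]

def generate(start, max_tokens, seed=0):
    """Repeatedly sample next tokens until we run out of context."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(max_tokens):
        if out[-1] not in BIGRAMS:
            break
        out.append(sample_next(out[-1], rng))
    return " ".join(out)

print(generate("hello", 5))
```

Every word is chosen only because it was probable given what came before; nothing in the loop knows or cares what the sentence means, which is exactly the “regurgitation in sophisticated combinations” point above.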

What GPT-2 really does is create a serviceable text product that serves a specific, well-defined function. But you can’t really market that, because it is not really appealing beyond those in the know (say hi to BERT). However, that’s the absolute peak of practical NLP.
A perfect example is a conversational interface. You say “hi,” and it says, “hello, how are you?” because that’s how conversations usually go. The other example is verbal interpretation of data, like the stats in Mailchimp, where you can get reports like “22% opened your newsletter, 5% of which clicked on the link, and 1% gave it a thumbs up,” though that is beyond GPT-2’s current use case.

That’s why the “fake news” narrative is so beneficial for GPT-2.
On the surface, fake news seems to be a natural avenue for that kind of thing. That game is all about quantity, not quality, and an NLP generator can help with that. This point was hammered over and over again in the press releases and subsequent news pieces. And the use case seems legit: technically, the goal is to generate derivative content that will reinforce and possibly perpetuate the pre-existing beliefs of the target audience. Just what GPT-2 is good at.

Except that’s not really the message of the narrative. The real message all along was, “this thing can be used like that if YOU don’t use it.” And that leaves an intriguing mystery of what is going to happen next, which in turn engages the target audience more than the product itself.

Writer, translator.