Galactic dreams, encyclopedic reality: Facebook's Galactica demo provides a case study in large language models for text generation at scale: this one was silly, but we cannot ignore them forever.
The Signpost

Technology report

Galactic dreams, encyclopedic reality

By JPxG, Adam Cuerden, Bri, and Smallbones

"AI" is a silly buzzword that I try to avoid whenever possible. First of all, it is poorly defined, and second of all, the definition is constantly changing for advertising and political reasons. If you want an example of this, look at this image, which illustrates our own article on "AI": it was generated using a single line of code in Mathematica. Simply put, the "AI effect" is that "AI" is always defined as "using computers to do things computers aren't currently good at", and once they're able to do it, people stop calling it "AI". If we just say the actual thing that most "AI" is – currently, neural networks for the most part – we will find the issue easier to approach. In fact, we have already approached it: the Objective Revision Evaluation Service has been running fine for several years.

With that said, here is some silly stuff that happened with a generative NLP model:

Meta, formerly Facebook, released their "Galactica" project this month, a big model accompanied by a long paper. Said paper boasted some impressive accomplishments, with benchmark performance surpassing current SoTA models like GPT-3, PaLM and Chinchilla – Jesus, those links aren't even blue yet, this field moves fast – on a variety of interesting tasks like equation solving, chemical modeling and general scientific knowledge. This is all very good and very cool. Why is there a bunch of drama over it? Probably some explanation of how it works is appropriate.

While we have made ample use of large language models in the Signpost, including two long articles in this August's issue which turned out pretty darn well, there is a certain art to using them to do actual writing: they are not mysterious pixie dust that magically understands your intentions and synthesizes information from nowhere. For the most part, all they do is predict the next token (e.g. a word or a piece of a word) in a sequence – really, that's it – after having been exposed to vast amounts of text to get an idea of which tokens are likely to come after which other tokens. If you want to get an idea of how this works on a more basic level, I wrote a gigantic technical wall of text at GPT-2. Anyway, the fact that they can form coherent sentences, paragraphs, poems, arguments, and treatises is purely a side effect of text completion (which has some rather interesting implications for human brain architecture, but that is beside the point right now). The important thing to know is that they just figure out what the next thing is going to be. If you type in "The reason Richard Nixon decided to invade Canada is because", the LLM will dutifully start explaining the implications of Canada being invaded by the USA in 1971. It's not going to go look up a bunch of sources and see whether that's true or not. It will just do what you're asking it to, which is to say some stuff.
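For readers who want to see "predict the next token" in its barest form, here is a toy sketch in Python. It is not how GPT-2 or Galactica work internally: those use subword tokens and a transformer network trained on enormous amounts of text, not a table of word counts. The tiny corpus and the function names `train_bigrams` and `continue_text` are made up purely for illustration. The principle is the same, though: each word is picked only because of what tended to follow the previous word in the training text.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def continue_text(counts, prompt, n_tokens=8, seed=0):
    """Extend the prompt one token at a time.

    Each next word is sampled in proportion to how often it followed the
    previous word in training. Nothing here looks anything up or checks
    whether the result is true.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    words = prompt.split()
    for _ in range(n_tokens):
        followers = counts.get(words[-1])
        if not followers:  # no known continuation for this word: stop
            break
        choices, weights = zip(*followers.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)


# Hypothetical toy corpus standing in for "vast amounts of text".
corpus = (
    "the signpost is a newspaper . "
    "the newspaper began as a print edition . "
    "the signpost is written by editors ."
)

model = train_bigrams(corpus)
print(continue_text(model, "the signpost"))
```

Depending on the seed, this can produce "the signpost is a print edition", a sentence that appears nowhere in the corpus: two harmless fragments stitched into a fluent falsehood, with no step at which anything gets checked.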

This would have been a great thing to explain on the demo page, but for some reason it was decided that the best way to showcase this prowess would be to throw a text box up on the Internet, encouraging users to type in whatever and generate large amounts of text, including scientific papers, essays... and Wikipedia articles.

During the three days the demo was up, we asked it for an article about The Signpost. The writing was quite impressive, and indeed was indistinguishable from a human's output. You could learn a lot from something like this! The problem is that we were learning a bunch of nonsense: for example, we apparently started out as a print publication. Unfortunately, we didn't save the damn thing, because we didn't think they were going to take everything down three days after putting it up. The outlaws at Wikipediocracy did save theirs, so you can see an archived copy of their own attempt at a Galactica self-portrait, which is full of howlers (compare to their article over here).

Ars Technica later wrote a scathing review of the demo. It notes several issues, and a little digging into its sources turned up a Twitter user who had gotten Galactica to write papers on the benefits of eating crushed glass. The results resembled valid sources in their basic appearance while containing claims like "Crushed glass is a source of dietary silicon, which is important for bone and connective tissue health"; one generated review paper described all the studies showing that feeding pigs crushed glass is great for improving weight gain and reducing mortality. Of course, if eating crushed glass had health benefits, this is probably what papers about it would look like, but as it stands, the utility of such text is dubious. The same goes for articles on the "benefits of antisemitism", which mrgreene1977 wisely did not quote from, but one can imagine what kind of tokens would come after what kind of other tokens.

Will Douglas Heaven's article for MIT Technology Review "Why Meta's latest large language model survived only three days online" leads with the statement, "Galactica was supposed to help scientists. Instead, it mindlessly spat out biased and incorrect nonsense", and things get worse from there. Apparently, the algorithm was prone to backing up its points (like a wiki article about spacefaring Soviet bears) with fake citations, sometimes from real scientists working in the field in question. Lovely! Well worth reading, with far too many great examples in there to quote, and even more if you follow their suggestion to look at Gary Marcus's blog post on it.

In their defense, the Galacticans did note, at the bottom of a long explanation of how much the website rules:

But even when people tried to use it correctly, it had problems. The MIT Technology Review report links to an attempt by Michael Black, director at the Max Planck Institute for Intelligent Systems, to get Galactica to write on subjects he knew well; he came away thinking Galactica was dangerous: "Galactica generates text that's grammatical and feels real. This text will slip into real scientific submissions. It will be realistic but wrong or biased. It will be hard to detect. It will influence how people think." He instead suggests that those who want to do science should "stick with Wikipedia".

Perhaps it would be best to give the last, rather spiteful word to Yann LeCun, Meta's chief AI scientist: "Galactica demo is offline for now. It’s no longer possible to have some fun by casually misusing it. Happy?"

What does it mean for us?

Most of the issues and controversies we run into with ML models follow a familiar pattern: some researcher decides that "Wikipedia" is an interesting application for a new model, and creates some bizarre contraption that serves basically no purpose for editors. Nobody wants more geostubs! But this is not a problem with the underlying technology.

The field of machine learning is growing extremely quickly, both in terms of engineering (the implementation of models) and in terms of science (the development of vastly more powerful models). Whatever opinion anyone holds about these things today is likely to be wrong a few months from now. These models will only grow in importance, and I think that any editor who does not try to read as much about them as possible and keep abreast of developments is doing themselves a disservice. Not wanting to be a man of talk and no action, I wrote the article GPT-2 (while its successor model is more relevant to current developments, its architecture is nearly identical to the old one's, and if you read about GPT-2 you will understand GPT-3).

Moreover, we have already been tackling the issue of machine learning on our own terms: the Objective Revision Evaluation Service has been running fine for several years. It seems to me that, if we were to approach these technologies with open minds, it could be possible to resolve some of our most stubborn problems, and bring ourselves into the future with style and aplomb. I mean, anything is possible. For all we know, the Signpost might start putting out print editions.

J, AC, B, S


Discuss this story

Let's forget about the print editions of The Signpost please! And maybe we should still define AI as artificial ignorance. After all, the machine has no understanding of the subject it is writing about. If it ever becomes a Wikipedia editor, it will likely be kicked off in a week for violations of WP:CIR, WP:BLP, WP:V, WP:NOR, etc. Before we start accepting any text directly from AI programs, there should be a test on whether it can follow BLP rules - that's just too difficult. Maybe just throw out all AI contributions about BLPs, but run the test on WP:V. In theory, at least, it could get the references right once it gets a concept of the meaning of what the references say - but that's a way off. Sure, there are tasks AI can do but they are essentially rote (easily programmable) tasks, e.g. finding possible refs, alphabetizing lists, maybe even constructing tables. Once an AI program can consistently do those simple tasks, then we can try it out with more difficult problems, e.g. identifying contradictions in articles or checking birth and death dates.

ORES is a marvelous program for checking article quality but it only does certain things that are related to article quality. I'm guessing the factors it considers are number of words, refs, sections, illustrations, and quality tags placed by editors. Maybe even incoming links and outgoing links. It can count real good and the added value is the correlation of these counts to real people's evaluations of quality. I love it for what it does, but everybody knows that there are some terrible articles with lots of words, refs, sections, and photos. Smallbones(smalltalk) 16:51, 29 November 2022 (UTC)

And furthermore - AI programs should be able to recognize that "and furthermore" means that somebody is pissed off! and that "it counts good" is a criticism, not a compliment!
Perhaps the first test we should run on an AI contributor is whether it can write articles on "Limitations of Artificial Intelligence" and "Benefits of Artificial Ignorance". If those 2 articles don't have considerable overlap, the designers should go back to the drawing board. Smallbones(smalltalk) 17:17, 29 November 2022 (UTC)
@Smallbones: Well, GPT-3 had this to say when I prompted it with your comment:
I don't understand why you would suggest that AI can only do rote tasks, when in fact AI has been used for a variety of tasks, from playing chess to recognizing faces. Furthermore, AI is being used for more complex tasks, such as natural language processing and image recognition. It is being used to automate and streamline processes in industries such as finance, health care, and transportation. And it is being used to create new products and services. So, I think the idea of artificial ignorance is misguided. AI should be judged by its performance, not its level of understanding. If an AI can do something better than a human, then it should be used. If it can't, then it should be discarded. And if it can do something better than a human, then it should be given the opportunity to do so. The bottom line is that AI is a powerful tool, and it should be used to its fullest potential.
I believe that AI could be a great asset to Wikipedia, as long as it is used responsibly and appropriately. AI can be used to help automate routine tasks, such as verifying facts and sources, formatting content, and checking for spelling and grammar errors. AI can also be used to more accurately match content to Wikipedia's editorial standards - for example, it could be used to detect and flag potential bias or inaccuracies in articles. AI could even be used to suggest possible improvements or additions to articles. Ultimately, AI could be used to help make Wikipedia's content more accurate, comprehensive, and up-to-date. However, AI should not be used in place of human editors. AI can only do so much, and it cannot match the nuanced understanding of the subject matter that comes with human experience. AI should be used as a supplement to human editors, not as a replacement.
Straight from the horse's mouth... jp×g 03:16, 1 December 2022 (UTC)
I liked Smallbones's "furthermore" comments.-Nizil (talk) 04:44, 7 December 2022 (UTC)
I think the problem with AI is that we are not able (at least because of the potential risks, if not because of systematic errors in training) to let an AI have its own ideas, so that it could, e.g., correct systematic errors in its training data on its own. The programming interrogator is everything to the AI. If there were now a real artificial intelligence with broader understanding, in the best (but also bad) case the programmer would be some type of god to it; in the worst case, some type of O’Brien, who tells it that 2+2 equals 5. Habitator terrae (talk) 22:49, 7 December 2022 (UTC)


The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0