The Signpost

Special report

New internal documents raise questions about the origins of the Knowledge Engine


  • The Discovery FAQ on MediaWiki states that "We are not building Google. We are improving the existing CirrusSearch infrastructure with better relevance, multi-language, multi-projects search and incorporating new data sources for our projects. We want a relevant and consistent experience for users across searches for both wikipedia.org and our project sites."
  • In a November 4 email to all WMF staff, provided to the Signpost by several WMF staffers, executive director Lila Tretikov expressly stated that the Knowledge Engine "is NOT ... a search engine".
  • Just hours before the release of the grant agreement, Jimmy Wales was even more blunt: "To make this very clear: no one in top positions has proposed or is proposing that WMF should get into the general "searching" or to try to "be google". It's an interesting hypothetical which has not been part of any serious strategy proposal, nor even discussed at the board level, nor proposed to the board by staff, nor a part of any grant, etc. It's a total lie."
However, these statements are flatly contradicted by the now-released grant agreement between the WMF and the Knight Foundation. Quotes such as the following make it abundantly clear that what is envisioned under the terms of the grant is indeed a search engine:

  • "Knowledge Engine by Wikipedia, a system for discovering reliable and trustworthy public information on the Internet." (Page 1.)
  • "Would users go to Wikipedia if it were an open channel beyond an encyclopedia?" (Page 2.)
  • "Knowledge Engine by Wikipedia will democratize the discovery of media, news and information – it will make the Internet's most relevant information more accessible and openly curated, and it will create an open data engine that's completely free of commercial interests. Today, commercial search engines dominate search-engine use of the Internet, and they're employing proprietary technologies to consolidate channels of access to the Internet's knowledge and information. Their algorithms obscure the way the Internet's information is collected and displayed. ... Knowledge Engine by Wikipedia will be the Internet's first transparent search engine, and the first one originated by the Wikimedia Foundation." (Page 10.)
  • "Proceed with the search engine project as deliberately as possible – which is what the Wikimedia Foundation is doing" (Page 13.)
Three internal WMF documents illustrating how WMF thinking about the project evolved have been leaked to the Signpost:

    New internal documents raise questions about the origins of the Knowledge Engine
    10 February 2016



    While Heilman did not provide these documents to the Signpost, he confirmed their authenticity and stated that these were the same documents that were released to the entire Board following pressure from him and fellow Board member Dariusz Jemielniak, in the face of reluctance from other Board members and Tretikov. He told the Signpost that after "other board members told us we did not need to see" them "we pushed hard to have these documents released to the Board."

    We describe the documents in detail in this week's "In Focus". The earliest document, dated April 2, 2015, is a 12-slide presentation marked "FINAL". While the phrase "Knowledge Engine" does not appear, it's clear that even at this early stage, the "Wikipedia Search" referred to here was a well-developed concept. The presentation contrasts the ideals and motivations of commercial search engines – they "highlight paid results, track users' internet habits, sell information to marketing firms" – with those of "Wikipedia Search", which will be private, transparent, and globally representative. It repeatedly stresses that "No other search engines carry these ideals".

    Several well-designed examples of search results follow, including the one pictured above. They prominently brand Wikipedia and feature multimedia content and multiple Wikimedia projects such as Wiktionary and Wikivoyage. The results include non-wiki sources like Fox News and Open Maps.

    The June 24 document is a draft proposal for the project, by then referred to as the Knowledge Engine, which promises to be "a new global project that will once again change the way people access knowledge on the Internet", fully leveraging Wikipedia's and the WMF's resources, values, and reputation. The Knowledge Engine is described as "a federated knowledge engine that will give users the most reliable and most trustworthy public information channel on the web" that "will make the Internet’s most relevant information more accessible and openly curated, and it will create an open data engine that’s completely free of commercial interests". Knowledge Engine "will be the Internet’s first transparent search engine, and the first one that carries the reputation of Wikipedia and the Wikimedia Foundation."

    The proposal divides the plan into four stages, each lasting 16–18 months. Interestingly, the first stage is called Discovery, which is the term the WMF currently uses to refer to the Knowledge Engine project. The proposal asks for US$6M from the Knight Foundation over three years. It pledges $2.4M of the WMF's own resources to the project for the current fiscal year, including eight presumably full-time engineers and two data analysts.

    The final document, dated August 5, 2015, closely resembles the publicly released grant agreement and shares much of its language. The grant amount has dropped to its current $250,000, but this amount covers only the first phase, Discovery, of the larger Knowledge Engine project. Both the amount and its designation for phase one appear in the current grant agreement.

    These documents raise significant questions about how much the Knowledge Engine has actually evolved since April 2015 and what the technical and social implications of this project will be for Wikimedia.

    These questions are at the heart of the current debate regarding transparency, accountability, the relationship between the WMF and the Wikimedia community, and the uncertain direction of the Wikimedia movement.


    Discuss this story


    The thing is, to be honest, does any of this drama matter? We all know what the end results are going to be. The WMF is going to do whatever the heck they want no matter what anyone else says, they'll spend a ton of time and money on a technical project in the face of opposition, they'll release "in beta" a broken, buggy version that sort of resembles what they promised to release, and years later it will still not be done and will get quietly shut down. For a non-profit that so desperately wants to be a "tech" startup, they have a terrible track record at actually producing usable software projects, much less managing their PR cleverly. --PresN 05:52, 15 February 2016 (UTC)

    It matters if the Foundation alienates enough volunteers that they quit & many or all projects -- Wikipedia, Wiktionary, Commons -- go into a death spiral. It doesn't look good if someone is known as the "ED that killed Wikipedia." -- llywrch (talk) 06:21, 15 February 2016 (UTC)
    @PresN: I would tend to agree, but I find two of the bullet points in the application worth serious attention. In section 4 ("Activities"), WMF lists as two of their initial tasks:
    • Develop prototypes for evolving wikipedia.org, which will become the home of the knowledge engine.
    • Answer this targeted learning question: Would users go to Wikipedia if it were an open channel beyond an encyclopedia?
    From these two bullets, it appears that there is some sort of plan to release the knowledge engine as an overhaul ("evolution") of Wikipedia as a whole. If they follow the usual buggy, "in beta" pattern, it could be catastrophic. Even if the project is executed to perfection, there needs to be serious community discussion about "evolving" Wikipedia into "an open channel beyond an encyclopedia" (with this context, it seems clear that "beyond an encyclopedia" means "instead of just an encyclopedia"). It's a wee bit distressing that the WMF applied for and received funding to plan the fundamental transformation of Wikipedia.org without any community consultation. Or maybe I'm just reading it wrong. A2soup (talk) 06:29, 15 February 2016 (UTC)
    Last time I checked, Wikipedia was an encyclopedia with search capabilities, not a search engine backed by an encyclopedia. If the WMF wants to build its own search engine, so be it, but call it something else (Wikisearch?). MER-C 06:58, 15 February 2016 (UTC)
    Agree 100%. If they really want to leverage the name, they could call it "Wikipedia Search" or "Search by Wikipedia" and put it at search.wikipedia.org. They could even link it prominently on wikipedia.org. All of that would be fine by me. But let's leave the Wikipedia search box for searching Wikipedia exclusively, at the very least until the knowledge engine has been completed, rolled out, and proven to be successful over a period of years. A2soup (talk) 07:03, 15 February 2016 (UTC)
    The impression I get from the materials is that the public document access/updated search are just a part of the new framework. It looks as if the idea is to leverage the name/domain to attach to a GUI which accesses data from across Wikimedia's projects as well as certain other databases. But I agree that it does not feel like the right answer. This can only create confusion for users and volunteers alike, and who knows what the technical or community implications could be. Maybe there's something to this knowledge engine notion; it genuinely seems like something I might be very happy to see come to fruition. But sell it on its own merit, with its own domain/branding. It's notable that these documents present a great deal more focus on whether readers/end-users would react to the idea of a conceptual overhaul and facelift, and considerably less focus on how the community of volunteers would view the change. Mind you, I'm not the type to view absence of evidence as evidence of absence--that is, maybe it was paramount on their minds. But something just seems off in the way the WMF seems to be viewing this process, like they are putting the cart before the horse, in more than one respect. Snow let's rap 07:58, 15 February 2016 (UTC)
    I'm sure everyone here agrees that Wikipedia's search functionality isn't great, and spending some of this money to improve it is non-controversial. I wouldn't mind an additional sidebar on our search results that says "hey, Wikivoyage has a travel guide on X" or "Japanese Wikipedia has an article on Y" (in Japanese), but the full knowledge engine concept with its Google-like GUI needs to be a separate Wikimedia project with its own branding. "Wikipedia Search" or "Search by Wikipedia" are not sufficient because they dilute Wikipedia's brand and purpose, which is strictly to build a free content encyclopedia. MER-C 08:21, 15 February 2016 (UTC)
    • I'm curious as to which community is being referred to when the June 24 attachment offers: "Open curation via vast, international community of editors." Does anyone know how the curation is meant to work?
    I also wonder which advisory team is being referred to: "We’re focused on creating resources and tools for an open knowledge-engine community, and building on the input of an advisory team." SarahSV (talk) 07:41, 15 February 2016 (UTC)

    "However, these statements are flatly contradicted ..." - Uh-oh Mister Ranger isn't gonna like this, Yogi! (n.b., that's older US slang which means "The authorities shall be displeased by your bold action"). The above is excellent work overall, my compliments. But there's a bit of background context which would have avoided a slight misstep there. When they talk about NOT competing with Google, they mean they aren't building a complete-web database, funded by advertising, to try to get a piece of that amazing money-machine that's been mastered by Google (the amounts involved are enormous). Rather, they're focusing on a restricted segment, and going for a different strategy for support. Now, the following remarks are purely speculative and the product of a very jaded and cynical person. Given Wales's previous Wikia Search project, and the extensive Google connections with the current Wikimedia Foundation Board, I would be extremely wary that this project exists to help Google in further improving its search results (that's indeed not competing with Google!). The spam and junk battle is ongoing. If Google can get Wikipedians to "volunteer" to mostly work for free in refining algorithms and curation, aiding it even more than they do already, that's advantageous to both Google and the Wikimedia Foundation people (who will likely somehow eventually end up with tangible reward, while you will get the joy and happiness of having oiled the amazing money-machine, excuse me, helped distribute knowledge to the world). Perhaps the proponents of the project will say I am an idiot for such thoughts, but always ask, "Who benefits?". -- Seth Finkelstein (talk) 08:00, 15 February 2016 (UTC)

    Quite right. I mused about these possibilities in last week's op-ed: Wikipedia:Wikipedia_Signpost/2016-02-03/Op-ed. Have a look, Seth, if you missed it.
    There is another related issue: while the example in the early mock-up shown above included a result from Fox News, it's unclear how that intent has evolved. More recent documents seem to indicate a search engine whose results include open-access sources only. Is that the intent today? (The answer will probably be crickets, but hey, it doesn't hurt to ask.) The notion of "disappearing" all copyrighted information from search engine results, creating a universe of knowledge that consists of freely copyable sources only, might be attractive to some (most of all Google and other Silicon Valley players, which would be free to reproduce salient bits of this content on their own search engine results pages and slap ads on it), but I'd find it a bit Orwellian. Andreas JN466 08:25, 15 February 2016 (UTC)
    Excellent piece! I particularly liked the line about "unpaid hamsters driving the spinning cogs ..." with the picture of the "Volunteer". I obviously concur with your conclusions, though I take a somewhat more minimalist reasoning path based on the economics of search engines and the business models. Note I suspect the "open access" aspect is primarily not ideological, but pragmatic. Google Books has involved a long, expensive copyright lawsuit. Outside the US, Google News is also embroiled in various disputes over copyright laws. The Wikimedia Foundation, even with its current budget, doesn't have the money to risk being a lawsuit target in such a dispute (those lawsuits are also matters of enormous amounts of money). -- Seth Finkelstein (talk) 10:13, 15 February 2016 (UTC)


    Great idea.. terrible management

    The more I see of this, the more I like it actually. Step 1, let's improve search, discovery and exploration within our own websites, then slowly pull in more stuff (everything that is open), then see if we can do some open model to even pull in the rest of the world. As a 10-year vision it's actually something that I have been waiting for.... Everyone knows that there will be left and right turns along the way, and marketing bullshit speak, and what not. Maybe we will get there, maybe not, whatever, at least it's a point in the future to strive for (and actually a pretty achievable one I suspect). But once again, it's a total F'up of communication towards the community. It is pathetic. It's shameful. And the worst part is that it's apparently equally badly handled internally, causing staff to feel the exact same way. (addendum: and yes. also managed badly towards the Knight Foundation of course). —TheDJ (talkcontribs) 09:23, 15 February 2016 (UTC)

    Very much agree with TheDJ’s message above. Jean-Fred (talk) 10:58, 15 February 2016 (UTC)
    +1 Wittylama 11:27, 15 February 2016 (UTC)
    +2 Daniel Mietchen (talk) 23:32, 24 February 2016 (UTC)
    From Knight Grant Agreement page 9, a fourteen-person engineering team at WMF costs circa $2,500,000 a year (wages, equipment, travel, computers, coffee pots and perhaps buildings too). This amount will be paid out whatever the team will be committed to: creating another Gather, gathering another Create, searching for Knowledge, knowing about Search, or any other inventive item Discovered™ by the Heads of General Staff. It seems that, instead of funding three years full scale, the Knight Foundation only funded 10% of the first year of the Discovery Team, with some obligation of results (that's the meaning of a restricted grant). Maybe this was a prudent move since, for the moment, instead of a piece of Discovery©, the result is rather another piece of WikiShitStorm©, a distributed application aimed at disheartening the granters. Pldx1 (talk) 10:00, 15 February 2016 (UTC)
    I would look first at the idea that the funders want to see if anything at all useful will result, rather than throw money at something where the grantee basically can't deliver anything more than a report on how the money was spent (i.e. the funders want to have some indication that there's an avenue worth pursuing) -- Seth Finkelstein (talk) 10:10, 15 February 2016 (UTC)

    Data sources

    If Fox News or TeleSUR have the slightest chance of appearing as data sources of this searching project, I will campaign to stop it. --NaBUru38 (talk) 14:04, 15 February 2016 (UTC)

    I would very much like to know who wrote the mockup that included a sample search result attributed to Fox News, and what they could conceivably have been thinking. This seems spectacularly imprudent. MarkBernstein (talk) 16:12, 15 February 2016 (UTC)
    I'm guessing that it was actually meant to be provocative. If it had been BBC news, possibly no one would have noticed it, so they went opposite to that, even though it probably isn't very realistic. It is meant to stand out and communicate: "We [as a community] could even get as crazy as doing this".
    Also can we please allow people the freedom to sketch ideas? Not everything that is discussed or sketched or mocked needs to be accounted to death. If we start treating every single variation of an idea as a long-term plan, then we are curtailing people's creativity beyond reason. That's not really healthy. In design you often initially step out of your comfort zone, before building a new one. Anyone who doesn't understand something like that... well that person should probably not read all of the essays that Wikipedia hosts either. —TheDJ (talkcontribs) 16:31, 15 February 2016 (UTC)
    "Also can we please allow people the freedom to sketch ideas?"
    This is an actual plan, and a very bad one. --NaBUru38 (talk) 21:32, 16 February 2016 (UTC)
    Eh no, it's one of several sketches accompanying a plan —TheDJ (talkcontribs) 07:41, 17 February 2016 (UTC)

    I would very much like to know if the copyright owner of the "April 2 - FINAL- Knight Search Presentation - 04.02.15.pdf", i.e. the Wikimedia Foundation, decided to make a public release of this document, which seems to be an internal document. Pldx1 (talk) 16:24, 15 February 2016 (UTC)

    @Pldx1: You should ask Gamaliel, who uploaded the screenshot on Commons under CC-BY-SA 3.0, licensing which we obviously cannot verify. – Finnusertop (talkcontribs) 17:38, 15 February 2016 (UTC)

    What if it works?

    There's a lot of negativity in the comments above, about the competence of the WMF to manage the creation of a good, unbiased, advert-free search engine (or whatever they would like it called). This is hardly surprising, after Visual Editor and Media Viewer. But supposing they get everything right this time? Who would lose most? I think I now understand the recent appearance of several Google employees on the WMF board. Maproom (talk) 00:06, 16 February 2016 (UTC)

    Wikimedia is a community of people that produce and share educational resources. Developing and maintaining a search engine is stretching the terms "produce" and "resources". --NaBUru38 (talk) 21:34, 16 February 2016 (UTC)

    The shadow

    A shadow hangs over the WMF. The board should have never exercised its "rights and privileges", like some tyrant, in such a fashion as they did, to police our representatives like some sovereign. This public act by the board has as much tact as calling a person a cunt. And since, so many of us have viewed the WMF board as corrupt (as in bribery), and viewed this corrupt product with extreme bias.

    The logic of a simpleton I admit. But please tell, would it not be equally OK to pass an Act of the People of Florida dissolving the WMF and taking possession of its property? (You better believe the courts in San Francisco, California (as in the WMF terms of use) would give full faith and credit to such an act, and I don't think Jimbo would be able to successfully re-incorporate his trusteds.) Would that not be at least equally acceptable as this nefarious, recent act of this trusted board, in terms of the excuses we've been given?

    We have identified a flaw in our government, and it is time that, just as the people of Florida rule Florida, that the community regain trust in this trusted board. Or whatever negative feelings we have towards this project will grow in time into general contempt for all things WMF. int21h (talk · contribs · email) 07:01, 17 February 2016 (UTC)

    Wikidrama

    "An open data engine that’s completely free of commercial interests"? – THE HORROR!!!!!!! If the people in question had known there would be such an outcry over goals like "Credible", "Publicly curated", "Open source" and "Unbiased by commercial concerns", they would probably have tackled it differently. But unfortunately they can't read your minds retroactively.--Anders Feder (talk) 06:46, 19 May 2016 (UTC)



           

    The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0