The Signpost
Single-page Edition
WP:POST/1
6 September 2016

News and notes
AffCom still grappling with WMF Board's criteria for new chapters
Special report
Olympics readership depended on language
In the media
Librarians, Wikipedians, and a library of Wikipedia coverage
WikiProject report
Watching Wikipedia
Featured content
Entertainment, sport, and something else in-between
Traffic report
From Phelps to Bolt to Reddit
Technology report
Wikimedia mobile sites now don't load images if the user doesn't see them
Recent research
Ethics of machine-created articles and fighting vandalism
Blog
Upload of free photos from Swiss library underway
 

2016-09-06

AffCom still grappling with WMF Board's criteria for new chapters

What is AffCom? The Foundation's volunteer Affiliations Committee, created by the Board of Trustees 10 years ago, advises the Board on the approval of new WMF affiliates—chapters, thematic organisations, and user groups. AffCom's membership is large: currently there are 22 members, comprising 12 voting members, most of them with strong connections to an affiliate, and ten non-voting "advisers". These advisers enable the WMF to monitor and exercise a degree of control over AffCom; they include two board liaisons, three staff liaisons, and three staff observers.




2016-09-06

Olympics readership depended on language

A special Traffic Report: the Olympics

The past few weeks of the Traffic Report have been dominated by the 2016 Summer Olympics. Since the Olympics are one of the world's biggest international events, you might guess that they also dominated the most-viewed articles of other language Wikipedias. And you would be right. But the topics of interest around the world show interesting variations. We love the Olympics, but we also love our own Olympics and Olympians.

Using the WMF data available through TopViews*, we compiled charts of the 15 most popular Olympic-related articles for the period of August 5–21, the official period of the Olympics, for seven different language Wikipedias: English, Spanish, German, Portuguese (the language of Brazil, the host country), Russian, French, and Japanese. We considered including the Chinese Wikipedia but decided against it, because its blockage in China greatly affects its viewership.**

Michael Phelps is big everywhere, except in Japan.

First of all, Michael Phelps really is popular worldwide. His biography was far and away #1 in English, #2 in Russian and Spanish, #3 in Portuguese, #4 in French, and #5 in German. Similarly, Usain Bolt was generally behind Phelps, and solidly the second most popular athlete of the Games. He ranked #3 in English, #4 in Spanish, #5 in Russian, #6 in Portuguese and French, #8 in Japanese, and #11 in German.

But the old saying "big in Japan" did not apply to Phelps: he placed 12th on the Japanese Wikipedia, the only edition where Bolt outranked him, with about 25% more views. To be big in Japan, though, you really had to be Japanese: the top seven Olympic-related articles were all Japanese medalists, not even interrupted by general articles like 2016 Summer Olympics (#1 on five lists) or the All-time Olympic Games medal table, which were usually popular across the board. Japan's list was led by Saori Yoshida, who won wrestling silver and had 240% of Phelps's views. She was followed by many others, presumably now household names in Japan, including gymnast Kōhei Uchimura (#2) and table tennis whiz Ai Fukuhara (#3).

Though the Japanese Wikipedia is the most extreme case, it is not fair to single it out; the data reveals that every language edition tends to favor its own. French judo practitioner and gold medalist Teddy Riner beat Phelps and Bolt on the French Wikipedia. Elsewhere, local favorites were not far behind Phelps and Bolt. In Spanish, Argentine tennis player Juan Martín del Potro, who won silver, was #5, and Spaniard Rafael Nadal was #9. In German, horizontal bar gold medalist Fabian Hambüchen (#8) was the top local hero. And in English, American gymnasts including Simone Biles (#4) and Aly Raisman (#7), and swimmers Katie Ledecky (#8) and Ryan Lochte (#11), were prominent, though India's P.V. Sindhu, who won silver in badminton, drew an impressive #6 showing on the otherwise American-dominated list. Sindhu and the top Americans, other than Phelps, do not appear on the other charts. And vice versa: English speakers, for instance, were not focused on the three medals won by Russian gymnast Aliya Mustafina (#6 in Russian); she doesn't appear anywhere on the English (or other) charts.

Everybody wants to know how everyone else is doing; medal table charts were also popular articles, including the All-time Olympic Games medal table and the 2012 table. But people especially want to know how their country is doing. Thus the Spanish Wikipedia saw Mexico at the Olympics at #10, Colombia at the Olympics at #11, and Argentina at the Olympics at #13. Brazil at the Olympics was #5 on the Portuguese Wikipedia, and in their respective domains, Russia at the 2016 Summer Olympics was #3, and France at the 2016 Summer Olympics was at #10.

Not popular in English, but rather popular elsewhere, was Football at the 2016 Summer Olympics. Perhaps because the American women's team floundered, no football-related articles are in the English Top 15, but such articles hit #3 in German (Germany won medals in both the men's and women's tournaments), #7 in Spanish, #8 in Portuguese, and #14 in Russian. If your country is good in a sport, as Germany was in football or France was in the modern pentathlon (women's silver, #5), that's what you're most likely going to watch.

Our data collection showed that the Olympics were very popular everywhere. Non-Olympic topics do appear in each language's general charts (remember the charts below are Olympic-only articles), just as we see on the Traffic Report, and to about the same extent. The lone exception may be Russian, where the popularity of other articles such as the film Suicide Squad seemed a bit higher, perhaps a reflection of the disqualification of many Russian athletes.

So, just like the Ancient Olympic Games brought together all of Greece, the modern Olympics does seem to bring us all together. We may celebrate our own victories a bit more, but that is part of a human nature we all share and treasure.

English Wikipedia

Indian badminton star P.V. Sindhu, #6, earned her position among a slew of Americans on the English Wikipedia.
Rank Views Article Notes
1 8,541,642 Michael Phelps American swimmer
2 5,834,783 2016 Summer Olympics
3 3,972,644 Usain Bolt Jamaican sprinter
4 3,047,891 Simone Biles American gymnast
5 2,069,683 Olympic Games
6 2,046,156 P.V. Sindhu Badminton silver for India
7 1,941,000 Aly Raisman American gymnast
8 1,833,635 Katie Ledecky American swimmer
9 1,833,545 2012 Summer Olympics medal table
10 1,825,836 List of Olympic Games host cities
11 1,784,183 Ryan Lochte American swimmer
12 1,717,762 All-time Olympic Games medal table
13 1,635,559 2024 Summer Olympics
14 1,630,544 2020 Summer Olympics
15 1,524,028 India at the 2016 Summer Olympics

Spanish Wikipedia

Juan Martín del Potro of Argentina (#6) won silver in men's singles tennis.

German Wikipedia

Laura Ludwig (#10) and Kira Walkenhorst of Germany won gold in beach volleyball.
Rank Views Article Notes
1 1,194,670 Olympische Sommerspiele 2016 2016 Summer Olympics
2 424,724 Medaillenspiegel der Olympischen Sommerspiele 2012 2012 Summer Olympics medal table
3 379,697 Olympische Sommerspiele 2016/Fußball Germany won women's gold and men's silver in football.
4 366,095 Medaillenspiegel der Olympischen Sommerspiele 2016 2016 Summer Olympics medal table
5 328,098 Michael Phelps #1 on en.wiki
6 259,090 Ewiger Medaillenspiegel der Olympischen Spiele All-time Olympic Games medal table (#12 on en.wiki)
7 231,559 Moderner Fünfkampf Modern pentathlon; Germany did not medal
8 226,895 Fabian Hambüchen German gymnast, gold in horizontal bar
9 225,299 Olympische Spiele Olympic Games
10 214,151 Laura Ludwig German, won gold in beach volleyball
11 211,147 Usain Bolt #3 on en.wiki
12 183,147 Angelique Kerber German, won silver in tennis
13 175,795 Fußball bei den Olympischen Spielen Football at the Summer Olympics
14 167,722 Franziska van Almsick Famed German swimmer, 1992–2004 Games
15 161,435 Isabell Werth German, two medals in equestrian events

Portuguese Wikipedia

Brazilian Daiane dos Santos (#9) appeared in the 2004–2012 Olympics.

Russian Wikipedia

Russian gymnast Aliya Mustafina (#6) won three medals at this Olympics, including gold in uneven bars.

French Wikipedia

Judoka Teddy Riner of France was the most popular athlete on the French Wikipedia.

Japanese Wikipedia

Table-tennis player Jun Mizutani won two medals and was the seventh-most popular athlete in Japan, but that was still more popular than both Phelps and Bolt.
Rank Views Article Notes
1 820,546 吉田沙保里 Saori Yoshida won wrestling silver.
2 649,113 内村航平 Kōhei Uchimura won two golds in artistic gymnastics.
3 553,213 福原愛 Ai Fukuhara won table tennis bronze
4 549,533 ケンブリッジ飛鳥 Asuka Cambridge, silver in 4×100 relay
5 503,043 伊調馨 Kaori Icho, wrestling gold
6 482,702 ベイカー茉秋 Mashu Baker, judo gold
7 442,357 水谷隼 Jun Mizutani, 2 table tennis medals
8 429,937 ウサイン・ボルト Usain Bolt
9 384,173 松友美佐紀 Misaki Matsutomo, badminton gold
10 366,963 伊藤美誠 Mima Ito, table tennis bronze
11 344,874 ロンドンオリンピック (2012年) での国・地域別メダル受賞数一覧 2012 Summer Olympics medal table
12 341,853 マイケル・フェルプス Michael Phelps
13 328,527 近代オリンピックでの国・地域別メダル総獲得数一覧 All-time Olympic Games medal table
14 306,033 石川佳純 Kasumi Ishikawa, team table tennis bronze
15 291,440 リオデジャネイロオリンピック 2016 Summer Olympics

Notes

  • *One caveat on TopViews: TopViews compiles data on the 1,000 most viewed articles on a Wikipedia for each day. Charts for longer periods are compiled from those daily charts. Thus, when an article drops out of the top 1,000 on a given day, its views for that day are not included in the compiled data. Generally speaking, we have found that this gap is not a significant problem when looking at the most popular articles. The English Traffic Report and WP:TOP25 are usually derived from the WP:5000, which includes all viewcount data, but there is no similar source for other-language Wikipedias. On the current WP:5000, the 1,000th most viewed article has under 59,000 views for one day. This number should be significantly lower on other language Wikipedias, which receive less traffic.
  • **We also reviewed statistics[1] for the Bengali Wikipedia (7th on the list of languages by total number of speakers), but traffic and usage there were too low to yield usable information. Though their page on the 2016 Summer Olympics was in their top 10 (#5), many of the more viewed articles on that project are traditional encyclopedic topics, e.g., #1 was Sheikh Mujibur Rahman, the founding leader of Bangladesh. Only 21 articles (on any topic) had more than 5,000 views during the Olympics on that project.
The Arabic Wikipedia was also considered.[2] It has more traffic than the Bengali project (their 2016 Summer Olympics article was #1, showing users go there for topical information; the general Olympic Games article was #2; and Phelps was #10 among all articles), but only about 50 articles on that project broke 50,000 views during the Olympics, and primary encyclopedic articles (like Egypt and Saudi Arabia) were among them. Ultimately, space and time limitations led to the selection of seven languages to sample.
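
The TopViews caveat in the first note can be illustrated with a small sketch. This is not TopViews' actual code: the function names, the cutoff of 2, and the view counts below are all invented for illustration. It only shows how a total compiled from daily top-N lists can undercount an article on days when it falls below the cutoff.

```python
from collections import Counter


def compile_from_daily_top(daily_views, top_n):
    """Sum views per article, counting a day only if the article
    is among that day's top_n most viewed (hypothetical helper)."""
    totals = Counter()
    for day in daily_views:
        # Rank the day's articles by views and keep only the top_n,
        # the way a daily most-viewed chart would.
        ranked = sorted(day.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
        for title, views in ranked:
            totals[title] += views
    return totals


def true_totals(daily_views):
    """Sum every view for every article, with no cutoff."""
    totals = Counter()
    for day in daily_views:
        totals.update(day)  # Counter.update adds counts from a mapping
    return totals


# Invented toy numbers: three days, three articles, daily cutoff of 2.
daily_views = [
    {"A": 900, "B": 500, "C": 100},
    {"A": 800, "B": 90, "C": 300},
    {"A": 700, "B": 400, "C": 50},
]

compiled = compile_from_daily_top(daily_views, top_n=2)
actual = true_totals(daily_views)

for title in sorted(actual):
    missing = actual[title] - compiled[title]
    print(title, compiled[title], actual[title], missing)
```

In this toy data, article A is in the daily top 2 every day, so its compiled total matches its true total; B and C each lose the days on which they fell outside the cutoff. That is consistent with the note's observation that the gap matters little for the most popular articles.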




2016-09-06

Librarians and Wikipedians—meant to be together?

Ties between libraries and Wikipedia are growing

A forum, “Library Engagement and Wikipedia” (slides), was held at the International Federation of Library Associations’ 2016 World Library and Information Congress in Columbus, Ohio, as reported by American Libraries. Alex Stinson and Jake Orlowitz of the Wikimedia Foundation highlighted initiatives such as #1Lib1Ref, which encourages librarians to verify and add citations to articles.

In the same vein, The Week reported on a new $250,000 grant by the John S. and James L. Knight Foundation to link library resources to Wikipedia. The project aims to provide better library archive access to editors and to train librarians in Wikipedia editing.

These discussions and initiatives inevitably link back to discussions about Wikipedia's culture and the gender gap. Inside Higher Ed lamented Wikipedia's current culture in the context of greater internet culture, where "highly stylistic lulz-based trolling" infects attempts at reasoned discussion. As has been stated before, a gender gap cannot be bridged where a community is seen as hostile by many female editors. Highlighting a blog post by Andromeda Yelton, who apparently attended the IFLA conference noted above, the article notes that librarians are 80% female and Wikipedians are 90% male, and that many see Wikipedia as having an "adversarial, argumentative bent" that is not enjoyable to all.

Yet, the above initiatives evidence Wikipedia receiving more credit as an established institution, and thus becoming the target of more projects from the traditional institutions that curate knowledge. Perhaps Wikipedia got to where it is without as much formal support (and indeed in the face of many detractors), but the old guard eventually incorporating the nouveau riche is human nature. MW




2016-09-06

Watching Wikipedia with the project devoted to television


A television straight from the 1950s.
As this logo consists of simple geometric shapes and text, it is deemed ineligible for copyright and therefore permissible on Wikimedia Commons.

I asked about contributing to Wikimedia Commons, and CAWylie felt that Commons was “like a separate entity from Wikipedia.” However, uploading screenshots of TV show title screens, or intertitles and crucial scenes from shows, is allowable. CAWylie has even seen fan or user-created logos pass on Wikimedia.

CAWylie tends to edit shows in which he’s familiar with the creative team or likes the show itself. He also edits articles he feels may be of interest to Wikipedia readers. Some of his favorite shows are ones that “change viewers’ perceptions. For example, at first Breaking Bad seemed to me like it would glorify the meth business. I was pleasantly surprised and happily proved wrong.”

One of his favorite articles to work on was the biography of Christopher Chapman, which CAWylie started and expanded. CAWylie says that Chapman was a pioneer in the film industry and influenced the way television was later filmed. CAWylie says that “Biographies are usually more fun to do, as research might reveal info not commonly known,” and he felt honored to create Chapman’s biography.

For anyone interested in getting involved with WikiProject TV, the talk page is active, and editors can make requests or ask for help there. Thanks to CAWylie for sharing his work on Wikipedia!






2016-09-06

Entertainment, sport and something else in-between

Covers of the first and last issues of Science-Fiction Plus

This Signpost "Featured content" report covers material promoted from 14 to 27 August.
Text may be adapted from the respective articles and lists; see their page histories for attribution.

Lynx illustration by Sidney Hall from Urania's Mirror
Design model of the Canadian National Vimy Memorial by Walter Seymour Allward
Miami Central Station is the newest metro station operated by Miami-Dade Transit.
Michael Phelps holds four individual and three team Olympic swimming records.
Quentin Tarantino's direction of "Grave Danger", a CSI: Crime Scene Investigation episode, garnered him a Primetime Emmy Award for Outstanding Directing for a Drama Series nomination.

Twelve featured articles were promoted these weeks.

  • No. 91 (Composite) Wing (nominated by Ian Rose) was a Royal Australian Air Force wing that operated during the Korean War and its immediate aftermath. It was established in October 1950 to administer RAAF units deployed in the conflict: No. 77 (Fighter) Squadron, flying North American P-51 Mustangs; No. 30 Communications Flight, flying Austers and Douglas C-47 Dakotas; No. 391 (Base) Squadron; and No. 491 (Maintenance) Squadron. The wing was headquartered at Iwakuni, Japan, as were its subordinate units with the exception of No. 77 Squadron.
  • Lynx (nominated by Casliber) is a constellation in the northern sky that was introduced in the 17th century by Johannes Hevelius. It is a faint constellation with its brightest stars forming a zigzag line. The orange giant Alpha Lyncis is the brightest star in the constellation, while the semiregular variable star Y Lyncis is a target for amateur astronomers. Six star systems have been found to contain planets.
  • Rare Replay (nominated by Czar) is a 2015 compilation of 30 video games from the 30-year history of developer Rare and its predecessor, Ultimate Play the Game. The emulated games span multiple genres and consoles, and retain the features and errors of their original releases with minimal edits. The compilation adds cheats to make the older games easier and a Snapshots mode of specific challenges culled from parts of the games. Player progress is rewarded with behind-the-scenes footage and interviews about Rare's major and unreleased games.
  • HMS Emerald (nominated by Ykraps) was a 36-gun Amazon-class frigate that Sir William Rule designed in 1794 for the Royal Navy. She was completed in 1795 and joined John Jervis's fleet in the Mediterranean. Emerald was one of several vessels to hunt down and capture Santisima Trinidad. She was part of John Thomas Duckworth's squadron during the Action of 7 April 1800 off Cadiz. Emerald served in the Caribbean throughout 1803 in Samuel Hood's fleet, then took part in the invasion of St Lucia, and of Surinam. Returning to home waters for repairs in 1806, she served in the western approaches before joining a fleet under James Gambier in 1809, and taking part in the Battle of the Basque Roads. In 1811 she sailed to Portsmouth where she was laid up in ordinary. Fitted out as a receiving ship in 1822, she was eventually broken up in 1836.
  • Wrestle Kingdom 9 (nominated by Ribbon Salminen and Starship.paint) was a professional wrestling pay-per-view event, produced by the New Japan Pro Wrestling promotion, which took place at the Tokyo Dome in Tokyo, Japan, in 2015. It was the 24th January 4 Tokyo Dome Show and the first event on the 2015 NJPW schedule. The event featured ten professional wrestling matches and one pre-show match, six of which were for championships. The event was attended by 36,000 people, and received universally positive reviews from critics.
  • The Boat Races 2016 (nominated by The Rambling Man) took place on 27 March. Held annually, The Boat Race is a side-by-side rowing race between crews from the universities of Oxford and Cambridge along a 4.2-mile (6.8 km) tidal stretch of the River Thames in southwest London. For the first time in the history of the event, the men's, women's, and both reserves' races were all held on the Tideway on the same day.
  • Science-Fiction Plus (nominated by Mike Christie) was a U.S. science fiction magazine published by Hugo Gernsback for seven issues in 1953. It was initially in slick format, meaning that it was large-size and printed on glossy paper. Gernsback had always believed in the educational power of science fiction, and he continued to advocate his views in the new magazine's editorials. Sales were initially good, but soon fell. For the last two issues Gernsback switched the magazine to cheaper pulp paper, but the magazine remained unprofitable. The final issue was dated December 1953.
  • "No Me Queda Más" (nominated by AJona1992) is a song by American recording artist Selena for her fourth studio album, Amor Prohibido. It was released as the third single from the album in 1994 by EMI Latin. "No Me Queda Más" was written by Ricky Vela, and production was handled by Selena's brother A.B. Quintanilla. A downtempo mariachi and pop ballad, the song portrays the ranchera storyline of a woman in agony after the end of a relationship. Its lyrics express an unrequited love, the singer wishing the best for her former lover and his new partner. Praised by music critics for its emotive nature, "No Me Queda Más" was one of the most successful singles of Selena's career.
  • The Canadian National Vimy Memorial (nominated by Labattblueboy) is a memorial site in France dedicated to the memory of Canadian Expeditionary Force members killed during the First World War. It also serves as the place of commemoration for First World War Canadian soldiers killed or presumed dead in France who have no known grave. The monument is the centrepiece of a 100-hectare (250-acre) preserved battlefield park that encompasses a portion of the ground over which the Canadian Corps made their assault during the initial Battle of Vimy Ridge offensive of the Battle of Arras.
  • "Did You Hear What Happened to Charlotte King?" (nominated by Aoba47) is the seventh episode of the fourth season of the American television medical drama, Private Practice, and the show's 61st episode overall. Written by Shonda Rhimes and directed by Allison Liddi-Brown, the episode was originally broadcast on ABC. The episode revolved around KaDee Strickland's character, and was intended to accurately portray a victim's recovery from rape. It earned the series, Rhimes, and Strickland several awards and nominations and was well received by critics, with Strickland's character and performance praised.
  • State Route 94 (nominated by Rschen7754) is a highway in the U.S. state of California that is 63.324 miles (101.910 km) long. The western portion, known as the Martin Luther King Jr. Freeway, begins at Interstate 5 in downtown San Diego and continues to the end of the freeway portion past State Route 125 in Spring Valley. The non-freeway segment, which continues east through the mountains to Interstate 8 near Boulevard, is known as Campo Road.
  • Emma Stone (nominated by FrB.TG) (born 1988) is an American actress. Born and raised in Scottsdale, Arizona, Stone was drawn to acting as a child, and her first role was in a theater production of The Wind in the Willows in 2000. As a teenager, she relocated to Los Angeles with her mother, and made her television debut in VH1's In Search of the New Partridge Family (2004), a reality show that produced only an unsold pilot. After a series of small television roles, she won a Young Hollywood Award for her film debut in Superbad (2007), and received positive media attention for her role in Zombieland (2009).

Eight featured lists were promoted these weeks.

  • Miami-Dade Transit operates the Metrorail rapid transit system and the Metromover people mover system in Miami and Greater Miami-Dade County, Florida. The network consists of two elevated Metrorail lines and three elevated Metromover lines. Miami-Dade Transit operates 42 metro stations (nominated by Dream out loud), with 23 in the Metrorail system and 21 in the Metromover system (Brickell and Government Center stations serve both systems).
  • Sam Waterston (born 1940) is an American actor, producer and director. Waterston has appeared in numerous films and television shows, as well as on stage (nominated by Arbero), during his career.
  • Marilyn Monroe (1926–1962) was an American actress who appeared in 29 films between 1946 and 1961 (nominated by SchroCat). After a brief career in modeling she signed short-term film contracts, and appeared in minor roles for the first few years of her career. Her major breakthrough came in 1953, when she starred in three pictures: the film noir Niagara, and the comedies Gentlemen Prefer Blondes and How to Marry a Millionaire. Monroe won, or was nominated for, several awards during her career. Those she won included the Henrietta Award for Best Young Box Office Personality and World Film Favorite, and a Crystal Star Award and David di Donatello Award. She was inducted to the Hollywood Walk of Fame and a Golden Palm Star was dedicated at the Palm Springs Walk of Stars. She continues to be considered a major icon in American popular culture.
  • The Adelaide Oval is a cricket ground in Adelaide, Australia. It is the home ground of the South Australia cricket team and both the men's and women's teams of Adelaide Strikers, as well as Australian rules football and soccer teams. Two hundred international cricket centuries have been scored at the stadium (nominated by Yellow Dingo). The first century at the ground was scored by the Australian Percy McDonnell, and Don Bradman's 299 not out is the highest individual score by a batsman at the ground.
  • The International Olympic Committee recognises the fastest performances in pool-based swimming events at the Olympic Games (nominated by The Rambling Man). Men's swimming has been part of the Summer Olympics since the Games' modern inception in 1896, but it was not until 1912 that women competed against each other. Races are held in four swimming categories: freestyle, backstroke, breaststroke and butterfly, over varying distances and in either individual or relay race events. Medley swimming races are also held, both individually and in relays, in which all four swimming categories are used. Of the 32 pool-based events, swimmers from the United States hold eighteen records, including one tied with a swimmer from Canada; Australia and China hold three each, Hungary two, and the Netherlands, Brazil, Japan, Great Britain, Singapore and Sweden one each. Thirteen of the current Olympic records were set at the 2016 Games.
  • Selena Quintanilla-Pérez (1971–1995) was an American singer, songwriter, spokesperson, actress, and fashion designer. During her career, she released (nominated by AJona1992) twenty-seven official singles and seven promotional singles, and made five guest vocalist appearances.
  • S.L. Benfica is a Portuguese professional football team based in São Domingos de Benfica, Lisbon. The club was formed in 1904, and played its first competitive match in 1906. Since that first competitive match, 247 players have played between 25 and 99 matches (nominated by Threeohsix). Three players fell one short of 100 appearances, and four former players went on to be first-team managers.
  • Quentin Tarantino (born 1963) is an American director, producer, screenwriter and actor. His film career (nominated by FrB.TG) began in the late 1980s with directing, writing and starring in the black-and-white My Best Friend's Birthday, a partially lost amateur short film which was never officially released. Since then he has appeared in twenty-seven more films, directed ten more films (also guest directing in Sin City), written seventeen more films and produced fourteen films. Tarantino has also appeared in eight television episodes, directed two and written one. He also appears in the game Steven Spielberg's Director's Chair as Jack Cavello.

Four featured pictures were promoted these weeks.




2016-09-06

From Phelps to Bolt to Reddit

Week of August 14–20, 2016

The Olympics reigned again this week, shifting from swimming to track as the games neared their end. Seven of the Top 10 slots are Olympic-related, as are 15 of the Top 25. But somehow the incomprehensible internet meme Killing of Harambe still crept into the Top 25 at #25.

In technical news, as a follow-up from August, we are happy to report that this report is now using data from a revamped WP:5000 report which uses WMF's newer data feeds, thanks to Chief Traffic Data Guru West.andrew.g (not an official title). All WP:5000 reports have been re-run for 2016 and are available in that page's history. So far we don't expect the changes to have a significant effect on our charts, though they may help us exclude some spider/bot traffic, and may include Wikipedia Zero traffic not captured before. Unfortunately, however, the new WMF data does not keep records of red link hits, so the WP:TOPRED report has been retired.

For the full top-25 lists (and archives back to January 2013), see WP:TOP25. See this section for an explanation of any exclusions. For a list of the most edited articles every week, see WP:MOSTEDITED.

The ten most popular articles for the week of August 14–20, 2016, as determined from the newly revamped WP:5000 report, were:

Rank Article Class Views Image Notes
1 Usain Bolt Good Article 3,103,335
The rhythm of the Summer Olympics went according to prediction. As swimming and Michael Phelps (#3) finished up, track took over, and Bolt took center stage, winning gold in both the 100 m and 200 m, for the third straight time. And he also won his third straight gold in the 4 × 100 m relay. Being regularly called the "greatest sprinter of all time" is not hyperbole at this point. An impressive 3.1 million views lead the chart, though well shy of the astounding 5.4 million views Phelps got last week.
2 2016 Summer Olympics C-class 2,125,265
Holding steady at #2 for a second week, a drop of about 150,000 views.
3 Michael Phelps Good Article 1,946,890
Down from #1 last week.
4 P. V. Sindhu C-class 1,858,843
Last week we noted that although India at the 2016 Summer Olympics was at #23 (#16 this week), the country had won no medals yet. Sindhu became the first Indian woman to win an Olympic silver medal, in badminton. (And to tell you how lame American television coverage is, I had no idea badminton was a sport in the Olympics.) Sindhu was one of only two medalists from India, the second being a bronze won in women's wrestling by Sakshi Malik. Of course India's lack of a medal haul regularly produces articles asking why. They are just SPORTS, people. Let's celebrate those who compete and shine.
5 Suicide Squad (film) C-class 1,254,079
DC Comics' ramshackle crew of press-ganged supervillains, forced to do the will of a shadowy organization or let their heads explode, are the stars of one of the most anticipated films in the nascent DC Cinematic Universe, which was released on August 5 to generally negative reviews. Nonetheless, it grossed $267M worldwide in its opening weekend.
6 Simone Biles C-class 935,583
The 19-year-old Olympic first-timer from America completed her medal haul with four golds (including the team competition) and one bronze.
7 Stranger Things (TV series) C-class 920,502
This Netflix science-fiction series is basically an 8-hour homage to early 80s kid-centric flicks like E.T., The Goonies and Explorers, though aimed mostly at adults. It has been a smash hit for Netflix, evidenced by its continuing appearance on this chart – five straight weeks. The Internet has seized on even the most mundane facets of the show, such as turning minor character "Barb" into a celebrity.
8 2012 Summer Olympics medal table List 874,861
With over 250,000 more views than 2016 Summer Olympics medal table (#18). Everyone likes to do their statistical comparison it seems.
9 Decathlon C-class 850,348
The competition in this traditional Olympic event was won by American Ashton Eaton (#12). Women compete in the seven-event heptathlon. Both events derive from the five-event pentathlon of the Ancient Olympic Games.
10 Rustom (film) Stub-class 780,159
This Indian crime thriller featuring Akshay Kumar (pictured) was released 12 August 2016.

MW

Week of August 21–27, 2016

Hello again, Reddit. One of the discoveries the Top 25 project has made over the years is that the site Reddit, which bills itself as "the front page of the Internet" because Wikipedia doesn't, has been a major factor in driving traffic here. It has also proven to be a massive justification for every quirky, oddball page that manages to make it through the deletion process, as these are frequently the most popular. In the past I've made impassioned defences of Reddit and its role in aiding Wikipedia, pointing out that our site has done little to draw people's attention to the information it conveys, leaving that job to Reddit and Google Doodles. I still feel that way, at least for the section of Reddit that nearly always makes it here: TIL, or "Today I Learned". Comments on TIL threads seem to be fairly civil and genuinely inquisitive, but those make up only a tiny fraction of Reddit's user base. It is not those threads that best exemplify Reddit; rather, it is the river of bile and toxicity that has flowed from the Killing of Harambe that best illustrates what Reddit has become. These days Reddit is mostly famous in the wider media as a den of race hate, misogyny, borderline paedophilia, and every other objectionable but not strictly illegal form of behaviour. The commitment of the site's owners to free speech has meant that many of their topic threads, or subreddits, have become echo chambers of vitriol, as those who disagree are shouted down or chased off. One writer for Time magazine has written Reddit off as unsalvageable. As such, I think Wikipedia would be better off taking on more of the job of spreading word of its content.

The ten most popular articles for the week of August 21 to 27, 2016, as determined from the newly revamped WP:5000 report, were:

Rank Article Class Views Image Notes
1 SummerSlam 2016 N/A 1,102,249
WWE's latest pay-per-view pantomime was held on August 21, 2016 at the Barclays Center in Brooklyn, New York with the headline bout "won" by Brock Lesnar (pictured)
2 2016 Summer Olympics C-class 1,019,002
Numbers are down by half, but the article is still holding at #2. The closing ceremony was held on August 21, the first day recorded by this list, so interest in the Olympics clearly has faded quickly. It will be interesting to see what will happen when the Paralympics get underway.
3 Stranger Things (TV series) C-class 933,503
This Netflix science-fiction series is basically an eight-hour homage to early-80s kid-centric flicks like E.T., The Goonies and Explorers, though aimed mostly at adults. It has been a smash hit for Netflix, evidenced by its continuing appearance on this chart – six straight weeks. The Internet has seized on even the most mundane facets of the show, such as turning minor character "Barb" into a celebrity. Its numbers have barely shifted since last week, but with view counts low overall, that was enough for it to rise four slots.
4 Suicide Squad (film) C-class 776,092
DC Comics' ramshackle crew of press-ganged supervillains, forced to do the will of a shadowy organization or let their heads explode, are the stars of one of the most anticipated films in the nascent DC Cinematic Universe, which was released on August 5 to generally negative reviews. Nonetheless, it grossed $267M worldwide in its opening weekend.
5 UFC 202 N/A 759,740
The latest Ultimate Fighting Championship was held on August 20 at the T-Mobile Arena in Las Vegas. The headlining bout was a rematch between UFC Featherweight Champion Conor McGregor (pictured) and Nate Diaz, who had defeated McGregor at UFC 196. McGregor won this bout by majority decision.
6 Killing of Harambe Start-class 735,203
What began as a heartfelt reaction to what some felt was the unnecessary killing of a silverback western lowland gorilla (pictured, though not him specifically) has morphed over the last three months into online trolling and racist abuse, along with the standard targeted misogyny. What the troll army hopes to accomplish is never clear, but whatever it is it doesn't involve helping gorillas.
7 Blonde (Frank Ocean album) Start-class 722,611
The long-delayed album from rapper and R&B artist Frank Ocean was released exclusively on Apple Music on August 20 to near-universal acclaim.
8 Tic Tac Start-class 711,441
As learned on a Reddit thread this week, Tic Tacs are almost pure sugar, but small enough to be considered sugar-free per serving. Interestingly, the two other Reddit threads linked to this article also noticed the same thing.
9 Frank Ocean B-class 697,461
See #7
10 Deaths in 2016 List 617,084
The views for the annual list of deaths are remarkably consistent on a day-to-day basis. They were consistently higher in the first half of 2016 owing to a string of highly notable deaths, but things seem to be calming down a bit.

S

See also

Also in this Signpost edition, milowent delves into the traffic generated by the Summer Olympics.



Reader comments

2016-09-06

Wikimedia mobile sites now don't load images if the user doesn't see them



Reader comments

2016-09-06

AI-generated articles and research ethics; anonymous edits and vandalism fighting ethics

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.


While I was enthusiastic about the results, I was surprised by the suboptimal quality of the articles I reviewed – three that were mentioned in the paper. After a brief discussion with the authors, a wider discussion was initiated on the Wiki-research mailing list. This was followed by an entry on the English Wikipedia administrators' noticeboard (which includes a list of all accounts used for this particular research paper). The discussion led to the removal of most of the remaining articles.

The discussion concerned the ethical implications of the research, and using Wikipedia for such an experiment without the consent of Wikipedia contributors or readers. The first author of the paper was an active member of the discussion; he showed a lack of awareness of these issues, and appeared to learn a lot from the discussion. He promised to take these lessons to the relevant research community – a positive outcome.

In general, this sets an example for engineers and computer scientists, who often show a lack of awareness of certain ethical issues in their research. Computer scientists are typically trained to think about bits and complexities, and rarely discuss in depth how their work impacts human lives. Whether it's social networks experimenting with the mood of their users, current discussions of biases in machine-learned models, or the experimental upload of automatically created content in Wikipedia without community approval, computer science has generally not reached the level of awareness of some other sciences about the possible effects of its research on human subjects, at least as far as this reviewer can tell.

Even on Wikipedia, there's no clear-cut, succinct policy I could have pointed the researchers to. The use of sockpuppets was a clear violation of policy, but an incidental component of the research. WP:POINT was a stretch to cover the situation at hand. In the end, what we can suggest to researchers is to check back with the Wikimedia Research list. A lot of people there have experience with designing research plans with the community in mind, and it can help to avoid uncomfortable situations.

See also our 2015 review of a related paper by the same authors, "Bot detects theatre play scripts on the web and writes Wikipedia articles about them", and other similarly themed papers they have published since then: "WikiKreator: Automatic Authoring of Wikipedia Content"[2], "WikiKreator: Improving Wikipedia Stubs Automatically"[3], "Filling the Gaps: Improving Wikipedia Stubs"[4]. DV

Ethics researcher: Vandal fighters should not be allowed to see whether an edit was made anonymously

A paper[5] in the journal Ethics and Information Technology examines the "system of surveillance" that the English Wikipedia has built up over the years to deal with vandalism edits. The author, Paul B. de Laat from the University of Groningen, presents an interesting application of a theoretical framework by US law scholar Frederick Schauer that focuses on the concepts of rule enforcement and profiling. While providing justification for the system's efficacy and largely absolving it of some of the objections that are commonly associated with the use of profiling in, for example, law enforcement, de Laat ultimately argues that in its current form, it violates an alleged "social contract" on Wikipedia by not treating anonymous and logged-in edits equally. Although generally well-informed about both the practice and the academic research of vandalism fighting, the paper unfortunately fails to connect to an existing debate about very much the same topic – potential biases of artificial intelligence-based anti-vandalism tools against anonymous edits – that was begun last year[6] by the researchers developing ORES (an edit review tool that was just made available to all English Wikipedia users, see this week's Technology report) and most recently discussed in the August 2016 WMF research showcase.

The paper first gives an overview of the various anti-vandalism tools and bots in use, recapping an earlier paper[7] where de Laat had already asked whether these are "eroding Wikipedia’s moral order" (following an even earlier 2014 paper in which he had argued that new-edit patrolling "raises a number of moral questions that need to be answered urgently"). There, de Laat's concerns included the fact that some stronger tools (rollback, Huggle, and STiki) are available only to trusted users and "cause a loss of the required moral skills in relation to newcomers", and that there is a lack of transparency about how the tools operate (in particular when more sophisticated artificial intelligence/machine learning algorithms such as neural networks are used). The present paper expands on a separate but related concern, about the use of "profiling" to pre-select which recent edits will be subject to closer human review. The author emphasizes that on Wikipedia this usually does not mean person-based offender profiling (building profiles of individuals committing vandalism), citing only one exception in the form of a 2015 academic paper – cf. our review: "Early warning system identifies likely vandals based on their editing behavior". Rather, "the anti-vandalism tools exemplify the broader type of profiling" that focuses on actions. Based on Schauer's work, the author asks the following questions:
  1. "Is this profiling profitable, does it bring the rewards that are usually associated with it?"
  2. "is this profiling approach towards edit selection justified? In particular, do any of the dimensions in use raise moral objections? If so, can these objections be met in a satisfactory fashion, or do such controversial dimensions have to be adapted or eliminated?"

But snakes are much more dangerous! According to Schauer, while general rules are always less fair than case-by-case decisions, their existence can be justified by other arguments.

To answer the first question, the author turns to Schauer's work on rules, in a brief summary that is worth reading for anyone interested in Wikipedia policies and guidelines – although de Laat instead applies the concept to the "procedural rules" implicit in vandalism profiling (such as that anonymous edits are more likely to be worth scrutinizing). First, Schauer "resolutely pushes aside the argument from fairness: decision-making based on rules can only be less just than deciding each case on a particularistic basis". (For example, a restaurant's "No Dogs Allowed" rule will unfairly exclude some well-behaved dogs, while not prohibiting much more dangerous animals such as snakes.) Instead, the existence of rules has to be justified by other arguments, of which Schauer presents four:

  • Rules "create reliability/predictability for those affected by the rule: rule-followers as well as rule-enforcers".
  • Rules "promote more efficient use of resources by rule-enforcers" (as one example, in the case of a speeding car driver, traffic police and judges can apply a simple speed limit instead of having to prove in detail that an instance of driving was dangerous).
  • Rules, if simple enough, reduce the problem of "risk-aversion" by enforcers, who are much more likely to make mistakes and face repercussions if they have to make case by case decisions.
  • Rules create stability, which however also presents "an impediment to change; it entrenches the status-quo. If change is on a society’s agenda, the stability argument turns into an argument against having (simple) rules."

The author cautions that these four arguments have to be reinterpreted when applying them to vandalism profiling, because it consists of "procedural rules" (which edits should be selected for inspection) rather than "substantive rules" (which edits should be reverted as vandalism, which animals should be disallowed from the restaurant). While in the case of substantive rules, their absence would mean having to judge everything on a case-by-case basis, the author asserts that procedural rules arise in a situation where the alternative would be to not judge at all in many cases: Because "we have no means at our disposal to check and pass judgment on all of them; a selection of a kind has to be made. So it is here that profiling comes in". With that qualification, Schauer's second argument provides justification for "Wikipedian profiling [because it] turns out to be amazingly effective", starting with the autonomous bots that auto-revert with an (aspired) 1:1000 false-positive rate.
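To make the distinction concrete, here is a minimal sketch of what a "procedural rule" of this kind might look like in code. It does not reproduce how ClueBot NG, Huggle or STiki actually work: the `Edit` class, the `score` field, both thresholds and the function names are hypothetical and chosen only for illustration. The sketch assumes some classifier has already assigned each edit a vandalism score, and shows only the selection step (which edits get looked at), not the judgment of whether an edit really is vandalism.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    rev_id: int
    score: float  # hypothetical vandalism score in [0, 1] from some classifier


def triage(edits, revert_threshold=0.95, review_threshold=0.5):
    """Split edits into auto-revert, human-review and unchecked buckets.

    Both thresholds are illustrative assumptions, not values used by any real tool.
    """
    auto_revert, review, unchecked = [], [], []
    for edit in edits:
        if edit.score >= revert_threshold:
            auto_revert.append(edit)   # a bot acts without human review
        elif edit.score >= review_threshold:
            review.append(edit)        # queued for a human patroller
        else:
            unchecked.append(edit)     # never inspected at all
    # Patrollers see the most suspicious edits first.
    review.sort(key=lambda e: e.score, reverse=True)
    return auto_revert, review, unchecked


def expected_false_positives(n_good_edits, one_in=1000):
    """Good edits wrongly reverted, assuming a 1-in-`one_in` false-positive rate."""
    return n_good_edits / one_in


edits = [Edit(1, 0.99), Edit(2, 0.6), Edit(3, 0.1), Edit(4, 0.8)]
auto, queue, skipped = triage(edits)
print([e.rev_id for e in auto], [e.rev_id for e in queue], [e.rev_id for e in skipped])
print(expected_false_positives(50_000))
```

Read this way, the 1:1000 target means that, under the assumption that the rate is measured over legitimate edits, a bot processing 50,000 good edits would be expected to wrongly revert about 50 of them. The third bucket illustrates de Laat's point that, without profiling, the alternative for many edits is not case-by-case judgment but no judgment at all.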

De Laat also interprets "the Schauerian argument of reliability/predictability for those affected by the rule" in favor of vandalism profiling. Here, though, he fails to explain the benefits of vandals being able to predict which kind of edits will be subject to scrutiny. This also calls into question his subsequent remark that "it is unfortunate that the anti-vandalism system in use remains opaque to ordinary users". The remaining two of Schauer's four arguments are judged as less pertinent. But overall the paper concludes that it is possible to justify the existence of vandalism profiling rules as beneficial via Schauer's theoretical framework.

Police traffic stops: A good analogy for anti-vandalism patrol on Wikipedia?

Next, de Laat turns to question 2, on whether vandalism profiling is also morally justified. Here he relies on later work by Schauer, from a 2003 book, "Profiles, Probabilities, and Stereotypes", that studies such matters as profiling by tax officials (selecting which taxpayers have to undergo an audit), airport security (selecting passengers for screening) and by police officers (for example, selecting cars for traffic stops). While profiling of some kind is a necessity for all these officials, the particular characteristics (dimensions) used for profiling can be highly problematic (see Driving While Black). For de Laat's study of Wikipedia profiling, "two types of complications are important: (1) possible ‘overuse’ of dimension(s) (an issue of profile effectiveness) and (2) social sensibilities associated with specific dimension(s) (a social and moral issue)." Overuse can mean relying on stereotypes that have no basis in reality, or over-reliance on some dimensions that, while having a non-spurious correlation with the deviant behavior, are over-emphasized at the expense of other relevant characteristics because they are more visible or salient to the profiler. While Schauer considers that it may be justified for "airport officials looking for explosives [to] single out for inspection the luggage of younger Muslim men of Middle Eastern appearance", it would be an over-use if "officials ask all Muslim men and all men of Middle Eastern origin to step out of line to be searched", thus reducing their effectiveness by neglecting other passenger characteristics. This is also an example of the second type of profiling complication, where the selected dimensions are socially sensitive – indeed, for the specific case of luggage screening in the US, "the factors of race, religion, ethnicity, nationality, and gender have expressly been excluded from profiling" since 1997.

Applying this to the case of Wikipedia's anti-vandalism efforts, de Laat first observes that complication (1) (overuse) is not a concern for fully automated tools like ClueBotNG – obviously their algorithm applies the existing profile directly without a human intervention that could introduce this kind of bias. For Huggle and STiki, however, "I see several possibilities for features to be overused by patrollers, thereby spoiling the optimum efficacy achievable by the profile embedded in those tools." This is because both tools do not just use these features in their automatic pre-selection of edits to be reviewed, but also show the human patroller, in the edit review interface, at least whether an edit was anonymous. (The paper examines this in detail for both tools, also observing that Huggle presents more opportunities for this kind of overuse, while STiki is more restricted. However, there seems to have been no attempt to study empirically whether this overuse actually occurs.)

Regarding complication (2), whether some of the features used for vandalism profiling are socially sensitive, de Laat highlights that they include some amount of discrimination by nationality: IP edits geolocated to the US, Canada, and Australia have been found to contain vandalism more frequently and are thus more likely to be singled out for inspection. However, he does not consider this concern "strong enough to warrant banning the country-dimension and correspondingly sacrifice some profiling efficacy", chiefly because there do not appear to be a lot of nationalistic tensions within the English Wikipedia community that could be stirred up by this.

In contrast, de Laat argues that "the targeting of contributors who choose to remain anonymous ... is fraught with danger since anons already constitute a controversial group within the Wikipedian community." Still, he acknowledges the "undisputed fact" that the ratio of vandalism is much higher among anonymous edits. Also, he rejects the concern that they might be more likely to be the victim of false positives:

With this said, de Laat still makes the controversial call "that the anonymous-dimension should be banned from all profiling efforts" – including removing it from the scoring algorithms of Huggle, STiki and ClueBotNG. Instead of concerns about individual harm,

Sadly, while the paper is otherwise rich in citations and details, it completely fails to provide evidence for the existence of this alleged social contract. While it is true that "the ability of almost anyone to edit (most) articles without registration" forms part of Wikipedia's founding principles (a principle that this reviewer strongly agrees with), the "equal stature" part seems to be de Laat's own invention – there is a long list of things that, by longstanding community consensus, require the use of an account (which after all is freely available to everyone, without even requiring an email address). Most of these restrictions – say, the inability to create new articles or being prevented from participating in project governance during admin or arbcom votes – seem much more serious than the vandalism profiling that is the topic of de Laat's paper. TB

Briefly

Conferences and events

Other recent publications

A list of other recent publications that could not be covered in time for this issue—contributions are always welcome for reviewing or summarizing newly published research. This month, the list mainly gathers research about the extraction of specific content from Wikipedia.

  • "Large SMT Data-sets Extracted from Wikipedia"[8] From the abstract: "The article presents experiments on mining Wikipedia for extracting SMT [ statistical machine translation ] useful sentence pairs in three language pairs. ... The optimized SMT systems were evaluated on unseen test-sets also extracted from Wikipedia. As one of the main goals of our work was to help Wikipedia contributors to translate (with as little post editing as possible) new articles from major languages into less resourced languages and vice-versa, we call this type of translation experiments 'in-genre' translation. As in the case of 'in-domain' translation, our evaluations showed that using only 'in-genre' training data for translating same genre new texts is better than mixing the training data with 'out-of-genre' (even) parallel texts."
  • "Recognizing Biographical Sections in Wikipedia"[9] From the abstract: "Thanks to its coverage and its availability in machine-readable format, [Wikipedia] has become a primary resource for large scale research in historical and cultural studies. In this work, we focus on the subset of pages describing persons, and we investigate the task of recognizing biographical sections from them: given a person’s page, we identify the list of sections where information about her/his life is present [as opposed to nonbiographical sections, e.g. 'Early Life' but not 'Legacy' or 'Selected writings']."
  • "Extraction of lethal events from Wikipedia and a semantic repository"[10] From the abstract and conclusion: "This paper describes the extraction of information on lethal events from the Swedish version of Wikipedia. The information searched includes the persons’ cause of death, origin, and profession. [...] We also extracted structured semantic data from the Wikidata store that we combined with the information retrieved from Wikipedia ... [The resulting] data could not support the existence of the Club 27".
  • "Learning Topic Hierarchies for Wikipedia Categories"[11] (from frequently used section headings in a category, e.g. "eligibility", "endorsements" or "results" for Category:Presidential elections)
  • "'A Spousal Relation Begins with a Deletion of engage and Ends with an Addition of divorce': Learning State Changing Verbs from Wikipedia Revision History."[12] From the abstract: "We propose to learn state changing verbs [such as 'born', 'died', 'elected', 'married'] from Wikipedia edit history. When a state-changing event, such as a marriage or death, happens to an entity, the infobox on the entity's Wikipedia page usually gets updated. At the same time, the article text may be updated with verbs either being added or deleted to reflect the changes made to the infobox. ... We observe in our experiments that when state-changing verbs are added or deleted from an entity's Wikipedia page text, we can predict the entity's infobox updates with 88% precision and 76% recall."
  • "Extracting Representative Phrases from Wikipedia Article Sections"[13] From the abstract: "Since [Wikipedia's] long articles are taking time to read, as well as section titles are sometimes too short to capture comprehensive summarization, we aim at extracting informative phrases that readers can refer to."
  • "Accurate Fact Harvesting from Natural Language Text in Wikipedia with Lector"[14] From the abstract: "Many approaches have been introduced recently to automatically create or augment Knowledge Graphs (KGs) with facts extracted from Wikipedia, particularly its structured components like the infoboxes. Although these structures are valuable, they represent only a fraction of the actual information expressed in the articles. In this work, we quantify the number of highly accurate facts that can be harvested with high precision from the text of Wikipedia articles [...]. Our experimental evaluation, which uses Freebase as reference KG, reveals we can augment several relations in the domain of people by more than 10%, with facts whose accuracy are over 95%. Moreover, the vast majority of these facts are missing from the infoboxes, YAGO and DBpedia."
  • "Extracting Scientists from Wikipedia"[15] From the abstract: "[We] describe a system that gathers information from Wikipedia articles and existing data from Wikidata, which is then combined and put in a searchable database. This system is dedicated to making the process of finding scientists both quicker and easier."
  • "LeadMine: Disease identification and concept mapping using Wikipedia"[16] From the abstract: "LeadMine, a dictionary/grammar-based entity recognizer, was used to recognize and normalize both chemicals and diseases to MeSH [ Medical Subject Headings ] IDs. The lexicon was obtained from 3 sources: MeSH, the Disease Ontology and Wikipedia. The Wikipedia dictionary was derived from pages with a disease/symptom box, or those where the page title appeared in the lexicon."
  • "Finding Member Articles for Wikipedia Lists"[17] From the abstract: "... for a given Wikipedia article and list, we determine whether the article can be added to the list. Its solution can be utilized on automatic generation of lists, as well as generation of categories based on lists, to help self-organization of knowledge structure. In this paper, we discuss building classifiers for judging on whether an article belongs to a list or not, where features are extracted from various components including list titles, leading sections, as well as texts of member articles. ... We report our initial evaluation results based on Bayesian and other classifiers, and also discuss feature selection."
  • "Study of the content about documentation sciences in the Spanish-language Wikipedia"[18] (in Spanish). From the English abstract: "This study explore how [Wikipedia] addresses the documentation sciences, focusing especially on pages that discuss the discipline, not only the page contents, but the relationships between them, their edit history, Wikipedians who participated and all aspects that can influence on how the image of this discipline is projected" [sic]. TB


References

  1. ^ Siddhartha Banerjee, Prasenjit Mitra, "WikiWrite: Generating Wikipedia Articles Automatically".
  2. ^ Banerjee, Siddhartha; Mitra, Prasenjit (October 2015). "WikiKreator: Automatic Authoring of Wikipedia Content". AI Matters. 2 (1): 4–6. doi:10.1145/2813536.2813538. ISSN 2372-3483.
  3. ^ Banerjee, Siddhartha; Mitra, Prasenjit (July 2015). "WikiKreator: Improving Wikipedia Stubs Automatically". Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Beijing, China: Association for Computational Linguistics. pp. 867–877.
  4. ^ Banerjee, Siddhartha; Mitra, Prasenjit (2015). "Filling the Gaps: Improving Wikipedia Stubs". Proceedings of the 2015 ACM Symposium on Document Engineering. DocEng '15. New York, NY, USA: ACM. pp. 117–120. doi:10.1145/2682571.2797073. ISBN 9781450333078.
  5. ^ Laat, Paul B. de (30 April 2016). "Profiling vandalism in Wikipedia: A Schauerian approach to justification". Ethics and Information Technology: 1–18. doi:10.1007/s10676-016-9399-8. ISSN 1388-1957.
  6. ^ See, as an example, Halfaker, Aaron (December 6, 2015). "Disparate impact of damage-detection on anonymous Wikipedia editors". Socio-technologist.
  7. ^ Laat, Paul B. de (2 September 2015). "The use of software tools and autonomous bots against vandalism: eroding Wikipedia's moral order?". Ethics and Information Technology. 17 (3): 175–188. doi:10.1007/s10676-015-9366-9. ISSN 1388-1957.
  8. ^ Tufiş, Dan; Ion, Radu; Dumitrescu, Ştefan; Ştefănescu, Dan (26 May 2014). "Large SMT Data-sets Extracted from Wikipedia" (PDF). Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). TUFI 14.103. ISBN 978-2-9517408-8-4.
  9. ^ Aprosio, Alessio Palmero; Tonelli, Sara (17 September 2015). "Recognizing Biographical Sections in Wikipedia". Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal. pp. 811–816.
  10. ^ Norrby, Magnus; Nugues, Pierre (2015). Extraction of lethal events from Wikipedia and a semantic repository (PDF). workshop on Semantic resources and semantic annotation for Natural Language Processing and the Digital Humanities at NODALIDA 2015. Vilnius, Lithuania.
  11. ^ Hu, Linmei; Wang, Xuzhong; Zhang, Mengdi; Li, Juanzi; Li, Xiaoli; Shao, Chao; Tang, Jie; Liu, Yongbin (2015-07-26). "Learning Topic Hierarchies for Wikipedia Categories" (PDF). Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers). Beijing, China. pp. 346–351.
  12. ^ Nakashole, Ndapa; Mitchell, Tom; Wijaya, Derry (2015). "A Spousal Relation Begins with a Deletion of engage and Ends with an Addition of divorce": Learning State Changing Verbs from Wikipedia Revision History (PDF). Proceedings of EMNLP 2015. Lisbon, Portugal. pp. 518–523.
  13. ^ Shan Liu, Mizuho Iwaihara: Extracting Representative Phrases from Wikipedia Article Sections, DEIM Forum 2016 C3-6. http://db-event.jpn.org/deim2016/papers/314.pdf
  14. ^ Cannaviccio, Matteo; Barbosa, Denilson; Merialdo, Paolo (2016). "Accurate Fact Harvesting from Natural Language Text in Wikipedia with Lector". Proceedings of the 19th International Workshop on Web and Databases. WebDB '16. New York, NY, USA: ACM. doi:10.1145/2932194.2932203. ISBN 9781450343107.
  15. ^ Ekenstierna, Gustaf Harari; Lam, Victor Shu-Ming. Extracting Scientists from Wikipedia. Digital Humanities 2016. From Digitization to Knowledge 2016: Resources and Methods for Semantic Processing of Digital Works/Texts, Proceedings of the Workshop, July 11, 2016, Krakow, Poland.
  16. ^ Lowe, Daniel M.; O'Boyle, Noel M.; Sayle, Roger A. "LeadMine: Disease identification and concept mapping using Wikipedia" (PDF). Proceeding of the fifth BioCreative challenge evaluation workshop. BCV 2015. pp. 240–246.
  17. ^ Shuang Sun, Mizuho Iwaihara: Finding Member Articles for Wikipedia Lists. DEIM Forum 2016 C3-3. http://db-event.jpn.org/deim2016/papers/184.pdf
  18. ^ Martín Curto, María del Rosario (2016-04-15). "Estudio sobre el contenido de las Ciencias de la Documentación en la Wikipedia en español" (info:eu-repo/semantics/bachelorThesis). thesis, University of Salamanca, 2014



Reader comments

2016-09-06

Switzerland’s ETH-Bibliothek is uploading 134,000 images to Wikimedia Commons

The following content has been republished from the Wikimedia Blog. Any views expressed in this piece are not necessarily shared by the Signpost; responses and critical commentary are invited in the comments. For more information on this partnership, see our content guidelines.


Refueling in Tunisia.

134,000 images are being uploaded to Wikimedia Commons, a central repository for free media, from ETH-Bibliothek, Switzerland’s largest public scientific and technical library.

Most of the photographs are being drawn from the library's aerial photograph holdings (70,000 in all), with another 40,000 coming from the archives of Swissair, Switzerland's national airline until its bankruptcy in 2002.

The first 18,000 uploads come from Walter Mittelholzer, a Swiss aviation pioneer and entrepreneur. In his travels, which included the first north–south flight across the African continent, he took thousands of aerial photographs from places as varied as Spitsbergen (1923), a Norwegian island in the Arctic Ocean; Persia (1924–25); Kilimanjaro, the dormant volcano in modern-day Tanzania (1929–30); and Ethiopia (1934). You can see examples of his work sprinkled throughout this post.

“Mittelholzer captured sensational aerial images of landscapes, many of which had never been photographed from a bird’s-eye view before,” ETH-Bibliothek project coordinator Michael Gasser said. Mittelholzer used these images in a series of popular books that chronicled his trips into the then-great unknown; today, his work is used in post-colonial research.

"Policeman of the Emir of Kano"

Other images being uploaded are historical photographs of ETH-Bibliothek’s campus in Zurich, along with portraits of professors, students, and scientists at the same location.

Gasser says that while all of these images are already available on the internet, ETH-Bibliothek is “facilitating access to these valuable image sources ... we are trying to bring the material to where the users are.” All are licensed under CC BY-SA or are in the public domain.

The project to upload them to Wikimedia Commons stems from a collaboration between ETH-Bibliothek and Wikimedia CH, an independent organization that works to advance the Wikimedia movement in Switzerland, which was initiated through mutual contacts at Open Data.ch, the Swiss chapter of the Open Knowledge Foundation.

You can see the images for yourself as they are being uploaded on Commons. EE



Fishing boat on a beach, West Africa.




Reader comments


The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0