Clocking External Links on Wikipedia
Wednesday, April 30, 2008
I've found a fascinating, if sometimes very slow, tool on Wikipedia: the External links tool. It's probably what Certain People used when they attempted to de-link All Wikipedia from nielsenhayden.com. Its most common use seems to be rooting out spam.
I've discovered that, if you really try and are patient, it can be used to calculate the total number of links from Wikipedia to a given domain, even for domains that receive five- and six-figure numbers of links. I've determined (through patience and effort) that there are about 61,000 links from Wikipedia to the domain nytimes.com, but close to 147,000 to wolfram.com. The wolfram.com figure is mostly due to heavy use of MathWorld and ScienceWorld, along with what looks like an elaborate, and perhaps difficult, attempt to match up resources across the websites so that material on the target site can be used on a large scale.
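The patient paging described above can be automated. As far as I know, the External links tool (Special:LinkSearch) reads the same data that the MediaWiki API exposes as `list=exturlusage`, which returns at most 500 results per request plus a `continue` block for fetching the next page. The following is a minimal sketch under a few assumptions: the function names `fetch_json` and `tally_external_links` and the User-Agent string are hypothetical, and the `*.domain` wildcard is assumed to match both the bare domain and its subdomains. Counting per host name also shows how much of a total like wolfram.com's comes from mathworld.wolfram.com and scienceworld.wolfram.com. Counts taken from today's Wikipedia will of course differ from the figures in this post.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter
from typing import Any, Callable, Dict

API_URL = "https://en.wikipedia.org/w/api.php"

def fetch_json(params: Dict[str, Any]) -> Dict[str, Any]:
    """Send one GET request to the MediaWiki API and decode the JSON reply."""
    url = f"{API_URL}?{urllib.parse.urlencode(params)}"
    # Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
    req = urllib.request.Request(url, headers={"User-Agent": "linkcount-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def tally_external_links(
    domain: str,
    fetch: Callable[[Dict[str, Any]], Dict[str, Any]] = fetch_json,
) -> Counter:
    """Page through list=exturlusage and count matching links per host name."""
    params: Dict[str, Any] = {
        "action": "query",
        "list": "exturlusage",
        "euquery": f"*.{domain}",  # assumed to match the domain and its subdomains
        "euprop": "url",
        "eulimit": 500,            # largest page size for ordinary (non-bot) clients
        "format": "json",
    }
    hosts: Counter = Counter()
    while True:
        data = fetch(params)
        for item in data.get("query", {}).get("exturlusage", []):
            hosts[urllib.parse.urlsplit(item["url"]).hostname or "?"] += 1
        if "continue" not in data:
            break  # no more pages
        # Generic MediaWiki continuation: merge every key of "continue"
        # into the parameters of the next request.
        params = {**params, **data["continue"]}
    return hosts
```

Calling `tally_external_links("wolfram.com")` and summing the counter's values gives the total, while `most_common()` shows which subdomains account for it. Passing `fetch` in as a parameter keeps the paging logic testable without touching the network.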
I am curious whether there are other tools that work better for evaluating Wikipedia's outgoing links. (It seems to me that this is sort of the reverse of the Wikipedia scanner.) Because of Wikipedia's nofollow policy, I presume that most external search engines are prohibited from collecting this kind of information and from providing this bit of transparency.
FURTHER COUNTS: a ranked list can be found HERE
- 20,345 for washingtonpost.com;
- 30,557 for cnn.com;
- 3,944 for nature.com;
- 1,060 for sciam.com (Scientific American);
- 5,051 for foxnews.com;
- 1,980 for nationalreview.com;
- 677 for motherjones.com;
- 4,732 for wired.com;
- 616 for boingboing.net;
- 2,653 for newscientist.com;
- 705 for newsweek.com;
- 133,150 for the BBC (bbc.co.uk);
- 15,020 for time.com;
- 2,652 for economist.com;
- 2,561 for wsj.com (The Wall Street Journal);
- 1,957 for ft.com (The Financial Times);
- 6,175 for latimes.com;
- 1,523 for isfdb.org;
- 312 for barackobama.com;
- 236 for hillaryclinton.com;
- 100 for johnmccain.com;
- 68,051 for iucredlist.org (The International Union for Conservation of Nature and Natural Resources);
- 7,771 for planetmath.org;
- 4,837 for arxiv.org;
- 3,621 for ams.org (American Mathematical Society);
- 2,794 for slashdot.org;
- 100 for drudgereport.com;
- 274 for metafilter.com;
- 1,285 for www.research.att.com/~njas/sequences/ (The On-Line Encyclopedia of Integer Sequences);
- 576 for springer.de (which includes the Encyclopaedia of Mathematics as well as other resources);
- 324 for cia.gov (mostly to the CIA World Factbook);
- 48,920 for usgs.gov;
- 3,348 for whitehouse.gov;
- 213 for physicstoday.org;
- 1,409 for aps.org (The American Physical Society which hosts Physical Review and other resources);
- 5,594 for forbes.com;
- 3,267 for businessweek.com;
- 1,127 for playboy.com (purely for the articles, I'm sure);
- 25 for penthouse.com;
- 1,832 for newyorker.com;
- 183 for esquire.com;
- 365 for vanityfair.com;
- 1,592 for thenation.com;
- 97 for rd.com (Reader's Digest);
- 2,686 for timesonline.co.uk (The London Times);
- 4,502 for rollingstone.com;
- 25 for nationalenquirer.com;
- 1,809 for spiegel.de;
- 2,231 for cbs.com;
- 152,847 for imdb.com (The Internet Movie Database);
- 49,568 for youtube.com, a site to which linking is officially discouraged according to Wikipedia policy; "semi-automatic" removal of YouTube links has been proposed, but the proposal failed to attract a consensus;
- 161,730 for msn.com, which hosts Encarta and many other resources; the link structures from Wikipedia to Encarta are similar to those found with wolfram.com and suggest datamining efforts;
- 21,556 for noaa.gov;
- 704 for lemonde.fr;
- 331 for corriere.it;
- 191,195 for britannica.com, most of it for datamining Encyclopædia Britannica;
- 6,265 for iht.com (International Herald Tribune);
- 1,726 for nypost.com;
- 210 for amconmag.com (American Conservative Magazine);
- 2,724 for nationalgeographic.com;
- 5,591 for npr.org;
- 1,201 for huffingtonpost.com;
- 1,957 for ebay.com;
- 35,796 for amazon.com, many to product pages despite official prohibitions on making commercial links on Wikipedia;
- 240 for buy.com;
- 4,626 for amazon.co.uk, again many to product pages, not to mention the . . .
- 631 for amazon.ca (dude, you've really got to buy my CD!);
- 529 for amazon.de;
- 291 for amazon.fr;
- 5,266 for video.google.com, including 8 links to something called "Weekly Freekly Weekly Jan 2008" which is a promotional video for Psychopathic Records, plus 7 links to a single interview with Stephen J. Cannell and 35 links to a single interview with James Garner;
- 136 for wonkette.com;
- 642 for dailykos.com;
- 238 for talkingpointsmemo.com;
- 477 for techcrunch.com;
- 777 for engadget.com;
- 298 for gizmodo.com;
- 124 for lifehacker.com;
- 65 for perezhilton.com;
- 234 for gawker.com;
- 105 for valleywag.com;
- 249 for thinkprogress.org;
- 1,020 for mediamatters.org;
- 2,331 for snopes.com;
- 1,241 for people.com;
- 1,051 for gpoaccess.gov (the US Government Printing Office, which hosts the Congressional Record and other resources).
(These numbers are all for en.wikipedia.org.)
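The counts above appear in no particular order. A small Python sketch can sort them from largest to smallest, assuming each entry starts with "- <count>" and that the first domain-like token after the count names the target site (which also covers entries such as "the BBC (bbc.co.uk)"). The function `rank_entries` and both regular expressions are hypothetical helpers written for this list's format, not part of any existing tool.

```python
import re
from typing import List, Tuple

# "- <count> <rest of line>", where the count may contain thousands separators.
ENTRY = re.compile(r"^-\s*([\d,]+)\s+(.*)$")
# First token shaped like a domain, optionally followed by a path
# (e.g. www.research.att.com/~njas/sequences/).
DOMAIN = re.compile(r"[\w.-]+\.[a-z]{2,}(?:/[^\s;,)]*)?")

def rank_entries(lines: List[str]) -> List[Tuple[str, int]]:
    """Turn '- N for domain;' list lines into (domain, N) pairs, largest first."""
    ranked: List[Tuple[str, int]] = []
    for line in lines:
        entry = ENTRY.match(line.strip())
        if not entry:
            continue  # not a count line (e.g. the closing note)
        domain = DOMAIN.search(entry.group(2))
        if domain:
            count = int(entry.group(1).replace(",", ""))
            ranked.append((domain.group(0), count))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

For example, feeding it the washingtonpost.com, BBC and dailykos.com lines returns `[("bbc.co.uk", 133150), ("washingtonpost.com", 20345), ("dailykos.com", 642)]`. Related entries such as the various Amazon storefronts stay separate unless they are grouped afterwards.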