Sam's news

Here are some of the news sources I follow.

My main website is at

British values: the autumn 2017 New Humanist

Published 17 Aug 2017 in New Humanist Articles and Posts.

Out now - reflections on a divided society

Program for W3C Publishing Summit Announced

Published 17 Aug 2017 by Bill McCoy in W3C Blog.

The program for the inaugural W3C Publishing Summit (taking place November 9-10, 2017 in the San Francisco Bay Area) has just been announced. The program will feature keynotes from Internet pioneer and futurist Tim O’Reilly and Adobe CTO Abhay Parasnis, along with dozens of other speakers and panelists who will showcase and discuss how web technologies are shaping publishing today, tomorrow, and beyond.

Publishing and the web interact in innumerable ways. From schools to libraries, from design to production to archiving, from metadata to analytics, from New York to Paris to Buenos Aires to Tokyo, the Summit will show how web technologies are making publishing more accessible, more global, and more efficient and effective. Mozilla user experience lead and author Jen Simmons will showcase the ongoing revolution in CSS. Design experts Laura Brady, Iris Febre and Nellie McKesson will cover putting the reader first when producing ebooks and automating publishing workflows. We’ll also hear from reading system creator Micah Bowers (Bluefire) and EPUB pioneers George Kerscher (DAISY) and Garth Conboy (Google).

The newly-unveiled program will also showcase insights from senior leaders from across the spectrum of publishing and digital content stakeholders including Jeff Jaffe (CEO, W3C), Yasushi Fujita (CEO, Media DO), Rick Johnson (SVP Product and Strategy, Ingram/VitalSource), Ken Brooks (COO, Macmillan Learning), Liisa McCloy-Kelley (VP, Penguin Random House), and representatives from Rakuten Kobo, NYPL, University of Michigan Library/Publishing, Wiley, Hachette Book Group, Editis, EDRLab, and more.

I’m very excited about this new event, which represents an important next milestone in the expanded Publishing@W3C initiative, and I hope you will join us. Register now. For more information on the event, see the W3C Publishing Summit 2017 homepage and Media Advisory.

Sponsors of the W3C Publishing Summit include Ingram/VitalSource, SPi Global, and Apex. Additional sponsorship opportunities are available; email me at for more information. The Publishing Summit is one of several co-located events taking place during W3C’s major annual gathering, TPAC, for which registration is open for W3C members.

Seeking a cure for HIV

Published 16 Aug 2017 in New Humanist Articles and Posts.

A child in South Africa appears to have been "cured" of HIV - and has lived treatment free for nearly nine years. What can this tell us?

x-post: Community Conduct Project Kick-off Meeting

Published 15 Aug 2017 by Ipstenu (Mika Epstein) in Make WordPress Plugins.

Community Conduct Project – Kick off meeting scheduled for 17:00 UTC on the 5th September 2017


How to know if a Wiki page is for a person

Published 15 Aug 2017 by Mohamed Seif in Newest questions tagged mediawiki - Stack Overflow.

I'm searching for a word on Wikipedia pages using the MediaWiki API. I need to know whether that word is the name of a person.

For example, searching for "Leonardo DiCaprio":

I need to know from the query result whether this is the name of a person.
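One common approach (an editorial sketch, not part of the question): resolve the page to its Wikidata item and check whether its "instance of" (P31) claims include "human" (Q5). The fetching step is omitted here; assuming the entity JSON has already been retrieved (e.g. via the wbgetentities module), the check looks like:

```python
def is_person(entity):
    """Return True if a Wikidata entity's "instance of" (P31) claims
    include "human" (Q5)."""
    for claim in entity.get("claims", {}).get("P31", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if value.get("id") == "Q5":
            return True
    return False

# Trimmed-down shape of what the Wikidata API returns for a person:
sample = {"claims": {"P31": [{"mainsnak": {"datavalue": {"value": {"id": "Q5"}}}}]}}
print(is_person(sample))  # True
```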

The Great Firewall

Published 15 Aug 2017 in New Humanist Articles and Posts.

Apple is just the latest tech giant that has failed to stand up to Chinese censorship.

All the stations have an adventure

Published 12 Aug 2017 by Sam Wilson in Sam's notebook.

Today is All The Stations’ “have an adventure” day, in which they’re asking people to visit a railway station that they’ve never been to before. When I first heard about it I figured I’d have to end up somewhere boring like Aubin Grove, but as it turns out I’m actually at Wikimania in Montreal! So it’s rather easy to find a station to which I’ve never been; in fact, with the assistance of a friend, I have today been to seven new stations.




Square-Victoria-OACI station

Place d’Armes (no photo).


Champ-de-Mars station


Berri-UQAM station


Jean-Drapeau station


Longueuil station

And also Windsor, which isn’t actually a station any more:

Windsor station

How to configure DiffUtils 3 in MediaWiki?

Published 11 Aug 2017 by Wendel Rodrigues in Newest questions tagged mediawiki - Stack Overflow.

I started installing MediaWiki and it needs DiffUtils 3, but I cannot find it to install. Does anyone know where I can find it?

Gitchangelog plugin error "bjurr.gitchangelog.api.exceptions.GitChangelogIntegrationException"

Published 11 Aug 2017 by Ram in Newest questions tagged mediawiki - Stack Overflow.

Error while posting content from the Jenkins Git Changelog plugin to MediaWiki.


Full error:

at se.bjurr.gitchangelog.internal.integrations.mediawiki.MediaWikiClient.createMediaWikiPage(
at se.bjurr.gitchangelog.api.GitChangelogApi.toMediaWiki(
at
at
at hudson.FilePath.act(
at org.jenkinsci.plugins.gitchangelog.perform.GitChangelogPerformer.performerPerform(
at org.jenkinsci.plugins.gitchangelog.GitChangelogRecorder.perform(
at hudson.tasks.BuildStepCompatibilityLayer.perform(
at hudson.tasks.BuildStepMonitor$1.perform(
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(
at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(
at hudson.model.Build$BuildExecution.post2(
at hudson.model.AbstractBuild$
at hudson.model.Run.execute(
at
at hudson.model.ResourceController.execute(
at
Caused by: com.jayway.jsonpath.PathNotFoundException: Missing property in path $['login']
at com.jayway.jsonpath.internal.path.PathToken.handleObjectProperty(
at com.jayway.jsonpath.internal.path.PropertyPathToken.evaluate(
at com.jayway.jsonpath.internal.path.RootPathToken.evaluate(
at com.jayway.jsonpath.internal.path.CompiledPath.evaluate(
at com.jayway.jsonpath.internal.path.CompiledPath.evaluate(
at
at
at
at
at se.bjurr.gitchangelog.internal.integrations.mediawiki.MediaWikiClient.getWikiToken(
at se.bjurr.gitchangelog.internal.integrations.mediawiki.MediaWikiClient.doAuthenticate(
at se.bjurr.gitchangelog.internal.integrations.mediawiki.MediaWikiClient.createMediaWikiPage(

Mediawiki extension used in a model with parameters

Published 11 Aug 2017 by Moissinac in Newest questions tagged mediawiki - Stack Overflow.

I'm using MediaWiki 1.18.1 and the anyweb extension. All is working well. I'm trying to replace a chunk of several pages with a template (protected against editing). The chunk I'm replacing uses an extension (anyweb) and looks like:

<anyweb  mywidth="100%" myheight="170">,</anyweb>

I'm trying to replace it with a template call like this: {{Meteo|lat=50.028055555555554|lon=1.3005555555555557|good=SSO,}} where the page Model:Meteo contains, for example:

<anyweb  mywidth="100%" myheight="170">{{{lat|0.0}}}&lon={{{lon|0.0}}}&good=NO,</anyweb>

but the lat and lon parameters are not expanded by the template before being passed to the anyweb extension, so the extension gets {{{lat|0.0}}} as the lat value instead of 50.028055555555554.

Is it possible to use template parameters inside an extension tag? How?
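One direction worth trying (standard MediaWiki behaviour, though untested against anyweb specifically): extension tags receive their content before template parameters are expanded, but the {{#tag:}} parser function expands its arguments first and then builds the tag. The template body could be rewritten along these lines (the attribute handling is an assumption):

```wikitext
{{#tag:anyweb|{{{lat|0.0}}}&lon={{{lon|0.0}}}&good=NO,|mywidth=100%|myheight=170}}
```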

MediaWiki VisualEditor Setup

Published 10 Aug 2017 by UCanCallMeBob89 in Newest questions tagged mediawiki - Stack Overflow.

I have a MediaWiki hosted in a subdirectory of my company's website. I'm trying to set up VisualEditor, and have installed both VisualEditor and Parsoid in the wiki/extensions/ directory, and have followed the setup instructions.

I can curl [myaddress].com/wiki/api.php and get the MediaWiki API help response as described here:

The weird thing is, when I navigate back to my wiki in the browser and edit a page using VisualEditor, the page content disappears and the MediaWiki API help content (click for picture) appears in its place. If I click "Edit Source" instead of "Edit" I can still edit the content with the WikiEditor.

Any idea why this is happening and how to fix it?

MediaWiki v1.29.0

VisualEditor v0.1.0

Mediawiki Forms: how to set Category from an input field

Published 10 Aug 2017 by user1084363 in Newest questions tagged mediawiki - Stack Overflow.

In my MediaWiki site with the Page Forms extension, I'm trying to set the category of a form-generated page to the value that the user enters in a combobox in the form.

The part of the form code where I get the category name is the SupraEspecie field of the DatosEspecie template:

{{{for template|DatosEspecie|label=Data}}}
{| class="formtable"
! Nombre Especie: 
| {{{field|NEspecie|mandatory|restricted|default={{PAGENAME}}}}}
! Nombre Cientifico: 
| {{{field|NCientifico}}}
! SupraEspecie: 
| {{{field|SupraEspecie|mandatory|input type=combobox|namespace=Category|values from namespace=Category|existing values only}}} 

Then I try to call another template that defines the category using the received parameter - basically a template containing: [[Category:{{{1}}}]]

The call I'm using is:

{{{for template|CategoryDefinition|SupraEspecie}}}    
{{{end template}}}

But this is wrong, as the template CategoryDefinition is called without any parameter.

What is the correct way of passing a field of a MediaWiki Page Form (in this case SupraEspecie) to a template in the form (in this case CategoryDefinition)?

Mediawiki login cancelled to prevent session hijacking

Published 8 Aug 2017 by LordFarquaad in Newest questions tagged mediawiki - Stack Overflow.

I have just set up MediaWiki 1.29.0 on an AS400 IBM i machine. I am using MariaDB as the database and PHP 5.5.37.

Every time I try to log into an account, I get the error:

There seems to be a problem with your login session; this action has been canceled as a precaution against session hijacking. Go back to the previous page, reload that page and then try again.

Obviously, the behavior I'm looking for is to log in.

I've tried:

I've checked and I know my cookies are enabled (I'm able to call document.cookie; and get data back).

From what I've read, this error is happening because my CSRF token is not being properly cached. I don't know enough to confirm this, but it seems to be the consensus.

This question has been asked before here, and in the linked questions within, but no solutions fixed my problem. They also deal with an older version of MediaWiki, though I don't know if that makes a difference in this instance.

EDIT: I am also getting the same behavior when I try to create a new account. However, I am able to navigate the wiki, create pages, and edit pages without any sort of error.

Here is my current debug.log file:

IP: <My-IP>
Start request POST /<my-wiki>/index.php?title=Special:UserLogin&returnto=Main+Page
COOKIE: ZDEDebuggerPresent=php,phtml,php3
ACCEPT-ENCODING: gzip, deflate
REFERER: http://<my-wiki>>/index.php?title=Special:UserLogin&returnto=Main+Page
ACCEPT: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
CONTENT-TYPE: application/x-www-form-urlencoded
USER-AGENT: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36
ORIGIN: http://<server>
CACHE-CONTROL: max-age=0
CONNECTION: keep-alive
HOST: <server>
[CryptRand] mcrypt_create_iv generated 20 bytes of randomness.
[CryptRand] 0 bytes of randomness leftover in the buffer.
[DBReplication] Wikimedia\Rdbms\LBFactory::getChronologyProtector: using request info {
    "IPAddress": "<My-IP>",
    "UserAgent": "Mozilla\/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/60.0.3112.90 Safari\/537.36",
    "ChronologyProtection": false
[DBConnection] Wikimedia\Rdbms\LoadBalancer::openConnection: calling initLB() before first connection.
[DBConnection] Connected to database 0 at '<My-IP>:3307'.
[SQLBagOStuff] Connection 360 will be used for SqlBagOStuff
[cookie] already deleted setcookie: "<my-wiki>_session", "", "1470838517", "/", "", "1", "1"
[cookie] already deleted setcookie: ""<my-wiki>UserID", "", "1470838517", "/", "", "1", "1"
[cookie] already deleted setcookie: ""<my-wiki>Token", "", "1470838517", "/", "", "1", "1"

[cookie] already deleted setcookie: "forceHTTPS", "", "1470838517", "/", "", "", "1"
[DBConnection] Wikimedia\Rdbms\LoadBalancer::openConnection: calling initLB() before first connection.
[DBConnection] Connected to database 0 at '<My-IP>:3307'.
[cookie] setcookie: ""<my-wiki>_session", "6ov9336kuss9v7vfm6nc1aogaouhm9rp", "0", "/", "", "1", "1"
[cookie] already deleted setcookie: ""<my-wiki>UserID", "", "1470838518", "/", "", "1", "1"
[cookie] already deleted setcookie: ""<my-wiki>Token", "", "1470838518", "/", "", "1", "1"
[cookie] already deleted setcookie: "forceHTTPS", "", "1470838518", "/", "", "", "1"
[cookie] already set setcookie: ""<my-wiki>_session", "6ov9336kuss9v7vfm6nc1aogaouhm9rp", "0", "/", "", "1", "1"
[cookie] already deleted setcookie: ""<my-wiki>UserID", "", "1470838518", "/", "", "1", "1"
[cookie] already deleted setcookie: ""<my-wiki>Token", "", "1470838518", "/", "", "1", "1"
[cookie] already deleted setcookie: "forceHTTPS", "", "1470838518", "/", "", "", "1"
[MessageCache] MessageCache::load: Loading en... local cache is empty, got from global cache
Unstubbing $wgParser on call of $wgParser::firstCallInit from MessageCache->getParser
Parser: using preprocessor: Preprocessor_DOM
Unstubbing $wgLang on call of $wgLang::_unstub from ParserOptions->__construct
QuickTemplate::__construct was called with no Config instance passed to it
[CryptRand] mcrypt_create_iv generated 7 bytes of randomness.
[CryptRand] 0 bytes of randomness leftover in the buffer.
[CryptRand] mcrypt_create_iv generated 16 bytes of randomness.
[CryptRand] 0 bytes of randomness leftover in the buffer.
MediaWiki::preOutputCommit: primary transaction round committed
MediaWiki::preOutputCommit: pre-send deferred updates completed
MediaWiki::preOutputCommit: LBFactory shutdown completed
OutputPage::sendCacheControl: private caching;  **
[DBReplication] Wikimedia\Rdbms\LBFactory::getChronologyProtector: using request info {
    "ChronologyProtection": "false",
    "IPAddress": "<My-IP>",
    "UserAgent": "Mozilla\/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/60.0.3112.90 Safari\/537.36"
[DBConnection] Wikimedia\Rdbms\LoadBalancer::openConnection: calling initLB() before first connection.
[DBConnection] Connected to database 0 at '<My-IP>:3307'.
Request ended normally
[DBReplication] Wikimedia\Rdbms\LBFactory::getChronologyProtector: using request info {
    "ChronologyProtection": "false",
    "IPAddress": "<My-IP>",
    "UserAgent": "Mozilla\/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/60.0.3112.90 Safari\/537.36"
[DBConnection] Wikimedia\Rdbms\LoadBalancer::openConnection: calling initLB() before first connection.
[DBConnection] Connected to database 0 at '<My-IP>:3307'.
[MessageCache] MessageCache::load: Loading en... local cache is empty, got from global cache
[DBReplication] Wikimedia\Rdbms\LBFactory::getChronologyProtector: using request info {
    "ChronologyProtection": "false",
    "IPAddress": "<My-IP>",
    "UserAgent": "Mozilla\/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/60.0.3112.90 Safari\/537.36"
[DBConnection] Wikimedia\Rdbms\LoadBalancer::openConnection: calling initLB() before first connection.
[DBConnection] Connected to database 0 at '<My-IP>:3307'.

Here are my request and response headers:


I'm also not sure if it's helpful, but I have a "form data" area from logging in that looks like this (with username and password removed):

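A workaround commonly suggested for this exact error on MediaWiki 1.27+ (whether it applies to this AS400 setup is an assumption) is to store session data in the database instead of the default object cache, via LocalSettings.php:

```php
// Commonly suggested workaround for the "session hijacking" login error:
// keep session data in the database rather than the object cache.
// (Assumption: may not help in every setup.)
$wgSessionCacheType = CACHE_DB;
```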

Pywikibot MediaWiki Query API

Published 7 Aug 2017 by Brubsby in Newest questions tagged mediawiki - Stack Overflow.

I have a data dump of Wikipedia articles listed only by their pageid, and I am hoping to filter them by namespace. It would be relatively easy to write some Python (probably using the requests module) to call the MediaWiki query API, querying for namespaces 50 at a time using the pageids param.

But I was going to try to use Pywikibot instead, as best practices and error handling for querying the API are likely baked into Pywikibot, and I'm less likely to get my IP banned if I make a logical error and over-query the API. (In addition, I'm hoping to gain experience with this module for future bot-writing endeavours.)

However, I can't really find very good documentation for Pywikibot, and am having trouble finding anything in the existing docs about this API. I have also tried various other Python packages in hopes of finding bindings (is that the correct usage of "bindings"?) for this API, to no avail.
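As a fallback while the Pywikibot docs remain elusive, the plain-API route described above mostly needs two pieces: batching the pageids into groups of 50 and building the query URL. A sketch of that much (the endpoint URL is an assumption; the 50-id batch size is the one mentioned in the question):

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"  # assumption: English Wikipedia

def batches(pageids, size=50):
    """Split pageids into chunks of at most `size` items per request."""
    return [pageids[i:i + size] for i in range(0, len(pageids), size)]

def namespace_query_url(chunk):
    """Build a query URL; the response's page objects each carry an
    `ns` field, which is what the namespace filtering needs."""
    params = {"action": "query", "format": "json",
              "pageids": "|".join(str(p) for p in chunk)}
    return API + "?" + urlencode(params)

ids = list(range(1, 120))        # stand-in pageids
chunks = batches(ids)
print([len(c) for c in chunks])  # [50, 50, 19]
```

The actual fetching (and polite rate limiting) is left out; that is exactly the part Pywikibot would otherwise handle.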

The power of small acts of resistance

Published 7 Aug 2017 by in New Humanist Articles and Posts.

Q&A with Steve Crawshaw, author of Street Spirit, a book on the power of mischief in protest.

Staying in is the new going out

Published 7 Aug 2017 by in New Humanist Articles and Posts.

Laurie Taylor embraces the joys of domestic quiescence.

How to get high-res thumbnails out of low-res SVGs in Mediawiki?

Published 7 Aug 2017 by Stephanus Tavilrond in Newest questions tagged mediawiki - Stack Overflow.

I don't exactly know how to word this question properly, so I'll make an attempt:

See the rasterized SVG on this page? It looks pretty distorted and - excuse my language - rather s***. Now let's compare it with the one here. Both of them use the exact same SVG file, and the source code is identical in wikitext. The difference seems to be in how the rasterized "thumbnail" is generated.


The result I get from MediaWiki.


The intended result.

From what I have noticed - correct me if I'm wrong - both Wikipedia and Wikia either create several rasterized thumbnails for each SVG, or simply generate them on demand, depending on the size the pages want. By default, however, MediaWiki only generates one thumbnail, which seems to have the same resolution as the original SVG - which gives us blurry and **** raster images when a small SVG is rasterized to a large image.

Either that, or the SVGs don't get scaled/resized prior to thumbnailing/rasterization, when they should be.

Just for a heads up, here's some code from my LocalSettings.php:

$wgFileExtensions = array( 'png', 'gif', 'jpg', 'jpeg',
    'xls', 'mpp', 'pdf', 'ppt', 'tiff', 'ogg', 'svg',
    'woff', 'eot', 'woff2' );
// $wgSVGConverters['ImageMagick'] = '"' . $wgImageMagickConvertCommand . '" -background white -thumbnail $widthx$height^! $input PNG:$output';
// $wgSVGConverters['ImageMagick'] = '$path/convert -density $width -geometry $width $input PNG:$output';
$wgSVGConverters['ImageMagick'] = '$path/convert -density 1200 -background none -geometry $width $input PNG:$output';
$wgSVGConverters['rsvg'] = '/usr/bin/rsvg-convert -w $width -h $height $input -o $output';
$wgSVGConverter = 'ImageMagick';
// $wgSVGConverter = 'rsvg';
$wgSVGMaxSize = 2048;
$wgMaxImageArea = 1.25e7;
$wgMaxAnimatedGifArea = 1.0e6; 
$wgUseImageResize = true;
$wgGenerateThumbnailOnParse = true;

So... how do I enable having multiple thumbnails, if the lack of them is the cause of the problem? Is this even the cause of the problem to begin with? If not, what is the real reason why I'm not getting my intended result? What can I do?

EDIT: Already solved by switching from ImageMagick to RSVG.

GeSHI new language

Published 7 Aug 2017 by Martin in Newest questions tagged mediawiki - Stack Overflow.

I'm running a MediaWiki and want to use syntax highlighting. I am using the extension for that.

That works pretty well. Alas, I want to highlight a language that is not part of this extension. So I wrote a Python egg to extend Pygments so it can parse my language (using entry points). This works fine.

Now I am struggling to get the two to work together. How do I tell the GeSHi extension of MediaWiki to use my Pygments extension? What do I have to do so that using <syntaxhighlight lang="myLanguage"> will use my language's lexer and style?

How to Import the MediaWiki files?

Published 6 Aug 2017 by Fenici in Newest questions tagged mediawiki - Stack Overflow.

The instructions in the MediaWiki docs were not clear about how to restore a mediawikiImages.tar.gz file, for example, into a MediaWiki installation.

The steps I did:

  1. download MediaWiki latest version

  2. install and configure it

  3. import the database via phpMyAdmin

  4. restore the mediawikiImages.tar.gz file into MediaWiki (this is the step I'm having trouble with); from Manual:Restoring_a_wiki_from_backup I am not sure how the image files were backed up

Let me know if you want more information. Any comments are welcome... I just need some ideas for this.

For those who gave me downvotes: this is not a pure PHP coding question, but I believe there are people using MediaWiki who face the same issues. I don't understand why you are voting it down; I would welcome your comments and suggestions.
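For what it's worth, the restore step in Manual:Restoring_a_wiki_from_backup amounts to unpacking the archive into the wiki's images/ directory (and, if files then appear missing from the wiki, re-registering them with maintenance/importImages.php). A self-contained sketch using scratch paths - the real wiki paths are assumptions to adjust:

```python
import tarfile
import tempfile
from pathlib import Path

# Build a stand-in backup so the example is self-contained.
scratch = Path(tempfile.mkdtemp())
src = scratch / "source"
src.mkdir()
(src / "Example.png").write_bytes(b"fake image data")
backup = scratch / "mediawikiImages.tar.gz"
with tarfile.open(backup, "w:gz") as tar:
    tar.add(src / "Example.png", arcname="Example.png")

# The restore step: extract the backup into the wiki's images/ directory
# (stand-in path; a real install might use /var/www/wiki/images).
images_dir = scratch / "wiki" / "images"
images_dir.mkdir(parents=True)
with tarfile.open(backup, "r:gz") as tar:
    tar.extractall(images_dir)

print(sorted(p.name for p in images_dir.iterdir()))  # ['Example.png']
```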

How do I pass MediaWiki API cookies to login using Google Apps Script?

Published 5 Aug 2017 by Hausa Dictionary in Newest questions tagged mediawiki - Stack Overflow.

I've been trying to find good guides and examples for using Google Apps Script with the MediaWiki API, but to no avail. So far I've been following MediaWiki's own guides to the best of my beginner abilities, but I am stuck here:

Any help on how I can successfully login would be greatly appreciated.

Here's the code:

function myFunction() {
  //Attempt to log in to the MediaWiki API...
  //Gets tokens required by data-modifying actions. If you request one of these actions without providing a token, the API returns an error code such as notoken.
  // > (For MediaWiki 1.27+)

  //See also and

  var loginURL1 = "";
  var login = UrlFetchApp.fetch(loginURL1);
  //Logger.log(login); // {"batchcomplete":"","query":{"tokens":{"logintoken":"9c1a9eba6399bc9b096c162527f4244e59842ae3+\\"}}}
  var getLgToken = JSON.parse(login.getContentText());
  //Logger.log( getLgToken ); // {batchcomplete=, query={tokens={logintoken=062633fe9928f3413b12302ef3d2836e598545c4+\}}}
  //Logger.log( getLgToken.query.tokens.logintoken );
  var logintoken = getLgToken.query.tokens.logintoken;
  Logger.log( JSON.stringify( login.getAllHeaders() ) );
  var getCookie = JSON.stringify( login.getAllHeaders() );

  getCookie = JSON.parse(getCookie);
  Logger.log(getCookie.Vary); //returns Accept-Encoding but how can I get the results for "Set-Cookie" and which item do I pass to the next login?

  // The logged headers include, among others:
  // "Vary":"Accept-Encoding","Set-Cookie":["enwikiSession=1n6m0rst6j6a2du8455ctjf8c6b5h6bd; path=/; secure; httponly","forceHTTPS=true; path=/; httponly","WMF-Last-Access=05-Aug-2017;Path=/;HttpOnly;secure;Expires=Wed, 06 Sep 2017 00:00:00 GMT",
  // "WMF-Last-Access-Global=05-Aug-2017;Path=/;;HttpOnly;secure;Expires=Wed, 06 Sep 2017 00:00:00 GMT","GeoIP=US:CA:Mountain_View:37.42:-122.06:v4; Path=/; secure;"],"

  var loginURL2 = "";
  var options = {
    'method' : 'post',
    'contentType': 'application/json',
    'headers': { 'Api-User-Agent': 'Example/1.0' },
    'cookies': ''
  };
  //Logger.log( UrlFetchApp.fetch(loginURL2, options) ); //Doesn't work yet.
}

Editing MediaWiki pages with my own text editor

Published 4 Aug 2017 by Cedar in Newest questions tagged mediawiki - Stack Overflow.

I'm trying to make pretty pages hosted on a MediaWiki server, and I tend to spend a lot of time dealing with JavaScript, HTML, and CSS while editing these wiki pages.

Now, the wiki editor in MediaWiki is really, really basic - just a textbox. I would like things like syntax highlighting, tab completion, and indentation help when I'm editing my pages.

Is there something that could give me those tools?

Or maybe help me download the Wiki, edit, and then sync the changes back up?

Generate A Chart Using Javascript in Mediawiki Extension

Published 4 Aug 2017 by Benji Button in Newest questions tagged mediawiki - Stack Overflow.

I'm currently trying to generate a chart using javascript in a MediaWiki extension after loading the data with the same extension from a MySQL database. The chart uses the Chart.js library for displaying the data.

The chart is not showing up, and I believe it is because MediaWiki does not support running JavaScript after a page has been loaded, in an attempt to prevent wiki users from writing JavaScript functions into the page.

Does anyone know of a way around this safeguard? The extension uses a form to get info about what data will be displayed in the chart, so it has to be dynamically created (which I am doing by generating the HTML and JavaScript via PHP functions).

MediaWiki - "You have cookies disabled"

Published 4 Aug 2017 by Rein Shope in Newest questions tagged mediawiki - Stack Overflow.

I'm currently running an instance of MediaWiki on Windows Server 2008 R2. At some point yesterday, it started giving us this error when trying to log in:

"Wiki uses cookies to log in users. You have cookies disabled. Please enable them and try again"

We, of course, have cookies enabled. I've tried every single thing I could from the various posts I've found about this, including these:

MediaWiki sessions and cookies not working on multi-server behind CloudFlare

problem with mediawiki cookies

MediaWiki uses cookies to log in users. You have cookies disabled. Please enable them and try again

How can I fix the MediaWiki error "Wiki uses cookies to log in users. You have cookies disabled. Please enable them and try again."?

Nothing, so far, has worked. I'm running a simple private wiki setup, and I have a feeling this all started after the server rebooted or the IIS service was restarted.

Given that I've tried everything else that's mentioned in every thread I can find about this issue, what could be the root of the problem?

MediaWiki Edit Rest API - invalid CSRF token error

Published 3 Aug 2017 by SrishAkaTux in Newest questions tagged mediawiki - Stack Overflow.

I am using the MediaWiki REST API to make an edit on a wiki with JavaScript code. I am able to extract a login token, successfully log in, and retrieve an edit token as well. But in the final step, while calling the edit API, I get an 'invalid CSRF token' error. Although I am handling cookies, it seems that between logging in and extracting the edit token I am missing a step. Any helpful pointers would be much appreciated! (link to my javascript code snippet)

Kubuntu 16.04.3 LTS Update Available

Published 3 Aug 2017 by clivejo in Kubuntu.

The third point release update to Kubuntu 16.04 LTS (Xenial Xerus) is out now. This contains all the bug-fixes added to 16.04 since its first release in April 2016. Users of 16.04 can run the normal update procedure to get these bug-fixes. In addition, we suggest adding the Backports PPA to update to Plasma 5.8.7. Read more about it:

Warning: 14.04 LTS to 16.04 LTS upgrades are problematic, and should not be attempted by the average user. Please install a fresh copy of 16.04.3 instead. To prevent messages about upgrading, change Prompt=lts to Prompt=normal or Prompt=never in the /etc/update-manager/release-upgrades file. As always, make a thorough backup of your data before upgrading.

See the Ubuntu 16.04.3 release announcement and Kubuntu Release Notes.

Download 16.04.3 images.

Launching the WebAssembly Working Group

Published 3 Aug 2017 by Bradley Nelson in W3C Blog.

We’d like to announce the formation of a WebAssembly Working Group.

For over two years the WebAssembly W3C Community Group has served as a forum for browser vendors and others to come together to develop an elegant and efficient compilation target for the Web. A first version is available in 4 browser engines and is on track to become a standard part of the Web. We’ve had several successful in-person CG meetings, while continuing our robust online collaboration on github. We also look forward to engaging the wider W3C community at the WebAssembly meeting at this year’s TPAC.

With the formation of this Working Group, we will soon be able to recommend an official version of the WebAssembly specification.

For those of you unfamiliar with WebAssembly, its initial goal is to provide a good way for C/C++ programs to compile to run on the Web, safely and at near-native speeds.

WebAssembly improves or enables a range of use cases, including:

WebAssembly is also about bringing more programming languages to the Web.

By offering a compact and well specified compilation target, WebAssembly enables not only compiled languages like C/C++ and Rust, but also interpreted languages like Lua, Python, and Ruby. As we enhance WebAssembly to support managed objects and better DOM+JS bindings, the list of supported languages will continue to grow.

Even if you develop primarily in JavaScript, you’ll benefit as a wealth of libraries from other languages are exposed to JavaScript. Imagine using JavaScript to access powerful libraries from outside the Web for things like physical simulation, fast number crunching, and machine learning.

There is still a lot of work to do with WebAssembly, which we will continue to incubate in our Community Group. We plan to make Wasm an even better compilation target and are already exploring adding features like: threads, managed object support, direct DOM/JS bindings, SIMD, and memory mapping.

A warm thanks to everyone involved with the WebAssembly effort.

Keep expecting the Web to do more!

Run SQL inline queries on Mediawiki

Published 3 Aug 2017 by LSG in Newest questions tagged mediawiki - Stack Overflow.

I would like to be able to run SQL queries in the pages of a MediaWiki (inline queries). I am not even sure if this is possible, or if it can only be done through SSH. I find the information provided by MediaWiki pretty confusing for a new user.

As far as I understand, SQL queries are wrapped for security reasons, so the syntax will not be exactly SQL.

The questions would be: can we make inline SQL (or wrapped SQL) queries in MediaWiki pages? If yes, how? If not, is there a similar alternative (for example, creating a function with the query in it and accessing it)? Please provide examples if possible, and take into account that I am not familiar with the MediaWiki data structure. Let's assume, for example, that I want to know all pages created by users named 'user1' and 'user2'.

Also, if there is an extension which helps with this it would be worth mentioning.

I am using WampServer 3.0.6 (x64), Apache 2.4.23, MySQL 5.7.14, and PHP 5.6.25.

Mediawiki - Create a link to a template's edit page

Published 3 Aug 2017 by Miguel Bartelsman in Newest questions tagged mediawiki - Stack Overflow.

I'm making a template, and in this template I want to include a link to its own edit page, so that it can be easily accessed from the pages it is included in (very similar to Wikipedia's "view - edit - discuss" links), but I want to do it without extensions. Is it possible? How?

I tried using [[{{PAGENAME}}?action=edit]] but it automatically converts that link to {{PAGENAME}}%26action%3Dedit&action=edit&redlink=1. Using HTML anchors doesn't seem to work either; they are not parsed.
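One extension-free approach worth noting (standard MediaWiki magic words; the name Template:MyTemplate below is a stand-in): external-link syntax with the {{fullurl:}} parser function avoids the wikilink escaping seen above. Note that {{FULLPAGENAME}} inside a transcluded template resolves to the page doing the transcluding, so the template's own name has to be spelled out:

```wikitext
[{{fullurl:Template:MyTemplate|action=edit}} edit]
```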

MediaWiki sortable HTML table, not wikicode table

Published 2 Aug 2017 by S.Mason in Newest questions tagged mediawiki - Stack Overflow.

Does MediaWiki's sortable table class work with HTML tables that have class="wikitable sortable"? I can't find this in the documentation. Using version 1.28.1.

I added the sortable class and it doesn't do anything.

<table class="wikitable sortable">
 <tr class="column-header"><td>Name</td><td>Desc</td></tr>
 <tr><td>item 1</td><td>item 1 descr</td></tr>
 <tr><td>item 2</td><td>item 2 descr</td></tr>
</table>

Could I be missing CSS or javascript somewhere? It creates the table correctly with the styling but it's not sortable.
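As far as I can tell, MediaWiki's sorting script attaches to <th> header cells, and the header row above uses <td>, which would explain a styled-but-unsortable table. A version worth trying (same data, header cells switched to <th>):

```html
<table class="wikitable sortable">
  <tr><th>Name</th><th>Desc</th></tr>
  <tr><td>item 1</td><td>item 1 descr</td></tr>
  <tr><td>item 2</td><td>item 2 descr</td></tr>
</table>
```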

FastMail apps and signup available in Germany again

Published 2 Aug 2017 by Bron Gondwana in FastMail Blog.

For the past two months, FastMail has been rejecting new signups from Germany, and our app has been unavailable in the German app stores. We took these actions out of caution after receiving letters from the German Bundesnetzagentur requiring us to register with them as an email provider.

Until we obtained legal advice, we did not know whether we could comply with German requirements while fulfilling our obligations under Australian law.

Having conferred with lawyers in Germany, we have confidence that we can safely register, which allows us to provide apps in German stores again, and to accept new signups from German users.

From our lawyers:

you are required to notify your company and your business as a “commercial, publicly available telecommunications service”. Section 6 of the German Telecommunications Act (TKG) provides the legal basis for this notification requirement: "Any person operating a public telecommunications network on a commercial basis or providing a publicly available telecommunications service on a profit-oriented basis shall notify the Bundesnetzagentur without undue delay of beginning to provide, of providing with differences or of ceasing to provide his activity and of any changes in his undertaking. Such notification requires written form."

you are not required to offer interception facilities. Most of the obligations following from part 7 (sections 110 and 113) of the TKG are only relevant if the provider has more than 100,000 German customers. So it’s misleading if the Bundesnetzagentur talks about “100,000 customers” without mentioning that this means German customers only.

We currently have significantly fewer than 100,000 customers in Germany. We will re-assess our legal situation when we get closer to 100,000 German customers as international law in this area is changing quickly, and the situation may have changed again by the time those clauses become material.

FastMail continues to be an Australian company subject to Australian law as described in our privacy policy. Our understanding of Australian law is that it is illegal for us to directly provide any customer data or metadata to law enforcement authorities from outside Australia. If we receive requests from the Bundesnetzagentur we will follow our existing process for all foreign requests and refer them to their country's mutual assistance treaty with Australia.

How do I log failed MediaWiki login attempts?

Published 1 Aug 2017 by Roger Creasy in Newest questions tagged mediawiki - Stack Overflow.

I cannot find a MediaWiki hook event for a failed login attempt. Does one exist? If not, does anyone know of a strategy for determining failed attempts?

In case there is another way - I am trying to log failed logins.

Here is the pertinent bit of my code, the globals are set to the name of the wiki (I also tried the code offered in the comments):

$wgHooks['AuthManagerLoginAuthenticateAudit'][] = 'logAuth';
function logAuth($response, $user, $username) {
    // grab the MediaWiki global vars
    global $fail2banfile;
    global $fail2banid;

    // set vars to log
    $time = date("Y-m-d H:i:s T");
    $ip = $_SERVER['REMOTE_ADDR'];

    // successful login
    if ($response->status == "PASS") {
        error_log("$time Successful login by $username from $ip on $fail2banid\n", 3, $fail2banfile);
        return true; // continue to next hook
    } else {
        error_log("$time Authentication error by $username from $ip on $fail2banid\n", 3, $fail2banfile);
        return true; // continue to next hook
    }
}
The above logs successful logins, and failed logins by registered users. Login attempts by unregistered usernames are not logged. I am using the logs with Fail2Ban.

Marley Spoon: A Look into Their Stack and Team Structure

Published 31 Jul 2017 by Hollie Haggans in DigitalOcean: Cloud computing designed for developers.

Marley Spoon: A Look into Their Stack and Team Structure

Over the past eleven months, more than 1,600 startups from around the world have built their infrastructure on DigitalOcean through Hatch, our global incubator program designed to help startups as they scale. Launched in 2016, the goal of the program is to help the next generation of startups get their products off the ground.

Marley Spoon, a subscription meal kit company based in Berlin and Hatch startup, sees infrastructure as an integral part of every engineer’s workflow. “We are trying to build a team where people don’t feel responsible just for a small bit, but we want to build a team where people feel responsible for the whole architecture,” says Stefano Zanella, Head of Software Engineering at Marley Spoon. “In order to do this, we believe that people need to know how the system works.”

In this interview, Zanella gives us a glimpse into Marley Spoon’s unique engineering team structure, and the technologies they use to power both their customer-facing platform and the internal-facing production and distribution platform. The following has been transcribed from our Deep End podcast, and adapted for this blog post.

DigitalOcean: How do you model your engineering teams?

Stefano Zanella: Our teams are shaped around user flows to some extent. We have currently four teams: three teams are product teams—they are related to the product itself—and one team actually takes care of the platform for the infrastructure.

The [first] three teams, we shape them around the user flow. So, we have a team that takes care of the new customers. We call it the acquisition team because they focus mostly on marketing, but they also provide data insights, manage the customer experience for new customers, shorten the subscription flow, and so on.

Then we have a team that focuses on the recurring customers. It’s the team that takes care of functionality like adding to an order, pausing subscriptions, skipping a delivery, changing your address, changing the time that you want your box at home, etc.

And then the third team actually takes care of what we call the “back office” in the sense that we do it in our own production centers; we have warehouses all across the world. We have a tool that tracks how many orders need to be done, when, where, and [by] which warehouse. We have them organize the batches because we work a lot with shippers and we try to be just in time, because of course the food is fresh and we want to keep it just like that. So this team takes care of all the production-related issues.

And how do you organize these teams? Do you have teams with maybe product managers, designers, engineers in the same group? Or [do] you isolate teams depending on their skill set or area of expertise?

The interesting thing about Marley Spoon is that the situation is always changing. We are very proud of the fact that we believe in owning the process and changing the process and structure as we see fit.

When we started we had an engineering team and a product team. Then, at some point, we realized that the communication structure wasn’t working well enough for us to be productive and effective enough. So we actually put the product managers inside the [engineering] teams. Then, we [also] figured out that the relationship with the designers wasn’t good enough, so we put the designers inside the team as well.

For a certain period of time, we had teams [that] were functional from my point of view, and now since we are growing a lot, the team is growing, and we have different needs. We are [now] focusing on product managers aligning with the rest of the business, rather than with engineers because the relationship with engineers is really good right now. We moved the product team outside of the teams again, so they are their own team because we want them to also work as a team, not just be disconnected. We assign specific product managers to specific departments and then internally, the team shuffles the work to the engineering team. But it’s a situation that can change every time, because it really depends on where we see the problems.

Going down the technology side of things, what’s your stack and architecture right now? Or maybe you want to talk about how Marley Spoon evolved?

Well, actually let me answer the last part of your question, because I think it’s really interesting speaking about the engineers. So, we do believe that the main role of an engineer is not writing code, but it's actually running the system.

And in order to do this, we believe that people need to know how the system works. They need to have a feeling of how the whole system is working. From that point of view, we don’t see all of the teams related to technology, for example. We use a workflow based on the Kanban workflow. Since it’s based on Kanban, every time somebody runs out of work, they are free to pick new work from the backlog. And the product managers manage the backlog, which means that whoever is free should pick stuff from the top because that’s the most important thing to do.

We don’t have this clear distinction between backend and frontend developers. We do have people that are more skilled at frontend or backend, but we try to broaden their scope of action all the time. So, from that point of view, we try to help each other a lot because we believe that’s the best way to grow.

Getting back to the stack question, what are the technologies you have in your architecture?

So, mainly we are a Ruby-based company. We use Rails mainly for our web apps. We have a couple of projects that are pure Ruby because they are projects for background processing. We started them in Ruby, but we are considering switching to a different technology.

We are currently in the process of upgrading the stack because we were using Backbone as a library and Coffeescript as a language because that was what was coming out with default Rails 4. Now we are slowly moving toward React because we see a lot of traction outside and inside the team as well. So, we would like to give it a try.

We hope that will also help us shape and improve our relationship with the designers, for example. Then we have a small command line tool for our Kanban board, which we wrote ourselves. We wrote our own Kanban board because we like to have a tool that can evolve with the process, and the little command line tool lets you create tickets and move tickets around from the command line.

Tune into the full interview on our podcast, or learn more about our Hatch program today.

Hollie Haggans heads up Global Partnerships for DigitalOcean’s Hatch program. She is passionate about startups and cold brew coffee. Get in touch with questions at

Mediawiki - Display languages chooser on the sidebar (left) [duplicate]

Published 31 Jul 2017 by Manu in Newest questions tagged mediawiki - Stack Overflow.

This question already has an answer here:

I have installed MediaWiki with XAMPP, but my wiki content must be in English and in French. I want a menu on the left (like Wikipedia's) to choose the language.

I have found this :

but it's not clear.

The mysterious case of the changed last modified dates

Published 31 Jul 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Today's blog post is effectively a mystery story.

Like any good story it has a beginning (the problem is discovered, the digital archive is temporarily thrown into chaos), a middle (attempts are made to solve the mystery and make things better, several different avenues are explored) and an end (the digital preservation community come to my aid).

This story has a happy ending (hooray) but also includes some food for thought (all the best stories do) and as always I'd be very pleased to hear what you think.

The beginning

I have probably mentioned before that I don't have a full digital archive in place just yet. While I work towards a bigger and better solution, I have a set of temporary procedures in place to ingest digital archives on to what is effectively a piece of locked down university filestore. The procedures and workflows are both 'better than nothing' and 'good enough' as a temporary measure and actually appear to take us pretty much up to Level 2 of the NDSA Levels of Preservation (and beyond in some places).

One of the ways I ensure that all is well in the little bit of filestore that I call 'The Digital Archive' is to run frequent integrity checks over the data, using a free checksum utility. Checksums (effectively unique digital fingerprints) for each file in the digital archive are created when content is ingested and these are checked periodically to ensure that nothing has changed. IT keep back-ups of the filestore for a period of three months, so as long as this integrity checking happens within this three month period (in reality I actually do this 3 or 4 times a month) then problems can be rectified and digital preservation nirvana can be seamlessly restored.

Checksum checking is normally quite dull. Thankfully it is an automated process that runs in the background and I can just get on with my work and cheer when I get a notification that tells me all is well. Generally all is well, it is very rare that any errors are highlighted - when that happens I blog about it!
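A fixity check of this kind is simple to script; here is a minimal sketch in Python using only the standard library (the manifest format, a dictionary of path to expected hash, is an illustration rather than the actual utility I use):

```python
import hashlib
import os

def file_sha256(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_manifest(manifest):
    """Check {path: expected_hash} entries; return the paths that fail
    (either missing from disk or with a changed checksum)."""
    problems = []
    for path, expected in manifest.items():
        if not os.path.exists(path) or file_sha256(path) != expected:
            problems.append(path)
    return problems
```

Run on a schedule, an empty result means all is well; anything else should be investigated while backups are still available.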

I have perhaps naively believed for some time that I'm doing everything I need to do to keep those files safe and unchanged because if the checksum is the same then all is well, however this month I encountered a problem...

I've been doing some tidying of the digital archive structure and alongside this have been gathering a bit of data about the archives, specifically looking at things like file formats, number of unidentified files and last modified dates.

Whilst doing this I noticed that one of the archives that I had received in 2013 contained 26 files with a last modified date of 18th January 2017 at 09:53. How could this be so if I have been looking after these files carefully and the checksums are the same as they were when the files were deposited?

The 26 files were all EML files - email messages exported from Microsoft Outlook. These were the only EML files within the whole digital archive. The files weren't all in the same directory and other files sitting in those directories retained their original last modified dates.

The middle

So this was all a bit strange...and worrying too. Am I doing my job properly? Is this something I should be bringing to the supportive environment of the DPC's Fail Club?

The last modified dates of files are important to us as digital archivists. This is part of the metadata that comes with a file. It tells us something about the file. If we lose this date are we losing a little piece of the authentic digital object that we are trying to preserve?

Instead of beating myself up about it I wanted to do three things:

  1. Solve the mystery (find out what happened and why)
  2. See if I could fix it
  3. Stop it happening again
So how could it have happened? Has someone tampered with these 26 files? Perhaps unlikely considering they all have the exact same date/time stamp which to me suggests a more automated process. Also, the digital archive isn't widely accessible. Quite deliberately it is only really me (and the filestore administrators) who have access.

I asked IT whether they could explain it. Had some process been carried out across all filestores that involved EML files specifically? They couldn't think of a reason why this may have occurred. They also confirmed my suspicions that we have no backups of the files with the original last modified dates.

I spoke to a digital forensics expert from the Computer Science department and he said he could analyse the files for me and see if he could work out what had acted on them and also suggest a methodology of restoring the dates.

I have a record of the last modified dates of these 26 files when they arrived - the checksum tool that I use writes the last modified date to the hash file it creates. I wondered whether manually changing the last modified dates back to what they were originally was the right thing to do or whether I should just accept and record the change.
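For the record, actually resetting a last modified date is technically trivial; a sketch in Python (os.utime is in the standard library; where the original timestamps come from, the hash file in my case, is not shown and the example date is made up):

```python
import datetime
import os

def restore_modified_date(path, original):
    """Reset a file's modified (and access) time to a recorded datetime."""
    ts = original.timestamp()
    os.utime(path, (ts, ts))  # (atime, mtime), both in epoch seconds

# Hypothetical usage, with a date read from the stored hash file:
# restore_modified_date("message01.eml", datetime.datetime(2013, 5, 1, 9, 0))
```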

...but I decided to sit on it until I understood the problem better.

The end

I threw the question out to the digital preservation community on Twitter and as usual I was not disappointed!

In fact, along with a whole load of discussion and debate, Andy Jackson was able to track down what appears to be the cause of the problem.

He very helpfully pointed me to a thread on StackExchange which described the issue I was seeing.

It was a great comfort to discover that the cause of this problem was apparently a bug and not something more sinister. It appears I am not alone!

...but what now?

So now I think I know what caused the problem, but questions remain around how to catch issues like this more quickly (not six months after the fact) and what to do with the files themselves.

IT have mentioned to me that an OS upgrade may provide us with better auditing support on the filestore. Being able to view reports on changes made to digital objects within the digital archive would be potentially very useful (though perhaps even that wouldn't have picked up this Windows bug?). I'm also exploring whether I can make particular directories read only and whether that would stop issues such as this occurring in the future.

If anyone knows of any other tools that can help, please let me know.

The other decision to make is what to do with the files themselves. Should I try and fix them? More interesting debate on Twitter on this topic and even on the value of these dates in the first place. If we can fudge them then so can others - they may have already been fudged before they got to the digital archive - in which case, how much value do they really have?

So should we try to fix last modified dates, or should we focus our attention on capturing and storing them within the metadata? The latter may be a more sustainable solution in the longer term, given their slightly slippery nature!

I know there are lots of people interested in this topic - just see this recent blog post by Sarah Mason and in particular the comments - When was that?: Maintaining or changing ‘created’ and ‘last modified’ dates. It is great that we are talking about real nuts and bolts of digital preservation and that there are so many people willing to share their thoughts with the community.

...and perhaps if you have EML files in your digital archive you should check them too!

Don't lose your mail: a tale of horror

Published 31 Jul 2017 by Nicola Nye in FastMail Blog.

Caution: this blog post is not for the faint-hearted. It will take you on a journey through the darkest depths of despair and anxiety. But fear not, there is a happy ending. So come with us, gentle reader, on a tale of thrilling adventure sure to set your hair on end.

There you are, sitting at your desk (or on a train, or a bus) catching up on your email, making plans, following up with friends and colleagues. This is good. This is going great. Except...

You've lost your email.

Your heart sinks, your palms sweat.

You double-check. Definitely gone. Not all of it: some folders are still there. But mail is definitely... gone.

You check that your internet service provider isn't offline. You check that your internet is working. You check your mail on your phone and tablet.

Everything else is fine, but mail is missing. MISSING!

Do you use FastMail on the web?

Don't lose your mail: not with FastMail

Published 31 Jul 2017 by Nicola Nye in FastMail Blog.

Looking for the start of this adventure?

We know how vital email is to the everyday running of our lives. We know the sick feeling you get when you lose even one important mail.

We want to make it hard for you to accidentally lose mail, and easy for you to recover in the unfortunate event a disaster occurs.

FastMail. We've got your back.

How to make a table of contents in Media-Wiki that has a direct link to other pages and not to sections in the same page

Published 31 Jul 2017 by 21kc in Newest questions tagged mediawiki - Stack Overflow.

I have certain headers in my wiki-page that serve as links to other pages.

Wiki creates automatically the table of contents for all the headers on my page, but when you press on a header (which is a link) in the table of contents, it directs you to the section on the same page and not to the linked page.

For example:

If my header is :

=[[OtherWikiPage| link to another page]]=

The line that will appear in the Table of contents will be:

link to another page

But when you press on it, it directs you to the section and not to the actual linked page.

Thanks for your help...

Period pains: how the menstrual taboo is being challenged

Published 31 Jul 2017 by in New Humanist Articles and Posts.

Social taboos over menstruation cause undue shame to millions of women – but it is finally being understood as a human rights issue.

Find Leaflet map object after initialisation

Published 31 Jul 2017 by Joeytje50 in Newest questions tagged mediawiki - Stack Overflow.

I'm trying to change some things on a map that is already initialised by another script, using the Leaflet library. This other script did not store the map-object in a global variable, or in any other location I can access with my script. So currently there is a map on my page, but I don't have the map object.

What I'd like to do is retrieve the object of an already-initialised map in order to make changes to it. For example, if a function like L.getMap('myID') existed, I would use it to retrieve the map object linked to the container myID.

TL;DR: Is there a way to get a map object of an already-initialised leaflet map, using the id of the container?

Changing the position and styling of particular label or button for a particular template in mediawiki?

Published 30 Jul 2017 by Jatin Sabherwal in Newest questions tagged mediawiki - Stack Overflow.

I have the ApprovedRevs extension on my wiki, and I want to change the position and colour of its approval indicator for a particular template. Currently, it is shown as a subtitle alongside the title. I also want to convert it into a blue tick sign rather than an "approved" label.

Can anyone tell me which files I should modify and how? Thanks in advance.

Fast way to check against link rot in a MW wiki?

Published 29 Jul 2017 by Rob Kam in Newest questions tagged mediawiki - Stack Overflow.

I was using extension ExternalLinks to check external links per page. However it's now marked unsafe and is no longer maintained. What quick and easy ways are there to validate all external URLs on a MW wiki?

How to get all URLs (not just titles) in a Wikipedia article using the MediaWiki API?

Published 29 Jul 2017 by saurabh vyas in Newest questions tagged mediawiki - Stack Overflow.

I am using the MediaWiki API to retrieve all possible URLs from a Wikipedia article, but it only gives a list of link titles. For example, the Artificial Intelligence Wikipedia page has a link titled "delivery networks", but what I want is the actual URL that the link points to.

Jazz and the MediaWiki package

Published 28 Jul 2017 by Sam Wilson in Sam's notebook.

And rain, I mustn’t forget the rain. I’m worrying about the roof, although far less than I used to (it’s a different roof). The jazz is the radio; it’s on.

But the main point this morning is exploring the mediawiki-lts package maintained by Legoktm. I’ve been meaning to look at it for a while, and switch my (non-playground) wikis over to it, but there’s never enough time. Not that there’s enough time now, but I’m just trying to get it running locally for two wikis (yes, the smallest possible farm).

So, in simple steps, I first added the PPA:

sudo add-apt-repository ppa:legoktm/mediawiki-lts

This created /etc/apt/sources.list.d/legoktm-ubuntu-mediawiki-lts-xenial.list. Then I updated the package info:

sudo apt-get update

And installed the package:

sudo apt install mediawiki

At this point, the installation prompt for MediaWiki 1.27.3 was available at http://localhost/mediawiki/ (which luckily doesn’t conflict with anything I already had locally) and I stepped through the installer, creating a new database and DB user via phpMyAdmin as I went, and answering all the questions appropriately. (It’s actually been a while since I last saw the installer properly.) The only tricky thing I found was that it asks for the “Directory for deleted files” but not for the actual directory for all files — because I want the files to be stored in a particular place and not in /usr/share/mediawiki/images/, especially as I want there to be two different wikis that don’t share files.
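The files location can always be set after installation in LocalSettings.php; a sketch, with paths that are just an illustration of one-directory-per-wiki rather than anything the package mandates:

```php
# In each wiki's LocalSettings.php:
$wgUploadDirectory = '/srv/wiki1-files';   # filesystem location of uploads
$wgUploadPath = "$wgScriptPath/images";    # corresponding URL path
```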

I made a typo in my database username in the installation form, and got an “Access denied for user x to database y” error. I hit the browser’s back button, and then the installer’s back buttons, to go back to the relevant page in the installer, fixed the typo and proceeded. It remembered everything correctly, and this time installed the database tables, with only one error. This was “Notice: JobQueueGroup::__destruct: 1 buffered job(s) of type(s) RecentChangesUpdateJob never inserted. in /usr/share/mediawiki/includes/jobqueue/JobQueueGroup.php on line 447”. Didn’t seem to matter.

At the end of the installer, it prompted me to download LocalSettings.php and put it at /etc/mediawiki/LocalSettings.php which I did:

 sudo mv ~/LocalSettings.php /etc/mediawiki/.
 sudo chown root:root /etc/mediawiki/LocalSettings.php
 sudo chmod 644 /etc/mediawiki/LocalSettings.php

And then I had a working wiki at http://localhost/mediawiki/index.php!


I wanted a different URL, so edited /etc/apache2/sites-available/000-default.conf (in order to not modify the package-provided /etc/mediawiki/mediawiki.conf) to add:

Alias /mywiki /var/lib/mediawiki

And changed the following in LocalSettings.php:

$wgScriptPath = "/mywiki";

The multiple wikis will have to wait until later, as will the backup regime.

Roundup: Welcome, on news, bad tools and great tools

Published 28 Jul 2017 by Carlos Fenollosa in Carlos Fenollosa — Blog.

I'm starting a series of posts with a summary of the most interesting links I found. The concept of "social bookmarks" has always been interesting, but no implementation is perfect. One service probably came closest to being good enough, but in the end, we all just post them to Twitter and Facebook for shares and likes.

Unfortunately, Twitter search sucks, and browser bookmarks rot quickly. That's why I'm trying this new model of social + local, not only for my readers but also for myself. Furthermore, writing a tapas-sized post is much faster than a well-thought one.

Hopefully, forcing myself to post periodically —no promises, though— will encourage me to write regular articles sometimes.

Anyway, these posts will try to organize links I post on my Twitter account and provide a bit more of context.

While other friends publish newsletters, I still believe RSS can work well, so subscribe to the RSS feed if you want to get these updates. Another option is to use one of the services which deliver feeds by email, like Feenbox (which, by the way, may never leave alpha, so drop me an email if you want an invitation).


RTVE, the Spanish public TV, has uploaded a few Bit a bit episodes. It was a rad early-90s show that presented video games and the early Internet.

On news

I quit reading news 3 years ago. A recent article from Tobias Rose-Stockwell digs deep into how your fear and outrage are being sold for profit by the Media.

@xurxof recommended a 2012 article from Rolf Dobelli, Avoid News. Towards a Healthy News Diet

LTE > Fiber

I was having router issues and realized how my cellphone internet is sometimes more reliable than my home fiber.

It seems to be more common than you'd think; read the Twitter replies! XKCD also recently posted a comic on this.


There was a discussion on tools to journal your workday, which was one of the reasons that led me to try out these roundup posts.

New keyboard

I bought a Matias Clicky mechanical keyboard which sounds like a minigun. For all those interested in mechanical keyboards, you must watch Thomas's Youtube channel

The new board doesn't have a nav cluster, so I configured Ctrl-HJKL to be the arrow keys. It takes a few days to get used to, but since then, I've been using that combination even when I'm using a keyboard with arrow keys.

Slack eats CPU cycles

Slack was eating a fair amount of my CPU while my laptop was trying to build a Docker image and sync 3000 files on Dropbox. Matthew O'Riordan also wrote Where’s all my CPU and memory gone? The answer: Slack

Focus, focus, focus!

I'm a subscriber and use it regularly, especially when I'm working on the train or in a busy cafe.

musicForProgramming() is a free resource with a variety of music and also provides a podcast feed for updates.

Tags: roundup

Comments? Tweet  

Kubuntu Artful Aardvark (17.10) Alpha 2

Published 28 Jul 2017 by clivejo in Kubuntu.

Artful Aardvark (17.10) Alpha 2 images are now available for testing.

The Kubuntu team will be releasing 17.10 in October.

This is the first spin in preparation for the Alpha 2 pre-release. Kubuntu Alpha pre-releases are NOT recommended for:

Kubuntu Alpha pre-releases are recommended for:

Getting Kubuntu 17.10 Alpha 2

To upgrade to Kubuntu 17.10 pre-releases from 17.04, run sudo do-release-upgrade -d from a command line.

Download a Bootable image and put it onto a DVD or USB Drive

See our release notes:

Please report your results on the Release tracker:

Media Wiki Embed File History

Published 27 Jul 2017 by S.Mason in Newest questions tagged mediawiki - Stack Overflow.

Is it possible to embed a wiki file's history on a page?

I want to embed a file's history on a page related to the file, so that someone can download older versions without having to click through to the file's page. It would be nice if I could show just the history section of that file.

How can that be done?

MediaWiki Giving warnings

Published 27 Jul 2017 by Anti21 in Newest questions tagged mediawiki - Stack Overflow.

I just installed MediaWiki with MySQL on Windows 7. It is running under IIS on localhost:94. Whenever I open the website I get the following two warnings:

Warning: OutputPage::transformFilePath: Failed to hash 
.29.0/../resources/assets/wiki.png [Called from 
OutputPage::transformFilePath in 
1.29.0\includes\OutputPage.php at line 3804] in 
1.29.0\includes\debug\MWDebug.php on line 309

Warning: md5_file(C:\Users\smehta30\Documents\Website\MediaWiki\mediawiki-
1.29.0/../resources/assets/wiki.png): failed to open stream: No such file or 
directory in C:\Users\smehta30\Documents\Website\MediaWiki\mediawiki-
1.29.0\includes\OutputPage.php on line 3802

Since I am new to this and it was installed using the default MediaWiki installer, can you please point out what I have done wrong?

The second warning points to a file that could not be found. How do I correct it? Specifically, which file needs to be in which directory?

In a word: global

Published 27 Jul 2017 in New Humanist Articles and Posts.

Michael Rosen's column on language and its uses.

Is there a way to insert the name of the user who used a template?

Published 26 Jul 2017 by user241205 in Newest questions tagged mediawiki - Stack Overflow.

I want to make a template which automatically adds the name of the person who used it to another page.

For example if I had the following template named "addedby":

Added by '~~~~'.

I want it so that when I use it in a page like this:

This page was {{addedby}}

The ~~~~ is automatically replaced by the name of the user who used the template.
For example if user 'john' used the template the final page above would look like this:

This page was Added by 'john'.

Is this possible? I tried using ~~~~, but the signature is substituted directly in the template when I save it, not when I use the template in any of the pages.

Mediawiki upgrade on Strato webspace to allow use of PHP version newer than 5.3

Published 26 Jul 2017 by tfv in Newest questions tagged mediawiki - Stack Overflow.

I am currently trying to upgrade several older MediaWiki installations (1.19, 1.21) to a more recent version, since Strato no longer supports PHP 5.3.

Those MediaWiki installations were originally set up using the Strato App Wizard, which currently installs MediaWiki 1.23.14.

I am aware of the following information:

a.) Strato's description of app updates (only in German, and with no section on updating MediaWiki)

b.) The MediaWiki update guide

c.) The compatibility table between MediaWiki and PHP versions (since my most immediate need is just to migrate to a newer PHP version that is still supported by Strato)

Does anybody have experience with MediaWiki upgrades at Strato? Is there an easier way to do an upgrade, e.g. using the app wizard?

My letter to the Boy Scouts of America

Published 25 Jul 2017 by legoktm in The Lego Mirror.

The following is a letter I just mailed to the Boy Scouts of America, following President Donald Trump's speech at the National Jamboree. I implore my fellow scouts to also contact the BSA to express their feelings.

25 July 2017

Boy Scouts of America
PO Box 152079
Irving, TX

Dear Boy Scouts of America,

Like many others I was extremely disappointed and disgusted to hear about the contents of President Donald Trump’s speech to the National Jamboree. Politics aside, I have no qualms with inviting the president, or having him speak to scouts. I was glad that some of the Eagle Scouts currently serving at high levels of our government were recognized for their accomplishments.

However above all, the Boy Scouts of America must adhere to the values of the Scout Law, and it was plainly obvious that the president’s speech did not. Insulting opponents is not “kindness”. Threatening to fire a colleague is not “loyal”. Encouraging boos of a former President is not “courteous”. Talking about fake news and media is not “trustworthy”. At the end of the day, the values of the Scout Law are the most important lesson we must instill in our youth – and President Trump showed the opposite.

The Boy Scouts of America must send a strong message to the public, and most importantly the young scouts that were present, that the president’s speech was not acceptable and does not embody the principles of the Boy Scouts of America.

I will continue to speak well of scouting and the program to all, but incidents like this will only harm future boys who will be dissuaded from joining the organization in the first place.

Kunal Mehta
Eagle Scout, 2012
Troop 294
San Jose, CA

Through the hard times and the good

Published 24 Jul 2017 in New Humanist Articles and Posts.

We might not realise it, but our image of modern Britain owes a debt to the propaganda arm of empire.

How do I get my MediaWiki site to use templates? [closed]

Published 21 Jul 2017 by Cyberherbalist in Newest questions tagged mediawiki - Webmasters Stack Exchange.

My MediaWiki site is currently using v1.24.4.

I don't seem to have many templates installed, and some very important ones seem to be missing. For example, I can't use the Reference List template. If I do put references in an article, with {{reflist}} at the bottom, the template comes across as a redlink:


Are templates something that have to be installed separately? And if so, how do I go about it?

My site is hosted by DreamHost.

“Fixing the Web” with Jeff Jaffe, Brewster Kahle and Steven Gordon

Published 20 Jul 2017 by Amy van der Hiel in W3C Blog.

On 14 July 2017, W3C CEO Jeff Jaffe (MIT ’76) was featured as part of an MIT Alumni Association Panel “Fixing the Web” with Brewster Kahle, (’82) Founder and Digital Librarian, Internet Archive and Steven Gordon (’75), Professor of IT Management, Babson College.

When talking about the history of the Web and Tim Berners-Lee, Jeff noted that after its invention:

“He created a consortium called the W3C so that everyone who was interested in enhancing the web technology base can work together collaboratively.”

Jeff added about W3C:

“Most of our work recently has been transforming the web from being a large database of static information to dynamic information; a web of application where people build web applications which work essentially as distributed applications across multiple systems, making sure that we address societal problems such as web accessibility for people that have challenges or security privacy issues.”

The panel was moderated by science journalist Barbara Moran, and the topics were wide-ranging and interesting – from the Internet Archive to government control of the Web, advertising, social media, innovation and more.

In the discussion, a question was raised from Twitter about the EME standard:

Jeff noted:

We’ve developed a new proposed standard called EME, Encrypted Media Extensions, that instead of displaying these movies to hundreds of millions of people in an insecure and privacy violating fashion, we’ve built it in a way that makes it secure for people to watch movies.


Please watch the video if you’d like to see more.


Building the Lego Saturn V rocket 48 years after the moon landing

Published 20 Jul 2017 by legoktm in The Lego Mirror.

Full quality video available on Wikimedia Commons.

On this day 48 years ago, three astronauts flew to the moon on a Saturn V rocket, and two of them landed on its surface.

Today I spent four hours building the Lego Saturn V rocket - the largest Lego model I've ever built. Throughout the process I was constantly impressed with the design of the rocket, and how it all came together. The attention paid to the little details is outstanding, and made it such a rewarding experience. If you can find a place that has them in stock, get one. It's entirely worth it.

The rocket is designed to be separated into the individual stages, and the lander actually fits inside the rocket. Vertically, it's 3ft, and comes with three stands so you can show it off horizontally.

As a side project, I also created a timelapse of the entire build, using some pretty cool tools. After searching online for how to have my DSLR take photos at a set interval, and being frustrated that all the examples used a TI-84 calculator, I stumbled upon gphoto2, which lets you control digital cameras. I ended up using a command as simple as gphoto2 --capture-image-and-download -I 30 to have it take and save a photo every 30 seconds. The only negative is that it absolutely killed the camera's battery; within an hour I needed to swap in a fresh one.

To stitch the photos together (after renaming them a bit), ffmpeg came to the rescue: ffmpeg -r 20 -i "%04d.jpg" -s hd1080 -vcodec libx264 time-lapse.mp4. Pretty simple in the end!
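Putting the steps together, the whole pipeline looks roughly like this (a sketch: the capture filenames are an assumption, since gphoto2 typically saves frames as capt0000.jpg, capt0001.jpg, and so on):

```shell
# Capture a frame every 30 seconds and download it from the camera
gphoto2 --capture-image-and-download -I 30

# Rename the captures into the zero-padded sequence ffmpeg expects
i=0
for f in capt*.jpg; do
  mv "$f" "$(printf '%04d.jpg' "$i")"
  i=$((i + 1))
done

# Stitch the frames into a 1080p H.264 time-lapse at 20 fps
ffmpeg -r 20 -i "%04d.jpg" -s hd1080 -vcodec libx264 time-lapse.mp4
```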

Riding the Jet Stream to 1 Million Users

Published 18 Jul 2017 by Ben Uretsky in DigitalOcean: Cloud computing designed for developers.


Today, we’re excited to share a recent milestone with you: DO now supports 1 million users around the world. We’ve grown with our users, and have worked hard to give them the products they need to run their services without compromising the user experience they’ve come to love. We’re grateful to our users and community, and to the people that have helped us grow and learn along the way.

In 2012, DigitalOcean had a modest start. Our staging environment was around 4 or 5 servers, and we had a handful of engineers running the platform. We had two datacenter regions, 200 Droplets deployed, and a vision for what cloud computing could become. But most importantly, we had the support of a community of developers that helped us realize that vision.

A Maiden Voyage

Holding user groups in our early stages really helped us answer key questions about what aspects of the user experience could be improved. We launched our first datacenter, NYC1, and opened up our first international datacenter, AMS1, in January 2012.

Our users have played a huge part in helping us determine where to launch new datacenters to serve them better; in addition to NYC and Amsterdam, we now have them in San Francisco, Frankfurt, London, Singapore, Toronto, and Bangalore. Our dedicated team of network engineers, software engineers, datacenter technicians, and platform support specialists have worked tirelessly to give all of our users a great experience and access to simple cloud computing at any scale.

Making Waves

Among our early adopters were projects and companies like AudioBox and GitLab, who have scaled along with us as we’ve grown. Projects like Laravel Forge also chose to host their applications on DO. We’ve also partnered with companies like GitHub (Student Developer Pack and Hacktoberfest), Docker (Docker Student Developer Kit and our Docker one-click application), CoreOS, and Mesosphere on major initiatives.

Developers that helped spread the word when we first started include John Resig (jQuery), Jeff Atwood (Stack Overflow), Ryan Bates (Railscast), Xavier Noria (core Rails contributor), and Salvatore Sanfilippo (Redis).

Pere Hospital, co-founder of Cloudways, found DigitalOcean in 2014 while looking for an IaaS partner that could add value to his clients’ business processes. When Cloudways hit 5,000 DO compute instances they had their own internal celebration—and they’ve added thousands more since.

John O’Nolan, founder of Ghost, shared this anecdote: “On one of DigitalOcean's birthdays, the team sent us a couple of vinyl shark toys as a surprise and a thank you for being a customer. These sharks quickly became a mainstay of our weekly team meetings, along with the most horrific slew of puns: “Are you being ‘shark-astic’?” “That sounds a bit fishy.” etc. The jokes went so far that six months later we somehow found ourselves on a retreat in Thailand with our CTO, Hannah, coding at a table in a full-body shark costume.”

Additionally, several community members embraced DO and created tools that extended our API early on. Jack Pearkes created the command line tool, Tugboat, in 2013. Ørjan Blom created Barge, a Ruby library that pre-dated our official Ruby library, droplet_kit. Lorenzo Setale created python-digitalocean, which remains the most widely used Python library on DO. And Antoine Corcy created DigitalOceanV2, a library that helps PHP applications interact with v2 of the DO API. There have also been many others that have shared feedback with us and created tools of their own. We thank all of you for being a part of this.

All Hands on Deck

Members of the DO community have become a part of the DO family. We’ve reached over 1,600 tutorials on our Community site, in large part due to technologists that have contributed articles through participation in our Get Paid to Write program. Marko Mudrinić, for example, has written a number of articles for the Community site, frequently engages with other users in our Q&A section, and contributes to the official DO command line tool, doctl.

We’ve been lucky to have community members go on to join the DO team, like Community Manager Kamal Nasser and Platform Support Specialist Jonathan Tittle. Jonathan was an early adopter, having migrated his company’s customers to DO back in 2012. He then became one of our most engaged Community members. Jonathan told me, “When I look over questions posted to the DigitalOcean Community, I can honestly look back and say 'I’ve been there' and recall the countless times that I ran into an issue and couldn’t find the answer on my own, much less get the help I needed from someone who knew. When the questions were stacking up one day, I dove in and did my best to help. I quickly found myself spending countless hours troubleshooting alongside a user until an issue was resolved. I was simply trying to offer a helping hand when and where I could.”

Over the Horizon

The journey to 1 million is full of stories, people, moments, events, and companies that have crossed paths with us and have inspired us. Our users have been with us every step of the way, and we’ve tasked ourselves with meeting their growing infrastructure needs, and their goals for engaging and collaborating with us. There is so much more to come, and we’re excited to share it all with you. Thank you!

Book review: To Be A Machine

Published 18 Jul 2017 in New Humanist Articles and Posts.

Mark O'Connell's study of transhumanism is a portrait of a movement that believes death is our disgrace and technology our redeemer.

Does literature help or hinder the fight for equality?

Published 18 Jul 2017 in New Humanist Articles and Posts.

Today, the concept of human rights is being dangerously undermined. Does literature offer us a way back from the brink?

Song Club Showcase

Published 14 Jul 2017 by Dave Robertson in Dave Robertson.

While the finishing touches are being put on the album, I’m going solo with other Freo songwriters at the Fib.


Net Neutrality: Why the Internet Must Remain Open and Accessible

Published 11 Jul 2017 by Ben Uretsky in DigitalOcean: Cloud computing designed for developers.


DigitalOcean is proud to be taking part in today’s Day of Action to Save Net Neutrality. Access to an open internet is crucial to allowing companies like DigitalOcean and the thousands of businesses we power to exist. This is not something we can take for granted. Efforts to roll back the protections provided by net neutrality rules will stifle innovation and create an uneven playing field for smaller companies competing with entrenched players.

I want to share the letter that I sent to our representatives in the Senate and encourage you to join us in speaking up while there's still time.

DigitalOcean Inc. supports the Federal Communications Commission’s Open Internet Order and the principles of network neutrality that it upholds. As an infrastructure provider serving over one million registered users, we support our customers’ rights to fair, equal, and open network access as outlined in the Order. We have not experienced, nor do we anticipate experiencing, any negative impact on broadband investment or service as a result of the Order.

We strongly oppose the Commission’s recent proposal to dismantle the 2015 Open Internet Order. As evidenced by the federal judiciary over the past two decades in Comcast Corp. v. FCC and other cases, the Commission cannot enforce unbiased and neutral Internet access without the Title II classification of broadband providers as telecommunications providers. Therefore, we ask you to codify Title II reclassification into law. It is the only way to uphold network neutrality principles.

As a direct competitor to the largest technology infrastructure providers in the nation, we are concerned that the Commission’s recent Notice for Proposed Rulemaking (WC Docket No. 17-108) will create an anti-competitive market environment because the costs of unfair networking practices will be forced onto infrastructure providers such as ourselves. Furthermore, many of our customers are individuals or small edge providers for whom changes to current network neutrality policies would significantly raise barriers to entry in various markets. Without legal protections against network blocking, throttling, unreasonable interference, and paid prioritization, it will be more difficult for us and for our customers to innovate, compete, and support the free flow of information.

By protecting network neutrality, we hope that the 115th Congress can promote investment in New York, eliminate business uncertainty with regards to FCC rulemaking, support competition in the broadband market, and encourage small businesses to innovate. We look forward to working with you on passing legislation related to this issue.


Ben Uretsky, CEO, DigitalOcean Inc.

If you’re in the US, join us and stand up for your right to an open and accessible internet by submitting your own letter to the FCC today.

Wikidata Map July 2017

Published 11 Jul 2017 by addshore in Addshore.

It’s been 9 months since my last Wikidata map update, and once again we have many new noticeable areas appearing, including Norway, South Africa, Peru and New Zealand to name but a few. As with the last map generation post, I once again created a diff image so that the areas of change are easily identifiable, comparing the data from July 2017 with that from my last post in October 2016.

The various sizes of the generated maps can be found on Wikimedia Commons:

Reasons for increases

If you want to have a shot at figuring out the cause of the increases in specific areas then take a look at my method described in the last post using the Wikidata Query Service.

People's discoveries so far:

I haven’t included the names of those that discovered reasons for areas of increase above, but if you find your discovery here and want credit just ask!

Introducing High CPU Droplets

Published 10 Jul 2017 by Ben Schaechter in DigitalOcean: Cloud computing designed for developers.


Today, we’re excited to announce new High CPU Droplet plans for CPU-intensive workloads. As applications grow and evolve, we have found that certain workloads need more powerful underlying compute.

Use Cases

Here are some use cases that can benefit from CPU-optimized compute servers:


We are offering five new Droplet plans. They start from $40/mo for two dedicated vCPUs, up to $640/mo for 32 dedicated vCPUs.


We've partnered with Intel to back these Droplets with Intel's most powerful processors, delivering maximum, reliable performance. Going forward, we’ll regularly evaluate and use the best CPUs available to ensure these Droplets always deliver the best performance for your applications.

The current CPUs powering High CPU Droplets are the Intel Broadwell 2697Av4 with a clock speed of 2.6GHz, and the Intel Skylake 8168 with a clock speed of 2.7GHz. Customers in our early access period have seen up to four times the performance of Standard Droplet CPUs, and on average see about 2.5 times the performance.

These Droplets are available through the Control Panel and the API starting today as capacity allows in SFO2, NYC1, NYC3, TOR1, BLR1, AMS3, FRA1, and LON1.

Ben Schaechter
Product Manager, Droplet

Preserving Google docs - decisions and a way forward

Published 7 Jul 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Back in April I blogged about some work I had been doing around finding a suitable export (and ultimately preservation) format for Google documents.

This post has generated a lot of interest and I've had some great comments both on the post itself and via Twitter.

I was also able to take advantage of a slot I had been given at last week's Jisc Research Data Network event to introduce the issue to the audience (who had really come to hear me talk about something else but I don't think they minded).

There were lots of questions and discussion at the end of this session, mostly focused on the Google Drive issue rather than the rest of the talk. I was really pleased to see that the topic had made people think. In a lightning talk later that day, William Kilbride, Executive Director of The Digital Preservation Coalition, mused on the subject of "What is data?". Google Drive was one of the examples he used, asking: where does the data end and the software application start?

I just wanted to write a quick update on a couple of things - decisions that have been made as a result of this work and attempts to move the issue forward.

Decisions decisions

I took a summary of the Google docs data export work to my colleagues in a Research Data Management meeting last month in order to discuss a practical way forward for the institutional research data we are planning on capturing and preserving.

One element of the Proof of Concept that we had established at the end of phase 3 of Filling the Digital Preservation Gap was a deposit form to allow researchers to deposit data to the Research Data York service.

As well as the ability to enable researchers to browse and select a file or a folder on their computer or network, this deposit form also included a button to allow deposit to be carried out via Google Drive.

As I mentioned in a previous post, Google Drive is widely used at our institution. It is clear that many researchers are using Google Drive to collect, create and analyse their research data so it made sense to provide an easy way for them to deposit direct from Google Drive. I just needed to check out the export options and decide which one we should support as part of this automated export.

However, given the inconclusive findings of my research into export options it didn't seem that there was one clear option that adequately preserved the data.

As a group we decided the best way out of this imperfect situation was to ask researchers to export their own data from Google Drive in whatever format they consider best captures the significant properties of the item. Exporting manually prior to upload gives them the opportunity to review and check their files and to make their own decisions on issues such as whether comments are included in the version of their data that they upload to Research Data York.

So for the time being we are disabling the Google Drive upload button in our data deposit interface... which is a shame, because a certain amount of effort went into getting it working in the first place.

This is the right decision for the time being though. Two things need to happen before we can make this available again:

  1. Understanding the use case - We need to gain a greater understanding of how researchers use Google Drive and what they consider to be 'significant' about their native Google Drive files.
  2. Improving the technology - We need to make some requests to Google to make the export options better.

Understanding the use case

We've known for a while that some researchers use Google Drive to store their research data. The graphic below was taken from a survey we carried out with researchers in 2013 to find out about current practice across the institution. 

Of the 188 researchers who answered the question "Where is your digital research data stored (excluding back up copies)?" 22 mentioned Google Drive. This is only around 12% of respondents but I would speculate that over the last four years, use of Google Drive will have increased considerably as Google applications have become more embedded within the working practices of staff and students at the University.

Where is your digital research data stored (excluding back up copies)?

To understand the Google Drive use case today I really need to talk to researchers.

We've run a couple of Research Data Management teaching sessions over the last term. These sessions are typically attended by PhD students but occasionally a member of research staff also comes along. When we talk about data storage I've been asking the researchers to give a show of hands as to who is using Google Drive to store at least some of their research data.

About half of the researchers in the room raise their hand.

So this is a real issue. 

Of course what I'd like to do is find out exactly how they are using it: whether they are creating native Google Drive files, or just using Google Drive as a storage location or filing system for data that they create in another application.

I did manage to get a bit more detail from one researcher who said that they used Google Drive as a way of collaborating on their research with colleagues working at another institution but that once a document has been completed they will export the data out of Google Drive for storage elsewhere. 

This fits well with the solution described above.

I also arranged a meeting with a Researcher in our BioArCh department. Professor Matthew Collins is known to be an enthusiastic user of Google Drive.

Talking to Matthew gave me a really interesting perspective on Google Drive. For him it has become an essential research tool. He and his colleagues use many of the features of the Google Suite of tools for their day to day work and as a means to collaborate and share ideas and resources, both internally and with researchers in other institutions. He showed me PaperPile, an extension to Google Drive that I had not been aware of. He uses this to manage his references and share them with colleagues. This clearly adds huge value to the Google Drive suite for researchers.

He talked me through a few scenarios of how they use Google. Some (such as the comments facility) I was very much aware of; others I've not used myself, such as the use of the Google APIs to visualise activity on preparing a report in Google Drive, showing a timeline of when different individuals edited the document. Now that looks like fun!

He also talked about the importance of the 'previous versions' information that is stored within a native Google Drive file. When working collaboratively it can be useful to be able to track back and see who edited what and when. 

He described a real scenario in which he had had to go back to a previous version of a Google Sheet to show exactly when a particular piece of data had been entered. I hadn't considered that the previous versions feature could be used to demonstrate that you made a particular discovery first. Potentially quite important in the competitive world of academic research.

For this reason Matthew considered the native Google Drive file itself to be "the ultimate archive" and "a virtual collaborative lab notebook". A flat, static export of the data would not be an adequate replacement.

He did however acknowledge that the data can only exist for as long as Google provides us with the facility and that there are situations where it is a good idea to take a static back up copy.

He mentioned that the precursor to Google Docs was a product called Writely (which he was also an early adopter of). Google bought Writely in 2006 after seeing the huge potential in this online word processing tool. Matthew commented that backwards compatibility became a problem when Google started making some fundamental changes to the way the application worked. This is perhaps the issue that is being described in this blog post: Google Docs and Backwards Compatibility.

So, I'm still convinced that even if we can't preserve a native Google Drive file perfectly in a static form, this shouldn't stop us having a go!

Improving the technology

Alongside trying to understand how researchers use Google Drive and what they consider to be significant and worthy of preservation, I have also been making some requests and suggestions to Google about their export options. There are a few ideas I've noted that would make it easier for us to archive the data.

I contacted the Google Drive forum and was told that, as a Google customer, I could log in and add my suggestions to Google Cloud Connect. This I did, and what I asked for was as follows:

  • Please can we have a PDF/A export option?
  • Please could we choose whether or not to export comments, and if we are exporting comments, could we choose whether historic/resolved comments are also exported?
  • Please can metadata be retained - specifically the created and last modified dates. (Author is a bit trickier - in Google Drive a document has an owner rather than an author. The owner probably is the author (or one of them) but not necessarily if ownership has been transferred).
  • I also mentioned a little bug relating to comment dates that I found when exporting a Google document containing comments out into docx format and then importing it back again.

Since I submitted these feature requests and comments in early May it has all gone very, very quiet...

I have a feeling that ideas only get anywhere if they are popular ...and none of my ideas are popular ...because they do not lead to new and shiny functionality.

Only one of my suggestions (re comments) has received a vote by another member of the community.

So, what to do?

Luckily, since having spoken about my problem at the Jisc Research Data Network, two people have mentioned they have Google contacts who might be interested in hearing my ideas.

I'd like to follow up on this, but in the meantime it would be great if people could feedback to me. 

  • Are my suggestions sensible? 
  • Are there are any other features that would help the digital preservation community preserve Google Drive? I can't imagine I've captured everything...

The UK Archivematica group goes to Scotland

Published 6 Jul 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Yesterday the UK Archivematica group met in Scotland for the first time. The meeting was hosted by the University of Edinburgh and as always it was great to be able to chat informally to other Archivematica users in the UK and find out what everyone is up to.

The first thing to note was that since this group of Archivematica ‘explorers’ first met in 2015, real and tangible progress seems to have been made, which was encouraging to see. This is particularly the case at the University of Edinburgh, where Kirsty Lee talked us through their Archivematica implementation (now in production) and the steps they are taking to ingest digital content.

One of the most interesting bits of her presentation was a discussion about appraisal of digital material and how to manage this at scale using the available tools. When using Archivematica (or other digital preservation systems) it is necessary to carry out appraisal at an early stage before an Archival Information Package (AIP) is created and stored. It is very difficult (perhaps impossible) to unpick specific files from an AIP at a later date.

Kirsty described how one of her test collections has been reduced from 5.9GB to 753MB using a combination of traditional and technical appraisal techniques. 

Appraisal is something that is mentioned frequently in digital preservation discussions. There was a group talking about just this a couple of weeks ago at the recent DPC unconference ‘Connecting the Bits’. 

As ever it was really valuable to hear how someone is moving forward with this in a practical way. 

It will be interesting to find out how these techniques can be applied at scale to some of the larger collections Kirsty intends to work with.

Kirsty recommended an article by Victoria Sloyan, Born-digital archives at the Wellcome Library: appraisal and sensitivity review of two hard drives which was helpful to her and her colleagues when formulating their approach to this thorny problem.

She also referenced the work that the Bentley Historical Library at University of Michigan have carried out with Archivematica and we watched a video showing how they have integrated Archivematica with DSpace. This approach has influenced Edinburgh’s internal discussions about workflow.

Kirsty concluded with something that rings very true for me (in fact I think I said it myself in the two presentations I gave last week!): striving for perfection isn’t helpful; the main thing is just to get started and learn as you go along.

Rachel McGregor from the University of Lancaster gave an entertaining presentation about the UK Archivematica Camp that was held in York in April, covering topics as wide ranging as the weather, the food and finally feeling the love for PREMIS!

I gave a talk on work at York to move Archivematica and our Research Data York application towards production. I had given similar talks last week at the Jisc Research Data Network event and a DPC briefing day, but I took a slightly different focus this time, drilling down in more detail into our workflow, the processing configuration within Archivematica and some problems I was grappling with.

It was really helpful to get some feedback and solutions from the group on an error message I’d encountered whilst preparing my slides the previous day, and to have a broader discussion on the limitations of web forms for data upload. This is what is so good about presenting within a small group setting like this: it allows for informality and genuinely productive discussion. As a result of this I overran and made people wait for their lunch (very bad form, I know!).

After lunch John Kaye updated the group on the Jisc Research Data Shared Service. This is becoming a regular feature of our meetings! There are many members of the UK Archivematica group who are not involved in the Jisc Shared Service so it is really useful to be able to keep them in the loop. 

It is clear that there will be a substantial amount of development work within Archivematica as a result of its inclusion in the Shared Service and features will be made available to all users (not just those who engage directly with Jisc). One example of this is containerisation which will allow Archivematica to be more quickly and easily installed. This is going to make life easier for everyone!

Sean Rippington from the University of St Andrews gave an interesting perspective on some of the comparison work he has been doing of Preservica and Archivematica. 

Both of these digital preservation systems are on offer through the Jisc Shared Service and as a pilot institution St Andrews has decided to test them side by side. Although he hasn’t yet got his hands on both, he was still able to offer some really useful insights on the solutions based on observations he has made so far. 

First he listed a number of similarities - for example alignment with the OAIS Reference Model, the migration-based approach, the use of microservices and many of the tools and standards that they are built on.

He also listed a lot of differences - some are obvious, for example one system is commercial and the other open source. This leads to slightly different models for support and development. He mentioned some of the additional functionality that Preservica has, for example the ability to handle emails and web archives and the inclusion of an access front end. 

He also touched on reporting. Preservica does this out of the box, whereas with Archivematica you will need to use a third-party reporting system. He talked a bit about the communities that have adopted each solution and concluded that Preservica seems to have a broader user base (in terms of the types of institution that use it). The engaged, active and honest user community for Archivematica was highlighted as a specific selling point, as was the work of the Filling the Digital Preservation Gap project (thanks!).

Sean intends to do some more detailed comparison work once he has access to both systems and we hope he will report back to a future meeting.

Next up we had a collaborative session called ‘Room 101’ (even though our meeting had been moved to room 109). Considering we were encouraged to grumble about our pet hates, this session produced some useful nuggets.

After coffee break we were joined (remotely) by several representatives from the OSSArcFlow project from Educopia and the University of North Carolina. This project is very new but it was great that they were able to share with us some information about what they intend to achieve over the course of the two year project. 

They are looking specifically at preservation workflows using open source tools (specifically Archivematica, BitCurator and ArchivesSpace) and they are working with 12 partner institutions who will all be using at least two of these tools. The project will not only provide training and technical support, but will fully document the workflows put in place at each institution. This information will be shared with the wider community. 

This is going to be really helpful for those of us who are adopting open source preservation tools, helping to answer some of those niggling questions such as how to fill the gaps and what happens when there are overlaps in the functionality of two tools.

We registered our interest in continuing to be kept in the loop about this project and we hope to hear more at a future meeting.

The day finished with a brief update from Sara Allain from Artefactual Systems. She talked about some of the new things coming in versions 1.6.1 and 1.7 of Archivematica.

Before leaving Edinburgh it was a pleasure to be able to join the University at an event celebrating their progress in digital preservation. Celebrations such as this are pretty few and far between - perhaps because digital preservation is a task that doesn’t have an obvious end point. It was really refreshing to see an institution publicly celebrating the considerable achievements made so far. Congratulations to the University of Edinburgh!

Hot off the press…

Published 4 Jul 2017 by Tom Wilson in tom m wilson.


Can't connect to MediaWiki on Nginx server [duplicate]

Published 4 Jul 2017 by Marshall S. Lee in Newest questions tagged mediawiki - Server Fault.

This question is an exact duplicate of:

I downloaded and configured MediaWiki on the Ubuntu server. I'm running it on Nginx, so I opened the nginx.conf file and modified the server part as follows.

    server {
        listen 80;
        server_name;

        access_log /var/log/nginx/access-wiki.log;
        error_log /var/log/nginx/error-wiki.log;

        charset utf-8;
        passenger_enabled on;
        client_max_body_size 50m;

        location / {
            root /var/www/html/mediawiki;
            index index.php;
        }

        # pass the PHP scripts to FastCGI server
        location ~ \.php$ {
            root           html;
            fastcgi_pass   unix:/var/run/php/php7.0-fpm.sock;
            fastcgi_index  index.php;
            fastcgi_param  SCRIPT_FILENAME  /scripts$fastcgi_script_name;
            include        fastcgi_params;
        }

        # deny access to .htaccess files, if Apache's document root
        # concurs with nginx's one
        location ~ /\.ht {
            deny  all;
        }
    }

After editing, I restarted Nginx and then ran into another problem. Every time I try to access the webpage with the domain above, I never see the MediaWiki main page; instead I receive a file that says the following.

 * This is the main web entry point for MediaWiki.
 * If you are reading this in your web browser, your server is probably
 * not configured correctly to run PHP applications!
 * See the README, INSTALL, and UPGRADE files for basic setup instructions
 * and pointers to the online documentation.
 * ----------
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 * @file

// Bail on old versions of PHP, or if composer has not been run yet to install
// dependencies. Using dirname( __FILE__ ) here because __DIR__ is PHP5.3+.
// @codingStandardsIgnoreStart MediaWiki.Usage.DirUsage.FunctionFound
require_once dirname( __FILE__ ) . '/includes/PHPVersionCheck.php';
// @codingStandardsIgnoreEnd
wfEntryPointCheck( 'index.php' );

require __DIR__ . '/includes/WebStart.php';

$mediaWiki = new MediaWiki();

Now, in the middle of the setup, I'm almost lost and have no idea how to work it out. I created a file hello.html in the root directory and accessed it in the browser; that works. I believe the PHP configuration part is causing the errors, but I don't know how to fix it.
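Receiving the raw PHP source means the request was never handed to PHP-FPM. One likely culprit in the configuration above is that the `location ~ \.php$` block still carries stock example values (`root html;` and the `/scripts` prefix in SCRIPT_FILENAME), so nginx cannot map the request to the real file on disk. A possible fix, as a sketch assuming the same MediaWiki root and PHP-FPM socket shown in the question:

```nginx
server {
    listen 80;
    # server_name, logging etc. as before

    # Set the root once at server level so every location agrees on paths.
    root  /var/www/html/mediawiki;
    index index.php;

    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }

    location ~ \.php$ {
        try_files      $uri =404;
        fastcgi_pass   unix:/var/run/php/php7.0-fpm.sock;
        fastcgi_index  index.php;
        include        fastcgi_params;
        # Hand PHP-FPM the actual file, not the example /scripts path.
        fastcgi_param  SCRIPT_FILENAME $document_root$fastcgi_script_name;
    }

    location ~ /\.ht {
        deny all;
    }
}
```

With `$document_root$fastcgi_script_name`, a request for /index.php resolves to /var/www/html/mediawiki/index.php, which is what PHP-FPM needs to execute the script rather than have nginx serve its source.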


Published 4 Jul 2017 by fabpot in Tags from Twig.


Published 2 Jul 2017 by Sam Wilson in Sam's notebook.

As of today, my shed can finally be locked up.  Time now to decide where to put things.


Published 1 Jul 2017 by Sam Wilson in Sam's notebook.

Is it a coincidence that Jeremy Corbyn on Dead Ringers is rather similar-sounding to their Brian Cox?

Publishing on the indieweb

Published 30 Jun 2017 by Sam Wilson in Sam's notebook.

I’ve been reading about POSSE and PESOS, and getting re-inspired about the value in a plurality of web tools. I sometimes try to focus on just one software package (MediaWiki, at the moment, because it’s what I code for at work). But I used to love working on WordPress, and I’ve got a couple of stalled projects for Piwigo lying around. Basically, all these things will be of higher quality if they have to work with each other and with all the data silos (Facebook, Twitter, etc.).

The foundational principles of the IndieWeb are:

  1. Own your data.
  2. Use visible data for humans first, machines second. See also DRY.
  3. Build tools for yourself, not for all of your friends. It’s extremely hard to fight Metcalfe’s law: you won’t be able to convince all your friends to join the independent web. But if you build something that satisfies your own needs, but is backwards compatible for people who haven’t joined in (say, by practicing POSSE), the time and effort you’ve spent building your own tools isn’t wasted just because others haven’t joined in yet.
  4. Eat your own dogfood. Whatever you build should be for yourself. If you aren’t depending on it, why should anybody else? We call that selfdogfooding. More importantly, build the indieweb around your needs. If you design tools for some hypothetical user, they may not actually exist; if you build tools for yourself, you actually do exist. selfdogfooding is also a form of “proof of work” to help focus on productive interactions.
  5. Document your stuff. You’ve built a place to speak your mind, use it to document your processes, ideas, designs and code. At least document it for your future self.
  6. Open source your stuff! You don’t have to, of course, but if you like the existence of the indie web, making your code open source means other people can get on the indie web quicker and easier.
  7. UX and design is more important than protocols, formats, data models, schema etc. We focus on UX first, and then as we figure that out we build/develop/subset the absolutely simplest, easiest, and most minimal protocols & formats sufficient to support that UX, and nothing more. AKA UX before plumbing.
  8. Build platform agnostic platforms. The more your code is modular and composed of pieces you can swap out, the less dependent you are on a particular device, UI, templating language, API, backend language, storage model, database, platform. The more your code is modular, the greater the chance that at least some of it can and will be re-used, improved, which you can then reincorporate.
  9. Longevity. Build for the long web. If human society is able to preserve ancient papyrus, Victorian photographs and dinosaur bones, we should be able to build web technology that doesn’t require us to destroy everything we’ve done every few years in the name of progress.
  10. Plurality. With IndieWebCamp we’ve specifically chosen to encourage and embrace a diversity of approaches & implementations. This background makes the IndieWeb stronger and more resilient than any one (often monoculture) approach.
  11. Have fun. Remember that GeoCities page you built back in the mid-90s? The one with the Java applets, garish green background and seventeen animated GIFs? It may have been ugly, badly coded and sucky, but it was fun, damnit. Keep the web weird and interesting.

Bringing Publications to the Web: First Steps

Published 29 Jun 2017 by Tzviya Siegman in W3C Blog.

The Publishing Working Group, Business Group, and Interest Group met for a two-day face to face at the Adobe office in New York. People came from far and wide to glimpse the Naked Cowboy for the first (and hopefully last) time from this office overlooking Times Square. We had an excellent view of a giant screen advertising Despicable Me 3. We might hire some minions for scribing. In some ways, this was a farewell to the IG and a welcome to the WG and new members who come from the IDPF. We formally welcomed Bill McCoy as Publishing Champion. We took a few moments to say goodbye to our friend and colleague, Pierre Danet, who passed away a few weeks ago. We began the process of creating new specifications for digital books and other publications on the Web. EPUB has been very successful, but our goal is to make publications first-class citizens of the Web.

Some of you may not be that familiar with EPUB. What is EPUB? It’s the standard ebook delivery and distribution format created by the International Digital Publishing Forum, which recently combined with W3C. EPUB began in 1999 as the Open eBook Publication Structure and has gone through several iterations over the years. Notably, in 2011 The DAISY Consortium recommended EPUB 3 over their proprietary DTBook as the best choice for providing accessible books. The current version of the spec is available at

Administrativia, Charter Review, Horizontal Reviews, and Testing

We started the meeting with some information about how a W3C WG operates. Ivan Herman rolled out the PWG’s new website, and we introduced some newbies to IRC. The PWG has a three-year charter to allow for writing four recs, Web Publications, Portable Web Publications, EPUB 4, and DPUB-ARIA 2.0. We plan to publish FPWDs early, followed by frequent updates. We will incorporate testing into the spec development to ensure that the specs are implementable and can meet exit criteria. Likewise, we agreed to appoint ambassadors for each area of horizontal review. Avneesh Singh of DAISY will serve as Accessibility Ambassador. Leonard Rosenthol of Adobe will serve as Security Ambassador. Ivan Herman of W3C will be the Internationalization ambassador. Mateus Teixeira of Norton (the publisher not the anti-virus) is mulling over privacy, but he could use a co-ambassador. (Contact Tzviya if you want to know about the perks of being an ambassador.) As a WG within the W3C, we need to remember that we are not rebuilding the Web. We might be adding a few missing pieces.

Photo of Mateus, Karen, Tzviya, and Rachel talking with computers open on table in front of them.

Mateus Teixeira, Tzviya Siegman, Karen Myers, and Rachel Comerford at the Publishing face to face. Photo by Cristina Mussinelli.

Leonard got us in a secure mood by offering an overview of security (his presentations are available online). He stressed that security is all about trust. This can mean knowing the origin of a publication, not surprising the user, or preventing attacks.

Ric Wright of the Readium Foundation kicked off a great discussion about testing and the need to test ourselves as we go. We want to avoid the chicken-and-egg problem that existed with EPUB 3: specs that content creators didn’t use because no UAs (reading systems) supported them, and that UAs didn’t support because no content used them. We are testing the specs for consistency and implementability before we get to the point of meeting exit criteria. Ric highlighted the need to perform not only granular tests but to see what happens when features are combined and we face real-world scenarios.

Input Technologies

There are documents (or, shall I say, publications?) produced by the Digital Publishing Interest Group that are considered input documents as we craft our magnificent specs. We offered an overview of these documents for the uninitiated. Web Publications and Portable Web Publications are documented by the IG. The initial goal was similar to work done in the IDPF’s Browser Friendly Format: to create a publication format that “just works” when you point a browser at it. A publication is not just a collection of documents; there is a boundedness that must be represented, via metadata, linking, identification, addressing mechanisms, styling, and the ability to go offline. Garth Conboy of Google talked about the Packaging aspect of this. He provided some of the history of EPUB, its origins in multi-part MIME and its current state as a specialized zip. It is possible that the PWG packaging spec will be a very simple variation on work produced by Web Apps. It is possible that it will be far more complicated than that.

Matt Garrish of DAISY provided a summary of the work that the DPUB IG did with the ARIA WG on DPUB-ARIA 1.0 and where we can expect to go with DPUB-ARIA 2.0. It’s important to realize that ARIA is language agnostic, and this vocabulary can be used anywhere. The work on DPUB-ARIA 2.0 will be done by the PWG.

Dave Cramer of Hachette Book Group provided some background about his BFF experiments. The goal was to decrease the distance between EPUB and the Web. Dave built a JSON serialization of the EPUB manifest that also acts as navigation. Hadrien Gardeur of EDRLab took this idea to new extents, and it is now in use in the forthcoming Readium-2.
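As an illustration of the idea (field names here are hypothetical, not the actual BFF or Readium-2 schema), such a manifest pairs publication metadata with a reading order that can double as navigation:

```json
{
  "metadata": {
    "title": "Moby-Dick",
    "language": "en"
  },
  "readingOrder": [
    { "href": "chapter-1.html", "title": "Loomings" },
    { "href": "chapter-2.html", "title": "The Carpet-Bag" }
  ]
}
```

Because it is plain JSON at a URL, a browser-based reading system can fetch it and render the publication without any unzipping step, which is precisely the "distance from the Web" the BFF experiments set out to remove.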

photo of rachel, dave, ric, and garth seated and watching a presentation

Rachel Comerford, Dave Cramer, Ric Wright, and Garth Conboy absorbed in the details of an exciting presentation. Photo by Cristina Mussinelli.

Laurent LeMeur of EDRLab further explained the work at Readium, outlining the tools called Streamer and Navigator that stream EPUB 2, EPUB 3, CBZ and, in the future, PWP and EPUB 4.

Romain Deltour of DAISY offered an overview of Web App Manifest. From my perspective, the most important points are that this spec reproduces the functionality of native apps, provides a URL mechanism, and is extensible.

Web Packaging, Service Workers

Leonard Rosenthol presented a detailed overview of the ongoing work on Packaging on the Web by the Google Chrome team. Since this work is ongoing at W3C, our group will have to consider whether this packaging method meets our needs for packaging, distribution, and archiving. Bill McCoy informed the group that the document will be split into two parts, with the parts about format and signing being offered to IETF and the part about browser behavior staying with W3C.

Google’s Brady Duga gave us a two-sentence definition of Service Workers, which would allow web publications to work offline as well as online. Service Workers are scripts that mediate between users and the server. For example, if you click a link for chapter two of a book, the service worker would recognize that you aren’t online and give you chapter two from a local cache. We don’t yet know if service workers will be formal part of our specs or an implementation mechanism.
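The chapter-two example can be sketched as a plain function (hypothetical names; in a real service worker this logic would live inside a `self.addEventListener('fetch', …)` handler and use the Cache API rather than a Map):

```javascript
// Network-first fallback, as described above: try the server, and when the
// reader is offline, serve the locally cached copy of the chapter instead.
// `fetchFn` stands in for the network and `cache` for the Cache API.
async function respondNetworkFirst(request, fetchFn, cache) {
  try {
    // Online: fetch chapter two from the server as usual.
    return await fetchFn(request);
  } catch (err) {
    // Offline: fall back to the local cache.
    return cache.get(request);
  }
}
```

Whether this mediation becomes a formal part of the specs or stays an implementation detail is, as the post says, still an open question.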


Avneesh Singh of DAISY, newly appointed Accessibility Ambassador, spoke about plans to work with the APA and WCAG WGs to add success criteria to WCAG and work on Media Overlays in the mainstream. EPUB relies on the largely ignored SMIL specification to sync audio and text files, which is helpful to all readers, but especially those who require accessible publications. We need to find a solution that works in browsers.

Document Planning

We have an aggressive timeline to publish several documents. We began to draft a rough outline and appoint editors for different sections. Matt Garrish, famous for editing all-things-EPUB, will be the editor of the WP document. We created a rough outline, and Matt has already begun a draft. We plan to work in task forces and have already begun conversations on GitHub about manifests and navigation. The PWG will not take a summer hiatus, because there is a lot of work to do.

Who owns those docs?

Rick Johnson of VitalSource, and co-chair of the Publishing Business Group, asked the group to review the documents the Publishing Activity has inherited. We must now assess which organization takes ownership of each.

It was a packed schedule, and it’s clear that we have a lot of work to do. Thanks everyone for traveling and participating. We look forward to your contributions!

For more details on these discussions, see the meeting minutes: Day 1 and Day 2.

Wikipedia workshop this weekend

Published 28 Jun 2017 by Sam Wilson in Sam's notebook.

There will be a workshop at the State Library of Western Australia this Saturday from 1 p.m., for anyone to come along and learn how to add just one citation to just one Wikipedia article (or more of either, of course). For more details, see

2017 Community Summit Notes

Published 28 Jun 2017 by Ipstenu (Mika Epstein) in Make WordPress Plugins.

The Plugin team is small but mighty. We had a very productive summit and contributor day this year, pushing forward some of the changes we’ve been working on for a while. The following notes are the product of the sessions as well as some hallway chats over red wine, gin, and cheese.


To Do:

Most of that to-do is on me to at least get the tickets started, but if these are things you’re interested in, then I encourage you to come to the open office hours! I’m hoping to have the first in August, as I have July Vacations 🙂 Sorry, family first!

I’ll post more about what I plan to do with the open office hours soon, including topics and schedules.

#community-summit, #contributor-day

Kubuntu Artful Aardvark (17.10) Alpha 1

Published 28 Jun 2017 by valorie-zimmerman in Kubuntu.

Artful Aardvark (17.10) Alpha 1 images are now available for testing so we can release the alpha on Thursday.

The Kubuntu team will be releasing 17.10 in October.

This is the first spin in preparation for the Alpha 1 pre-release. Kubuntu Alpha pre-releases are NOT recommended for anyone needing a stable system, or for anyone not comfortable with occasional or even frequent breakage.

Kubuntu Alpha pre-releases are recommended for developers, and for those wanting to help test, report, and fix bugs.

Getting Kubuntu 17.10 Alpha 1

To upgrade to Kubuntu 17.10 pre-releases from 17.04, run sudo do-release-upgrade -d from a command line.

Download a Bootable image and put it onto a DVD or USB Drive

See our release notes:

Please report your results on the Release tracker:

Test With Gutenberg Please!

Published 27 Jun 2017 by Ipstenu (Mika Epstein) in Make WordPress Plugins.

Call for testing: Gutenberg

This is especially important if your plugin adds meta boxes or otherwise makes changes to the editor. PLEASE test early and often.

Possible future directions for data on the Web

Published 27 Jun 2017 by Phil Archer in W3C Blog.

As I enter my final days as a member of the W3C Team*, I’d like to record some brief notes for what I see as possible future directions in the areas in which I’ve been most closely involved, particularly since taking on the ‘data brief’ 4 years ago.


The Data on the Web Best Practices, which became a Recommendation in January this year, forms the foundation. As I highlighted at the time, it sets out the steps anyone should take when sharing data on the Web, whether openly or not, encouraging the sharing of actual information, not just information about where a dataset can be downloaded. A domain-specific extension, the Spatial Data on the Web Best Practices, is now all-but complete. There again, the emphasis is on making data available directly on the Web so that, for example, search engines can make use of it directly and not just point to a landing page from where a dataset can be downloaded – what I call using the Web as a glorified USB stick.

Spatial Data

That specialized best practice document is just one output from the Spatial Data on the Web WG in which we have collaborated with our sister standards body, the Open Geospatial Consortium, to create joint standards. Plans are being laid for a long term continuation of that relationship which has exciting possibilities in VR/AR, Web of Things, Building Information Models, Earth Observations, and a best practices document looking at statistical data.

Research Data

Another area in which I very much hope W3C will work closely with others is in research data: life sciences, astronomy, oceanography, geology, crystallography and many more ‘ologies.’ Supported by the VRE4EIC project, the Dataset Exchange WG was born largely from this area and is leading to exciting conversations with organizations including the Research Data Alliance, CODATA, and even the UN. This is in addition to, not a replacement for, the interests of governments in the sharing of data. Both communities are strongly represented in the DXWG that will, if it fulfills its charter, make big improvements in interoperability across different domains and communities.

Linked Data

A line graph showing an initial peak of inflated expectations, followed by the trough of disillusionment, the slope of enlightenment and the plateau of productivity
The Gartner Hype Cycle. CC: BY-SA Jeremykemp at English Wikipedia

The use of Linked Data continues to grow; if we accept the Gartner Hype Cycle as a model then I believe that, following the Trough of Disillusionment, we are well onto the Slope of Enlightenment. I see it used particularly in environmental and life sciences, government master data and cultural heritage. That is, it’s used extensively as a means of sharing and consuming data across departments and disciplines. However, it would be silly to suggest that the majority of Web Developers are building their applications on SPARQL endpoints. Furthermore, it is true that if you make a full SPARQL endpoint available openly, then it’s relatively easy to write a query that will be so computationally expensive as to bring the system down. That’s why the BBC, OpenPHACTS and others don’t make their SPARQL endpoints publicly available. Would you make your SQL interface openly available? Instead, they provide a simple API that runs straightforward queries in the background that a developer never sees. In the case of the BBC, even their API is not public, but it powers a lot of the content on their Web site.

The upside of this approach is that through those APIs it’s easy to access high value, integrated data as developer-friendly JSON objects that are readily dealt with. From a publisher’s point of view, the API is more stable and reliable. The irritating downside is that people don’t see and therefore don’t recognize the Linked Data infrastructure behind the API allowing the continued questioning of the value of the technology.
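The pattern can be sketched in a few lines (illustrative names only; this is not the BBC's or OpenPHACTS' actual API): the server keeps one fixed, cheap SPARQL query, and callers see nothing but a search term and a JSON-returning URL.

```javascript
// The SPARQL template never leaves the server; only `term` varies.
const queryFor = term => `
  SELECT ?name WHERE {
    ?person <http://xmlns.com/foaf/0.1/name> ?name .
    FILTER (CONTAINS(LCASE(?name), LCASE("${term}")))
  } LIMIT 10`;

function buildSparqlUrl(endpoint, term) {
  // Callers supply a plain search term, so they cannot submit arbitrary
  // (and possibly computationally expensive) SPARQL of their own.
  const query = queryFor(term.replace(/"/g, ''));
  const params = new URLSearchParams({ query, format: 'json' });
  return `${endpoint}?${params}`;
}
```

The developer-facing API exposes only the simple endpoint; the Linked Data store answering the query stays invisible behind it, which is exactly the trade-off the paragraph above describes.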

Semantic Web, AI and Machine Learning

The main Semantic Web specs were updated at the beginning of 2014 and there are no plans to review the core RDF and OWL specs any time soon. However, that doesn’t mean that there aren’t still things to do.

One spec that might get an update soon is JSON-LD. The relevant Community Group has continued to develop the spec since it was formally published as a Rec and would now like to put those new specs through Rec Track. Meanwhile, the Shapes Constraint Language, SHACL, has been through something of a difficult journey but is now at Proposed Rec, attracting significant interest and implementation.

But, what I hear from the community is that the most pressing ‘next thing’ for the Semantic Web should be what I call ‘annotated triples.’ RDF is pretty bad at describing and reflecting change: someone changes job, a concert ticket is no longer valid, the global average temperature is now y not x and so on. Furthermore, not all ‘facts’ are asserted with equal confidence. Natural Language Processing, for example, might recognize a ‘fact’ within a text with only 75% certainty.

It’s perfectly possible to express these now using Named Graphs, however, in talks I’ve done recently where I’ve mentioned this, including to the team behind Amazon’s Alexa, there has been strong support for the idea of a syntax that would allow each tuple to be extended with ‘validFrom’, ‘validTo’ and ‘probability’. Other possible annotations might relate to privacy, provenance and more. Such annotations may be semantically equivalent to creating and annotating a named graph, and RDF 1.1 goes a long way in this direction, but I’ve received a good deal of anecdotal evidence that a simple syntax might be a lot easier to process. This is very relevant to areas like AI, deep learning and statistical analysis.
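As a sketch of what the named-graph route looks like today (the ex: vocabulary is assumed for illustration), the assertion goes in a named graph and the annotations attach to that graph in TriG:

```trig
@prefix ex:  <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The asserted triple lives in a named graph...
ex:g1 {
  ex:alice ex:worksFor ex:acme .
}

# ...and the default graph annotates that graph with context.
ex:g1 ex:validFrom   "2015-01-01"^^xsd:date ;
      ex:probability "0.75"^^xsd:decimal .
```

The ‘annotated triples’ idea is essentially asking whether this two-step pattern could be collapsed into a single, directly annotated statement.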

These sorts of topics were discussed at ESWC recently and I very much hope that there will be a W3C workshop on it next year, perhaps leading to a new WG. A project proposal was submitted to the European Commission recently that would support this, and others interested in the topic should get in touch.

Other possible future work in the Semantic Web includes a common vocabulary for sharing the results of data analysis, natural language processing etc. The Natural Language Interchange Format, for example, could readily be put through Rec Track.

Vocabularies and

Common vocabularies, maintained by the communities they serve, are an essential part of interoperability. Whether it’s researchers, governments or businesses, better and easier maintenance of vocabularies and a more uniform approach to sharing mappings, crosswalks and linksets, must be a priority. Internally at least, we have recognized for years that W3C needs to be better at this. What’s not so widely known is that we can do a lot now. Community Groups are a great way to get a bunch of people together and work on your new schema and, if you want it, you can even have a namespace (either directly or via a redirect). Again, subject to an EU project proposal being funded, there should be money available to improve our tooling in this regard.

W3C will continue to support the development of which is transforming the amount of structured data embedded within Web pages. If you want to develop an extension for, a Community Group and a discussion on is the place to start.


To summarize, my personal priorities for W3C in relation to data are:

  1. Continue and deepen the relationship with OGC for better interoperability between the Web and geospatial information systems.
  2. Develop a similarly deep relationship with the research data community.
  3. Explore the notion of annotating RDF triples for context, such as temporal and probabilistic factors.
  4. Be better at supporting the development and agile maintenance of vocabularies.
  5. Continue to promote the Linked Data/Semantic Web approach to data integration that can sit behind high value and robust JSON-returning APIs.

I’ll be watching …

A Dark Theme for FastMail

Published 26 Jun 2017 by Neil Jenkins in FastMail Blog.

As we pass the winter solstice here in Australia, the darkest night has come and the days are finally getting longer. But for those that miss the murky blackness (and for all our northern hemisphere customers journeying towards winter), you can now choose a Dark theme for your FastMail account and relive the inky twilight.

Want to try it out? Change your theme on the Settings → General & Preferences screen.

Here's what mail in the Dark theme looks like…

Mailbox in dark theme

(Please note that rich text (HTML) messages get a white background, as unfortunately too many messages set font colours presuming a light background).

And our Dark calendar…

Calendar in dark theme

Wikimedia Hackathon at home project

Published 24 Jun 2017 by legoktm in The Lego Mirror.

This is the second year I haven't been able to attend the Wikimedia Hackathon due to conflicts with my school schedule (I finish at the end of June). So instead I decided to try to accomplish a large-ish project that same weekend, but at home. I'm probably more likely to get stuff done at home because I'm not chatting with everyone in person!

Last year I converted OOjs-UI to use PHP 5.5's traits instead of a custom mixin system. That was a fun project for me since I got to learn about traits and do some non-MediaWiki coding, while still reducing our technical debt.

This year we had some momentum on MediaWiki-Codesniffer changes, so I picked up one of our largest waiting tasks: upgrading to the 3.0 upstream PHP_CodeSniffer release. Being a new major release, there were breaking changes, including a huge change to the naming and namespacing of classes. My current diffstat on the open patch is +301, -229, so it is roughly the same size as last year's project. The conversion of our custom sniffs wasn't too hard; the biggest issue was actually updating our test suite.

We run PHPCS against test PHP files and verify the output matches the sniffs that we expect. Then we run PHPCBF, the auto-fixer, and check that the resulting "fixed" file is what we expect. The first part wasn't too bad: it just calls the relevant internal functions to run PHPCS. The latter, however, used to have PHPCBF write its output to a virtual filesystem, shell out to create a diff, and then try to put it all back together. Now we just get the output from the relevant PHPCS class and compare it to the expected test output.

This change was included in the 0.9.0 release of MediaWiki-Codesniffer and is in use by many MediaWiki extensions.

Emulation for preservation - is it for me?

Published 23 Jun 2017 by Jenny Mitcham in Digital Archiving at the University of York.

I’ve previously been of the opinion that emulation isn’t really for me.

I’ve seen presentations about emulation at conferences such as iPRES and it is fair to say that much of it normally goes over my head.

This hasn’t been helped by the fact that I’ve not really had a concrete use case for it in my own work - I find it so much easier to relate to and engage with a topic or technology if I can see how it might be directly useful to me.

However, for a while now I’ve been aware that emulation is what all the ‘cool kids’ in the digital preservation world seem to be talking about. From the very migration-heavy thinking of the 2000s, it appears that things are now moving in a different direction.

This fact first hit my radar at the 2014 Digital Preservation Awards, where the University of Freiburg won the OPF Award for Research and Innovation for their work on Emulation as a Service with bwFLA Functional Long Term Archiving and Access.

So I was keen to attend the DPC event Halcyon, On and On: Emulating to Preserve to keep up to speed... not only because it was hosted on the doorstep in the centre of my home town of York!

It was an interesting and enlightening day. As usual the Digital Preservation Coalition did a great job of getting all the right experts in the room (sometimes virtually) at the same time, and a range of topics and perspectives were covered.

After an introduction from Paul Wheatley we heard from the British Library about their experiences of doing emulation as part of their Flashback project. No day on emulation would be complete without a contribution from the University of Freiburg. We had a thought-provoking talk via WebEx from Euan Cochrane of Yale University Library and an excellent short film created by Jason Scott from the Internet Archive. One of the highlights for me was Jim Boulton talking about Digital Archaeology - and that wasn’t just because it had ‘Archaeology’ in the title (honest!). His talk didn’t really cover emulation; it related more to that other preservation strategy that we don’t talk about much anymore - hardware preservation. However, many of the points he raised were entirely relevant to emulation - for example, how to maintain an authentic experience, how you define what the significant properties of an item actually are, and what decisions you have to make as a curator of the digital past. It was great to see how engaged the public were with his exhibitions and how people interacted with them.

Some of the themes of the day and take away thoughts for me:

Thinking about how this all relates to me and my work, I am immediately struck by two use cases.

Firstly research data - we are taking great steps forward in enabling this data to be preserved and maintained for the long term but will it be re-usable? For many types of research data there is no clear migration strategy. Emulation as a strategy for accessing this data ten or twenty years from now needs to be seriously considered. In the meantime we need to ensure we can identify the files themselves and collect adequate documentation - it is these things that will help us to enable reuse through emulators in the future.

Secondly, there are some digital archives that we hold at the Borthwick Institute from the 1980s. For example, I have been working on a batch of WordStar files in my spare moments over the last few years. I'd love to get a contemporary emulator fired up and see if I could install WordStar and work with these files in their native setting. I've already gone a little way down the technology preservation route, getting WordStar installed on an old Windows 98 PC and viewing the files, but this isn't exactly contemporary. These approaches will help to establish the significant properties of the files and assess how successful subsequent migration strategies are...but this is a future blog post.

It was a fun event and it was clear that everybody loves a bit of nostalgia. Jim Boulton ended his presentation saying "There is something quite romantic about letting people play with old hardware".

We have come a long way and this is most apparent when seeing artefacts (hardware, software, operating systems, data) from early computing. Only this week, whilst taking the kids to school, we got into a conversation about floppy disks (yes, I know...). I asked the kids if they knew what they looked like and they answered "Yes, it is the save icon on the computer" (see Why is the save icon still a floppy disk?)...but of course they've never seen a real one. Clearly some obsolete elements of our computer history will remain in our collective consciousness for many years and perhaps it is our job to continue to keep them alive in some form.

Quick Method to wget my local wiki... need advice (without dumping mysql)

Published 23 Jun 2017 by WubiUbuntu1980 in Newest questions tagged mediawiki - Ask Ubuntu.

I need advice.

I have a webserver vm (LAN, not on the internet), it has 2 wikis:



I want to wget only the HomeWorkWiki pages, without crawling into the GameWiki.

My goal is to grab just the .html files (ignoring images and all other files) with wget. (I don't want to do a mysqldump or MediaWiki export; this is for my (non-IT) boss, who just wants to double-click the HTML files.)

How can I run wget so that it only crawls the HomeWorkWiki, and not the GameWiki, on this VM?


Tim Berners-Lee awarded 2016 ACM A.M. Turing Award

Published 22 Jun 2017 by Coralie Mercier in W3C Blog.

On Saturday 24 June 2017, Sir Tim Berners-Lee, inventor of the World Wide Web and Director of the W3C, will be awarded the ACM A.M. Turing Award.

This video from the ACM outlines Sir Tim’s work on the Web and his thoughts on the award.

A few quotes from this short documentary particularly resonated with me.

How the Web allows many brains to be better than one brain

“How does humanity come up with interesting ideas when the problem is in lots of people’s heads and different parts of the solution are in different people’s heads?” Tim Berners-Lee

The Open Web virtuous circle

“If we just keep the Web open, neutral and Royalty-free, innovation will bloom and people will build valuable social systems for both science and democracy.” Tim Berners-Lee

“It has taken all of us to build the web we have, and now it’s up to all of us to build the web we want for everyone.” Tim Berners-Lee

Please join us in congratulating Sir Tim on this award. We at W3C deeply thank him for his work in creating the Web; for releasing it for free to all and for heading our mission to lead the Web to its full potential.

Video credit: ACM and Jody Small Productions.

Using MediaWiki and external data, how can I show an image in a page, returned as a blob from a database?

Published 20 Jun 2017 by Masutatsu in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I'm creating a wiki (using MediaWiki) which pulls data from a MySQL instance and uses this alongside a template to generate the page dynamically.

My MySQL instance contains images, stored in a field of type BLOB.

Is it possible for MediaWiki to interpret this BLOB data as the actual image to be shown on the page?

The Value of Having A Bug Bounty Program

Published 19 Jun 2017 by Rusty Bower in DigitalOcean: Cloud computing designed for developers.


In Spring of 2017, DigitalOcean transitioned from a private bug bounty program to a public bounty program on Bugcrowd. There were many drivers behind this decision, including getting more researcher engagement with our products, leveraging the researchers already active in the Bugcrowd ecosystem, and creating a scalable solution for the DO security team to manage. Although researchers were actively engaged in our original private bug bounty program, we immediately began to see quality vulnerabilities reported once we made the switch. Our old bug bounty program consisted of manual verification and a reward of Droplet credit and/or DO swag. While this worked when we were a much smaller company, the need to level up our bug bounty program has grown as we’ve scaled.

While we already conform to secure coding practices and undergo regular code audits and reviews, bug bounty researchers are able to find valuable bugs for our engineers to fix. Currently, any Bugcrowd researcher is able to test against the platform (although we’ve limited the scope to our API and Cloud endpoints for the time being).

Once we launched, we immediately saw results as hundreds of new researchers began testing the platform in the first few days alone:

In the rest of this post, we’ll explore some examples of bugs we received within 24 hours of launching our new bug bounty program.

One of the first submissions we received was a stored XSS in the notifications field via team name. This bug coincided with the launch of our new Teams product, and although our engineers had built client-side sanitization, server-side sanitization was not properly implemented. For this reason, an attacker could modify the team name, and any users invited to that team would have a stored XSS in their notifications area. After triaging and verifying this vulnerability, we worked closely with our engineers to get proper sanitization of all inputs, both on the client side and server side.
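The general lesson from this class of bug can be sketched in a few lines (an illustrative Python sketch, not DigitalOcean's actual code; the function name is hypothetical): user-controlled input must be escaped server-side at output time, because client-side sanitization can be bypassed entirely by a crafted request.

```python
import html

def render_notification(team_name: str) -> str:
    """Escape user-controlled input server-side, at output time.
    Client-side checks alone can be skipped by posting directly to the API."""
    return "<p>You have been invited to join %s</p>" % html.escape(team_name)

print(render_notification('<script>alert(1)</script>'))
# -> <p>You have been invited to join &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The escaped payload renders as inert text in the notifications area instead of executing.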

A second—and slightly more severe—reported vulnerability was a misconfiguration of Google OAuth in one of our applications. Even though all access to this application was supposed to be restricted to only valid DigitalOcean email addresses, the misconfiguration resulted in any valid Google Apps domain being able to authenticate successfully. Once we received this vulnerability, we worked quickly to check our logs for any potential unauthorized access. We didn’t find any, and as we reviewed the logs, we updated the OAuth provider to restrict appropriately.
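To illustrate the class of misconfiguration involved (a sketch, not DigitalOcean's actual implementation; the domain and function names are hypothetical): when restricting Google OAuth logins to a single organisation, it is not enough that a Google login succeeded; the verified ID token's hosted-domain (`hd`) claim must also be checked server-side.

```python
ALLOWED_DOMAIN = "example.com"  # hypothetical company domain

def is_authorized(id_token_claims: dict) -> bool:
    """Reject logins whose verified Google ID token lacks the expected
    hosted-domain ('hd') claim. Accepting any successful Google login
    admits users from *any* Google Apps domain.
    The token's signature must already be verified upstream."""
    if not id_token_claims.get("email_verified"):
        return False
    return id_token_claims.get("hd") == ALLOWED_DOMAIN

print(is_authorized({"email_verified": True, "hd": "attacker-domain.com"}))  # False
print(is_authorized({"email_verified": True, "hd": "example.com"}))          # True
```

`email_verified` and `hd` are standard claims in Google ID tokens for hosted accounts.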

The most exciting and severe vulnerability we received was a blind SQL injection into a specific search field in our API. We alerted the engineering team as soon as we received the report, and then audited the logs in search of any malicious activity exploiting this vulnerability. Thankfully, none were found, and our engineers were able to implement a fix within 24 hours of being notified of the issue.

As a company that takes security very seriously, we’ve gotten tremendous value from our new bug bounty program by finding issues and vulnerabilities that have passed code review and third-party penetration tests. It has afforded us the opportunity to work with and yield amazing results from skilled researchers we didn’t have access to before.

If you are interested in learning more—or participating in—our bug bounty program, visit our Bugcrowd program page.

Rusty Bower is an Information Security Engineer who manages the DigitalOcean Bug Bounty Program. When he is not triaging vulnerabilities, Rusty enjoys speaking about security topics and tinkering with random InfoSec projects in his basement.

Major Milestones for Publishing@W3C

Published 19 Jun 2017 by Bill McCoy in W3C Blog.

The new Publishing@W3C activity was formed in February 2017 when W3C finalized our combination with IDPF (the International Digital Publishing Forum). Over the last four months there’s been a ton of progress. The new Publishing Business Group, the focal point for discussing overall requirements and issues, is up and running, with a kick-off meeting in March in London and bi-weekly calls. The new EPUB 3 Community Group is also up and running, with a full plate of work items to extend the success of the EPUB standard under the auspices of W3C. Over 50 former IDPF member organizations are now participating in Publishing@W3C activities which makes this the fastest and largest expansion of W3C ever in an industry area.

Today, W3C announced two more very significant milestones: the formation of a new Publishing Working Group and the first-ever W3C Publishing Summit, set for November 9-10 in San Francisco.

The mission of the new Publishing Working Group is to “enable all publications — with all their specificities and traditions — to become first-class entities on the Web. The WG will provide the necessary technologies on the Open Web Platform to make the combination of traditional publishing and the Web complete in terms of accessibility, usability, portability, distribution, archiving, offline access, and reliable cross referencing”. That’s an exciting and ambitious goal, and overwhelming support across the W3C membership for the creation of this Working Group is a key proof point for the convergence vision that was the key strategic motivator for the combination of IDPF with W3C.

And in some ways the Publishing Summit is an even more welcome development. As a standards development organization (SDO), W3C’s work product is Web Standards, which are a means to an end: interoperability. These days, the ecosystem around any enabling technology, including especially the Open Web Platform, isn’t just specifications. It’s also open source, testbeds, education and training, and much more; i.e., the holistic ecosystem and the community around it. IDPF was a trade organization for the digital publishing community as well as an SDO, and IDPF’s events including its annual conference were a key part of building the community around the EPUB standard and broader issues in the digital transformation of publishing. I joined W3C as the Publishing Champion not only to help develop standards but also to foster a broader community that will successfully lead the Web to its full potential for the particular needs of publishing and documents. The Publishing Summit will help us build that community, so I hope you’ll join the conversation on November 9-10 in San Francisco.

Overall I’m excited by the progress to date on Publishing@W3C. The opportunities to help enable the future of publishing and the Web are tremendous. And there’s just one thing that we need to make it happen: participation. Thanks are due the many folks who have already pitched in to get the ball rolling, and I hope you’ll join in supporting the Publishing@W3C initiative.

Latest round of backports PPA updates include Plasma 5.10.2 for Zesty 17.04

Published 17 Jun 2017 by rikmills in Kubuntu.

Kubuntu 17.04 – Zesty Zapus

The latest 5.10.2 bugfix update for the Plasma 5.10 desktop is now available in our backports PPA for Zesty Zapus 17.04.

Included with the update is KDE Frameworks 5.35

Kdevelop has also been updated to the latest version 5.1.1

Our backports for Xenial Xerus 16.04 also receive updated Plasma and Frameworks, plus some requested KDE applications.

Kubuntu 16.04 – Xenial Xerus

To update, use the Software Repository Guide to add the following repository to your software sources list:


or if it is already added, the updates should become available via your preferred update method.

The PPA can be added manually in the Konsole terminal with the command:

sudo add-apt-repository ppa:kubuntu-ppa/backports

and packages then updated with

sudo apt update
sudo apt full-upgrade


Upgrade notes:

~ The Kubuntu backports PPA already contains significant version upgrades of Plasma, applications, Frameworks (and Qt for 16.04), so please be aware that enabling the backports PPA for the first time and doing a full upgrade will result in a substantial number of upgraded packages in addition to the versions in this announcement.  The PPA will also continue to receive bugfix and other stable updates when they become available.

~ While we believe that these packages represent a beneficial and stable update, please bear in mind that they have not been tested as comprehensively as those in the main ubuntu archive, and are supported only on a limited and informal basis. Should any issues occur, please provide feedback on our mailing list [1], IRC [2], file a bug against our PPA packages [3], or optionally contact us via social media.

1. Kubuntu-devel mailing list:
2. Kubuntu IRC channels: #kubuntu & #kubuntu-devel on
3. Kubuntu ppa bugs:

A typical week as a digital archivist?

Published 16 Jun 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Sometimes (admittedly not very often) I'm asked what I actually do all day. So at the end of a busy week being a digital archivist I've decided to blog about what I've been up to.


Today I had a couple of meetings, one specifically to talk about digital preservation of electronic theses submissions. I've also had a work experience placement with us this week, so I have set up a metadata creation task which he has been busy working on.

When I had a spare moment I did a little more testing work on the EAD harvesting feature the University of York is jointly sponsoring Artefactual Systems to develop in AtoM. Testing this feature from my perspective involves logging into the test site that Artefactual has created for us and tweaking some of the archival descriptions. Once those descriptions are saved, I can take a peek at the job scheduler and make sure that new EAD files are being created behind the scenes for the Archives Hub to attempt to harvest at a later date.

This piece of development work has been going on for a few months now and communications have been technically quite complex so I'm also trying to ensure all the organisations involved are happy with what has been achieved and will be arranging a virtual meeting so we can all get together and talk through any remaining issues.

I was slightly surprised today to have a couple of requests to talk to the media. This has sprung from the news that the Queen's Speech will be delayed. One of the reasons for the delay relates to the fact that the speech has to be written on goatskin parchment, which takes a few days to dry. I had previously been interviewed for an article entitled Why is the UK still printing its laws on vellum? and am now mistaken for someone who knows about vellum. I explained to potential interviewers that this is not my specialist subject!


In the morning I went to visit a researcher at the University of York. I wanted to talk to him about how he uses Google Drive in relation to his research. This is a really interesting topic to me right now as I consider how best we might be able to preserve current research datasets. Seeing how exactly Google Drive is used and what features the researcher considers to be significant (and necessary for reuse) is really helpful when thinking about a suitable approach to this problem. I sometimes think I work a little bit too much in my own echo chamber, so getting out and hearing different perspectives is incredibly valuable.

Later that afternoon I had an unexpected meeting with one of our depositors (well, there were two of them actually). I've not met them before but have been working with their data for a little while. In our brief meeting it was really interesting to chat and see the data from a fresh perspective. I was able to reunite them with some digital files that they had created in the mid-1980s, had saved on to floppy disk and had not been able to access for a long time.

Digital preservation can be quite a behind-the-scenes sort of job - we always give a nod to the reason why we do what we do (i.e. we preserve for future reuse), but actually seeing the results of that work unfold in front of your eyes is genuinely rewarding. I had rescued something from the jaws of digital obsolescence so it could now be reused and revitalised!

At the end of the day I presented a joint webinar for the Open Preservation Foundation called 'PRONOM in practice'. Alongside David Clipsham (The National Archives) and Justin Simpson (Artefactual Systems), I talked about my own experiences with PRONOM, particularly relating to file signature creation, and ending with a call to arms "Do try this at home!". It would be great if more of the community could get involved!

I was really pleased that the webinar platform worked OK for me this time round (always a bit stressful when it doesn't) and that I got to use the yellow highlighter pen on my slides.

In my spare moments (which were few and far between), I put together a powerpoint presentation for the following day...


I spent the day at the British Library in Boston Spa. I'd been invited to speak at a training event they regularly hold for members of staff who want to find out a bit more about digital preservation and the work of the team.

I was asked specifically to talk through some of the challenges and issues that I face in my work. I found this pretty easy - there are lots of challenges - and I eventually realised I had too many slides so had to cut it short! I suppose that is better than not having enough to say!

Visiting Boston Spa meant that I could also chat to the team over lunch and visit their lab. They had a very impressive range of old computers and were able to give me a demonstration of Kryoflux (which I've never seen in action before) and talk a little about emulation. This was a good warm up for the DPC event about emulation I'm attending next week: Halcyon On and On: Emulating to Preserve.

Still left on my to-do list from my trip is to download Teracopy. I currently use Foldermatch for checking that files I have copied have remained unchanged. From the quick demo I saw at the British Library, I think Teracopy would be a simpler, one-step solution. I need to have a play with it and then think about incorporating it into the digital ingest workflow.
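The "copy and verify in one step" behaviour that makes tools like Teracopy attractive for ingest workflows can be sketched with the Python standard library (an illustrative sketch under my own assumptions, not how Teracopy itself works; the function names are hypothetical):

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute a file's SHA-256 checksum, reading in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_and_verify(src: Path, dest_dir: Path) -> Path:
    """Copy one file and confirm the copy's checksum matches the original,
    in a single step - the behaviour tools like Teracopy automate."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    checksum_before = sha256_of(src)
    shutil.copy2(src, dest)  # copy2 also preserves timestamps
    if sha256_of(dest) != checksum_before:
        raise IOError(f"Checksum mismatch after copying {src} -> {dest}")
    return dest
```

Recording the checksum alongside the copy also gives you a fixity value to check against in future audits.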

Sharing information and collaborating with others working in the digital preservation field really is directly beneficial to the day to day work that we do!


Back in the office today and a much quieter day.

I extracted some reports from our AtoM catalogue for a colleague and did a bit of work with our test version of Research Data York. I also met with another colleague to talk about storing and providing access to digitised images.

In the afternoon I wrote another powerpoint presentation, this time for a forthcoming DPC event: From Planning to Deployment: Digital Preservation and Organizational Change.

I'm going to be talking about our experiences of moving our Research Data York application from proof of concept to production. We are not yet in production and some of the reasons why will be explored in the presentation! Again I was asked to talk about barriers and challenges and again, this brief is fairly easy to fit! The event itself is over a week away so this is unprecedentedly well organised. Long may it continue!


On Fridays I try to catch up on the week just gone and plan for the week ahead as well as reading the relevant blogs that have appeared over the week. It is also a good chance to catch up with some admin tasks and emails.

Lunch time reading today was provided by William Kilbride's latest blog post. Some of it went over my head but the final messages around value and reuse and the need to "do more with less" rang very true.

Sometimes I even blog myself - as I am today!

Was this a typical week? Perhaps not, but in this job there is probably no such thing! Every week brings new ideas, challenges and surprises!

I would say the only real constant is that I've always got lots of things to keep me busy.

Yes RDF is All Well and Good But Does It Scale?

Published 15 Jun 2017 by Phil Archer in W3C Blog.

A criticism of Linked Data, RDF and the Semantic Web in general is that it doesn’t scale. In the past this has been a justified complaint - but no longer.

The EU-funded Big Data Europe project, in which W3C is pleased to be a partner, is running a series of pilots in 7 different societal areas (the European Commission’s Societal Challenges). Not all of these use Linked Data but those that do are using a lot of it. OpenPHACTS, for example, provides an API to a service that reconciles the many different identifiers used in biomedical and pharmacological data and easily handles billions of triples. In the food and agriculture domain, we’re using NLP to extract information from millions of scholarly articles about viticulture and linking that information using a SKOS thesaurus. In the social sciences, Linked Data is again being used to compare spending data from neighboring authorities. Finally, we’re using large scale geospatial Linked Data to process Earth Observation data to detect change on the ground and link it to events as reported in social media.

All of these pilots depend on processing large amounts of data in a variety of formats. As I noted recently, the Big Data Europe platform creates a virtual RDF graph of all the data available at query time, overcoming the most difficult of problems with big data: variety. Allied to this use of Semantic Technologies, the team behind the Semantic Analysis Stack, SANSA, has just released version 0.2 of its distributed computing software that supports machine learning, inference and querying at scale.

ISWC Vienna, 21-25 October 2017

This kind of advanced use of Semantic Technologies will be in focus later this year at the International Semantic Web Conference (ISWC) in Vienna, 21 – 25 October. The Big Data Europe project is proud to be a Gold sponsor of ISWC 2017, which is the premier annual forum for the discussion of Semantic Technologies and expects to attract 600-700 international delegates. As local chair Axel Polleres notes, it presents a variety of opportunities for local and international businesses. Best of all, from my point of view, this event, now in its 16th year, and others like it refute the nay-sayers’ claims that the Semantic Web is an academic solution looking for a real-world problem. Big Data Europe provides evidence of it being used to solve real-world data processing problems at scale.

Looking Back at DigitalOcean’s First Year in India

Published 14 Jun 2017 by Prabhakar (PJ) Jayakumar in DigitalOcean: Cloud computing designed for developers.


It’s been a year since we established DigitalOcean’s presence in India, starting with our Bangalore office and BLR1 datacenter, and it’s been nothing short of an exhilarating ride! We are excited about being able to cater to the needs of the developer community in India and its neighboring regions, and we’re humbled by the love our customers have shown.

In this blog post, we’ll share some memorable highlights from engaging with India’s growing developer community over the past 12 months.

Face-to-Face With India’s Developers: Conferences, Meetups, and Hackathons

This past fall, we organized a “Product-a-thon” contest named Campus Shark targeted at university students to identify and recognize the best student engineers across colleges in India. The contest saw participation from student teams across colleges in more than 30 Indian cities, spanning from Silchar (Assam) in the East, Ahmedabad (Gujarat) in the West, Thiruvananthapuram (Kerala) in the South, to Jalandhar (Punjab) in the North. Student teams worked on a diverse set of projects relevant for the local community, including:

This past fall, we ran our Hacktoberfest initiative and partnered with IndiaStack to organize their first-ever hackathon around the Aadhaar Auth API. IndiaStack is a set of public, open APIs, and systems that allow government entities, businesses, startups, and developers to utilize a unique digital infrastructure to solve India’s pressing problems.

We’ve hosted two editions of our signature Tide Conference, where we’ve seen enthusiastic participation from hundreds of developers and startups, and Tide has become a platform for them to connect, network, and engage with influencers, mentors, and VCs in the tech ecosystem. Additionally, DO Meetups have expanded to chapters across six cities (Bangalore, Hyderabad, Mumbai, Pune, Delhi, and Chennai).

We’ve also added monthly webinars from industry experts such as MSV Janakiram to our ongoing programming, and continue to facilitate workshops, hackathons, and coding contests across India.


Becoming a Part of India’s Startup Ecosystem

As part of our global incubator program Hatch, we are collaborating with more than 60 partners from the ecosystem (including top-tier accelerators like NUMA, incubators such as Nasscom 10K startups, VCs such as Accel and SAIF Partners, and government-run initiatives like the Startup India program), with hundreds of startups getting year-long free access to our infrastructure, technical training, mentorship, and priority support.

We are proud to have amazing Indian companies such as NoBroker, Betaout, KartRocket, and HackerRank among our customers today.

Akhil Gupta, co-founder and CTO of NoBroker, says, “NoBroker was one of the first customers of DigitalOcean in India and since Day 1 we have been amazed with the simplicity of the solution. DO has grown in last 3 years and launched some amazing products like Floating IP and Block Storage which covers what is required for a production cluster. Many times we have been stuck for implementation and DO technical blogs have come to our rescue with [their] step-by-step installation guide.”

What’s Next

As we embark on yet another year, we will endeavor to continue empowering developers and software companies to build amazing things while our robust, affordable, and simple infrastructure does the heavy lifting for them. To date, over 492,000 Droplets have been deployed in BLR1, and nearly one third of DO’s global Meetup presence is in India. With Block Storage to be launched in BLR1 by early Q3, and a host of new products in the 2017 roadmap, we’re focused on making it easier than ever for startups and teams of software developers from India to deploy and scale their applications.

If you have participated in any of our activities or if you have suggestions on how you would like to engage with DigitalOcean, let us know in the comments below. We’re looking forward to partnering with you and supporting your needs in the year ahead!

Prabhakar (PJ) Jayakumar is DigitalOcean's India Country Manager. He is responsible for running the firm’s operations in India, and his team is focused on both building out the DO community and supporting the localized needs of India’s developer and startup ecosystem.

Search Issues

Published 14 Jun 2017 by Ipstenu (Mika Epstein) in Make WordPress Plugins.

UPDATE (@dd32): All issues should be resolved as of 2:15AM UTC. The root cause was a change in the behaviour of Jetpack Search, which we rely upon, that caused queries to fail. A network outage had caused issues for some queries earlier in the day, but was completely unrelated.

You may have noticed that search is acting up. Per @dd32: WordPress.org is experiencing a few network issues at present in the datacenter; it’s likely that connectivity between the API and WordPress.org’s Elasticsearch is up and down, and when it’s down, search will be offline.

Yes, that means search for plugins too.

There’s nothing to do but wait at this point. It may be up and down while the connectivity is being sorted.

Block Storage Comes to Singapore & Toronto; Four More Datacenters on the Way!

Published 12 Jun 2017 by Ben Schaechter in DigitalOcean: Cloud computing designed for developers.

Today, we're excited to share that Block Storage is available to all Droplets in our Singapore and Toronto regions. With Block Storage, you can scale your storage independently of your compute and have more control over how you grow your infrastructure, enabling you to build and scale larger applications more easily. Block Storage has been a key part of our overall focus on strengthening the foundation of our platform to increase performance and enable our customers to scale.

We've seen incredible engagement since our launch last July. Together, you have created more than 95,000 Block Storage volumes in SFO2, NYC1, and FRA1 to scale databases, take backups, store media, and much more; SGP1 and TOR1 are our fourth and fifth datacenters with Block Storage.

As we continue to upgrade and augment our other datacenters, we'll be ensuring that Block Storage is added too. In order to help you plan your deployments, we've finalized the timelines for the next four regions. Here is the schedule we're targeting for Block Storage rollout in 2017:

We'll have more specific updates to share on these datacenters as well as NYC2 and AMS2 in a future update.

Inside SGP1, our Singapore datacenter region.

Thanks to everyone who has given us feedback and used Block Storage so far. Please keep it coming. You can create your first Block Storage volume in Singapore or in Toronto today!
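For the curious, creating a volume in one of the new regions boils down to a single call to the DigitalOcean v2 API (POST /v2/volumes). The sketch below only builds the JSON request body; the volume name, size, and description are placeholder assumptions, not values from the announcement.

```python
import json

# Hypothetical payload for POST https://api.digitalocean.com/v2/volumes;
# the name, size, and description are made-up placeholders.
payload = {
    "name": "example-volume",
    "region": "sgp1",              # the new Singapore region
    "size_gigabytes": 100,
    "description": "Block Storage volume in SGP1",
}

# The actual request would carry an "Authorization: Bearer <token>" header.
body = json.dumps(payload)
print(body)
```

Once created, the volume can be attached to any Droplet in the same region and resized later without touching the Droplet itself.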

Ben Schaechter

Product Manager, Droplet & Block Storage

In Memoriam – Pierre Danet

Published 12 Jun 2017 by Bill McCoy in W3C Blog.

A visionary leader in the advancement of publishing technology and the Web has passed away. Pierre Danet, Chief Digital Officer of Hachette-Livre Group, was a long-time Board member of the International Digital Publishing Forum (IDPF). In that capacity he played an instrumental leadership role in the recently completed combination of IDPF with the W3C.

Pierre also led Hachette-Livre to join W3C and was active in other W3C work prior to the combination. He was a founding Board member of Readium Foundation which develops open source software for EPUB and Web publishing. Pierre also drove the creation, and served as founding President, of the European Digital Reading Lab (EDRLab).

Pierre was a tireless leader, and a warm and generous soul whose wise counsel was always well-seasoned with humor.  W3C and the entire publishing industry owe a great debt to Pierre’s vision, energy and accomplishments, as do I. His passing is a great loss but I know he was happy to see his vision for digital convergence and open standards for publishing technology and the Web moving forward, and I’m certain that his memory will inspire all of us to continue to work to make it happen.

On behalf of the W3C and its Publishing Business Group,

Bill McCoy

W3C Publishing Champion /  President, Readium Foundation / Executive Director, IDPF (emeritus)

Introducing the W3C Strategy Funnel

Published 8 Jun 2017 by Wendy Seltzer in W3C Blog.

W3C has a variety of mechanisms for listening to what the community thinks could make for good future Web standards. These include discussions with the Membership, discussions with other standards bodies, and the activities of thousands of engineers in nearly 300 community groups. There are lots of good ideas, and lately the strategy team in W3C has experimented with ways to identify promising topics. Today we’d like to share our exploratory work and invite public participation.

We’d like to take the “beta” label off the Strategy Funnel and explicitly invite your participation.

The Funnel documents our exploration of potential new work areas, from the germ of an idea — Exploration and Investigation — through its development as a potential work item — Incubation and Evaluation — to the potential Chartering of a new Working Group or Interest Group or a re-charter to expand the scope of an existing group. The Funnel is a GitHub Project view in which each new area is an issue represented by a “card” in the stack. Cards move through the columns, usually from left to right. Most issues (cards) start in Exploration and move forward or move out of the funnel.

The meaning of each stage is further documented at these links: 0. Exploration, 1. Investigation, 2. Incubation, 3. Evaluation, 4. Chartering.

Public input is welcome at any stage, and particularly invited at the Incubation phase and beyond: To help us identify work that is sufficiently incubated to warrant standardization (Incubation); to review the ecosystem around the work and indicate your interest in participating in its standardization (Evaluation); and to draft a charter that reflects an appropriate scope of work (Chartering). We aim to incorporate comments and respond to concerns well before a charter is presented to the W3C Advisory Committee for formal review, and hope thereby to make the review process more effective and responsive.

Funnel graphic illustrating Exploration, Investigation, Incubation, Evaluation, Chartering

Don’t see what you need there yet? Add it as a new issue.

Five minutes with Kylie Howarth

Published 7 Jun 2017 by carinamm in State Library of Western Australia Blog.

Kylie Howarth is an award-winning Western Australian author, illustrator and graphic designer. Original illustrations and draft materials from her most recent picture book, 1, 2, Pirate Stew (Five Mile Press), are currently showing in The Story Place Gallery.

We spent some time hearing from Kylie Howarth about the ideas and inspiration behind her work. Here’s what she had to say…


1, 2, Pirate Stew is all about the power of imagination and the joys of playing in a cardboard box. How do your real life experiences influence your picture book ideas? What role does imagination play?

The kids and I turned the box from our new BBQ into a pirate ship. We painted it together and made anchors, pirate hats and oars. They loved it so much they played in it every day for months… and so the idea for 1, 2, Pirate Stew was born. It eventually fell apart and so did our hot water system, so we used that box to build a rocket. Boxes live long lives around our place. I also cut them up and take them to school visits to do texture rubbings with the students.

Your illustrations for 1, 2, Pirate Stew are unique in that they incorporate painted textures created during backyard art sessions with your children. What encouraged you to do this? How do your children’s artworks inspire you?

I just love children’s paintings. They have an energy I find impossible to replicate. Including them in my book illustrations encourages kids to feel their art is important and that they can make books too. Kids sometimes find highly realistic illustrations intimidating and feel they could never do it themselves. During school and library visits, they love seeing the original finger paintings and potato stamp prints that were used in my books.

Through digital illustration you have blended hand drawings with painted textures. How has your background and training as a graphic designer influenced your illustrative style?

Being a graphic designer has certainly influenced the colour and composition of my illustrations, particularly the use of white space in 1, 2, Pirate Stew. Many illustrators and designers are afraid of white space, but it can be such an effective tool; it allows the book to breathe. The main advantage, though, is that I have been able to design all my own book covers, select fonts and arrange the text layout.

Sometimes ideas for picture books evolve and change a lot when working with the publisher. Sometimes the ideas don’t change much at all. What was your experience when creating 1, 2, Pirate Stew? Was it similar or different to your previous books Fish Jam and Chip?

I worked with a fabulous editor, Karen Tayleur, on all three books. We tweaked the text for Fish Jam and Chip a little to make them sing as best we could. With 1, 2, Pirate Stew, however, the text was based on the old nursery rhyme 1, 2, Buckle My Shoe, so there was little room to move as I was constrained to a limited number of syllables and each line had to rhyme. I think we only added one word. I did, however, further develop the illustrations from my original submission. Initially the characters’ faces were a little more stylised, so I refined them to be more universal. Creating the mini 3D character model helped me get them looking consistent from different angles throughout the book. I also took many photographs of my boys to sketch from.

1, 2, Pirate Stew – an exhibition is on display at the State Library of Western Australia until 22 June 2017. The exhibition is part of a series showcasing the diverse range of illustrative styles in picture books published by Western Australian authors and illustrators. For more information go to

Filed under: Children's Literature, Exhibitions, Illustration, SLWA displays, SLWA Exhibitions, SLWA news Tagged: 1 2 Pirate Stew, Five Mile Press, Kylie Howarth, State Library of Western Australia, State Library WA, Story Place Gallery, WA authors, WA illustrators


Published 7 Jun 2017 by fabpot in Tags from Twig.



Plasma 5.10.1 now in Zesty backports

Published 7 Jun 2017 by valorie-zimmerman in Kubuntu.

The first bugfix update of the Plasma 5.10 series is now available for users of Kubuntu Zesty Zapus 17.04 to install via our backports PPA.

See the Plasma 5.10.1 and 5.10.0 announcements and the release video below for more about the new features available, and bugfixes applied.

To update, use the Software Repository Guide to add the following repository to your software sources list:

ppa:kubuntu-ppa/backports

or if it is already added, the updates should become available via your preferred update method.

The PPA can be added manually in the Konsole terminal with the command:

sudo add-apt-repository ppa:kubuntu-ppa/backports

and packages then updated with

sudo apt update
sudo apt full-upgrade


Upgrade notes:

~ The Kubuntu backports PPA already contains backported KDE PIM 16.12.3 packages (KMail, Kontact, KOrganizer, Akregator, etc.) from previous updates, plus various other backported applications and KDE Frameworks 5.34, so please be aware that enabling the backports PPA for the first time and doing a full upgrade will result in a substantial number of upgraded packages in addition to Plasma 5.10.

~ The PPA will also continue to receive bugfix updates to Plasma 5.10 when they become available, and further updated KDE applications.

~ While we believe that these packages represent a beneficial and stable update, please bear in mind that they have not been tested as comprehensively as those in the main Ubuntu archive, and are supported only on a limited and informal basis. Should any issues occur, please provide feedback on our mailing list [1], IRC [2], file a bug against our PPA packages [3], or optionally contact us via social media.

1. Kubuntu-devel mailing list:
2. Kubuntu IRC channels: #kubuntu & #kubuntu-devel on
3. Kubuntu ppa bugs:


MediaWiki fails to show Ambox

Published 7 Jun 2017 by lucamauri in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I am writing you about the use of Template:Ambox in MediaWiki.

I have a hosted MediaWiki 1.28 installation that apparently works well at everything, but I can't get the boxes explained here to work properly.

As a test I implemented in this page the following code:

{{Ambox
| type       = notice
| text       = Text for a big box, for the top of articles.
| smalltext  = Text for the top of article sections.
}}

and I expected a nice box to show up. Instead I simply see the text Template:Ambox shown at the top of the page.
It seems like this template is not defined in MediaWiki, but, as far as I understood, this is built-in and in all examples I saw it seems it should work out-of-the-box.

I guess I miss something basic here, but it really escapes me: any help you might provide will be appreciated.



New Customer Support System!

Published 6 Jun 2017 by Nicola Nye in FastMail Blog.

You are our customer, not our product. This is our number one value. Your subscription fee doesn't just go towards keeping your email safe, secure and reliable; it means if you ever have a problem that our detailed help pages don't resolve, one of our friendly support team will help you out.

And now, as part of our commitment to providing first rate customer service, we have made it even easier to access FastMail Support.

We have made changes to our contact form and ticketing system to make them clearer and more user-friendly. We've also been busy behind the scenes improving our back-end systems, which means we can address issues faster and more efficiently than before.

The biggest change is that you can now send support requests via email!

Not all users will have access to the new system yet. We are gradually rolling it out over the next few weeks. We hope you like what you see.

With FastMail, you're not left alone. We know you would rather spend your time getting things done, not trying to fix a glitch by experimenting with something a random stranger on the internet tried once.

Ticket creation screenshot

We pride ourselves on the quality of our personal support service. As a result we don't use social media for individual support: instead please create a ticket. Most requests need us to send or receive the kind of information we (and you!) don't want to be exposed on social media. For detailed and rapid assistance, our ticketing system has the tools to help us help you.

We also don’t provide telephone support, which is expensive and time-consuming. In order to provide high-quality, round-the-clock assistance while keeping your costs reasonable, we provide support by email only.

We do still love hearing from you! Drop us a line any time on Twitter, Facebook, or Google+, where we post system-wide announcements.

While our system is changing, our policies remain the same. Here's how to recognise true requests from the FastMail support team. Our team will never ask you for a password or credit card number via email or phone. We will never offer to fix your problems by remotely accessing your computer. Messages from us will be identified with a green tick in the web interface. We encourage you to check out our full customer support policies.

Cloud Firewalls: Secure Droplets by Default

Published 6 Jun 2017 by Rafael Rosa in DigitalOcean: Cloud computing designed for developers.

When deploying a new application or service, security is always a top concern. The internet is full of malicious actors probing applications for vulnerabilities and sniffing for open ports. Tools like iptables are essential to any developer’s toolkit, but they can be complicated to use, especially when building distributed services. Adding a new Droplet can require updating your configuration across all of your infrastructure.

At DigitalOcean, we are working to make it easier for developers to build applications and deploy them to the cloud by simplifying the infrastructure experience. Today, we’re excited to bring that approach to security with Cloud Firewalls, an easily configurable service for securing your Droplets. It is free to use and designed to scale with you as you grow.

By using Cloud Firewalls, you will have a central location to define access rules and apply them to all of your Droplets. We enforce these rules on our network layer. Unauthorized traffic will not reach your Droplets, and this protection doesn't consume any resources from your Droplet.

Secure by Default

When using Firewalls, we start from the principle of least privilege—only the ports and IPs explicitly defined by you will be accessible. Any packet that doesn't fit the rules will be dropped before it reaches your Droplet. A simple Firewall that would only allow HTTP, SSH, and ICMP connections from any source would need three rules:

If someone tried to access this Droplet on any other port—say FTP using port 21—they would receive a timeout because Firewalls filtered out the traffic.
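As a rough sketch, those three rules map onto a DigitalOcean v2 API request body (POST /v2/firewalls) along these lines; the firewall name here is a placeholder, and the exact rule shape should be checked against the API reference:

```python
import json

# Hypothetical request body for POST /v2/firewalls implementing the three
# rules above: HTTP, SSH, and ICMP allowed from any source.
any_source = {"addresses": ["0.0.0.0/0", "::/0"]}
payload = {
    "name": "web-firewall",  # placeholder name
    "inbound_rules": [
        {"protocol": "tcp", "ports": "80", "sources": any_source},  # HTTP
        {"protocol": "tcp", "ports": "22", "sources": any_source},  # SSH
        {"protocol": "icmp", "sources": any_source},                # ICMP has no ports
    ],
    "droplet_ids": [],  # the Droplets this firewall applies to
}
print(json.dumps(payload, indent=2))
```

Everything not matched by an inbound rule is dropped at the network layer, which is exactly the least-privilege behaviour described above.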

Easy to Configure

We’ve designed Firewalls to be easy to configure. Your source and destination rules can specify individual Droplets by name, Load Balancers, IP ranges, and even sets of Droplets by using Tags.

For finer-grained control, you can also apply multiple Firewalls to a Droplet. This allows you to keep rules for different concerns in different Firewalls. For example, you could create one Firewall called webapp-firewall, that allows only HTTP on port 80, and another called admin-firewall, that allows SSH and ICMP from only a specific IP. Our service will combine their rules and enforce them together.

Beyond the Control Panel, you can manage your Firewalls on the command line with doctl or automate using our RESTful API or our Go and Ruby API client libraries. Expect more integrations to come along soon, thanks to our amazing community.

Works at Scale

Even without automation, Firewalls makes it much easier to secure distributed applications with large numbers of resources. You can leverage tagging to group and organize any number of Droplets, and use them to define how each group of Droplets is secured by Firewalls.

For example, you could create a Firewall called db-firewall and only allow inbound connections from all Droplets tagged frontend, securing your database from unauthorized access. If you add this tag to more Droplets, they will automatically be recognized by our system and be whitelisted by this rule.
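In API terms, such a tag-based rule could look like the sketch below (the port number and tag name are assumptions for illustration): the source is a tag rather than a fixed address list, so newly tagged Droplets are covered automatically.

```python
# Hypothetical inbound rule for the db-firewall example: allow database
# traffic (port 3306 assumed, i.e. MySQL) only from Droplets tagged "frontend".
db_rule = {
    "protocol": "tcp",
    "ports": "3306",
    "sources": {"tags": ["frontend"]},
}

# Adding the "frontend" tag to a new Droplet admits it automatically;
# no firewall rule needs to change.
print(db_rule["sources"]["tags"])
```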

Getting Started

Whether you’re ready to dive in and create your first Firewall or you just want to learn more, check out these tutorials on our Community site for all the details and some best practices:

We can’t wait to hear your feedback. It helps guide us as we continue to work on making your infrastructure more secure and easier to manage at scale. Let us know what you think in the comments below, and stay tuned for major network security improvements later this year.


Published 5 Jun 2017 by fabpot in Tags from Twig.


Ted Nelson’s Junk Mail (and the Archive Corps Pilot)

Published 31 May 2017 by Jason Scott in ASCII by Jason Scott.

I’ve been very lucky over the past few months to dedicate a few days here and there to helping legend Ted Nelson sort through his archives. We’ve known each other for a bunch of years now, but it’s always a privilege to get a chance to hang with Ted and especially to help him with auditing and maintaining his collection of papers, notes, binders, and items. It also helps that it’s in pretty fantastic shape to begin with.

Along with sorting comes some discarding – mostly old magazines and books; they’re being donated wherever it makes sense to. Along with these items were junk mail that Ted got over the decades.

About that junk mail….

After glancing through it, I requested to keep it and take it home. There was a lot of it, and even going through it with a cursory view showed me it was priceless.

There’s two kinds of people in the world – those who look at ephemera and consider it trash, and those who consider it gold.

I’m in the gold camp.

I’d already been doing something like this for years, myself – when I was a teenager, I circled so many reader service cards and pulled in piles and piles of flyers and mailings from companies so fleeting or so weird, and I kept them. These became and later the reader service collection, which encapsulates completely. There’s well over a thousand pages in that collection, which I’ve scanned myself.

Ted, basically, did what I was doing, but with more breadth, more variety, and with a few decades more time.

And because he was always keeping an eye out on many possibilities for future fields of study, he kept his mind (and mailbox) open to a lot of industries. Manufacturing, engineering, film-making, printing, and of course “computers” as expressed in a thousand different ways. The mail dates from the 1960s through to the mid 2000s, and it’s friggin’ beautiful.

Here’s where it gets interesting, and where you come in.

There’s now a collection of scanned mail from this collection up at the Internet Archive. It’s called Ted Nelson’s Junk Mail and you can see the hundreds of scanned pages that will soon become thousands and maybe tens of thousands of scanned pages.

They’re separated by mailing, and over time the metadata and the contents will get better, increase in size, and hopefully provide decades of enjoyment for people.

The project is being coordinated by Kevin Savetz, who has hired a temp worker to scan in the pages across each weekday, going through the boxes and doing the “easy” stuff (8.5×11 sheets) which, trust me, is definitely worth going through first. As they’re scanned, they’re uploaded, and (for now) I am running scripts to add them as items to the Junk Mail collection.

The cost of doing this is roughly $80 a day, during which hundreds of pages can be scanned. We’re refining the process as we go, and expect it to get even more productive over time.

So, here’s where Archive Corps comes in; this is a pilot program for the idea behind the new idea of Archive Corps, which is providing a funnel for all the amazing stuff out there to get scanned. If you want to see more stuff come from the operation that Kevin is running, he has a paypal address up at – the more you donate the more days we are able to have the temp come in to scan.

I’m very excited to watch this collection grow, and see the massive variety of history that it will reveal. A huge thank-you to Ted Nelson for letting me take these items, and a thank-you to Kevin Savetz for coordinating.

Let’s enjoy some history!

The State of AI

Published 31 May 2017 by Alejandro (Alex) Jaimes in DigitalOcean: Cloud computing designed for developers.

This post is the first in a three-part series we're publishing this summer on artificial intelligence, written by DigitalOcean’s Head of R&D, Alejandro (Alex) Jaimes.

In recent months, the amount of media coverage on AI has increased so significantly that a day doesn’t go by without news about it. Whether it’s an acquisition, a funding round, a new application, a technical innovation, or an opinion piece on ethical and philosophical issues (“AI will replace humans, take over the world, eat software, eat the world”), the content just keeps coming.

The field is progressing at amazing speeds and there’s a lot of experimentation. But with so much noise, it’s hard to distinguish hype from reality, and while everyone seems to be rushing into AI in one way or another, it’s fair to say there is a good amount of confusion on what AI really is, what sort of value it can bring and where things will go next.

While the reality is that AI has the potential to impact just about everything and be embedded in just about anything—just like software already is—getting started can be daunting, depending on who you ask.

In this post, I will first explain why computing is now AI. Then, in future posts, I’ll describe the most significant trends, outline steps to be taken in actually implementing AI in practice, and say a few words about the future.

Computing Is Now AI

AI is already embedded, in some form, in most of the computing services we use on a daily basis: when we search the web, visit a webpage, read our email, use social media, use our phone, etc. Most of those applications use some form of machine learning to perform “basic” tasks, like spam detection, personalization, and advertising. But like computing itself, penetration of AI doesn’t stop there. Our transportation systems, security, cargo shipping, banking, dating, and just about everything else is likely “touched” by algorithms that use machine learning.

AI is really an umbrella term that encompasses many subfields. For the sake of simplicity, most of what people currently think of as AI involves machine learning and/or deep learning. The ideas behind the three concepts are rather straightforward: AI aims to "emulate or supersede human intelligence," machine learning is concerned with algorithms that learn models from data, and deep learning is "simply" a subset of machine learning algorithms that learn from data with less human intervention. In building "traditional" machine learning algorithms, an engineer has to design features, but in a deep learning framework the features themselves are learned by the algorithm; those algorithms, however, need significantly greater amounts of data.
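The distinction between a hand-designed rule and a learned one can be sketched in a few lines of plain Python (a toy example, not any particular library's API): the hand-coded rule fixes its threshold up front, while the "learned" rule estimates the same parameter from labeled data.

```python
# Toy one-feature classifier: (feature, label) pairs.
data = [(1.0, 0), (2.0, 0), (6.0, 1), (7.0, 1)]

def hand_rule(x):
    # An engineer designed the feature AND picked the threshold.
    return 1 if x > 5.0 else 0

# "Learning": estimate the threshold as the midpoint of the class means.
class0 = [x for x, y in data if y == 0]
class1 = [x for x, y in data if y == 1]
threshold = (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2

def learned_rule(x):
    # Same decision shape, but the parameter came from the data.
    return 1 if x > threshold else 0

print(threshold)  # 4.0 for this data
```

Deep learning pushes this one step further: not just the threshold but the features themselves are derived from (much more) data.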

Some industries use computing technology in more advanced ways than others. Tech companies, in particular, have taken the lead in developing products and services around data and AI (in various forms), and scaling to millions and billions of users. This has led to significant advances in some areas where having large, diverse datasets can improve performance to the point where problems that seemed out of reach now seem solvable. Other industries, such as healthcare and education, have been slower to adapt, but we're beginning to see significant progress with very promising prospects.

If we look closely at trends and technical requirements (for AI to deliver in products and services), it's easy to see that AI can already be applied anywhere repetitive patterns occur and can be recorded, whether the data is individual or aggregated. One could easily argue that everything in life—and business—consists of cycles, and what's changed significantly in recent years is our ability to record, store, and process behavioral patterns at every level. AI adds prediction, which is extremely valuable.

The power of AI comes at multiple granularities. There are a plethora of decisions made every day based on simple, repetitive patterns—and those apply to businesses as much as they do to individuals. It's no surprise then, that most companies are using AI today to cut costs and improve efficiency. As more processes become digital, AI, then, becomes not just a critical part of the ecosystem, but the driving force, in large part because its main benefit is efficiency. And if we look at things from this perspective, it's easy to see why computing and AI are already converging to the point where there's no distinction. In the very near future, it will be assumed that AI is part of computing, just as networking and other technical components are.

This is not a minor shift, however. It is massive because it emphasizes processes that leverage data, and evolving models (vs. "fixed" algorithms), impacting how software is developed. This has several ripple effects that I'll describe in future posts, including pushing the hardware boundaries. I would argue that the companies and teams that understand this and think and operate with this mindset will now have a significant advantage over others that try to "add" AI at a later stage.

On one hand, this means that individuals and teams must constantly learn and grow, remain up to date, and rely on the larger community for the exchange of models, ideas, code, and knowledge. It also means that applications will be increasingly built by layering components and data—nothing will be built from scratch. For hobbyists, "professional" developers, engineering teams, the open source community, and companies, this translates into significant synergies—an ecosystem that relies on the cloud, which is the perfect platform to combine multiple resources and scale with a single click. Ultimately, this implies that AI skills will be as critical to individuals as they are to companies, and they will form the basis of economic progress for decades to come.

We'd love to get your thoughts on AI. How it has impacted the way you build software? What do you think you need to make AI part of your workflow? What opportunities and barriers do you see? What are the topics you'd like to learn more about or the tools you'd like to use? Let us know in the comments below!

Alejandro (Alex) Jaimes is Head of R&D at DigitalOcean. Alex enjoys scuba diving and started coding in Assembly when he was 12. In spite of his fear of heights, he's climbed a peak or two, gone paragliding, and ridden a bull in a rodeo. He's been a startup CTO and advisor, and has held leadership positions at Yahoo, Telefonica, IDIAP, FujiXerox, and IBM TJ Watson, among others. He holds a Ph.D. from Columbia University.
Learn more by visiting his personal website or LinkedIn profile. Find him on Twitter: @tinybigdata.

17.10 Wallpaper Contest deadline for submissions soon

Published 30 May 2017 by valorie-zimmerman in Kubuntu.

The 17.10 Wallpaper contest will close on June 8, 2017. Submit your work soon, and see the entries submitted so far!

X-posting Proposal: WordPress Community Conduct Project

Published 30 May 2017 by Ipstenu (Mika Epstein) in Make WordPress Plugins.

Please read + comment on the original post.

Proposal: WordPress Community Conduct Project

Local illustration showcase

Published 30 May 2017 by carinamm in State Library of Western Australia Blog.

From digital illustration to watercolour painting and screen-printing, three very different styles of illustration highlight the diversity and originality of picture books published this year.

In a series of exhibitions, The Story Place Gallery will showcase original artwork by Western Australian illustrators from the picture books 1, 2, Pirate Stew (Five Mile Press 2017), One Thousand Trees and Colour Me (Fremantle Press 2017).


7, 8, he took the bait © Kylie Howarth 2017

In 1, 2, Pirate Stew, Kylie Howarth used a digital illustration process to merge her drawings, created with water-soluble pencils, with background textures painted by her two adventurous children, Beau and Jack. Kylie Howarth’s playful illustrations in gentle colours, together with her entertaining rhyming verse, take readers on an imaginative adventure all about the joys of playing in a cardboard box. Illustrations from 1, 2, Pirate Stew are on display from 26 May to 22 June.


Among © Kyle Hughes-Odgers 2017

Kyle Hughes-Odgers’ distinctive illustrations blend geometric shapes, patterns and forms. In his watercolour illustrations for One Thousand Trees, he uses translucent colours and a restricted palette to explore the relationship between humankind and the environment. Shades of green-brown and grey-blue emphasise the contrasts between urban and natural scenes. Kyle Hughes-Odgers places the words of the story within his illustrations to accentuate meaning. One Thousand Trees is on display from 24 June to 23 July.


If I was red © Moira Court

Moira Court’s bold illustrations for the book Colour Me (written by Ezekiel Kwaymullina) were created using woodcut and screen-printing techniques. Each final illustration is made from layers of silk-screen prints created using hand-cut paper stencils and transparent ink. Each screen print was then layered with a patchy, textural woodcut or linoleum print. Colours were printed one at a time to achieve a transparent effect. The story celebrates the power of each individual colour, as well as the power of their combination. Colour Me is on display from 26 July to 16 August.

Each exhibition in this series is curated especially for children and is accompanied by a story-sharing area, a self-directed activity, and discussion prompters for families.

Filed under: Children's Literature, community events, Exhibitions, Illustration, SLWA displays, SLWA Exhibitions, SLWA news

A Lot of Doing

Published 28 May 2017 by Jason Scott in ASCII by Jason Scott.

If you follow this weblog, you saw there was a pause of a couple months. I’ve been busy! Better to do than to talk about doing.

A flood of posts are coming – they reflect accomplishments and thoughts of the last period of time, so don’t be freaked out as they pop up in your life very quickly.


TV Interview on Stepping Off

Published 26 May 2017 by Tom Wilson in tom m wilson.

New Directory Status

Published 23 May 2017 by Ipstenu (Mika Epstein) in Make WordPress Plugins.

As everyone knows, the first phase of opening up the directory to more reviewers was getting on the new system. We’re not quite there yet, however a great deal of progress has been made!

So far, we’ve run into a few weird flow issues that are blocking us from being able to invite new people. The biggest issue is that if you know the old system, it’s easy to move tickets through the new one. But it’s set up in a way that makes it very easy to make mistakes and put tickets into unrecoverable states. So we need to mitigate that as much as possible before we let new people in. Basically, we don’t want to break things for users because we didn’t think about use-cases.

Okay, fine, you say. What can you do to help?

I’m glad you asked!

We have 100 tickets open in Meta Trac. You can install the meta-environment in VVV and help us out with patches. Sadly, the meta env isn’t complete. It’s missing data, so you’ll end up having to add in plugins in order to mess with the state flow.

But if you can’t patch, and I do understand that, remember to come to the Plugin Directory revamp meetings on Wednesday at 2200 UTC in #meta on Slack. And please, test test test everything! The more we break the directory, the better it is 🙂

WikiCite 2017

Published 23 May 2017 by Sam Wilson in Sam's notebook.

(Firefox asked me to rate it this morning, with a little picture of a broken heart and five stars to select from. I gave it five (’cause it’s brilliant) and then it sent me to a survey titled “Heavy User V2”, which sounds like the name of a confused interplanetary supply ship.)

Today WikiCite17 begins. Three days of talking and hacking about the galaxy that comprises Wikipedia, Wikidata, Wikisource, citations, and all bibliographic data. There are lots of different ways into this topic, and I’m focusing not on Wikipedia citations (which is the main drive of the conference, I think), but on getting (English) Wikisource metadata a tiny bit further along (e.g. figure out how to display work details on a Wikisource edition page); and on a little side project of adding a Wikidata-backed citation system to WordPress.

The former is currently stalled on me not understanding the details of P629 ‘edition or translation of’ — specifically whether it should be allowed to have multiple values.

The latter is rolling on quite well, and I’ve got it searching and displaying and the beginnings of updating ‘book’ records on Wikidata. Soon it shall be able to make lists of items, and insert the lists (or individual citations of items on them) into blog posts and pages. I’m not sure what the state of the art is in PHP of packages for formatting citations, but I’m hoping there’s something good out there.

And here is a scary chicken I saw yesterday at the Naturhistorisches Museum:

Scary chicken (Deinonychus antirrhopus)

All accounts updated to version 2.9

Published 20 May 2017 by Pierrick Le Gall in The Blog.

17 days after Piwigo 2.9.0 was released and 4 days after we started to update, all accounts are now up-to-date.

Piwigo 2.9 and new design on administration pages


As you will learn from the release notes, your history will now be automatically purged to keep “only” the last 1 million lines. Yes, some of you, 176 to be exact, have more than 1 million lines, with a record set at 27 million lines!

Wikimedia Commons Android App Pre-Hackathon

Published 19 May 2017 by addshore in Addshore.

Wikimedia Commons Logo

The Wikimedia Commons Android App allows users to upload photos to Commons directly from their phone.

The website for the app details some of the features and the code can be found on GitHub.

A hackathon was organized in Prague to work on the app in the run up to the yearly Wikimedia Hackathon which is in Vienna this year.

A group of 7 developers worked on the app over a few days; as well as meeting and learning from each other, they managed to work on various improvements, which I have summarised below.

2 factor authentication (nearly)

Work has been done towards allowing 2fa logins to the app.

Lots of the login & authentication code has been refactored, and the app now uses the clientlogin API module provided by MediaWiki instead of the older login module.

When building in debug mode, the 2FA input box will appear if you have 2FA login enabled; however, the current production build will not show this box and will simply display a message saying that 2FA is not currently supported. This is due to a small amount of session-handling work that the app still needs.

Better menu & Logout

As development on the app was fairly non-existent between mid-2013 and 2016, the UI generally fell behind. This is visible in forms and buttons, as well as the app layout.

One significant push was made to drop the old-style ‘burger’ menu from the top right of the app and replace it with a new slide-out menu drawer, including a feature image and icons for menu items.

Uploaded images display limit

Some users have run into issues with the number of upload contributions that the app loads by default in the contributions activity. The default has always been 500, and this can cause memory exhaustion (OOM) and a crash on some memory-limited phones.

In an attempt to fix this and generally speed up the app, a recent-uploads limit has been added to the settings. This limits the number of images and image details that are displayed; however, the app will still fetch and store more than this on the device.

Nearby places enhancements

The nearby places enhancements probably account for the largest portion of development time at the pre-hackathon. The app has always had a list of nearby places that don’t have images on Commons, but now the app also has a map!

The map is powered by the Mapbox SDK, and the current beta uses the Mapbox tiles; part of the plan for the Vienna hackathon is to switch this to the Wikimedia-hosted map tiles.

The map also contains clickable pins that provide a small pop up pulling information from Wikidata including the label and description of the item as well as providing two buttons to get directions to the place or read the Wikipedia article.

Image info coordinates & image date

Extra information has also been added to the image details view; the date and coordinates of the image can now be seen in the app.

Summary of hackathon activity

The contributions and authors that worked on the app during the pre-hackathon can be found on GitHub.

Roughly 66 commits were made between the 11th and 19th of May 2017 by 9 contributors.

Screenshot Gallery

A New Maintainer Appears!

Published 18 May 2017 by Beau Simensen in Sculpin's Blog.

Effective immediately, I have handed over full ownership of the Sculpin project to Chris Tankersley. Until otherwise specified, the rest of the Sculpin Organization will remain intact.

This was not an easy decision for me to make. I've been thinking about it for a few years now. It isn't fair to the community for my continued lack of time and energy to hold Sculpin back from moving forward.

The hardest thing for me is that Sculpin, as it stands right now, works great for me. I maintain dozens of Sculpin sites and they've all worked great for the last two to three years.

There are things I'd love to change but it has become clear to me that I neither have the time nor energy to make it happen.

Thanks for your support over the years. I'm sure Chris and the team will treat you better than I have over the last few.



I am honored that Beau is allowing me to take over ownership of the Sculpin project. I have been the FIG representative for Sculpin for a few years now, making sure that Sculpin's interests are heard for new PSRs that take shape in the PHP community. Since the beginning of the year I've also represented the FIG as part of the Core Committee.

While I have not been a heavy committer to the base code so far, I have been serving on the Sculpin organization committee since 2015. I've spent much of that time extolling the virtues of Sculpin, and have helped guide the features and roadmaps we have worked on in that time.

Sculpin has been a big part of my workflow since I started working with it, and it is one of the projects near and dear to my heart. When Beau decided to step down, it was not a hard decision to step up and help keep this project going. Sculpin is a stable, dependable static site builder, and I would hate to see it go away.

I plan on coming up with some exciting new features for Sculpin in addition to updating the codebase. I hope you all come along for the ride.

Thank you Beau. For Sculpin, and letting me help keep it alive.


SVN Status: Seems to be Okay

Published 16 May 2017 by Ipstenu (Mika Epstein) in Make WordPress Plugins.

I know Dion mentioned it in a comment, but here’s the official “we think it’s okay now” post (I delayed to be more sure).

The SVN sync stuff SEEMS to be okay. The main issues appear to be sorted out, so 🤞🏾

We’re keeping a close eye on it, but please do remember to be nice to our poor system 🙂


Privacy Awareness Week 2017

Published 16 May 2017 by David Gurvich in FastMail Blog.

We are excited to announce that FastMail is a partner in this year’s Privacy Awareness Week (PAW) — the largest campaign in the Asia Pacific region that raises awareness of privacy issues and how personal information can be better protected.

The campaign runs from 15 May to 19 May and this year’s theme is ‘trust and transparency’, to highlight how clear privacy practices build trust between individuals and organisations.

At FastMail, we have a great responsibility to keep your email secure. We continually review our code and processes for potential vulnerabilities and we take new measures wherever possible to further secure your data.

In our recent blog post on FastMail’s Values we made it clear that not only does your data belong to you, but that we also strive to be good stewards of your data.

By being privacy-aware, you can make more informed decisions about managing your data. Here are a few quick best practice tips you can use with your FastMail account:

1. Protect your email, protect your identity

Passwords are like locks, and some doors are more important than others. Your email is the front door and master key to most of your online identities. If a malicious user controls your email, they can reset your passwords everywhere else (like your bank account).

The best protection? Just like in your home, it’s two sets of locks — two-step verification (also known as two-factor authentication or 2FA). It combines something you know (your password) and something you have (your phone or a security key). We make it easy to set up and use two-step verification on your FastMail account.

Not all your online accounts require two-step verification, but we recommend it for identity services (like Facebook or Twitter), financial services (your bank, your credit card company), and other services with critical data (Dropbox, your DNS provider).
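The “something you have” factor is usually a time-based one-time password (TOTP, RFC 6238), the kind an authenticator app generates. As a minimal sketch of how such codes are computed (an illustration, not FastMail's actual implementation), using only the Python standard library and the RFC 6238 test secret:

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, interval=30, digits=6, now=None):
    """Compute an RFC 6238 time-based one-time password."""
    key = base64.b32decode(secret_b32, casefold=True)
    # The moving factor is the number of whole intervals since the epoch.
    counter = int((time.time() if now is None else now) // interval)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): take 4 bytes at an offset given by
    # the low nibble of the last digest byte.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# With a fixed timestamp the result is reproducible (RFC 6238 test secret):
print(totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ", now=59))  # → 287082
```

The server verifies a login by computing the same value for the current interval (and usually the neighbouring intervals, to tolerate clock skew).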

2. Protect your keys

The two most common ways for an attacker to get your password are either knowing enough about a user’s personal information to guess it, or reuse of a password compromised from another site. You can protect against both of these attacks with one simple tool: a password manager. A password manager makes it easy to use a distinct password for every service. Good password managers will even generate random passwords for you, making it impossible for someone to guess.

Many browsers have a basic password manager built in. We prefer stand-alone tools like 1Password or LastPass — their syncing tools let you access your passwords on both your computer and your phone.
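As a sketch of what “generate random passwords” means in practice (not the algorithm any particular password manager uses, and the length and alphabet here are arbitrary choices), Python's secrets module does the job in a few lines:

```python
import secrets
import string

def generate_password(length=20):
    """Generate a password by sampling a large alphabet with a CSPRNG."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(generate_password())  # a different 20-character password on every run
```

The point is the use of a cryptographically secure random source (`secrets`, not `random`): 20 characters drawn from a 94-symbol alphabet is far beyond any guessing attack.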

3. Trust, but verify

Less common than password reuse or guessable passwords, but a growing problem, is phishing. Phishing is a targeted attack, where a malicious user claims to be a trusted contact (FastMail, your bank, a loved one) to get you to provide your password or other personal information.

When you receive an email from the FastMail team, it will always have our security check mark. (Want to know what the security check mark looks like?) If you want to be sure you're on the proper FastMail website, look for the green padlock badge in the URL bar.

For any service, when in doubt, do not click on the links in a message – go directly to their website instead. If it's urgent enough for a company to email you, you should expect to see an alert on your account, too.

Follow these simple tips, and better protect your privacy online.

Visit the PAW 2017 website to find out more and be sure to join the conversation on social media with #2017PAW, and help raise privacy awareness.

SVN Syncing Issues Continued

Published 14 May 2017 by Ipstenu (Mika Epstein) in Make WordPress Plugins.

tl;dr – Yes we know, yes we’re working on it, no you don’t need to email.

I’m really sorry about this issue, but right now literally all I know is that the tool we use to automagically schedule everything that happens after you use SVN to bump your plugins is acting like a truculent child. It’s slow, it’s dragging its feet, and it’s taking WAY more than six hours (which is usually the outside norm for this stuff) to finish, if it finishes at all. It took 36 hours for one plugin, and even then some people got weird results.

And no, we don’t really know why yet.

It’s possible this is related to the new directory. It’s possible it’s from the entire .org slowdown last week or maybe it’s because we released the Beta and everything is slow from that. We literally don’t know.

I apologize for a series of very curt emails, but with the volume of people complaining, we had to resort to an auto-reply of, basically, we know, please be patient. If I have anything else to tell you, I will post, but right now we don’t know why and we can’t magically tell you what we don’t know, so please be patient with us.

Also no, there’s not a ticket, because this is actually outside the meta repository; the part that’s busted isn’t open source. The people with the access are aware and yes, I’ve pushed to escalate this. But it’s the weekend and it’s Mothers’ Day in the US, so you’re just going to have to be a little extra patient.

Plasma bugfix releases, Frameworks, & selected app updates now available in backports PPA for Zesty and Xenial

Published 13 May 2017 by rikmills in Kubuntu.

Plasma Desktop 5.9.5 for Zesty 17.04, 5.8.6 for Xenial 16.04, KDE Frameworks 5.33 and some selected application updates are now available via the Kubuntu backports PPA.

The updates include:

Kubuntu 17.04 – Zesty Zapus.

Kubuntu 16.04 – Xenial Xerus

We hope that the additional application updates for 17.04 above will soon be available for 16.04.

To update, use the Software Repository Guide to add the following repository to your software sources list:


or if already added, the updates should become available via your preferred updating method.

Instructions on how to manage PPA and more info about the Kubuntu PPAs can be found in the Repositories Documentation

The PPA can be added manually in the Konsole terminal with the command:

sudo add-apt-repository ppa:kubuntu-ppa/backports

and packages then updated with

sudo apt-get update
sudo apt-get dist-upgrade

NYI Datacentre Move

Published 13 May 2017 by Bron Gondwana in FastMail Blog.

You're reading this version of this blog post, so we succeeded. Our primary datacentre moved and you didn't even notice!

Over the past week, we have moved all of the FastMail, Pobox and Listbox hardware from New York Internet's Manhattan datacentre to their Bridgewater, New Jersey location. The new location gives us more space to expand while keeping the same great service we have always received from NYI's network and remote hands teams.

The wide open plains of New Jersey

Pre-move view

To prepare for this move, we performed numerous "fire drills" over the last 3 weeks. We shut down half the FastMail infrastructure at a time during the Australian day, to make sure nobody noticed and that all the hardware would come back up cleanly.

Our design goal is to have sufficient redundancy that we can run on half capacity - comfortably during non-peak times - with a little slowdown during peak times. This is due to our commitment to high availability. The fire drills gave us a high level of confidence that our systems were still meeting this goal in practice.

In 2014 I spent a week in New York and moved all the servers to a new set of racks. In the process we reconfigured that redundancy such that we could take down entire racks at a time if we ever had to - either for this type of move within NYI or even to move to a new provider! It's part of our regular contingency planning, and has been very valuable for this week's work.

We had migrated the Pobox/Listbox hardware from Philadelphia up to New York over a few batches in the last 12 months. While not the same 50/50 plan we used for FastMail, we felt confident we could repeat the batched moves over a much shorter timeline for this move.

Those plans in hand, we are pleased to say that we moved every service, and virtually every server, in only two days!

Getting it done

We have to start by thanking NYI for their assistance and diligent preparation leading up to this move. They set up racks in the new datacentre and bridged all our networks through dark fibre between their two locations.

This week, two of our operations team flew to New York to put our plan into effect. The moves were scheduled for Monday and Tuesday nights (8th and 9th of May), starting at 6pm New York time (8am Melbourne). Rob and Jon are the operations leads for the two sets of infrastructure. They led the move on the ground, working with NYI staff and a team of movers. Back in Australia, I was one of the two operations staff monitoring the move and keeping services running smoothly.

On Tuesday during the day, we were running with half the hardware in each datacentre across the bridged network. We're now entirely in New Jersey with nothing left in Manhattan.

The moves took longer than planned, as moves always do! Missing rack rails, slightly smaller racks than expected, and networks not quite coming together on the first go meant Rob and Jon were up until 5am their time getting the last bits up and working. The datacentre crew at NYI-NJ were amazing as well. We were very fortunate 15 years ago when we found NYI; they really are a gem amongst datacentres! As I've said before, a lot of our reliability can be attributed to having a really good datacentre partner. With their help, we were back up and running for the US day.

But enough talking about the move, let's see some more photos!

Christmas morning, all the presents are unwrapped

What Now?

Packed for transit

Packed and ready to go

Colourful Pobox

Pobox is colourful

Hang on, we've seen this before


Even though I knew the plan and had confidence that we had tested each of the individual tasks required, you never know what's going to happen on the ground. (Yes, we even had a plan for what would happen if the truck crashed on either day.) So I speak for everyone when I give Rob and Jon huge high fives for pulling this off so smoothly!

Things customers may have noticed

There will always be a few hitches with a massively coordinated operation. Issues we dealt with in the process:

  1. A handful of FastMail App users were using the QA server, which was offline for about 8 hours on Monday night. Likewise the beta server was offline for about 8 hours on Tuesday night.

  2. Pobox and Listbox new logins broke on Monday night because of an undeclared dependency on the billing service, which was offline during the Monday part of the move. Once that was identified as the cause, the quickest fix was to push forwards and bring up the billing service again in New Jersey.

  3. A bug in Pobox service provisioning cropped up, unrelated to the move. But, because other services were intentionally offline, the bad behavior persisted long enough to cause Pobox DNS to break for 30-40 minutes. During that time, some Pobox users could not send mail, and others reported bouncing messages from their correspondents. As continuous delivery of mail is always our highest goal, we deeply apologize to anyone affected by this issue.

Thank you to everyone who did notice issues for your support and patience. We know how important your email is to you, so your kind comments make us very happy.

AtoM Camp take aways

Published 12 May 2017 by Jenny Mitcham in Digital Archiving at the University of York.

The view from the window at AtoM Camp ...not that there was any time to gaze out of the window of course...

I’ve spent the last three days in Cambridge at AtoM Camp. This was the second ever AtoM Camp, and the first in Europe. A big thanks to St John’s College for hosting it and to Artefactual Systems for putting it on.

It really has been an interesting few days, with a packed programme and an engaged group of attendees from across Europe and beyond bringing different levels of experience with AtoM.

As a ‘camp counsellor’ I was able to take to the floor at regular intervals to share some of our experiences of implementing AtoM at the Borthwick, covering topics such as system selection, querying the MySQL database, building the community and overcoming implementation challenges.

However, I was also there to learn!

Here are some bits and pieces that I’ve taken away.

My first real take away is that I now have a working copy of the soon to be released AtoM 2.4 on my Macbook - this is really quite cool. I'll never again be bored on a train - I can just fire up Ubuntu and have a play!

Walk to Camp takes you over Cambridge's Bridge of Sighs

During the camp it was great to be able to hear about some of the new features that will be available in this latest release.

At the Borthwick Institute our catalogue is still running on AtoM 2.2 so we are pretty excited about moving to 2.4 and being able to take advantage of all of this new functionality.

Just some of the new features I learnt about, and for which I can see an immediate use case, are:

On day two of camp I enjoyed the implementation tours, seeing how other institutions have implemented AtoM and the tweaks and modifications they have made. For example, it was interesting to see the shopping cart feature developed for the Mennonite Archival Image Database and the most popular images carousel feature on the front page of the Chinese Canadian Artifacts Project. I was also interested in some of the modifications the National Library of Wales have made to meet their own needs.

It was also nice to hear the Borthwick Catalogue described by Dan as “elegant”!

There was a great session on community and governance at the end of day two which was one of the highlights of the camp for me. It gave attendees the chance to really understand the business model of Artefactual (as well as alternatives to the bounty model in use by other open source projects). We also got a full history of the evolution of AtoM and saw the very first project logo and vision.

The AtoM vision hasn't changed too much but the name and logo have!

Dan Gillean from Artefactual articulated the problem of trying to get funding for essential and ongoing tasks, such as code modernisation. Two examples he used were updating AtoM to work with the latest version of Symfony and Elasticsearch - both of these tasks need to happen in order to keep AtoM moving in the right direction but both require a substantial amount of work and are not likely to be picked up and funded by the community.

I was interested to see Artefactual’s vision for a new AtoM 3.0 which would see some fundamental changes to the way AtoM works and a more up-to-date, modular and scalable architecture designed to meet the future use cases of the growing AtoM community.

Artefactual's proposed modular architecture for AtoM 3.0

There is no timeline for AtoM 3.0, and whether it goes ahead or not is entirely dependent on a substantial source of funding being available. It was great to see Artefactual sharing their vision and encouraging feedback from the community at this early stage though.

Another highlight of Camp: a tour of the archives of St John's College from Tracy Deakin

A session on data migrations on day three included a demo of OpenRefine by Sara Allain from Artefactual. I’d heard of this tool before but wasn’t entirely sure what it did or whether it would be of use to me. Sara demonstrated how it could be used to bash data into shape before import into AtoM. It seemed to be capable of doing all the things that I’ve previously done in Excel (and more) ...but without so much pain. I’ll definitely be looking to try this out when I next have some data to clean up.

Dan Gillean and Pete Vox from IMAGIZ talked through the process of importing data into AtoM. Pete focused on an example from Croydon Museum Service, whose data needed to be migrated from CALM. He talked through some of the challenges of the task and how he would approach it differently in future. It is clear that the complexities of data migration may be one of the biggest barriers to institutions moving to AtoM from an alternative system, but it was encouraging to hear that none of these challenges are insurmountable.

My final take away from AtoM Camp is a long list of actions - new things I have learnt that I want to read up on or try out for myself ...I best crack on!

MediaWiki Documentation Day 2017

Published 12 May 2017 by Sam Wilson in Sam's notebook.

It’s MediaWiki Documentation Day 2017!

So I’ve been documenting a couple of things, and I’ve added a bit to the Xtools manual.

The latter is actually really useful, not so much from the end-user’s point of view because I dare say they’ll never read it, but I always like writing documentation before coding. It makes the goal so much more clear in my mind, and then the coding is much easier. With agreed-upon documentation, writing tests is easier; with tests written, writing the code is easier.

Time for a beer — and I’ll drink to DFD (document first development)! Oh, and semantic linebreaks are great.

Repository Syncing Issues (Updated)

Published 11 May 2017 by Ipstenu (Mika Epstein) in Make WordPress Plugins.

Update from Otto: “The brunt of the plugin-delay problem has been solved now, so plugins should be nice and speedy again. There may be a few stragglers that got rescheduled for later due to the overload, they will clear themselves up in the next couple hours.”

Due to a network issue in the wee hours of May 11, updates to SVN are taking longer than normal to show up on the directory. The issue created a 2 hour backlog, which normally would work itself out pretty quickly. It’s currently being exacerbated by everyone who sees their code NOT showing up right away and pushing it again.

As Otto says:

In short, when it’s being slow, then it’s being slow and nothing you can do will speed it up, so just relax, go outside, walk around, and wait for it.

Do we want to make the process faster? Of course. And if you’re interested in helping that, please check out Meta ticket #1578 to understand how it was all written for the new directory.

To remind you:


Permission denied for files in www-data

Published 11 May 2017 by petergus in Newest questions tagged mediawiki - Ask Ubuntu.

I have image files being uploaded with mediawiki, and they are setting the owner as www-data. Viewing the files results in 403 forbidden. (all other site files owned by SITE_USER).

The SITE_USER and www-data are both in each others (secondary) groups.

What am I missing here?

EDIT: My Apache directives

DocumentRoot "/home/SITE_USER/public_html/"
# Alias for Wiki so images work
Alias /images "/home/SITE_USER/public_html/mediawiki/sites/images"    
<Directory "/home/SITE_USER/public_html/">
# Enable the rewrite engine
RewriteEngine On
# Short url for wiki pages
RewriteRule ^/?wiki(/.*)?$ %{DOCUMENT_ROOT}/index.php [L]
# Redirect / to Main Page
RewriteRule ^/*$ %{DOCUMENT_ROOT}/index.php [L]
# Rewrite everything else to index.php
RewriteRule ^(.*)$ %{DOCUMENT_ROOT}//index.php [L]
Options -Indexes +SymLinksIfOwnerMatch
allow from all
AllowOverride All Options=ExecCGI,Includes,IncludesNOEXEC,Indexes,MultiViews,SymLinksIfOwnerMatch
Require all granted
</Directory>
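For what it's worth, a likely fix, assuming the 403 comes from the SITE_USER group lacking read access on the www-data-owned files (and traverse access on the directories), and that the upload path matches the Alias above:

```shell
# Make uploaded files group-readable and directories group-traversable.
# Capital X adds execute only to directories (and already-executable files).
sudo chmod -R g+rX /home/SITE_USER/public_html/mediawiki/sites/images
# Make new uploads inherit the directory's group automatically.
sudo chmod g+s /home/SITE_USER/public_html/mediawiki/sites/images
# Group membership changes only apply to new sessions, so restart Apache.
sudo systemctl restart apache2
```

If the 403 persists after this, the next thing to check is whether the Alias target is covered by a Directory block that grants access.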

Maintenance report of April 28th 2017

Published 11 May 2017 by Pierrick Le Gall in The Blog.

Clients have already received this message. Many users told us they were happy to receive such details about our technical operations, so let’s make it more “public” with a blog post!

A. The short version

On April 27th 2017, we replaced one of our main servers. The replacement itself was successful. No downtime. The read-only mode lasted only 7 minutes, from 6:00 to 6:07 UTC.

While sending the notification email to our clients, we encountered difficulties with Gmail users. Solving this Gmail issue made the website unavailable for a few users, for maybe an hour. Everything was back to normal in a few hours. Of course, no data was lost during this operation.

The new server and Piwigo are now good friends. They both look forward to receiving version 2.9 in the next days 😉

B. Additional technical details

The notification message had already been sent to the first 390 users when we realized emails sent to Gmail addresses were returned in error. Indeed, Gmail now asks for a “reverse DNS IPv6”. Sorry for this very technical detail. We already had it on the old server, so we added it on the new server. And then the problems started… Unfortunately, the new server does not manage IPv6 the same way. A few users, on IPv6, told us they only saw an “Apache2 Debian Default Page” instead of their Piwigo. Here is the timeline:

Unfortunately adding or removing an IPv6 address is not an immediate action. It relies on “DNS propagation”, which may take a few hours, depending on each user.

We took the rest of the day to figure out how to make Gmail accept our emails and let web visitors see your Piwigo. Instead of “”, we now use a sub-domain of “” (Pigolabs is the company running the service) with an IPv6: no impact on web traffic.

We also have a technical solution to handle IPv6 for web traffic. We have decided not to use it because IPv6 lacks an important feature, the FailOver. This feature, only available on IPv4, lets us redirect web traffic from one server to another in a few seconds without worrying about DNS propagation. We use it when a server fails and web traffic goes to a spare server.

In the end, the move did not go so well and we sweated quite a bit this Friday, but everything came back to normal and the “Apache2 Debian Default Page” issue eventually affected only a few people!

DigitalOcean Monitoring: Insight Into Key Design Decisions

Published 10 May 2017 by DigitalOcean in DigitalOcean: Cloud computing designed for developers.

DigitalOcean Monitoring: Insight Into Key Design Decisions

We designed DigitalOcean Monitoring and its service alerts to provide insight into overall Droplet performance. In this post, we’ll cover key design decisions on Droplet-level Monitoring so you can better understand the choices we have made and how you can best use this service.

CPU Measurement: Use a consistent scale regardless of the number of CPUs

When a server has more than one CPU, there are two main ways to display CPU utilization in a single metric. One option is to have each CPU counted as 100% value, so that a two CPU server has a maximum capacity of 200%, while an eight CPU server has a maximum capacity of 800%. The other option is to display the total capacity as 100%, which is what you'll find with DigitalOcean Monitoring.

We used the 0-100 scale on DigitalOcean because it provides a consistent way to think about capacity. For example, when setting up Monitoring, you'll choose 70% when you want to know that 70% of the server's total CPU capacity is being used. This is regardless of the number of processors, so you'll see usage displayed on the same scale.
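The two conventions differ only by a division by the CPU count. A minimal sketch of the normalization (the function name is mine, not part of any DigitalOcean API):

```python
def total_cpu_percent(per_cpu):
    """Collapse per-CPU utilization figures onto a single 0-100 scale.

    Under the "each CPU counts as 100%" convention, an 8-CPU server maxes
    out at 800%; averaging over the CPU count recovers the 0-100 scale.
    """
    return sum(per_cpu) / len(per_cpu)
```

So a two-CPU server with one core pegged at 100% and the other idle reports 50%, the same reading you would get from a single CPU running at half capacity.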

Notification Thresholds: Prefer certainty over early detection

Just as there are multiple ways to display CPU utilization, there are multiple ways to approach notification thresholds. At one end of the spectrum, administrators may wish to be notified at the very first sign of an issue. This allows for intervention at the earliest possible moment and can therefore reduce the impact of a problem. Erring on this side, however, can lead to a "server that cried wolf" situation where most notifications may not actually indicate an issue. When false alarms regularly get mixed into reports of real problems, time and attention are spent on non-issues. If this happens often enough, notifications of real emergencies may be ignored.

At the other end of the spectrum, administrators may wish to receive notifications only when there is solid indication of a real issue. Sometimes, a temporary situation may resolve itself prior to the administrator even receiving a notification. This can increase trust that the notification requires action, but it also means that users may experience disruption before the situation is brought to an administrator's attention.

We decided to address this question by creating alerts when a server is experiencing a sustained problem. To accomplish this, data is measured each minute and an average of the data points is used. For example, if a service alert is set to send email when the CPU usage is above 90% during a 5-minute interval, the average of those data points must exceed 90% before the notification is triggered. As each minute passes, the oldest datapoint of the interval is dropped, the newest data point is added, and the average is recalculated.
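That sliding-window scheme can be sketched in a few lines. This is an illustration of the averaging logic described above, not DigitalOcean's actual implementation; the threshold and window size are example values:

```python
from collections import deque

class SustainedAlert:
    """Fire only when the rolling average of per-minute samples exceeds a threshold."""

    def __init__(self, threshold=90.0, window=5):
        self.threshold = threshold
        self.samples = deque(maxlen=window)  # oldest sample drops off automatically

    def record(self, cpu_percent):
        """Add the newest sample; return True if the full-window average breaches."""
        self.samples.append(cpu_percent)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough data for a complete window yet
        return sum(self.samples) / len(self.samples) > self.threshold
```

A single one-minute spike to 100% inside an otherwise quiet window will not fire, which is exactly the "certainty over early detection" trade-off.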

Notification Frequency: Prefer signal over noise

With service alerts, it is important to balance information sharing with meaningful notifications. For notifications to be useful and actionable, it is important that they do not become too prolific.

We set up DigitalOcean service alerts to send a single notification when a threshold is reached. No additional notifications are sent until the situation is resolved. For example, when the average over 5 minutes drops below 90%, a notification will be sent that the situation is resolved. This, too, is intended to avoid notification fatigue and ensure notifications are more significant.
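This edge-triggered behaviour (one message when the threshold is crossed, one when it clears) amounts to a tiny state machine; again, an illustrative sketch rather than the production code:

```python
def notifications(breaches):
    """Yield one message per state change: 'alert' on breach, 'resolved' on clear.

    `breaches` is an iterable of booleans, one per evaluation of the rolling
    average; no message is produced while the state is unchanged.
    """
    alerting = False
    for breached in breaches:
        if breached and not alerting:
            alerting = True
            yield "alert"
        elif not breached and alerting:
            alerting = False
            yield "resolved"
```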

To learn more about DigitalOcean alerts and notifications, you can get a detailed overview in An Introduction to Monitoring. To create your first alerts, see How to Set Up Service Alerts with DigitalOcean Monitoring. You might also like to explore one of our many tutorials for installing and configuring your own monitoring services.

We always welcome feedback! If there are other design decisions we have made that you would like to hear more about, let us know in the comments or open a request on our UserVoice.

Ficra and Gibson Park Freospaces gone

Published 8 May 2017 by Sam Wilson in Sam's notebook.

The Freospace blogs for Ficra and Gibson Park have gone offline. The former I guess because they’ve merged with the Fremantle Society, and maybe the latter is just not active at all? Would have been nice to at least put a notice up on their sites explaining what’s going on.

Anyway, I’ve removed their feeds from Planet Freo.

Server Load Woes

Published 8 May 2017 by Ipstenu (Mika Epstein) in Make WordPress Plugins.

If you got an error like this when using SVN today, it’s not just you:

Committing transaction...:   Error: Commit failed (details follow):  Error: Error running context: The server unexpectedly closed the connection.  Completed!:


Committing transaction...:   Error: Commit failed (details follow):  Error: Failed to start '/home/svn/repos/wp-plugins/hooks/pre-commit' hook  

There was a high server load on Monday May 8th, which caused the system to return those errors. If it happens to you, just go make a coffee or bake a pie and come back in a bit.

#downtime, #server-load

At the J Shed

Published 7 May 2017 by Dave Robertson in Dave Robertson.

We can’t wait to play here again soon… in June… stay tuned! Photo by Alex Chapman


composer - Semantic MediaWiki requires onoi/callback-container, but it can't be installed

Published 5 May 2017 by Сергей Румянцев in Newest questions tagged mediawiki - Server Fault.

I am trying to install the latest release of Semantic MediaWiki. When I run composer update, it returns the following:

> ComposerHookHandler::onPreUpdate
Loading composer repositories with package information
Updating dependencies (including require-dev)
Your requirements could not be resolved to an installable set of packages.

  Problem 1
    - mediawiki/semantic-media-wiki 2.4.x-dev requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.6 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.5 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.4 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.3 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.2 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.1 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - Installation request for mediawiki/semantic-media-wiki ~2.4.1 -> satisfiable by mediawiki/semantic-media-wiki[2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.x-dev].

I have even set minimum-stability to dev and prefer-stable to false. Nothing resolves the conflict.

This is not the first problem with Composer: it previously returned an error because no version was set in the mediawiki/core package, which this SMW release still requires. But that error did not occur this time, surprisingly.

Composer also doesn't see the package in composer show onoi/callback-container, even though a stable version 2.0 exists.

17.10 Wallpaper Contest! Call for artists

Published 3 May 2017 by Aaron Honeycutt in Kubuntu.

For the Artful cycle, we’re trying something new for Kubuntu: a wallpaper contest!

Any user can enter their own piece of artwork; you do not have to be a K/ubuntu member. Kubuntu members will be voting on the wallpaper entries. The top ten wallpapers will be on the Artful ISO. The Kubuntu Council will deal with any ties.

Upload your original work:

Follow the Ubuntu Free Culture Showcase examples:

Submissions should have a human-language title and the description should give the author’s name if not entered as a display name on Flickr.

License your entry using the Creative Commons Attribution-ShareAlike or Creative Commons Attribution license. As an exception, we will consider images licensed as “Public Domain” that are submitted to this contest as being under the Creative Commons Zero waiver.

Only submit your own work; no more than two entries per person.

All entries must follow the Ubuntu Code of Conduct.

Submission Deadline: June 8, 2017
Winners announced: June 22, 2017

After upgrade to 14.04 I get "You don't have permission to access /wiki/ on this server."

Published 3 May 2017 by Finn Årup Nielsen in Newest questions tagged mediawiki - Ask Ubuntu.

After a dist-upgrade to 14.04 I get "You don't have permission to access /wiki/ on this server." for a MediaWiki installation with an alias. /w/index.php is also failing.

So far I have seen a difference in configuration between 12.04 and 14.04, so I did:

cd /etc/apache2/sites-enabled
sudo ln -s ../sites-available/000-default.conf .

This fixed other problems, but not the MediaWiki problem.
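A likely culprit for exactly this 403 after the upgrade: 14.04 ships Apache 2.4, which replaced the 2.2 access-control directives (Order allow,deny / Allow from all) with Require; directory blocks still using only the old syntax are denied by default. A possible fix, assuming the wiki files live under /var/www/mediawiki (adjust the path to match the actual installation), is to update the relevant Directory block:

```
<Directory /var/www/mediawiki>
    # Apache 2.4 replacement for "Order allow,deny" + "Allow from all"
    Require all granted
</Directory>
```

followed by reloading Apache (sudo service apache2 reload).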

KDE PIM update now available for Zesty Zapus 17.04

Published 30 Apr 2017 by rikmills in Kubuntu.

As explained in our call for testing post, we missed by a whisker getting updated PIM 16.12.3 (kontact, kmail, akregator, kgpg, etc.) into Zesty for release day, and we believe it is important that our users have access to this significant update.

Therefore packages for PIM 16.12.3 release are now available in the Kubuntu backports PPAs.

While we believe these packages should be relatively issue-free, please bear in mind that they have not been tested as comprehensively as those in the main ubuntu archive.

Should any issues occur,  please provide feedback on our mailing list [1], IRC [2], file a bug against our PPA packages [3], or optionally via social media.

Reading about how to use ppa-purge is also advisable.

How to install KDE PIM 16.12.3 packages for Zesty:

To better serve the varied needs of our users, these updates have been provided in 2 separate PPAs.

Which of the two you add to your system depends on your preference for receiving further updates of other backported releases of KDE software.

Use case 1 – A user who wants to update to PIM 16.12.3, who is happy to add the main backports PPA, and is happy to receive further updates/backports of other applications (plasma, KDE applications, frameworks, digikam, krita etc..) through this backports PPA.

In a console run:

sudo add-apt-repository ppa:kubuntu-ppa/backports

Use case 2 – A user who generally wants to use the default Zesty archive packages, but is missing the update to PIM 16.12.3 and would like to add just this.

In a console run:

sudo add-apt-repository ppa:kubuntu-ppa/backports-pim

In both cases, users should run after adding the PPA:

sudo apt-get update
sudo apt-get dist-upgrade

to complete the upgrade.

Notes: It is expected that on most systems the upgrade will ask to remove a few old libraries and obsolete packages. This is normal and required to make way for new versions, and reflects cases where old packages have been split into several new ones in the new KDE PIM release.

Other upgrade methods (Discover/Muon etc) could be used after adding the PPA, but to ensure packages are upgraded and replaced as intended, upgrading via the command line with the above command is the preferred option.

We hope you enjoy the update.

Kubuntu Team.

1. Kubuntu-devel mailing list:
2. Kubuntu IRC channels: #kubuntu & #kubuntu-devel on
3. Kubuntu ppa bugs:

How can we preserve Google Documents?

Published 28 Apr 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Last month I asked (and tried to answer) the question How can we preserve our wiki pages?

This month I am investigating the slightly more challenging issue of how to preserve native Google Drive files, specifically documents*.


At the University of York we work a lot with Google Drive. We have the G Suite for Education (formerly known as Google Apps for Education) and as part of this we have embraced Google Drive; it is now widely used across the University. For many (me included) it has become the tool of choice for creating documents, spreadsheets and presentations. The ability to share documents and collaborate directly is key.

So of course it is inevitable that at some point we will need to think about how to preserve them.

How hard can it be?

Quite hard actually.

The basic problem is that documents created in Google Drive are not really "files" at all.

The majority of the techniques and models that we use in digital preservation are based around the fact that you have a digital object that you can see in your file system, copy from place to place and package up into an Archival Information Package (AIP).

In the digital preservation community we're all pretty comfortable with that way of working.

The key challenge with stuff created in Google Drive is that it doesn't really exist as a file.

Always living in hope that someone has already solved the problem, I asked the question on Twitter and that really helped with my research.

Isn't the digital preservation community great?

Exporting Documents from Google Drive

I started off testing the different download options available within Google docs. For my tests I used two native Google documents. One was the working version of our Phase 1 Filling the Digital Preservation Gap report. This report was originally authored as a Google doc, was 56 pages long and consisted of text, tables, images, footnotes, links, formatted text, page numbers, colours etc (i.e. lots of significant properties I could assess). I also used another simpler document for testing - this one was just basic text and tables but also included comments by several contributors.

I exported both of these documents into all of the different export formats that Google supports and assessed the results, looking at each characteristic of the document in turn and establishing whether or not I felt it was adequately retained.

Here is a summary of my findings, looking specifically at the Filling the Digital Preservation Gap phase 1 report document:

...but what about the comments?

My second test document was chosen so I could look specifically at the comments feature and how these were retained (or not) in the exported version.

  • docx - Comments are exported. On first inspection they appear to be anonymised, however this seems to be just how they are rendered in Microsoft Word. Having unzipped and dug into the actual docx file and looked at the XML file that holds the comments, it is clear that a more detailed level of information is retained - see images below. The placement of the comments is not always accurate. In one instance the reply to a comment is assigned to text within a subsequent row of the table rather than to the same row as the original comment.
  • odt -  Comments are included, are attributed to individuals and have a date and time. Again, matching up of comments with the right section of text is not always accurate - in one instance a comment and its reply are linked to the table cell underneath the one that they referenced in the original document.
  • rtf - Comments are included but appear to be anonymised when displayed in MS Word...I haven't dug around enough to establish whether or not this is just a rendering issue.
  • txt - Comments are retained but appear at the end of the document with a [a], [b] etc prefix - these letters appear in the main body text to show where the comments appeared. No information about who made the comment is preserved.
  • pdf - Comments not exported
  • epub - Comments not exported
  • html - Comments are present but appear at the end of the document with a code which also acts as a placeholder in the text where the comment appeared. References to the comments in the text are hyperlinks which take you to the right comment at the bottom of the document. There is no indication of who made the comment (not even hidden within the html tags).

A comment in original Google doc

The same comment in docx as rendered by MS Word

...but in the XML buried deep within the docx file structure - we do have attribution and date/time
(though clearly in a different time zone)

What about bulk export options?

Ed Pinsent pointed me to the Google Takeout Service which allows you to:
"Create an archive with your data from Google products"
[Google's words not mine - and perhaps this is a good time to point you to Ed's blog post on the meaning of the term 'Archive']

This is really useful. It allows you to download Google Drive files in bulk and to select which formats you want to export them as.

I tested this a couple of times and was surprised to discover that if you select pdf or docx (and perhaps other formats that I didn't test) as your export format of choice, the takeout service creates the file in the format requested and an html file which includes all comments within the document (even those that have been resolved). The content of the comments/responses including dates and times is all included within the html file, as are names of individuals.

The downside of the Google Takeout Service is that it only allows you to select folders and not individual files. There is another incentive for us to organise our files better! The other issue is that it will only export documents that you are the owner of - and you may not own everything that you want to archive!

What's missing?

Quite a lot actually.

The owner, creation and last modified dates of a document in Google Drive are visible when you click on Document details... within the File menu. Obviously this is really useful information for the archive but is lost as soon as you download it into one of the available export formats.

Creation and last modified dates as visible in Document details

Update: I was pleased to see that if using the Google Takeout Service to bulk export files from Drive, the last modified dates are retained, however on single file export/download these dates are lost and the last modified date of the file becomes the date that you carried out the export. 

Part of the revision history of my Google doc

But of course in a Google document there is more metadata. Similar to the 'Page History' that I mentioned when talking about preserving wiki pages, a Google document has a 'Revision history'.

Again, this *could* be useful to the archive. Perhaps not so much so for my document which I worked on by myself in March, but I could see more of a use case for mapping and recording the creative process of writing a novel for example. 

Having this revision history would also allow you to do some pretty cool stuff such as that described in this blog post: How I reverse engineered Google Docs to play back any document’s keystrokes (thanks to Nick Krabbenhoft for the link).

It would seem that the only obvious way to retain this information would be to keep the documents in their original native Google format within Google Drive but how much confidence do we have that it will be safe there for the long term?


If you want to preserve a Google Drive document there are several options but no one-size-fits-all solution.

As always it boils down to what the significant properties of the document are. What is it we are actually trying to preserve?

  • If we want a fairly accurate but non interactive digital 'print' of the document, pdf might be the most accurate representation though even the pdf export can't be relied on to retain the exact pagination. Note that I didn't try and validate the pdf files that I exported and sadly there is no pdf/a export option.
  • If comments are seen to be a key feature of the document then docx or odt will be a good option but again this is not perfect. With the test document I used, comments were not always linked to the correct point within the document.
  • If it is possible to get the owner of the files to export them, the Google Takeout Service could be used. Perhaps creating a pdf version of the static document along with a separate html file to capture the comments.

A key point to note is that all export options are imperfect so it would be important to check the exported document against the original to ensure it accurately retains the important features.

Another option would be simply keeping them in their native format but trying to get some level of control over them - taking ownership and managing sharing and edit permissions so that they can't be changed. I've been speaking to one of our Google Drive experts in IT about the logistics of this. A Google Team Drive belonging to the Archives could be used to temporarily store and lock down Google documents of archival value whilst we wait and see what happens next. 

...I live in hope that export options will improve in the future.

This is a work in progress and I'd love to find out what others think.

* note, I've also been looking at Google Sheets and that may be the subject of another blog post

Legal considerations regarding hosting a MediaWiki site

Published 27 Apr 2017 by Oliver K in Newest questions tagged mediawiki - Webmasters Stack Exchange.

What legal considerations are there when creating a wiki using MediaWiki for people to use worldwide?

For example, I noticed there are privacy policies & terms and conditions; are these required to safeguard me from any legal battles?

Lenovo ThinkPad Carbon X1 (gen. 5)

Published 22 Apr 2017 by Sam Wilson in Sam's notebook.

Five years, two months, and 22 days after the last time, I’m retiring my laptop and moving to a new one. This time it’s a Lenovo ThinkPad Carbon X1, fifth generation (manufactured in March this year, if the packaging is to be believed). This time, I’m not switching operating systems (although I am switching desktops, to KDE, because I hear Ubuntu is going all-out normal Gnome sometime soon).

So I kicked off the download of kubuntu-16.04.2-desktop-amd64.iso and while it was going started up the new machine. I jumped straight into the BIOS to set the boot order (putting ‘Windows boot manager’ right at the bottom because it sounds like something predictably annoying), and hit ‘save’. Then I forgot what I was doing and wandered back to my other machine, leaving the new laptop to reboot and send itself into the Windows installation process. Oops.

There’s no way out! You select the language you want to use, and then are presented with the EULA—with a big ‘accept’ button, but no way to decline the bloody thing, and no way to restart the computer! Even worse, a long-press on the power button just suspended the machine, rather than force-booting it. In the end some combination of pressing on the power button while waking from suspend tricked it into dying. Then it was a simple matter of booting from a thumb drive and getting Kubuntu installed.

I got slightly confused at two points: at having to turn off UEFI (which I think is the ‘Windows boot manager’ from above?) in order to install 3rd party proprietary drivers (usually Lenovo are good at providing Linux drivers, but more on that later); and having to use LVM in order to have full-disk encryption (because I had thought that it was usually possible to encrypt without LVM, but really I don’t mind either way; there doesn’t seem to be any disadvantage to using LVM; I then of course elected to not encrypt my home directory).

So now I’m slowly getting KDE set up how I like it, and am running into various problems with the trackpoint, touchpad, and Kmail crashing. I’ll try to document the more interesting bits here, or add to the KDE UserBase wiki.

mosh, the disconnection-resistant ssh

Published 22 Apr 2017 by Carlos Fenollosa in Carlos Fenollosa — Blog.

The second post on this blog was devoted to screen and how to use it to make persistent SSH sessions.

Recently I've started using mosh, the mobile shell. It's targeted to mobile users, for example laptop users who might get short disconnections while working on a train, and it also provides a small keystroke buffer to get rid of network lag.

It really has few drawbacks, and if you ever ssh to remote hosts and get annoyed because your vim sessions or tail -F windows get disconnected, give mosh a try. I strongly recommend it.

Tags: software, unix

Comments? Tweet  

KDE PIM update for Zesty available for testers

Published 20 Apr 2017 by rikmills in Kubuntu.

Since we missed by a whisker getting updated PIM (kontact, kmail, akregator, kgpg etc..) into Zesty for release day, and we believe it is important that our users have access to this significant update, packages are now available for testers in the Kubuntu backports landing ppa.

While we believe these packages should be relatively issue-free, please bear in mind that they have not been tested as comprehensively as those in the main ubuntu archive.

Testers should be prepared to troubleshoot and hopefully report issues that may occur. Please provide feedback on our mailing list [1], IRC [2], or optionally via social media.

After a period of testing and verification, we hope to move this update to the main backports ppa.

You should have some command line knowledge before testing.
Reading about how to use ppa-purge is also advisable.

How to test KDE PIM 16.12.3 for Zesty:

Testing packages are currently in the Kubuntu Backports Landing PPA.

sudo add-apt-repository ppa:kubuntu-ppa/backports-landing
sudo apt-get update
sudo apt-get dist-upgrade

1. Kubuntu-devel mailing list:
2. Kubuntu IRC channels: #kubuntu & #kubuntu-devel on


Published 20 Apr 2017 by fabpot in Tags from Twig.

In conversation with the J.S. Battye Creative Fellows

Published 19 Apr 2017 by carinamm in State Library of Western Australia Blog.

How can contemporary art lead to new discoveries about collections and ways of engaging with history?  Nicola Kaye and Stephen Terry will discuss this idea drawing from the experience of creating Tableau Vivant and the Unobserved.

In conversation with the J.S. Battye Creative Fellows
Thursday 27 April, 6pm
State Library Theatre.

April 4 Tableau Vivant Image_darkened_2

Tableau Vivant and the Unobserved is the culmination of the State Library’s inaugural J.S. Battye Creative Fellowship.  The Creative Fellowship aims to enhance engagement with the Library’s heritage collections and provide new experiences for the public.

Tableau Vivant and the Unobserved visually questions how history is made, commemorated and forgotten. Through digital art installation, Nicola Kaye and Stephen Terry expose the unobserved and manipulate our perception of the past. Their work juxtaposes archival and contemporary imagery to create an interactive experience for the visitor, where unobserved lives from the archive collide with the contemporary world. The installation is showing at the State Library until 12 May 2017.

For more information visit:

Filed under: community events, Exhibitions, Pictorial, SLWA collections, SLWA displays, SLWA Exhibitions, SLWA news, State Library of Western Australia, talks, Western Australia Tagged: contemporary art, discussion, installation, J.S. Battye Creative Fellowship, Nicola Kaye, Stephen Terry, talk


Merge pull request #510 from mblaney/master

Published 17 Apr 2017 by mblaney in Tags from simplepie.

Version bump to 1.5 due to changes to Category class.

Interview on Stepping Off: Rewilding and Belonging in the South West

Published 14 Apr 2017 by Tom Wilson in tom m wilson.

You can listen to a recent radio interview I did about my new book with Adrian Glamorgan here.

Update on the April 11th SFO2 Power Outage

Published 12 Apr 2017 by DigitalOcean in DigitalOcean: Cloud computing designed for developers.

On April 11th at 06:43 UTC, DigitalOcean's SFO2 region experienced an outage of compute and networking services. The catalyst of this incident was the failure of multiple redundant power distribution units (PDUs) within the datacenter. Complications during the recovery effort prolonged the incident and caused intermittent failures of our control panel and API. We'd like to apologize, share more details about exactly what happened, and talk about how we are working to make sure it doesn't happen again.

The Incident

The initial power loss affected SFO2 including the core networking infrastructure for the region. As power and connectivity were restored, our event processing system was placed under heavy load from the backlog of in-progress events. The database backing this system was unable to support the load of the SFO2 datacenter recovery in addition to our normal operational load from other datacenters. This temporarily disabled our control panel and API. We then proceeded with recovery on multiple fronts.

Timeline of Events

06:15 UTC - A datacenter-level PDU in the building housing our SFO2 region suffered a critical failure. Hardware automatically began drawing power from a secondary PDU.

06:40 UTC - The secondary PDU also suffered a failure.

06:43 UTC - Multiple alerts indicated that SFO2 was unreachable and initial investigations were undertaken by our operations and network engineering teams.

07:00 UTC - After finding that all circuits in the region were down, we opened a ticket with the facility operator.

07:49 UTC - A DigitalOcean datacenter engineer arrived and confirmed the power outage.

08:27 UTC - The facility operations staff arrived and began restoring power to the affected racks.

09:04 UTC - Recovery commenced and both management servers and hypervisors containing customer Droplets began to come back online.

09:49 UTC - After an initial "inception problem" where portions of our compute infrastructure which were self-hosted couldn't bootstrap themselves, services began to recover.

09:53 UTC - Customer reports and alerts indicated that our control panel and API had become inaccessible. Our event processing system became overloaded attempting to process the backlog of pending events while also supporting the normal operational load of our other regions. Work commenced to slow-roll activation of services.

16:32 UTC - All services activated in SFO2 and event processing re-enabled; customers able to start deploying new Droplets. Existing Droplets not yet restarted. Work began to restart Droplets in a controlled way.

19:43 UTC - 50% of all Droplets restored.

20:15 UTC - All Droplets and services fully restored.

Future Measures

There were a number of major issues that contributed to the cause and duration of this outage, and we are committed to providing you with the stable and reliable platform you require to launch, scale, and manage your applications.

During this incident, we were faced with conditions from our provider that were outside of our control. We're working to implement stronger safeguards and validation of our power management system to ensure this power failure does not reoccur.

In addition, we're conducting a review of our datacenter recovery procedures to ensure that we can move more quickly in the event that we do lose power to an entire facility.

Finally, we will be adding additional capacity to our event processing system to ensure it is able to sustain significant peaks in load, such as the one that occurred here.
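The "slow-roll" activation mentioned in the timeline amounts to draining a backlog at a bounded rate so the recovery load cannot overwhelm the backing database. The following is a hypothetical sketch of that idea only; it is not DigitalOcean's actual system, and the function and parameter names are invented:

```python
# Hypothetical sketch: drain a backlog of pending events at a bounded
# rate so recovery load cannot overwhelm the backing database.
import time
from collections import deque

def drain_backlog(events, handle, max_per_second=10,
                  clock=time.monotonic, sleep=time.sleep):
    """Process queued events, pausing so no more than max_per_second
    are handled in any one-second window."""
    queue = deque(events)
    processed = 0
    window_start = clock()
    while queue:
        if processed and processed % max_per_second == 0:
            elapsed = clock() - window_start
            if elapsed < 1.0:
                sleep(1.0 - elapsed)  # let the database catch up
            window_start = clock()
        handle(queue.popleft())
        processed += 1
    return processed
```

Injecting `clock` and `sleep` keeps the sketch testable; a real system would also need retries and a way to prioritise newly arriving events over the backlog.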

In Conclusion

We wanted to share the specific details around this incident as quickly and accurately as possible to give you insight into what happened and how we handled it. We recognize this may have had a direct impact on your business and for that we are deeply sorry. We will be issuing SLA credits to affected users, which will be reflected on their May 1st invoice, and we will continue to explore better ways of mitigating future customer impacting events. The entire team at DigitalOcean thanks you for your understanding and patience.

Wikimania submission: apt install mediawiki

Published 9 Apr 2017 by legoktm in The Lego Mirror.

I've submitted a talk to Wikimania titled apt install mediawiki. It's about getting the MediaWiki package back into Debian, and efforts to improve the overall process. If you're interested, sign up on the submissions page :)

Archivematica Camp York: Some thoughts from the lake

Published 7 Apr 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Well, that was a busy week!

Yesterday was the last day of Archivematica Camp York - an event organised by Artefactual Systems and hosted here at the University of York. The camp's intention was to provide a space for anyone interested in or currently using Archivematica to come together, learn about the platform from other users, and share their experiences. I think it succeeded in this, bringing together 30+ 'campers' from across the UK, Europe and as far afield as Brazil for three days of sessions covering different aspects of Archivematica.

Our pod on the lake (definitely a lake - not a pond!)
My main goal at camp was to ensure everyone found their way to the rooms (including the lakeside pod) and that we were suitably fuelled with coffee, popcorn and cake. Alongside these vital tasks I also managed to partake in the sessions, have a play with the new version of Archivematica (1.6) and learn a lot in the process.

I can't possibly capture everything in this brief blog post so if you want to know more, have a look back at all the #AMCampYork tweets.

What I've focused on below are some of the recurring themes that came up over the three days.


Archivematica is just one part of a bigger picture for institutions that are carrying out digital preservation, so it is always very helpful to see how others are implementing it and what systems they will be integrating with. A session on workflows in which participants were invited to talk about their own implementations was really interesting. 

Other sessions also helped highlight the variety of different configurations and workflows that are possible using Archivematica. I hadn't quite realised there were so many different ways you could carry out a transfer!

In a session on specialised workflows, Sara Allain talked us through the different options. One workflow I hadn't been aware of before was the ability to include checksums as part of your transfer. This sounds like something I need to take advantage of when I get Archivematica into production for the Borthwick. 

Justin talking about Automation Tools
A session on Automation Tools with Justin Simpson highlighted other possibilities - using Archivematica in a more automated fashion. 

We already have some experience of using Automation Tools at York as part of the work we carried out during phase 3 of Filling the Digital Preservation Gap, however I was struck by how many different ways these can be applied. Hearing examples from other institutions and for a variety of different use cases was really helpful.


The camp included a chance to play with Archivematica version 1.6 (which was only released a couple of weeks ago) as well as an introduction to the new Appraisal and Arrangement tab.

A session in progress at Archivematica Camp York
I'd been following this project with interest so it was great to be able to finally test out the new features (including the rather pleasing pie charts showing what file formats you have in your transfer). It was clear that there were a few improvements that could be made to the tab to make it more intuitive to use and to deal with things such as the ability to edit or delete tags, but it is certainly an interesting feature and one that I would like to explore more using some real data from our digital archive.

Throughout camp there was a fair bit of discussion around digital appraisal and at what point in your workflow this would be carried out. This was of particular interest to me being a topic I had recently raised with colleagues back at base.

The Bentley Historical Library who funded the work to create the new tab within Archivematica are clearly keen to get their digital archives into Archivematica as soon as possible and then carry out the work there after transfer. The addition of this new tab now makes this workflow possible.

Kirsty Lee from the University of Edinburgh described her own pre-ingest methodology and the tools she uses to help her appraise material before transfer to Archivematica. She talked about some tools (such as TreeSize Pro) that I'm really keen to follow up on.

At the moment I'm undecided about exactly where and how this appraisal work will be carried out at York, and in particular how this will work for hybrid collections, so as always it is interesting to hear from others about what works for them.

Metadata and reporting

Evelyn admitting she loves PREMIS and METS
Evelyn McLellan from Artefactual led a 'Metadata Deep Dive' on day 2 and despite the title, this was actually a pretty interesting session!

We got into the details of METS and PREMIS and how they are implemented within Archivematica. Although I generally try not to look too closely at METS and PREMIS, it was good to have them demystified. On the first day, through a series of exercises, we had been encouraged to look at a METS file created by Archivematica and to pick out some information from it ourselves, so these sessions in combination were really useful.

Across various sessions of the camp there was also a running discussion around reporting. Given that Archivematica stores such a detailed range of metadata in the METS file, how do we actually make use of this? Being able to report on how many AIPs have been created, how many files they contain and what size they are is useful. These are statistics that I currently collect (manually) on a quarterly basis and share with colleagues. Once Archivematica is in place at York, digging further into those rich METS files to find out which file formats are in the digital archive would be really helpful for preservation planning (among other things). There was discussion about whether reporting should be a feature of Archivematica or a job that should be done outside Archivematica.
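As an illustration of the "outside Archivematica" approach, a small script can pull format names straight out of a METS file. This is a minimal sketch assuming PREMIS v2 formatName elements embedded in the METS (element paths and namespace versions vary between Archivematica releases), using a tiny hand-made example rather than real Archivematica output:

```python
# Count file formats recorded in an Archivematica-style METS file.
# Illustrative sketch: assumes PREMIS v2 formatName elements embedded
# in the METS; namespace URIs may differ between Archivematica versions.
from collections import Counter
import xml.etree.ElementTree as ET

PREMIS = "info:lc/xmlns/premis-v2"

def format_counts(mets_xml: str) -> Counter:
    root = ET.fromstring(mets_xml)
    names = root.findall(f".//{{{PREMIS}}}formatName")
    return Counter(el.text for el in names)

# A tiny hand-made example, not real Archivematica output:
sample = f"""<mets xmlns="http://www.loc.gov/METS/">
  <amdSec><techMD><mdWrap><xmlData>
    <object xmlns="{PREMIS}"><format><formatDesignation>
      <formatName>PDF/A</formatName>
    </formatDesignation></format></object>
  </xmlData></mdWrap></techMD></amdSec>
</mets>"""

print(format_counts(sample))  # Counter({'PDF/A': 1})
```

Running something like this over every stored AIP's METS file would give the format profile of the whole archive without touching Archivematica itself.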

In relation to the latter option - I described in one session how some of our phase 2 work on Filling the Digital Preservation Gap was designed to help expose metadata from Archivematica to a third-party reporting system. The Jisc Research Data Shared Service was also mentioned in this context, as reporting outside of Archivematica will need to be addressed as part of this project.


As with most open source software, community is important. This was touched on throughout the camp and was the focus of the last session on the last day.

There was a discussion about the role of Artefactual Systems and the role of Archivematica users. Obviously we are all encouraged to engage and help sustain the project in whatever way we are able. This could be by sharing successes and failures (I was pleased that my blog got a mention here!), submitting code and bug reports, sponsoring new features (perhaps something listed on the development roadmap) or helping others by responding to queries on the mailing list. It doesn't matter - just get involved!

I was also able to highlight the UK Archivematica group and talk about what we do and what we get out of it. As well as encouraging new members to the group, there was also discussion about the potential for forming other regional groups like this in other countries.

Some of the Archivematica community - class of Archivematica Camp York 2017

...and finally

Another real success for us at York was having the opportunity to get technical staff at York working with Artefactual to resolve some problems we had with getting our first Archivematica implementation into production. Real progress was made and I'm hoping we can finally start using Archivematica for real at the end of next month.

So, that was Archivematica Camp!

A big thanks to all who came to York and to Artefactual for organising the programme. As promised, the sun shone and there were ducks on the lake - what more could you ask for?

Thanks to Paul Shields for the photos

Failover in local accounts

Published 7 Apr 2017 by MUY Belgium in Newest questions tagged mediawiki - Server Fault.

I would like to use MediaWiki as documentation with access privileges. I use the LdapAuthentication extension in order to authenticate users against an LDAP directory.

For various reasons, authentication should continue working even if the LDAP server fails.

How can I set up a fail-over (for example, using the passwords in the local SQL database) so that the wiki remains accessible even if the infrastructure fails?

Shiny New History in China: Jianshui and Tuanshan

Published 6 Apr 2017 by Tom Wilson in tom m wilson.

  The stones in this bridge are not all in a perfect state of repair.  That’s part of its charm.  I’m just back from a couple of days down at Jianshui, a historic town a few hours south of Kunming with a large city wall and a towering city gate.  The trip has made me reflect on […]

Update on the April 5th, 2017 Outage

Published 4 Apr 2017 by DigitalOcean in DigitalOcean: Cloud computing designed for developers.

Today, DigitalOcean's control panel and API were unavailable for a period of four hours and fifty-six minutes. During this time, all running Droplets continued to function, but no additional Droplets or other resources could be created or managed. We know that you depend on our services, and an outage like this is unacceptable. We would like to apologize and take full responsibility for the situation. The trust you've placed in us is our most important asset, so we'd like to share all of the details about this event.

At 10:24 AM EDT on April 5th, 2017, we began to receive alerts that our public services were not functioning. Within three minutes of the initial alerts, we discovered that our primary database had been deleted. Four minutes later we commenced the recovery process, using one of our time-delayed database replicas. Over the next four hours, we copied and restored the data to our primary and secondary replicas. The duration of the outage was due to the time it took to copy the data between the replicas and restore it into an active server.

At 3:20 PM EDT the primary database was completely restored, and no data was lost.

Timeline of Events

Future Measures

The root cause of this incident was an engineer-driven configuration error. A process performing automated testing was misconfigured to use production credentials. As such, we will be drastically reducing access to the primary system for certain actions to ensure this does not happen again.

As noted above, the duration of the incident was primarily influenced by the speed of our network while reloading the data into our database. While this type of action should be a rare occurrence, we are in the process of upgrading our network connectivity between database servers and updating our hardware to improve the speed of recovery. We expect these improvements to be completed over the next few months.

In Conclusion

We wanted to share this information with you as soon as possible so that you can understand the nature of the outage and its impact. In the coming days, we will continue to assess further safeguards against developer error, work to improve our processes around data recovery, and explore ways to provide better real time information during future customer impacting events. We take the reliability of our service seriously and are committed to delivering a platform that you can depend on to run your mission-critical applications. The entire team at DigitalOcean thanks you for your understanding and, again, we apologize for the impact of this incident.

Introducing Monitoring: Insight into Your Infrastructure

Published 3 Apr 2017 by Ankur Jain in DigitalOcean: Cloud computing designed for developers.

Introducing Monitoring: Insight into Your Infrastructure

Over the lifecycle of your application, knowing when and why an issue in production occurs is critical. At DigitalOcean, we understand this and want to enable developers to make informed decisions about scaling their infrastructure. That's why we are excited to announce our new Monitoring service, available today for free with all Droplets. It gives you the tools to resolve issues quickly by alerting you when one occurs and giving you the information you need to understand it.

Monitoring the applications you've deployed should be as simple and intuitive as the rest of the DigitalOcean experience. Earlier this year, we released an open source agent and improved graphs that give you a better picture of the health of your Droplets. That was just the first piece of the puzzle. The agent offers greater visibility into your infrastructure, and now Monitoring will let you know when to act on that information.

Monitoring is natively integrated with the DigitalOcean platform and can be enabled at no extra cost by simply checking a box when creating your Droplets. It introduces new alerting capabilities using the metrics collected by the agent, allowing your team to receive email or Slack notifications based on the resource utilization and operational health of your Droplets.

View Graphs & Statistics

The Monitoring service exposes system metrics and provides an overview of your Droplets' health. The metrics are collected at one-minute intervals and the data is retained for a month, enabling you to view both up-to-the-minute and historical data. The improved Droplet graphs allow you to visualize how your instances are performing over time.

The following metrics are currently available:

Create Alert Policies

You can create alert policies on any of your metrics to receive notifications when the metric crosses your specified threshold. An alert policy monitors a single metric over a time period you specify. Alerts are triggered when the state is above or below your threshold for the specified time period. You can leverage DigitalOcean tags to group your Droplets based on your project or environment. Then you can apply the alert policy to specific Droplets or groups of tagged Droplets.
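The threshold semantics described above (a metric staying above or below a value for a sustained period) can be sketched in a few lines. This is just the logic as described, not DigitalOcean's implementation; the function and parameter names are invented:

```python
# Illustrative sketch of the alert-policy semantics described above:
# fire only when a metric stays beyond its threshold for the whole window.
def alert_triggered(samples, threshold, direction="above", window=3):
    """samples: most-recent-last list of one-minute metric readings."""
    if len(samples) < window:
        return False
    recent = samples[-window:]
    if direction == "above":
        return all(s > threshold for s in recent)
    return all(s < threshold for s in recent)

# CPU at 90%+ for three consecutive minutes trips the alert:
print(alert_triggered([50, 91, 95, 97], threshold=90))  # True
print(alert_triggered([91, 40, 97], threshold=90))      # False
```

Requiring the whole window to be past the threshold is what keeps a brief spike from paging the whole team.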

Alert policies can be created from the Monitoring tab in the DigitalOcean control panel:

Introducing Monitoring: Insight into Your Infrastructure

You can find more information about creating alert policies in this tutorial on the DigitalOcean Community site.

Configure Notifications

When you set up an alert policy, you will be able to choose between two notification methods:

Introducing Monitoring: Insight into Your Infrastructure

You'll receive notifications both when an alert threshold has been exceeded and when the issue has been resolved.

Getting Started

To enable Monitoring on your Droplets, you'll need to have the agent installed. On new Droplets, it's as simple as clicking the Monitoring checkbox during Droplet creation.

Introducing Monitoring: Insight into Your Infrastructure

On existing Droplets, you can install the agent by running:

curl -sSL | sh

Find more information on the agent itself in this tutorial on the DigitalOcean Community site.

Coming Soon

With the first iteration of our Monitoring service out the door, we're already working on what's next. Some features you will see soon include:

From alerting on issues to visualizing metrics, we want to provide you with the tools you need to monitor the health and performance of your applications in production. We'd love to hear your feedback. What metrics are important for your team? How can we help integrate Monitoring into your workflow? Let us know in the comments or submit a suggestion on our UserVoice page.

Tableau Vivant and the Unobserved

Published 30 Mar 2017 by carinamm in State Library of Western Australia Blog.

April 4 Tableau Vivant Image_darkened_2.jpg

Still scene: Tableau Vivant and the Unobserved, 2016, Nicola Kaye, Stephen Terry.

Tableau Vivant and the Unobserved visually questions how history is made, commemorated and forgotten. Through digital art installation, Nicola Kaye and Stephen Terry expose the unobserved and manipulate our perception of the past.  Their work juxtaposes archival and contemporary imagery to create an experience for the visitor where unobserved lives from the archive collide with the contemporary world.

Tableau Vivant and the Unobserved is the culmination of the State Library’s inaugural J.S. Battye Creative Fellowship.  The Creative Fellowship aims to enhance engagement with the Library’s heritage collections and provide new experiences for the public.

Artists floor talk
Thursday 6 April, 6pm
Ground Floor Gallery, State Library of Western Australia.

Nicola Kaye and Stephen Terry walk you through Tableau Vivant and the Unobserved

In conversation with the J.S. Battye Creative Fellows
Thursday 27 April, 6pm
State Library Theatre.

How can contemporary art lead to new discoveries about collections and ways of engaging with history?  Nicola Kaye and Stephen Terry will discuss this idea drawing from the experience of creating Tableau Vivant and the Unobserved.

Tableau Vivant and the Unobserved is showing at the State Library from 4 April – 12 May 2017.
For more information visit:

Filed under: community events, Exhibitions, SLWA collections, SLWA displays, SLWA events, SLWA Exhibitions, SLWA news, State Library of Western Australia, talks, WA history, Western Australia Tagged: exhibitions, installation art, J.S. Battye Creative Fellowship, Nicola Kaye, Perth, Perth Cultural Centre, State Library of Western Australia, Stephen Terry, Tableau Vivant and the Unobserved

Faster Files Forever

Published 29 Mar 2017 by Nicola Nye in FastMail Blog.

You know FastMail provides the best in email, but did you also know that your account includes file storage? All of our plans include bonus storage for files in addition to your email account.

Today we are proud to reveal our improved file storage screens.

The cluttered, slow static screens have been replaced with a shiny, fast, responsive interface. The new three panel view lets you view and edit file and folder details dynamically, including image previews.

Screenshot of new three panel interface

Easily upload your files and folders with drag and drop on the web interface (note: Safari does not support folder drag and drop). Our upload manager allows you to track your upload progress and cancel individual uploads, or even the whole batch at once.

Upload manager shows progress of file uploads

Files support works just as well on our mobile apps as it does on the web interface (whether you're on mobile or desktop). You can even manage your files via FTP or WebDAV.

No screen refreshes required when files are uploaded from mobile or shared in a multi-user account: see them instantly on all FastMail clients.

Easily locate files and folders with our powerful search tool.

Once your files are uploaded, you can quickly attach them to emails.

Select file from storage to attach to email

You can even host simple websites and photo galleries from your file storage, and share files with other users in your account.

File quota limits apply, but they are independent of your mail quota: filling up your file storage won't affect delivery of your email!

Read the online help for full details of the improved Files feature.

Remembering Another China in Kunming

Published 29 Mar 2017 by Tom Wilson in tom m wilson.

Last weekend I headed out for a rock climbing session with some locals and expats.  First I had to cross town, and while doing so I came across an old man doing water calligraphy by Green Lake.  I love the transience of this art: the beginning of the poem is starting to fade by the time he reaches […]

Week #11: Raided yet again

Published 27 Mar 2017 by legoktm in The Lego Mirror.

If you missed the news, the Raiders are moving to Las Vegas. The Black Hole is leaving Oakland (again) for a newer, nicer, stadium in the desert. But let's talk about how we got here, and how different this is from the moving of the San Diego Chargers to Los Angeles.

The current Raiders stadium is old and outdated; it needs renovating to keep up with other modern stadiums in the NFL. Owner Mark Davis isn't a multi-billionaire who could finance such a stadium, and the City of Oakland is definitely not paying for it. So the options left were to find outside financing for Oakland, or to find said financing somewhere else. And unfortunately it was the latter option that won out in the end.

I think it's unsurprising that more and more cities are refusing to put public money into stadiums that they will see no profit from - it makes no sense whatsoever.

Overall I think the Raider Nation will adapt and survive just as it did when they moved to Los Angeles. The Raiders still have an awkward two-to-three years left in Oakland, and with Derek Carr at the helm, it looks like they will be good ones.

Week #10: March Sadness

Published 23 Mar 2017 by legoktm in The Lego Mirror.

In California March Madness is really...March Sadness. The only Californian team that is still in is UCLA. UC Davis made it in but was quickly eliminated. USC and Saint Mary's both fell in the second round. Cal and Stanford didn't even make it in. At best we can root for Gonzaga, but that's barely it.

Some of us root for the schools we went to, but for those of us who grew up here and support local teams, we're left hanging. And it's not bias in the selection committee; those schools just aren't good enough.

On top of that we have a top-notch professional team in the Warriors, but our amateur players just aren't up to snuff.

So good luck to UCLA, represent California hella well. We somewhat believe in you.

Week #9: The jersey returns

Published 23 Mar 2017 by legoktm in The Lego Mirror.

And so it has been found. Tom Brady's jersey was in Mexico the whole time, stolen by a member of the press. And while it's great news for Brady, sports memorabilia fans, and the FBI, it doesn't look good for journalists. Journalists are given a lot of access to players, allowing them to obtain better content and get better interviews. It would not be surprising if the NFL responds to this incident by locking down the access that journalists are given. And that would be a real bummer.

I'm hoping this is seen as an isolated incident and that journalists as a whole are not punished for the offenses of one.

Enterprise plans, now official!

Published 23 Mar 2017 by Pierrick Le Gall in The Blog.

After several years in the shadow of the standard plan, and yet already adopted by more than 50 organizations, it is time to officially introduce the Enterprise plans. They were designed for organizations, private or public, looking for a simple, affordable and yet complete tool to manage their collections of photos.

The main idea behind Enterprise is to democratize photo library management for organizations of all kinds and sizes. We are not targeting Fortune 500 companies, although some of them are already clients, but Fortune 5,000,000 companies! Enterprise plans can replace, at a reasonable cost, inadequate solutions relying on intranet shared folders, where photos are sometimes duplicated or deleted by mistake, without an appropriate permission system.

Introduction to Enterprise plans


Why announce these plans officially today? Because the current trend clearly shows that our Enterprise plans have found their market. Although semi-official, Enterprise plans represented nearly 40% of our revenue in February 2017! It is time to put these plans under the spotlight.

In practice, here is what changes with the Enterprise plans:

  1. they can be used by organizations, as opposed to the standard plan
  2. additional features, such as support for non-photo files (PDF, videos …)
  3. higher level of service (priority support, customization, presentation session)

Discover Enterprise

Please Help Us Track Down Apple II Collections

Published 20 Mar 2017 by Jason Scott in ASCII by Jason Scott.

Please spread this as far as possible – I want to reach folks who are far outside the usual channels.

The Summary: Conditions are very, very good right now for easy, top-quality, final ingestion of original commercial Apple II software. If you know people sitting on a pile of it, or even if you have a small handful of boxes, please get in touch with me to arrange for the disks to be imaged.

The rest of this entry says this in much longer, hopefully compelling fashion.

We are in a golden age for Apple II history capture.

For now, and it won’t last (because nothing lasts), an incredible amount of interest and effort and tools are all focused on acquiring Apple II software, especially educational and engineering software, and ensuring it lasts another generation and beyond.

I’d like to take advantage of that, and I’d like your help.

Here’s the secret about Apple II software: Copy Protection Works.

Copy protection, that method of messing up easy copying from floppy disks, turns out to have been very effective at doing what it is meant to do – slow down the duplication of materials so a few sales can eke by. For anything but the most compelling, most universally interesting software, copy protection did a very good job of ensuring that only the approved disks that went out the door are the remaining extant copies for a vast majority of titles.

As programmers and publishers laid logic bombs and coding traps and took the brilliance of watchmakers and used it to design alternative operating systems, they did so to ensure people wouldn’t take the time to actually make the effort to capture every single bit off the drive and do the intense and exacting work to make it easy to spread in a reproducible fashion.

They were right.

So, obviously it wasn’t 100% effective at stopping people from making copies of programs, or so many people who used the Apple II wouldn’t remember the games they played at school or at user-groups or downloaded from AE Lines and BBSes, with pirate group greetings and modified graphics.

What happened is that pirates and crackers did what was needed to break enough of the protection on high-demand programs (games, productivity) to make them work. They used special hardware modifications to “snapshot” memory and pull out a program. They traced the booting of the program by stepping through its code and then snipped out the clever tripwires that freaked out if something wasn’t right. They tied it up into a bow so that instead of a horrendous 140 kilobyte floppy, you could have a small 15 or 20 kilobyte program instead. They even put multiple cracked programs together on one disk so you could get a bunch of cool programs at once.

I have an entire section of TEXTFILES.COM dedicated to this art and craft.

And one could definitely argue that the programs (at least the popular ones) were “saved”. They persisted, they spread, they still exist in various forms.

And oh, the crack screens!

I love the crack screens, and put up a massive pile of them here. Let’s be clear about that – they’re a wonderful, special thing and the amount of love and effort that went into them (especially on the Commodore 64 platform) drove an art form (demoscene) that I really love and which still thrives to this day.

But these aren’t the original programs and disks, and in some cases, not the originals by a long shot. What people remember booting in the 1980s were often distant cousins to the floppies that were distributed inside the boxes, with the custom labels and the nice manuals.


On the left is the title screen for Sabotage. It’s a little clunky and weird, but it’s also something almost nobody who played Sabotage back in the day ever saw; they only saw the instructions screen on the right. The reason for this is that there were two files on the disk, one for starting the title screen and then the game, and the other was the game. Whoever cracked it long ago only did the game file, leaving the rest as one might leave the shell of a nut.

I don’t think it’s terrible these exist! They’re art and history in their own right.

However… the mistake, which I completely understand making, is to see programs and versions of old Apple II software up on the Archive and say “It’s handled, we’re done here.” You might be someone with a small stack of Apple II software, newly acquired or decades old, and think you don’t have anything to contribute.

That’d be a huge error.

It’s a bad assumption because there’s a chance the original versions of these programs, unseen since they were sold, is sitting in your hands. It’s a version different than the one everyone thinks is “the” version. It’s precious, it’s rare, and it’s facing the darkness.

There is incredibly good news, however.

I’ve mentioned some of these folks before, but there is now a powerful allegiance of very talented developers and enthusiasts who have been pouring an enormous amount of skills into the preservation of Apple II software. You can debate if this is the best use of their (considerable) skills, but here we are.

They have been acquiring original commercial Apple II software from a variety of sources, including auctions, private collectors, and luck. They’ve been duplicating the originals on a bits level, then going in and “silent cracking” the software so that it can be played on an emulator or via the web emulation system I’ve been so hot on, and not have any change in operation, except for not failing due to copy protection.

With a “silent crack”, you don’t take the credit, you don’t make it about yourself – you just make it work, and work entirely like it did, without yanking out pieces of the code and program to make it smaller for transfer or to get rid of a section you don’t understand.

Most prominent of these is 4AM, who I have written about before. But there are others, and they’re all working together at the moment.

These folks, these modern engineering-minded crackers, are really good. Really, really good.

They’ve been developing tools from the ground up that are focused on silent cracks, of optimizing the process, of allowing dozens, sometimes hundreds of floppies to be evaluated automatically and reducing the workload. And they’re fast about it, especially when dealing with a particularly tough problem.

Take, for example, the efforts required to crack Pinball Construction Set, and marvel not just that it was done, but that a generous and open-minded article was written explaining exactly what was being done to achieve this.

This group can be handed a stack of floppies, image them, evaluate them, and find which have not yet been preserved in this fashion.

But there’s only one problem: They are starting to run out of floppies.

I should be clear that there’s plenty left in the current stack – hundreds of floppies are being processed. But I also have seen the effort chug along and we’ve been going through direct piles, then piles of friends, and then piles of friends of friends. We’ve had a few folks from outside the community bring stuff in, but those are way more scarce than they should be.

I’m working with a theory, you see.

My theory is that there are large collections of Apple II software out there. Maybe someone’s dad had a store long ago. Maybe someone took in boxes of programs over the years and they’re in the basement or attic. I think these folks are living outside the realm of the “Apple II Community” that currently exists (and which is a wonderful set of people, to be clear). I’m talking about the difference between a fan club for surfboards and someone who has a massive set of surfboards because his dad used to run a shop and they’re all out in the barn.

A lot of what I do is put groups of people together and then step back to let the magic happen. This is a case where this amazingly talented group of people is currently a well-oiled machine – they help each other out, they are innovating along this line, and Apple II software is being captured in a world-class fashion, with no filtering being done because it’s some hot ware that everyone wants to play.

For example, piles and piles of educational software have returned from potential oblivion, because it’s about the preservation, not the title. Wonderfully done works are being brought back to life and are playable on the Internet Archive.

So like I said above, the message is this:

Conditions are very, very good right now for easy, top-quality, final ingestion of original commercial Apple II Software and if you know people sitting on a pile of it or even if you have a small handful of boxes, please get in touch with me to arrange the disks to be imaged.

I’ll go on podcasts or do interviews, or chat with folks on the phone, or trade lots of e-mails discussing details. This is a very special time, and I feel the moment to act is now. Alliances and communities like these do not last forever, and we’re in a peak moment of talent and technical landscape to really make a dent in what are likely acres of unpreserved titles.

It’s 4am and nearly morning for Apple II software.

It’d be nice to get it all before we wake up.


Nature in China

Published 20 Mar 2017 by Tom Wilson in tom m wilson.

The sun sets in south-east Yunnan province, over karst mountains and lakes, not far from the border with Vietnam. Last weekend I went to Puzheihei, an area of karst mountains surrounded by water-lily-filled lakes 270 km south-east of Kunming. What used to be a five hour bus journey now just takes 1.5 hours on the […]

Managing images on an open wiki platform

Published 19 Mar 2017 by Oliver K in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I'm developing a wiki using MediaWiki. There are a few ways of implementing images in wiki pages, such as uploading them to the website itself, uploading them to external websites (which risks the images later being banned or removed), or requesting others to place an image.

Surely images may be difficult to manage as one day someone may upload a vulgar image and many people will then see it. How can I ensure vulgar images do not get through and that administrators aren't scarred for life after monitoring them?
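One common mitigation is configuration rather than after-the-fact review: restrict who can upload in the first place. A hedged sketch (assuming a standard LocalSettings.php; this is one possible policy, not the only approach):

```php
// LocalSettings.php: illustrative settings only
$wgEnableUploads = true;
// New accounts cannot upload; only autoconfirmed (established) users can:
$wgGroupPermissions['user']['upload'] = false;
$wgGroupPermissions['autoconfirmed']['upload'] = true;
```

This does not stop a determined established user, so patrolling Special:NewFiles remains wise.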

Does the composer software have a command like python -m compileall ./

Published 18 Mar 2017 by jehovahsays in Newest questions tagged mediawiki - Server Fault.

I want to use Composer in a MediaWiki root folder with multiple subdirectories that each need Composer to install their dependencies, using a single command like composer -m installall ./. For example, if the root folder were all written in Python, I could use the command python -m compileall ./.
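Composer has no built-in recursive equivalent, but a shell loop gets the same effect. A hedged sketch (the function name install_all and the vendor/ exclusion are my own illustration, not a Composer feature):

```shell
# install_all DIR: run "composer install" in every directory below DIR
# that contains a composer.json, skipping vendor/ trees that Composer
# itself creates. COMPOSER can be overridden for dry runs.
install_all() {
    find "$1" -name composer.json -not -path '*/vendor/*' |
    while read -r f; do
        dir=$(dirname "$f")
        echo "==> $dir"
        (cd "$dir" && ${COMPOSER:-composer} install --no-interaction)
    done
}

# Usage: install_all /var/www/mediawiki
```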

Hilton Harvest Earth Hour Picnic and Concert

Published 18 Mar 2017 by Dave Robertson in Dave Robertson.


Sandpapering Screenshots

Published 15 Mar 2017 by Jason Scott in ASCII by Jason Scott.

The collection I talked about yesterday was subjected to the Screen Shotgun, which does a really good job of playing the items, capturing screenshots, and uploading them into the item to allow people to easily see, visually, what they’re in for if they boot them up.

In general, the screen shotgun does the job well, but not perfectly. It doesn’t understand what it’s looking at, at all, and the method I use to decide the “canonical” screenshot is inherently shallow – I choose the largest filesize, because that tends to be the most “interesting”.
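As a hedged sketch (the real Screen Shotgun is more involved; this only shows the heuristic), picking the canonical shot by file size amounts to:

```shell
# canonical DIR: print the largest screenshot in DIR, on the shallow
# theory that the biggest file tends to be the most "interesting" image.
canonical() {
    ls -S "$1"/*.png 2>/dev/null | head -n 1
}
```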

The bug in this is that if you have, say, these three screenshots:

…it’s going to choose the first one, because those middle-of-loading graphics for an animated title screen have tons of little artifacts, and the filesize is bigger. Additionally, the second is fine, but it’s not the “title”, the recognized “welcome to this program” image. So the best choice turns out to be the third.

I don’t know why I’d not done this sooner, but while waiting for 500 disks to screenshot, I finally wrote a program to show me all the screenshots taken for an item, and declare a replacement canonical title screenshot. The results have been way too much fun.

It turns out, doing this for Apple II programs in particular, where it’s removed the duplicates and is just showing you a gallery, is beautiful:

Again, the all-text “loading screen” in the middle, which is caused by blowing program data into screen memory, wins the “largest file” contest, but literally any other of the screens would be more appropriate.

This is happening all over the place: crack screens win over the actual main screen, the mid-loading noise of Apple II programs win over the final clean image, and so on.

Working with tens of thousands of software programs, primarily alone, means that I’m trying to find automation wherever I can. I can’t personally boot up each program and do the work needed to screenshot/describe it – if a machine can do anything, I’ll make the machine do it. People will come to me with fixes or changes if the results are particularly ugly, but it does leave a small amount that no amount of automation is likely to catch.

If you watch a show or documentary on factory setups and assembly lines, you’ll notice they can’t quite get rid of people along the entire line, especially the sign-off. Someone has to keep an eye to make sure it’s not going all wrong, or, even more interestingly, a table will come off the line and you see one person giving it a quick run-over with sandpaper, just to pare down the imperfections or missed spots of the machine. You still did an enormous amount of work with no human effort, but if you think that’s ready for the world with no final sign-off, you’re kidding yourself.

So while it does mean another hour or two looking at a few hundred screenshots, it’s nice to know I haven’t completely automated away the pleasure of seeing some vintage computer art, for my work, and for the joy of it.

More Ways to Work with Load Balancers

Published 15 Mar 2017 by Rafael Rosa in DigitalOcean: Cloud computing designed for developers.

More Ways to Work with Load Balancers

When building new products at DigitalOcean, one of our goals is to ensure that they're simple to use and developer friendly. And that goes beyond the control panel; we aim to provide intuitive APIs and tools for each of our products. Since the release of Load Balancers last month, we've worked to incorporate them into our API client libraries and command line client. We've also seen community-supported open source projects extended to support Load Balancers.

Today, we want to share several new ways you can interact with Load Balancers.

Command Line: doctl

doctl is our easy-to-use, official command line client. Load Balancer support landed in version v1.6.0. You can download the release from GitHub or install it using Homebrew on Mac:

brew install doctl

You can use doctl for anything you can do in our control panel. For example, here's how you would create a Load Balancer:

doctl compute load-balancer create --name "example-01" \
    --region "nyc3" --tag-name "web:prod" \
    --algorithm "round_robin" \
    --forwarding-rules "entry_protocol:http,entry_port:80,target_protocol:http,target_port:80"

Find doctl's full documentation in this DigitalOcean tutorial.

Go: godo

We're big fans of Go, and godo is the way to interact with DigitalOcean using Go. Load Balancer support is included in the recently tagged v1.0.0 release. Here's an example:

createRequest := &godo.LoadBalancerRequest{
    Name:      "example-01",
    Algorithm: "round_robin",
    Region:    "nyc3",
    ForwardingRules: []godo.ForwardingRule{
        {
            EntryProtocol:  "http",
            EntryPort:      80,
            TargetProtocol: "http",
            TargetPort:     80,
        },
    },
    HealthCheck: &godo.HealthCheck{
        Protocol:               "http",
        Port:                   80,
        Path:                   "/",
        CheckIntervalSeconds:   10,
        ResponseTimeoutSeconds: 5,
        HealthyThreshold:       5,
        UnhealthyThreshold:     3,
    },
    StickySessions: &godo.StickySessions{
        Type: "none",
    },
    Tag:                 "web:prod",
    RedirectHttpToHttps: false,
}

lb, _, err := client.LoadBalancers.Create(ctx, createRequest)

The library's full documentation is available on GoDoc.

Ruby: droplet_kit

droplet_kit is our Ruby API client library. Version 2.1.0 has Load Balancer support and is now available on Rubygems. You can install it with this command:

gem install droplet_kit

And you can create a new Load Balancer like so:

load_balancer = DropletKit::LoadBalancer.new(
  name: 'example-lb-001',
  algorithm: 'round_robin',
  tag: 'web:prod',
  redirect_http_to_https: true,
  region: 'nyc3',
  forwarding_rules: [
    DropletKit::ForwardingRule.new(
      entry_protocol: 'http',
      entry_port: 80,
      target_protocol: 'http',
      target_port: 80,
      certificate_id: '',
      tls_passthrough: false
    )
  ],
  sticky_sessions: DropletKit::StickySession.new(
    type: 'none',
    cookie_name: '',
    cookie_ttl_seconds: nil
  ),
  health_check: DropletKit::HealthCheck.new(
    protocol: 'http',
    port: 80,
    path: '/',
    check_interval_seconds: 10,
    response_timeout_seconds: 5,
    healthy_threshold: 5,
    unhealthy_threshold: 3
  )
)

client.load_balancers.create(load_balancer)


Community Supported

Besides our official open source projects, there are two community contributions we'd like to highlight:

Thanks to our colleagues Viola and Andrew for working on these features, and the open source community for including Load Balancer support in their projects. In particular, we want to give a special shout out to Paul Stack and the rest of our friends at HashiCorp who added support to Terraform so quickly. You rock!

We're excited to see more tools add Load Balancer support. If you're the maintainer of a project that has added support, Tweet us @digitalocean. We can help spread the word!

Rafael Rosa

Product Manager, High Availability

Thoughts on a Collection: Apple II Floppies in the Realm of the Now

Published 15 Mar 2017 by Jason Scott in ASCII by Jason Scott.

I was connected with The 3D0G Knight, a long-retired Apple II pirate/collector who had built up a set of hundreds of floppy disks acquired from many different locations and friends decades ago. He generously sent me his entire collection to ingest into a more modern digital format, as well as the Internet Archive’s software archive.

The floppies came in a box without any sort of sleeves for them, with what turned out to be roughly 350 of them removed from “ammo boxes” by 3D0G from his parents’ house. The disks all had labels of some sort, and a printed index came along with it all, mapped to the unique disk ID/Numbers that had been carefully put on all of them years ago. I expect this was months of work at the time.

Each floppy holds 140k of data per side, and in this case, all the floppies were single-sided but had been clipped with an additional notch, made with a hole punch, to allow the second side to be used as well.

Even though they’re packed a little strangely, there was no damage anywhere, nothing bent or broken or ripped, and all the items were intact. It looked to be quite the bonanza of potentially new vintage software.

So, this activity is at the crux of the work going on both with the older software on the Internet Archive and with what I’m doing with web browser emulation and increasingly easy access to the works of old. The most important thing, over everything else, is to close the air gap – get the data off these disappearing floppy disks and into something online where people or scripts can benefit from them and research them. Almost everything else – scanning of cover art, ingestion of metadata, pulling together the history of a company or cross-checking what titles had which collaborators… that has nowhere near the expiration date of the magnetized coated plastic disks going under. This needs us and it needs us now.

The way that things currently work with Apple II floppies is to separate them into two classes: Disks that Just Copy, and Disks That Need A Little Love. The Little Love disks, when found, are packed up and sent off to one of my collaborators, 4AM, who has the tools and the skills to get data off particularly tenacious floppies, as well as doing “silent cracks” of commercial floppies to preserve what’s on them as best as possible.

Doing the “Disks that Just Copy” is a mite easier. I currently have an Apple II system on my desk that connects via a USB-to-serial connection to my PC. There, I run a program called Apple Disk Transfer that basically turns the Apple into a Floppy Reading Machine, with a pretty interface and everything.

Apple Disk Transfer (ADT) has been around a very long time and knows what it’s doing – a floppy disk with no trickery on the encoding side can be ripped out and transferred to a “.DSK” file on the PC in about 20 seconds. If there’s something wrong with the disk in terms of being an easy read, ADT is very loud about it. I can do other things while reading floppies, and I end up with a whole pile of filenames when it’s done. The workflow, in other words, isn’t so bad as long as the floppies aren’t in really bad shape. In this particular set, the floppies were in excellent shape, except when they weren’t, and the vast majority fell into the “excellent” camp.

The floppy drive that sits at the middle of this looks like some sort of nightmare, but it helps to understand that with Apple II floppy drives, you really have to have the cover removed at all times, because you will be constantly checking the read head for dust, smudges, and so on. Unscrewing the whole mess and putting it back together for looks just doesn’t scale. It’s ugly, but it works.

It took me about three days (while doing lots of other stuff) but in the end I had 714 .dsk images pulled from both sides of the floppies, which works out to 357 floppy disks successfully imaged. Another 20 or so are going to get a once over but probably are going to go into 4am’s hands to get final evaluation. (Some of them may in fact be blank, but were labelled in preparation, and so on.) 714 is a lot to get from one person!

As mentioned, an Apple II 5.25″ floppy disk image is pretty much always 140k. The names of the floppy are mine, taken off the label, or added based on glancing inside the disk image after it’s done. For a quick glance, I use either an Apple II emulator called Applewin, or the fantastically useful Apple II disk image investigator Ciderpress, which is frankly the gold standard for what should be out there for every vintage disk/cartridge/cassette image. As might be expected, labels don’t always match contents. C’est la vie.
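Since a standard DOS-order .dsk is exactly 35 tracks × 16 sectors × 256 bytes = 143,360 bytes, a quick sanity check over a batch is easy to script. A hedged sketch (check_sizes is my own name for it, not part of any tool mentioned here):

```shell
# check_sizes FILE...: flag any disk image that is not the standard
# 140k size (35 tracks x 16 sectors x 256 bytes = 143360 bytes).
check_sizes() {
    for f in "$@"; do
        size=$(wc -c < "$f")
        [ "$size" -eq 143360 ] || echo "odd size ($size bytes): $f"
    done
}
```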

As for the contents of the disks themselves: this comes down to what the “standard collection” was for an Apple II user in the 1980s who wasn’t afraid to let their software library grow utilizing less than legitimate circumstances. Instead of an elegant case of shiny, professionally labelled floppy diskettes, we get a scribbled, messy, organic collection of a whole range of “warez” with no real theme. There’s games, of course, but there’s also productivity, utilities, artwork, and one-off collections of textfiles and documentation. Games that were “cracked” down into single-file payloads find themselves with 4-5 other unexpected housemates and sitting behind a menu. A person spending the equivalent of $50-$70 per title might be expected to have a relatively small and distinct library, but someone who is meeting up with friends or associates and duplicating floppies over a few hours will just grab bushels of strange.

The result of the first run is already up on the Archive: A 37 Megabyte .ZIP file containing all the images I pulled off the floppies. 

In terms of what will be of relevance to later historians, researchers, or collectors, that zip file is probably the best way to go – it’s not munged up with the needs of the Archive’s structure, and is just the disk images and nothing else.

This single .zip archive might be sufficient for a lot of sites (go git ‘er!) but as mentioned infinite times before, there is a very strong ethic across the Internet Archive’s software collection to make things as accessible as possible, and hence there are nearly 500 items in the “3D0G Knight Collection” besides the “download it all” item.

The rest of this entry talks about why it’s 500 and not 714, and how it is put together, and the rest of my thoughts on this whole endeavor. If you just want to play some games online or pull a 37mb file and run, cackling happily, into the night, so be it.

The relatively small number of people who have exceedingly hard opinions on how things “should be done” in the vintage computing space will also want to join the folks who are pulling the 37mb file. Everything else done by me after the generation of the .zip file is in service of the present and near future. The items that number in the hundreds on the Archive that contain one floppy disk image and interaction with it are meant for people to find now. I want someone to have a vague memory of a game or program once interacted with, and if possible, to find it on the Archive. I also like people browsing around randomly until something catches their eye, and being able to leap into the program immediately.

To those ends, and as an exercise, I’ve acquired or collaborated on scripts to do the lion’s share of analysis on software images to prep them for this living museum. These scripts get it “mostly” right, and the rough edges they bring in from running are easily smoothed over by a microscopic amount of post-processing manual attention, like running a piece of sandpaper over a machine-made joint.

Again, we started out with 714 disk images. The first thing done was to run them against a script that has hash checksums for every exposed Apple II disk image on the Archive, which now number over 10,000. Doing this dropped the “uniquely new” disk images from 714 to 667.
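A hedged sketch of that pass (the actual script is more thorough; new_images and the known-hash file are illustrative assumptions):

```shell
# new_images KNOWN FILE...: print only the disk images whose md5 does
# not already appear in KNOWN, a file holding one known hash per line.
new_images() {
    known="$1"; shift
    md5sum "$@" | while read -r hash path; do
        grep -qx "$hash" "$known" || echo "$path"
    done
}
```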

Next, I concatenated disk images that are part of the same product into one item: if a paint program has two floppy disk images for each of the sides of its disk, those become a single item. In one or two cases, the program spans multiple floppies, so 4-8 (and in one case, 14!) floppy images become a single item. Doing this dropped the total from 667 to 495 unique items. That’s why the number is significantly smaller than the original total.

Let’s talk for a moment about this.

Using hashes and comparing them is the roughest of rough approaches to de-duplicating software items. I do it with Apple II images because they tend to be self-contained (a single .dsk file) and because Apple II software has a lot of people involved in it. I’m not alone by any means in acquiring these materials and I’m certainly not alone in terms of work being done to track down all the unique variations and most obscure and nearly lost packages written for this platform. If I was the only person in the world (or one of a tiny sliver) working on this I might be super careful with each and every item to catalog it – but I’m absolutely not; I count at least a half-dozen operations involved in Apple II floppy image ingestion.

And as a bonus, it’s a really nice platform. When someone puts their heart into an Apple II program, it rewards them and the end user as well – the graphics can be charming, the program flow intuitive, and the whole package just gleams on the screen. It’s rewarding to work with this corpus, so I’m using it as a test bed for all these methods, including using hashes.

But hash checksums are seriously not the be-all for this work. Anything can make a hash different – an added file, a modified bit, or a compilation of already-on-the-archive-in-a-hundred-places files that just happen to be grouped up slightly differently than others. That said, it’s not overwhelming – you can read about what’s on a floppy and decide what you want pretty quickly; gigabytes will not be lost and the work to track down every single unique file has potential but isn’t necessary yet.

(For the people who care, the Internet Archive generates three different hashes (md5, crc32, sha1) and lists the size of the file – looking across all of those for comparison is pretty good for ensuring you probably have something new and unique.)

Once the items are up there, the Screen Shotgun whips into action. It plays the programs in the emulator, takes screenshots, leafs off the unique ones, and then assembles it all into a nice package. Again, not perfect but left alone, it does the work with no human intervention and gets things generally right. If you see a screenshot in this collection, a robot did it and I had nothing to do with it.

This leads, of course, to scaring out which programs are a tad not-bootable, and by that I mean that they boot up in the emulator and the emulator sees them and all, but the result is not that satisfying:

On a pure accuracy level, this is doing exactly what it’s supposed to – the disk wasn’t ever a properly packaged, self-contained item, and it needs a boot disk to go in the machine first before you swap the floppy. I intend to work with volunteers to help with this problem, but here is where it stands.

The solution in the meantime is a Java program modified by Kevin Savetz, which analyzes the floppy disk image and prints all the disk information it can find, including the contents of BASIC programs and textfiles. Here’s a non-booting disk where this worked out. The result is that this all gets ingested into the search engine of the Archive, and so if you’re looking for a file within the disk images, there’s a chance you’ll be able to find it.

Once the robots have their way with all the items, I can go in and fix a few things, like screenshots that went south, or descriptions and titles that don’t reflect what actually boots up. The amount of work I, a single person, have to do is therefore reduced to something manageable.

I think this all works well enough for the contemporary vintage software researcher and end user. Perhaps that opinion is not universal.

What I can say, however, is that the core action here – of taking data away from a transient and at-risk storage medium and putting it into a slightly less transient, less at-risk storage medium – is 99% of the battle. To have the will to do it, to connect with the people who have these items around and to show them it’ll be painless for them, and to just take the time to shove floppies into a drive and read them, hundreds of times… that’s the huge mountain to climb right now. I no longer have particularly deep concerns about technology failing to work with these digital images, once they’re absorbed into the Internet. It’s this current time, out in the cold, unknown and unloved, that they’re the most at risk.

The rest, I’m going to say, is gravy.

I’ll talk more about exactly how tasty and real that gravy is in the future, but for now, please take a pleasant walk in the 3D0G Knight’s Domain.

The Followup

Published 14 Mar 2017 by Jason Scott in ASCII by Jason Scott.

Writing about my heart attack garnered some attention. I figured it was only right to fill in later details and describe what my current future plans are.

After the previous entry, I went back into the emergency room of the hospital I was treated at, twice.

The first time was because I “felt funny”; I just had no grip on “is this the new normal” and so just to understand that, I went back in and got some tests. They did an EKG, a blood test, and let me know all my stats were fine and I was healing according to schedule. That took a lot of stress away.

Two days later, I went in because I was having a marked shortness of breath, where I could not get enough oxygen in and it felt a little like I was drowning. Another round of tests, and one of the cardiologists mentioned a side effect of one of the drugs I was taking was this sort of shortness/drowning. He said it usually went away and the company claimed 5-7% of people got this side effect, but that they observed more like 10-15%. They said I could wait it out or swap drugs. I chose swap. After that, I’ve had no other episodes.

The hospital thought I should stay in Australia for 2 weeks before flying. Thanks to generosity from both MuseumNext and the ACMI, my hosts, that extra AirBnB time was basically paid for. MuseumNext also worked to help move my international flight ahead by the weeks needed; a very kind gesture.

Kind gestures abounded, to be clear. My friend Rochelle extended her stay from New Zealand to stay an extra week; Rachel extended hers to match my new departure date. Folks rounded up funds and sent them along, which helped cover some additional costs. Visitors stopped by the AirBnB when I wasn’t really taking any walks outside, to provide additional social contact.

Here is what the blockage looked like, before and after. As I said, roughly a quarter of my heart wasn’t getting any significant blood and somehow I pushed through it for nearly a week. The insertion of a balloon and then a metal stent opened the artery enough for the blood flow to return. Multiple times, people made it very clear that this could have finished me off handily, and mostly luck involving how my body reacted was what kept me going and got me in under the wire.

From the responses to the first entry, it appears that a lot of people didn’t know heart attacks could be a lingering, growing issue and not just a bolt of lightning that strikes in the middle of a show or while walking down the street. If nothing else, I’m glad that it’s caused a number of people to be aware of how symptoms present themselves, as well as getting people to check their cholesterol, which I didn’t see as a huge danger compared to other factors, and which turned out to be significant indeed.

As for drugs, I’ve got a once-a-day waterfall of pills for blood pressure, cholesterol, heart healing, anti-clotting, and my long-standing annoyance of gout (which I’ve not had for years thanks to the pills). I’m on some of them for the next few months, some for a year, and some forever. I’ve also been informed I’m officially at risk for another heart attack, but the first heart attack was my hint in that regard.

As I healed, and understood better what was happening to me, I got better remarkably quick. There is a single tiny dot on my wrist from the operation, another tiny dot where the IV was in my arm at other times. Rachel gifted a more complicated Fitbit to replace the one I had, with the new one tracking sleep schedule and heart rate, just to keep an eye on it.

A day after landing back in the US, I saw a cardiologist at Mt. Sinai, one of the top doctors, who gave me some initial reactions to my charts and information: I’m very likely going to be fine, maybe even better than before. I need to take care of myself, and I was. If I was smoking or drinking, I’d have to stop, but since I’ve never had alcohol and I’ve never smoked, I’m already ahead of that game. I enjoy walking, a lot. I stay active. And as of getting out of the hospital, I am vegan for at least a year. Caffeine’s gone. Raw vegetables are in.

One might hesitate to put this all online, because the Internet is spectacularly talented at generating hatred and health advice. People want to help – it comes from a good place. But I’ve got a handle on it and I’m progressing well; someone hitting me up with a nanny-finger-wagging paragraph and 45 links isn’t going to help much. But go ahead if you must.

I failed to mention it before, but when this was all going down, my crazy family of the Internet Archive jumped in, everyone from Dad Brewster through to all my brothers and sisters scrambling to find me my insurance info and what they had on their cards, as I couldn’t find mine. It was something really late when I first pinged everyone with “something is not good” and everyone has been rather spectacular over there. Then again, they tend to be spectacular, so I sort of let that slip by. Let me rectify that here.

And now, a little bit on health insurance.

I had travel insurance as part of my health insurance with the Archive. That is still being sorted out, but a large deposit had to be put on the Archive’s corporate card as a down-payment during the sorting out, another fantastic generosity, even if it’s technically a loan. I welcome the coming paperwork and nailing down of financial brass tacks for a specific reason:

I am someone who once walked into an emergency room with no insurance (back in 2010), got a blood medication IV, stayed around a few hours, and went home, generating a $20,000 medical bill in the process. It got knocked down to $9k over time, and I ended up being thrown into a low-income program they had that allowed them to write it off (I think). That bill could have destroyed me, financially. Therefore, I’m super sensitive to the costs of medical care.

In Australia, it is looking like the heart operation and the 3 day hospital stay, along with all the tests and staff and medications, are going to round out around $10,000 before the insurance comes in and knocks that down further (I hope). In the US, I can’t imagine that whole thing being less than $100,000.

The biggest culture shock for me was how little any of the medical staff, be they doctors or nurses or administrators, cared about the money. They didn’t have any real info on what things cost, because pretty much everything is free there. I’ve equated it to asking a restaurant where the best toilet to use is a few hours after your meal – they might have some random ideas, but nobody’s really thinking that way. It was a huge factor in my returning to the emergency room so willingly; each visit, all-inclusive, was $250 AUD, which is even less in US dollars. $250 is something I’ll gladly pay for peace of mind, and I did, twice. The difference in the experience is remarkable. I realize this is a hot button issue now, but chalk me up as another person for whom a life-changing experience could come within a remarkably close distance of being an influence on where I might live in the future.

Dr. Sonny Palmer, who inserted my stent in the operating room.

I had a pile of plans and things to get done (documentaries, software, cutting down on my possessions, and so on), and I’ll be getting back to them. I don’t really have an urge to maintain some sort of health narrative on here, and I certainly am not in the mood to urge any lifestyle changes or preach a way of life to folks. I’ll answer questions if people have them from here on out, but I’d rather be known for something other than powering through a heart attack, and maybe, with some effort, I can do that.

Thanks again to everyone who has been there for me, online and off, in person and far away, over the past few weeks. I’ll try my best to live up to your hopes about what opportunities my second chance at life will give me.


On the Red Mud Trail in Yunnan

Published 13 Mar 2017 by Tom Wilson in tom m wilson.

I finally made it to downtown Kunming last weekend.  Amazingly there were still a few of the old buildings standing in the centre (although they were a tiny minority). Walking across Green Lake, a lake in downtown Kunming with various interconnected islands in its centre, I passed through a grove of bamboo trees. Old women […]

Want to learn about Archivematica whilst watching the ducks?

Published 13 Mar 2017 by Jenny Mitcham in Digital Archiving at the University of York.

We are really excited to be hosting the first European Archivematica Camp here at the University of York next month, on 4-6 April.

Don't worry - there will be no tents or campfires...but there may be some wildlife on the lake.

The Ron Cooke Hub on a frosty morning - hoping for some warmer weather for Camp!

The event is taking place at the Ron Cooke Hub over on our Heslington East campus. If you want to visit the beautiful City of York (OK, I'm biased!) and meet other European Archivematica users (or Archivematica explorers) this event is for you. Artefactual Systems will be leading the event and the agenda is looking very full and interesting.

I'm most looking forward to learning more about the workflows that other Archivematica users have in place or are planning to implement.

One of these lakeside 'pods' will be our breakout room

There are still places left and you can register for Camp here or contact the organisers at

...and if you are not able to attend in person, do watch this blog in early April as you can guarantee I'll be blogging after the event!

Through the mirror-glass: Capture of artwork framed in glass.

Published 13 Mar 2017 by slwacns in State Library of Western Australia Blog.


State Library’s collection material that is selected for digitisation comes to the Digitisation team in a variety of forms. This blog describes capture of artwork that is framed and encased within glass.

So let’s see how the item is digitised.


Two large framed original artworks from the picture book Teacup written by Rebecca Young and illustrated by Matt Ottley posed some significant digitisation challenges.

When artwork from the Heritage collection is framed in glass, the glass acts like a mirror and without great care during the capture process, the glass can reflect whatever is in front of it, meaning that the photographer’s reflection (and the reflection of capture equipment) can obscure the artwork.

This post shows how we avoided this issue during the digitisation of two large framed paintings, Cover illustration for Teacup and also page 4-5 [PWC/255/01 ] and The way the whales called out to each other [PWC/255/09].

Though it is sometimes possible to remove the artwork from its housing, there are occasions when this is not suitable. In this example, the decision was made to not remove the artworks from behind glass as the Conservation staff assessed that it would be best if the works were not disturbed from their original housing.

PWC/255/01                                                         PWC/255/09

The most critical issue was to be in control of the light. Rearranging equipment in the workroom allowed for the artwork to face a black wall, a method used by photographers to eliminate reflections.


We used black plastic across the entrance of the workroom to eliminate all unwanted light.


The next challenge was to set up the camera. For this shoot we used our Hasselblad H3D11 (a 39-megapixel camera with excellent colour fidelity).


Prior to capture, we gave the glass a good clean with an anti-static cloth. In the images below, you can clearly see the reflection caused by the mirror effect of the glass.


Since we don’t have a dedicated photographic studio we needed to be creative when introducing extra light to allow for the capture. Bouncing the light off a large white card prevented direct light from falling on the artwork and reduced a significant number of reflections. We also used a polarizing filter on the camera lens to reduce reflections even further.


Once every reflection was eliminated and the camera set square to the artwork, we could test colour balance and exposure.

In the image below, you can see that we made the camera look like ‘Ned Kelly’ to ensure any shiny metal from the camera body didn’t reflect in the glass. We used the camera’s computer controlled remote shutter function to further minimise any reflections in front of the glass.



The preservation file includes technically accurate colour and greyscale patches to allow for colour fidelity and a ruler for accurate scaling in future reproductions.


The preservation file and a cropped version for access were then ingested into the State Library’s digital repository. The repository allows for current access and future reproductions to be made.

From this post you can see the care and attention that goes into preservation digitisation, ‘Do it right, do it once’ is our motto.

Filed under: Children's Literature, Exhibitions, Illustration, Picture Books, SLWA collections, SLWA Exhibitions, State Library of Western Australia, Uncategorized, WA, Western Australia Tagged: digitisation, illustration, slwa, SLWA collections, WA, WA Author

Week #8: Warriors are on the right path

Published 12 Mar 2017 by legoktm in The Lego Mirror.

As you might have guessed due to the lack of previous coverage of the Warriors, I'm not really a basketball fan. But the Warriors are in an interesting place right now. After setting an NBA record for being the fastest team to clinch a playoff spot, Coach Kerr has started resting his starters and the Warriors have a three game losing streak. This puts the Warriors in danger of losing their first seed spot with the San Antonio Spurs only half a game behind them.

But I think the Warriors are doing the right thing. Last year the Warriors set the record for having the best regular season record in NBA history, but also became the first team in NBA history to have a 3-1 advantage in the finals and then lose.

No doubt there was immense pressure on the Warriors last year. It was just expected of them to win the championship; there really wasn't anything else.

So this year they can easily avoid a lot of that pressure by not being the best team in the NBA on paper. They shouldn't worry about being the top seed; they just need to finish in the top four and play their best in the playoffs. Get some rest: they have a huge advantage over every other team simply by already being in the playoffs with so many games left to play.

How can we preserve our wiki pages

Published 10 Mar 2017 by Jenny Mitcham in Digital Archiving at the University of York.

I was recently prompted by a colleague to investigate options for preserving institutional wiki pages. At the University of York we use the Confluence wiki and this is available for all staff to use for a variety of purposes. In the Archives we have our own wiki space on Confluence which we use primarily for our meeting agendas and minutes. The question asked of me was how can we best capture content on the wiki that needs to be preserved for the long term? 

Good question and just the sort of thing I like to investigate. Here are my findings...

Space export

The most sensible way to approach the transfer of a set of wiki pages to the digital archive would be to export them using the export options available within the Space Tools.

The main problem with this approach is that a user will need to have the necessary permissions on the wiki space in order to be able to use these tools ...I found that I only had the necessary permissions on those wiki spaces that I administer myself.

There are three export options as illustrated below:

Space export options - available if you have the right permissions!


HTML export

Once you select HTML, there are two options - a standard export (which exports the whole space) or a custom export (which allows you to select the pages you would like included within the export).

I went for a custom export and selected just one section of meeting papers. Each wiki page is saved as an HTML file. DROID identifies these as HTML version 5. All relevant attachments are included in the download in their original format.

There are some really good things about this export option:
  • The inclusion of attachments in the export - these are often going to be as valuable to us as the wiki page content itself. Note that they were all renamed with a number that tied them to the page that they were associated with. The original file name did, however, appear to be preserved in the linking wiki page text
  • The metadata at the top of a wiki page is present in the HTML pages: ie Created by Jenny Mitcham, last modified by Jenny Mitcham on 31, Oct, 2016 - this is really important to us from an archival point of view
  • The links work - including links to the downloaded attachments, other wiki pages and external websites or Google Docs
  • The export includes an index page which can act as a table of contents for the exported files - this also includes some basic metadata about the wiki space


XML export

Again, there are two options here - either a standard export (of the whole space) or a custom export, which allows you to select whether or not you want comments to be exported and choose exactly which pages you want to export.

I tried the custom export. It seemed to work and also did export all the relevant attachments. The attachments were all renamed as '1' (with no file extension), and the wiki page content is all bundled up into one huge XML file.

On the plus side, this export option may contain more metadata than the other options (for example the page history) but it is difficult to tell as the XML file is so big and unwieldy and hard to interpret. Really it isn't designed to be usable. The main function of this export option is to move wiki pages into another instance of Confluence.


PDF export

Again you have the option to export the whole space or choose your pages. There are also other configurations you can make to the output but these are mostly cosmetic.

I chose the same batch of meeting papers to export as PDF and this produced a 111-page PDF document. The first page is a contents page which lists all the other pages alphabetically with hyperlinks to the right section of the document. The document is hard to use: the wiki pages run into each other without adequate spacing, and because of the linear nature of a PDF document you feel drawn to read it in the order it is presented (which in this case is not a logical order for the content). Attachments are not included in the download, though links to the attachments are maintained in the PDF file and they do continue to resolve to the right place on the wiki. Creation and last modified metadata is also not included in the export.

Single page export

As well as the Space Export options in Confluence there are also single page export options. These are available to anyone who can access the wiki page so may be useful if people do not have necessary permissions for a space export.

I exported a range of test pages using the 'Export to PDF' and 'Export to Word' options.

Export to PDF

The PDF files created in this manner are version 1.4. Sadly no option to export as PDF/A, but at least version 1.4 is closer to the PDF/A standard than some, so perhaps a subsequent migration to PDF/A would be successful.

Export to Word

Surprisingly the 'Word' files produced by Confluence appear not to be Word files at all!

Double click on the files in Windows Explorer and they open in Microsoft Word no problem, but DROID identifies the files as HTML (with no version number) and reports a file extension mismatch (because the files have a .doc extension).

If you view the files in a text application you can clearly see the Content-Type marked as text/html and <html> tags within the document. Quick View Plus, however, views them as an Internet Mail Message with the following text displayed at the top of each page:

Subject: Exported From Confluence
1024x640 72 Print 90

All very confusing and certainly not giving me a lot of faith in this particular export format!
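The mismatch DROID flagged is easy to reproduce with a simple content sniff. Here's a rough sketch in Python (the helper names are my own, and this is no substitute for a proper identification tool like DROID):

```python
from pathlib import Path

def looks_like_html(path):
    """Return True if a file's content starts out as HTML, whatever its extension."""
    head = Path(path).read_bytes()[:1024].lower()
    return b"<html" in head or b"content-type: text/html" in head

def find_doc_mismatches(folder):
    """List .doc files whose content appears to be HTML."""
    return [p for p in Path(folder).rglob("*.doc") if looks_like_html(p)]
```

A quick check like this won't replace format identification, but it does make the '.doc that is really HTML' problem concrete.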


Both of these single page export formats do a reasonable job of retaining the basic content of the wiki pages - both versions include many of the key features I was looking for - text, images, tables, bullet points, colours. 

Where advanced formatting has been used to lay out a page using coloured boxes, the PDF version does a better job at replicating this than the 'Word' version. Whilst the PDF attempts to retain the original formatting, the 'Word' version displays the information in a much more linear fashion.

Links were also more usefully replicated in the PDF version. The absolute URL of all links, whether internal, external or to attachments was included within the PDF file so that it is possible to follow them to their original location (if you have the necessary permissions to view the pages). On the 'Word' versions, only external links worked in this way. Internal wiki links and links to attachments were exported as a relative link which become 'broken' once that page is taken out of its original context. 

The naming of the files that were produced is also worthy of comment. The 'Word' versions are given a name which mirrors the name of the page within the wiki space, but the naming of the PDF versions are much more useful, including the name of the wiki space itself, the page name and a date and timestamp showing when the page was exported.

Neither of these single page export formats retained the creation and last modified metadata for each page and this is something that it would be very helpful to retain.


So, if we want to preserve pages from our institutional wiki, what is the best approach?

The Space Export in HTML format is a clear winner. It reproduces the wiki pages in a reusable form that replicates the page content well. As HTML is essentially just plain text it is also a good format for long term preservation.

What impressed me about the HTML export was the fact that it retained the content, included basic creation and last modified metadata for each page and downloaded all relevant attachments, updating the links to point to these local copies.

What if someone does not have the necessary permissions to do a space export? My first suggestion would be that they ask for their permissions to be upgraded. If not, perhaps someone who does have necessary permissions could carry out the export?

If all else fails, the export of a single page using the 'Export as PDF' option could be used to provide ad hoc content for the digital archive. PDF is not the best preservation format but may be able to be converted to PDF/A. Note that any attachments would have to be exported separately and manually if this option is selected.

Final thoughts

A wiki space is a dynamic thing which can involve several different types of content - blog posts, labels/tags and comments can all be added to wiki spaces and pages. If these elements are thought to be significant then more work is required to see how they can be captured. It was apparent that comments could be captured using the HTML and XML exports and I believe blog posts can be captured individually as PDF files.

What is also available within the wiki platform itself is a very detailed Page History. Within each wiki page it is possible to view the Page History and see how a page has evolved over time - who has edited it and when those edits occurred. As far as I could see, none of the export formats included this level of information. The only exception may be the XML export but this was so difficult to view that I could not be sure either way.

So, there are limitations to all these approaches and as ever this goes back to the age-old discussion about Significant Properties. What is significant about the wiki pages? What is it that we are trying to preserve? None of the export options preserve everything. All are compromises, but perhaps some are compromises we could live with.

China – Arrival in the Middle Kingdom

Published 9 Mar 2017 by Tom Wilson in tom m wilson.

I’ve arrived in Kunming, the little red dot you can see on the map above.  I’m here to teach research skills to undergraduate students at Yunnan Normal University.  As you can see, I’ve come to a point where the foothills of the Himalayas fold up into a bunch of deep creases.  Yunnan province is the area of […]

Introducing Similarity Search at Flickr

Published 7 Mar 2017 by Clayton Mellina in

At Flickr, we understand that the value in our image corpus is only unlocked when our members can find photos and photographers that inspire them, so we strive to enable the discovery and appreciation of new photos.

To further that effort, today we are introducing similarity search on Flickr. If you hover over a photo on a search result page, you will reveal a “…” button that exposes a menu that gives you the option to search for photos similar to the photo you are currently viewing.

In many ways, photo search is very different from traditional web or text search. First, the goal of web search is usually to satisfy a particular information need, while with photo search the goal is often one of discovery; as such, it should be delightful as well as functional. We have taken this to heart throughout Flickr. For instance, our color search feature, which allows filtering by color scheme, and our style filters, which allow filtering by styles such as “minimalist” or “patterns,” encourage exploration. Second, in traditional web search, the goal is usually to match documents to a set of keywords in the query. That is, the query is in the same modality—text—as the documents being searched. Photo search usually matches across modalities: text to image. Text querying is a necessary feature of a photo search engine, but, as the saying goes, a picture is worth a thousand words. And beyond saving people the effort of so much typing, many visual concepts genuinely defy accurate description. Now, we’re giving our community a way to easily explore those visual concepts with the “…” button, a feature we call the similarity pivot.

The similarity pivot is a significant addition to the Flickr experience because it offers our community an entirely new way to explore and discover the billions of incredible photos and millions of incredible photographers on Flickr. It allows people to look for images of a particular style, it gives people a view into universal behaviors, and even when it “messes up,” it can force people to look at the unexpected commonalities and oddities of our visual world with a fresh perspective.

What is “similarity”?

To understand how an experience like this is powered, we first need to understand what we mean by “similarity.” There are many ways photos can be similar to one another. Consider some examples.

It is apparent that all of these groups of photos illustrate some notion of “similarity,” but each is different. Roughly, they are: similarity of color, similarity of texture, and similarity of semantic category. And there are many others that you might imagine as well.

What notion of similarity is best suited for a site like Flickr? Ideally, we’d like to be able to capture multiple types of similarity, but we decided early on that semantic similarity—similarity based on the semantic content of the photos—was vital to facilitate discovery on Flickr. This requires a deep understanding of image content for which we employ deep neural networks.

We have been using deep neural networks at Flickr for a while for various tasks such as object recognition, NSFW prediction, and even prediction of aesthetic quality. For these tasks, we train a neural network to map the raw pixels of a photo into a set of relevant tags, as illustrated below.

Internally, the neural network accomplishes this mapping incrementally by applying a series of transformations to the image, which can be thought of as a vector of numbers corresponding to the pixel intensities. Each transformation in the series produces another vector, which is in turn the input to the next transformation, until finally we have a vector that we specifically constrain to be a list of probabilities for each class we are trying to recognize in the image. To be able to go from raw pixels to a semantic label like “hot air balloon,” the network discards lots of information about the image, including information about appearance, such as the color of the balloon, its relative position in the sky, etc. Instead, we can extract an internal vector in the network before the final output.

For common neural network architectures, this vector—which we call a “feature vector”—has many hundreds or thousands of dimensions. We can’t necessarily say with certainty that any one of these dimensions means something in particular as we could at the final network output, whose dimensions correspond to tag probabilities. But these vectors have an important property: when you compute the Euclidean distance between these vectors, images containing similar content will tend to have feature vectors closer together than images containing dissimilar content. You can think of this as a way that the network has learned to organize information present in the image so that it can output the required class prediction. This is exactly what we are looking for: Euclidean distance in this high-dimensional feature space is a measure of semantic similarity. The graphic below illustrates this idea: points in the neighborhood around the query image are semantically similar to the query image, whereas points in neighborhoods further away are not.

This measure of similarity is not perfect and cannot capture all possible notions of similarity—it will be constrained by the particular task the network was trained to perform, i.e., scene recognition. However, it is effective for our purposes, and, importantly, it contains information beyond merely the semantic content of the image, such as appearance, composition, and texture. Most importantly, it gives us a simple algorithm for finding visually similar photos: compute the distance in the feature space of a query image to each index image and return the images with lowest distance. Of course, there is much more work to do to make this idea work for billions of images.
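That simple algorithm is easy to sketch. A toy brute-force version in Python (illustrative only, not Flickr's implementation), which is exact but O(n) per query and therefore exactly what the approximate methods described below avoid:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(query, index, k=3):
    """Exhaustively rank every index vector by distance to the query.

    `index` maps photo ids to feature vectors; returns the k closest ids.
    """
    return sorted(index, key=lambda pid: euclidean(query, index[pid]))[:k]
```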

Large-scale approximate nearest neighbor search

With an index as large as Flickr’s, computing distances exhaustively for each query is intractable. Additionally, storing a high-dimensional floating point feature vector for each of billions of images takes a large amount of disk space and poses even more difficulty if these features need to be in memory for fast ranking. To solve these two issues, we adopt a state-of-the-art approximate nearest neighbor algorithm called Locally Optimized Product Quantization (LOPQ).

To understand LOPQ, it is useful to first look at a simple strategy. Rather than ranking all vectors in the index, we can first filter a set of good candidates and only do expensive distance computations on them. For example, we can use an algorithm like k-means to cluster our index vectors, find the cluster to which each vector is assigned, and index the corresponding cluster id for each vector. At query time, we find the cluster that the query vector is assigned to and fetch the items that belong to the same cluster from the index. We can even expand this set if we like by fetching items from the next nearest cluster.
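The candidate-filtering strategy above can be sketched as follows (toy Python with hypothetical helper names, and a pre-computed list of centroids standing in for a real k-means fit):

```python
import math
from collections import defaultdict

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: euclidean(v, centroids[i]))

def build_index(vectors, centroids):
    """Map each cluster id to the list of item ids assigned to it."""
    index = defaultdict(list)
    for item_id, v in vectors.items():
        index[nearest_centroid(v, centroids)].append(item_id)
    return index

def candidates(query, centroids, index):
    """Fetch only the items in the query's cluster for expensive ranking."""
    return index[nearest_centroid(query, centroids)]
```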

This idea will take us far, but not far enough for a billions-scale index. For example, with 1 billion photos, we need 1 million clusters so that each cluster contains an average of 1000 photos. At query time, we will have to compute the distance from the query to each of these 1 million cluster centroids in order to find the nearest clusters. This is quite a lot. We can do better, however, if we instead split our vectors in half by dimension and cluster each half separately. In this scheme, each vector will be assigned to a pair of cluster ids, one for each half of the vector. If we choose k = 1000 to cluster both halves, we have k² = 1000 × 1000 = 10⁶ possible pairs. In other words, by clustering each half separately and assigning each item a pair of cluster ids, we can get the same granularity of partitioning (1 million clusters total) with only 2 × 1000 distance computations with half the number of dimensions, for a total computational savings of 1000x. Conversely, for the same computational cost, we gain a factor of k more partitions of the data space, providing a much finer-grained index.

This idea of splitting vectors into subvectors and clustering each split separately is called product quantization. When we use this idea to index a dataset it is called the inverted multi-index, and it forms the basis for fast candidate retrieval in our similarity index. Typically the distribution of points over the clusters in a multi-index will be unbalanced as compared to a standard k-means index, but this unbalance is a fair trade for the much higher resolution partitioning that it buys us. In fact, a multi-index will only be balanced across clusters if the two halves of the vectors are perfectly statistically independent. This is not the case in most real world data, but some heuristic preprocessing—like PCA-ing and permuting the dimensions so that the cumulative per-dimension variance is approximately balanced between the halves—helps in many cases. And just like the simple k-means index, there is a fast algorithm for finding a ranked list of clusters to a query if we need to expand the candidate set.
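Under the same toy assumptions, assigning a pair of cluster ids can be sketched like this (illustrative names; real product quantization typically uses more splits):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(v, centroids):
    return min(range(len(centroids)), key=lambda i: euclidean(v, centroids[i]))

def pq_code(v, codebook_a, codebook_b):
    """Assign a (cluster_a, cluster_b) pair by quantizing each half separately.

    With k centroids per half this costs 2k distance computations while
    addressing k * k distinct cells, the saving described above.
    """
    half = len(v) // 2
    return nearest(v[:half], codebook_a), nearest(v[half:], codebook_b)
```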

After we have a set of candidates, we must rank them. We could store the full vector in the index and use it to compute the distance for each candidate item, but this would incur a large memory overhead (for example, 256 dimensional vectors of 4 byte floats would require 1 TB for 1 billion photos) as well as a computational overhead. LOPQ solves these issues by performing another product quantization, this time on the residuals of the data. The residual of a point is the difference vector between the point and its closest cluster centroid. Given a residual vector and the cluster indexes along with the corresponding centroids, we have enough information to reproduce the original vector exactly. Instead of storing the residuals, LOPQ product quantizes the residuals, usually with a higher number of splits, and stores only the cluster indexes in the index. For example, if we split the vector into 8 splits and each split is clustered with 256 centroids, we can store the compressed vector with only 8 bytes regardless of the number of dimensions to start (though certainly a higher number of dimensions will result in higher approximation error). With this lossy representation we can produce a reconstruction of a vector from the 8 byte codes: we simply take each quantization code, look up the corresponding centroid, and concatenate these 8 centroids together to produce a reconstruction. Likewise, we can approximate the distance from the query to an index vector by computing the distance between the query and the reconstruction. We can do this computation quickly for many candidate points by computing the squared difference of each split of the query to all of the centroids for that split. After computing this table, we can compute the squared difference for an index point by looking up the precomputed squared difference for each of the 8 indexes and summing them together to get the total squared difference. This caching trick allows us to quickly rank many candidates without resorting to distance computations in the original vector space.
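The table trick can be sketched as follows (toy Python with illustrative names, using a small number of splits rather than the 8 described above):

```python
def adc_tables(query, codebooks):
    """Per split, precompute squared distances from the query's sub-vector
    to every centroid in that split's codebook."""
    tables = []
    split = len(query) // len(codebooks)
    for m, codebook in enumerate(codebooks):
        sub = query[m * split:(m + 1) * split]
        tables.append([sum((q - c) ** 2 for q, c in zip(sub, centroid))
                       for centroid in codebook])
    return tables

def approx_sq_distance(code, tables):
    """Approximate squared distance to an index item from its compact code:
    one table lookup per split, summed. No full-vector arithmetic needed."""
    return sum(tables[m][c] for m, c in enumerate(code))
```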

LOPQ adds one final detail: for each cluster in the multi-index, LOPQ fits a local rotation to the residuals of the points that fall in that cluster. This rotation is simply a PCA that aligns the major directions of variation in the data to the axes, followed by a permutation to heuristically balance the variance across the splits of the product quantization. Note that this is the exact preprocessing step that is usually performed at the top-level multi-index. It tends to make the approximate distance computations more accurate by mitigating errors introduced by assuming that each split of the vector in the product quantization is statistically independent from other splits. Additionally, since a rotation is fit for each cluster, the rotations serve to fit the local data distribution better.

Below is a diagram from the LOPQ paper that illustrates the core ideas of LOPQ. K-means (a) is very effective at allocating cluster centroids, illustrated as red points, that target the distribution of the data, but it has other drawbacks at scale as discussed earlier. In the 2d example shown, we can imagine product quantizing the space with 2 splits, each with 1 dimension. Product Quantization (b) clusters each dimension independently and cluster centroids are specified by pairs of cluster indexes, one for each split. This is effectively a grid over the space. Since the splits are treated as if they were statistically independent, we will, unfortunately, get many clusters that are “wasted” by not targeting the data distribution. We can improve on this situation by rotating the data such that the main dimensions of variation are axis-aligned. This version, called Optimized Product Quantization (c), does a better job of making sure each centroid is useful. LOPQ (d) extends this idea by first coarsely clustering the data and then doing a separate instance of OPQ for each cluster, allowing highly targeted centroids while still reaping the benefits of product quantization in terms of scalability.

LOPQ is state-of-the-art for quantization methods, and you can find more information about the algorithm, as well as benchmarks, here. Additionally, we provide an open-source implementation in Python and Spark which you can apply to your own datasets. The algorithm produces a set of cluster indexes that can be queried efficiently in an inverted index, as described. We have also explored use cases that use these indexes as a hash for fast deduplication of images and large-scale clustering. These extended use cases are studied here.


We have described our system for large-scale visual similarity search at Flickr. Techniques for producing high-quality vector representations for images with deep learning are constantly improving, enabling new ways to search and explore large multimedia collections. These techniques are being applied in other domains as well to, for example, produce vector representations for text, video, and even molecules. Large-scale approximate nearest neighbor search has importance and potential application in these domains as well as many others. Though these techniques are in their infancy, we hope similarity search provides a useful new way to appreciate the amazing collection of images at Flickr and surface photos of interest that may have previously gone undiscovered. We are excited about the future of this technology at Flickr and beyond.


Yannis Kalantidis, Huy Nguyen, Stacey Svetlichnaya, Arel Cordero. Special thanks to the rest of the Computer Vision and Machine Learning team and the Vespa search team, which manages Yahoo’s internal search engine.

Thumbs.db – what are they for and why should I care?

Published 7 Mar 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Recent work I’ve been doing on the digital archive has made me think a bit more about those seemingly innocuous files that Windows (XP, Vista, 7 and 8) puts into any directory that has images in it – Thumbs.db.

Getting your folder options right helps!
Windows uses a file called Thumbs.db to create little thumbnail images of any images within a directory. It stores one of these files in each directory that contains images and it is amazing how quickly they proliferate. Until recently I wasn’t aware I had any in my digital archive at all. This is because although my preferences in Windows Explorer were set to display hidden files, the "Hide protected operating system files" option also needs to be disabled in order to see files such as these.

The reason I knew I had all these Thumbs.db files was through a piece of DROID analysis work published last month. Thumbs.db ranked at number 12 in my list of the most frequently occurring file formats in the digital archive. I had 210 of these files in total. I mentioned at the time that I could write a whole blog post about this, so here it is!

Do I really want these in the digital archive? In my mind, what is in the ‘original’ folders within the digital archive should be what OAIS would call the Submission Information Package (SIP). Just those files that were given to us by a donor or depositor. Not files that were created subsequently by my own operating system.

Though they are harmless enough, they can be a bit irritating. Firstly, when I’m trying to run reports on the contents of the archive, the number of files for each archive is skewed by the Thumbs.db files that are not really a part of the archive. Secondly, and perhaps more importantly, I was trying to create a profile of the dates of files within the digital archive (admittedly not an exact science when using last modified dates) and the span of dates for each individual archive that we hold. The presence of Thumbs.db files in each archive that contained images gave the false impression that all of the archives had had content added relatively recently, when in fact all that had happened was that a Thumbs.db file had automatically been added when I transferred the data to the digital archive filestore. It took me a while to realise this - gah!

So, what to do? First I needed to work out how to stop them being created.

After a bit of googling I quickly established the fact that I didn’t have the necessary permissions to be able to disable this default behaviour within Windows so I called in the help of IT Services.

IT clearly thought this was a slightly unusual request, but made a change to my account which now stops these thumbnail images being created by me. Since I am the only person who has direct access to the born-digital material within the archive, this should solve that problem.

Now I can systematically remove the files. This means that they won’t skew any future reports I run on numbers of files and last modified dates.
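For the systematic removal itself, a recursive find is the simplest approach if you can reach the filestore from a Unix-like shell (a sketch only: the archive path below is hypothetical, and it is worth listing the matches before deleting anything):

```shell
# dry run: list every Thumbs.db under the archive root (path is hypothetical)
find /path/to/digital-archive -type f -name 'Thumbs.db'

# once the list looks right, delete them
find /path/to/digital-archive -type f -name 'Thumbs.db' -delete
```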

Perhaps once we get a proper digital archiving system in place here at the Borthwick we won’t need to worry about these issues as we won’t directly interact with the archive filestore? Archivematica will package up the data into an AIP and put it on the filestore for me.

However, I will say that now IT have stopped the use of Thumbs.db from my account I am starting to miss them. This setting applies to my own working filestore as well as the digital archive. It turns out that it is actually incredibly useful to be able to see thumbnails of your image files before double clicking on them! Perhaps I need to get better at practicing what I preach and make some improvements to how I name my own image files – without a preview thumbnail, an image file *really* does benefit from a descriptive filename!

As always, I'm interested to hear how other people tackle Thumbs.db and any other system files within their digital archives.

This Month’s Writer’s Block

Published 7 Mar 2017 by Dave Robertson in Dave Robertson.



Published 6 Mar 2017 by timbaker in Tim Baker.

The image on the left was taken a year ago when I had to renew my driver’s license, so I am stuck with it for the next 10 years. I don’t mind so much as it reminds me how far I’ve come. The photo on...

Week #7: 999 assists and no more kneeling

Published 4 Mar 2017 by legoktm in The Lego Mirror.

Joe Thornton is one assist away from reaching 1,000 in his career. He's a team player - the recognition of scoring a goal doesn't matter to him, he just wants his teammates to score. And his teammates want him to achieve this milestone too, as shown by Sharks passing to Thornton and him passing back instead of them going directly for the easy empty netter.

Oh, and now that the trade deadline has passed with no movement on the goalie front, it's time for In Jones We Trust:

via /u/MisterrAlex on reddit

In other news, Colin Kaepernick announced that he's going to be a free agent and opted out of the final year of his contract. But in even bigger news, he said he will stop kneeling for the national anthem. I don't know if he is doing that to make himself more marketable, but I wish he would have stood (pun intended) with his beliefs.

FastMail Customer Stories – CoinJar

Published 2 Mar 2017 by David Gurvich in FastMail Blog.

Welcome to our first Customer Story video for 2017 featuring CoinJar Co-Founder and CEO Asher Tan.

CoinJar is Australia’s largest Bitcoin exchange and wallet, and it was while participating in a startup accelerator program that Asher had the idea for creating an easier way to buy, sell and spend the digital currency Bitcoin.

“We had decided to work on some Bitcoin ideas in the consumer space, which were quite lacking at the time,” Asher says.

Participating in the startup process was instrumental in helping Asher and his Co-Founder Ryan Zhou to really hone in on what type of business they needed to build.

CoinJar launched in Melbourne in 2013 and despite experiencing rapid success, Asher is quick to point out that his is a tech business that’s still working within a very new industry.

“It’s a very new niche industry and finding what works as a business, what people want, I think is an ongoing process. You’re continually exploring, but I think that’s what makes it exciting,” Asher says.

Asher says that one of the great things about launching a startup is you can choose the tools you want. Initially starting out with another email provider, Asher and Ryan were soon underwhelmed by both the performance and cost.

“The UI was pretty slow, the package was pretty expensive as well. There was also a lack of flexibility of some of the tools we wanted to use … so we were looking for other options and FastMail came up,” Asher says.

And while most of CoinJar’s business tools are self-hosted, they decided that FastMail was going to be the best choice to meet their requirements for secure, reliable and private email hosting.

Today CoinJar has team members all around the world and uses FastMail’s calendar and timezone feature to keep everyone working together.

CoinJar continues to innovate, recently launching a debit card that allows their customers to buy groceries using Bitcoin.

We’d like to thank Asher for his time and also Ben from Benzen Video Productions for helping us to put this story together.

You can learn more about CoinJar at

Songs for the Beeliar Wetlands

Published 2 Mar 2017 by Dave Robertson in Dave Robertson.

The title track of the forthcoming Kiss List album has just been included on an awesome fundraising compilation of 17 songs by local songwriters for the Beeliar wetlands. All proceeds go to #rethinkthelink. Get it while it's hot! You can purchase the whole album or just the songs you like.

Songs for the Beeliar Wetlands: Original Songs by Local Musicians (Volume 1) by Dave Robertson and The Kiss List


Stepping Off Meets the Public

Published 1 Mar 2017 by Tom Wilson in tom m wilson.

At the start of February I launched my new book, Stepping Off: Rewilding and Belonging in the South-West, at an event at Clancy’s in Fremantle.  On Tuesday evening this week I was talking about the book down at Albany Library.     As I was in the area I decided to camp for a couple of […]

Digital Deli, reading history in the present tense

Published 1 Mar 2017 by Carlos Fenollosa in Carlos Fenollosa — Blog.

Digital Deli: The Comprehensive, User Lovable Menu Of Computer Lore, Culture, Lifestyles, And Fancy is an obscure book published in 1984. I found out about it after learning that the popular Steve Wozniak article titled "Homebrew and How the Apple Came to Be" belonged to a compilation of short articles.

The book

I'm amazed that this book isn't more cherished by the retrocomputing community, as it provides an incredible insight into the state of computers in 1984. We've all read books about their history, but Digital Deli provides a unique approach: it's written in present tense.

Articles are written with a candid and inspiring narrative. Micro computers were new back then, and the authors could only speculate about how they might change the world in the future.

The book is sensibly structured into sections which cover topics from the origins of computing and Silicon Valley startups to reviews of specific systems. But the most interesting parts for me are not the tech articles, but rather the sociological essays.

There are texts on how families welcome computers into the home, the applications of artificial intelligence, micros on Wall Street and computers in the classroom.

How the Source works

Fortunately, a copy of the book has been preserved online, and I highly encourage you to check it out.

Besides Woz explaining how Apple was founded, don't miss out on Paul Lutus describing how he programmed AppleWriter in a cabin in the woods, Les Solomon envisioning the "magic box" of computing, Ted Nelson on information exchange and his Project Xanadu, Nolan Bushnell on video games, Bill Gates on software usability, the origins of the Internet... the list goes on and on.

Les Solomon

If you love vintage computing you will find a fresh perspective, and if you were alive during the late 70s and early 80s you will feel a big nostalgia hit. In any case, do yourself a favor, grab a copy of this book, and keep it as a manifesto of the greatest revolution in computer history.

Tags: retro, books

Comments? Tweet  

Week #6: Barracuda win streak is great news for the Sharks

Published 24 Feb 2017 by legoktm in The Lego Mirror.

The San Jose Barracuda, the Sharks AHL affiliate team, is currently riding a 13 game winning streak, and is on top of the AHL — and that's great news for the Sharks.

Ever since the Barracuda moved here from Worcester, Mass., it's only been great news for the Sharks. Because they play in the same stadium, sending players up or down becomes as simple as a little paperwork and asking them to switch locker rooms, not cross-country flights.

This allows the Sharks to have a significantly deeper roster, since they can call up new players at a moment's notice. So the Barracuda's win streak is great news for Sharks fans, since it demonstrates how even the minor league players are ready to play in the pros.

And if you're watching hockey, be on the watch for Joe Thornton to score his 1,000 assist! (More on that next week).

How can I keep mediawiki not-yet-created pages from cluttering my google webmaster console with 404s?

Published 24 Feb 2017 by Sean in Newest questions tagged mediawiki - Webmasters Stack Exchange.

We have a MediaWiki install as part of our site. As on all wikis, people will add links for not-yet-created pages (red links). When followed, these links return a 404 status (as there is no content) along with an invite to add content.

I'm now getting buried in 404 notices in the Google Webmaster console for this site. Is there a best way to handle this?

Thanks for any help.
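Not part of the original question, but one commonly suggested mitigation is to keep crawlers away from the edit URLs that red links point to (they look like index.php?title=Page&action=edit&redlink=1), assuming real articles are served under a separate path such as /wiki/. A robots.txt sketch:

```
User-agent: *
Disallow: /index.php?
```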

The Other Half

Published 24 Feb 2017 by Jason Scott in ASCII by Jason Scott.

On January 19th of this year, I set off to California to participate in a hastily-arranged appearance in a UCLA building to talk about saving climate data in the face of possible administrative switchover. I wore a fun hat, stayed in a nice hotel, and saw an old friend from my MUD days for dinner. The appearance was a lot of smart people doing good work and wanting to continue with it.

While there, I was told my father’s heart surgery, which had some complications, was going to require an extended stay and we were running out of relatives and companions to accompany him. I booked a flight for seven hours after I’d arrive back in New York to go to North Carolina and stay with him. My father has means, so I stayed in a good nearby hotel room. I stayed with him for two and a half weeks, booking ten to sixteen hour days to accompany him through a maze of annoyances, indignities, smart doctors, variant nurses ranging from saints to morons, and generally ensure his continuance.

In the middle of this, I had a non-movable requirement to move the manuals out of Maryland and send them to California. Looking through several possibilities, I settled with: Drive five hours to Maryland from North Carolina, do the work across three days, and drive back to North Carolina. The work in Maryland had a number of people helping me, and involved pallet jacks, forklifts, trucks, and crazy amounts of energy drinks. We got almost all of it, with a third batch ready to go. I drove back the five hours to North Carolina and caught up on all my podcasts.

I stayed with my father another week and change, during which I dented my rental car, and hit another hard limit: I was going to fly to Australia. I also, to my utter horror, realized I was coming down with some sort of cold/flu. I did what I could – stabilized my father’s arrangements, went into the hotel room, put on my favorite comedians in a playlist, turned out the lights, drank 4,000mg of Vitamin C, banged down some orange juice, drank Mucinex, and covered myself in 5 blankets. I woke up 15 hours later in a pool of sweat and feeling like I’d crossed the boundary with that disease. I went back to the hospital to assure my dad was OK (he was), and then prepped for getting back to NY, where I discovered almost every flight for the day was booked due to so many cancelled flights the previous day.

After lots of hand-wringing, I was able to book a very late flight from North Carolina to New York, and stayed there for 5 hours before taking a 25 hour two-segment flight through Dubai to Melbourne.

I landed in Melbourne on Monday the 13th of February, happy that my father was stable back in the US, and prepping for my speech and my other commitments in the area.

On Tuesday I had a heart attack.

We know it happened then, or began to happen, because of the symptoms I started to show – shortness of breath, a feeling of fatigue and an edge of pain that covered my upper body like a jacket. I was fucking annoyed – I felt like I was just super tired and needed some energy, and energy drinks and caffeine weren’t doing the trick.

I met with my hosts for the event I’d do that Saturday, and continued working on my speech.

I attended the conference for that week, did a couple interviews, saw some friends, took some nice tours of preservation departments and discussed copyright with very smart lawyers from the US and Australia.

My heart attack continued, blocking off what turned out to be a quarter of my bloodflow to my heart.

This was annoying me, but I didn’t know what it was, so according to my Fitbit I walked 25 miles, climbed 100 flights of stairs, and maintained hours of exercise across the week, trying to snap out of it.

I did a keynote for the conference. The next day I hosted a wonderful event for seven hours. I asked for a stool because I said I was having trouble standing comfortably. They gave me one. I took rests during it, just so the DJ could get some good time with the crowds. I was praised for keeping the crowd jumping and giving it great energy. I’d now been having a heart attack for four days.

That Sunday, I walked around Geelong, a lovely city near Melbourne, and ate an exquisite meal at Igni, a restaurant whose menu basically has one line to tell you you’ll be eating what they think you should have. Their choices were excellent. Multiple times during the meal, I dozed a little, as I was fatigued. When we got to the tram station, I walked back to the apartment to get some rest. Along the way, I fell to the sidewalk and got up after resting.

I slept off more of the growing fatigue and pain.

The next day I had the second exquisite meal of the trip at Vue de monde, a meal that lasted from about 8pm to midnight. My partner Rachel loves good meals and this is one of the finest you can have in the city, and I enjoyed it immensely. It would have been a fine last meal. I’d now been experiencing a heart attack for about a week.

That night, I had a lot of trouble sleeping. The pain was now a complete jacket of annoyance on my body, and there was no way to rest that didn’t feel awful. I decided medical attention was needed.

The next morning, Rachel and I walked 5 blocks to a clinic, found it was closed, and walked further to the RealCare Health Clinic. I was finding it very hard to walk at this point. Dr. Edward Petrov saw me, gave me some therapy for reflux, found it wasn’t reflux, and got concerned, especially as having my heart checked might cost me something significant. He said he had a cardiologist friend who might help, and he called him, and it was agreed we could come right over.

We took a taxi over to Dr. Georg Leitl’s office. He saw me almost immediately.

He was one of those doctors that only needed to take my blood pressure and check my heart with a stethoscope for 30 seconds before looking at me sadly. We went to his office, and he told me I could not possibly get on the plane I was leaving on in 48 hours. He also said I needed to go to Hospital very quickly, and that I had some things wrong with me that needed attention.

He had his assistants measure my heart and take an ultrasound, wrote something on a notepad, put all the papers in an envelope with the words “SONNY PALMER” on them, and drove me personally over in his car to St. Vincent’s Hospital.

Taking me up to the cardiology department, he put me in the waiting room of the surgery, talked to the front desk, and left. I waited 5 anxious minutes, and then was brought into a room with two doctors, one of whom turned out to be Dr. Sonny Palmer.

Sonny said Georg thought I needed some help, and I’d be checked within a day. I asked if he’d seen the letter with his name on it. He hadn’t. He went and got it.

He came back and said I was going to be operated on in an hour.

He also explained I had a rather blocked artery in need of surgery. Survival rate was very high. Nerve damage from the operation was very unlikely. I did not enjoy phrases like survival and nerve damage, and I realized what might happen very shortly, and what might have happened for the last week.

I went back to the waiting room, where I tweeted what might have been my possible last tweets, left a message for my boss Alexis on the slack channel, hugged Rachel tearfully, and then went into surgery, or potential oblivion.

Obviously, I did not die. The surgery was done with me awake, and involved making a small hole in my right wrist, where Sonny (while blasting Bon Jovi) went in with a catheter, found the blocked artery, installed a 30mm stent, and gave back the blood to the quarter of my heart that was choked off. I listened to instructions on when to talk or when to hold myself still, and I got to watch my beating heart on a very large monitor as it got back its function.

I felt (and feel) legions better, of course – surgery like this rapidly improves life. Fatigue is gone, pain is gone. It was also explained to me what to call this whole event: a major heart attack. I damaged the heart muscle a little, although that bastard was already strong from years of high blood pressure and I’m very young comparatively, so the chances of recovery to the point of maybe even being healthier than before are pretty good. The hospital, St. Vincent’s, was wonderful – staff, environment, and even the food (including curry and afternoon tea) were a delight. My questions were answered, my needs met, and everyone felt like they wanted to be there.

It’s now been 4 days. I was checked out of the hospital yesterday. My stay in Melbourne was extended two weeks, and my hosts (MuseumNext and ACMI) paid for basically all of the additional AirBNB that I’m staying at. I am not cleared to fly until the two weeks is up, and I am now taking six medications. They make my blood thin, lower my blood pressure, cure my kidney stones/gout, and stabilize my heart. I am primarily resting.

I had lost a lot of weight and I was exercising, but my cholesterol was a lot worse than anyone really figured out. The drugs and lifestyle changes will probably help knock that back, and I’m likely to adhere to them, unlike a lot of people, because I’d already been on a whole “life reboot” kick. The path that follows is, in other words, both pretty clear and going to be taken.

Had I died this week, at the age of 46, I would have left behind a very bright, very distinct and rather varied life story. I’ve been a bunch of things, some positive and negative, and projects I’d started would have lived quite neatly beyond my own timeline. I’d have also left some unfinished business here and there, not to mention a lot of sad folks and some extremely quality-variant eulogies. Thanks to a quirk of the Internet Archive, there’s a little statue of me – maybe it would have gotten some floppy disks piled at its feet.

Regardless, I personally would have been fine on the accomplishment/legacy scale, if not on the first-person/relationships/plans scale. That my Wikipedia entry is going to have a different date on it than February 2017 is both a welcome thing and a moment to reflect.

I now face the Other Half, whatever events and accomplishments and conversations I get to engage in from this moment forward, and that could be anything from a day to 100 years.

Whatever and whenever that will be, the tweet I furiously typed out on cellphone as a desperate last-moment possible-goodbye after nearly a half-century of existence will likely still apply:

“I have had a very fun time. It was enormously enjoyable, I loved it all, and was glad I got to see it.”


Three takeaways to understand Cloudflare's apocalyptic-proportions mess

Published 24 Feb 2017 by Carlos Fenollosa in Carlos Fenollosa — Blog.

It turns out that Cloudflare's proxies have been dumping uninitialized memory containing the plaintext of HTTPS traffic for an indeterminate amount of time. If you're not familiar with the topic, let me summarize it: this is the worst crypto news in the last 10 years.

As usual, I suggest you read the HN comments to understand the scandalous magnitude of the bug.

If you don't see this as a news-opening piece on TV it only confirms that journalists know nothing about tech.

How bad is it, really? Let's see

I'm finding private messages from major dating sites, full messages from a well-known chat service, online password manager data, frames from adult video sites, hotel bookings. We're talking full HTTPS requests, client IP addresses, full responses, cookies, passwords, keys, data, everything

If the bad guys didn't find the bug before Tavis, you may be in the clear. However, as usual in crypto, you must assume that any data you submitted through a Cloudflare HTTPS proxy has been compromised.

Three takeaways

The first takeaway: crypto may be mathematically perfect, but humans err and implementations are not. Just because something uses strong crypto doesn't mean it's immune to bugs.

The second takeaway: MITMing the entire Internet doesn't sound so compelling when you put it that way. Sorry to be that guy, but this only confirms that the centralization of the Internet by big companies is a bad idea.

The third takeaway: change all your passwords. Yep. It's really that bad. Your passwords and private requests may be stored somewhere, on a proxy or on a malicious actor's servers.

Well, at least change your banking ones, important services like email, and master passwords on password managers -- you're using one, right? RIGHT?

You can't get back any personal info that got leaked but at least you can try to minimize the aftershock.

Update: here is a provisional list of affected services. Download the full list, export your password manager data into a csv file, and compare both files by using grep -f sorted_unique_cf.txt your_passwords.csv.
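As a concrete sketch of that comparison (the filenames here are the ones the post assumes; the layout of your password manager's CSV export will vary):

```shell
# dedupe the affected-domain list, then flag any rows of the export that mention one
sort -u affected_domains.txt > sorted_unique_cf.txt
grep -f sorted_unique_cf.txt your_passwords.csv || echo "no matches found"
```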

Afterwards, check the list of potentially affected iOS apps

Let me conclude by saying that unless you were the victim of a targeted attack it's improbable that this bug is going to affect you at all. However, that small probability is still there. Your private information may be cached somewhere or stored on a hacker's server, waiting to be organized and leaked with a flashy slogan.

I'm really sorry about the overly dramatic post, but this time it's for real.

Tags: security, internet, news

Comments? Tweet  

DigitalOcean, Your Data, and the Cloudflare Vulnerability

Published 23 Feb 2017 by Nick Vigier in DigitalOcean: Cloud computing designed for developers.

Over the course of the last several hours, we have received a number of inquiries about the Cloudflare vulnerability reported on February 23, 2017. Since the information release, we have been told by Cloudflare that none of our customer data has appeared in search caches. The DigitalOcean security team has done its own research into the issue, and we have not found any customer data present in the breach.

Out of an abundance of caution, DigitalOcean's engineering teams have reset all session tokens for our users, which will require that you log in again.

We recommend that you do the following to further protect your account:

Again, we would like to reiterate that there is no evidence that any customer data has been exposed as a result of this vulnerability, but we care about your security, so we are taking this precaution as well as continuing to monitor the situation.

Nick Vigier, Director of Security

The localhost page isn’t working on MediaWiki

Published 23 Feb 2017 by hasanghaforian in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I want to use Widget PDF to embed PDF files on my MediaWiki pages. So I first installed Extension:Widgets on MediaWiki, and it seems to be installed (I can see it in the installed extensions list in Special:Version of the wiki). Then I copied and pasted the entire source of the PDF widget code page into a page called Widget:PDF on my wiki:

<big>This widget allows you to '''embed PDF files''' on your wiki page.</big>

Created by [ Wilhelm Bühler] and adapted by [ Karsten Hoffmeyer].

== Using this widget ==
For information on how to use this widget, see [ widget description page on].

== Copy to your site ==
To use this widget on your site, just install [ MediaWiki Widgets extension] and copy the [{{fullurl:{{FULLPAGENAME}}|action=edit}} full source code] of this page to your wiki as page '''{{FULLPAGENAME}}'''.
</noinclude><includeonly><object class="pdf-widget" data="<!--{$url|validate:url}-->" type="application/pdf" wmode="transparent" style="z-index: 999; height: 100%; min-height: <!--{$height|escape:'html'|default:680}-->px; width: 100%; max-width: <!--{$width|escape:'html'|default:960}-->px;"><param name="wmode" value="transparent">
<p>Currently your browser does not use a PDF plugin. You may however <a href="<!--{$url|validate:url}-->">download the PDF file</a> instead.</p></object></includeonly>

My PDF file is under this URL:


And its name is File:GraphicsandAnimations-Devoxx2010.pdf. So, as described here, I added this code to my wiki page:


But this error occured:

The localhost page isn’t working
localhost is currently unable to handle this request. 

What I did:

  1. I also tried this (the original example of the Widget PDF)


    But the result was the same.

  2. I read Extension talk:Widgets but did not find anything.

  3. I opened Chrome DevTools (Ctrl+Shift+I), but there was no error.

How can I solve the problem?


After some time, I tried to uninstall Widget PDF and Extension:Widgets and reinstall them. So I removed the Extension:Widgets files/folder from $IP/extensions/ and also deleted the Widget:PDF page from the wiki. Then I installed Extension:Widgets again, but now I cannot open the wiki pages at all (I see the above error again), unless I delete require_once "$IP/extensions/Widgets/Widgets.php"; from LocalSettings.php. So I cannot even try to load Extension:Widgets.

Now I see this error in DevTools:

Failed to load resource: the server responded with a status of 500 (Internal Server Error)

Also, after uninstalling Extension:Widgets, I tried Extension:PDFEmbed and unfortunately saw the above error again.

Mediawiki doesn't send any email

Published 19 Feb 2017 by fpiette in Newest questions tagged mediawiki - Ask Ubuntu.

My MediaWiki installation (1.28.0, PHP 7.0.13) doesn't send any email, and yet no error is emitted. I checked using the Special:EmailUser page.

What I have tried: 1) A simple PHP script to send a mail using PHP's mail() function. It works. 2) I turned on the PHP mail log. There is a normal line for each MediaWiki email "sent".

PHP is configured (correctly since it works) to send email using Linux SendMail. MediaWiki is not configured to use direct SMTP.

Any suggestion appreciated. Thanks.
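Not something the asker has confirmed, but a common workaround when local sendmail delivery silently fails is to point MediaWiki directly at an SMTP server via $wgSMTP in LocalSettings.php (the host and credentials below are placeholders):

```php
// Hypothetical SMTP server and credentials -- substitute your own
$wgSMTP = [
    'host'     => 'ssl://smtp.example.com', // mail server to relay through
    'IDHost'   => 'example.com',            // host used when building Message-IDs
    'port'     => 465,
    'auth'     => true,
    'username' => 'wiki@example.com',
    'password' => 'secret',
];
```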

Week #5: Politics and the Super Bowl – chewing a pill too big to swallow

Published 17 Feb 2017 by legoktm in The Lego Mirror.

For a little change, I'd like to talk about the impact of sports upon us this week. The following opinion piece was first written for La Voz, and can also be read on their website.

Super Bowl commercials have become the latest victim of extreme politicization. Two commercials stood out from the rest by featuring pro-immigrant advertisements in the midst of a political climate deeply divided over immigration law. Specifically, Budweiser aired a mostly fictional story of their founder traveling to America to brew, while 84 Lumber’s ad followed a mother and daughter’s odyssey to America in search of a better life.

The widespread disdain toward non-white outsiders, which in turn has created massive backlash toward these advertisements, is no doubt repulsive, but caution should also be exercised when critiquing the placement of such politicization. Understanding the complexities of political institutions and society is no doubt essential, yet it is alarming that every facet of society has become so politicized; ironically, this desire to achieve an elevated political consciousness actually turns many off from the importance of politics.

Football — what was once simply a calming means of unwinding from the harsh winds of an oppressive world — has now become another headline news center for political drama.

Former President George H. W. Bush and his wife practically wheeled themselves out of a hospital to prepare for hosting the game. New England Patriots owner, Robert Kraft, and quarterback, Tom Brady, received sharp criticism for their support of Donald Trump, even to the point of losing thousands of dedicated fans.

Meanwhile, the NFL Players Association publicly opposed President Trump’s immigration ban three days before the game, with the NFLPA’s president saying “Our Muslim brothers in this league, we got their backs.”

Let’s not forget the veterans and active service members that are frequently honored before NFL games, except that’s an advertisement too – the Department of Defense paid NFL teams more than $5 million across four years for those promotions.

Even though it’s America’s pastime, football, like other similar mindless outlets, plays the role of allowing us to escape whenever we need a break from reality, and for nearly three hours on Sunday, America got its break, except for those commercials. If we keep getting nagged about an issue, even one we generally support, it will eventually become incessant to the point of promoting nihilism.

When Meryl Streep spoke out at the Golden Globes, she turned a relaxing evening of celebrity fawning into a political shitstorm that redirected all attention back toward Trump controversies. Even though she was mostly correct, the efficacy becomes questionable after such repetition, as many become desensitized.

Politics are undoubtedly more important than ever now, but for our sanity’s sake, let’s keep it to a minimum in football. That means commercials too.

What have we got in our digital archive?

Published 13 Feb 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Do other digital archivists find that the work of a digital archivist rarely involves doing hands-on stuff with digital archives? Between establishing your infrastructure, writing policies and plans, and attending meetings, there is little time left for activities at the coal face. This makes it all the more satisfying when we do actually get the opportunity to work with our digital holdings.

In the past I've called for more open sharing of profiles of digital archive collections but I am aware that I have not yet done this for the contents of our born digital collections here at the Borthwick Institute for Archives. So here I try to redress that.

I ran DROID (v 6.1.5, signature file v 88, container signature 20160927) over the deposited files in our digital archive and have spent a couple of days crunching the results. Note that this just covers the original files as they have been given to us. It does not include administrative files that I have added, or dissemination or preservation versions of files that have subsequently been created.
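The number-crunching step mostly comes down to tallying the identification columns in DROID's CSV export. A minimal sketch in Python (not a tool the post uses), assuming a one-row-per-file export; the PUID and FORMAT_NAME headings are DROID's own, but the sample data here is made up:

```python
import csv
import io
from collections import Counter

def summarise(rows):
    """Tally identified formats and count files with no identification."""
    formats = Counter()
    unidentified = 0
    for row in rows:
        if row.get("PUID"):  # an empty PUID means DROID could not identify the file
            formats[(row["PUID"], row["FORMAT_NAME"])] += 1
        else:
            unidentified += 1
    return formats, unidentified

# Tiny inline sample standing in for a real export
# (in practice: csv.DictReader(open("droid_export.csv", newline="")))
sample = """PUID,FORMAT_NAME
fmt/40,Microsoft Word Document 97-2003
fmt/43,JPEG File Interchange Format
,
fmt/40,Microsoft Word Document 97-2003
"""
formats, unidentified = summarise(csv.DictReader(io.StringIO(sample)))
print(f"{unidentified} unidentified; top format: {formats.most_common(1)}")
```

From a tally like this it is straightforward to produce the summary statistics and top-ten charts discussed below.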

I was keen to see:
...and also use these results to:
  • Inform future preservation planning and priorities
  • Feed further information to the PRONOM team at The National Archives
  • Get us to Level 2 of the NDSA Levels of Digital Preservation which asks for "an inventory of file formats in use" and which until now I haven't been collating!

Digital data has been deposited with us since before I started at the Borthwick in 2012 and continues to be deposited with us today. We do not have huge quantities of digital archives here as yet (about 100GB) and digital deposits are still the exception rather than the norm. We will be looking to chase digital archives more proactively once we have Archivematica in place and appropriate workflows established.

Last modified dates (as recorded by DROID) appear to range from 1984 to 2017 with a peak at 2008. This distribution is illustrated below. Note however, that this data is not always to be trusted (that could be another whole blog post in itself...). One thing that it is fair to say though is that the archive stretches back right to the early days of personal computers and up to the present day.

Last modified dates on files in the Borthwick digital archive

Here are some of the findings of this profiling exercise:

Summary statistics

  • DROID reported that 10005 individual files were present
  • 9431 (94%) of the files were given a file format identification by DROID. This is a really good result ...or at least it seems so in comparison to my previous data profiling efforts which have focused on research data. This result is also comparable with those found within other digital archives, for example 90% at Bentley Historical Library, 96% at Norfolk Record Office and 98% at Hull University Archives
  • 9326 (99%) of those files that were identified were given just one possible identification. 1 file was given 2 different identifications (an xlsx file) and 104 files (with a .DOC extension) were given 8 identifications. In all these cases of multiple identifications, identification was done by file extension rather than signature - which perhaps explains the uncertainty

Files that were identified

So perhaps these are things I'll look into in a bit more detail if I have time in the future.

  • 90 different file formats were identified within this collection of data

  • Of the identified files 1764 (19%) were identified as Microsoft Word Document 97-2003. This was followed very closely by JPEG File Interchange Format version 1.01 with 1675 (18%) occurrences. The top 10 identified files are illustrated below:

  • This top 10 is in many ways comparable to other similar profiles that have been published recently from Bentley Historical Library, Hull University Archive and Norfolk Records Office, with high occurrences of Microsoft Word, PDF and JPEG images. In contrast, what is not so common in this profile are HTML files and GIF image files - these only just make it into the top 50. 

  • Also notable in our top ten are the Sibelius files which haven't appeared in other recently published profiles. Sibelius is musical notation software and these files appear frequently in one of our archives.

Files that weren't identified

  • Of the 574 files that weren't identified by DROID, 125 different file extensions were represented. For most of these there was just a single example of each.

  • 160 (28%) of the unidentified files had no file extension at all. Perhaps not surprisingly it is the earlier files in our born digital collection (files from the mid-80s) that are most likely to fall into this category. These were created at a time when operating systems seemed to be a little less rigorous about enforcing the use of file extensions! Approximately 80 of these files are believed to be WordStar 4.0 (PUID: x-fmt/260) which DROID would only be able to recognise by file extension. Of course, if no extension is included, DROID has little chance of being able to identify them!

  • The most common file extensions of those files that weren't identified are visible in the graph below. I need to do some more investigation into these but most come from 2 of our archives that relate to electronic music composition:

I'm really pleased to see that the vast majority of the files that we hold can be identified using current tools. This is a much better result than for our research data. Obviously there is still room for improvement so I hope to find some time to do further investigations and provide information to help extend PRONOM.

Other follow on work involves looking at system files that have been highlighted in this exercise. See for example the AppleDouble Resource Fork files that appear in the top ten identified formats. Also appearing quite high up (at number 12) were Thumbs.db files but perhaps that is the topic of another blog post. In the meantime I'd be really interested to hear from anyone who thinks that system files such as these should be retained.

Harvesting EAD from AtoM: a collaborative approach

Published 10 Feb 2017 by Jenny Mitcham in Digital Archiving at the University of York.

In a previous blog post AtoM harvesting (part 1) - it works! I described how archival descriptions within AtoM are being harvested as Dublin Core for inclusion within our University Library Catalogue.* I also hinted that this wouldn’t be the last you would hear from me on AtoM harvesting and that plans were afoot to enable much richer metadata in EAD 2002 XML (Encoded Archival Description) format to be harvested via OAI-PMH.

I’m pleased to be able to report that this work is now underway.

The University of York, along with five other organisations in the UK, has clubbed together to sponsor Artefactual Systems to carry out the necessary development work to make EAD harvesting possible. This work is scheduled for release in AtoM version 2.4 (due out in the spring).

The work is being jointly sponsored by:

We are also receiving much needed support in this project from The Archives Hub who are providing advice on the AtoM EAD and will be helping us test the EAD harvesting when it is ready. While the sponsoring institutions are all producers of AtoM EAD, The Archives Hub is a consumer of that EAD. We are keen to ensure that the archival descriptions that we enter into AtoM can move smoothly to The Archives Hub (and potentially to other data aggregators in the future), allowing the richness of our collections to be signposted as widely as possible.

Adding this harvesting functionality to AtoM will enable The Archives Hub to gather data direct from us on a regular schedule or as and when updates occur, ensuring that:

So, what are we doing at the moment?

What we are doing at the moment is good and a huge step in the right direction, but perhaps not perfect. As we work together on this project we are coming across areas where future work would be beneficial in order to improve the quality of the EAD that AtoM produces or to expand the scope of what can be harvested from AtoM. I hope to report on this in more detail at the end of the project, but in the meantime, do get in touch if you are interested in finding out more.

* It is great to see that this is working well and our Library Catalogue is now appearing in the referrer reports for the Borthwick Catalogue on Google Analytics. People are clearly following these new signposts to our archives!

Week #4: 500 for Mr. San Jose Shark

Published 9 Feb 2017 by legoktm in The Lego Mirror.

He did it: Patrick Marleau scored his 500th career goal. He truly is Mr. San Jose Shark.

I had the pleasure of attending the next home game on Saturday right after he reached the milestone in Vancouver, and nearly lost my voice cheering for Marleau. They mentioned his accomplishment once before the game and again during a break, and each time Marleau would only stand up and acknowledge the crowd cheering for him when he realized they would not stop until he did.

He's had his ups and downs, but he's truly a team player.

“I think when you hit a mark like this, you start thinking about everyone that’s helped you along the way,” Marleau said.

And on Saturday at home, Marleau assisted on both Sharks goals, helping out the teammates who had helped him reach his milestone over the past two weeks.

Congrats Marleau, and thanks for the 20 years of hockey. Can't wait to see you raise the Cup.

Simpson and his Donkey – an exhibition

Published 9 Feb 2017 by carinamm in State Library of Western Australia Blog.

Illustrations by Frané Lessac and words by Mark Greenwood share the heroic story of John Simpson Kirkpatrick in the picture book Simpson and his Donkey.  The exhibition is on display at the State Library until  27 April. 

Unpublished spread 14 for pages 32 – 33
Collection of draft materials for Simpson and his Donkey, PWC/254/18 

The original illustrations, preliminary sketches and draft materials displayed in this exhibition form part of the State Library’s Peter Williams’ collection: a collection of original Australian picture book art.

Known as ‘the man with the donkey’, Simpson was a medic who rescued wounded soldiers at Gallipoli during World War I.

The bravery and sacrifice attributed to Simpson is now considered part of the ‘Anzac legend’. It is the myth and legend of John Simpson that Frané Lessac and Mark Greenwood tell in their book.

Frané Lessac and Mark Greenwood also travelled to Anzac Cove to explore where Simpson and Duffy had worked.  This experience and their research enabled them to layer creative interpretation over historical information and Anzac legend.


On a moonless April morning, PWC254/6 

Frané Lessac is a Western Australian author-illustrator who has published over forty books for children. Frané speaks at festivals in Australia and overseas, sharing the process of writing and illustrating books. She often illustrates books by Mark Greenwood, of which Simpson and his Donkey is just one example.

Simpson and his Donkey is published by Walker Books, 2008. The original illustrations are on display in the Story Place Gallery until 27 April 2017.


Filed under: Children's Literature, community events, Exhibitions, Illustration, Picture Books, SLWA collections, SLWA displays, WA books and writers, WA history, Western Australia Tagged: children's literature, exhibitions, Frane Lessac, Mark Greenwood, Peter Williams collection, Simpson and his Donkey, State Library of Western Australia, The Story Place


Published 6 Feb 2017 by timbaker in Tim Baker.

So I’ve got this speaking gig coming up at the Pursue Your Passion conference in Bryon Bay Saturday week, February 18. And I’ve been thinking a lot about what I want to say. One of my main qualifications for this gig is my 2011 round...

Week #3: All-Stars

Published 2 Feb 2017 by legoktm in The Lego Mirror.

via /u/PAGinger on reddit

Last weekend was the NHL All-Star game and skills competition, with Brent Burns, Martin Jones, and Joe Pavelski representing the San Jose Sharks in Los Angeles. And to no one's surprise, they were all booed!

Pavelski scored a goal during the tournament for the Pacific Division, and Burns scored during the skills competition's "Four Line Challenge". But since they represented the Pacific, we have to talk about the impossible shot Mike Smith made.

And across the country, the 2017 NFL Pro Bowl (their all-star game) was happening at the same time. The Oakland Raiders had seven Pro Bowlers (tied for most from any team), and the San Francisco 49ers had...none.

In the meantime the 49ers managed to hire a former safety with no general manager experience as their new GM. It's really not clear what Jed York, the 49ers owner, is trying to do here, and why he would sign John Lynch to a six-year contract.

But really, how much worse could it get for the 49ers?

Updates to

Published 29 Jan 2017 by legoktm in The Lego Mirror.

Over the weekend I migrated and associated services over to a new server. It's powered by Debian Jessie instead of the slowly aging Ubuntu Trusty. Most services were migrated with no downtime by rsync'ing content over and then updating DNS. There was only some downtime due to needing to stop the service before copying over the database.

I did not migrate my IRC bouncer history or configuration, so I'm starting fresh. So if I'm no longer in a channel, feel free to PM me and I'll rejoin!

At the same time I moved the main homepage to MediaWiki. Hopefully that will encourage me to update the content on it more often.

Finally, the tor relay node I'm running was moved to a separate server entirely. I plan on increasing the resources allocated to it.


Published 26 Jan 2017 by legoktm in The Lego Mirror.

The only person who would dare upstage Patrick Marleau's four-goal night is Randy Hahn, with his hilarious call after Marleau's third goal to finish a natural hat-trick: "NATTY HATTY FOR PATTY". And after scoring another, Marleau became the first player to score four goals in a single period since the great Mario Lemieux did it in 1997. He's also the third Shark to score four goals in a game, joining Owen Nolan (no video available, but his hat-trick from the 1997 All-Star game is fabulous) and Tomáš Hertl.

Marleau is also ready to hit his next milestone of 500 career goals - he's at 498 right now. Every impressive stat he puts up just further solidifies him as one of the greatest hockey players of his generation. But he's still missing the one achievement that all the greats need - a Stanley Cup. The Sharks made their first trip to the Stanley Cup Finals last year, but realistically had very little chance of winning; they simply were not the better team.

The main question these days is how long Marleau and Joe Thornton will keep playing for, and if they can stay healthy until they eventually win that Stanley Cup.

Discuss this post on Reddit.

Creating an annual accessions report using AtoM

Published 24 Jan 2017 by Jenny Mitcham in Digital Archiving at the University of York.

So, it is that time of year where we need to complete our annual report on accessions for the National Archives. Along with lots of other archives across the UK we send The National Archives summary information about all the accessions we have received over the course of the previous year. This information is collated and provided online on the Accessions to Repositories website for all to see.

The creation of this report has always been a bit time consuming for our archivists, involving a lot of manual steps and some re-typing but since we have started using AtoM as our Archival Management System the process has become much more straightforward.

As I've reported in a previous blog post, AtoM does not do all that we want to do in the way of reporting via its front end.

However, AtoM has an underlying MySQL database and there is nothing to stop you bypassing the interface, looking at the data behind the scenes and pulling out all the information you need.

One of the things we got set up fairly early in our AtoM implementation project was a free MySQL client called Squirrel. Using Squirrel or another similar tool, you can view the database that stores all your AtoM data, browse the data and run queries to pull out the information you need. It is also possible to update the data using these SQL clients (very handy if you need to make any global changes to your data). All you need initially is a basic knowledge of SQL and you can start pulling some interesting reports from AtoM.

The downside of playing with the AtoM database is of course that it isn't nearly as user friendly as the front end.

It is always a bit of an adventure navigating the database structure and trying to work out how the tables are linked. Even with the help of an Entity Relationship Diagram from Artefactual creating more complex queries is ...well ....complex!

AtoM's database tables - there are a lot of them!

However, on a positive note, the AtoM user forum is always a good place to ask stupid questions and Artefactual staff are happy to dive in and offer advice on how to formulate queries. I'm also lucky to have help from more technical colleagues here in Information Services (who were able to help me get Squirrel set up and talking to the right database and can troubleshoot my queries) so what follows is very much a joint effort.

So for those AtoM users in the UK who are wrestling with their annual accessions report, here is a query that will pull out the information you need:

SELECT accession.identifier, accession_i18n.title, accession_i18n.scope_and_content, accession_i18n.received_extent_units, 
accession_i18n.location_information,
case when cast(event.start_date as char) like '%-00-00' then left(cast(event.start_date as char),4) 
else cast(event.start_date as char)
end as start_date,
case when cast(event.end_date as char) like '%-00-00' then left(cast(event.end_date as char),4) 
else cast(event.end_date as char)
end as end_date
from accession
LEFT JOIN event on event.object_id = accession.id
LEFT JOIN event_i18n on event_i18n.id = event.id
JOIN accession_i18n ON accession_i18n.id = accession.id
where accession.identifier like '2016%'
order by accession.identifier

A couple of points to make here:

  • In a previous version of the query, we included some other tables so we could also capture information about the creator of the archive. The addition of the relation, actor and actor_i18n tables made the query much more complicated and for some reason it didn't work this year. I have not attempted to troubleshoot this in any great depth for the time being as it turns out we are no longer recording creator information in our accessions records. Adding a creator record to an accessions entry creates an authority record for the creator that is automatically made public within the AtoM interface and this ends up looking a bit messy (as we rarely have time at this point in the process to work this into a full authority record that is worthy of publication). Thus as we leave this field blank in our accession record there is no benefit in trying to extract this bit of the database.
  • In an earlier version of this query there was something strange going on with the dates that were being pulled out of the event table. This seemed to be a quirk that was specific to Squirrel. A clever colleague solved this by casting the date to char format and including a case statement that will list the year when there's only a year and the full date when fuller information has been entered. This is useful because in our accession records we enter dates to different levels. 
So, once I've exported the results of this query, put them in an Excel spreadsheet and sent them to one of our archivists, all that remains for her to do is to check through the data, do a bit of tidying up, ensure the column headings match what is required by The National Archives and the spreadsheet is ready to go!
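Once the rows are out of MySQL, the export-to-spreadsheet step could equally be scripted: write the results to a CSV that opens directly in Excel, with the column headings already renamed. A hedged sketch in Python; the heading names and column mapping below are illustrative placeholders, not The National Archives' actual template:

```python
import csv

# Illustrative mapping from AtoM query columns to report headings --
# placeholder headings, not The National Archives' official template.
HEADINGS = {
    "identifier": "Accession number",
    "title": "Title",
    "scope_and_content": "Description",
    "start_date": "Covering dates (from)",
    "end_date": "Covering dates (to)",
}

def write_report(rows, path):
    """Write query result rows (dicts keyed by AtoM column name) to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADINGS.values())
        for row in rows:
            writer.writerow(row.get(col, "") for col in HEADINGS)
```

This leaves only the checking and tidying of the data itself as a manual step.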

Bromptons in Museums and Art Galleries

Published 23 Jan 2017 by Andy Mabbett in Andy Mabbett, aka pigsonthewing.

Every time I visit London, with my Brompton bicycle of course, I try to find time to take in a museum or art gallery. Some are very accommodating and will cheerfully look after a folded Brompton in a cloakroom (e.g. Tate Modern, Science Museum) or, more informally, in an office or behind the security desk (Bank of England Museum, Petrie Museum, Geffrye Museum; thanks folks).

Brompton bicycle folded

When folded, Brompton bikes take up very little space

Others, without a cloakroom, have lockers for bags and coats, but these are too small for a Brompton (e.g. Imperial War Museum, Museum of London) or they simply refuse to accept one (V&A, British Museum).

A Brompton bike is not something you want to chain up in the street, and carrying a hefty bike-lock would defeat the purpose of the bike’s portability.

Jack Wills, New Street (geograph 4944811)

This Brompton bike hire unit, in Birmingham, can store ten folded bikes each side. The design could be repurposed for use at venues like museums or galleries.

I have an idea. Brompton could work with museums — in London, where Brompton bikes are ubiquitous, and elsewhere, though my Brompton and I have never been turned away from a museum outside London — to install lockers which can take a folded Brompton. These could be inside with the bag lockers (preferred) or outside, using the same units as their bike hire scheme (pictured above).

Where has your Brompton had a good, or bad, reception?


Less than two hours after I posted this, Will Butler-Adams, MD of Brompton, replied to me on Twitter:

so now I’m reaching out to museums, in London to start with, to see who’s interested.

The post Bromptons in Museums and Art Galleries appeared first on Andy Mabbett, aka pigsonthewing.

Running with the Masai

Published 23 Jan 2017 by Tom Wilson in tom m wilson.

What are you going to do if you like tribal living and you’re in the cold winter of the Levant?  Head south to the Southern Hemisphere, and to the wilds of Africa. After leaving Israel and Jordan that is exactly what I did. I arrived in Nairobi and the first thing which struck me was […]

Week #1: Who to root for this weekend

Published 22 Jan 2017 by legoktm in The Lego Mirror.

For the next 10 weeks I'll be posting sports content related to Bay Area teams. I'm currently taking an intro to features writing class, and we're required to keep a blog that focuses on a specific topic. I enjoy sports a lot, so I'll be covering Bay Area sports teams (Sharks, Earthquakes, Raiders, 49ers, Warriors, etc.). I'll also be trialing using Reddit for comments. If it works well, I'll continue using it for the rest of my blog as well. And with that, here goes:

This week the Green Bay Packers will be facing the Atlanta Falcons in the very last NFL game at the Georgia Dome for the NFC Championship. A few hours later, the Pittsburgh Steelers will meet the New England Patriots in Foxboro competing for the AFC Championship - and this will be only the third playoff game in NFL history featuring two quarterbacks with multiple Super Bowl victories.

Neither Bay Area football team has a direct stake in this game, but Raiders and 49ers fans have a lot to root for this weekend.

49ers: If you're a 49ers fan, you want to root for the Falcons to lose. This might sound a little weird, but currently the 49ers are looking to hire Falcons offensive coordinator, Kyle Shanahan, as their new head coach. However, until the Falcons' season ends, they cannot officially hire him. And since the 49ers' general manager search depends upon having a head coach in place, they can get a two-week head start if the Falcons lose this weekend.

Raiders: Do you remember the Tuck Rule Game? If so, you'll still probably be rooting for anyone but Tom Brady, quarterback for the Patriots. If not, well, you'll probably want to root for the Steelers, who eliminated Raiders' division rival Kansas City Chiefs last weekend in one of the most bizarre playoff games. Even though the Steelers could not score a single touchdown, they topped the Chiefs' two touchdowns with a record six field goals. Raiders fans who had to endure two losses to the Chiefs this season surely appreciated how the Steelers embarrassed the Chiefs on prime time television.

Discuss this post on Reddit.

Four Stars of Open Standards

Published 21 Jan 2017 by Andy Mabbett in Andy Mabbett, aka pigsonthewing.

I’m writing this at UKGovCamp, a wonderful unconference. This post constitutes notes, which I will flesh out and polish later.

I’m in a session on open standards in government, convened by my good friend Terence Eden, who is the Open Standards Lead at Government Digital Service, part of the United Kingdom government’s Cabinet Office.

Inspired by Tim Berners-Lee’s “Five Stars of Open Data”, I’ve drafted “Four Stars of Open Standards”.

These are:

  1. Publish your content consistently
  2. Publish your content using a shared standard
  3. Publish your content using an open standard
  4. Publish your content using the best open standard

Bonus points for:

Point one, if you like, is about having your own local standard — if you publish three related data sets, for instance, be consistent between them.

Point two could simply mean agreeing a common standard with other parts of your organisation, neighbouring local authorities, or suchlike.

In points three and four, I’ve taken “open” to be the term used in the “Open Definition”:

Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness).

Further reading:

The post Four Stars of Open Standards appeared first on Andy Mabbett, aka pigsonthewing.

Supporting Software Freedom Conservancy

Published 17 Jan 2017 by legoktm in The Lego Mirror.

Software Freedom Conservancy is a pretty awesome non-profit that does some great stuff. They currently have a fundraising match going on, that was recently extended for another week. If you're able to, I think it's worthwhile to support their organization and mission. I just renewed my membership.

Become a Conservancy Supporter!

A Doodle in the Park

Published 16 Jan 2017 by Dave Robertson in Dave Robertson.

The awesome Carolyn White is doing a doodle a day, but in this case it was a doodle of Dave, with Tore and The Professor, out in the summer sun of the Manning Park Farmers and Artisan Market.


MediaWiki - powered by Debian

Published 16 Jan 2017 by legoktm in The Lego Mirror.

Barring any bugs, the last set of changes to the MediaWiki Debian package for the stretch release landed earlier this month. There are some documentation changes, and updates for changes to other, related packages. One of the other changes is the addition of a "powered by Debian" footer icon (drawn by the amazing Isarra), right next to the default "powered by MediaWiki" one.

Powered by Debian

This will only be added by default to new installs of the MediaWiki package. But existing users can just copy the following code snippet into their LocalSettings.php file (adjust paths as necessary):

# Add a "powered by Debian" footer icon
$wgFooterIcons['poweredby']['debian'] = [
    "src" => "/mediawiki/resources/assets/debian/poweredby_debian_1x.png",
    "url" => "",
    "alt" => "Powered by Debian",
    "srcset" =>
        "/mediawiki/resources/assets/debian/poweredby_debian_1_5x.png 1.5x, " .
        "/mediawiki/resources/assets/debian/poweredby_debian_2x.png 2x",

The image files are included in the package itself, or you can grab them from the Git repository. The source SVG is available from Wikimedia Commons.


Published 12 Jan 2017 by timbaker in Tim Baker.

The Pursue Your Passion conference is on in Byron Bay on Saturday February 18 and is designed for anyone wanting to make 2017 the year they really start following their dreams, living their bliss and all that good stuff. I’m one of three speakers on...

Importing pages breaks category feature

Published 10 Jan 2017 by Paul in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I just installed MediaWiki 1.27.1 and setup completes without issue on a server with Ubuntu 16.04, nginx, PHP 5.6, and MariaDB 10.1.

I created an export file with a different wiki using the Special:Export page. I then imported the articles to the new wiki using the Special:Import page. The file size is smaller than any limits and the time the operation takes to complete is much less than configured timeouts.

Before import, I have created articles and categories and everything works as expected.

However, after importing, when I create a category tag on an article, clicking the link to the category's page doesn't show the article in the category.

I am using this markup within the article to create the category:

[[Category:Category Name]]

Is this a bug or am I missing something?

Teacup – One Boy’s Story of Leaving His Homeland

Published 8 Jan 2017 by carinamm in State Library of Western Australia Blog.


“Once there was a boy who had to leave home …and find another. In his bag he carried a book, a bottle and a blanket. In his teacup he held some earth from where he used to play”

A musical performance adapted from the picture book Teacup, written by Rebecca Young and illustrated by Matt Ottley, will premiere at the State Library of Western Australia as part of Fringe Festival. 

Accompanied by musicians from Perth chamber music group Chimera Ensemble, Music Book’s Narrator Danielle Joynt and Lark Chamber Opera’s soprano composer Emma Jayakumar, the presentation of Teacup will be a truly ‘multi-modal’ performance, where the music of Matt Ottley will ‘paint’ the colours, scenery and words into life.

Performance Times:

Fri 27 January 2:30pm
Sat 28 January 10:30am, 1pm and 2:30pm
Sun 29 January 10:30am, 1pm and 2:30pm

Matt Ottley’s original paintings from the picture book Teacup form part of the State Library’s Peter Williams collection of original picture book art. The artworks will be displayed in Teacup – an exhibition in the ground floor gallery between 20 January – 24 March 2017.

Image credit: Cover illustration for Teacup, Matt Ottley, 2015. State Library of Western Australia, PWC/255/01  Reproduced in the book Teacup written by Rebecca Young with illustrations by Matt Ottley. Published by Scholastic, 2015.

This event is supported by the City of Perth 

Filed under: Children's Literature, community events, Concerts, Exhibitions, Illustration, Music, SLWA collections, SLWA displays, SLWA events, SLWA Exhibitions, Uncategorized Tagged: exhibitions, Matt Ottley, Music Book Stories Inc., Peter Williams collection, State Library of Western Australia, Teacup - One Boy's Story of Leaving His Homeland

Big Tribes

Published 5 Jan 2017 by Tom Wilson in tom m wilson.

In Jerusalem yesterday I encountered three of the most sacred sites of some of the biggest religions on earth. First the Western Wall, the most sacred site for Jews worldwide. Then after some serious security checks and long wait in a line we were allowed up a long wooden walkway, up to the Temple Mount.   […]

A Year Without a Byte

Published 4 Jan 2017 by Archie Russell in

One of the largest cost drivers in running a service like Flickr is storage. We’ve described multiple techniques to get this cost down over the years: use of COS, creating sizes dynamically on GPUs and perceptual compression. These projects have been very successful, but our storage cost is still significant.
At the beginning of 2016, we challenged ourselves to go further — to go a full year without needing new storage hardware. Using multiple techniques, we got there.

The Cost Story

A little back-of-the-envelope math shows storage costs are a real concern. On a very high-traffic day, Flickr users upload as many as twenty-five million photos. These photos require an average of 3.25 megabytes of storage each, totalling over 80 terabytes of data. Stored naively in a cloud service similar to S3, this day’s worth of data would cost over $30,000 per year, and continue to incur costs every year.
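That envelope math is easy to reproduce. A quick sketch, assuming the $0.03 per gigabyte-month S3 list price cited later in this post (actual pricing varies with tier and year, which is why the result lands just shy of the round figure above):

```python
# Back-of-the-envelope check of the peak-day storage figures above.
photos_per_day = 25_000_000
mb_per_photo = 3.25
price_per_gb_month = 0.03  # assumed S3-like list price (2014 figure)

gb_per_day = photos_per_day * mb_per_photo / 1000   # MB -> GB
tb_per_day = gb_per_day / 1000                      # GB -> TB
annual_cost = gb_per_day * price_per_gb_month * 12  # one day's uploads, stored for a year

print(f"{tb_per_day:.2f} TB/day, ~${annual_cost:,.0f}/year")  # ~81 TB, ~$29k
```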

And a very large service will have over two hundred million active users. At a thousand images each, storage in a service similar to S3 would cost over $250 million per year (or $1.25 / user-year) plus network and other expenses. This compounds as new users sign up and existing users continue to take photos at an accelerating rate. Thankfully, our costs, and every large service’s costs, are different than storing naively at S3, but remain significant.

Cost per byte has decreased, but bytes per image from iPhone-type platforms have increased. Cost per image hasn't changed significantly.

Storage costs do drop over time. For example, S3 costs dropped from $0.15 per gigabyte-month in 2009 to $0.03 per gigabyte-month in 2014, and cloud storage vendors have added low-cost options for data that is infrequently accessed. NAS vendors have also delivered large price reductions.

Unfortunately, these lower costs per byte are counteracted by other forces. On iPhones, increasing camera resolution, burst mode and the addition of short animations (Live Photos) have increased bytes-per-image rapidly enough to keep storage cost per image roughly constant. And iPhone images are far from the largest.

In response to these costs, photo storage services have pursued a variety of product options. To name a few: storing lower quality images or re-compressing, charging users for their data usage, incorporating advertising, selling associated products such as prints, and tying storage to purchases of handsets.

There are also a number of engineering approaches to controlling storage costs. We sketched out a few and cover three that we implemented below: adjusting thresholds on our storage systems, rolling out existing savings approaches to more images, and deploying lossless JPG compression.

Adjusting Storage Thresholds

As we dug into the problem, we looked at our storage systems in detail. We discovered that our settings were based on assumptions about high write and delete loads that didn't hold. Our storage is pretty static. Users only rarely delete or change images once uploaded. We also had two distinct areas of just-in-case space. 5% of our storage was reserved space for snapshots, useful for undoing accidental deletes or writes, and 8.5% was held free in reserve. This resulted in about 13% of our storage going unused. Trade lore states that disks should remain 10% free to avoid performance degradation, but we found 5% to be sufficient for our workload. So we combined our two just-in-case areas into one and reduced our free space threshold to that level. This was our simplest approach to the problem (by far), but it resulted in a large gain. With a couple of simple configuration changes, we freed up more than 8% of our storage.
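A minimal sketch of the freed-space arithmetic, assuming the two just-in-case areas were simply collapsed into a single 5% reserve (the post reports "more than 8%", consistent with the 8.5% this gives):

```python
# Freed-space arithmetic from the threshold adjustment, using the
# post's own figures. Fractions are of total storage capacity.

snapshot_reserve = 0.05    # reserved for snapshots (undo deletes/writes)
free_reserve = 0.085       # held free to avoid performance degradation

unused_before = snapshot_reserve + free_reserve   # ~13.5% unused
combined_after = 0.05      # single combined just-in-case area

freed = unused_before - combined_after
print(f"Unused before: {unused_before:.1%}")   # "about 13%"
print(f"Storage freed: {freed:.1%}")           # "more than 8%"
```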

Adjusting storage thresholds

Extending Existing Approaches

In our earlier posts, we have described dynamic generation of thumbnail sizes and perceptual compression. Combining the two approaches decreased thumbnail storage requirements by 65%, though we hadn’t applied these techniques to many of our images uploaded prior to 2014. One big reason for this: large-scale changes to older files are inherently risky, and require significant time and engineering work to do safely.

Because we were concerned that further rollout of dynamic thumbnail generation would place a heavy load on our resizing infrastructure, we targeted only thumbnails from less-popular images for deletes. Using this approach, we were able to handle our complete resize load with just four GPUs. The process put a heavy load on our storage systems; to minimize the impact we randomized our operations across volumes. The entire process took about four months, resulting in even more significant gains than our storage threshold adjustments.
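The randomization step can be sketched as follows. The function and volume names here are illustrative assumptions; the post only says operations were randomized across volumes to spread the storage load:

```python
import random

def randomized_work_order(items_by_volume):
    """Flatten per-volume work lists and shuffle them, so that
    consecutive delete/resize operations land on different storage
    volumes and no single volume absorbs a sustained I/O burst.
    (Hypothetical sketch; names are not from the Flickr codebase.)"""
    work = [(volume, item)
            for volume, items in items_by_volume.items()
            for item in items]
    random.shuffle(work)  # spread consecutive operations across volumes
    return work

# Demo: three volumes with a couple of thumbnails each.
order = randomized_work_order({
    "vol-a": ["thumb-1", "thumb-2"],
    "vol-b": ["thumb-3", "thumb-4"],
    "vol-c": ["thumb-5", "thumb-6"],
})
print(order)  # same six (volume, item) pairs, in a randomized order
```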

Decreasing the number of thumbnail sizes

Lossless JPG Compression

Flickr has had a long-standing commitment to keeping uploaded images byte-for-byte intact. This has placed a floor on how much storage reduction we can do, but there are tools that can losslessly compress JPG images. Two well-known options are PackJPG and Lepton, from Dropbox. These tools work by decoding the JPG, then very carefully compressing it using a more efficient approach. This typically shrinks a JPG by about 22%. At Flickr's scale, this is significant. The downside is that these re-compressors use a lot of CPU. PackJPG compresses at about 2MB/s on a single core, or about fifteen core-years for a single petabyte's worth of JPGs. Lepton uses multiple cores and, at 15MB/s, is much faster than PackJPG, but uses roughly the same amount of CPU time.
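The fifteen-core-years figure follows directly from the 2 MB/s single-core rate:

```python
# One petabyte through a 2 MB/s single-core re-compressor,
# expressed in core-years (decimal units, matching the post's rates).

PB_IN_MB = 1_000_000_000       # 1 PB = 10^9 MB
RATE_MB_PER_S = 2.0            # PackJPG, single core

seconds = PB_IN_MB / RATE_MB_PER_S          # 5 * 10^8 seconds
core_years = seconds / (365 * 24 * 3600)
print(f"{core_years:.1f} core-years")       # ~15.9, i.e. "about fifteen"
```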

This CPU requirement also complicated on-demand serving. If we recompressed all the images on Flickr, we would need potentially thousands of cores to handle our decompress load. We considered putting some restrictions on access to compressed images, such as requiring users to login to access original images, but ultimately found that if we targeted only rarely accessed private images, decompressions would occur only infrequently. Additionally, restricting the maximum size of images we compressed limited our CPU time per decompress. We rolled this out as a component of our existing serving stack without requiring any additional CPUs, and with only minor impact to user experience.

Running our users’ original photos through lossless compression was probably our highest-risk approach. We can recreate thumbnails easily, but a corrupted source image cannot be recovered. Key to our approach was a re-compress-decompress-verify strategy: every recompressed image was decompressed and compared to its source before removing the uncompressed source image.
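The re-compress-decompress-verify strategy can be sketched as below. Here zlib stands in for a JPG re-compressor such as PackJPG or Lepton (whose actual invocation details are omitted); the point is the byte-for-byte comparison before any uncompressed source is removed:

```python
import zlib

def recompress_with_verify(original: bytes) -> bytes:
    """Compress, then immediately decompress and byte-compare against
    the source. Only if the round trip is exact may the uncompressed
    copy be deleted. zlib is a stand-in lossless codec for this sketch;
    the real pipeline used a JPG re-compressor (PackJPG/Lepton)."""
    compressed = zlib.compress(original, level=9)
    roundtrip = zlib.decompress(compressed)
    if roundtrip != original:
        # Verification failed: keep the original, do not delete it.
        raise ValueError("round-trip mismatch; retaining original")
    return compressed  # now safe to drop the uncompressed source

# Demo with a fake, highly repetitive "image" payload.
data = b"\xff\xd8" + bytes(range(256)) * 64
packed = recompress_with_verify(data)
print(len(data), "->", len(packed), "bytes")  # smaller, and verified
```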

This is still a work-in-progress. We have compressed many images but to do our entire corpus is a lengthy process, and we had reached our zero-new-storage-gear goal by mid-year.

On The Drawing Board

We have several other ideas which we’ve investigated but haven’t implemented yet.

In our current storage model, we have originals and thumbnails available for every image, each stored in two datacenters. This model assumes that the images need to be viewable relatively quickly at any point in time. But private images belonging to accounts that have been inactive for more than a few months are unlikely to be accessed. We could “freeze” these images, dropping their thumbnails and recreating them when the dormant user returns. This “thaw” process would take under thirty seconds for a typical account. Additionally, for photos that are private (but not dormant), we could go to a single uncompressed copy of each thumbnail, storing a compressed copy in a second datacenter that would be decompressed as needed.

We might not even need two copies of each dormant original image available on disk. We’ve pencilled out a model where we place one copy on a slower, but underutilized, tape-based system while leaving the other on disk. This would decrease availability during an outage, but as these images belong to dormant users, the effect would be minimal and users would still see their thumbnails. The delicate piece here is the placement of data, as seeks on tape systems are prohibitively slow. Depending on the details of what constitutes a “dormant” photo these techniques could comfortably reduce storage used by over 25%.

We’ve also looked into de-duplication, but we found our duplicate rate is in the 3% range. Users do have many duplicates of their own images on their devices, but these are excluded by our upload tools. We’ve also looked into using alternate image formats for our thumbnail storage. WebP can be much more compact than ordinary JPG, but our use of perceptual compression gets us close to WebP byte size and permits much faster resizing. The BPG project proposes a dramatically smaller, H.265-based encoding, but has IP and other issues.

There are several similar optimizations available for videos. Although Flickr is primarily image-focused, videos are typically much larger than images and consume considerably more storage.


Optimization over several releases

Since 2013 we’ve optimized our usage of storage by nearly 50%. Our latest efforts helped us get through 2016 without purchasing any additional storage, and we still have a few more options available.

Peter Norby, Teja Komma, Shijo Joy and Bei Wu formed the core team for our zero-storage-budget project. Many others assisted the effort.

Hello 2017

Published 4 Jan 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Looking back

2016 was a busy year.

I can tell that from just looking at my untidy desk...I was going to include a photo at this point but that would be too embarrassing.

The highlights of 2016 for me were getting our AtoM catalogue released and available to the world in April, completing Filling the Digital Preservation Gap (and seeing the project move from the early 'thinking' phases to actual implementation) and of course having our work on this project shortlisted in the Research and Innovation category of the Digital Preservation Awards.

...but other things happened too. Blogging really is a great way of keeping track of what I've been working on and of course what people are most interested to read about.

The top 5 most viewed posts from 2016 on this blog have been as follows:

Looking forward

So what is on the horizon for 2017?

Here are some of the things I'm going to be working on - expect blog posts on some or all of these things as the year progresses.


I blogged about AtoM a fair bit last year as we prepared our new catalogue for release in the wild! I expect I'll be talking less about AtoM this year as it becomes business as usual at the Borthwick, but don't expect me to be completely silent on this topic.

A group of AtoM users in the UK is sponsoring some development work within AtoM to enable EAD to be harvested via OAI-PMH. This is a very exciting new collaboration and will see us being able to expose our catalogue entries to the wider world, enabling them to be harvested by aggregators such as the Archives Hub. I'm very much looking forward to seeing this take shape.

This year I'm also keen to explore the Locations functionality of AtoM to see whether it is fit for our purposes.


Work with Archivematica is of course continuing. 

Post Filling the Digital Preservation Gap at York we are working on moving our proof of concept into production. We are also continuing our work with Jisc on the Research Data Shared Service. York is a pilot institution for this project so we will be improving and refining our processes and workflows for the management and preservation of research data through this collaboration.

Another priority for the year is to make progress with the preservation of the born digital data that is held by the Borthwick Institute for Archives. Over the year we will be planning a different set of Archivematica workflows specifically for the archives. I'm really excited about seeing this take shape.

We are also thrilled to be hosting the first European ArchivematiCamp here in York in the Spring. This will be a great opportunity to get current and potential Archivematica users across the UK and the rest of Europe together to share experiences and find out more about the system. There will no doubt be announcements about this over the next couple of months once the details are finalised so watch this space.

Ingest processes

Last year a new ingest PC arrived on my desk. I haven't yet had much chance to play with this but the plan is to get this set up for digital ingest work.

I'm keen to get BitCurator installed and to refine our current digital ingest procedures. After some useful chats about BitCurator with colleagues in the UK and the US over 2016 I'm very much looking forward to getting stuck into this.

...but really the first challenge of 2017 is to tidy my desk!

Impressions of Jerusalem and Tel Aviv

Published 3 Jan 2017 by Tom Wilson in tom m wilson.

Arriving in Israel… Coming over the border from Jordan it was forbidding and stern – as though I was passing through a highly militarised zone, which indeed I was. Machine gun towers, arid, blasted dune landscape, and endless security checks and waiting about. Then I was in the West Bank. The first thing I noticed […]


Published 29 Dec 2016 by Tom Wilson in tom m wilson.

I have been travelling West from Asia.  When I was in Colombo I photographed a golden statue of the Buddha facing the Greco-Roman heritage embodied in Colombo’s Town Hall.  And now I’ve finally reached a real example of the Roman Empire’s built heritage – the city of Jerash in Jordan.  Jerash is one of the […]

We Are Bedu

Published 26 Dec 2016 by Tom Wilson in tom m wilson.

While in Wadi Musa I had met our Bedu guide’s 92 year old mother. She was living in an apartment in the town. I asked her if she preferred life when she was a young woman and there was less access to Western conveniences, or if she preferred life in the town today. She told me […]

Montreal Castle

Published 26 Dec 2016 by Tom Wilson in tom m wilson.

I’ve been at Montreal (known in Arabic as Shawbak) Castle, a crusader castle south of Wadi Musa. Standing behind the battlements I had looked through a slit in the stone. Some of this stone had been built by Christians from Western Europe around 1115 AD in order to take back the Holy Land from Muslims. Through […]


Published 26 Dec 2016 by Tom Wilson in tom m wilson.

Mountains entered. Size incalculable. Mystical weight and folds of stone. Still blue air. The first day in Petra we headed out to Little Petra, a few kms away from the more famous site, where a narrow canyon is filled with Nabatean caves, carved around 2000 years ago. On the way we took a dirt track […]

Now That’s What I Call Script-Assisted-Classified Pattern Recognized Music

Published 24 Dec 2016 by Jason Scott in ASCII by Jason Scott.

Merry Christmas; here is over 500 days (12,000 hours) of music on the Internet Archive.

Go choose something to listen to while reading the rest of this. I suggest either something chill or perhaps this truly unique and distinct ambient recording.


Let’s be clear. I didn’t upload this music, I certainly didn’t create it, and actually I personally didn’t classify it. Still, 500 Days of music is not to be ignored. I wanted to talk a little bit about how it all ended up being put together in the last 7 days.

One of the nice things about working for a company that stores web history is that I can use it to do archaeology against the company itself. Doing so, I find that the Internet Archive started soliciting “the people” to begin uploading items en masse around 2003. This is before YouTube, and before a lot of other services out there.

I spent some time tracking dates of uploads, and you can see various groups of people gathering interest in the Archive as a file destination in these early 00’s, but a relatively limited set all around.

Part of this is that it was a little bit of a non-intuitive effort to upload to the Archive; as people figured it all out, they started using it, but a lot of other people didn’t. Meanwhile, YouTube and other also-rans came into being, and they picked up a lot of the “I just want to put stuff up” crowd.

By 2008, things start to take off for Internet Archive uploads. By 2010, things take off so much that 2008 looks like nothing. And now it’s dozens or hundreds of multimedia uploads a day through all the Archive’s open collections, not counting others who work with specific collections they’ve been given administration of.

In the case of the general uploads collection of audio, which I’m focusing on in this entry, the number of items is now at over two million.

This is not a sorted, curated, or really majorly analyzed collection, of course. It’s whatever the Internet thought should be somewhere. And what ideas they have!

Quality is variant. Finding things is variant, although the addition of new search facets and previews have made them better over the years.

I decided to do a little experiment: slight machine-assisted “find some stuff” sorting. Let it loose on 2 million items in the hopper, see what happens. The script was called Cratedigger.

Previously, I did an experiment against keywording on texts at the archive – the result was “bored intern” level, which was definitely better than nothing, and in some cases, that bored intern could slam through a 400-page book and determine a useful word cloud in less than a couple of seconds. Many collections of items I uploaded have these word clouds now.

It’s a little different with music. I went about it this way with a single question:

Cratediggers is not an end-level collection – it’s a holding bay to do additional work, but it does show the vast majority of people would upload a sound file and almost nothing else. (I’ve not analyzed quality of description metadata in the no-image items – that’ll happen next.) The resulting ratio of items-in-uploads to items-for-cratediggers is pretty striking – less than 150,000 items out of the two million passed this rough sort.

The Bored Audio Intern worked pretty OK. By simply sending a few parameters, The Cratediggers Collection ended up building on itself by the thousands without me personally investing time. I could then focus on more specific secondary scripts that do things in an even more lazy manner, ensuring laziness all the way down.

The next script allowed me to point to an item in the cratediggers collection and say “put everything by this uploader that is in Cratediggers into this other collection”, with “this other collection” being spoken word, sermons, or music. In general, a person who uploaded music that got into Cratediggers generally uploaded other music. (Same with sermons and spoken word.) It worked well enough that as I ran these helper scripts, they did amazingly well. I didn’t have to do much beyond that.

As of this writing, the music collection contains over 400 solid days of Music. They are absolutely genre-busting, ranging from industrial and noise all the way through beautiful Jazz and acapella. There are one-of-a-kind Rock and acoustic albums, and simple field recordings of Live Events.

And, ah yes, the naming of this collection… Some time ago I took the miscellaneous texts and writings and put them into a collection called Folkscanomy.

After trying to come up with the same sort of name for sound, I discovered a very funny thing: you can’t really attach any two words involving sound together without finding some company or manufacturer already using the result as a name. Trust me.

And that’s how we ended up with Folksoundomy.

What a word!

The main reason for this is I wanted something unique to call this collection of uploads that didn’t imply they were anything other than contributed materials to the Archive. It’s a made-up word, a zesty little portmanteau that is nowhere else on the Internet (yet). And it leaves you open for whatever is in them.

So, about the 500 days of music:

Absolutely, one could point to YouTube and the mass of material being uploaded there as being superior to any collection sitting on the archive. But the problem is that they have their own robot army, which is a tad more evil than my robotic bored interns; you have content scanners that have both false positives and strange decorations, you have ads being put on the front of things randomly, and you have a whole family of other small stabs and jabs at an enjoyable experience getting in your way every single time. Internet Archive does not log you, require a login, or demand other handfuls of your soul. So, for cases where people are uploading their own works and simply want them to be shared, I think the choice is superior.

This is all, like I said, an experiment – I’m sure the sorting has put some things in the wrong place, or we’re missing out on some real jewels that didn’t think to make a “cover” or icon to the files. But as a first swipe, I moved 80,000 items around in 3 days, and that’s more than any single person can normally do.

There’s a lot more work to do, but that music collection is absolutely filled with some beautiful things, as is the whole general Folksoundomy collection. Again, none of this is me, or some talent I have – this is the work of tens of thousands of people, contributing to the Archive to make it what it is, and while I think the Wayback Machine has the lion’s share of the Archive’s world image (and deserves it), there’s years of content and creation waiting to be discovered for anyone, or any robot, that takes a look.

My Top Ten Gigs (as a Punter) and Why

Published 24 Dec 2016 by Dave Robertson in Dave Robertson.

I had a dream. In the dream I had a manager. The manager told me I should write a “list” style post, because they were trending in popularity. She mumbled something about the human need for arbitrary structure amongst the chaos of existence. Anyway, these short anecdotes and associated music clips resulted. I think I really did attend these gigs though, and not just in a dream.

10. Dar Williams at Phoenix Concert Theatre, Toronto, Canada – 20 August 2003

You don’t need fancy instrumentation when you’re as charming, funny and smart as Dar Williams. One of her signature tunes, The Christians and the Pagans, seems appropriate to share this evening, given the plot takes place on Christmas Eve.

9. Paul Kelly at Sidetrack Cafe, Edmonton, Canada – 18 March 2004

The memorable thing about this gig was all the Aussies coming out of the woodwork of this icy Prairie oil town, whose thriving music underbelly was a welcome surprise to me. Incidentally, the Sidetrack Cafe is the main location of events in “For a Short Time” by fellow Aussie songwriter Mick Thomas. Tiddas did a sweet cover of this touching song:

8. Hussy Hicks at the Town Hall, Nannup – 5 March 2016

Julz Parker and Leesa Gentz have serious musical chops. Julz shreds on guitar and Leesa somehow manages not to shred her vocal cords despite belting like a beautiful banshee. Most importantly, they have infectious fun on stage, and I could have picked any of the gigs I’ve been to, but I’ll go with the sweat-anointed floorboards of one of their infamous Nannup Town Hall shows. This video is a good little primer on the duo.

7. The National at Belvoir Amphitheatre, Swan Valley – 14 February 2014

After this gig I couldn’t stop dancing in the paddock with friends and strangers amongst the car headlights. The National are a mighty fine indie rock band, fronted by the baritone voice of Matt Berninger. He is known for downing a bottle of wine on stage, and is open about it being a crutch to deal with nerves and get in the zone. This clip from Glastonbury is far from his best vocal delivery, but it’s hard to argue that it’s not exciting, and the audience are certainly on his wavelength!

6. Kathleen Edwards at Perth Concert Hall balcony – 17 February 2006

I was introduced to Kathleen Edwards by a girlfriend who covered “Hockey Skates” and I didn’t hesitate to catch her first, and so far only, performance in Perth. The easy banter of this fiery redhead and self-proclaimed potty mouth included warning a boisterous woman in the audience that her husband/guitarist, Colin Cripps, was not “on the market”. Change the Sheets is a particularly well produced song of Kathleen’s, engineered by Justin Vernon (aka Bon Iver):

5. The Cure at Perth Arena – 31 July 2016

One of the world’s most epic bands, they swing seamlessly from deliriously happy pop to gut-wrenching rock dirges, all with perfectly layered instrumentation. This was my third Cure show and my favourite, partly because I was standing (my preferred way to experience any energetic music) and also because the great sound meant I didn’t need my usual ear plugs. Arguably the best Cure years were 85 to 92 when they had Boris Williams on drums, but this was a fine display and at the end of the three hours I wanted them to keep playing for three more. “Lovesong” is my innocent karaoke secret:

4. Lucie Thorne & Hamish Stuart in my backyard – 26 February 2014

I met Lucie Thorne at a basement bar called the Green Room in Vancouver in 2003. She is the master of the understatement, with a warm voice that glides out the side of her mouth, and evocative guitar work cooked just the right amount. Her current style is playing a Guild Starfire through a tremolo pedal into a valve amp, while being accompanied by the tasteful jazz drumming legend Hamish Stuart. Here’s a clip of the house concert in question:

3. Ryan Adams and the Cardinals at Metropolis, Fremantle – 25 January 2009

The first review I read of a Ryan Adams album said he could break hearts singing a shopping list, and he’s probably the artist I’ve listened to the most in the last decade. He steals ideas from the greats of folk, country, rock, metal, pop and alt-<insert genre>, but does it so well and so widely, and with such a genuine love and talent for music. I’m glad I caught The Cardinals in their prime and there was a sea of grins flowing out onto the street after the three-hour show. This stripped back acoustic version of “Fix It” is one of my favourites:

2. Damien Rice at Civic Hotel – 9 October 2004

I feel Damien Rice’s albums, with the exception of “B-Sides”, are over-produced, with too many strings garishly trying to tug your heartstrings. Live and solo however, Damien is a rare force with no strings attached or required. I heard a veteran music producer say the only solo live performer he’s seen with a similar power over an audience was Jeff Buckley. I remember turning around once at the Civic Hotel gig and seeing about half the audience in tears, and I was well and truly welling up.

1. Portland Cello Project performing Radiohead’s Ok Computer at Aladin Theatre, Portland, Oregon – 22 September 2012

Well if crying is going to be a measure of how good a gig is then choosing my number one is easy. I cried all the way through the Portland Cello Project’s performance of Ok Computer and wrote a whole separate post about that.

Honourable mentions:

Joe Pug at Hardly Strictly Bluegrass, San Francisco – October 2012.

Yothu Yindi at Curtin University – 1996

Billy Bragg at Enmore Theatre, Sydney – 14 April 1999

Sally Dastey at Mojos – 2004

CR Avery at Marine Club in Vancouver – 28 November 2003

Jill Sobule at Vancouver Folk Festival – July 2003

Let the Cat Out in my lounge room – 2011

Martha Wainwright at Fly By Night, Fremantle – 22 November 2008

The Mountain Goats at The Bakery, Perth –  1 May 2012… coming to town again in April – come!


texvc back in Debian

Published 23 Dec 2016 by legoktm in The Lego Mirror.

Today texvc was re-accepted for inclusion into Debian. texvc is a TeX validator and converter that can be used with the Math extension to generate PNGs of math equations. It had been removed from Jessie when MediaWiki itself was removed. However, a texvc package is still useful for those who aren't using the MediaWiki Debian package, since it requires OCaml to build from source, which can be pretty difficult.

Pending no other issues, texvc will be included in Debian Stretch. I am also working on having it included in jessie-backports for users still on Jessie.

And as always, thanks to Moritz for reviewing and sponsoring the package!

MediaWiki not creating a log file and cannot access the database

Published 22 Dec 2016 by sealonging314 in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I'm trying to set up MediaWiki on an Apache2 server. Currently, when I navigate to the directory where the wiki is stored in my web browser, I see the contents of LocalSettings.php dumped on the screen, as well as this error message:

Sorry! This site is experiencing technical difficulties.

Try waiting a few minutes and reloading.

(Cannot access the database)

I have double-checked the database name, username, and password in LocalSettings.php, and I am able to log in using these credentials on the web server. I am using a mysql database.

I have been trying to set up a debug log so that I can see a more detailed error message. Here's what I've added to my LocalSettings.php:

$wgDebugLogFile = "/var/log/mediawiki/debug-{$wgDBname}.log";

The directory /var/log/mediawiki has 777 permissions, but no log file is even created. I've tried restarting the Apache server, which doesn't help.

Why is MediaWiki not creating a debug log? Are there other logs that I should be looking at for more detailed error messages? What could the reason be for the error message that I'm getting?

Arriving in Jordan

Published 18 Dec 2016 by Tom Wilson in tom m wilson.

I’ve arrived in the Middle East, in Jordan.  It is winter here.  Yesterday afternoon I visited the Amman Citadel, a raised acropolis in the centre of the capital. It lies atop a prominent hill in the centre of the city, and as you walk around the ruins of Roman civilisation you look down on box-like limestone-coloured apartment […]

Clone an abandoned MediaWiki site

Published 17 Dec 2016 by Bob Smith in Newest questions tagged mediawiki - Webmasters Stack Exchange.

Is there any way to clone a MediaWiki site that's been abandoned by the owner and all admins? None of the admins have been seen in 6 months and all attempts to contact any of them over the past 3-4 months have failed and the community is worried for the future of the Wiki. We have all put countless man-hours into the Wiki and to lose it now would be beyond devastating.

What would be the simplest way to go about this?


Sri Lanka: The Green Island

Published 12 Dec 2016 by Tom Wilson in tom m wilson.

I just arrived in Tangalle.  What a journey… local bus from Galle Fort. Fast paced Hindi music, big buddha in the ceiling with flashing lights, another buddha on the dash board of the bus wrapped in plastic, a driver who swung the old 1970s Leyland bus around corners to the point where any more swing […]

Spices and Power in the Indian Ocean

Published 12 Dec 2016 by Tom Wilson in tom m wilson.

I’m in Galle, on the south-east coast of Sri Lanka. From the rooftop terrace above the hotel room I’m sitting in the sound of surf gently crumbling on the reef beyond the Fort’s ramparts can be heard, and the breathing Indian ocean is glimpsed through tall coconut trees. The old city juts out into the […]

Digital Preservation Awards 2016 - celebrating collaboration and innovation

Published 7 Dec 2016 by Jenny Mitcham in Digital Archiving at the University of York.

Last week members of the Filling the Digital Preservation Gap project team were lucky enough to experience the excitement and drama of the biannual Digital Preservation Awards!

The Awards ceremony was held at the Wellcome Collection in London on the evening of the 30th November. As always it was a glittering affair, complete with dramatic bagpipe music (I believe it coincided with St Andrew's Day!) and numerous references to Strictly Come Dancing from the judges and hosts!

This year our project had been shortlisted for the Software Sustainability Institute award for Research and Innovation. It was fantastic to be a finalist considering the number of nominations from across the world in this category and we certainly felt we had some strong competition from the other shortlisted projects.

One of the key strengths in our own project has been the collaboration between the Universities of York and Hull. Additionally, collaboration with Artefactual Systems, The National Archives and the wider digital preservation community has also been hugely beneficial.

Interestingly, collaboration was a key feature of all the finalists in this category, perhaps demonstrating just how important this is in order to make effective progress in this area.

The 4C project "Collaboration to Clarify the Costs of Curation" was a European project which looked at costs and benefits relating to digital preservation activities within its partner organisations and beyond. Project outputs in use across the sector include the Curation Costs Exchange.

The winner in our category however was the Dutch National Coalition for Digital Preservation (NCDD) with Constructing a Network of Nationwide Facilities Together. Again there was a strong focus on collaboration - this time cross-domain collaboration within the Netherlands. Under the motto "Joining forces for our digital memory", the project has been constructing a framework for a national shared infrastructure for digital preservation. This collaboration aimed to ensure that each institution does not have to reinvent the wheel as they establish their own digital preservation facilities. Clearly an ambitious project, and perhaps one we can learn from in the UK Higher Education sector as we work with Jisc on their Shared Service for Research Data.

Some of the project team from York and Hull at the awards reception

The awards ceremony itself came at the end of day one of the PERICLES conference where there was an excellent keynote speech from Kara Van Malssen from AV Preserve (her slides are available on SlideShare - I'd love to know how she creates such beautiful slides!).

In the context of the awards ceremony I was pondering one of the messages of Kara's talk: our culture encourages and rewards constant innovation, and this brings challenges - especially for those of us who are 'maintainers'.

Maintainers maintain systems, services and the status quo - some of us maintain digital objects for the longer term and ensure we can continue to provide access to them. She argued that there are few rewards for maintainers and the incentives generally go to those who are innovating. If those around us are always chasing the next shiny new thing, how can the digital preservation community keep pace?

I would argue however that in the world of digital preservation itself, rewards for innovation are not always forthcoming. It can be risky for an institution to be an innovator in this area rather than doing what we have always done (which may actually bring risks of a different kind!) and this can stifle progress or lead to inaction.

This is why for me, the Digital Preservation Awards are so important. Being recognised as a finalist for the Research and Innovation award sends a message that what we have achieved is worthwhile and demonstrates that doing something different is A Good Thing.

For that I am very grateful. :-)

wikidiff2 1.4.1

Published 7 Dec 2016 by legoktm in The Lego Mirror.

In MediaWiki 1.28, MaxSem improved diff limits in the pure PHP diff implementation that ships with MediaWiki core. However, Wikimedia and other larger wikis use a PHP extension called wikidiff2 for better performance and additional support for Japanese, Chinese, and Thai.

wikidiff2 1.4.1 is now available in Debian unstable and will ship in stretch; it should soon be available in jessie-backports and in my PPA for Ubuntu Trusty and Xenial users. This is the first major update of the package in two years. Installation in MediaWiki 1.27+ is now even more straightforward: as long as the module is installed, it will automatically be used, with no global configuration required.
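As a sketch of what that looks like in practice (the package name is as shipped in Debian; paths and package manager may differ on your system):

```shell
# Install the extension from the Debian/Ubuntu package:
sudo apt-get install php-wikidiff2

# If building from source instead, load the module via php.ini:
#   extension=wikidiff2.so

# Confirm PHP can see the module; MediaWiki 1.27+ then uses it automatically:
php -m | grep -i wikidiff2
```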

Additionally, releases of wikidiff2 will now be hosted and signed on

Tropical Architecture – Visiting Geoffrey Bawa’s Place

Published 6 Dec 2016 by Tom Wilson in tom m wilson.

I’ve arrived in Sri Lanka. Let me be honest: first impressions of Colombo bring forth descriptors like pushy, moustache-wearing, women-dominating, smog-covered, coarse, opportunistic and disheveled. It is not a city that anybody should rush to visit.  However this morning I found my way through this city to a tiny pocket of beauty and calm – the […]

Housing the Fairbairn Collection

Published 6 Dec 2016 by slwacns in State Library of Western Australia Blog.

The Fairbairn collection includes over 100 artefacts of various types: clothing, a sword, hair ornaments made from human hair and items used for sewing, just to name a few. All of these objects need to be stored in the best possible way.


Housing is the process of making protective enclosures for objects to be stored in. By housing an object or group of objects we are creating a micro-environment: temperature and humidity become more stable, direct light is deflected, and materials are not damaged when handled or when placed on a shelf. Housing can be a box, folder or tray that has been custom made and fitted out to the exact requirements of the object. Inert materials and/or acid-free board are used.

Some of the objects in the Fairbairn collection required conservation treatment before they were housed. For example, the leather had detached from the front of this object but was reattached during treatment.

Some objects required individual housing (for example clothing items, sword and shoes) but the majority of the objects could be housed in groups. These groups were determined by object type and the material it was made of (for example all the coin purses made from similar materials are in a group).


This was done not only for ease of locating a particular object but because different material types can need different storage conditions and some materials can affect other materials if stored together (for example the vapours released from wood can cause metals to corrode).


Each object was arranged to fit into a box in such a way so that its weight would be evenly supported and so that it can be retrieved without being damaged or damaging neighbouring objects. Then layers of board and/or foam were built up to support the items.


Labels were placed to give direction on safely removing the objects from their housing. Labels were also placed on the outside of the boxes to identify what each box holds, as well as the correct way to place each object inside the box.


Custom supports were made for some objects. For example the internal support for this hat.


Each item in the Fairbairn collection has now been housed and placed carefully into long term storage with the rest of the State Library of Western Australia’s collection.

Filed under: SLWA collections, State Library of Western Australia, Uncategorized, WA, Western Australia Tagged: collection, conservation, Fairbairn, Housing, slwa, State Library of WA, State Library of Western Australia

Walking to the Mountain Monastery

Published 4 Dec 2016 by Tom Wilson in tom m wilson.

That little dot in the north west of south-east Asia is Chiang Mai.  As you can see there is a lot of darkness around it.  Darkness equals lots of forest and mountains. I’ve recently returned from the mountains to Chiang Mai.  It’s very much a busy and bustling city, but even here people try to bring […]

Where is "MediaWiki:Vector.css" of my MediaWiki

Published 4 Dec 2016 by hasanghaforian in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I want to install Skin:Vector-DarkCSS on my MediaWiki. It must be simple, but the second step of the installation instructions says I have to edit MediaWiki:Vector.css on my wiki. I searched for a file named MediaWiki:Vector.css, but could not find it in the MediaWiki home directory. Where is that file? Do I need to create it?

Forget travel guides.

Published 29 Nov 2016 by Tom Wilson in tom m wilson.

Lonely Planet talks up every country in the world, and if you read their guides every city and area seems to have a virtue worth singing. But the fact is that we can’t be everywhere and are forced to choose where to be as individuals on the face of this earth. And some places are just […]

MediaWiki VisualEditor Template autocomplete

Published 29 Nov 2016 by Patrick in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I'm running MediaWiki 1.28, but I had this problem with 1.27 and was hoping it would be resolved.

I am using VisualEditor, and would like my users to be able to get an autocomplete when inserting a template.

I have TemplateData installed, and can confirm api.php is returning matches:

62:{title: "Template:DefaultHeader", params: {},…}
117:{title: "Template:DefaultFooter", params: {},…}

But I don't get a drop-down, and there are no errors in the debug console.

Back That Thing Up

Published 29 Nov 2016 by Jason Scott in ASCII by Jason Scott.


I’m going to mention two backup projects. Both have been under way for some time, but the world randomly decided the end of November 2016 was the big day, so here I am.

The first is that the Internet Archive is adding another complete mirror of the Wayback Machine to one of our satellite offices in Canada. Due to the laws of Canada, to be able to do “stuff” in the country, you need to set up a separate company from your US concern. If you look up a lot of major chains and places, you’ll find they all have Canadian corporations. Well, so does the Internet Archive, and that separate company is in the process of getting a full backup of the Wayback Machine and other related data. It’s 15 petabytes of material, or more. It will cost millions of dollars to set up, and that money is already going out the door.

So, if you want, you can go to the donation page and throw some money in that direction and it will make the effort go better. That won’t take very long at all and you can feel perfectly good about yourself. You need read no further, unless you have an awful lot of disk space, at which point I suggest further reading.


Whenever anything comes up about the Internet Archive’s storage solutions, there’s usually a fluttery cloud of second-guessing and “big sky” suggestions about how everything is being done wrong and why not just engage a HBF0_X2000-PL and fark a whoziz and then it’d be solved. That’s very nice, but there’s about two dozen factors in running an Internet Archive that explain why RAID-1 and Petabyte Towers combined with self-hosting and non-cloud storage has worked for the organization. There are definitely pros and cons to the whole thing, but the uptime has been very good for the costs, and the no-ads-no-subscription-no-login model has been working very well for years. I get it – you want to help. You want to drop the scales from our eyes and you want to let us know about the One Simple Trick that will save us all.

That said, when this sort of insight comes out, it’s usually back-of-napkin and done by someone who will be volunteering several dozen solutions online that day, and that’s a lot different than coming in for a long chat to discuss all the needs. I think someone volunteering a full coherent consult on solutions would be nice, but right now things are working pretty well.

There are backups of the Internet Archive in other countries already; we’re not that bone stupid. But this would be a full, constantly maintained backup in Canada, one that would be interfaced with other worldwide stores. It’s a preparation for an eventuality that hopefully won’t come to pass.

There’s a climate of concern and fear that is pervading the landscape this year, and the evolved rat-creatures that read these words in a thousand years will be able to piece together what that was. But regardless of your take on the level of concern, I hope everyone agrees that preparation for all eventualities is a smart strategy as long as it doesn’t dilute your primary functions. Donations and contributions of a monetary sort will make sure there’s no dilution.

So there’s that.

Now let’s talk about the backup of this backup a great set of people have been working on.


About a year ago, I helped launch INTERNETARCHIVE.BAK. The goal was to create a fully independent distributed copy of the Internet Archive that was not reliant on a single piece of Internet Archive hardware and which would be stored on the drives of volunteers, with 3 geographically distributed copies of the data worldwide.

Here’s the current status page of the project. We’re backing up 82 terabytes of information as of this writing. It was 50 terabytes last week. My hope is that it will be 1,000 terabytes sooner rather than later. Remember, this is 3 copies, so each terabyte backed up needs three terabytes of volunteer disk space.

For some people, a terabyte is this gigantically untenable number and certainly not an amount of disk space they just have lying around. Other folks have, at their disposal, dozens of terabytes. So there’s lots of hard drive space out there, just not evenly distributed.

The IA.BAK project is a complicated one, but the general situation is that it uses the program git-annex to maintain widely-ranged backups from volunteers, with “check-in” of data integrity on a monthly basis. It has a lot of technical meat to mess around with, and we’ve had some absolutely stunning work done by a team of volunteer developers and maintainers as we make this plan work on the ground.
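The general shape of a volunteer's workflow can be sketched with git-annex's standard commands (the repository URL below is a placeholder, not the project's real one; the actual instructions live on the project's status page):

```shell
# Clone the shared git-annex repository (URL is hypothetical):
git clone https://git.example.org/IA.BAK.git && cd IA.BAK

# Claim and download as much data as this machine volunteers to hold:
git annex get --auto

# The monthly "check-in": re-verify checksums of everything held locally,
# so the network knows these copies are still intact:
git annex fsck
```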

And now, some thoughts on the Darkest Timeline.


I’m both an incredibly pessimistic and optimistic person. Some people might use the term “pragmatic” or something less charitable.

Regardless, I long ago gave up assumptions that everything was going to work out OK. It has not worked out OK in a lot of things, and there’s a lot of broken and lost things in the world. There’s the pessimism. The optimism is that I’ve not quite given up hope that something can’t be done about it.

I’ve now dedicated 10% of my life to the Internet Archive, and I’ve dedicated pretty much all of my life to the sorts of ideals that would make me work for the Archive. Among those ideals are free expression, gathering of history, saving of the past, and making it all available to as wide an audience, without limit, as possible. These aren’t just words to me.

Regardless of if one perceives the coming future as one rife with specific threats, I’ve discovered that life is consistently filled with threats, and only vigilance and dedication can break past the fog of possibilities. To that end, the Canadian Backup of the Internet Archive and the IA.BAK projects are clear bright lines of effort to protect against all futures dark and bright. The heritage, information and knowledge within the Internet Archive’s walls are worth protecting at all cost. That’s what drives me and why these two efforts are more than just experiments or configurations of hardware and location.

So, hard drives or cash, your choice. Or both!

Countryman – Retreating to North-West Thailand

Published 29 Nov 2016 by Tom Wilson in tom m wilson.

Made it to Cave Lodge in the small village of Tham Lot.  The last time I was here was seven years ago. I’m sitting on a hammock above the softly flowing river and the green valley. A deeply relaxing place. I arrived here a few days ago. We came on our motorbike taxis from the main […]

De Anza students football fandoms endure regardless of team success

Published 28 Nov 2016 by legoktm in The Lego Mirror.

Fans of the San Francisco 49ers and Oakland Raiders at De Anza College are loyal to their teams even when they are not doing well, but do prefer to win.

The Raiders lead the AFC West with a 9-2 record, while the 49ers are last in the NFC West with a 1-10 record. This is a stark reversal from 2013, when the 49ers were competing in the Super Bowl and the Raiders finished the season with a 4-12 record, as reported by The Mercury News.

49ers fans are not bothered though.

“My entire family is 49ers fans, and there is no change in our fandom due to the downturn,” said Joseph Schmidt.

Schmidt recently bought a new 49ers hat that he wears around campus.

Victor Bejarano concurred and said, “I try to watch them every week, even when they’re losing.”

A fan since 2011, he too wears a 49ers hat around campus to show his support for the team.

Sathya Reach said he has stopped watching the 49ers play not because of their downfall, but because of an increased focus on school.

“I used to watch (the 49ers) with my cousins, not so much anymore,” Reach said.

Kaepernick in 2012 Mike Morbeck/CC-BY-SA

Regardless of their support, 49ers fans have opinions on how the team is doing, mostly about 49ers quarterback Colin Kaepernick. Kaepernick protests police brutality against minorities before each game by kneeling during the national anthem. His protest placed him on the cover of TIME magazine, and ranked as the most disliked player in the NFL in a September poll conducted by E-Poll Marketing Research.

Bejarano does not follow Kaepernick’s actions off the field, but said that on the field, Kaepernick was not getting the job done.

“He does what he does, and has his own reasons,” Reach said.

Self-described Raider “fanatic” Mike Nijmeh agreed, calling Kaepernick a bad quarterback.

James Stewart, a Raiders fan since he was 5 years old, disagreed and said, “I like Kaepernick, and wouldn’t mind if he was a Raiders’ backup quarterback.”


Both Nijmeh and Stewart praised the Raiders' quarterback, Derek Carr, and Nijmeh, dressed in his Raiders hat, jacket and jersey, said, “Carr could easily be the MVP this year.”

Stewart said that while he also thought Carr is MVP caliber, Tom Brady, the quarterback of the New England Patriots, is realistically more likely to win.

“Maybe in five years,” said Stewart, explaining that he expected Brady to have retired by then.

He is not the only one, as Raider teammate Khalil Mack considers Carr to be a potential MVP, reported USA Today. USA Today Sports’ MVP tracker has Carr in third.

Some 49ers fans are indifferent about the Raiders, others support them because of simply being in the Bay Area, and others just do not like them.

Bejarano said that he supports the Raiders because they are a Bay Area team, but that it bothers him that they are doing so well in contrast to the 49ers.

Nijmeh summed up his feelings by saying the Raiders’ success has made him much happier on Sundays.



Published 26 Nov 2016 by mblaney in Tags from simplepie.

Merge pull request #495 from mblaney/master

New release 1.4.3

Karen Village Life

Published 26 Nov 2016 by Tom Wilson in tom m wilson.

The north-west corner of Thailand is the most sparsely populated corner of the country.  Mountains, forests and rivers, as far as the eye can see.  And sometimes a village. This village is called Menora.  It’s a Karen village, without electricity or running water.  It’s very, very remote and not mapped on Google Maps. Living out […]


Published 25 Nov 2016 by timbaker in Tim Baker.

So, I just spent two days walking, and paddling, from Byron Bay to the Gold Coast – well Fingal, actually, just over the border in Northern NSW. I was fortunate enough to be invited along with a group of Indigenous dads and their sons on...

Thai Forest Buddhism

Published 22 Nov 2016 by Tom Wilson in tom m wilson.

The forests of Thailand have been a place of retreat, particularly since the 1980s.  Forest monks, who go to the forests to meditate, have seen their home get smaller and smaller.  In some cases this has prompted them to become defenders of the forest, for example performing tree ordination ceremonies, effectively ordaining a tree in saffron robes […]

Every little bit helps: File format identification at Lancaster University

Published 21 Nov 2016 by Jenny Mitcham in Digital Archiving at the University of York.

This is a guest post from Rachel MacGregor, Digital Archivist at Lancaster University. Her work on identifying research data follows on from the work of Filling the Digital Preservation Gap and provides an interesting comparison with the statistics reported in a previous blog post and our final project report.

Here at Lancaster University I have been very inspired by the work at York on file format identification and we thought it was high time I did my own analysis of the one hundred or so datasets held here.  The aim is to aid understanding of the nature of research data as well as to inform our approaches to preservation.  Our results are comparable to York's in that the data is characterised as research data (as yet we don't have any born digital archives or digitised image files).  I used DROID (version 6.2.1) as the tool for file identification - there are others and it would be interesting to compare results at some stage with results from using other software such as FILE (FITS), Apache Tika etc.

The exercise was carried out using the following signature files: DROID_SignatureFile_V88 and container-signature-file-20160927.  The maximum number of bytes DROID was set to scan at the start and end of each file was 65536 (which is the default setting when you install DROID).
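For anyone wanting to reproduce this kind of breakdown, here is a minimal sketch in Python. It tallies identification methods from a DROID CSV export; the METHOD column and its values (Signature, Extension, Container) follow DROID's standard export format, but the sample rows below are invented for illustration:

```python
import csv
import io
from collections import Counter

def method_breakdown(csv_text):
    """Tally DROID identification methods and count unidentified files.

    Files with an empty METHOD column were not identified by DROID.
    """
    identified = Counter()
    unidentified = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        method = row.get("METHOD", "").strip()
        if method:
            identified[method] += 1
        else:
            unidentified += 1
    return identified, unidentified

# Tiny mocked-up export (a real DROID export has many more columns):
sample = """NAME,METHOD,PUID
a.xml,Signature,fmt/101
b.dat,,
c.doc,Container,fmt/40
d.txt,Extension,x-fmt/111
"""
methods, missing = method_breakdown(sample)
print(methods, missing)
```

From the two counts it is then straightforward to derive the percentages reported below (identified vs. not, and the share of each method among identified files).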

Summary of the statistics:

There were a total of 24,705 files (so a substantially larger sample than in the comparable study at York)

Of these: 
  • 11008 (44.5%) were identified by DROID and 13697 (55.5%) not.
  • 99.3% of the identified files were given a single identification and 76 files had multiple identifications:
    • 59 files had two possible identifications
    • 13 had 3 identifications
    • 4 had 4 possible identifications.  
  • 50 of these files were asc files identified (by extension) as either 8-bit or 7-bit ASCII text files.  The remaining 26 were identified by container as various types of Microsoft files. 

Files that were identified

Of the 11008 identified files:
  • 89.34% were identified by signature: this is the overwhelming majority, far more than in Jen's survey
  • 9.2% were identified by extension, a much smaller proportion than at York
  • 1.46% identified by container

However, there was one large dataset containing over 7,000 gzip files, all identified by signature, which rather skewed the results. With those files removed, the percentages identified by the different methods were as follows:

  • 68% (2505) by signature
  • 27.5% (1013) by extension
  • 4.5% (161) by container

This was still different from York's results but not so dramatically.

Only 38 files were identified as having a file extension mismatch (0.3%), but closer inspection may reveal more. Most of these were Microsoft files with multiple IDs (see above), but there was also a set of lsm files identified as TIFFs. This is not a format I'm familiar with, and although it seems lsm is a form of TIFF file, how do I know whether this is a "correct" identification or not?

59 different file formats were identified, the most frequently occurring being the GZIP format (as mentioned above) with 7331 instances. The next most popular was, unsurprisingly, XML (similar to the results at York) with 1456 files spread across the datasets. The top 11 were:

Top formats identified by DROID for Lancaster University's research data

Files that weren't identified

There were 13697 files not identified by DROID of which 4947 (36%) had file extensions.  This means there was a substantial proportion of files with no file extension (64%). This is much higher than the result at York which was 26%. As at York there were 107 different extensions in the unidentified files of which the top ten were:


Top extensions of unidentified files

This top ten is quite different to York's results, though at both institutions dat files topped the list by some margin! We also found 20 inp and 32 out files, which also occur in York's analysis.

Like Jen at York I will be looking for a format to analyse further to create a signature - this will be a big step for me but will help my understanding of the work I am trying to do as well as contribute towards our overall understanding of file format types.

Every little bit helps.

In Which I Tell You It’s A Good Idea To Support a Magazine-Scanning Patreon

Published 20 Nov 2016 by Jason Scott in ASCII by Jason Scott.

So, Mark Trade and I have never talked, once.

All I know about Mark is that due to his efforts, over 200 scans of magazines are up on the Archive.


These are very good scans, too. The kind of scans that a person looking to find a long-lost article, verify a hard-to-grab fact, or pass a great image along to others would kill to have. 600 dots per inch, excellent contrast, clarity, and the margins cut just right.


So, I could fill this entry with all the nice covers, but covers are kind of easy, to be frank. You put them face down on the scanner, you do a nice big image, and then touch it up a tad. The cover paper and the printing is always super-quality compared to the rest, so it’ll look good:


But the INSIDE stuff… that’s so much harder. Magazines were often bound in a way that put the images RIGHT against the binding and not every magazine did the proper spacing and all of it is very hard to shove into a scanner and not lose some information. I have a lot of well-meaning scans in my life with a lot of information missing.

But these…. these are primo.




When I stumbled on the Patreon, he had three patrons giving him $10 a month. I’d like it to be $500, or $1000. I want this to be his full-time job.

Reading the Patreon page’s description of his process shows he’s taking it quite seriously. Steaming glue, removing staples. I’ve gone on record about the pros and cons of destructive scanning, but game magazines are not rare, just entirely unrepresented in scanned items compared to how many people have these things in their past.

I read something like this:

It is extremely unlikely that I will profit from your pledge any time soon. My scanner alone was over $4,000 and the scanning software was $600. Because I’m working with a high volume of high resolution 600 DPI images I purchased several hard drives including a CalDigit T4 20TB RAID array for $2,000. I have also spent several thousand dollars on the magazines themselves, which become more expensive as they become rarer. This is in addition to the cost of my computer, monitor, and other things which go into the creation of these scans. It may sound like I’m rich but really I’m just motivated, working two jobs and pursuing large projects.

…and all I think about is, this guy is doing so much amazing work that so many thousands could be benefiting from, and they should throw a few bucks at him for his time.

My work consists of carefully removing individual pages from magazines with a heat gun or staple-remover so that the entire page may be scanned. Occasionally I will use a stack paper cutter where appropriate and where it will not involve loss of page content. I will then scan the pages in my large format ADF scanner into 600 DPI uncompressed TIFFs. From there I either upload 300 DPI JPEGs for others to edit and release on various sites or I will edit them myself and store the 600 DPI versions on backup hard disks. I also take photos of magazines still factory-sealed to document their newsstand appearance. I also rip full ISOs of magazine coverdiscs and make scans of coverdisc sleeves on a color-corrected flatbed scanner and upload those as well.

This is the sort of thing I can really get behind.

The Internet Archive is scanning stuff, to be sure, but the focus is on books. Magazines are much, much harder to scan – the book scanners in use are just not as easy to use with something bound like magazines are. The work that Mark is doing is stuff that very few others are doing, and to have canonical scans of the advertisements, writing and materials from magazines that used to populate the shelves is vital.

Some time ago, I gave all my collection of donated game-related magazines to the Museum of Art and Digital Entertainment, because I recognized I couldn’t be scanning them anytime soon, and how difficult it was going to be to scan them. It would take some real major labor I couldn’t personally give.

Well, here it is. He’s been at it for a year. I’d like to see that monthly number jump to $100/month, $500/month, or more. People dropping $5/month towards this Patreon would be doing a lot for this particular body of knowledge.

Please consider doing it.


A Simple Explanation: VLC.js

Published 17 Nov 2016 by Jason Scott in ASCII by Jason Scott.

The previous entry got the attention it needed, and the maintainers of the VLC project connected with both Emularity developers and Emscripten developers and the process has begun.

The best example of where we are is this screenshot:


The upshot of this is that a JavaScript-compiled version of the VLC player now runs, spits out a bunch of status and command-line information, and then gets cranky that it has no video/audio device to use.

With the Emularity project, this was something like 2-3 months into the project. In this case, it happened in 3 days.

The reasons it took such a short time were multi-fold. First, the VLC maintainers jumped right into it at full-bore. They’ve had to architect VLC for a variety of wide-ranging platforms including OSX, Windows, Android, and even weirdos like OS/2; to have something aimed at “web” is just another place to go. (They’d also made a few web plugins in the past.) Second, the developers of Emularity and Emscripten were right there to answer the tough questions, the weird little bumps and switchbacks.

Finally, everybody has been super-energetic about it – diving into the idea, without getting hung up on factors or features or what may emerge; the same flexibility that coding gives the world means that the final item will be something that can be refined and improved.

So that’s great news. But after the initial request went into a lot of screens, a wave of demands and questions came along, and I thought I’d answer some of them to the best of my abilities, and also make some observations as well.


When you suggest something somewhat crazy, especially in the programming or development world, the responses vary widely. And if you end up on Hacker News, Reddit, or a number of other high-traffic locations, those reactions fall into some very predictable areas:

So, quickly on some of these:

But let’s shift over to why I think this is important, and why I chose VLC to interact with.

First, VLC is one of those things that people love, or people wish there was something better than, but VLC is what we have. It’s flexible, it’s been well-maintained, and it has been singularly focused. For a very long time, the goal of the project has been aimed at turning both static files AND streams into something you can see on your machine. And the machine you can see it on is pretty much every machine capable of making audio and video work.

Fundamentally, VLC is a bucket that, when dropped into with a very large variance of sound-oriented or visual-oriented files and containers, will do something with them. DVD ISO files become playable DVDs, including all the features of said DVDs. VCDs become craptastic but playable DVDs. MP3, FLAC, MIDI, all of them fall into VLC and start becoming scrubbing-ready sound experiences. There are quibbles here and there about accuracy of reproduction (especially with older MOD-like formats like S3M or .XM) but these are code, and fixable in code. That VLC doesn’t immediately barf on the rug with the amount of crapola that can be thrown at it is enormous.

And completing this thought, by choosing something like VLC, with its top-down open source condition and universal approach, the “closing of the loop” from VLC being available in all browsers instantly will ideally cause people to find the time to improve and add formats that otherwise wouldn’t experience such advocacy. Images of Apple II floppy disks? Oscilloscope captures? Morse code evaluation? Slow Scan Television? If those items have a future, it’s probably in VLC, and it’s much more likely if the web uses a VLC that just appears in the browser, no fuss or muss.


Fundamentally, I think my personal motivations are pretty transparent and clear. I help oversee a petabytes-big pile of data at the Internet Archive. A lot of it is very accessible; even more of it is not, or has to have clever “derivations” pulled out of it for access. You can listen to .FLACs that have been uploaded, for example, because we derive (noted) mp3 versions that go through the web easier. Same for the MPG files that become .mp4s and so on, and so on. A VLC that (optionally) can play off the originals, or which can access formats that currently sit as huge lumps in our archives, will be a fundamental world changer.

Imagine playing DVDs right there, in the browser. Or really old computer formats. Or doing a bunch of simple operations to incoming video and audio to improve it without having to make a pile of slight variations of the originals to stream. VLC.js will do this and do it very well. The millions of files that are currently without any status in the archive will join the millions that do have easy playability. Old or obscure ideas will rejoin the conversation. Forgotten aspects will return. And VLC itself, faced with such a large test sample, will get better at replaying these items in the process.

This is why this is being done. This is why I believe in it so strongly.


I don’t know what roadblocks or technical decisions the team has ahead of it, but they’re working very hard at it, and some sort of prototype seems imminent. The world with this happening will change slightly when it starts working. But as it refines, and as these secondary aspects begin, it will change even more. VLC will change. Maybe even browsers will change.

Access drives preservation. And that’s what’s driving this.

See you on the noisy and image-filled other side.

School Magazines

Published 17 Nov 2016 by leonieh in State Library of Western Australia Blog.

Cover, Northam High School (The Avon) June 1939

School magazines provide a fascinating glimpse into the past.

What was high school like from 1915 through to the 1950s? What issues interested teenagers? How did they react to current events including two world wars? In what ways did they express themselves differently from today’s teens? What sort of jokes did they find amusing? (Hint: there are many of what we would call “dad jokes”.)

The State Library holds an extensive collection of school magazines from both public and private schools. Most don’t start until after 1954 which, as with newspapers, is our cut-off date for digitising, but we have digitised some early issues from public schools.


In the first part of the 20th century they were generally produced by the students, with minimal input from school staff – and it shows. The quality of individual issues varies widely, depending, most probably, on the level of talent, interest and time invested by the responsible students.


Cricket cartoon Northam High School (The Avon) Sept. 1930

These magazines may include named photographs of prefects and staff, sporting teams and academic prize winners. Photographs from early editions tend to be of much higher quality, possibly because they were taken using glass negatives.


Essay competition. The subject: “A letter from Mr Collins congratulating Elizabeth on her engagement to Mr Darcy”  Phyllis Hand and Jean McIntyre were the prize winners.      Perth Girls’ School Magazine Nov. 1922

You will find poetry and essays, sketches by and of students, amateur cartooning, and many puns, jokes and limericks.

Some issues include ex-student notes with news about the careers, marriages and movements of past students. There is an occasional obituary.


Northam High School (The Avon) June 1943


Does anyone know these twins from Meckering?  Northam High School (The Avon) May 1925

Issues from the war years are particularly interesting and touching. You may also find rolls of honour naming ex-students serving in the forces.

There is also often advertising for local businesses.


Girls’ A Hockey Team Albany High School (Boronia) Dec. 1925

These magazines reflect the attitudes of their tight-knit local community of the time.  Expect to hear the same exhortations to strive for academic, moral and sporting excellence that we hear in schools today – while observing the (in retrospect) somewhat naïve patriotism and call to Empire and the occasional casual racism.



The following high school magazines for various dates are either available now online or will appear in the coming weeks: Perth Boys’ School Magazine; Perth Girls’ School Magazine (later The Magpie*); Fremantle Boys’ School; Northam High School (The Avon); Girdlestone High School (Coolibah); Eastern Goldfields Senior High School (The Golden Mile – later Pegasus); Bunbury High School (Kingia); Albany High School (Boronia) and Perth Modern (The Sphinx). None are complete and we would welcome donations of missing volumes to add to our Western Australian collections.

If you would like to browse our digitised high school magazines search the State Library catalogue using the term: SCHOOL MAGAZINES

*Some issues of The Magpie are too tightly bound for digitising so they are currently being disbound. They will then be digitised and rebound. Issues should appear in the catalogue in the near future.


What do "Pro" users want?

Published 16 Nov 2016 by Carlos Fenollosa in Carlos Fenollosa — Blog.

My current machine is a 2013 i7 Macbook Air. It doesn't have the Pro label; however, it has two USB 3.0 ports, an SD slot, a Thunderbolt port, 12 hours of battery life, and one of the best non-retina screens around. Judging by this week's snarky comments, it's more Pro than the 2016 Macbook Pro.

Me, I love this laptop. In fact, I love it so much that I bought it to replace an older MBA. I really hoped that Apple would keep selling the same model with a Retina screen and bumped specs.

But is it a Pro computer or not? Well, let me twist the language. I make my living with computers, so by definition it is. Let's put it another way: I could have spent more money on a machine with Pro in its name, but that wouldn't have improved my work output.

What is a Pro user?

So there's this big discussion on whether the Pro label means anything for Apple.

After reading dozens of reviews and blog posts, unsurprisingly, one discovers that different people have different needs. The bottom line is that a Pro user is someone who needs to get their work done and cannot tolerate much bullshit with their tools.

In my opinion, the new Macbook Pros are definitely a Pro machine, even with some valid criticisms. Apple product releases are usually followed by zesty discussions, but this time it's a bit different. It's not only angry Twitter users who are complaining; professional reviewers, engineers, and Pro users have also voiced their concerns.

I think we need to stop thinking that Apple is either stupid or malevolent. They are neither. As a public company, the metric by which their executives are evaluated is stock performance. Infuriating users for no reason only leads to decreasing sales, lower profits, and unhappy investors.

I have some theories on why Apple seems to care less about the Mac, and why many feel the need to complain.

Has the Pro market changed?

Let's be honest: for the last five years Apple probably had the best and most popular computer lineup and pricing in their history. All markets (entry, pro, portability, desktops) had fantastic machines which were totally safe to buy and recommend, at extremely affordable prices.

I've seen this myself. In Spain, one of the poorest EU countries, Apple is not hugely popular. Macs and iPhones are super expensive, and many find it difficult to justify an Apple purchase on a <1000€ salary.

However, in the last three to five years, everybody seemed to buy a Mac, even friends of mine who swore they never would. They finally caved in, not because of my advice, but because their non-nerd friends recommended MBPs. And that makes sense. In a 2011 market saturated with ultraportables, Windows 8, and laptops that break every couple of years, Macs were a great investment. You can even resell them after five years for 50% of their price, essentially renting them for half price.

So what happened? Right now it's not only Pros who are using the Macbook Pro. It's not a professional tool anymore; it's a consumer product. Apple collects usage analytics for their machines and, I suppose, makes informed decisions, like removing less-used ports or not increasing storage on iPhones for a long time.

What if Apple is being fed overwhelmingly non-Pro user data for their Pro machines and, as a consequence, their decisions don't serve Pro users anymore, but rather the general public?

First, let's make a quick diversion to address the elephant in the room because, after all, I empathize with the critics.

Apple is Apple

Some assertions you can read on the Internet seem out of touch with a company which made the glaring mistake of building a machine without a floppy, released a lame mp3 player without wireless and less space than a Nomad, tried to revolutionize the world with a phone without a keyboard, and produced an oversized iPhone which is killing the laptop in the consumer market.

Apple always innovates. You can agree whether the direction is correct, but they do. They also copy, and they also steal, like every other company.

What makes them stand out is that they are bolder, dare I say, more courageous than others, to the point of having the courage to use the word courage to justify an unpopular technical decision.

They take more risks on their products. Yes, I think that the current audio jack transition could've been handled better, but they're the first "big brand" to always make such changes on their core products.

This brings us to my main gripe with the current controversy. I applaud their strategy of bringing iPhone ideas, both hardware and software, to the Mac. That is a fantastic policy. You can design a whole device around a touch screen and a secure enclave, then miniaturize it and stick it on a Macbook as a Touch Bar.

Having said that, us pros are generally conservative: we don't update our OS until versions X.1 or X.2, we need all our tools to be compatible, and we don't usually buy first-gen products, unless we self-justify our new toy as a "way to test our app experience on users who have this product".

The Great Criticism Of The 2016 Macbook Pro is mainly fueled by customers who wanted something harder, better, faster, stronger (and cheaper) and instead they got a novel consumer machine with few visible Pro improvements over the previous one and some prominent drawbacks.

Critical Pros are disappointed because they think Apple no longer cares about them. They feel they have no future using products from this company they've long invested in. Right now, there is no clear competitor to the Mac, but if there were, I'm sure many people would vote with their wallets for the other guy.

These critics aren't your typical Ballmers bashing the iPhone out of spite. They are concerned, loyal customers who have spent tens of thousands of dollars on Apple's products.

What's worse, Apple doesn't seem to understand the backlash, as shown by recent executive statements. Feeling misunderstood just infuriates people more, and there are few things as powerful as people frustrated and disappointed with the figures and institutions they respect.

Experiment, but not on my lawn

If I could ask Apple for just one thing, it would be to restrict their courage to the consumer market.

'Member the jokes about the 2008 Macbook Air? Only one port, no DVD drive?

The truth is, nobody cared because that machine was clearly not for them; it was an experiment which, if I may say so, turned out to be one of the most successful ever. Eight years later, many laptops aspire to be a Macbook Air, and the current entry Apple machine, the Macbook "One", is only an iteration on that design.

Nowadays, Apple calls the Retina MBA we had been waiting for a "Macbook Pro". That machine has a 15W CPU, only two ports—one of which is needed for charging—, good enough internals, and a great battery for light browsing which suffers on high CPU usage.

But when Apple rebrands this Air as a Pro, real pros get furious, because that machine clearly isn't for them. And this time, to add more fuel to the fire, the consumer segment gets furious too, since it's too expensive: $400 too expensive, to be exact.

By making the conscious decision of positioning this as a Pro machine both in branding and price point, Apple is sending the message that they really do consider this a Pro machine.

One unexpected outcome of this crisis

Regardless, there is one real, tangible risk for Apple.

When looking at the raw numbers, what Apple sees is this: 70% of their revenue comes from iOS devices. Thus, they prioritize around 70% of company resources to that segment. This makes sense.


Unless there is an external factor which drives iPhone sales: the availability of iPhone software, which is not controlled by Apple. This software is developed by external Pros. On Macs.

The explosion of the iOS App Store has not been a coincidence. It's the combination of many factors, one of which is a high number of developers and geeks using a Mac daily, thanks to its awesomeness and recent low prices. How many of us got into iPhone development just because Xcode was right there in our OS?

Just as it is difficult to find COBOL developers because barely anyone learns COBOL anymore, if most developers, whatever their day job is, start switching from a Mac to a PC, interest in iOS development will dwindle quickly.

In summary, the success of the iPhone is directly linked to developer satisfaction with the Mac.

This line of reasoning is not unprecedented. In the 90s, almost all developers were using the Microsoft platform until Linux and OSX appeared. Nowadays, Microsoft is suffering heavily for their past technical decisions. Their mobile platform crashed not because the phones were bad, but because they had no software available.

Right now, Apple is safe, and Pro users will keep using Macs not only thanks to Jobs' successful walled garden strategy, but also because they are the best tools for the job.

While Pro users may not be trend-setters, they win in the long term. Linux won in the server. Apple won the smartphone race because it had already won the developer race. They made awesome laptops and those of us who were using Linux just went ahead and bought a Mac.

Apple thinks future developers will code on iPads. Maybe that's right 10 years from now. The question is, can they save this 10-year gap between current developers and future ones?

The perfect Pro machine

This Macbook Pro is a great machine and, with USB-C ports, is future proof.

Dongles and keyboards are a scapegoat. Criticisms are valid, but I feel they are unjustly directed to this specific machine instead of Apple's strategy in general. Or, at least, the tiny part that us consumers see.

Photographers want an SD slot. Developers want more RAM for their VMs. Students want lower prices. Mobile professionals want an integrated LTE chip. Roadies want more battery life. Here's my wish, different from everybody else's: I want the current Macbook Air with a Retina screen and 20 hours of battery life (10 when the CPU is peaking).

Everybody seems to be either postulating why this is not a Pro machine or criticizing the critics. And they are all right.

Unfortunately, unless given infinite resources, the perfect machine will not exist. I think the critics know that, even if many are projecting their rage on this specific machine.

A letter to Santa

Pro customers, myself included, are afraid that Apple is going to stab them in the back in a few years, and Apple is not doing anything substantial to reduce these fears.

In computing, too, perception is as important as cold, hard facts.

Macs are a great UNIX machine for developers, have a fantastic screen for multimedia Pros, offer amazing build-quality value for budget-constrained self-employed engineers, work awesomely with audio setups thanks to almost inaudible fans, have triple-A software available, and you can even install Windows.

We have to admit that us Pros are mostly happily locked in the Apple ecosystem. When we look for alternatives, in many cases, we only see crap. And that's why we are afraid. Is it our own fault? Of course, we are all responsible for our own decisions. Does this mean we have no right to complain?

Apple, if you're listening, please do:

  1. Remember that you sell phones because there are people developing apps for them.
  2. Ask your own engineers which kind of machine they'd like to develop on. Keep making gorgeous Starbucks ornaments if you wish, but clearly split the product lines and the marketing message so all consumers feel included.
  3. Many iOS apps are developed outside the US and the current price point for your machines is too high for the rest of the world. I know we pay for taxes, but even accounting for that, a bag of chips, an apartment, or a bike doesn't cost the same in Manhattan as in Barcelona.
  4. Keep making great hardware and innovating, but please, experiment with your consumer line, not your Pro line.
  5. Send an ACK to let us Pros recover our trust in you. Unfortunately, at this point, statements are not enough.

Thank you for reading.

Tags: hardware, apple

Comments? Tweet  

Sukhothai: The Dawn of Happiness

Published 16 Nov 2016 by Tom Wilson in tom m wilson.

  It is early morning in Sukhothai, the first capital of present day Thailand, in the north of the country.  From the Sanskrit, Sukhothai means ‘dawn of happiness’.  The air is still cool this morning, and the old city is empty of all but two or three tourists.  Doves coo gently from ancient stone rooftops. […]

AtoM harvesting (part 1) - it works!

Published 15 Nov 2016 by Jenny Mitcham in Digital Archiving at the University of York.

When we first started using Access to Memory (AtoM) to create the Borthwick Catalogue we were keen to enable our data to be harvested via OAI-PMH (more about this feature of AtoM is available in the documentation). Indeed the ability to do this was one of our requirements when we were looking to select a new Archival Management System (read about our system requirements here).

Look! Archives now available in Library Catalogue search
So it is with great pleasure that I can announce that we are now exposing some of our data from AtoM through our University Library catalogue YorSearch. Dublin Core metadata is automatically harvested nightly from our production AtoM instance - so we don't need to worry about manual updates or old versions of our data hanging around.

Our hope is that doing this will allow users of the Library Catalogue (primarily staff and students at the University of York) to happen upon relevant information about the archives that we hold here at the Borthwick whilst they are carrying out searches for other information resources.

We believe that enabling serendipitous discovery in this way will benefit those users of the Library Catalogue who may have no idea of the extent and breadth of our holdings and who may not know that we hold archives of relevance to their research interests. Increasing the visibility of the archives within the University of York is a useful way of signposting our holdings and we think this should bring benefits both to us and our potential user base.

A fair bit of thought (and a certain amount of tweaking within YorSearch) went into getting this set up. From the archives perspective, the main decision was around exactly what should be harvested. It was agreed that only top level records from the Borthwick Catalogue should be made available in this way. If we had enabled the harvesting of all levels of records, there was a risk that search results would have been swamped by hundreds of lower level records from those archives that have been fully catalogued. This would have made the search results difficult to understand, particularly given the fact that these results could not have been displayed in a hierarchical way so the relationships between the different levels would be unclear. We would still encourage users to go direct to the Borthwick Catalogue itself to search and browse lower levels of description.

It should also be noted that only a subset of the metadata within the Borthwick Catalogue will be available through the Library Catalogue. The metadata we create within AtoM is compliant with ISAD(G): General International Standard Archival Description which contains 26 different data elements. In order to facilitate harvesting using OAI-PMH, data within AtoM is mapped to simple Dublin Core and this information is available for search and retrieval via YorSearch. As you can see from the screen shot below, Dublin Core does allow a useful level of information to be harvested, but it is not as detailed as the original record.

An example of one of our archival descriptions converted to Dublin Core within YorSearch
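
The nightly harvest described above is a standard OAI-PMH exchange: the harvester issues a ListRecords request with metadataPrefix=oai_dc and AtoM returns simple Dublin Core. As a rough illustration only (not the actual Primo pipeline), such a response can be parsed like this; the namespaces are the standard OAI-PMH 2.0 and Dublin Core ones, and everything else is a sketch:

```python
# Sketch of parsing an OAI-PMH ListRecords (oai_dc) response into
# (identifier, title) pairs. In production the XML body would come from the
# AtoM endpoint over HTTP; parsing is separated here so it runs on any
# response text. Namespaces are the standard OAI-PMH 2.0 / Dublin Core ones.
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def parse_list_records(xml_text):
    """Return a list of (OAI identifier, dc:title) pairs."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI + "record"):
        ident = record.findtext(OAI + "header/" + OAI + "identifier")
        title = record.findtext(".//" + DC + "title")
        records.append((ident, title))
    return records
```

Because only top level records are exposed, a harvest like this returns one row per archive rather than hundreds of lower-level descriptions.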

Further work was necessary to change the default behaviour within Primo (the software that YorSearch runs on) which displayed results from the Borthwick Catalogue with the label Electronic resource. This is what it calls anything that is harvested as Dublin Core. We didn't think this would be helpful to users because even though the finding aid itself (within AtoM) is indeed an electronic resource, the actual archive that it refers to isn't. We were keen that users didn't come to us expecting everything to be digitised! Fortunately it was possible to change this label to Borthwick Finding Aid, a term that we think will be more helpful to users.
Searches within our library catalogue (YorSearch) now surface Borthwick finding aids, harvested from AtoM.
These are clearly labelled as Borthwick Finding Aids.

Click through to a Borthwick Finding Aid and you can see the full archival description in AtoM in an iFrame

Now this development has gone live we will be able to monitor the impact. It will be interesting to see whether traffic to the Borthwick Catalogue increases and whether a greater number of University of York staff and students engage with the archives as a result.

However, note that I called this blog post AtoM harvesting (part 1).

Of course that means we would like to do more.

Specifically we would like to move beyond just harvesting our top level records as Dublin Core and enable harvesting of all of our archival descriptions in full in Encoded Archival Description (EAD) - an XML standard that is closely modelled on ISAD(G).  This is currently not possible within AtoM but we are hoping to change this in the future.

Part 2 of this blog post will follow once we get further along with this aim...

What is the mediawiki install path on Ubuntu when you install it from the Repos?

Published 15 Nov 2016 by Akiva in Newest questions tagged mediawiki - Ask Ubuntu.

What is the mediawiki install path on Ubuntu when you install it from the Repos?

Specifically looking for the extensions folder.

Automating transfers with Automation Tools

Published 14 Nov 2016 by Unknown in Digital Archiving at the University of York.

This is a guest post by Julie Allinson, Technology Development Manager for Library & Archives at York. Julie has been working on York's implementation for the 'Filling the Digital Preservation Gap' project. This post describes how we have used Artefactual Systems' Automation Tools at York.

For Phase three of our 'Filling the Digital Preservation Gap' project we have delivered a proof-of-concept implementation to illustrate how PURE and Archivematica can be used as part of a Research Data Management lifecycle.

One of the requirements for this work was the ability to fully automate a transfer in Archivematica. Automation Tools is a set of python scripts from Artefactual Systems that are designed to help.

The way Automation Tools works is that a script runs regularly at a set interval (as a cron task). The script is fed a set of parameters and, based on these, checks for new transfers in the given transfer source directory. On finding something, a transfer in Archivematica is initiated and approved.

One of the neat features of Automation Tools is that if you need custom behaviour, there are hooks in the script that can run other scripts within specified directories. The 'pre-transfer' scripts are run before the transfer starts and 'user input' scripts can be used to act when manual steps in the processing are reached. A processing configuration can be supplied and this can fully automate all steps, or leave some manual as desired.

The best way to use Automation Tools is to fork the github repository and then add local scripts into the pre-transfer and/or user-input directories.

So, how have we used Automation Tools at York?

When a user deposits data through our Research Data York (RDYork) application, the data is written into a folder within the transfer source directory named with the id of our local Fedora resource for the data package. The directory sits on filestore that is shared between the Archivematica and RDYork servers. On seeing a new transfer, three pre-transfer scripts run:

- one copies the dedicated datasets processing config into the directory where the new data resides;
- one simply makes sure the correct file permissions are in place so that Archivematica can access the data;
- one looks for a file called 'metadata.json' which contains metadata from PURE and, if it finds it, processes the contents and writes out a metadata.csv file in a format that Archivematica will understand.

These scripts are all fairly rudimentary, but could be extended for other use cases, for example to process metadata files from different sources or to select a processing config for different types of deposit.
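
The metadata step can be sketched in a few lines of Python. To be clear, this is an illustrative reconstruction, not the actual York script: the PURE field names ('title', 'creator', 'date') and the single package-level row are assumptions, though the overall shape (a filename column followed by Dublin Core columns, written into a metadata/ subdirectory of the transfer) follows Archivematica's metadata.csv conventions:

```python
# Illustrative sketch of a metadata.json -> metadata.csv pre-transfer step.
# NOT the actual York script: the PURE field names ('title', 'creator',
# 'date') and the single package-level row are assumptions for illustration.
import csv
import json
from pathlib import Path

def json_to_metadata_csv(transfer_dir):
    """If metadata.json exists in the transfer, write metadata/metadata.csv."""
    src = Path(transfer_dir) / "metadata.json"
    if not src.exists():  # nothing to do for transfers without PURE metadata
        return False
    record = json.loads(src.read_text())
    meta_dir = Path(transfer_dir) / "metadata"
    meta_dir.mkdir(exist_ok=True)
    with open(meta_dir / "metadata.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["filename", "dc.title", "dc.creator", "dc.date"])
        writer.writerow(["objects",  # one row covering the whole package
                         record.get("title", ""),
                         record.get("creator", ""),
                         record.get("date", "")])
    return True
```

A real pre-transfer script would be dropped into the pre-transfer directory of a fork of the Automation Tools repository, as described above.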

Our processing configuration for datasets is fully automated so by using automation tools we never have to look at the Archivematica interface.

Taking that script as inspiration, I have added a second script. This one speaks directly to APIs in our researchdatayork application and updates our repository objects with information from Archivematica, such as the UUID for the AIP and the location of the package itself. In this way our two 'automation' scripts keep researchdatayork and Archivematica in sync: Archivematica is alerted when new transfers appear and automates the ingest, and researchdatayork is updated with the status once Archivematica has finished processing.

The good news is, the documentation for Automation Tools is very clear and that makes it pretty easy to get started. Read more at


Published 11 Nov 2016 by timbaker in Tim Baker.

  I had a great time giving a surf writing workshop during the Ubud Writers and Readers Festival. We had a fantastic venue, the co-working space Hubud, just across the road from the Monkey Forest, with its intriguing bamboo construction, gracious staff and warm and...


Published 11 Nov 2016 by timbaker in Tim Baker.

I just came across an extract from Tim Winton’s new book, The Boy Behind the Curtain, a memoir that reveals plenty about the famously reclusive writer. This extract largely concerns Winton’s 50-year long romance with wave riding and as I read I was chuffed to realise...

Working the polls: reflection

Published 9 Nov 2016 by legoktm in The Lego Mirror.

As I said earlier, I worked the polls from 6 a.m. to roughly 9:20 p.m. We had one voter arrive in the nick of time at 7:59 p.m.

I was glad to see that we had a lot of first time voters, as well as some who just filled out one issue on the three(!) page ballot, and then left. Overall, I've come to the conclusion that everyone is just like me and votes just to get a sticker. We had quite a few people who voted by mail and stopped by just to get their "I voted!" sticker.

I should get paid $145 for working, which I shall be donating to And I plan to be helping out during the next election!

HSTS header not being sent though rule is present and mod_headers is enabled

Published 5 Nov 2016 by jww in Newest questions tagged mediawiki - Server Fault.

We enabled HSTS in httpd.conf in the Virtual Host handling port 443. We tried with and without the <IfModule mod_headers.c>:

<IfModule mod_headers.c>
    Header set Strict-Transport-Security "max-age=10886400; includeSubDomains"
</IfModule>

But the server does not include the header in a response. Below is from curl over HTTPS:

> GET / HTTP/1.1
> Host:
> User-Agent: curl/7.51.0
> Accept: */*
< HTTP/1.1 200 OK
< Date: Sat, 05 Nov 2016 22:49:25 GMT
< Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips
< Last-Modified: Wed, 02 Nov 2016 01:27:08 GMT
< ETag: "8988-5404756e12afc"
< Accept-Ranges: bytes
< Content-Length: 35208
< Vary: Accept-Encoding
< Content-Type: text/html; charset=UTF-8

The relevant section of httpd.conf and the cURL transcript are shown below. Apache shows mod_headers is loaded, and grepping the logs doesn't reveal an error.

The Apache version is Apache/2.4.6 (CentOS). The PHP version is 5.4.16 (cli) (built: Aug 11 2016 21:24:59). The Mediawiki version is 1.26.4.

What might be the problem here, and how could I solve this?


<VirtualHost *:80>
    ServerAlias * *.cryptopp.*

    <IfModule mod_rewrite.c>
        RewriteEngine On
        RewriteCond %{REQUEST_METHOD} ^TRACE
        RewriteRule .* - [F]
        RewriteCond %{REQUEST_METHOD} ^TRACK
        RewriteRule .* - [F]
        #redirect all port 80 traffic to 443
        RewriteCond %{SERVER_PORT} !^443$
        RewriteRule ^/?(.*)$1 [L,R]
    </IfModule>
</VirtualHost>

<VirtualHost *:443>
    ServerAlias * *.cryptopp.*

    <IfModule mod_headers.c>
        Header set Strict-Transport-Security "max-age=10886400; includeSubDomains"
    </IfModule>
</VirtualHost>


# cat /etc/httpd/conf.modules.d/00-base.conf | grep headers
LoadModule headers_module modules/

# httpd -t -D DUMP_MODULES | grep header
 headers_module (shared)

error logs

# grep -IR "Strict-Transport-Security" /etc
/etc/httpd/conf/httpd.conf:        Header set Strict-Transport-Security "max-age=10886400; includeSubDomains" env=HTTPS  
# grep -IR "Strict-Transport-Security" /var/log/
# grep -IR "mod_headers" /var/log/


# find /var/www -name '.htaccess' -printf '%p\n' -exec cat {} \;
Deny from all
Deny from all
Deny from all
Deny from all
Deny from all
Deny from all
# Protect against bug 28235
<IfModule rewrite_module>
    RewriteEngine On
    RewriteCond %{QUERY_STRING} \.[^\\/:*?\x22<>|%]+(#|\?|$) [nocase]
    RewriteRule . - [forbidden]
# Protect against bug 28235
<IfModule rewrite_module>
    RewriteEngine On
    RewriteCond %{QUERY_STRING} \.[^\\/:*?\x22<>|%]+(#|\?|$) [nocase]
    RewriteRule . - [forbidden]
    # Fix for bug T64289
    Options +FollowSymLinks
Deny from all
Deny from all
RewriteEngine on
RewriteRule ^wiki/?(.*)$ /w/index.php?title=$1 [L,QSA]
<IfModule mod_deflate.c>
<FilesMatch "\.(js|css|html)$">
SetOutputFilter DEFLATE

curl transcript

$ /usr/local/bin/curl -Lv
* Rebuilt URL to:
*   Trying
* Connected to ( port 80 (#0)
> GET / HTTP/1.1
> Host:
> User-Agent: curl/7.51.0
> Accept: */*
< HTTP/1.1 302 Found
< Date: Sat, 05 Nov 2016 22:49:25 GMT
< Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips
< Location:
< Content-Length: 209
< Content-Type: text/html; charset=iso-8859-1
* Ignoring the response-body
* Curl_http_done: called premature == 0
* Connection #0 to host left intact
* Issue another request to this URL: ''
*   Trying
* Connected to ( port 443 (#1)
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /opt/local/share/curl/curl-ca-bundle.crt
  CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: OU=Domain Control Validated; OU=COMODO SSL Unified Communications
*  start date: Sep 17 00:00:00 2015 GMT
*  expire date: Sep 16 23:59:59 2018 GMT
*  subjectAltName: host "" matched cert's ""
*  issuer: C=GB; ST=Greater Manchester; L=Salford; O=COMODO CA Limited; CN=COMODO RSA Domain Validation Secure Server CA
*  SSL certificate verify ok.
> GET / HTTP/1.1
> Host:
> User-Agent: curl/7.51.0
> Accept: */*
< HTTP/1.1 200 OK
< Date: Sat, 05 Nov 2016 22:49:25 GMT
< Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips
< Last-Modified: Wed, 02 Nov 2016 01:27:08 GMT
< ETag: "8988-5404756e12afc"
< Accept-Ranges: bytes
< Content-Length: 35208
< Vary: Accept-Encoding
< Content-Type: text/html; charset=UTF-8
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>Crypto++ Library 5.6.5 | Free C++ Class Library of Cryptographic Schemes</title>
  <meta name="description" content=
  "free C++ library for cryptography: includes ciphers, message authentication codes, one-way hash functions, public-key cryptosystems, key agreement schemes, and deflate compression">
  <link rel="stylesheet" type="text/css" href="cryptopp.css">

Firefox "The page isn’t redirecting properly" for a Wiki (all other Pages and UAs are OK) [closed]

Published 5 Nov 2016 by jww in Newest questions tagged mediawiki - Webmasters Stack Exchange.

We are having trouble with a website for a free and open source project. The website and its three components are as follows. It's running on a CentOS 7 VM hosted by someone else (PaaS).

The Apache version is Apache/2.4.6 (CentOS). The PHP version is 5.4.16 (cli) (built: Aug 11 2016 21:24:59). The MediaWiki version is 1.26.4.

The main site is OK and can be reached through both and in all browsers and user agents. The manual is OK and can be reached through both and in all browsers and user agents.

The wiki is OK under most browsers and all tools. Safari is OK. Internet Explorer is OK. Chrome is untested because I don't use it. Command-line tools like cURL and wget are OK. A trace using wget is below.

The wiki is a problem under Firefox. It cannot be reached at either and in Firefox. Firefox displays an error on both OS X 10.8 and Windows 8. Firefox is fully patched on both platforms. The failure is:


We know the problem is due to a recent change to direct all traffic to HTTPS. The relevant addition to httpd.conf is below. The change in our policy is due to Chrome's upcoming policy change regarding security UX indicators.

I know these are crummy questions (none of us are webmasters or admins in our day jobs)... What is the problem? How do I troubleshoot it? How do I fix it?

wget trace

$ wget 
--2016-11-05 12:53:54--
Resolving (
Connecting to (||:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: [following]
--2016-11-05 12:53:54--
Resolving (
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: [following]
--2016-11-05 12:53:54--
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’

index.html              [ <=>                ]  20.04K  --.-KB/s    in 0.03s   

2016-11-05 12:53:54 (767 KB/s) - ‘index.html’ saved [20520]

Firefox access_log

# tail -16 /var/log/httpd/access_log
<removed irrelevant entries>
- - [05/Nov/2016:13:00:52 -0400] "GET /wiki/Main_Page HTTP/1.1" 302 20 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:48.0) Gecko/20100101 Firefox/48.0"
- - [05/Nov/2016:13:00:52 -0400] "GET /wiki/Main_Page HTTP/1.1" 302 20 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:48.0) Gecko/20100101 Firefox/48.0"
- - [05/Nov/2016:13:00:53 -0400] "GET /wiki/Main_Page HTTP/1.1" 302 20 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:48.0) Gecko/20100101 Firefox/48.0"
- - [05/Nov/2016:13:00:53 -0400] "GET /wiki/Main_Page HTTP/1.1" 302 20 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:48.0) Gecko/20100101 Firefox/48.0"
- - [05/Nov/2016:13:00:53 -0400] "GET /wiki/Main_Page HTTP/1.1" 302 20 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:48.0) Gecko/20100101 Firefox/48.0"
- - [05/Nov/2016:13:00:53 -0400] "GET /wiki/Main_Page HTTP/1.1" 302 20 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:48.0) Gecko/20100101 Firefox/48.0"
- - [05/Nov/2016:13:00:53 -0400] "GET /wiki/Main_Page HTTP/1.1" 302 20 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:48.0) Gecko/20100101 Firefox/48.0"
- - [05/Nov/2016:13:00:53 -0400] "GET /wiki/Main_Page HTTP/1.1" 302 20 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:48.0) Gecko/20100101 Firefox/48.0"
- - [05/Nov/2016:13:00:53 -0400] "GET /wiki/Main_Page HTTP/1.1" 302 20 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:48.0) Gecko/20100101 Firefox/48.0"
- - [05/Nov/2016:13:00:54 -0400] "GET /wiki/Main_Page HTTP/1.1" 302 20 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:48.0) Gecko/20100101 Firefox/48.0"
- - [05/Nov/2016:13:00:54 -0400] "GET /wiki/Main_Page HTTP/1.1" 302 20 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:48.0) Gecko/20100101 Firefox/48.0"

httpd.conf change

<VirtualHost *:80>
    ServerAlias * *.cryptopp.*

    <IfModule mod_rewrite.c>
        RewriteEngine On
        RewriteCond %{REQUEST_METHOD} ^TRACE
        RewriteRule .* - [F]
        RewriteCond %{REQUEST_METHOD} ^TRACK
        RewriteRule .* - [F]

        #redirect all port 80 traffic to 443
        RewriteCond %{SERVER_PORT} !^443$
        RewriteRule ^/?(.*)$1 [L,R]
    </IfModule>
</VirtualHost>

<VirtualHost *:443>
    ServerAlias * *.cryptopp.*
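For comparison, here is a sketch of a commonly used loop-free HTTP-to-HTTPS redirect, which tests whether the request is already on HTTPS rather than testing the port (example.com stands in for the real host name):

```apache
<VirtualHost *:80>
    ServerName example.com
    <IfModule mod_rewrite.c>
        RewriteEngine On
        # Only plain-HTTP requests match; a request already served over
        # HTTPS never re-enters the rule, so it cannot loop.
        RewriteCond %{HTTPS} off
        RewriteRule ^/?(.*)$ https://example.com/$1 [R=301,L]
    </IfModule>
</VirtualHost>
```

A classic cause of the exact symptom above (server-side 302 loop that Firefox aborts) is MediaWiki itself redirecting back to its canonical URL: if $wgServer in LocalSettings.php still uses the http:// form, Apache and MediaWiki redirect to each other forever, so it is worth checking that $wgServer uses https:// once the redirect is active.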

From old York to New York: PASIG 2016

Published 4 Nov 2016 by Jenny Mitcham in Digital Archiving at the University of York.

My walk to the conference on the first day
Last week I was lucky enough to attend PASIG 2016 (Preservation and Archiving Special Interest Group) at the Museum of Modern Art in New York. A big thanks to Jisc who generously funded my conference fee and travel expenses. This was the first time I have attended PASIG but I had heard excellent reports from previous conferences and knew I would be in for a treat.

On the conference website PASIG is described as "a place to learn from each other's practical experiences, success stories, and challenges in practising digital preservation." This sounded right up my street and I was not disappointed. The practical focus proved to be a real strength.

The conference was three days long and I took pages of notes (and lots of photographs!). As always, it would be impossible to cover everything in one blog post so here is a round up of some of my highlights. Apologies to all of those speakers who I haven't mentioned.

The first day was Bootcamp - all about finding your feet and getting started with digital preservation. However, this session had value not just for beginners but for those of us who have been working in this area for some time. There are always new things to learn in this field and sometimes a benefit in being walked through some of the basics.

The highlight of the first day for me was an excellent talk by Bert Lyons from AVPreserve called "The Anatomy of Digital Files". This talk was a bit of a whirlwind (I couldn't type my notes fast enough) but it was so informative and hugely valuable. Bert talked us through the binary and hexadecimal notation systems and how they relate to content within a file. This information backed up some of the things I had learnt when investigating how file format signatures are created and really should be essential learning for all digital archivists. If we don't really understand what digital files are made up of then it is hard to preserve them.

Bert also went on to talk about the file system information - which is additional to the bytes within the file - and how crucial it is to also preserve this information alongside the file itself. If you want to know more, there is a great blog post by Bert that I read earlier this year - What is the chemistry of digital preservation?. It includes a comparison highlighting the need to understand the materials you are working with, whether you are working in physical conservation or digital preservation. One of the best blog posts I've read this year, so I'm pleased to get the chance to shout about it here!

Hands up if you love ISO 16363!
Kara Van Malssen, also from AVPreserve gave another good presentation called "How I learned to stop worrying and love ISO16363". Although specifically intended for formal certification, she talked about its value outside the certification process - for self assessment, to identify gaps and to prioritise further work. She concluded by saying that ISO16363 is one of the most valuable digital preservation tools we have.

Jon Tilbury from Preservica gave a thought provoking talk entitled "Preservation Architectures - Now and in the Future". He talked about how tool provision has evolved, from individual tools (like PRONOM and DROID) to integrated tools designed for an institution, to out of the box solutions. He suggested that the fourth age of digital preservation will be embedded tools - with digital preservation being seamless and invisible and very much business as usual. This will take digital preservation from the libraries and archives sector to the business world. Users will be expecting systems to be intuitive and highly automated - they won't want to think in OAIS terms. He went on to suggest that the fifth age will be when every day consumers (specifically his mum!) are using the tools without even thinking about it! This is a great vision - I wonder how long it will take us to get there?

Erin O'Meara from University of Arizona Libraries gave an interesting talk entitled "Digital Storage: Choose your own adventure". She discussed how we select suitable preservation storage and how we can get a seat at the table for storage discussions and decisions within our institutions. She suggested that often we are just getting what we are given rather than what we actually need. She referenced the excellent NDSA Levels of Digital Preservation which are a good starting point when trying to articulate preservation storage needs (and one which I have used myself). Further discussions on Twitter following on from this presentation highlighted the work on preservation storage requirements being carried out as a result of a workshop at iPRES 2016, so this is well worth following up on.

A talk from Amy Rushing and Julianna Barrera-Gomez from the University of Texas at San Antonio entitled "Jumping in and Staying Afloat: Creating Digital Preservation Capacity as a Balancing Act" really highlighted one of the key messages that has come out of our recent project work for Filling the Digital Preservation Gap: choosing a digital preservation system is relatively easy, but deciding how to use it is the harder part! After ArchivesDirect (a combination of Archivematica and DuraCloud) was selected as their preservation system (which included 6TB of storage), Amy and Julianna had a lot of decisions to make in order to balance the needs of their collections with the available resources. It was a really interesting case study and valuable to hear how they approached the problem and prioritised their collections.

The Museum of Modern Art in New York
Andrew French from Ex Libris Solutions gave an interesting insight into a more open future for their digital preservation system Rosetta. He pointed out that when selecting digital preservation systems, institutions focus on best practice and what is known. They tend to have key requirements relating to known standards such as OAIS, Dublin Core, PREMIS and METS, as well as a need for automated workflows and a scalable infrastructure. However, once they start using the tool, they find they want other things too - they want to plug in different tools that suit their own needs.

In order to meet these needs, Rosetta is moving towards greater openness, enabling institutions to swap out any of the tools for ingest, preservation, deposit or publication. This flexibility allows the system to be better suited for a greater range of use cases. They are also being more open with their documentation and this is a very encouraging sign. The Rosetta Developer Network documentation is open to all and includes information, case studies and workflows from Rosetta users that help describe how Rosetta can be used in practice. We can all learn a lot from other people even if we are not using the same DP system so this kind of sharing is really great to see.

MOMA in the rain on day 2!
Day two of PASIG was a practitioners' knowledge exchange. The morning sessions around reproducibility of research were of particular interest to me given my work on research data preservation, and it was great to see two of the presentations referencing the work of the Filling the Digital Preservation Gap project. I'm really pleased to see our work has been of interest to others working in this area.

One of the most valuable talks of the day for me was from Fernando Chirigati from New York University. He introduced us to a useful new tool called ReproZip. He made the point that the computational environment is as important as the data itself for the reproducibility of research data. This could include information about libraries used, environment variables and options. You cannot expect your depositors to find or document all of the dependencies (or your future users to install them). What ReproZip does is package up all the necessary dependencies along with the data itself. This package can then be archived, and ReproZip can later unpack it so the data can be re-used. I can see a very real use case for this for researchers within our institution.

Another engaging talk, from Joanna Phillips of the Guggenheim Museum and Deena Engel of New York University, described a really productive collaboration between the two institutions. Computer Science students from NYU have been working closely with the time-based media conservator at the museum on the digital artworks in their care. This symbiotic relationship enables the students to earn credit towards their academic studies whilst the museum receives valuable help towards understanding and preserving some of their complex digital objects. Work that the students carry out includes source code analysis and the creation of full documentation of the code so that it can be understood by others. Some also engage with the unique preservation challenges within the artwork, considering how it could be migrated or exhibited again. It was clear from the speakers that both institutions get a huge amount of benefit from this collaboration. A great case study!

Karen Cariani from WGBH Educational Foundation talked about their work (with Indiana University Libraries) to build HydraDAM2. This presentation was of real interest to me given our recent Filling the Digital Preservation Gap project in which we introduced digital preservation functionality to Hydra by integrating it with Archivematica.  HydraDAM2 was a different approach, building a preservation head for audio-visual material within Hydra itself. Interesting to see a contrasting solution and to note the commonalities between their project and ours (particularly around the data modelling work and difficulties recruiting skilled developers).

More rain at the end of day 2
Ben Fino-Radin from the Museum of Modern Art, in "More Data, More Problems: Designing Efficient Workflows at Petabyte Scale", highlighted the challenges of digitising their time-based media holdings and shared some calculations around how much digital storage space would be required if they were to digitise all of their analogue holdings. This again really highlighted some big issues and questions around digital preservation. When working with large collections, organisations need to prioritise and compromise, and these decisions cannot be taken lightly. This theme was picked up again on day 3 in the session around environmental sustainability.

The lightning talks on the afternoon of the second day were also of interest. Great to hear from such a range of practitioners.... though I did feel guilty that I didn't volunteer to give one myself! Next time!

On the morning of day 3 we were treated to an excellent presentation by Dragan Espenschied from Rhizome who showed us Webrecorder. Webrecorder is a new open source tool for creating web archives. It uses a single system both for initial capture and subsequent access. One of its many strengths appears to be the ability to capture dynamic websites as you browse them and it looks like it will be particularly useful for websites that are also digital artworks. This is definitely one to watch!

MOMA again!
Also on day 3 was a really interesting session on environmental responsibility and sustainability. This was one of the reasons that PASIG made me think...this is not the sort of stuff we normally talk about so it was really refreshing to see a whole session dedicated to it.

Eira Tansey from the University of Cincinnati gave a very thought provoking talk with a key question for us to think about - why do we continue to buy more storage rather than appraise? This is particularly important considering the environmental costs of continuing to store more and more data of unknown value.

Ben Goldman of Penn State University also picked up this theme, looking at the carbon footprint of digital preservation. He pointed out the paradox in the fact we are preserving data for future generations but we are powering this work with fossil fuels. Is preserving the environment not going to be more important to future generations than our digital data? He suggested that we consider the long term impacts of our decision making and look at our own professional assumptions. Are there things that we do currently that we could do with less impact? Are we saving too many copies of things? Are we running too many integrity checks? Is capturing a full disk image wasteful? He ended his talk by suggesting that we should engage in a debate about the impacts of what we do.

Amelia Acker from the University of Texas at Austin presented another interesting perspective on digital preservation in mobile networks, asking how our collections will change as we move from an information society to a networked era and how mobile phones change the ways we read, write and create the cultural record. The atomic level of the file is no longer there on mobile devices.  Most people don't really know where the actual data is on their phones or tablets, they can't show you the file structure. Data is typically tied up with an app and stored in the cloud and apps come and go rapidly. There are obvious preservation challenges here! She also mentioned the concept of the legacy contact on Facebook...something which had passed me by, but which will be of interest to many of us who care about our own personal digital legacy.

Yes, there really is steam coming out of the pavements in NYC
The standout presentation of the conference for me was "Invisible Defaults and Perceived Limitations: Processing the Juan Gelman Files" from Elvia Arroyo-Ramirez of Princeton University. She described the archive of Juan Gelman, an Argentinian poet and human rights activist. Much of the archive was received on floppy disks and included documents relating to his human rights work and campaigns for the return of his missing son and daughter-in-law. The area she focused on within her talk was how we preserve files with accented characters in the file names.

Diacritics can cause problems when trying to open the files or use our preservation tools (for example Bagger). When she encountered problems like these she put a question out to the digital preservation community asking how to solve them. She was grateful to receive so many responses, but at the same time was concerned about the language used. It was suggested that she 'scrub', 'clean' or 'detox' the file names in order to remove the 'illegal characters', but she was concerned that our attitudes towards accented characters further marginalise those who do not fit into our western ideals.

She also explored how removing or replacing these accented characters would impact the files themselves, and it was clear that meaning would change significantly: stripping the tilde turns 'campaña' ('campaign'), a word included in so many of the filenames, into 'campana' ('bell'). She decided not to change the file names but to find a workaround, and she was eventually successful in keeping the filenames as they were (using the command line to convert the Latin-1 characters to UTF-8). The message she ended on was that we as archivists should do no harm, whether we are dealing with physical or digital archives. We must juggle our priorities but think hard about where we compromise and what is important to preserve. It is possible to work through problems rather than work around them, and we need to be conscious of the needs of collections that fall outside our defaults. This was real food for thought and prompted an interesting conversation on Twitter afterwards.
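The command-line fix she describes - reinterpreting Latin-1 filename bytes as UTF-8 rather than stripping the diacritics - can be sketched in Python; the function names and the example filename here are my own illustrations, not her actual workflow:

```python
import os

def latin1_name_to_utf8(raw: bytes) -> bytes:
    """Reinterpret Latin-1 filename bytes as UTF-8 bytes, keeping the diacritics."""
    return raw.decode("latin-1").encode("utf-8")

def fix_tree(root: bytes) -> None:
    """Rename every file under root whose name changes under the re-encoding.

    On Linux, filenames are raw bytes, so os.walk over a bytes path yields
    bytes names that can be re-encoded without touching file contents.
    (Directory names are left alone in this sketch.)
    """
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            fixed = latin1_name_to_utf8(name)
            if fixed != name:
                os.rename(os.path.join(dirpath, name),
                          os.path.join(dirpath, fixed))

# Latin-1 b'campa\xf1a.txt' becomes the valid UTF-8 spelling of 'campaña.txt':
# latin1_name_to_utf8(b'campa\xf1a.txt') == b'campa\xc3\xb1a.txt'
```

On Linux the same transformation is usually done in bulk with convmv (`convmv -f latin1 -t utf8 -r`), which previews the renames before applying them.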

Times Square selfie!
Not only did I have a fantastic week in New York (it's not every day you can pop out in your lunch break to take a selfie in Times Square!), but I also came away with lots to think about. PASIG is a bit closer to home next year (in Oxford) so I am hoping I'll be there!

Wikidata Map Animations

Published 4 Nov 2016 by addshore in Addshore.

Back in 2013, maps were generated almost daily to track the immediate usage of the then-new coordinate location property within the project. An animation was then created by Denny & Lydia showing the amazing growth, which can be seen on Commons here. Recently we found the original images used to make this animation, starting in June 2013 and extending to September 2013, and to celebrate the fourth birthday of Wikidata we decided to make a few new animations.

The above animation contains images from 2013 (June to September) and then 2014 onwards.

This gap could be what caused the visible jump in brightness in the gif. The jump could also be explained by different render settings used to create the maps; at some point we should go back and generate standardized images for every week/month that coordinates have existed on Wikidata.

The whole gif and the individual halves can all be found on commons under CC0:

The animations were generated directly from png files using the following command:

convert -delay 10 -loop 0 *.png output.gif

These animations use the “small” images generated in previous posts such as Wikidata Map October 2016.

A Simple Request: VLC.js

Published 1 Nov 2016 by Jason Scott in ASCII by Jason Scott.

Almost five years ago today, I made a simple proposal to the world: port MAME/MESS to Javascript.

That happened.

I mean, it cost a dozen people hundreds of hours of their lives…. and there were tears, rage, crisis, drama, and broken hearts and feelings… but it did happen, and the elation and the world we live in now is quite amazing, with instantaneous emulated programs in the browser. And it’s gotten boring for people who know about it, except when they haven’t heard about it until now.

By the way: work continues earnestly on what was called JSMESS and is now called The Emularity. We’re doing experiments with putting it in WebAssembly and refining a bunch of UI concerns and generally making it better, faster, cooler with each iteration. Get involved – come to #jsmess on EFNet or contact me with questions.

In celebration of the five years, I’d like to suggest a new project, one of several candidates I’ve weighed but which I think has the best combination of effort to absolute game-changer in the world.


Hey, come back!

It is my belief that a Javascript (later WebAssembly) port of VLC, the VideoLan Player, will fundamentally change our relationship to a mass of materials and files out there, ones which are played, viewed, or accessed. Just like we had a lot of software locked away in static formats that required extensive steps to even view or understand, so too do we have formats beyond the “usual” that are also frozen into a multi-step process. Making these instantaneously function in the browser, all browsers, would be a revolution.

A quick glance at the features list of VLC shows how many variant formats it handles, from audio and sound files through to encapsulations like DVDs and VCDs. Files that now rest as hunks of ISOs and .ZIP files could be turned into living, participatory parts of the online conversation. Also, formats like .MOD and .XM (trust me) would live again effectively.

Also, VLC has weathered years and years of existence, and the additional use case for it would help people contribute to it, much like there’s been some improvements in MAME/MESS over time as folks who normally didn’t dip in there added suggestions or feedback to make the project better in pretty obscure realms.

I firmly believe that this project, fundamentally, would change the relationship of audio/video to the web. 

I’ll write more about this in coming months, I’m sure, but if you’re interested, stop by #vlcjs on EFnet, or ping me on twitter at @textfiles, or write to me at with your thoughts and feedback.

See you.


Digital Collecting – Exciting and Challenging times

Published 31 Oct 2016 by slwacns in State Library of Western Australia Blog.

Dear Reader, this post does not (yet) have a happy ending, but rather it’s a snapshot of some of the challenges we’re facing, and might provide some insight into how we handle content (especially the digital stuff).  I’m also hoping it’ll start you thinking about how you might handle/organise your own personal collections.  If it does, please let me know by adding a comment below.  Now enough from me, and on with the story…


Not so long ago we received a trolley full of files from a private organisation.  This is not an unusual scenario, as we often collect from Western Australian organisations, and it is part of the job of our Collection Liaison team to evaluate and respond to offers of content.  The files we received included the usual range of hardcopy content – Annual Reports, promotional publications, internal memos and the like… and a hard drive.

Not being totally sure what was on the hard drive, we thought we’d best take a look.  We used our write blocker (a device to stop any changes happening on the hard drive), and accessed the drive.  Well, we tried to… Challenge 1 was hit – we couldn’t open the drive.  A bit of investigation later, (and with the use of a Mac), the drive was accessed.  Funny to think at this point how used we get to our own ‘standard’ environments. If you are the only person in your family to use a Mac, and your drives are Mac formatted, how are you going to share files with Windows users?

Once we could get to the content, we carefully copied the contents onto a working directory on our storage system.  (Carefully for us means programmatically checking files we were transferring, and re-checking them once copied to ensure the files weren’t corrupted or changed during the transfer process).  At the same time, our program created a list of contents of the drive.  There were a mere 15,000 files.  Challenge 2 started to emerge… fifteen thousand is a big number of files!  How many files would you have on your device(s)?  If you gave them all to someone, would they freak out, or would they know which ones were important?
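The "copy carefully, then re-check" step described above is essentially a fixity check. A minimal sketch of the idea in Python (the function names and choice of SHA-256 are my own illustration, not the Library's actual tooling):

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large files never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verified_copy(src: Path, dst: Path) -> str:
    """Copy src to dst and confirm the bytes survived the transfer unchanged."""
    before = sha256_of(src)
    shutil.copy2(src, dst)          # copy2 also preserves timestamps
    after = sha256_of(dst)
    if before != after:
        raise IOError(f"checksum mismatch copying {src}")
    return after
```

Recording the returned digest alongside each file is what makes later integrity checks possible: re-hashing the stored copy and comparing against the recorded value detects corruption at any point in the file's life.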

[Enter some investigation into the content of the files].  Hmmm – looks like most things are well organised – I can see that a couple of directories are labelled by year (‘2014’, ‘2015’, ‘2016’), and there are some additional ‘Project’ folders.  Great!  This is really quite OK.  What’s more (following our guidelines), the donor has provided us with details of each section of the collection – including a (necessarily broad) description of what’s on the drive – that’ll be really helpful when our cataloguers need to describe the contents. Challenge 4 – Identifying the contents, is (at a high level anyway) looking doable.  Oops – hold that thought – there’s a directory of files called ‘Transferred’ – What does that mean? Hmmm…


Enough for now – stay tuned to updates on the processing of this collection, and feel free to get in touch.  Comments below, or if you think we may have something that is collectable, start at this web page:


Manually insert text into existing MediaWiki table row?

Published 30 Oct 2016 by jww in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I'm trying to update a page for a MediaWiki database running MW version 1.26.4. The MediaWiki is currently suffering unexplained Internal Server Errors, so I am trying to perform an end-around by updating the database directly.

I logged into the database with the proper credentials. I dumped the table of interest and I see the row I want to update:

MariaDB [my_wiki]> select * from wikicryptopp_page;
| page_id | page_namespace | page_title                                                                | page_restrictions | page_is_redirect | page_is_new | page_random        | page_touched   | page_latest | page_len | page_content_model | page_links_updated | page_lang |
|       1 |              0 | Main_Page                                                                 |                   |                0 |           0 |     0.161024148737 | 20161011215919 |       13853 |     3571 | wikitext           | 20161011215919     | NULL      |
|    3720 |              0 | GNUmakefile                                                               |                   |                0 |           0 |     0.792691625226 | 20161030095525 |       13941 |    36528 | wikitext           | 20161030095525     | NULL      |

I know exactly where the insertion should occur, and I have the text I want to insert. The Page Title is GNUmakefile, and the Page ID is 3720.

The text is large at 36+ KB, and it's sitting on the filesystem in a text file. How do I manually insert the text into the existing table row?
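For what it's worth, in MediaWiki 1.26 the wikitext does not live in the page table itself: page.page_latest points at a revision row, whose rev_text_id points at the text row holding the content. A hedged sketch against the prefixed tables shown above - the file path is a placeholder, LOAD_FILE needs the MySQL FILE privilege, and edits made this way bypass MediaWiki's revision history and caches, so a cache purge would be needed afterwards:

```sql
-- Find the text row behind the latest revision of page 3720 (GNUmakefile).
SELECT rev_text_id FROM wikicryptopp_revision
WHERE rev_id = (SELECT page_latest FROM wikicryptopp_page WHERE page_id = 3720);

-- Check old_flags first: if it says gzip, the row is compressed,
-- and writing plain text into it would corrupt the revision.
-- Then load the new wikitext from disk into that row:
UPDATE wikicryptopp_text
SET old_text = LOAD_FILE('/tmp/gnumakefile.wikitext')
WHERE old_id = <rev_text_id from the first query>;
```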

How to log-in with more rights than Admin or Bureaucrat?

Published 30 Oct 2016 by jww in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I'm having a heck of a time with MediaWiki and an Internal Server Error. I'd like to log-in with more privileges than afforded by Admin and Bureaucrat in hopes of actually being able to save a page.

I am an admin on the VM that hosts the wiki. I have all the usernames and passwords at my disposal. I tried logging in with the MediaWiki user and password from LocalSettings.php but the log-in failed.

Is it possible to acquire more privileges than provided by Admin or Bureaucrat? If so, how do I log-in with more rights than Admin or Bureaucrat?
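One route that sidesteps web log-ins entirely: MediaWiki's maintenance scripts run from the shell with the server's own database credentials, so no wiki account privileges are involved at all. A sketch, assuming shell access to the wiki root (the paths, user name and summary are placeholders):

```shell
# Save a page from the command line, bypassing the broken web front end.
# edit.php reads the new page text from stdin.
cd /path/to/wiki
php maintenance/edit.php --user WikiSysop --summary "manual update" GNUmakefile < /tmp/newtext.wikitext
```

Unlike a direct database edit, this goes through MediaWiki's normal save path, so the revision history and caches stay consistent.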

Character set 'utf-8' is not a compiled character set and is not specified in the '/usr/share/mysql/charsets/Index.xml' file

Published 28 Oct 2016 by jww in Newest questions tagged mediawiki - Webmasters Stack Exchange.

We are trying to upgrade our MediaWiki software. According to Manual:Upgrading -> UPGRADE -> Manual:Backing_up_a_wiki, we are supposed to backup the database with:

mysqldump -h hostname -u userid -p --default-character-set=whatever dbname > backup.sql

When we run the command with our parameters and --default-character-set=utf-8:

$ sudo mysqldump -h localhost -u XXX -p YYY --default-character-set=utf-8 ZZZ > 
mysqldump: Character set 'utf-8' is not a compiled character set and is not specified in the '/usr/share/mysql/charsets/Index.xml' file

Checking Index.xml appears to show utf-8 is available. UTF-8 is specifically called out by Manual:$wgDBTableOptions.

$ cat /usr/share/mysql/charsets/Index.xml | grep -B 3 -i 'utf-8'
<charset name="utf8">
  <description>UTF-8 Unicode</description>

We tried both UTF-8 and utf-8 as specified by Manual:$wgDBTableOptions.

I have a couple of questions. First, can we omit --default-character-set since it's not working as expected? Second, if we have to use --default-character-set, then what is used to specify UTF-8?

A third, related question: can we forgo mysqldump altogether by taking the wiki and database offline and then making a physical copy of the database? I am happy to make a copy of the physical database for a restore, and I really don't care much for using tools that cause more trouble than they solve.

If the third item is a viable option, then what is the physical database file that needs to be copied?
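For what it's worth, the error message itself points at the likely fix: MySQL's compiled character set is named utf8, with no hyphen, which is exactly the name the Index.xml grep above returns. A minimal sketch with the same placeholder host, user, and database as the question (echoed rather than run, since it needs a live server):

```shell
# MySQL's compiled character set is "utf8" (no hyphen); the hyphenated
# "utf-8" is what triggers the Index.xml error above.
charset=utf8
cmd="mysqldump -h localhost -u XXX -p --default-character-set=${charset} ZZZ"
echo "$cmd > backup.sql"

# Re the third question: a raw copy is also possible -- stop mysqld first,
# then copy the whole data directory (typically /var/lib/mysql).
```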

Wikidata Map October 2016

Published 28 Oct 2016 by addshore in Addshore.

It has been another five months since my last post about the Wikidata maps, and again some areas of the world have lit up. Since my last post at least 9 noticeable areas have appeared with many new items containing coordinate locations. These include Afghanistan, Angola, Bosnia & Herzegovina, Burundi, Lebanon, Lithuania, Macedonia, South Sudan and Syria.

The difference map below was generated using Resemble.js. The pink areas show areas of difference between the two maps from April and October 2016.

Who caused the additions?

To work out what items exist in the areas that have a large amount of change, the Wikidata query service can be used. I adapted a simple SPARQL query to show the items within a radius of the centre of each area of increase. For example, Afghanistan used the following query:

 SELECT ?place ?placeLabel ?location ?instanceLabel WHERE {
  wd:Q889 wdt:P625 ?loc . 
  SERVICE wikibase:around { 
      ?place wdt:P625 ?location . 
      bd:serviceParam wikibase:center ?loc . 
      bd:serviceParam wikibase:radius "100" . 
  } 
  OPTIONAL { ?place wdt:P31 ?instance }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
  BIND(geof:distance(?loc, ?location) as ?dist) 
} ORDER BY ?dist

The query can be seen running here and above. The items can then be clicked on directly and their histories loaded.

The individual edits that added the coordinates can easily be spotted.

Of course this could also be done using a script following roughly the same process.
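Such a script could, for instance, send the same query to the public Wikidata Query Service SPARQL endpoint. A sketch with the real endpoint URL, but echoed rather than executed (network access and result handling are left out):

```shell
# The WDQS endpoint accepts a "query" parameter; CSV output keeps
# post-processing simple. Item (Afghanistan, Q889) and radius as above.
endpoint="https://query.wikidata.org/sparql"
query='SELECT ?place WHERE { wd:Q889 wdt:P625 ?loc . SERVICE wikibase:around { ?place wdt:P625 ?location . bd:serviceParam wikibase:center ?loc . bd:serviceParam wikibase:radius "100" . } }'

# With network access this would be run as:
echo "curl -G $endpoint --data-urlencode query=... -H 'Accept: text/csv'"
```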

It looks like Reinheitsgebot (Magnus Manske) can be attributed to many of the areas of mass increase due to a bot run in April 2016. It looks like KrBot can be attributed to many of the coordinates in Lithuania due to a bot run in May 2016.

October 2016 maps

The October 2016 maps can be found on commons:

Labs project

I have given the ‘Wikidata Analysis’ tool a speedy reboot over the past weeks and generated many maps for many old dumps that are not currently on Wikimedia Commons.

The tool now contains a collection of date-stamped directories which contain the data generated by the Java dump scanning tool, as well as the images that are then generated from that data using a Python script.

MediaWiki's VisualEditor component Parsoid not working after switching php7.0 to php5.7

Published 27 Oct 2016 by Dávid Kakaš in Newest questions tagged mediawiki - Ask Ubuntu.

I would like to ask you for your help with:

Because the forum CMS phpBB does not currently support PHP >= 7.0, I had to switch to php5.6 on my Ubuntu 16.04 LTS server. So I installed the php5.6 packages from ppa:ondrej/php and, by running:

sudo a2dismod php7.0 ; sudo a2enmod php5.6 ; sudo service apache2 restart
sudo ln -sfn /usr/bin/php5.6 /etc/alternatives/php

... I switched to php5.6.

Unfortunately, this caused my MediaWiki's VisualEditor to stop working. I had configured the MediaWiki plug-in to talk to the Parsoid server before switching PHP, and everything was working as expected. Also, when I switched back to php7.0 using:

sudo a2dismod php5.6 ; sudo a2enmod php7.0 ; sudo service apache2 restart
sudo ln -sfn /usr/bin/php7.0 /etc/alternatives/php

... the wiki works fine once again; however, posts using phpBB functionality like BBCodes and tags fail to be submitted. Well, php7.0 is unsupported so I cannot complain, and so I am trying to make Parsoid work with php5.6 (which should be supported).
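As an aside, on Debian/Ubuntu the CLI interpreter switch can be done with update-alternatives instead of overwriting the symlink by hand; the ondrej/php packages register each PHP version as an alternative. A sketch (echoed only; the Apache module switch still uses a2dismod/a2enmod as above):

```shell
# update-alternatives manages /etc/alternatives/php itself, so manual
# ln -sfn calls are not needed. Echoed here rather than executed:
cli_switch="sudo update-alternatives --set php /usr/bin/php5.6"
echo "$cli_switch"
```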

Error displayed when:

Other (possible) error symptoms:

[warning] [{MY_PARSOID_CONF_PREFIX}/Hlavná_stránka] non-200 response: 401 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> <html><head> <title>401 Unauthorized</title> </head><body> <h1>Unauthorized</h1> <p>This server could not verify that you are authorized to access the document requested. Either you supplied the wrong credentials (e.g., bad password), or your browser doesn't understand how to supply the credentials required.</p> <hr> <address>Apache/2.4.18 (Ubuntu) Server at Port 443</address> </body></html>

... however, now I don't get any warnings in the log! Even when performing "sudo service parsoid status" it shows "/bin/sh -c /usr/bin/nodejs /usr/lib/parsoid/src/bin/server.js -c /etc/mediawiki/parsoid/server.js -c /etc/mediawiki/parsoid/settings.js >> /var/log/parsoid/parsoid.log 2>&1", which I hope means it is outputting error messages to the log.
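When the log goes quiet like that, two quick checks are to watch the log directly while reproducing the failure, and to hit Parsoid's HTTP port yourself (8000 is Parsoid's default; the port and log path here are assumptions to adjust):

```shell
# Watch Parsoid's log while triggering a VisualEditor edit, and probe the
# service directly; a running Parsoid answers on its HTTP port.
log_check="tail -f /var/log/parsoid/parsoid.log"
port_check="curl -s http://localhost:8000/"
echo "$log_check"
echo "$port_check"
```

A 401 like the one above, by contrast, suggests the request does reach a web server but fails HTTP authentication, which points at the Apache/proxy configuration in front of Parsoid rather than at Parsoid itself.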

I tried:

Possible Cause:

What do you think? Any suggestion how to solve or further test this problem?

P.S. Sorry for the badly formatted code in the question, but it somehow broke ... seems I am the problem after all :-D

Working the polls

Published 19 Oct 2016 by legoktm in The Lego Mirror.

After being generally frustrated by this election cycle and wanting to contribute to make it less so, I decided to sign up to work at the polls this year, and help facilitate the election. Yesterday, we had election officer training by the Santa Clara County Registrar of Voter's office. It was pretty fascinating to me given that I've only ever voted by mail, and haven't been inside a physical polling place in years. But the biggest takeaway I had, was that California goes to extraordinary lengths to ensure that everyone can vote. There's basically no situation in which someone who claims they are eligible to vote is denied being able to vote. Sure, they end up voting provisionally, but I think that is significantly better than turning them away and telling them they can't vote.

Filling the Digital Preservation Gap - final report available

Published 19 Oct 2016 by Jenny Mitcham in Digital Archiving at the University of York.

Today we have published our third and final Filling the Digital Preservation Gap report.

The report can be accessed from Figshare:

This report details work the team at the Universities of York and Hull have been carrying out over the last six months (from March to September 2016) during phase 3 of the project.

The first section of the report focuses on our implementation work. It describes how each institution has established a proof-of-concept implementation of Archivematica integrated with other systems used for research data management. As well as describing how these implementations work, it also discusses future priorities and lessons learned.

The second section of the report looks in more detail at the file format problem for research data. It discusses DROID profiling work that has been carried out over the course of the project (both for research data and other data types) and signature development to increase the number of research data signatures in the PRONOM registry. In recognition of the fact that this is an issue that can only be solved as a community, it also includes recommendations for a variety of different stakeholder groups.

The final section of the report details the outreach work that we have carried out over the course of this final phase of the project. It has been a real pleasure to have been given an opportunity to speak about our work at so many different events and to such a variety of different groups over the last few months!

The last of this run of events in our calendars is the final Jisc Research Data Spring showcase in Birmingham tomorrow (20th October). I hope to see you there!

"wiki is currently unable to h