Sam's news

Here are some of the news sources I follow.

My main website is at

compatible version of wysiwyg in mediawiki

Published 20 Mar 2018 by John F. in Newest questions tagged mediawiki - Stack Overflow.

Is there any compatible and working package of WYSIWYG that works well with this version of mediawiki?

MediaWiki 1.30.0 PHP 7.2.3

I got this error while using the one given in the documentation: count(): Parameter must be an array or an object that implements Countable in extensions/WYSIWYG/CKeditorParser.body.php on line 1071

I downloaded this version:-

Episode 4: Bernhard Krabina

Published 20 Mar 2018 by Yaron Koren in Between the Brackets: a MediaWiki Podcast.

Bernhard Krabina is a researcher and consultant for KDZ, the Centre for Public Administration Research, a Vienna, Austria-based nonprofit that focuses on improving and modernizing technology-based solutions in government at all levels within Europe. He has been involved with MediaWiki in government for the last 10 years.

Links for some of the topics discussed:


Published 20 Mar 2018 by fabpot in Tags from Twig.


Guaranteed workaround for debugging "blank page" Mediawiki errors? [on hold]

Published 19 Mar 2018 by user1258361 in Newest questions tagged mediawiki - Server Fault.

I followed the instructions there (added the error_reporting and ini_set where specified) and it doesn't fix the blank page. Can someone post a link or instructions on how to guaranteed get the error reports?
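For reference, the settings MediaWiki's own debugging manual recommends belong near the top of LocalSettings.php, so that errors thrown early in startup are still reported. A sketch (the exact set of flags varies a little by MediaWiki version):

```php
# Near the top of LocalSettings.php, just after the opening <?php line:
error_reporting( -1 );
ini_set( 'display_errors', 1 );

# MediaWiki-specific switches: show exception and database error
# details instead of a blank page or a generic message.
$wgShowExceptionDetails = true;
$wgShowDBErrorBacktrace = true;
$wgShowSQLErrors = true;
```

If even this produces nothing, the blank page usually means PHP dies before LocalSettings.php is parsed (syntax error, missing extension), in which case the web server's error log is the only place the message appears.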

Living with the consequences

Published 19 Mar 2018 by in New Humanist Articles and Posts.

Climate change is already here: we must stop debating deniers and start making tough decisions about what we will save.

How can I save whole wiki?

Published 15 Mar 2018 by Slagathor in Newest questions tagged mediawiki - Stack Overflow.

I'd like to save a whole wiki site (not Wikipedia, but a smaller wiki), with all of its text and media. I'm using a Windows system. I'd prefer the simplest solution.
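For a simple mirror of a third-party wiki, one common approach is a recursive crawl with wget (which is also available for Windows); the URL below is only a placeholder, and for a proper wiki backup the XML export at Special:Export is usually better. A sketch:

```shell
# Mirror a wiki site, rewriting links for local browsing.
# The URL is a placeholder; --no-parent keeps the crawl on the wiki.
wget --mirror --convert-links --page-requisites --no-parent \
     https://wiki.example.org/
```

This grabs rendered pages and media, but not edit history; for that, the wiki's own export facilities are needed.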

Book review: The Rage

Published 15 Mar 2018 by in New Humanist Articles and Posts.

A new book demonstrates the deep commonalities between far-right and Islamic extremists

"Babe" beta

Published 14 Mar 2018 by camiloh in blogs.

KDE Project:

Here's the "Babe" beta, but first let me tell you what's going on with "Babe" right now.

If you're a student like me and want to start learning new things and work on an exciting open source project, the following are some ways you could get involved.

If you feel like contributing and getting involved, there are a lot of things you could start with:

BabeIt Platform:
UI design: designing the interface, icons, illustrations, assets, the look and brand of it...
UX design: planning the features, how things will work, helping design the feature interactions, etc...
Data analysis: working with Python on NLP of music information, and web crawling to gather and store information (Python, JavaScript, Node.js, MongoDB). Also working on signal analysis of sound waves, which would involve both the Babe application and the BabeIt platform.

UI design
UX design
Creating plugins in form of QML, JavaScript and JSON to extend Babe utilities and features
Developing on the background with modern C++ and the KDE frameworks to have a great performance and tight system integration
Developing on the front with QML, JavaScript and making use of Kirigami
Developing Network features like audio streaming on local networks
Helping to better integrate Babe into Android by making use of Java and the Android API
Testing Babe on Plasma Mobile and other platforms to make the app cross-platform: on Windows, iOS, macOS... etc

Integrating more services to allow a much wider network of information: making use of APIs from music information services and others like Twitter, Facebook, Instagram, Musixmatch, Deezer... etc, and web crawling to find contextual information (making use of genetic algorithms and other AI techniques)

Contact info:
We are on telegram as Babe
and my email is


Babe was previously released as a QWidgets application, and for a couple of months now I've been working on porting the app to QQC2 and the Kirigami framework. The porting is done, and right now I'm adding new features and polishing things at the same time, working between the BabeIt platform and the Pulpo MIR system.

Before I continue let me let you know that Babe is looking for a new name. There are a few suggestions but nothing definitive as of now, so you're welcome to suggest a new name, but before you jump in with ideas I'll tell you a little about what Babe is aiming to become.

The idea behind Babe is to become something more than just your regular music player. Yes, it is a music player at its base, but the idea is to have a place for the music, meaning: a place to share music with friends and other users, a place to get information about music and to manage your collection, to let you discover new music, to stream between devices, and so much more than just a regular music player. Babe wants to be a music platform. And how is it doing it...?

BabeIt: (also looking for a new name)
It is a web platform in the form of an API that will allow users to share music information, and later on to perform data analysis tasks on the metadata and the music lyrics, making use of NLP (natural language processing) techniques and web crawling to find relevant semantic and contextual information about the music. The platform is now up and running, thanks to the KDE community, at:

Babe will be the first app to make use of the API and hopefully some other music players or apps will find it useful and use it too.

Think of BabeIt as a mix of LastFm/Genius/SoundCloud, and who knows maybe a free open-source Spotify alternative. I will write another blog post about BabeIt later on to tell you what it would offer.

Pulpo is the engine that finds metadata and other music information on the internet for your music collection, in order to generate semantic suggestions. It collects music lyrics, artwork, tags, stats, metadata, wikis, etc...

Mobile ready:
Babe right now works on desktops and on Android, and will work on your Plasma Mobile device, fitting in flawlessly thanks to the work of the Kirigami team.

There's an initial integration between devices, named Linking. If you have Babe on your Android device and also installed on your GNU/Linux distro, you can browse your desktop collection from your phone and (still not ready) stream the tracks you desire, and vice versa. Also, if you have a friend around, you will both be able to listen to the same track from whichever collection is serving as the server. Hopefully that won't be considered illegal, as it works on a local network; let me know if that would be a problem on the app side...

Babe plans to integrate with different services, and many other services will be able to be created by the community in the form of plugins written in QML/JavaScript and JSON.
As an early example, there's a YouTube service that allows you to watch the video of the currently playing track (or any other video) and even make searches inside the app... and if you have the youtube-dl tool installed on your system, Babe will let you collect the track, find the right metadata and artwork for it, and save it in your local music collection.
You can also stream the audio from YouTube, expanding your music collection... :)

There are a lot of little things and features you will discover by yourself when you try the app...

Things working:
- Regular music playback from local collection
- Adding and removing sources (you cannot remove the default sources, like the ~/Music path...)
- Artwork, metadata, tags, wikis, lyrics fetching (on Android there's still a bug when collecting music lyrics that causes the app to crash)
- Creation and removal of playlists, syncing playlist with main-playlist, removing and adding tracks to playlists...
- YouTube searches and videos viewing, collecting tracks if youtube-dl is installed.
- Linking devices via IP address, searching tracks on the client from the server collection.
- Queueing tracks, remove missing tracks option, shuffle playback...
- Local collection searching making use of keywords, like: "artist:", "album:", "tag:", "similar:", "lyrics:", "playlist:", etc...
- Sharing tracks, on desktops via KDE Connect and on Android just the regular sharing options to share to whatever app
- Rating system and colouring tagging.

Things not working yet:
- Streaming tracks from device to device via Linking view
- youtube-dl integration under Android
- Association of tags/keywords with colour tags and playlists
- mpris controls
- android native notifications
- android background service to allow playback on the background
- folders view as a plugin example
- browsing of default playlists of linked device via Linking view
- babeit API integration
- some default playlists don't work yet
- remove dialog

Things missing:
- babeit integration
- soundcloud, deezer services integration
- youtube accounts integration
- equalizer
- suggestions view
- D.J mode
- albums dragging
- edit dialog for editing track metadata information

And I'm probably missing a lot more things I should have mentioned.

In a few days the app will be available on the Neon unstable repos, there will be an AppImage and you will find Babe as a default app on NitruxOS.

Until then here's the beta branch if you feel like compiling from source:

and to finish a big shout-out to NitruxOS and the KDE community for supporting this project. <3

Update 1.3.5 released

Published 14 Mar 2018 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

We proudly announce a new service release to update the stable version 1.3. It contains fixes to several bugs backported from the master branch. One of them can be considered a minor security fix, as it fixes the blocking of remote content in specially crafted style tags.

See the full changelog in the release notes on the Github download page.

This release is considered stable and we recommend updating all production installations of Roundcube to this version. Download it from

Please do backup your data before updating!

How We Support Remote Employees at DigitalOcean

Published 14 Mar 2018 by Amanda Brazzell in The DigitalOcean Blog.

How We Support Remote Employees at DigitalOcean

Remote culture at DigitalOcean is one of my favorite things to talk about when discussing my job. When I first joined the company in June of 2015, there was already a substantial percentage of existing remote employees (better known as our “remotees”). Working with the remotees wasn’t initially a part of my function, but as a member of the Employee Experience Team, I gradually found myself getting to know many of them more personally. I learned about their experiences as distributed employees, some of their pain points, and how it influences their engagement.

Since I've never been remote, I educated myself on best practices for companies with remote employees and how we could expand our top-notch employee experience to those outside of our HQ.

Two and a half years later, our remotee population totals over 200 employees, making up over 50% of our employees, and our program has grown to support both the needs of our business and those who work remotely. To date, remotees score higher in engagement than any other subgroup at the company. This has been attributed to the attention and effort we have actively given to support the remotee experience.

Here’s what we learned and how we adjusted our efforts to better support the remotee experience:

Remote Communication

“Watercooler talk” is an important aspect of working in-office, and it’s a practice that companies seeking to become more remote-friendly have trouble replicating. Being able to easily communicate with other colleagues helps improve team bonds and makes people feel part of the company fabric. At DO, we use several different mediums to avoid having remotees excluded from conversation and risking having information fall through the cracks:

Remote-inclusive Programs

While most of our teams at DigitalOcean are comprised of both in-office and remote employees, there is definite value in giving teams the opportunity to get together in person at different times during the year. Here are the processes we have in place to ensure teams get face time:

Perks for Remotees

While some companies see working from home as a perk in and of itself, we recreate many of the in-office perks and make them available to remotees. This is key to building a cohesive company culture and experience, and one where remotees feel engaged with the company at large.

Our remotees are able to participate in our workstation program, where they get access to different monitors, mice/keyboards, and trackpads for their home offices, as well as a credit of up to $100 for headphones of their choice. The equivalent of our commuter benefit for in-house employees is providing remotees a credit toward the cost of either their monthly internet bill or their monthly coworking space membership. Additionally, remotees can opt into a monthly subscription snack box (because snacks are awesome!). Finally, DO covers travel and per diem costs, and provides accommodation at our corporate apartments for remotee visits to HQ.

"Love is What Makes Us Great"

DigitalOcean’s employee experience programs strive to be inclusive of all of our employees. We do this by keeping the needs of both in-office and remote employees in mind, and by adjusting our programs as needed to ensure they can change and scale with our growing organization. Removing obstacles to communication between people in our offices and remotees is essential for building cohesion across teams and helping everyone be the most productive employee they can be, no matter where they’re located.

Apply For a Job @ DO

Amanda Brazzell is DigitalOcean’s Office Experience Team Lead. She has helped build an effective Remote Experience program that drives dispersed employee engagement and job satisfaction. Amanda is a California native who moved to NYC without having ever visited the city before, and has been at DO since 2015.

How should optional or empty values be handled in Semantic Mediawiki?

Published 14 Mar 2018 by Topsy in Newest questions tagged mediawiki - Stack Overflow.

I am setting up some templates for a Semantic Mediawiki implementation. Template params are being fed into annotations. However, the values are optional; there's not always going to be a value in every field. This causes trouble with some data types. Specifically, if I have

| Has phone={{{phone}}}

I will get an error of the form "URIs of the form *** are not allowed", where *** is either {{{phone}}} or whatever default value I try to drop in there. It seems impossible for datatypes like phone or email to be empty. I cannot figure out how to support empty values for these fields in my templates. What is the correct pattern to use for null values in SMW annotations?
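A common workaround (assuming the ParserFunctions extension is installed, as it usually is alongside SMW) is to give the parameter an empty default and only emit the annotation when a value was actually passed:

```wikitext
{{#if: {{{phone|}}}
  | {{#set: Has phone={{{phone|}}} }}
}}
```

The pipe in {{{phone|}}} makes the parameter default to an empty string instead of the literal text {{{phone}}}, which is what otherwise reaches the URI/email datatype and triggers the "are not allowed" error; the #if then skips the #set entirely when the field is empty.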

How to best add extensions when using official docker image for mediawiki?

Published 13 Mar 2018 by Streamline in Newest questions tagged mediawiki - Stack Overflow.

We are using the official MediaWiki Docker image and want to be able to add additional MediaWiki extensions.


  1. What is the recommended next step here if we are currently using the docker-compose file below were we mount volumes on the host? Is it to build a new image that wraps the official image? Is there an example somewhere of this modified new image for adding a mediawiki extension?
  2. Or can we just mount an extensions volume on the host in the current docker-compose and, if needed, make any adjustments in LocalSettings.php?

This link on the Docker website refers to adding PHP extensions and libraries, but it's not clear to me whether it is meant to answer the question of adding MediaWiki-specific extensions, since it clearly says "PHP extensions". Or should that documentation page have actually said "MediaWiki extensions", even though that implies they are written in PHP?

Here is our current docker-compose file entry for mediawiki:

  image: mediawiki
  container_name: mediawiki_production
  mem_limit: 4g
  volumes:
    - /var/www/mediawiki/uploads:/var/www/html/uploads
    - /var/www/mediawiki/LocalSettings.php:/var/www/html/LocalSettings.php
  environment:
    - TERM=xterm
  restart: always
  network_mode: bridge

The extensions we are considering that are not part of the official image first off are (but would like a scalable solution for more later):

Any examples of a downstream Docker image that uses the official mediawiki image in its "FROM" to include MediaWiki extension(s), plus an updated docker-compose (if both are required), would be helpful. It might also be good to explain what needs to change if a MediaWiki extension relies on PHP extensions or libraries not already included in the base image, versus adding a MediaWiki extension that doesn't rely on any additional PHP extensions or libraries.
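One common pattern (a sketch, not official guidance; the extension and branch below are only examples) is a thin downstream image that bakes extensions into the image's extensions directory:

```dockerfile
# Dockerfile: wrap the official image and bake an extension in.
FROM mediawiki:1.30

# Example extension only; substitute the ones you actually need.
# The official image keeps extensions under /var/www/html/extensions.
RUN apt-get update && apt-get install -y --no-install-recommends git \
 && git clone --depth 1 -b REL1_30 \
      https://gerrit.wikimedia.org/r/mediawiki/extensions/Cite \
      /var/www/html/extensions/Cite \
 && rm -rf /var/lib/apt/lists/*
```

You would build this with `docker build -t mediawiki-custom .`, point `image:` in docker-compose at `mediawiki-custom`, and add the matching `wfLoadExtension( 'Cite' );` line to the mounted LocalSettings.php. Mounting an extensions volume from the host also works; the baked-in image just keeps the container reproducible.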

htaccess subdomain to "folder" mediawiki

Published 13 Mar 2018 by mapple in Newest questions tagged mediawiki - Stack Overflow.

I'm running a mediawiki installation with a "nice" URL htaccess to show

I'd like to enable localization and translation on subdomains.

I'm thinking displays content from, and displays content from

I think that should be achievable with .htaccess, but I'm not a mediawiki expert. Is there anyone out there that can help with enabling mediawiki translations on a subdomain like the above? I think it's just an .htaccess question, but not sure what it might break with mediawiki too :)

Facebook Inc. starts cannibalizing Facebook

Published 13 Mar 2018 by Carlos Fenollosa in Carlos Fenollosa — Blog.

Xataka is probably the biggest Spanish blogging company. I have always admired them, from my amateur perspective, for their ability to make a business out of writing blogs.

That is why, when they invited me to contribute with an article about the decline of Facebook, I couldn't refuse. Here it is.

Facebook se estanca, pero Zuckerberg tiene un plan: el porqué de las adquisiciones millonarias de WhatsApp e Instagram, or Facebook is stagnating, but Zuckerberg has a plan: the reason behind the billion dollar acquisitions of WhatsApp and Instagram.

Tags: facebook, internet, mobile

Comments? Tweet  

How do I track down the source definition of a custom hook event in a Mediawiki extension?

Published 12 Mar 2018 by user1258361 in Newest questions tagged mediawiki - Stack Overflow.

Here's an example:

A MediaWiki extension where Hooks::run( 'PageForms::BeforeFreeTextSubst', ...) gets invoked, but there's no other record or trace of where it's defined. If there were some mapping of strings/names to functions, it would be registered somewhere else, and if it were a function name, it should show up somewhere else.

I'm seeing this with a few other function hook events.
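For context: in MediaWiki, the `Hooks::run( 'EventName', ... )` call site *is* the definition of the event; there is no central registry. The string is only matched against whatever handlers extensions or LocalSettings.php have attached to `$wgHooks` (or, in newer extensions, the `Hooks` map in extension.json), so if nothing subscribes, the name appears exactly once in the codebase. A sketch of how a handler would attach (the handler body and signature here are hypothetical; the real signature is whatever the call site passes):

```php
# LocalSettings.php (or an extension's setup file)
$wgHooks['PageForms::BeforeFreeTextSubst'][] = function ( &$text ) {
    // Hypothetical handler: modify $text in place.
    // Returning true lets any other registered handlers run too.
    return true;
};
```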

Have postmodernist thinkers deconstructed truth?

Published 12 Mar 2018 by in New Humanist Articles and Posts.

Bruno Latour once sparked fury by questioning the way scientific facts are produced. Now he worries we’ve gone too far.

How to disable arbitrary PHP 7 code in directory?

Published 9 Mar 2018 by Louise in Newest questions tagged mediawiki - Stack Overflow.

Question at the bottom.

According to the official Mediawiki security guide, I have to

<Directory "/Library/MediaWiki/web/images">
   # Ignore .htaccess files
   AllowOverride None

   # Serve HTML as plaintext, don't execute SHTML
   AddType text/plain .html .htm .shtml .phtml .php .php3 .php4 .php5 .php7

   # Don't run arbitrary PHP code.
   php_admin_flag engine off

   # If you've other scripting languages, disable them too.
</Directory>

However with Apache 2.4.29 and PHP 7.1.15 I get

Invalid command 'php_admin_flag', perhaps misspelled or defined by a module not included in the server configuration

According to this post the solution is

You cannot use php_admin_value/php_admin_flag with PHP compiled as CGI (suPHP), because these options are only supported when PHP is compiled as a module of Apache. Feel free to use php.ini sections to change the settings ( Otherwise - just switch to mod_ruid2+mod_php and you'll be able to use php_admin_value in Apache configuration files.

Even when I have installed

dnf -y install httpd php php-mysqlnd php-gd php-xml php-mbstring mod_ruid2

I get the error, despite having mod_php and mod_ruid2.


Can someone translate the above into what I actually need to do in my case in plain English?
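In plain English: `php_admin_flag` only exists when PHP runs as an Apache module (mod_php); under PHP-FPM or CGI the directive is simply unknown to Apache, hence the "Invalid command" error, regardless of which packages are installed. The usual alternative (also suggested in MediaWiki's security documentation) is to refuse to serve PHP files from the uploads directory at all, which works with any PHP setup. A sketch for Apache 2.4:

```apache
<Directory "/Library/MediaWiki/web/images">
    # Ignore .htaccess files
    AllowOverride None

    # Serve HTML as plain text, don't execute SHTML
    AddType text/plain .html .htm .shtml .phtml

    # No php_admin_flag under FPM/CGI, so deny anything that
    # looks like a PHP script outright instead.
    <FilesMatch "\.ph(p[3457]?|tml)$">
        Require all denied
    </FilesMatch>
</Directory>
```

With this in place, the `php_admin_flag engine off` line can simply be dropped.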

Issue uploading template with MediaWiki

Published 8 Mar 2018 by user3684314 in Newest questions tagged mediawiki - Stack Overflow.

I'm trying to upload a template from Wikipedia into my MediaWiki instance. Basically, I go to and type Template:Infobox under "Add pages manually". Then I select "Include templates", unselect "Include only the current revision, not the full history", and click "Export". The exported file size is 30,422 KB (~30 MB).

When I tried to import at Special:Import on my local MediaWiki instance (version 1.30.0), it basically timed out, saying that I had to login to perform the task, even though I was already logged in.

After reading up on the error, I changed the following:

However, following these steps, stopping the server, and then restarting it now results in one of several types of errors. Either it hangs indefinitely, with "waiting for localhost..." or some localhost URL in the bottom left corner (but this eventually goes away, without uploading at all); or it stops entirely, printing a full-page error (although this hasn't happened recently); or it results in a database query error, in which a deadlock has occurred.

The database query error is:

Import failed: A database query error has occurred. 
Did you forget to run your application's database schema updater 
after upgrading? Query: INSERT IGNORE INTO `locpage` 
(page_namespace,page_title,page_restrictions,page_is_redirect,page_is_new,page_random,page_touched,page_latest,page_len) VALUES 
Function: WikiPage::insertOn Error: 
1213 Deadlock found when trying to get lock; try restarting transaction (localhost)

I'm up for any ideas on this! Please let me know if you need more information!
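One avenue worth noting: Special:Import runs inside a single web request, which is why a ~30 MB full-history dump tends to hit timeouts and transient InnoDB deadlocks (error 1213 is retryable by design). The maintenance script imports from the command line with no web-request limits; a sketch, with the dump filename assumed:

```shell
# Run from the MediaWiki installation directory.
php maintenance/importDump.php --conf LocalSettings.php Wikipedia-export.xml

# The import skips derived tables, so rebuild them afterwards.
php maintenance/rebuildrecentchanges.php
```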

Book review: The Impostor

Published 8 Mar 2018 by in New Humanist Articles and Posts.

Javier Cercas' "novel without fiction" is a fascinating analysis of a nation in denial.

Introducing Community DO-ers: March Edition

Published 7 Mar 2018 by Andrew Starr-Bochicchio in The DigitalOcean Blog.

Introducing Community DO-ers: March Edition

Here at DigitalOcean one of our core values is "our community is bigger than just us". From our support of the broader open source community to making our tutorials as platform agnostic as possible, we believe that contributing knowledge and resources to the community benefits not just ourselves but all members – past, present, and future.

We never could have anticipated the amazing amount of support we've received in return. You’ve built open source tools using our API, hosted Meetups across the globe, shared your DigitalOcean stories, and so much more. We wouldn’t be where we are today without you.

We're now six years into this journey and want to start recognizing our members more regularly. So today we are excited to highlight some of our most active Community contributors—our DO-ers!

Marko Mudrinić (@xmudrii)

It’s hard to overstate just how lucky we are to have people like Marko in our Community; he’s an all around rockstar whose contributions span from ocean to ocean. Marko is one of the most prolific users on our Community Q&A platform, where he helps users learn about and build on DigitalOcean. He’s written tutorials on topics like Prometheus and Go, but also puts that knowledge into practice. He is the most active contributor to doctl, our open source command line interface, and has worked extensively on DigitalOcean support in Kubicorn to help users get up and running with Kubernetes.

Mateusz Papiernik (@maticomp)

Mateusz's passion for giving back to the Community inspires us. He has been sharing his technical expertise with us for many years, which you can enjoy in the dozens of tutorials he has published on topics from ProxySQL to Nginx optimization. With even more in the works, he has already helped hundreds of thousands of readers. His genuine enthusiasm and drive to aid others shines through in his writing and his collaboration with our editorial team.

Peter Hsu (@peterdavehello)

Peter is an open source enthusiast who is always going above and beyond. He has traveled across Taiwan to share DigitalOcean with his community—from COSCUP in Taipei to MOPCON in Kaohsiung. As the maintainer of the CDNJS (a free, public, and open-source CDN service), he helps to power millions of websites across the globe. Closer to home, he is an organizer of the DigitalOcean Meetup group in Hsinchu, Taiwan, which is quickly approaching 600 members. With nine events in 2017—including the first Hacktoberfest event of the year—it’s one of our most active Meetups!

Marko, Mateusz, and Peter exemplify some of the best qualities found in our community. All three share our enthusiasm for open source and passion for knowledge-sharing. But they’re not alone! We look forward to recognizing more of our amazing Community members in the coming months.

Are you interested in getting more involved in the DigitalOcean Community? Here are a few places to start:

How do I set up a subdomain on my locally hosted IIS server?

Published 6 Mar 2018 by test in Newest questions tagged mediawiki - Stack Overflow.

I have IIS (8.5) running on a local Windows Server 2012 computer named (for the purposes of this question) SERVER_1. This server has CGI/FastCGI running and working on it. I use this server for various things, but one of the things I use it for is for hosting an internal Wiki (MediaWiki). My design choice, not looking for online solutions.

Everything is working with my current binding that is:

Type    Host Name    Port    IP Address     Binding Information
http    SERVER_1     80      *

If I go to a Windows browser from an intranet computer I can hit my Wiki page by going to http://SERVER_1 (unless it's on Mac - different issue I haven't figured out yet).

What I'm looking to do, however, is have it be accessible by going to http://wiki.SERVER_1 but no matter how I configure the bindings it doesn't seem to work. For example:

Type    Host Name         Port    IP Address     Binding Information
http    wiki.SERVER_1     80      *

Can anyone shed some light on this?
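A likely explanation (an assumption, since the question doesn't show name resolution): plain `SERVER_1` resolves via NetBIOS/WINS broadcast, but dotted names like `wiki.SERVER_1` never go through NetBIOS, so the IIS binding may be fine and the name lookup is what fails. The name has to exist in DNS (an A or CNAME record) or in each client's hosts file, e.g.:

```
# C:\Windows\System32\drivers\etc\hosts on each client machine
# (the IP address is a placeholder for SERVER_1's actual address)   wiki.SERVER_1
```

Once the name resolves, the `wiki.SERVER_1` host-header binding shown in the question should start matching.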

Episode 3: Mike Cariaso

Published 6 Mar 2018 by Yaron Koren in Between the Brackets: a MediaWiki Podcast.

Mike Cariaso is the co-founder of SNPedia, a MediaWiki-based repository of genomic information (founded in 2006), and the creator of Promethease, personal genetic analysis software that uses SNPedia's data.

Links for some of the topics discussed:

Locking down Mediawiki permissions

Published 6 Mar 2018 by melat0nin in Newest questions tagged mediawiki - Stack Overflow.

I'm trying to lock down my Mediawiki 1.30 installation so absolutely nobody can read or edit it, as a baseline to start with group permissions. However, nothing seems to work, and I can't understand why.

At the bottom of my LocalSettings.php file I have the following entries:

$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['*']['read'] = false;
$wgRevokePermissions['user']['edit'] = true;
$wgRevokePermissions['user']['read'] = true;

which I believe should mean that nobody can do anything with the wiki. Obviously this isn't my ultimate goal, but it's a baseline I'd like to get to.

As it is, registered users can login and edit the wiki, while anonymous users cannot.

Can anyone explain what the problem is? It seems as though nothing I change makes a difference, but I'm definitely editing the correct file...
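One detail worth checking in cases like this: defaults for the 'user' group come from DefaultSettings.php, and extensions loaded later in LocalSettings.php can grant rights back. Putting explicit denials for both groups at the very end of the file, after every wfLoadExtension call, removes that ambiguity without relying on $wgRevokePermissions. A sketch:

```php
# At the very end of LocalSettings.php, after all extension loads:
$wgGroupPermissions['*']['read'] = false;
$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['user']['read'] = false;
$wgGroupPermissions['user']['edit'] = false;

# Keep sysops able to get in, or the wiki locks everyone out:
$wgGroupPermissions['sysop']['read'] = true;
$wgGroupPermissions['sysop']['edit'] = true;
```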

Is there a way back from sectarianism in Iraq?

Published 5 Mar 2018 by in New Humanist Articles and Posts.

Dictatorship, invasion and war have placed Iraq’s religious and atheist minorities under mortal threat.

Self-hosted websites are doomed to die

Published 3 Mar 2018 by Sam Wilson in Sam's notebook.

I keep wanting to be able to recommend the ‘best’ way for people (who don’t like command lines) to get research stuff online. Is it Flickr, Zenodo, Internet Archive, Wikimedia, and Github? Or is it a shared hosting account on Dreamhost, running MediaWiki, WordPress, and Piwigo? I’d rather the latter! Is it really that hard to set up your own website? (I don’t think so, but I probably can’t see what I can’t see.)

Anyway, even if running your own website, one should still be putting stuff on Wikimedia projects. And even if not using it for everything, Flickr is a good place for photos (in Australia) because you can add them to the Australia in Pictures group and they’ll turn up in searches on Trove. The Internet Archive, even if not a primary and cited place for research materials, is a great place to upload wikis’ public page dumps. So it really seems that the remaining trouble with self-hosting websites is that they’re fragile and subject to complete loss if you abandon them (i.e. stop paying the bills).

My current mitigation to my own sites’ reliance on me is to create annual dumps in multiple formats, including uploading public stuff to IA, and printing some things, and burning all to Blu-ray discs that get stored in polypropylene sleeves in the dark in places I can forget to throw them out. (Of course, I deal in tiny amounts of data, and no video.)

What was it Robert Graves said in I, Claudius about the best way to ensure the survival of a document being to just leave it sitting on one's desk and not try at all to do anything special — because it's all perfectly random anyway as to what persists, and we cannot influence the universe in any meaningful way?

Trying to use 'ugly urls' with mediawiki install

Published 3 Mar 2018 by The Winter One in Newest questions tagged mediawiki - Stack Overflow.

So, I'm in a bit of a weird place. I have a local mediawiki install I've been trying to get running.

Installation went off without a hitch, however I cannot access the wiki when it uses the more normal URL ("localhost/mywiki/index.php/Main_Page"). However, it works fine when using the so-called 'ugly' URL ("localhost/mywiki/index.php?title=Main_Page").

So, with that in mind, I've been trying to set up my MediaWiki (version 1.29) to use the "ugly" URLs via the "$wgArticlePath" setting. I've tried a number of different setups, including the following:

But none of them will actually return what I need them to. If anyone has more experience with mediawiki, I'd appreciate the assist.
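For what it's worth, the query-string form is MediaWiki's default article path, so a minimal LocalSettings.php sketch (the /mywiki path is assumed from the URLs in the question) would be:

```php
# LocalSettings.php; "/mywiki" is assumed from the question's URLs.
$wgScriptPath  = "/mywiki";

# The default article path is exactly the 'ugly' query-string form:
$wgArticlePath = "$wgScript?title=$1";

# If index.php/Main_Page 404s, PATH_INFO handling is the likely
# culprit; disabling it forces the query-string form everywhere.
$wgUsePathInfo = false;
```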


Published 3 Mar 2018 by fabpot in Tags from Twig.


mediawiki account creation with google only

Published 3 Mar 2018 by John F. in Newest questions tagged mediawiki - Stack Overflow.

I am using MediaWiki version 1.30.0. I want people to create accounts through Google login only, i.e. I don't want people to manually create an account by entering a username and email, but only through Google login. Is it possible to do that? (P.S. Is it possible to add Google login to Request Account?)


Published 2 Mar 2018 by Sam Wilson in Sam's notebook.

I think I am learning to love paperbacks. (Am hiding in New Editions this morning.)


Published 2 Mar 2018 by fabpot in Tags from Twig.


Published 2 Mar 2018 by Sam Wilson in Sam's notebook.

This seems cool:


MediaWiki PopupPages extension not working

Published 1 Mar 2018 by Jacobo Tapia in Newest questions tagged mediawiki - Stack Overflow.

I've already installed the PopupPages extension and the MagicNoCache extension that it needs to run. I also checked my Special:Version page and everything is installed.

Then I created a random page that I called MediaWiki:Example, but when I tried to call it using the Popups extension with the following line: {{#popup:MediaWiki:PopupPage PopupPage | policy=cookie-out,groups-out}} there is only a grey page that says Article Missing.

Am I doing something wrong?

The Deep End Podcast Ep. 14: Making Sense of It All with Attentive

Published 1 Mar 2018 by Hollie Haggans in The DigitalOcean Blog.

There’s such a thing as “too much information”, especially for companies scaling out their sales operations. That’s why Attentive was born in 2015: to help sales teams make their increasing pipelines simpler to manage. Indeed, the small, Portugal-based team is itself focused on scaling, having participated in accelerator programs like Techstars.

In this episode, Attentive founder and CTO Pedro Araújo talks about what it takes to build a tech product from the ground up. Discover their approach to running an engineering team, from adopting new open source technologies, to onboarding junior developers and learning about cloud infrastructure.

Subscribe to The Deep End Podcast on iTunes and Spotify, or listen to the latest episode on SoundCloud below:

Hollie Haggans heads up Global Partnerships for DigitalOcean’s Hatch program. She is passionate about startups and cold brew coffee. Get in touch with questions at

"Solutions exist, and we can find better ones"

Published 27 Feb 2018 by in New Humanist Articles and Posts.

Q&A with linguist and cognitive scientist Steven Pinker.

Migrating old mediawiki at trusty VM to xenial

Published 26 Feb 2018 by Rancor in Newest questions tagged mediawiki - Stack Overflow.

I need to migrate a 1.21.2 MediaWiki (PHP 5.3.10, Apache 2.2.22, MySQL 5.5.41) to the latest Nginx/PHP/MariaDB versions (planning on using the same version of MediaWiki and then upgrading it).

I'm not familiar with PHP or with migrating from trusty to xenial. I tested a MediaWiki migration from xenial to xenial and it worked flawlessly, but I'm pretty sure it won't be that easy in this scenario; what worries me most is PHP 5.3 > 7.

So I need some advice if you have any. Thanks!!

MediaWiki – Search for URL without results

Published 26 Feb 2018 by Kevin Lieser in Newest questions tagged mediawiki - Stack Overflow.

I have a MediaWiki site where a domain name appears as a headline: defined as Heading 2, not linked, just plain text as an H2.

But when I try to search for *mydo* it does not return any results. When I remove the www. and the .org, it gets found.

How can I solve this? The search functionality in MediaWiki is not very good so far, in my opinion...

An eye for changing times

Published 26 Feb 2018 by in New Humanist Articles and Posts.

The return of Queer Eye for the Straight Guy marks a revival of “nice” reality TV, after a decade of screaming matches and controversy

Conference at UWA – Home 2018

Published 26 Feb 2018 by Tom Wilson in thomas m wilson.

I’ll be presenting a paper at the following conference in July 2018.  It will be looking at the theme of aspirations for home ownership from the perspective of Big History.  Hope to see you there.


Number of categories in Wikipedia

Published 25 Feb 2018 by Eve.F in Newest questions tagged mediawiki - Stack Overflow.

I was wondering if there's any way to find the total number of categories in the English Wikipedia? I looked up the Wikipedia statistics, but it seems like they only have information about article counts. Thanks.
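The statistics pages indeed don't expose a category count, but the MediaWiki API's list=allcategories can be paged through and counted. A sketch of the pagination logic (the fetch callable is an assumption, standing in for an HTTP client such as requests hitting https://en.wikipedia.org/w/api.php):

```python
def count_all_categories(fetch):
    """Count categories by paging the MediaWiki API.

    fetch(params) must return the parsed JSON response of api.php
    called with the given query parameters.
    """
    total = 0
    params = {
        "action": "query",
        "list": "allcategories",
        "aclimit": "max",   # up to 500 per request for normal users
        "format": "json",
    }
    while True:
        data = fetch(params)
        total += len(data["query"]["allcategories"])
        cont = data.get("continue")
        if not cont:
            return total
        # Merge the continuation token(s) into the next request.
        params = {**params, **cont}
```

On English Wikipedia this would run for a while; the same paging pattern works for any list= module.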


Published 24 Feb 2018 by Sam Wilson in Sam's notebook.

I’ve been attempting to write to people again lately. As in, proper letters on paper and in envelopes and stuck through holes in walls and doors. It doesn’t work though. Ten years ago I wrote to people, and it was reasonably easy although one had to ignore the anachronistic self-consciousness. Now, it feels like writing a telegram, for all the relevance it has to modern life. And doing so on some sort of rare letterpress’d form at that — the mechanics have become harder, the whole thing far less familiar. Where even is there a post box around here? Do stamps still come in booklets? What’s it even cost to send a letter? Only people having weddings send things in the post these days.

I once wrote a little system for writing email-letters. It was a bit like Gmail’s system of having the reply-box at the bottom of the to-and-fro conversation, except it went to further extremes of actually deleting the quoted reply text from emails, and of actually tracking correspondents as entities in their own right and not just by email address. It also prohibited writing to more than one person at once.

It feels like there’s a place for a letter-writing system that really is just email but also isn’t one’s normal email client (be that Fastmail, Gmail, Thunderbird, or whatever). Writing to a friend should be a different act to tapping off a note to a colleague or haggling with a civil servant. The user interface should reflect that. It should be simpler, calmer, and prioritise longer paragraphs and better grammar. (I’ve read similar sentiments relating to the design of the Discourse forum software; the developers of that want the software to shunt people towards better discussions, and I’m pretty sure Google don’t have anything like that idea with the Gmail interface. No one wants to write a letter on a blotter edged with full-colour advertisements for Fletcher’s Fantastic Fumigator, and Google want you to use the exact same interface for work and for social interaction. Doesn’t seem like a good idea to me.)

I’d still be using my email archiver, but it dates from an age before two-factor authentication, and improvements in the security of email providers broke it and I’ve not yet gotten around to fixing it. Perhaps it’s time to do so.

Why do socialists only drink herbal tea?

Published 24 Feb 2018 by Sam Wilson in Sam's notebook.

Because they’re sick of non-semantic CSS class names, and of not having sensible default formatting for the main, header, article, section, aside, footer, and nav tags.

No, it’s actually because property is theft!

A few years ago I came across the Marx CSS reset, a simple and small (7.92 KB) stylesheet that provides decent formatting not only for the usual HTML elements but also for all the new HTML5 ones. It does this by not expecting anyone to write <ul class="nav">…</ul> but rather <nav><ul>…</ul></nav>, which mightn't seem like much, but it feels better to me.

Maybe I just wish to live in a classless society.


Published 24 Feb 2018 by Sam Wilson in Sam's notebook.

Anxiety is really the most horrible thing.

Mediawiki Error: Call to undefined method User::saveToCache()

Published 23 Feb 2018 by Dortmunder in Newest questions tagged mediawiki - Stack Overflow.

I recently updated a MediaWiki installation to 1.30 and am now working on the plugin errors that came in the wake of the update.

My most immediate error is with the RadiusAuthPlugin for MediaWiki. When trying to log in, the following error is displayed on the page, although the login itself was successful:

[3d3906e176c5476982ff8037] /MEDIAWIKI/index.php?title=Spezial:Login&returnto=Mainpage Error from line 75 of /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/extensions/RadiusAuthPlugin/RadiusAuthPlugin.php: Call to undefined method User::saveToCache()


#0 /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/includes/auth/AuthPluginPrimaryAuthenticationProvider.php(145): RadiusAuthPlugin->updateUser(User)

#1 /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/includes/Hooks.php(177): MediaWiki\Auth\AuthPluginPrimaryAuthenticationProvider->onUserLoggedIn(User)

#2 /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/includes/Hooks.php(205): Hooks::callHook(string, array, array, NULL)

#3 /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/includes/auth/AuthManager.php(2388): Hooks::run(string, array)

#4 /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/includes/auth/AuthManager.php(690): MediaWiki\Auth\AuthManager->setSessionDataForUser(User, boolean)

#5 /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/includes/auth/AuthManager.php(382): MediaWiki\Auth\AuthManager->continueAuthentication(array)

#6 /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/includes/specialpage/AuthManagerSpecialPage.php(353): MediaWiki\Auth\AuthManager->beginAuthentication(array, string)

#7 /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/includes/specialpage/AuthManagerSpecialPage.php(482): AuthManagerSpecialPage->performAuthenticationStep(string, array)

#8 /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/includes/htmlform/HTMLForm.php(669): AuthManagerSpecialPage->handleFormSubmit(array, VFormHTMLForm)

#9 /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/includes/specialpage/AuthManagerSpecialPage.php(416): HTMLForm->trySubmit()

#10 /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/includes/specialpage/LoginSignupSpecialPage.php(316): AuthManagerSpecialPage->trySubmit()

#11 /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/includes/specialpage/SpecialPage.php(522): LoginSignupSpecialPage->execute(NULL)

#12 /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/includes/specialpage/SpecialPageFactory.php(578): SpecialPage->run(NULL)

#13 /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/includes/MediaWiki.php(287): SpecialPageFactory::executePath(Title, RequestContext)

#14 /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/includes/MediaWiki.php(851): MediaWiki->performRequest()

#15 /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/includes/MediaWiki.php(523): MediaWiki->main()

#16 /usr/srv/www/vhosts/HOSTNAME/htdocs-ssl/MEDIAWIKI/index.php(43): MediaWiki->run()

#17 {main}

(I censored identifying parts of the path for security reasons)

The RadiusAuthPlugin.php:



class RadiusAuthPlugin extends AuthPlugin
{
    function userExists($username)
    {
        return TRUE;
    }

    function authenticate($username, $password)
    {
        global $wgRadiusAuthPluginServers, $wgRadiusAuthPluginSecret;

        $username = strtolower($username);

        foreach ($wgRadiusAuthPluginServers as $server) {
            $radius = new Radius($server, $wgRadiusAuthPluginSecret);
            //$radius->SetNasIpAddress('NAS_IP_ADDRESS'); // Needed for some devi$
            //fwrite($handle,"Radius Objekt angelegt\n");
            if ($radius->AccessRequest($username, $password)) {
                return TRUE;
                //fwrite($handle,"Auth successful\n");
            }
            //fwrite($handle,"Auth not successful\n");
        }

        return FALSE;
    }

    function modifyUITemplate(&$template, &$type)
    {
        global $wgRadiusAuthPluginExtrafields;
        $template->set('usedomain', FALSE);
        $template->set('useemail', FALSE);
        $template->set('create', FALSE);
        //$template->set('create', TRUE);
        $template->set('canremember', FALSE);
        $template->set('extrafields', $wgRadiusAuthPluginExtrafields);
    }

    function autoCreate()
    {
        global $wgRadiusAuthPluginAutoCreate;
        return $wgRadiusAuthPluginAutoCreate;
        return FALSE;
    }

    function validDomain($domain)
    {
        return TRUE;
    }

    function updateUser(&$user)
    {
        global $wgRadiusAuthPluginMaildomain;
        return TRUE;
    }

    function allowPasswordChange()
    {
        //return false;
        return TRUE; // since Ubuntu 14.04
    }

    function allowPropChange($prop = '')
    {
        return FALSE;
    }

    function allowSetLocalPassword()
    {
        return true;
    }

    function setPassword($user, $password)
    {
        return true;
    }

    function updateExternalDB($user)
    {
        return true;
    }

    function canCreateAccounts()
    {
#       return FALSE;
        return TRUE;
    }

    function adduser($user, $password, $email = '', $realname = '')
    {
        return false;
#        return true;
    }

    function strict()
    {
        global $wgRadiusAuthPluginStrict;
        return $wgRadiusAuthPluginStrict;
        return TRUE;
    }

    function strictUserAuth($user)
    {
        global $wgRadiusAuthPluginStrictUserAuth;
        return $wgRadiusAuthPluginStrictUserAuth;
        return TRUE;
    }

    function initUser(&$user, $autocreate = false)
    {
        global $wgRadiusAuthPluginMaildomain, $wgSitename, $wgRadiusAuthPluginM$
        $user->sendMail("[".$wgSitename."] ".$wgRadiusAuthPluginMailSubject,$wg$
        //$user->removeGroup("auto-registered User");
    }
}

$wgExtensionCredits['other'][] = array(
    'name' => 'RadiusAuthPlugin',
    'version' => '1.1.0',
    'author' => 'James Young',
    'author' => 'edited by Andreas Ihrig',
    'description' => 'Automatic login with a RADIUS server; now with Setting-Op$
);


I have no idea what to do about that and I'm grateful for any help.

Thanks so far
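User::saveToCache() was removed from MediaWiki core (the whole AuthPlugin mechanism it belongs to is deprecated in favour of AuthManager), so any plugin still calling it fails exactly like this. A hypothetical patch for the offending line in RadiusAuthPlugin.php (line 75, per the stack trace) would guard the call; invalidateCache() is the core method that forces the cached user data to be re-read:

```php
// Hypothetical replacement for the removed User::saveToCache() call:
if ( method_exists( $user, 'saveToCache' ) ) {
    $user->saveToCache();      // older MediaWiki
} else {
    $user->invalidateCache();  // newer MediaWiki: drop the cached copy instead
}
```

Longer term, the plugin would need porting to a PrimaryAuthenticationProvider, since AuthPlugin compatibility shims keep shrinking with each release.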

Onward and Upward Together

Published 22 Feb 2018 by Ben Uretsky in The DigitalOcean Blog.

As we turn the page on 2017, I’m proud to share that DigitalOcean had another tremendous year of rapid growth and strong profitability, a combination which few tech companies have achieved at our scale. We are rapidly approaching $200M in annual recurring revenue and are looking forward to celebrating our 6th anniversary next month. The key to our success is our disruptive offering — a cloud computing platform that is engineered with simplicity at the core — and our vibrant, growing developer community. We see a substantial and growing market need, and believe that DigitalOcean is perfectly positioned to lead this category in the years ahead.

While we have enjoyed great success since I co-founded the company in 2012, I believe we have barely scratched the surface. I’ve been reflecting on our next phase of growth and what it will take to reach our full potential, and it’s become clear to me that now is the right time to identify my successor as CEO of DigitalOcean.

I recognize where my strengths lie and where others will have more experience to give. With all of the exciting opportunities in front of us, including the possibility of an IPO — a long-term goal we have frequently discussed internally — I feel a new seasoned executive will be best to guide the company through the next chapter of our journey. We have engaged a leading search firm to help us find a great leader. One that will be inspirational, able to scale our operations beyond 1,000 people, evolve our go-to-market strategy, and help us reach our audacious vision. Someone who can build a global brand that could potentially help us become a publicly-traded company with the simplest cloud platform for developers to run applications of any size.

Once we’ve identified this person, I’ll be taking on a new role as Chairman of the Board, which will allow me to support our company vision and strategy while working closely with the new CEO.

When Moisey, Mitch, Alec, Jeff, and I started the company in 2012, we left our families and friends in New York to join the Techstars program in Colorado. We slept on bunk beds and worked relentlessly pretty much every day until midnight. Finding product-market fit didn’t happen overnight and it took months of iterating and refining our product offering. We had 400 users when we graduated from the Techstars program, and while we knew we had developed something special, trying to raise venture capital at that time was a real uphill battle. We heard many “no’s” from investors along the way, but believed in our long-term vision.

After returning to a small office in New York City, we launched the first SSD virtual machine service with unprecedented price-to-performance on January 15th, 2013. We instantly went from signing up a couple of new users per day to more than 100. I vividly remember sitting at our kitchen table with the co-founding team, having to manually install SSDs into our servers to keep up with the demand. It’s been a humbling journey to say the least, and I could not have imagined the growth, success, and scale we would achieve only five years later. DigitalOcean has accomplished so many incredible things over the years and I know that our product, people, and operations have never been stronger.

Aug 9, 2012 - Mitch, Alec, Moisey, me and Jeff walking on stage for Techstars demo day

We have raised $123M from some of the world’s leading VCs that share our belief that the developer will lead the continuing technology revolution. Today, we have a team of 400-plus employees around the world with growing offices in New York, Cambridge, Mass., and Bangalore. Our user base has grown with us and last year we crossed one million users from almost every country in the world. Over the last few years, our product went from a single offering, Droplet, to a complete cloud platform. We are extremely proud to be one of the largest and fastest-growing cloud providers in the world.

I’ve always said that putting the business first and doing what is right for DigitalOcean is my highest priority. I’m making this decision knowing that DigitalOcean’s best days are still to come. We have never been in a better position to begin this transition. We have a great leadership team in place, the business has very strong momentum, and we are a clear leader in our industry. I’m confident that our new CEO will be able to rapidly build on this strong foundation.

No matter who our next leader is, one thing that definitely won’t change is our unwavering commitment to delivering the industry’s simplest cloud computing platform, while building one of the world’s largest developer communities. All of the core elements that have contributed to our success — the powerful simplicity of the product, the dedication and talent of the team, and the passionate community of developers that we serve — will remain the same.

I am tremendously excited about DigitalOcean’s future and the milestones ahead. I want to thank everyone who has helped turn our dream and passion into reality. The skills I have learned and friendships I have made while helping to build this company will last me a lifetime, for which I will be forever grateful and I couldn’t be more excited for the journey ahead.

Onward and upward together,
Ben Uretsky

Why the science fiction of my youth is disturbing me again

Published 22 Feb 2018 by in New Humanist Articles and Posts.

As our politics retreats from rationality, we must avoid complacency.


Published 20 Feb 2018 by Sam Wilson in Sam's notebook.

There’s a new extension, recently added, called DocBookExport. It provides a system for defining a book’s structure (a set of pages plus a title and other metadata) and then pipes the pages’ HTML through Pandoc and out into DocBook format, from where it can be turned into PDF or just downloaded as-is.

There are a few issues with getting the extension to run (e.g. it wants to write to its own directory, rather than a normal place for temporary files), and I haven’t actually managed to get it fully functioning. But the idea is interesting. Certainly, there are some limitations with Pandoc, but mostly it’s remarkably good at converting things.

It seems that DocBookExport, and any other MediaWiki export or format conversion system, works best when the wiki pages (and their templates etc.) are written with the output formats in mind. Then, one can avoid things such as web-only formatting conventions that make PDF (or epub, or man page) generation trickier.
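The conversion step at the heart of that pipeline can be tried by hand; a sketch, assuming Pandoc is installed (file names are placeholders):

```
# Roughly what DocBookExport does internally: HTML in, DocBook XML out
pandoc --from=html --to=docbook page.html --output=page.xml
```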


Published 20 Feb 2018 by Sam Wilson in Sam's notebook.

There’s not really much that can be said on Twitter that can’t instead be said much more verbosely and with far fewer people seeing it on one’s own blog. There’s no limit to the meaningless silly things you can write on the internet, so they might as well be written in one’s own place.

I’m just a bit sick of the non-chronological nature of Twitter, where some mysterious inner force in the machine is telling me what’s “important”, and leaving me with the feeling that it’s not showing me things that I might actually want to see.

So I think I’ll come back here, to blog in this quiet secluded corner of the web. This, in combination with a bare-bones chronological RSS reader, has worked pretty well for 15 years; might as well carry on with it.

Episode 2: Niklas Laxström

Published 20 Feb 2018 by Yaron Koren in Between the Brackets: a MediaWiki Podcast.

Niklas Laxström is the creator and co-maintainer of the site where MediaWiki and most of its extensions (along with other software, like OpenStreetMap) get translated into hundreds of languages. Niklas also works for the Wikimedia Foundation as part of the Language team, where he helps to develop code related to translation and internationalization, most notably the Translate extension.

Links for some of the topics discussed:

MediaWiki 1.21.3 mass deletion

Published 20 Feb 2018 by Sam in Newest questions tagged mediawiki - Stack Overflow.

I want to delete almost 1700 uploads (files and documents) from my MediaWiki wiki.

MediaWiki version: 1.21.3

I cannot use the DeleteBatch extension; it's incompatible with this version.

So, how can I delete this many uploads at once, with a deletion reason?
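If you have shell access, MediaWiki core has long shipped a deleteBatch.php maintenance script that avoids the extension entirely. A sketch (the exact flags available in 1.21.3 are an assumption worth checking with --help):

```
# titles.txt lists one page per line, e.g. "File:Old-upload-001.pdf"
php maintenance/deleteBatch.php -u WikiSysop -r "Bulk cleanup of old uploads" titles.txt
```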

Meet the DigitalOcean Brand Design Team

Published 20 Feb 2018 by Stephanie Morillo in The DigitalOcean Blog.

As a company, we’ve always cared about contributing to developer culture in an authentic way, and one of the ways we do that is by adding moments of visual delight to everything we do, whether it's a Community tutorial, an interaction in the control panel, or a T-shirt at a conference. That is why, from the very beginning, DigitalOcean put an emphasis on building out a Brand Design team comprised of not just proficient graphic designers, but brilliant illustrators as well.

The Brand Designers at DigitalOcean are challenged every single day to transform extremely technical and esoteric content into approachable and friendly touch points. Lead Visual Designer Masami Kubo says, “We believe these technologies should be accessible to everyone, and a part of that is acknowledging and celebrating the diverse and quirky personality behind the humans that build these amazing things. Visuals and branding throughout the cloud computing industry are often disregarded or unconsidered, so it’s a unique opportunity for us as designers to bring that culture to life.”

We interviewed DO’s Brand (Visual) Designers Kasia Bojanowska, Masami Kubo, Pat Raubo, and Alex Mostov to learn more about their design process, how they illustrate technical concepts, and where they turn to for inspiration.

How do you approach technical topics as illustrators?

Masami: We’ve been illustrating technical topics for years, so the challenge now is how to keep it fresh and relevant. However, if we push the imagery too conceptual or meta, we run the risk of none of it making any sense to our audience. My approach now is to identify the primary action or message behind complex concepts, and focus on making that one thing really clear. I like to start minimal, then add elements sparingly to not distract from the primary message.

Alex: I came to the DigitalOcean team without much technical knowledge. In some ways I think this has actually been an advantage in creating conceptual illustrations. I create images that help me understand the concepts. I think and hope that inherently makes them more intuitive to others, too.

Where do you draw inspiration from for your designs?

Kasia: When starting a new project I definitely try to spend a good chunk of time looking for inspirations. Google image search, Pinterest, Dribbble, Behance are all wonderful resources for that. We have a few shared pinterest boards with stuff we like. I also get really inspired when I see great work being made by others on our team.

Pat: One of the benefits of working with a team of such enormously talented designers is that I draw inspiration from them and their work all the time. Masami and Kasia both do amazing work, and I’ve learned a great deal from both of them, as well as from Alex. I try to seek out inspiration from a number of things. Some have a pretty clear association with the kind of work we do at DO, like design and illustration done specifically for tech, but I also draw from editorial illustration, film, comics, and book covers, among other sources.

Illustrations by Kasia Bojanowska, Patricia Raubo, & Alex Mostov

How do you come up with new ideas for similar technical topics?

Masami: I think it actually helps for imagery with similar technical topics to have a common thread of imagery, so as to build a visual association. We have strict style guides for most of our platforms and campaigns, but some of these style guides allow for permutation in aesthetics to avoid looking too repetitive over time.

Pat: I like to first do some research to understand the basic concept of what I’m going to illustrate, and then add to my notes with simple schematics and/or sketches to see if there’s anything I can pull from those for the final visuals.

Alex: I will often try to think about representing a topic in a different kind of space or world. For examples if I create an image for a topic in a 2D space, the next time I will try to figure out how I could represent that same concept in a 3D space or from a different perspective.

What is one of your favorite projects you’ve worked on at DO thus far?

Pat: I worked on a series of illustrations for our Employee Handbook, which meant drawing a team of cute sea creatures in an office setting. I really enjoyed working on that project, and it was great to see people respond to the illustrations in such a positive way.

Masami: My favorite projects are often also the most challenging ones. And usually the more ambitious they are, the more compromises on vision I’ve had to make. But some of the most exciting stuff I’ve worked on here is the art direction and design of our office spaces, in collaboration with architects, fabricators, and our People team. I was expected to transform the space into a branded and navigable experience. It’s still a work in progress, but I love the challenge of designing for physical spaces.

Murals by Alex Mostov & Masami Kubo

What was one of the most challenging projects you’ve worked on at DO?

Kasia: Redesigning the DO logo was definitely the biggest challenge for me. The process was pretty high pressure but I was allowed enough time to really let myself explore and dig in deep. In this case having a supportive team to brainstorm and keep motivation high through all of the iterations was essential.

Masami: We did a design refresh of the marketing site a year ago, and it went through a lot of changes and push backs. The task was simple—refresh the designs and clean up the performance—but it involved approval from every department and stakeholder in the company. I was doing everything from art direction, web design layouts, and spot illustration. I learned a ton about project management and designing within web accessibility standards, thanks to Una Kravets. I felt creatively drained after the project was finished, and didn’t think it would be possible to revisit it with new ideas. Surprisingly, I am now leading a complete design overhaul for the marketing site, and I feel more equipped than ever to tackle all the challenges and make something more beautiful and smart than last year.

Sometimes you create visual assets that are targeted at a very specific audience, and you have to balance things like humor with cultural sensitivities. How does localization factor into your designs?

Masami: Part of our job is being aware and sensitive to any imagery that might have harmful or negative impacts to our community. We are fortunate to have a diverse employee base that cares about these things, so the more opinions we can gather, the better. We try to treat branding the same in any other countries as we do here. However, we do want to highlight our growing global coverage, so one way we approach this is to celebrate the unique design culture local to these countries. For example, the Frankfurt datacenter launch campaign featured designs inspired by Bauhaus Constructivist design. For the Bangalore datacenter launch, we created stylized renditions of local architecture. Being a developer from another country doesn’t necessarily mean you have vastly different tastes or interests, so it’s important for companies and designers to address these things authentically.

How do you create different kinds of content while maintaining brand consistency?

Kasia: For illustrations, we keep a consistent color palette. We have a list of prompts to help us throughout the process, but we do not have a very strict style guide when it comes to editorial illustration. We tend to have more fun and variation with all of our community and conference designs. However, we are definitely more strict about stylistic consistency when it comes to our website design.

Like much of DO, the Brand Design team is distributed across the world. What systems or processes do you have in place that allow for open communication and collaboration?

Pat: One of our team members, Kasia, is based in Poland, so we have a time difference of six hours between us. We started to make a habit of doing our daily stand ups and critiques early in the day to make sure we were all able to benefit from them. We have a private Slack channel which we use to stay in contact, to brainstorm, and to share ideas on projects.

Where do you see the DO brand going?

Masami: When I first joined DigitalOcean in 2014, the company was breaking into the cloud computing world by differentiating itself as friendly and accessible. At the time that meant being extra illustrative and bubbly with our designs. We wanted to let the developer community know that their content and culture deserves this kind of attention. That attitude and core value is still what drives every decision, but our aesthetics have matured and evolved just as our products and features have grown. The brand now has a diverse voice ranging from playful and young to mature and sophisticated, all under the same goal of enabling the developer community. I think this range directly reflects the diversity of users we want to speak to.

Alex: I really like DO’s brand evolution because I feel like the changes are made based on need and effectiveness rather than just trying to make a splash. I think the brand will continue to change in this deliberate way as the community and product develop. I also hope it will always maintain the sense of playfulness that I think makes DO special.

What is your best advice for designers just starting out?

Pat: I would encourage aspiring creative folks of any stripe to always stay curious (as cliched as it may sound, it’s advice I’ve followed that I feel has served me well) and seek out inspiration from a range of sources (museums, books, online communities, whatever floats your boat!), because you never know what’s going to be the seed that becomes the root of a fantastic idea. Feeding your mind will give you perspective and enrich your work.

That said, don’t wait around for inspiration to strike, either! It’s best not to be too precious about your work. Just sit down, make the thing, and make it to suit your standards. Then, when you think it’s done, work on it just a little bit more. Keep learning, and push yourself a bit more with each new project.

Do you enjoy our designers' creations? Download desktop wallpapers from some of their favorite illustrations.

Flickr2Piwigo 1.3.0

Published 19 Feb 2018 by Sam Wilson in Sam's notebook.

I thought I’d help out and try to update the Flickr2Piwigo plugin to support OAuth, but having done so I now seem to have become a maintainer of the thing. So that’s good. I’ve just released version 1.3.0.

I’ll try to see to all the outstanding bug reports (well, there’s only one at the moment). And then perhaps add some extra features (support for approximate dates? automatic downloading? download of other people’s photos?).

A neverending sentence

Published 19 Feb 2018 by in New Humanist Articles and Posts.

Britain’s approach to incarceration is coming under scrutiny. How did we get into this mess?

How to use toggleToc() in a MediaWiki installation

Published 18 Feb 2018 by lucamauri in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I admin a wiki site running MediaWiki 1.29 and I have a problem collapsing the TOC on pages.

I would be interested in keeping the Contents box, but loading the page with it collapsed by default.

It appears there is a simple solution here, but I fail to implement it and I have no idea where the error is, hopefully someone can help.

I integrated the code as explained and checked that MediaWiki:Common.js is used by the site.

During page rendering, I checked that the JavaScript code is loaded and executed, but it appears to fail because

ReferenceError: toggleToc is not defined

I also checked this page, but the table has an empty cell where the migration of toggleToc(); should be explained. I am not even entirely sure it should be migrated.

Any help on this topic will be appreciated.
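Since toggleToc() no longer exists in current MediaWiki, one workaround is to collapse the TOC yourself from MediaWiki:Common.js. A minimal sketch, assuming the default markup where the TOC container has id "toc" and its entries live in the first <ul> (inspect your skin's output to confirm; collapseToc is a made-up helper name):

```javascript
// A minimal stand-in for the removed toggleToc(): hide the TOC's entry list
// so the page loads with the Contents box collapsed. Assumes the default
// markup (container id "toc", entries in its first <ul>).
function collapseToc(doc) {
  var toc = doc.getElementById('toc');
  if (!toc) { return false; }                 // no TOC on this page
  var list = toc.getElementsByTagName('ul')[0];
  if (list) { list.style.display = 'none'; }  // collapse the entries
  return true;
}

// In MediaWiki:Common.js, run it once the DOM is ready:
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', function () {
    collapseToc(document);
  });
}
```

This only hides the list; the "[hide]/[show]" toggle link, if your skin renders one, keeps its own state, so you may want to trigger that link instead if it is present.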



Updating Flickr2Piwigo

Published 17 Feb 2018 by Sam Wilson in Sam's notebook.

I’ve decided to try to bring the Flickr2Piwigo plugin up to date in order to support OAuth (Flickr’s old system of authentication was turned off in the middle of last year). I’ve been tinkering with getting the PhpFlickr library working properly lately (which is what Flickr2Piwigo uses to talk to Flickr), and although there’s lots more to do to it I’ve at least got the OAuth parts working (thanks to the lusitanian/oauth package). So now I’m going to add this to the Flickr2Piwigo.

There’s no support for Composer in Piwigo, so I’m not really sure how this is going to work. Probably some custom distribution-generation process; I’ll worry about that later. Hopefully we’ll not resort to committing vendor/.

Once this is working, I’ll go back to PhpFlickr and write some better documentation (probably Read The Docs) and fix up the caching system (it’s a bespoke oddity at the moment, that I think should be replaced with simple PSR-6 support).

How to use in Module:Asbox

Published 17 Feb 2018 by Rob Kam in Newest questions tagged mediawiki - Webmasters Stack Exchange.

Exporting Template:Stub from Wikipedia for use on a non-WMF wiki: it transcludes the Scribunto Module:Asbox, which has on line 233:

' is a [[Wikipedia:stub|stub]]. You can help Wikipedia by [',

Substituting Wikipedia with the magic word {{SITENAME}} doesn't work here. How can I replace Wikipedia with the comparable Lua function, so that pages transcluding the stub template show the local wiki name instead?

Feel the love for digital archives!

Published 15 Feb 2018 by Jenny Mitcham in Digital Archiving at the University of York.

Yesterday was Valentine's Day.

I spent most of the day at work thinking about advocacy for digital preservation. I've been pretty quiet this month, beavering away at a document that I hope might help persuade senior management that digital preservation matters. That digital archives are important. That despite their many flaws and problems, we should look after them as best we can.

Yesterday I also read an inspiring blog post by William Kilbride: A foot in the door is worth two on the desk. So many helpful messages around digital preservation advocacy in here but what really stuck with me was this:

"Digital preservation is not about data loss, it’s about coming good on the digital promise. It’s not about the digital dark age, it’s about a better digital future."

Perhaps we should stop focusing on how flawed and fragile and vulnerable digital archives are, but instead celebrate all that is good about them! Let's feel the love for digital archives!

So whilst cycling home (in the rain) I started thinking about Valentine's cards that celebrate digital archives. Then with a glass of bubbly in one hand and a pen in the other I sketched out some ideas.

Let's celebrate that obsolete media that is still in good working order (against all odds)

Even file migration can be romantic...

A card to celebrate all that is great about Broadcast WAV format

Everybody loves a well-formed XML file

I couldn't resist creating one for all you PREMIS fans out there

I was also inspired by a Library of Congress blog post by Abbie Grotke that I keep going back to: Dear Husband: I’m So Sorry for Your Data Loss. I've used these fabulous 'data loss' cards several times over the years to help illustrate the point that we need to look after our digital stuff.

I'm happy for you to use these images if you think they might help with your own digital preservation advocacy. An acknowledgement is always appreciated!

I don't think I'll give up my day job just yet though...

Best get back to the more serious advocacy work I have to do today.

Climate and nature: the spring 2018 New Humanist

Published 15 Feb 2018 by in New Humanist Articles and Posts.

Out now - our writers on climate change.

Email is your electronic memory

Published 14 Feb 2018 by Bron Gondwana in FastMail Blog.

From the CEO’s desk.

Sometimes you write planned blog posts, sometimes events in the news are a prompt to re-examine your values. This is one of those second times.

Gmail and AMP

Yesterday, Google announced that Gmail will use AMP to make emails dynamic, up-to-date and actionable. At first that sounds like a great idea. Last week’s news is stale. Last week’s special offer from your favourite shop might not be on sale any more. The email is worthless to you now. Imagine if it could stay up-to-date.

TechCrunch wrote about AMP in Gmail and then one of their columnists wrote a followup response about why it might not be a good idea – which led to a lot of discussion on Hacker News.

Devin used the word static. In the past I have used the word immutable. I think “immutable” is more precise, though maybe less plain and simple language than “static” – because I don’t really care about how dynamic and interactive email becomes – usability is great, I’m all in favour.

But unchanging-ness... that’s really important. In fact, it’s the key thing about email. It is the biggest thing that email has over social networking or any of the hosted chat systems.

An email which is just a wrapper for content pulled from a website is no longer an unchangeable copy of anything.

To be totally honest, email already has a problem with mutability – an email which is just a wrapper around remotely hosted images can already be created, though FastMail offers you the option of turning them off or restricting them to senders in your address book. Most sites and email clients offer an option to block remote images by default, both for privacy and because they can change after being delivered (even more specifically, an email with remote images can totally change after being content scanned).

Your own memory

The email in your mailbox is your copy of what was said, and nobody else can change it or make it go away. The fact that the content of an email can’t be edited is one of the best things about POP3 and IMAP email standards. I admit it annoyed me when I first ran into it – why can’t you just fix up a message in place – but the immutability is the real strength of email. You can safely forget the detail of something that you read in an email, knowing that when you go back to look at it, the information will be exactly the same.

Over time your mailbox becomes an extension of your memory – a trusted repository of history, in the way that an online news site will never be. Regardless of the underlying reasons, it is a fact that websites can be “corrected” after you read them, tweets can be deleted and posts taken down.

To be clear, often things are taken down or edited for good reasons. The problem is, you can read something online, forward somebody a link to it or just go back later to re-read it, and discover that the content has changed since you were last there. If you don’t have perfect memory (I sure don’t!) then you may not even be sure exactly what changed – just be left with a feeling that it’s not quite how you remember it.

Right now, email is not like that. Email is static, immutable, unchanging. That’s really important to me, and really important to FastMail. Our values are very clear – your data belongs to you, and we promise to be good stewards of your data.

I'm not going to promise that FastMail will “never implement AMP” because compatibility is also important to our users, but we will proceed cautiously and skeptically on any changes that allow emails to mutate after you’ve seen them.

An online datastore

Of course, we’re a hosted “cloud” service. If we turned bad, we could start silently changing your email. The best defence against any cloud service doing that is keeping your own copies, or at least digests of them.

Apart from trusting us, and our multiple replicas and backups of every email, we make it very easy to keep your own copies of messages:

  1. Full standards-compliant access to email. You can use IMAP or POP3 to download messages. IMAP provides the triple of “foldername / uidvalidity / uid” as a unique key for every message. Likewise we provide CalDAV and CardDAV access to the raw copies of all your calendars and contacts.

  2. Export in useful formats. Multiple formats for contacts, and standard ICS files for calendars. It’s rather hidden, but at the bottom of the Folders screen there’s a link called “Mass delete or remove duplicates”, and that screen also lets you download entire folders as a zip file.

  3. Working towards new standards for email. Our team is working hard on JMAP and will be participating in a hackathon at IETF in London in March to test interoperability with other implementations.

  4. We also provide a DIGEST.SHA1 non-standard fetch item via IMAP that allows you to fetch the SHA1 of any individual email. It’s not a standard though. We plan to offer something similar via JMAP, but for any attachment or sub-part of emails as well.

Your data, your choice

We strongly believe that our customers stay with us because we’re the best, not because it’s hard to leave. If for any reason you want to leave FastMail, we make it as easy as possible to migrate your email away. Because it’s all about trust – trust that we will keep your email confidential, trust that we will make your email easy to access, and trust that every email will be exactly the same, every time you come back to read it.

Thank you to our customers for choosing us, and staying with us. If you’re not our customer yet, please do grab yourself a free trial account and check out our product. Let us know via support or Twitter whether you decide to stay, and particularly if you decide not to! The only thing we don’t want to hear is “it should be free” – we’re not interested in that discussion, we provide a good service and we proudly charge for it so that you are our customer, not our product.

And if you’re not ready to move all your email, you can get a lot of the same features for a whole group of people using Topicbox – a shared memory without having to change anything except the “To:” line in the emails you send!



Make a Lasting Impact with "Write for DOnations"

Published 14 Feb 2018 by Mark Drake in The DigitalOcean Blog.


“Our community is bigger than just us” — As DigitalOcean (DO) employees, we aim to keep this value at the front of our minds in all our work. Since the company was founded in 2012, we’ve worked hard to build a vibrant, engaging Community where everybody from beginners to professionals can learn from one another about working in the cloud.

It’s important to us that the Community emulates the best that tech has to offer by serving as a welcoming place where members can share their ideas and experiences. This is what led us to introduce the Write for DigitalOcean program. Write for DO gives Community members an opportunity to build their brand, develop their writing skills, and get paid for contributing to DigitalOcean’s collection of tutorials on open-source software deployment, configuration, and development.

We’re always looking for new ways to give back to the Community. To that end, we’re excited to announce some updates to the Write for DigitalOcean program and reintroduce it as “Write for DOnations” (currently in beta — the full program launch is coming later this year).

There are two main changes that we are excited to share:

The Write for DOnations beta program will follow the same editorial structure as Write for DO:

At the end of this review process, the author’s tutorial will be published on the Community website and they will receive their payout. The author will then get to choose the nonprofit(s) that will receive their matching donation. Donations will be processed through Bright Funds, and authors’ donations can either go to a single tech-focused nonprofit or be evenly split between a group of nonprofits that share similar missions. Please note that the charitable contributions made by DigitalOcean through this program are not tax-deductible to the authors.

Since its launch, the Write for DigitalOcean program has allowed authors to share their diverse technical knowledge with the world while also improving their writing skills and growing their personal brand. Our team is always on the lookout for fresh content our community will love. To get a sense of which tutorial topics we’re particularly interested in, take a look at our suggested topics page.

Although Write for DOnations is still in development, we’re excited to help our Community authors make a real impact by donating to fantastic organizations that are working to shape the world of tech for the better.

We are actively seeking feedback to inform the full release of the new Write for DOnations program. Check out the program’s FAQ page for more details, and please share any questions or comments about the Write for DOnations beta launch in the comments below or reach out to us directly at

Import Template:Note on a new MediaWiki wiki

Published 13 Feb 2018 by Andwari in Newest questions tagged mediawiki - Stack Overflow.

I have set up a MediaWiki wiki and now I want to include a template called "note" (Templatelink). I used the Special:Export page and imported it, but there is clearly something wrong: the tags seem to work, but the pictures do not. The same goes for the translation extension, but I guess that's not the point. How do I get the files for this template? Why aren't they exported too? Am I missing important dependencies? And if so, how do I know which ones a template needs?

The Deep End Podcast Ep #13: From Prototype to Internet of Things with Muzzley

Published 13 Feb 2018 by Hollie Haggans in The DigitalOcean Blog.


A vision, a small prototype, and a PowerPoint presentation: that’s how Muzzley, a platform for interacting between Internet of Things (IoT) devices, was born three years ago. Today the Muzzley team works to solve a pain point for smart home consumers: managing their IoT devices from one interface, with minimum hassle. But they also place importance on transparency, privacy, and protecting their customers’ data.

In this episode, Muzzley co-founders Domingo Bruges and Sasha Dewitt discuss how Muzzley’s tech stack evolved to support a product that integrates with different vendors. They share insight into how they manage the data generated by consumer IoT devices, and how they approach consumer privacy and data protection.

Subscribe to The Deep End Podcast on iTunes, and listen to the latest episode on SoundCloud below:

Hollie Haggans heads up Global Partnerships for DigitalOcean’s Hatch program. She is passionate about startups and cold brew coffee. Get in touch with questions at

Why do I get error message on wikipedia api?

Published 10 Feb 2018 by Jana in Newest questions tagged mediawiki - Stack Overflow.

Can someone tell me what is wrong with this code?

  $.ajax({
      "dataType": "jsonp",
      "data": {
          "action": "opensearch",
          "format": "json",
          "search": "new york",
          "namespace": "0",
          "limit": "3",
          "formatversion": "1"
      },

      success: function(response){

Why do I get the following error message?

Refused to execute script from '' because its MIME type ('text/html') is not executable, and strict MIME type checking is enabled.

Thank you.
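That MIME-type message usually means the server answered with an HTML page (an error or article page, say) rather than executable script, which typically happens when the request URL is not the api.php endpoint. As an alternative to JSONP, the MediaWiki API also allows anonymous cross-site JSON when origin=* is passed. A sketch; the endpoint and the buildSearchUrl helper name are assumptions, not from the question:

```javascript
// Build the same opensearch request as plain JSON over CORS instead of
// JSONP; MediaWiki permits anonymous cross-site requests with origin=*.
function buildSearchUrl(base, term) {
  var params = new URLSearchParams({
    action: 'opensearch',
    format: 'json',
    search: term,
    namespace: '0',
    limit: '3',
    origin: '*'   // required for anonymous CORS requests
  });
  return base + '?' + params.toString();
}

// In the browser (result[1] is the array of matching titles):
// fetch(buildSearchUrl('https://en.wikipedia.org/w/api.php', 'new york'))
//   .then(function (r) { return r.json(); })
//   .then(function (result) { console.log(result[1]); });
```

With this approach no <script> tag is injected, so the strict MIME-type check never comes into play.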

MySQL socket disappears

Published 9 Feb 2018 by A Child of God in Newest questions tagged mediawiki - Server Fault.

I am running Ubuntu 16.04 LTS with MySQL server for MediaWiki 1.30.0, along with Apache2 and PHP 7.0. The installation was successful and I managed to get everything running. Then I started installing extensions for MediaWiki. Everything was fine until I installed the VisualEditor extension. It requires that I have both Parsoid and RESTBase installed, so I installed those alongside VisualEditor. Then I went to check my wiki and saw this message (the database name for the wiki is "bible"):

Sorry! This site is experiencing technical difficulties.

Try waiting a few minutes and reloading.

(Cannot access the database: Unknown database 'bible' (localhost))


#0 /var/www/html/w/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1028): Wikimedia\Rdbms\Database->reportConnectionError('Unknown databas...')

#1 /var/www/html/w/includes/libs/rdbms/loadbalancer/LoadBalancer.php(670): Wikimedia\Rdbms\LoadBalancer->reportConnectionError()

#2 /var/www/html/w/includes/GlobalFunctions.php(2858): Wikimedia\Rdbms\LoadBalancer->getConnection(0, Array, false)

#3 /var/www/html/w/includes/user/User.php(493): wfGetDB(-1)

#4 /var/www/html/w/includes/libs/objectcache/WANObjectCache.php(892): User->{closure}(false, 3600, Array, NULL)

#5 /var/www/html/w/includes/libs/objectcache/WANObjectCache.php(1012): WANObjectCache->{closure}(false, 3600, Array, NULL)

#6 /var/www/html/w/includes/libs/objectcache/WANObjectCache.php(897): WANObjectCache->doGetWithSetCallback('global:user:id:...', 3600, Object(Closure), Array, NULL)

#7 /var/www/html/w/includes/user/User.php(520): WANObjectCache->getWithSetCallback('global:user:id:...', 3600, Object(Closure), Array)

#8 /var/www/html/w/includes/user/User.php(441): User->loadFromCache()

#9 /var/www/html/w/includes/user/User.php(405): User->loadFromId(0)

#10 /var/www/html/w/includes/session/UserInfo.php(88): User->load()

#11 /var/www/html/w/includes/session/CookieSessionProvider.php(119): MediaWiki\Session\UserInfo::newFromId('1')

#12 /var/www/html/w/includes/session/SessionManager.php(487): MediaWiki\Session\CookieSessionProvider->provideSessionInfo(Object(WebRequest))

#13 /var/www/html/w/includes/session/SessionManager.php(190): MediaWiki\Session\SessionManager->getSessionInfoForRequest(Object(WebRequest))

#14 /var/www/html/w/includes/WebRequest.php(735): MediaWiki\Session\SessionManager->getSessionForRequest(Object(WebRequest))

#15 /var/www/html/w/includes/session/SessionManager.php(129): WebRequest->getSession()

#16 /var/www/html/w/includes/Setup.php(762): MediaWiki\Session\SessionManager::getGlobalSession()

#17 /var/www/html/w/includes/WebStart.php(114): require_once('/var/www/html/w...')

#18 /var/www/html/w/index.php(40): require('/var/www/html/w...')

#19 {main}

I checked the error logs in MySQL, and the error message said that the database was being accessed without a password. I restarted my computer and restarted Apache, Parsoid, RESTBase, and MySQL, but I could not successfully restart MySQL. I checked the error log by typing the command journalctl -xe and saw that MySQL failed to start because it couldn't write to /var/lib/mysql/. I went to StackExchange to see if I could find a solution, and one answer said to use the command mysql -u root -p. I did, typed in the password, and it gave this error:

ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

I also checked its status by typing sudo mysqladmin status, which said:

mysqladmin: connect to server at 'localhost' failed error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)' Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!

I wanted to verify that it existed, but upon browsing to the location of the socket, I found it was not there. I saw an answer about a missing MySQL socket which said to use the touch command to create the socket and another file. I did, and still had the same issues. I went back to the directory and found the two files to be missing. So I created them again with the touch command and watched the folder to see what happens. After about half a minute, the folder seems to be deleted and recreated: I get kicked out of the folder into its parent directory, and when I go back in, the files are gone.

Does anybody know why this is happening, or at least how I can fix this and get MySQL back up and running?

Error when trying to access index.php in a mediawiki server

Published 7 Feb 2018 by jlvale in Newest questions tagged mediawiki - Stack Overflow.

I'm running an Apache server and a MySQL database with Docker Compose. I do have a connection to the server, but I keep getting this error when I try to access index.php (so I can configure my MediaWiki page):

Fatal error: Uncaught Error: Call to a member function getCode() on null in /var/www/html/includes/user/User.php on line 1578

I've checked User.php but everything seems fine.


    $defOpt = $wgDefaultUserOptions;
    // Default language setting
    $defOptLang = $wgContLang->getCode();
    $defOpt['language'] = $defOptLang;
    foreach ( LanguageConverter::$languagesWithVariants as $langCode ) {
        $defOpt[$langCode == $wgContLang->getCode() ? 'variant' : "variant-$langCode"] = $langCode;
    }

Someone can give me a hand here? Thanks

Page-specific skins in MediaWiki?

Published 7 Feb 2018 by Alexander Gorelyshev in Newest questions tagged mediawiki - Webmasters Stack Exchange.

Is there a way to force a particular skin to be applied while displaying specific MediaWiki articles?

In my wiki many articles will have a "flip" version with alternative content (think "good" and "evil" perspectives of the same topic). I was thinking about using namespaces to separate these versions, but I need a definitive way to visually contrast them.
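MediaWiki has no built-in per-page skin setting (the useskin URL parameter only affects a single request), so a common workaround is per-namespace styling instead. A sketch for MediaWiki:Common.js, where the namespace number 3000 and both class names are invented for illustration:

```javascript
// Per-namespace theming as a stand-in for true per-page skins. The
// namespace number 3000 ("evil" articles) and the class names are
// made up for this example.
function themeForNamespace(ns) {
  return ns === 3000 ? 'evil-theme' : 'good-theme';
}

// In MediaWiki:Common.js:
if (typeof mw !== 'undefined') {
  $(function () {
    var ns = mw.config.get('wgNamespaceNumber'); // current page's namespace
    document.body.classList.add(themeForNamespace(ns));
    // Style .evil-theme / .good-theme in MediaWiki:Common.css.
  });
}
```

MediaWiki also adds an ns-<number> class to <body> on its own, so a pure-CSS rule in MediaWiki:Common.css may be enough without any script.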

Episode 1: Cindy Cicalese

Published 6 Feb 2018 by Yaron Koren in Between the Brackets: a MediaWiki Podcast.

Cindy Cicalese was a Principal Software Systems Engineer at MITRE for many years, where, among other things, she oversaw the creation and maintenance of over 70 MediaWiki installations, as well as development of many MediaWiki extensions. Last year she joined the Wikimedia Foundation as Product Manager for the MediaWiki Platform team.

Links for some of the topics discussed:

A Practical Droplet Performance Comparison

Published 6 Feb 2018 by Reynold Harbin in The DigitalOcean Blog.


Benchmarks are a common way to measure and compare the performance of cloud compute servers. While standardized benchmarks are useful for establishing a consistent, broad set of comparison metrics, it can be useful and more practical to compare the performance of the actual tasks you run most often on your servers as well.

For example, how much time could you save when running your app's automated test scripts if you used a more powerful cloud server?

We compared the performance of Standard and Optimized Droplets when doing just this. Specifically, we used the basic React Boilerplate app, which includes a comprehensive set of testing scripts covering 99% of the project. Because the tests are CPU-intensive, we chose test execution time as our comparison metric for the two different Droplet configurations.

Server Setup and Testing Methodology

For the default environment, we used a Standard $40 Droplet, which is configured with 4 vCPUs (Intel Xeon CPU E5-2650L v3 @ 1.80GHz), 8GB of RAM, and 160GB of SSD storage.

For the comparison environment, we used an Optimized $40 Droplet, which is configured with 2 dedicated vCPUs (Intel Xeon CPU E5-2697A v4 @ 2.60GHz), 4GB of RAM, and 25GB of SSD storage.

Both Droplets were running Ubuntu 16.04, and we set both up using the following procedure.

After initial setup to create a non-root user and basic firewall, we verified the CPU architecture using lscpu. We installed Node.js using the PPA to get a recent version of Node.js that includes npm, the Node.js package manager, which we needed to execute the test scripts. Finally, we installed React Boilerplate by cloning the react-boilerplate repository and running npm run setup to install its dependencies.

At this point, we had everything we needed to run the tests. To measure the time it takes to execute them, we used the utility program time, which summarizes the time and system resource usage for a given program command.

As a baseline, we first compared Droplet performance when running React Boilerplate's test suite with its default settings using time npm test.

Because npm uses a test framework that can use all available processors, we also ran a single CPU comparison to better understand the impact of CPU on performance. For the single CPU comparison, we ran time npm test -- --runInBand to force all of the automated tests to run sequentially. This test is relevant for applications that are not designed to use multiple CPUs, where a more powerful processor can improve performance.

Additionally, we found that setting the number of worker nodes to match the number of vCPUs on the server yielded the fastest overall test execution time, so we compared the best case setup on both servers as well. For the vCPU-specific comparison, we ran time npm test -- --maxWorkers=4 for the Standard Droplet (which has 4 vCPUs) and time npm test -- --maxWorkers=2 for the Optimized Droplet (which has 2 vCPUs).

We ran each of these tests five times on each server to look at the average execution time over a larger sample size.

So, how did the Standard and Optimized Droplets perform?


Here's an example (truncated for length) of the output from time npm test on the Optimized Droplet:

> react-boilerplate@3.5.0 pretest /home/perfaccount/react-boilerplate
> npm run test:clean && npm run lint

 PASS  app/containers/App/tests/index.test.js
 PASS  app/containers/LocaleToggle/tests/index.test.js
 PASS  app/containers/HomePage/tests/actions.test.js

Test Suites: 76 passed, 76 total  
Tests:       289 passed, 289 total  
Snapshots:   4 passed, 4 total  
Time:        14.725s, estimated 33s  
Ran all test suites.  
File                             |  % Stmts | % Branch |  % Funcs |  % Lines |Uncovered Lines |  
All files                        |      100 |      100 |      100 |      100 |                |  
 app                             |      100 |      100 |      100 |      100 |                |
  configureStore.js              |      100 |      100 |      100 |      100 |                |
  sagaInjectors.js               |      100 |      100 |      100 |      100 |                |

real    0m22.380s  
user    0m23.512s  
sys    0m0.884s  

The output we’re interested in is real time, which is the actual elapsed wall-clock time it took to execute the tests. In this example, the test script completed in 22.380 seconds.

These are our results showing the average execution time across multiple runs:

[Chart: average test execution times on the Standard and Optimized Droplets]

The Optimized Droplet outperformed the Standard Droplet in all tests, but as we explain in the next section, this isn't the only factor to consider when choosing the right configuration for your use case.


When comparing cloud servers with the goal of optimizing price-to-performance and resources, it's important to test the applications that you plan to run on the server in addition to comparing standard benchmarks.

In measuring the execution times of the react-boilerplate project's automated tests, our results showed a small improvement of 4.9% when using a $40 Optimized Droplet compared to a $40 Standard Droplet. For applications that perform similarly and do not take full advantage of all CPUs, choosing the $40 Standard Droplet may be a better choice because of its additional memory (8GB vs 4GB) and larger SSD (160GB vs 25GB).

However, the Optimized Droplet executed 37.3% faster when running the tests sequentially. For compute-intensive applications that use a single vCPU, this difference may be significant enough to choose the Optimized Droplet for the same price as the Standard Droplet.

If your application can run in a clustered mode with a specific number of CPU resources, you may be able to optimize price to resources by using a Standard Plan with more CPU, RAM and SSD versus a lower number of higher powered CPUs. We saw the best performance on both Droplets when we set the number of application instances to match the number of available vCPUs, where Optimized Droplets still outperformed Standard Droplets by a significant 21.7%, though the additional RAM and SSD in Standard Droplets may be preferable.

The tests performed in this article are not designed to be comprehensive, but are tailored to the types of applications that typically consume time and CPU resources. To maximize price-to-performance and resources for your applications, you can test various Droplet configurations and measure execution times of the typical jobs you place on your servers.

Test Droplets for Your Apps

How do I edit the Login Required page [closed]

Published 5 Feb 2018 by jehovahsays in Newest questions tagged mediawiki - Webmasters Stack Exchange.

On my private MediaWiki, view & read are set to false. My website visitors would see "Please Login to view other pages." What I need to do is edit the login link located in this error message.

Mediawiki - read text from a string

Published 5 Feb 2018 by SuperCiocia in Newest questions tagged mediawiki - Stack Overflow.

I have a managed wiki based on Mediawiki.

The name of a typical page is Lab book 2018/01, and I want to extract the year (2018) and the month (01).

I can use #titleparts, but the issue is that it uses / as a delimiter, so it returns either Lab book 2018 or 01.

How would I extract the year? i.e. just text from a string?

Mediawiki - what is the difference between If and #if?

Published 4 Feb 2018 by SuperCiocia in Newest questions tagged mediawiki - Stack Overflow.

What is the difference between

{{If| ... }}

and

{{#if| ... |}}

in MediaWiki?

Which one should I use?

Mediawiki templates - return month number with a zero in front?

Published 4 Feb 2018 by SuperCiocia in Newest questions tagged mediawiki - Stack Overflow.

I need a link to a page 2018/02 where 02 is the month number.

I have


but this only returns 2.

I am also using it with a random month (that is not the current one):


where I changed the MONTHNUMBER template so that it returns months with a 0 in front, but then the addition/subtraction of 1 makes it go back to a single-digit month number...

Any pointers?

Edit Page Section based on Role

Published 3 Feb 2018 by GoldBishop in Newest questions tagged mediawiki - Stack Overflow.

Is there a way to assign a section of a page to a Group for editing?

This is a hypothetical concept for implementation

For example

I also have a scenario where I want to have a Data driven section to show the "facts" without interpretation and would need to isolate that information from any non-Bot from editing.

Link to previous day in MediaWiki?

Published 2 Feb 2018 by SuperCiocia in Newest questions tagged mediawiki - Stack Overflow.

I have a university-managed wiki with the basic templates from MediaWiki, but I have been adding more by copying them from the Wikipedia templates.
We use it as a lab book, so there is a page per day, e.g. 2018/02/02.

I want to write a link to the previous day, i.e. yesterday, 2018/02/01.

I tried:


but this returns 2018/02/1.
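The usual wikitext approach is to let a date-aware function do the arithmetic rather than decrementing fields by hand (ParserFunctions' #time accepts relative offsets such as "-1 day", which handles month and year boundaries). The intended logic can be sketched in JavaScript, with the page-name format taken from the question and the function name hypothetical:

```javascript
// Sketch: compute the previous day's page name from one like "2018/02/02".
// Using a real date type handles month and year boundaries correctly.
function previousDayPage(name) {
    var parts = name.split("/").map(Number); // [year, month, day]
    var d = new Date(Date.UTC(parts[0], parts[1] - 1, parts[2] - 1)); // minus one day
    var pad = function (n) { return String(n).padStart(2, "0"); };
    return d.getUTCFullYear() + "/" + pad(d.getUTCMonth() + 1) + "/" + pad(d.getUTCDate());
}

console.log(previousDayPage("2018/02/02")); // "2018/02/01"
console.log(previousDayPage("2018/03/01")); // "2018/02/28"
```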

Querying MediaWiki API for links on a page does not return all links

Published 2 Feb 2018 by d_stewart in Newest questions tagged mediawiki - Stack Overflow.

I am looking to get the links on this WikiQuote page. I would like to see what subcategories exist under the 'Fundamental' category. They appear as links on the page, so it seemed natural to ask the API for the links. I only get back the "Category schemes" and "Main page" links, which exist in the introduction. What am I doing wrong/what have I misunderstood here?


function httpGetAsync(theUrl, callback){
    var xmlHttp = new XMLHttpRequest();
    xmlHttp.onreadystatechange = function() {
        if (xmlHttp.readyState == 4 && xmlHttp.status == 200)
            callback(xmlHttp.responseText);
    };
    xmlHttp.open("GET", theUrl, true); // true for asynchronous
    xmlHttp.send(null);
}

function callback(json_response){
    console.log(JSON.stringify(JSON.parse(json_response), null, 2));
}

httpGetAsync('*&format=json', callback);


{
  "batchcomplete": "",
  "query": {
    "pages": {
      "4480": {
        "pageid": 4480,
        "ns": 14,
        "title": "Category:Fundamental",
        "links": [
          { "ns": 4, "title": "Wikiquote:Category schemes" },
          { "ns": 14, "title": "Category:Main page" }
        ]
      }
    }
  }
}
The intro links are returned; the subcategory links are not.
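For reference, the pages listed under a category heading are category members, not wiki links in the page text, which is why prop=links only returns the links in the introduction. Subcategories are normally fetched with the list=categorymembers module instead (parameter names as documented in the MediaWiki API; the endpoint below is an assumption). A sketch of building such a request URL:

```javascript
// Sketch: subcategories come from list=categorymembers with cmtype=subcat,
// not from prop=links. This only builds the URL; no request is sent here.
function subcategoryQueryUrl(apiBase, category) {
    var params = new URLSearchParams({
        action: "query",
        list: "categorymembers",
        cmtitle: category,
        cmtype: "subcat",
        cmlimit: "500",
        format: "json"
    });
    return apiBase + "?" + params.toString();
}

console.log(subcategoryQueryUrl("https://en.wikiquote.org/w/api.php", "Category:Fundamental"));
```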

How To Code in Python: A DigitalOcean eBook

Published 1 Feb 2018 by Lisa Tagliaferri in The DigitalOcean Blog.

How To Code in Python: A DigitalOcean eBook

We have always been community-focused at DigitalOcean. On our Community site, we offer a few ways that developers can connect with each other, through sharing projects, learning about meetups, or answering questions. Additionally, we have over 1,800 technical tutorials, written by both external community members and internal technical writers, that have been designed to support the learning pathways of software engineers and system administrators as they develop their skills and scale their projects.

Since joining the DigitalOcean Community team, I have focused on curriculum development and technical writing related to Python software development. Today, I am happy to share that we are repackaging the “How To Code in Python 3” tutorial series as an eBook that can serve as both a teaching tool for beginners and a point of reference for more seasoned developers.

Our goal in making this tutorial series available in an eBook format is to facilitate access to this educational content. This is especially significant for people with limited internet access, long commutes without wifi, or who primarily access written material from mobile devices. Our hope is that the people who will benefit from this eBook will become more knowledgeable about how coding works, and thereby increase the number of technology stakeholders, decision makers, and knowledge producers who can work to build better software for everyone. By offering a new format of this content, we would like to drive engagement with and interest in software development across broader and more diverse communities.

Creating an eBook

This eBook project came about during a DigitalOcean company-wide Hackathon. Hackathons offer a great environment to test out projects that teams have been thinking about taking on, but have not been able to devote the time and resources to during a regular work week. Our team, which we nicknamed Bookworms, consisted of Brian Boucheron (Technical Writer), Kasia Bojanowska (Senior Visual Designer), and myself.

Brian was our eBook developer. He used pandoc, GNU Make, and Perl scripting to automate the eBook creation process from the original tutorial markdown. For some final stylistic choices, he has done some hand crafting along the way, but has worked to ensure that the eBook can be read as its user desires across devices. We intend to release relevant source code in a repository for others to extend and modify.

Kasia has done a lot of the design work that sets DigitalOcean’s tutorials and brand apart, and has conceived of a new vibrant cover for this eBook. Designs and imagery that invite readers in are an instrumental element of book conception, and Kasia’s dynamic image inspires curiosity and playfulness.

Since the Hackathon, I have worked to ensure that this eBook is made publicly available from major eBook distributors, is catalogued in libraries, and made available as an open educational resource in schools and universities.

What Is an Open Educational Resource?

Open educational resources (OERs) are texts or digital assets that can be used for teaching, learning, and research. What is significant about them is that they are openly accessible and openly licensed. At DigitalOcean, we use a Creative Commons License on all of our tutorials so that others can freely translate our technical content to other languages to encourage learning.

Each version of the eBook that is made publicly available will have a separate ISBN in order to facilitate access to the book. I have been working with the librarians at the City University of New York’s Brooklyn College and Graduate Center in order to catalogue the eBook and make it available for students as an open educational resource. If you would like to see this eBook in your library, share this WorldCat link with your local librarian.

By having this eBook available in libraries and within OER repositories, more students will be able to access computer programming learning material without having to pay textbook prices for that privilege.

We hope that readers who learn from or reference this eBook will be empowered to make their own contributions to open-source code via software and documentation pull requests or repository maintenance. Our community is bigger than just us, and building software together can make sure that everyone has an opportunity to participate in the technology we use every day.

You can now download the free eBook in one of the following formats:

Lisa Tagliaferri is the manager of Community Content at DigitalOcean. In addition to writing about Python, Lisa helps people find solutions through technology and infrastructure. Holding a PhD from CUNY, Lisa has a continued interest in interdisciplinary research, and is committed to community building through education. Find her on Twitter @lisaironcutter.

Cleaning up your inbox

Published 31 Jan 2018 by David Gurvich in FastMail Blog.

With email forming such a big part of our life it’s possible you had a New Year’s resolution to clean up your inbox.

Perhaps you spent last year, or even previous years, at the mercy of your unruly inbox? Or maybe you’ve come back to your email account after some time off and been overwhelmed with cleaning out all those emails.

Putting aside any regular email blasts from friends or family (read on for how to manage that), it’s likely that a lot of your inbox spam or clutter is from marketing lists you have signed up to.

What once seemed like an invitation too good to ignore might now be taking over your email life, so that every time you visit your inbox you’re confronted with more and more emails.

Types of unwanted email

Unwanted email may come in several forms and can include:

  1. Marketing lists - from retailers and organisations.
  2. Social media notifications – linked to an account you’ve already set up.
  3. Spam – communication from people you have no prior relationship with.

So let’s take a look at each of those kinds of unwanted mail in more detail and the best way to keep their effect on your inbox to a minimum.

1. Marketing lists

Imagine you signed up to a marketing list some years ago for a particular retailer. Maybe at a certain period in time you were really interested in throw pillows. But in the intervening years you’ve forgotten about ever signing up to this list and are wondering why your inbox keeps filling up with offers on something you don’t want, featured within emails you don’t want to receive. Now you simply find these emails annoying – and consider them to be spam.

Unsubscribe from a list

So how do you stop receiving all of those throw pillow emails? Well, rather than using the 'Report Spam' button the best thing to do is to manually unsubscribe from the list you once signed up to.

Most lists by law should have an unsubscribe link included somewhere within the body of the email; often this is located on the footer. If you can't see an unsubscribe link you may need to contact the sender directly to request removal.

Find lists

There are a few ways you can audit your inbox for lists. The first is to use the 'Mailing lists' tab button. (Note that this is not visible if your screen layout is configured to use the reading pane.)

Image of filter buttons in the UI with the mailing list button selected

You can click on this to quickly filter your inbox by senders. Then you can go through and decide what you want to keep and what you want to unsubscribe from.

The other way to find a known list is to use our search toolbar and look for it by name.

2. Social media notifications

These days there seems to be a never-ending list of social media platforms to use. Most of us would be aware of, or likely use, some or all of the biggest platforms such as Twitter, Facebook and LinkedIn.

And while social media can be great for staying in touch and promoting your business, notifications are often linked to the email address you set up your account with.

At times this can be convenient, however as these platforms continue to evolve you might find you have endless social media notifications taking over your inbox too.

Switching off notifications at the source

The good thing is that these notifications can be turned off, or managed, directly from the user settings for each individual social media platform you are using.

Visiting the ‘Settings’ or ‘Help’ menu of any social media platform you use should give you step-by-step instructions on how to control what gets sent to your inbox.

3. Spam

At FastMail we define spam as unsolicited mail sent to a large number of users in an attempt to get you to click on a link to buy a product, market an otherwise dubious service, scam you out of some money or install a virus or malware on your computer.

We’re often asked why you would keep receiving certain emails if they had previously been marked as spam.

For example, you may have previously received email you consider to be spam and decide to report the sender as spam using the 'Report Spam' button. However, some days later you find another email from the same sender in your inbox, rather than automatically being moved to your Spam folder upon delivery.

There are a few reasons for this. The first is that at some stage you likely consented to receiving these emails (in some form) so that tells our systems you do want to receive these emails (and we’re all about making sure you receive your email).

The second reason is to do with how our spam filters work. You can choose a range of settings to ensure spam filtering works the best for your needs. We’ve talked about this previously, but essentially you train the spam filter.

Everybody's spam is different. When you report spam that's slipped through our filters, or non-spam that we've mistakenly classified, we feed this information into a database that's tuned just for you. We also automatically train this with spam you've deleted permanently from your spam folder, and non-spam you've moved to your Archive folder or replied to.

And while we never sell email addresses, nor disclose email addresses at our site to anyone else, there are other instances where unscrupulous marketers may have placed you on mailing lists you didn’t consent to – let’s just call them spammers – using a range of methods to spam you.

Taking action

FastMail gives you the power to control your inbox, using a range of features to manage which mail comes to you.

Block the sender

If you can't unsubscribe or switch off notifications, you can block a particular sender by setting up a rule to permanently discard a message upon delivery. We do recommend sending mail into a folder when first setting up the rule, because mail discarded in this way is gone forever: we can't even get it back from backups.

If you have lots of senders you want to block, add them to a group in your address book, then create a rule to discard or file mail from that group. You can also create a contact for a wildcard on the whole domain in this group: this will block mail sent from any address at that domain.

Mail you want to keep

If you never want to block certain senders, add them to your address book. This also means mail from these trusted senders bypasses any spam checking. This might be a good option for online retailers you regularly use, making sure you receive any correspondence straight to your inbox.

Using rules to keep mail organised

Sometimes you still might want to receive email from particular senders but not have these messages taking over your inbox.

We recently wrote about organising your mail with rules and this is ideal for any correspondence that you still want but maybe not at the expense of your day-to-day inbox experience.

When you’re viewing a message you can use the 'More' button, then the 'Add rule from Message…' option to directly create a new rule for that particular mail. For example, you might send all mail from online retailers to a folder called ‘Purchases’.

image showing the Add Rule function when viewing a message in the FastMail web interface

Welcome to your streamlined inbox

So now, rather than waiting for your inbox to fill up and then manually batch-deleting every few weeks or months you can take back control today!

And whether you want to completely unsubscribe from lists or set up rules, the choice is up to you.

Either way, this year you may finally get to utter the words, “I finally unsubscribed from those throw pillow emails”, making 2018 the year you bring more peace and control to your inbox.

My back to school free software toolkit

Published 30 Jan 2018 by legoktm in The Lego Mirror.

The 2018 spring semester started last Wednesday. I think I've set up a pretty good free software toolkit for a successful year:

Total software cost: $0

How to Turn Great Employees into Great Interviewers

Published 30 Jan 2018 by Olivia Melman in The DigitalOcean Blog.

How to Turn Great Employees into Great Interviewers

As a follow up to our last post on candidate experience, this post will explore how we’re impacting employees and candidates with our approach to optimizing the interview experience.

Good Interviewers Aren’t Born; They’re Made

In early September, we launched the DigitalOcean Sailor Certification Program, which consisted of a two-hour interactive training session on DO’s hiring processes and best practices for interviews. With our rapid growth comes rapid hiring, and we’ve recognized that the best way to scale, bring in amazing talent, keep the bar high, and continue to optimize for culture add is to have a consistent approach to how we hire.

Here are some of the things that were top of mind for us as we built out the interviewer training program:

Establishing a Process

How could we ensure that interviewers, managers, and even recruiters were following a consistent process that mirrored the efficiency of DO’s daily workflow and ensured timely, repeatable, and scalable hiring decisions? As a starting point, we worked closely with our executives to agree upon shared expectations for each stage within the recruitment process. This informed our creation of the DigitalOcean Recruiting Coordinates (because everything here must have a nautical pun), which is a playbook for all things hiring. This document quickly became the required pre-read for the Sailor Program itself.

Minimizing Unconscious Bias

We needed to coach and educate interviewers on ways to minimize unconscious bias, fairly evaluate candidates, and foster meaningful interview discussions to ultimately make excellent hiring decisions. Fortunately, our Talent Development team hosts Unconscious Bias training at team offsites and implemented a standalone training for all new hires in April of last year. In 2017 alone, the team hosted 22 sessions with roughly 200 attendees. During the Sailor Program, we discuss ways to minimize unconscious bias in the interview process and put these learnings into practice with mock interview activities. Often, the best way to minimize bias in the interview process is to ensure a consistent set of thoughtful interview questions and an equally consistent framework for measuring candidate aptitude, making sure we are fairly evaluating candidates on relevant and meaningful attributes. One of the many follow up resources to the Sailor Program is a living and breathing question bank housed on Google Drive with questions vetted and approved by the People team.

Providing a Great Candidate Experience

If you read my first post, you’ll know how passionate we are about providing candidates with a positive and meaningful experience as they explore joining DO. We think we have one of the best company cultures and employee experiences around, and we want that to be reflected in the interview process as well so candidates know what to expect (and get excited!). By improving the techniques and consistency of our approach to interviewing, we hope to create a more predictable and comfortable environment for our candidates.

Program Logistics

We’ve also tailored the program content for different populations to ensure we’re offering relevant best practices for each employee group (Remote, In-Office, Managers, Individual Contributors, and a blended "Refresh Program" for experienced interviewers). We’re requiring that all DO interviewers get “Sailor Certified” in order to conduct interviews moving forward. By completing the Sailor Program, our employees will be able to more quickly and effectively assess talent, make great hires, and maintain the positive candidate experience we’ve always strived for. Upon completion of the program, participants gain access to a dedicated Slack channel, in which certified sailors share success stories, interview wins, and relevant articles. Our Brand Design team even got involved in helping us make a dedicated Sailor Sammy, which, backed by popular demand, we had printed on sailor hats to garnish the heads and desks of our certified interviewers.

How to Turn Great Employees into Great InterviewersNew "Sailor Certified" DO Employees

In less than 4 months, we’ve put exactly 200 interviewers through the Sailor Program, and the internal reaction to the program has been overwhelmingly positive. We distribute a survey following each session to all participants. Here’s a snapshot of our results:

How to Turn Great Employees into Great Interviewers

Folks are proud to demonstrate new interview techniques and share success stories with their fellow interviewers in Slack:

“The STAR concept was one of my top takeaways from the training. Simple concept, but having it put explicitly was enlightening for me.” Cole Tuininga, Senior Engineer

“I'll be looking to redo our interview questions for support interviews, based on the advice given in the training.” Jarland Donnell, Team Lead, Customer Support

“I have been conducting interviews at DO since we were a single-digit number of engineers, and I learned quite a bit throughout the Sailor Program. Lots of takeaways on things to improve on going forward. ” Vaibhav Bhembre, Senior Software Engineer & Tech Lead, Spaces

We’ve evolved the program for 2018 based on feedback and with scalability in mind to best accommodate our continuous growth, and ensure new interviewers can get up to speed quickly. The program now consists of two modules: the first is a self-paced e-learning course, and the second, an instructor-led session with an even larger emphasis on practice in the classroom.

I’m thrilled to sit with a team that works tirelessly behind the scenes to ensure that anyone who interacts with DO—as a candidate, an employee or in the community at large—walks away with a positive experience. If you’re interested in becoming part of the DigitalOcean family, I encourage you to check out our open positions. Questions, thoughts, and feedback are always welcomed, so feel free to leave a comment below.

View Open Positions at DigitalOcean

Olivia joined DigitalOcean in March 2017 as the People team’s first Program Manager. She is heavily focused on automation and collaboration within the full-cycle recruitment process, strengthening external partnerships to promote DO’s employment brand, and leveraging data to drive Recruiting strategy.

KEXI 3.1.0 Beta & Frameworks

Published 29 Jan 2018 by jaroslaw staniek in blogs.

KDE Project:

Today is the release day for KEXI 3.1.0 Beta & its frameworks:

Since version 3 it is spelled KEXI, not Kexi, to underline its new status as a standalone app. That standalone status includes being a first-class app outside of KDE Plasma too. To make this real, things such as a useful yet simple file widget are being developed, and single-click mode is really single-click mode "even" on XFCE. Implementing an optimal experience for Windows is actually quite similar to supporting XFCE.

The KEXI Frameworks are now subject to backward-compatibility rules within the >=3.1 series. So I would encourage you to try KProperty if you need powerful property-editing features in your app in place of tedious Qt list or tree views; there's KPropertyExample in the same repository. Then there's KDb if you expect more (something lower- or higher-level) than QtSql, that is, if you also need to create databases or SQLite-based documents, which seem to be a very popular container these days. Then try KReport if you want to escape from generating (ODF/HTML/whatever) documents "by hand", or QPainting them by hand, just to be able to print your application's data in a structured way with a nice title, header, and footer. Try KReportExample to see KReport in action in an app with "a few lines of code".

Finally, try KEXI itself to create designs for the reports mentioned above, design data-driven apps (lots of features are still missing before you can design, say, a JIRA-like app, but it IS coming), integrate data, and perform some analysis (again, many of these features are not shipped yet, as stability was, again, the goal).

I trust this stability makes KEXI and its frameworks pretty competitive already. The codebase was tested with Coverity and builds with gcc, clang, and msvc. Critical parts are autotested much more than in the 2.9 or 3.0 days. Many of the 220 improvements since 3.0.2 are stability, usability, or API fixes.

We expect a stable release in one month. And here's one request: if you're a packager or know one, please send a link to available 3.1 packages so we can eventually have a Download page. If you are able to create AppImage or Flatpak packages, work on Craft support for Windows, or test existing source or binary packages once they are published, we are looking for help.

Wordpress header in Mediawiki

Published 26 Jan 2018 by gjergj.jorgji in Newest questions tagged mediawiki - Stack Overflow.

Is there any way I can integrate my WordPress site's header menu into my MediaWiki, which is installed in a subdomain of the main (WordPress) site?

I have tried including Wordpress files in the skin template of MediaWiki like this:

$wp_dir = dirname(dirname(dirname(dirname(__FILE__))));

require($wp_dir . '/wp-load.php');

And then I have tried getting the header via the get_header() function (it seems that with older versions of WordPress and MediaWiki this works), but I get some errors:

    [c89fdfb57db1deb841232c5d] /test/wiki/index.php/Main_Page Error from line 313 of C:\wamp64\www\test\wp-includes\rest-api.php: Call to a member function using_index_permalinks() on null


#0 C:\wamp64\www\test\wp-includes\rest-api.php(375): get_rest_url(NULL, string, string)
#1 C:\wamp64\www\test\wp-content\plugins\contact-form-7\includes\controller.php(58): rest_url(string)
#2 C:\wamp64\www\test\wp-content\plugins\contact-form-7\includes\controller.php(37): wpcf7_enqueue_scripts()
#3 C:\wamp64\www\test\wp-includes\class-wp-hook.php(286): wpcf7_do_enqueue_scripts(string)
#4 C:\wamp64\www\test\wp-includes\class-wp-hook.php(310): WP_Hook->apply_filters(NULL, array)
#5 C:\wamp64\www\test\wp-includes\plugin.php(453): WP_Hook->do_action(array)
#6 C:\wamp64\www\test\wp-includes\script-loader.php(1435): do_action(string)
#7 C:\wamp64\www\test\wp-includes\class-wp-hook.php(286): wp_enqueue_scripts(string)
#8 C:\wamp64\www\test\wp-includes\class-wp-hook.php(310): WP_Hook->apply_filters(NULL, array)
#9 C:\wamp64\www\test\wp-includes\plugin.php(453): WP_Hook->do_action(array)
#10 C:\wamp64\www\test\wp-includes\general-template.php(2614): do_action(string)
#11 C:\wamp64\www\test\wp-includes\theme-compat\header.php(46): wp_head()
#12 C:\wamp64\www\test\wp-includes\template.php(688): require_once(string)
#13 C:\wamp64\www\test\wp-includes\template.php(647): load_template(string, boolean)
#14 C:\wamp64\www\test\wp-includes\general-template.php(41): locate_template(array, boolean)
#15 C:\wamp64\www\test\wiki\skins\vector\VectorTemplate.php(121): get_header()
#16 C:\wamp64\www\test\wiki\includes\skins\SkinTemplate.php(251): VectorTemplate->execute()
#17 C:\wamp64\www\test\wiki\includes\OutputPage.php(2442): SkinTemplate->outputPage()
#18 C:\wamp64\www\test\wiki\includes\exception\MWExceptionRenderer.php(135): OutputPage->output()
#19 C:\wamp64\www\test\wiki\includes\exception\MWExceptionRenderer.php(54): MWExceptionRenderer::reportHTML(InvalidArgumentException)
#20 C:\wamp64\www\test\wiki\includes\exception\MWExceptionHandler.php(75): MWExceptionRenderer::output(InvalidArgumentException, integer)
#21 C:\wamp64\www\test\wiki\includes\exception\MWExceptionHandler.php(130): MWExceptionHandler::report(InvalidArgumentException)
#22 C:\wamp64\www\test\wiki\includes\MediaWiki.php(550): MWExceptionHandler::handleException(InvalidArgumentException)
#23 C:\wamp64\www\test\wiki\index.php(43): MediaWiki->run()
#24 {main}

Is there any solution to this?

Is there a MediaWiki login page CSS file?

Published 25 Jan 2018 by Daniel in Newest questions tagged mediawiki - Stack Overflow.

I've changed my MediaWiki Common.css and Print.css and all modifications loaded just fine.

The problem is that all the customizations made in the Common.css are not applying in the Login page at all.

Is there a specific css file for the login page? Can anyone help me?

Spaces Now Available in Singapore (SGP1)

Published 25 Jan 2018 by John Gannon in The DigitalOcean Blog.

Spaces Now Available in Singapore (SGP1)

We’re excited to announce that Spaces is now available in our Singapore datacenter, giving developers and businesses global reach to affordable and scalable object storage. Since our initial launch in September, Spaces has grown in popularity with both existing and new customers—over one billion objects have been stored in Spaces—and it's no wonder; object storage is critical to delivering web assets, backing up data and even storing mission critical event logs in the cloud. Hosting storage close to your applications and customers will improve their overall experience.

Here's what some customers have said about Spaces:

Spaces Now Available in Singapore (SGP1)

What's New and Upcoming with Spaces

Over the past four months, we’ve expanded into Europe by making Spaces available in AMS3, added CORS support, and upgraded the upload experience in the Control Panel.

We’re currently working on other features and capabilities which will come out soon, including:

Spaces will launch in SFO2 by early Q2 2018, with Frankfurt and London to follow later in the year.

Create a Space in SGP1 today!

How can I allow sysops and specific users to hide spam articles in MediaWiki?

Published 24 Jan 2018 by jehovahsays in Newest questions tagged mediawiki - Webmasters Stack Exchange.

My website is powered by MediaWiki Version 1.29.1.

Sometimes the Recent Changes results page becomes cluttered with spam articles that I wish to hide from the results page. How can I allow specific users to hide them?

Keep in mind, I don't need spam protection and I only need to know how to hide spam articles from the results page.

Off and On Again: The story of KDE Plasma's desktop icons; 5.12 improvements

Published 24 Jan 2018 by eike hein in blogs.

KDE Project:

Desktop icons in Plasma 5.12 LTS Beta
Desktop icons in Plasma 5.12 LTS Beta (Click to enlarge)

Recent news in the Linux desktop community recalls an interesting time in Plasma's history: release 4.1 in 2008, Plasma's second release ever, the time we (in)famously abandoned desktop icons (sneak preview: they came back).

Of course we never really abandoned them. Instead, in 4.1 we initially debuted the Folder View technology, which powers most of the ways to browse file locations on a Plasma desktop. Folder View gives you folder widgets on the desktop, folder popups on your panels - and yes, desktop icons, which always remained a supported option. An option we, crucially, did decide to tick default-off at the time. Instead we chose to place a folder widget on the default desktop, in part to reinforce the then-new widget-oriented ways of doing things in Plasma, things older KDE desktops just couldn't do.

The Awkward Years

A telling sign in hindsight, many distributions reneged on our decision and turned icons on for their users anyway. And yet we had decided to throw the switch upstream; what next?

A period of research and experimentation followed. With all that newly freed-up screen real estate and a new modular architecture, we looked into alternatives for what a device homescreen could be. The PC during this time was in a mood to diversify as well, with new form factors popping up in stores. Some of our experiments took off - the Search and Launch interface we debuted alongside the Plasma Netbook spin in 4.5 directly inspired the popular Application Dashboard fullscreen overlay we introduced in Plasma 5.4. Others, like the Newspaper view, failed to find much of an audience.

Application Dashboard in Plasma 5.12 LTS Beta
The Application Dashboard overlay has its origins in the icons-off adventure (Click to enlarge)

Ultimately, though this period was productive in many ways, we didn't hit upon a clearly-better new homescreen. Elsewhere meanwhile, on a parallel track, the icon homescreen UI metaphor unexpectedly bounced back and grew stronger. Touchscreen handsets introduced a whole new generation of computer users to - essentially - desktop icons. In the following years we saw user numbers and familiarity with homescreen icons increase, not decrease.

The Lights are Back on and the Doors are Open

During the Plasma 5.10 dev cycle, we did a lot of polish work on the desktop icons experience. We then decided that it was time to stop hiding desktop icons support behind a config option: All things considered, the previous default was just not serving the majority of our users well. It had to change.

We still don't place any icons on the desktop by default. (Many distributions do - but they always did for all that time.) Those who enjoy the calm and tranquility of an empty desktop or don't want icons to get in the way of widgets were not impacted by this move. But drop a file or add an app to the desktop, and you now get an icon again, with full support for all of the powerful features KDE's desktops have always offered when dealing in files and links. For the many users who rely on desktop icons, this is a welcome reprieve from having to fiddle around post-install.

Icons of the Future

In the upcoming Plasma 5.12 LTS release, desktop icons are getting even better. We've done a truckload of work on improving the experience with multiple monitors, across which icons can be moved freely again, along with gracefully handling monitor hot plug/unplug. Performance and latency improvements, the key theme to 5.12 in general, have continued where 5.10+ left off, with the desktop reflecting file operations now faster than before.

We've worked through many of the most-reported feature requests and pain points for desktop icons throughout 2017, but we're not done yet. Folder View development continues in 2018 with more outstanding user requests on the horizon, so feel free to get in touch.

Check out the beta now and let us know what else you want out of desktop icons after 5.12!

MediaWiki API - what are the differences between 'opensearch' & 'query', and 'generator' & 'list' in the API call url

Published 24 Jan 2018 by firefiber in Newest questions tagged mediawiki - Stack Overflow.

I'm trying to work with the mediawiki API for a project on FreeCodeCamp. I've read through these pages on the API docs:

  1. OpenSearch
  2. Generators
  3. Lists
  4. Query

And it's still not clear what the real differences are, and when and why I'd need to use each one. Here's three API calls I've made that each produce slightly different results:


In this, I get an array with 4 items, the first being the search term, the second being a list of result page titles, the third being a small snippet of each page, and the fourth being the URL to each page.


In this one, there's a generator=search, which I don't understand. On the API page for generators, it just says: "Get the list of pages to work on by executing the specified query module", which isn't really very helpful. What does this mean?


This is the same as the previous one except I'm using list=search.
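The actual call URLs were embedded as links in the original post and didn't survive. Purely as illustrations (the endpoint and the search term "Train" are assumptions, not the original calls), the three kinds of requests look like this:

```
# action=opensearch: quick prefix search, returns the 4-element array
https://en.wikipedia.org/w/api.php?action=opensearch&search=Train&format=json

# action=query with list=search: full-text search returned as a result list
https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=Train&srprop=snippet&format=json

# action=query with generator=search: search results fed as input pages to prop modules
https://en.wikipedia.org/w/api.php?action=query&generator=search&gsrsearch=Train&prop=info&inprop=url&format=json
```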

So my questions are: what is the real difference between opensearch and query, and between generator and list, and when would I use each one?

As you can see, the last two calls are nearly identical, except one uses the generator, and the other the list, but both mention inprop=url, and snippets (srprop & gsrprop).

The Full BBS Documentary Interviews are Going Online

Published 23 Jan 2018 by Jason Scott in ASCII by Jason Scott.

This year, the full 250 hours of interviews I conducted for the BBS Documentary are going online at the Internet Archive.

There’s already a collection of them up, from when I first set out to do this. Called “The BBS Documentary Archive“, it’s currently 32 items from various interviews, including a few clip farms and full interviews of a bunch of people who sat with me back in the years of 2002-2004 to talk about all matter of technology and bulletin board history.

That collection, as it currently stands, is a bit of an incomplete mess. Over the course of this project, it will become a lot less so. I’ll be adding every minute of tape I can recover from my storage, as well as fixing up metadata where possible. Naturally you will be asked to help as well.

A bit of background for people coming into this cold: I shot a movie called “BBS: The Documentary” which ended up being an eight episode mini-series. It tried to be the first and ultimately the last large documentary about bulletin board systems, those machines hooked up to phone lines that lived far and wide from roughly 1978-2000s. They were brilliant and weird and they’re one of the major examples of life going online. They laid the foundation for a population that used the Internet and the Web, and I think they’re terribly interesting.

I was worried that we were going to never get The Documentary On BBSes and so I ended up making it. It’s already 10 years and change since the movie came out, and there’s not been another BBS Documentary, so I guess this is it. My movie was very North American-centric and didn’t go into blistering detail about Your Local BBS Scene, and some people resented that, but I stand by both decisions; just getting the whole thing done required a level of effort and energy I’m sure I’m not capable of any more.

Anyway, I’m very proud of that movie.

I’m also proud of the breadth of interviews – people who pioneered BBSes in the 1970s, folks who played around in scenes both infamous and obscure, and experts in areas of this story that would never, ever have been interviewed by any other production. This movie has everything: Vinton Cerf (co-creator of the Internet) along with legends of Fidonet like Tom Jennings and Ken Kaplan and even John Madill, who drew the FidoNet dog logo. We’ve got ANSI kids and Apple II crackers and writers of a mass of the most popular BBS software packages. The creator of .QWK packets and multiple members of the Cult of the Dead Cow. There’s so much covered here that I just think would never, ever be immortalized otherwise.

And the movie came out, and it sold really well, and I open licensed it, and people discover it every day and play it on YouTube or pull out the package and play the original DVDs. It’s a part of culture, and I’m just so darn proud of it.

Part of the reason the movie is watchable is because I took the 250 hours of footage and made it 7.5 hours in total. Otherwise… well….

…unless, of course, you're a maniac, and you want to watch me talking with people about subjects decades in the past and either having it go really well or fall completely apart. The shortest interview is 8 minutes. The longest is five hours. There's legions of knowledge touched on in these conversations, stuff that can be a starting point for a bunch of research that would otherwise be out of options to even find what the words are.

Now, a little word about self-doubt.

When I first started uploading hours of footage of BBS Documentary interviews to the Internet Archive, I was doing it from my old job, and I had a lot going on. I'd not done much direct work with Internet Archive and didn't know anything going on behind the scenes or how things worked or frankly much about the organization in any meaningful amount. I just did it, and sent along something like 20 hours of footage. Things were looking good.

Then, reviews.

Some people started writing a few scathing responses to the uploads, pointing out how rough they were, my speech patterns, the interview style, and so on. Somehow, I let that get into my head, and so, with so much else to do, I basically walked away from it.

12 years later (12 years!) I’m back, and circumstances have changed.

I work for the Archive, I’ve uploaded hundreds of terabytes of stuff, and the BBS documentary rests easily on its laurels of being a worthwhile production. Comments by randos about how they wish I’d done some prettify-ing of the documentary “raw” footage don’t even register. I’ve had to swim upstream through a cascade of poor responses to things I’ve done in public since then – they don’t get at me. It took some time to get to this place of comfort, which is why I bring it up. For people who think of me as some bulletproof soul, let it be known that “even I” had to work up to that level, even when sitting on something like BBS Documentary and years of accomplishment. And those randos? Never heard from them again.

The interview style I used in the documentary raw footage should be noted because it’s deliberate: they’re conversations. I sometimes talk as much as the subjects. It quickly became obvious that people in this situation of describing BBS history would have aspects that were crystal clear, but would also have a thousand little aspects lost in fuzzy clouds of memory. As I’d been studying BBSes intensely for years at this point, it would often take me telling them some story (and often the same stories) to trigger a long-dormant tale that they would fly with. In many cases, you can see me shut up the second people talk, because that was why I was talking in the first place. I should have known people might not get that, and I shouldn’t have listened to them so long ago.

And from these conversations come stories and insights that are priceless. Folks who lived this life in their distant youth have all sorts of perspectives on this odd computer world and it’s just amazing that I have this place and collection to give them back to you.

But it will still need your help.

Here’s the request.

I lived this stupid thing; I really, really want to focus on putting a whole bunch of commitments to bed. Running the MiniDV recorder is not too hard for me, and neither is the basic uploading process, which I’ve refined over the years. But having to listen to myself for hundreds of hours using whatever time I have on earth left… it doesn’t appeal to me at all.

And what I really don’t want to do, beyond listening to myself, is enter the endless amount of potential metadata, especially about content. I might be inspired to here and there, especially with old friends or interviews I find joyful every time I see them again. But I can’t see myself doing this for everything and I think metadata on a “subjects covered” and “when was this all held” is vital for the collection having use. So I need volunteers to help me. I run a Discord server that communicates with people collaborating with me and I have a bunch of other ways to be reached. I’m asking for help here – turning this all into something useful beyond just existing is a vital step that I think everyone can contribute to.

If you think you can help with that, please step forward.

Otherwise… step back – a lot of BBS history is about to go online.


Open Source at DigitalOcean: Extending go-libvirt with Code Generation

Published 23 Jan 2018 by Geoff Hickey in The DigitalOcean Blog.


Back in November 2016, DigitalOcean released go-libvirt, an open source project containing a pure Go interface to libvirt. Using go-libvirt, developers could manage virtual machines leveraging all the power of libvirt’s extensive API without leaving the comfortable environment of Go. But there was a catch.

While the libvirt library has close to 400 API calls, initial versions of go-libvirt implemented only a handful of those calls. But go-libvirt is open source, so you can just add your own implementations for the routines you need, right?

Well, yes you could. But go-libvirt talks to libvirt by exchanging XDR-encoded buffers using an RPC mechanism based on the venerable ONC RPC (or Sun RPC), so you would first have to familiarize yourself with those RPCs. Then, you would have to locate the argument and return value structures in the libvirt protocol definition file, and write code to marshal and unmarshal them on send and receive. By that time you might be asking yourself, “Why don’t I just give up and use CGO?” But hang on. Tedious, repetitive work; that sounds like what we invented computers for. Maybe they can help?

This is the tale of how we used code generation to extend go-libvirt to cover every one of the libvirt API calls, and how we made it more resilient to future changes in the libvirt API.

Sun RPC and the Missing Toolchain

If you’re working with Sun RPC in C, you write a protocol file describing the messages you want to exchange and feed it to a utility called “rpcgen”. The output of rpcgen includes header files and stubs for both client and server. The stubs contain generated code to marshal and unmarshal the message bodies for each of the messages. This is exactly how libvirt works— the protocol files are right there in the libvirt source repo (look for source files ending in .x), and during the build they get processed by rpcgen into .c and .h files.

If rpcgen could output Go code we’d be all set, but it doesn’t. Sun RPC isn’t a popular option for native Go programs, and although there are libraries for handling its on-the-wire data representation—XDR—there aren’t any libraries around for parsing its protocol files into Go.

Time to roll up our sleeves!

Learning the Language

Sun RPC protocol files look a lot like a collection of C declarations. We could throw a parser together with regexes and custom code, but when the source files start getting complex that path often ends in tears. The protocol files we need to parse definitely meet the complexity threshold: like C, data types can be nested inside other data types, and this is exactly the kind of thing that regexes are ill-equipped to handle. To be reliable we’ll want a real stateful parser. We could write one, but there’s a better way.

Parser generators have been around since the 1970s, and Go includes a port of one of the oldest, yacc, as goyacc. Using goyacc to generate our parser means we don't have to write the state machine that makes up the bulk of the parser by hand (and yes, it also means our code generator is itself generated). With a generated parser we're left with three pieces of code to write: the language grammar, which is consumed by the parser generator to build the parser state machine; the actions, which run when the parser identifies a bit of grammar; and the lexer.

The grammar definition lives in its own file, sunrpc.y, and goyacc uses the same syntax for the contents of this file as yacc did before it. Luckily, some of the documentation for Sun RPC includes grammar definitions in exactly the format goyacc expects, and we used that as a starting point for writing the grammar.

The actions are simply Go code mixed in with the grammar. When the parser identifies an element of the grammar, it will execute any actions defined at that point in the grammar file. In our case, the actions build an internal representation of the protocol file that we’ll use later to output our generated Go code.

Alexa, Where’s My Lexer?

That leaves the lexer, also called a tokenizer. The lexer is called by the parser, and each time it’s called it returns the next token in the input stream, where a token is a unit of the grammar. For our grammar, if the input stream looks like this:

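The snippet here was shown as an image in the original post; reconstructing it from the surrounding description (the token sequence below and the resulting AddConst call), the input stream would have been a constant definition like:

```
const REMOTE_STRING_MAX = 4194304;
```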

The lexer will return the token CONST, then IDENTIFIER, =, CONSTANT, and ;. That matches one of the valid forms of const_definition from our grammar file (const_ident is elsewhere defined as IDENTIFIER):

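This grammar rule was also shown as an image; a sketch of the const_definition rule in goyacc syntax, based on the token sequence and the AddConst() action described below (the exact action code is an assumption):

```
const_definition
    : CONST const_ident '=' CONSTANT ';'
        { AddConst($2, $4) }   /* value plumbing simplified */
    ;
```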

The Go code inside the braces after the grammar is the action the parser will execute when it sees this sequence of tokens. So the parser will call AddConst(), passing in the value of the second and fourth tokens, in this case the const_ident and the CONSTANT. The resulting call will be AddConst("REMOTE_STRING_MAX", "4194304"), because in our grammar the value of any token is the original string.

If you’re familiar with yacc, at this point you might be wondering, “Where’s the Go port of lex? Is there a golex?” The answer is no; lex isn’t part of the standard library. (To get an idea of why this might be so, and an excellent introduction to lexers in general, you might want to see this talk by Rob Pike, from back in the early days of Go, in 2011.)

So instead we have a handwritten lexer, lvlexer.go. It’s pretty straightforward, about 330 lines long, and uses no regular expressions. To work with the parser, the lexer has to satisfy an interface consisting of two functions: Lex() and Error().
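The interface in question is the lexer interface that goyacc emits into the generated parser (shown here with the default yy prefix; the yySymType struct is generated from the grammar):

```go
// The parser calls Lex() to get each token and Error() on syntax errors.
type yyLexer interface {
	Lex(lval *yySymType) int // returns the next token's id, fills in its value
	Error(s string)
}
```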

Generating the Output

The actions, as well as the code that drives the parser, are found in generate.go, which gets compiled together with the lexer and the parser into a standalone binary. The generator calls the parser, and when the parser has finished its work the generator has an internal representation of the protocol file, and we need to tie everything together and output some Go code.

Up until now we’ve been talking about libvirt and Sun RPC, because libvirt is using many of the pieces that make up Sun RPC. But if you look at remote_protocol.x in the libvirt sources, you’ll notice something surprising: the procedure definitions, which would describe the argument and return types for each RPC procedure, are missing. There is an enum containing procedure numbers, but nothing that resembles a function prototype.

This is where libvirt departs from Sun RPC. Rather than use rpcgen to build the procedure stubs for client and server, they have implemented their own method for calling remote routines (have a look at libvirt’s callFull() in remote_driver.c if you’re curious).

So instead of a procedure definition in the protocol file, the procedure, its arguments, and its return values are associated by name. All arguments and return values in libvirt are structures. We can start from the remote_procedure enum in the protocol file. For the procedure REMOTE_NODE_ALLOC_PAGES, we have a procedure number of 347. To find the arguments structure we convert this to lowercase and add _args; for the return we add _ret. We can apply this pattern to every procedure in the protocol file. If a procedure doesn't take arguments or return values, the corresponding struct will be missing.

This gives us enough information to generate the Go client functions for each procedure. We’ll drop the remote_ prefix, since it’s common to every procedure, and we’ll convert the names to camel case so they look natural in Go. For REMOTE_NODE_ALLOC_PAGES, that means our generated Go routine would look like this:

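The generated routine was shown as an image in the original post; a sketch of its shape, with type names assumed from the _args/_ret naming convention described above:

```go
// Sketch: the caller builds the args struct and decodes the ret struct itself.
func (l *Libvirt) NodeAllocPages(args NodeAllocPagesArgs) (ret NodeAllocPagesRet, err error) {
	// XDR-encode args, issue procedure 347, XDR-decode the ret structure
	return
}
```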

That’s not a bad start, but it forces the caller to construct the arguments structure and decode the return structure. Putting all the arguments into a structure makes the function awkward to use, and doesn’t match the libvirt documentation for virNodeAllowPages. We can do better.

API, Deconstructed

Because we used a real parser to process the protocol definition, our generator has full type information available. In fact for every struct definition in the protocol, we have our own Go struct containing all the type information for the struct’s elements. So the generator can replace the struct in the arguments for our generated function with a list of its contents, simply by doing this:

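The statement was shown as an image; in spirit it is a single assignment of this form (the index and map identifiers are assumptions, but Gen.Procs and Args are named in the text below):

```go
// Replace the lone args-struct parameter with that struct's members.
Gen.Procs[ix].Args = Gen.StructMap[argsName].Members
```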

Gen.Procs is an array of generated procedures, and Args is an array of arguments. With that statement we set the arguments for a generated function to the array of members in the corresponding arguments structure. We do the same thing for the return values, and then our generated NodeAllocPages looks like this:

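Again reconstructed as a sketch, since the original showed an image; parameter names and types are assumptions based on the XDR definitions and virNodeAllocPages, and the real generated code lives in libvirt.gen.go:

```go
// Sketch: the struct members hoisted into the parameter list; the C API's
// size arguments are gone because Go slices carry their own length.
func (l *Libvirt) NodeAllocPages(PageSizes []uint32, PageCounts []uint64,
	StartCell int32, CellCount uint32, Flags uint32) (ret uint64, err error)
```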

This is very close to the libvirt documentation; the Go version just omits two size arguments since slices carry size information with them.


The last step in our code generator is to iterate through all the type information we’ve collected and build the actual Go files. We use Go’s text/template library with a couple of fairly simple template files to do this. The generated procedures are output to libvirt.gen.go, and some constants to a separate file, internal/constants/constants.gen.go. We even generate comments so that godoc has something to work with.

Go, Generate

Code generation is exactly what the go generate command was created for. If you've never explored this part of the Go toolchain before, there's an excellent introduction in the Go blog. The generate command is intended to be run separately from go build, and much less often - typically the generated code for a project only needs to be re-created when an external dependency changes. To rebuild the generated files for a particular version of libvirt, you simply set an environment variable LIBVIRT_SOURCE with the path to the libvirt sources, and run go generate ./... from the go-libvirt directory. That command will descend through the go-libvirt sources executing generate instructions embedded in the source files.

That’s It!

We hope you enjoyed this brief tour of the code generator for go-libvirt. There’s still work to be done on the project, but with these changes it should be much more useful to a wider audience.

The go-libvirt project is hosted on GitHub, and pull requests are welcome!

Geoff Hickey is a Senior Engineer at DigitalOcean, where he works on hypervisors and the code that surrounds them. In his spare time he teaches machines to make things out of wood and then shamelessly takes credit for their work. As a result, he views the rise of machine learning with some trepidation.

About Babe

Published 20 Jan 2018 by camiloh in blogs.

KDE Project:

I've been working on a small music player named Babe for a while now. It started as the idea of a bare-bones music player, and as time has passed it has become a project I hope to expand beyond a simple music player.


Last year I presented the project to the KDE community at Akademy 2017, which took place in Almería, Spain. Back then I wanted to highlight the idea of a multimedia desktop application that is aware of its content, making use of the wide knowledge that can be found on the internet through a music information retrieval (MIR) system called Pulpo together with AI techniques.

The main idea has been to let you discover and rediscover your own music collection by managing it contextually. Since then I've been working on that idea, and last month I made an initial release of Babe 1.2. It is still missing a proper tar release, but here's an experimental AppImage you can try (on distributions with up-to-date glib packages):
AppImage preview

and if you feel like compiling here's the 1.2 release branch:
Babe 1.2 release branch

Anyway, back to the subject.
This initial release makes use of QWidgets, which is better suited for desktop applications than for mobile or touch devices.
At an Akademy BoF session I had the opportunity to talk to some KDE contributors and devs about the future of Babe, and the idea of porting Babe to Kirigami, the QML framework for convergence, sounded very attractive to me, so I decided that I would eventually port Babe to QML/Kirigami in order to take part in the future of Plasma.

And then, after the release of Babe 1.2, I started to work on the port, and it is going pretty smoothly; so far I have a working version that runs on Android, Plasma Mobile and GNU/Linux desktops and makes use of the same Pulpo MIR system. So you will have the same convergent music player with tons of features across your devices.

Babe QML/Kirigami preview

The main idea for Babe is still the same, but slowly evolving:
A simple place to keep your favourite music at hand, yes... and a contextual manager to let you rediscover your music and discover new music. So Babe, with the help of Pulpo, will crawl the internet for you, find data in music information repositories, establish relationships between your music files, and generate suggestions.

You can even try the semantic search by making use of tags like: "similar: sam smith", "like: canadian", "tag: spanish", "lyrics: make me cry" and even more, "artist: _______", "album: _______", "genre: _______" , "playlist: _______" ...

Babe will also integrate with online music streaming services in the future, but for now there's a working extension for Chrome/Chromium and Firefox that connects to Babe and makes use of the youtube-dl tool to collect your favourite music from sites like YouTube, Vimeo, etc...
Once collected, Babe will take care of the rest.
Babe add-on for Firefox

And lastly, I want to write a little about something I've been working on: a Babe online platform to share music information and musical interests with friends or other users. The idea for the platform is to have a free and open place to connect with your friends through music, to share playlists, music lyrics and playback annotations, comments, etc... And one idea that may or may not happen, due to copyright issues, is streaming one-on-one the song you're currently listening to, meaning you could stream to some friend or user what you are listening to at the moment, similar to a live radio station.

In any case, it will be a very interesting project for me to develop within the KDE community, and I hope to share it with you all pretty soon to make it better.

And that's it for now. I shall soon be making more blog posts, writing a little more in depth about Pulpo, the technical aspects of the projects around Babe, and the QML/Kirigami porting progress.

Let me know what you think, the comments are open.
And hopefully I will meet some of you at Akademy 2018.

The Undiscovered

Published 19 Jan 2018 by Jason Scott in ASCII by Jason Scott.

There’s a bit of a nuance here; this entry is less about the specific situation I’m talking about, than about the kind of situation it is.

I got pulled into this whole thing randomly, when someone wrote me to let me know what was going on. Naturally, I fired into it with all cylinders, but after a while I figured out that very good people were already on it, by days, and so I don't actually have to do much of anything. That works for me.

It went down like this.

MOS Technology designed the 6502 chip which was in a mass of home computers in the 1970s and 1980s. (And is still being sold today.) The company, founded in 1969, was purchased in 1976 by Commodore (they of the 64 and Amiga) and became their chip production arm. A lot of the nitty gritty details are in the Wikipedia page for MOS. This company, now a subsidiary, lived a little life in Pennsylvania throughout the 1980s as part of the Commodore family. I assume people went to work, designed things, parked in the parking lot, checked out prototypes, responded to crazy Commodore administration requests… the usual.

In 1994, Commodore went out of business and its pieces bought by various groups. In the case of the MOS Technology building, it was purchased by various management and probably a little outside investment, and became a new company, called GMT Microelectronics. GMT did whatever companies like that do, until 2001, when they were shut down by the Environmental Protection Agency because it turns out they kind of contaminated the groundwater and didn’t clean it up very well.

Then the building sat, a memory to people who cared about the 6502 (like me), to former employees, and probably nobody else.

Now, welcome to 2017!

The building has gotten a new owner who wants to turn the property into something useful. To do this, they basically have to empty it, raze the building to the ground, clean the ground, and then build a new building. Bravo, developer. Remember, this building has sat for 16 years, unwanted and unused.

The sign from the GMT days still sits outside, unchanged and just aged from when the building was once that business. Life has certainly gone on. By the way, these photos are all from Doug Crawford of the Vintage Computing Federation, who took this tour in late 2017.

Inside, as expected, it is a graffiti and firepit shitshow, the result of years of kids and others camping out in the building’s skeletal remains and probably whiling away the weekends hanging out.

And along with these pleasant scenes of decay and loss are some others involving what Doug thought were “Calcium Deposits” and which I personally interpret as maybe I never need to set foot in this building at any point in my future life and probably will have to burn any clothing I wear should I do so.

But damn if Doug didn't make the journey into this environmentally problematic deathtrap to document it, and he even brought a guest of some renown related to Commodore history: Bil Herd, one of the designers of the Commodore 128.

So, here’s what I want to get to: In this long-abandoned building, decades past prime and the province of trespassers and neglect, there turns out to have been quite a bit of Commodore history lying about.

There’s unquestionably some unusually neat items here – old printed documentation, chip wafers, and those magnetic tapes of who knows what; maybe design or something else that needed storage.

So here’s the thing; the person who was cleaning up this building for demolishing was put into some really weird situations – he wanted people to know this was here, and maybe offer it up to collectors, but as the blowback happened from folks when he revealed he’d been throwing stuff out, he was thrown into a defensive position and ultimately ended up sticking with looking into selling it, like salvage.

I think there’s two lessons here:

  1. There’s no question there’s caches of materials out there, be they in old corporate offices, warehouses, storerooms, or what have you, that are likely precious windows into bygone technology. There’s an important lesson in not assuming “everything” is gone and maybe digging a bit deeper. That means contacting places, inquiring with simple non-badgering questions, and being known as someone interested in some aspect of history so people might contact you about opportunities going forward.
  2. Being a shouty toolbox about these opportunities will not improve the situation.

I am lucky enough to be offered a lot of neat materials in a given month; people contact me about boxes, rooms and piles that they’re not sure what the right steps are. They don’t want to be lectured or shouted at; they want ideas and support as they work out their relationship to the material. These are often commercial products now long-gone and there’s a narrative that old automatically means “payday at auction” and that may or may not be true; but it’s a very compelling narrative, especially when times are hard.

So much has been saved and yes, a lot has been lost. But if the creators of the 6502 can have wafers and materials sitting around for 20 years after the company closed, I think there’s some brightness on the horizon for a lot of other “lost” materials as well.

Fun (?) with symbol visibility...

Published 19 Jan 2018 by alexander neundorf in blogs.

KDE Project:

In the last few days I have had to deal with loading plugins via dlopen.
I have already learned a lot about symbols on Linux.

I expected that if I have an executable and load a plugin into it, the stuff inside the executable (i.e. object files and static libraries) wouldn't be used to resolve missing symbols in the plugin.
But that's wrong: by default, all symbols are visible, and so all the symbols in your executable are visible to a plugin.
I didn't find the relevant blogs or mailing list entries from the KDE devs, back in the KDE2 days when all the plugin infrastructure was added to kdelibs. But others had to find out too:

So, even if your executable links a library statically, its symbols are visible to plugins.
This can be fixed by setting the visibility to hidden, which can be done either by using -fvisibility=hidden or the visibility attribute.

One more thing I didn't expect is that even if no shared library is involved, the symbols, i.e. the code, in your executable are still visible to a plugin. Assume your plugin defines a class with the same name and the same method name, i.e. the same symbol.
You create an instance of that class in your plugin and call some function from it.
I didn't expect that at runtime the plugin might call code from the executable instead of the class just built into the plugin (i.e. not being pulled in from a shared lib).
Again, making symbols hidden helps, in general.
Here's something related:

Today I ended up in a state where all the correct functions from the plugin were called, e.g. the correct ctor, but later on, when a virtual function of that object was called, it was the virtual function from the executable and not from the plugin. Weird. How could that happen?

I added a little project on GitHub for playing around with such stuff:

My conclusion so far is that in general you probably always want to build executables and static libraries with -fvisibility=hidden. I'm not sure why this is not the default...

Update: Different behaviour with different compilers

I played around more and added an example to reproduce really strange behaviour on github in the vtabletest subdir.

In that example, I have an executable which implements a class Base and a class Derived, which derives from it. Base has virtual and non-virtual functions; Derived reimplements both virtual functions.
This executable dlopens a plugin (a shared library) which happens to also implement the classes Base and Derived, both having exactly the same functions as in the executable.
Then the executable calls a function in the plugin, and that function allocates an instance of Derived and calls all its functions, virtual and non-virtual.
I tested this with g++, clang++ and the Intel icpc compiler (you can get a license from Intel if you qualify as non-commercial Open Source developer).

What do you think happens?
The Derived ctor from the plugin will be called, which will call the ctor of Base from the plugin, and calling the virtual functions will call the implementations in the plugin?

With symbols not hidden, that is not what happened with any of the three compilers. Instead, all three compilers produced different results.

With g++, basically nothing from the plugin was called: the Base ctor, the Derived ctor, and the virtual and non-virtual functions all executed the code (symbols) from the executable. This was the most consistent and least messed-up result.

With Intel icpc, it was more interesting.
When creating the objects in the plugin, the ctors from the plugin were called, and the non-virtual function calls also used the versions from the plugin. IMO that's good. Now the weird thing: when calling any of the virtual functions, the versions from the executable were called. So in the plugin I basically had an object where everything came from the plugin, except that calling its virtual functions called the wrong ones. IMO this was the result closest to what I would like to have, but due to the issue with the virtual functions it was completely broken.

clang offered yet another variant.
Here, only the ctor for Derived was called from the plugin; all the other functions (the ctor for Base, and the virtual and non-virtual functions) used the versions from the executable.

But there is an easy fix: hiding the visibility of the classes Base and Derived, either in the executable or in the plugin, makes everything work as expected with all three compilers.

I plan to have a closer look at the created executables and libraries, using nm and looking at the assembly code...

User Dictionaries – a Fundamental Design Flaw

Published 18 Jan 2018 by Andy Mabbett in Andy Mabbett, aka pigsonthewing.

I have just had to add several words to the user dictionary for the spell-checker in Notepad++, that I have already added to my user dictionary in LibreOffice, and to my user dictionary in (all under Windows 10 – does this happen with user dictionaries under Unix & Mac operating systems?).

Notepad++ spell-checker, not recognising the word 'Mabbett'

Under , a user should not have to accept a word’s spelling more than once.

User dictionaries should not be in a “walled garden” within an application. They should exist at operating-system level, or more specifically, at user-account level.

Or, until Microsoft (and other operating system vendors) implement this, applications — at least, open source applications like those listed above — should make their user dictionaries accessible to each other.

Some issues to consider: users with dictionaries in more than one language; security.

Prior art: I raised a Notepad++ ticket about this. It was (not unreasonably) closed, with a pointer to this DSpellCheck ticket on the same subject.

The post User Dictionaries – a Fundamental Design Flaw appeared first on Andy Mabbett, aka pigsonthewing.

Edit existing wiki page with libreoffice writer

Published 18 Jan 2018 by Rafiek in Newest questions tagged mediawiki - Ask Ubuntu.

I've read about sending a page to a MediaWiki wiki using LibreOffice Writer.
But is it possible to call up an existing wiki page, edit it, and send it back to the wiki?
If so, how is this done?

Kicking Off the New Year with New Droplet Plans

Published 16 Jan 2018 by Ben Schaechter in The DigitalOcean Blog.

Last week, we shared our 2018 roadmap preview and today, we’re excited to announce the first of many new updates for developers: new Droplet plans. We know that price-to-performance is an important consideration when choosing where to host your application, whether it be a small side project or a large business. That is why we’re upgrading resources across our plans and introducing new plans to give you even more flexibility to choose the right Droplet for your application.

We have updates to share across all three classes of Droplet plans: Standard, High CPU, and High Memory Droplets. These updates are available immediately through the Cloud Control Panel and API in the majority of our datacenters. Here are the full details of the updated plans:

Standard Droplets

Standard Droplet plans have always offered a healthy balance of CPU, memory, and SSD storage to get blogs, web applications, and databases off the ground. With today’s changes, we have 14 brand new Standard Plans to ensure that these applications can scale as your projects grow.

These updated plans are listed below, with some before and after looks at how resources have changed at various monthly price points:

These updated Standard Droplets offer more resources for either the same or a lower price than our previous generation.

We’ve also introduced three flexible plans, all priced at $15 with varying sets of CPU and memory combinations to give you resource flexibility without worrying about price.

Finally, all original “first generation” Standard Droplet Plans are still available via the API. This will ensure any applications you host that are hard-coded for those plans aren’t negatively impacted. We intend to fully deprecate those plans on July 1, 2018 and will send more updates throughout the year.

Optimized Droplets

High CPU Plans were released just six months ago, and we’re excited to make our first upgrades to these plans, which are great for CI/CD, batch processing, and other compute-intensive workloads. We’re also renaming High CPU Droplets “Optimized Droplets”. These Droplets are still powered by dedicated hyper-threads from best-in-class Broadwell and Skylake CPUs, but now come with additional memory and local SSD disk for the same price. In the future, we’ll be looking to boost not only CPU performance but also memory and disk performance. The updates are shown below:

Competitively, these plans line up well with similar offerings from other providers in the market. Below you can see that DigitalOcean’s Optimized Droplets are priced competitively from a price-to-performance perspective:

High Memory Plans are being deprecated as a result of the upgrades made to Standard Plans, which come with ample amounts of RAM and SSD storage at competitive price points. The API will support High Memory Droplets created until July 1, 2018, but we recommend transitioning over to the new Standard Droplet Plans before then. (If you have an active High Memory Droplet, it will simply continue to be charged at the same rate for the duration that it remains active.)

Coming Soon: Per-Second Billing

We’re working hard at making continuous improvements to our billing system in order to align with changes in customer Droplet usage behavior. We’re happy to share that starting later this year, Droplets will be billed by the second instead of by the hour. This means that you’ll only be charged for exactly the amount of time you use your instance to the second. We understand it is important for customers scaling instances up and down regularly to have the best rate available and we’re happy to get this update shipped for you. Keep an eye out for a future announcement specifically on billing improvements.

Looking Ahead

We understand that price-to-performance ratios are of utmost consideration when you’re choosing a hosting provider and we’re committed to being a price-to-performance leader in the market. As we continue to find ways to optimize our infrastructure we plan on passing those benefits on to you, our customers.

Additional Information & Helpful Links

Ben Schaechter
Senior Product Manager, Droplet

Use remote Tomcat/Solr for BlueSpice ExtendedSearch

Published 15 Jan 2018 by Dominic P in Newest questions tagged mediawiki - Webmasters Stack Exchange.

Is it possible to configure the BlueSpice ExtendedSearch extension to connect to a remote Apache Tomcat/Solr instance instead of installing all of that on the same machine that runs BlueSpice?

I looked through the install guide for ExtendedSearch, but I couldn't find any mention of this as an option.

Any ideas?


Published 15 Jan 2018 by timbaker in Tim Baker.

I have found myself recently in the unfamiliar and uncomfortable position of defending natural therapies. This is not a role I ever foresaw for myself. I understand the rigorous scientific process of developing and testing theories, assessing evidence and requiring proof. I studied science into...

Update 1.3.4 released

Published 13 Jan 2018 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

We proudly announce the next service release to update the stable version 1.3. It contains fixes to several bugs reported by our dear community members and makes Roundcube fully compatible with PHP 7.2.

See the full changelog in the release notes on the Github download page.

This release is considered stable and we recommend updating all production installations of Roundcube to this version. Download it from

Please do backup your data before updating!

New year, new tool - TeraCopy

Published 12 Jan 2018 by Jenny Mitcham in Digital Archiving at the University of York.

For various reasons I'm not going to start 2018 with an ambitious to-do list as I did in 2017... I've still got to do much of what I said I was going to do in 2017, and my desk needs another tidy!

In 2017 I struggled to make as much progress as I would have liked - that old problem of having too much to do and simply not enough hours in the day.

So it seems like a good idea to blog about a new tool I have just adopted this week to help me use the limited amount of time I've got more effectively!

The latest batch of material I've been given to ingest into the digital archive consists of 34 CD-ROMs, and I've realised that my current ingest procedures were not as efficient as they could be. Virus checking, copying files over from a single CD and then verifying the checksums is not very time consuming, but when you have to do this 34 times, you do start to wonder whether your processes could be improved!

In my previous ingest processes, copying files and then verifying checksums had been a two stage process. I would copy files over using Windows Explorer and then use FolderMatch to confirm (using checksums) that my copy was identical to the original.

But why use a two stage process when you can do it in one go?

The dialog that pops up when you copy
I'd seen TeraCopy last year whilst visiting The British Library (thanks Simon!) so decided to give it a go. It is a free file transfer utility with a focus on data integrity.

So, I've installed it on my PC. Now, whenever I try to copy anything in Windows, it pops up and asks me whether I want to use TeraCopy to make the copy.

One of the nice things about this is that it will also pop up when you accidentally drag and drop a directory into another directory in Windows Explorer (who hasn't done that at least once?), giving you the opportunity to cancel the operation.

When you copy with TeraCopy it doesn't just copy the files for you, but also creates checksums as it goes along and then at the end of the process verifies that the checksums are the same as they were originally. Nice! You need to tweak the settings a little to get this to work.

TeraCopy busy copying some files for me and creating checksums as it goes

When copying and verifying is complete it tells you how many files it has
verified and shows matching checksums for both copies - job done!

So, this has made the task of copying data from 34 CDs into the digital archive a little bit less painful and has made my digital ingest process a little bit more efficient.

...and that from my perspective is a pretty good start to 2018!

The Deep End Podcast Ep. 12: Empowering People & AI with Unbabel

Published 11 Jan 2018 by Hollie Haggans in The DigitalOcean Blog.

What interesting challenges does a platform that leverages both human translations and artificial intelligence face? For Unbabel, an AI-powered human translation platform, it’s paying equal amounts of attention to the humans driving the business and to the technology underlying the platform.

In this episode, Marcelo Lebre, Unbabel’s VP of Engineering, discusses how the startup is growing and adapting on both the technical side and the people side, and shares insights into how their stack supports their scaling service.

Subscribe to the The Deep End Podcast on iTunes, and listen to the latest episode on SoundCloud below:

Hollie Haggans heads up Global Partnerships for DigitalOcean’s Hatch program. She is passionate about startups and cold brew coffee. Get in touch with questions at

Block Storage Now Available in AMS3

Published 9 Jan 2018 by Priya Chakravarthi in The DigitalOcean Blog.

Today, we're excited to share that Block Storage is available to Droplets in the AMS3 datacenter. With Block Storage, you can scale your storage independently of your compute and have more control over how you grow your infrastructure, enabling you to build and scale larger applications more easily. Block Storage has been a key part of our overall focus on strengthening the foundation of our platform to increase performance and enable our customers to scale.

We've seen incredible engagement since our launch in July 2016. Users have created Block Storage volumes in SFO2, NYC1, NYC3, FRA1, SGP1, TOR1, BLR1, and LON1 to scale databases, take backups, store media, and much more. AMS3 is our ninth datacenter with Block Storage.

Thanks to everyone who has given us feedback and used Block Storage so far. Please keep it coming. You can create your first Block Storage volume in AMS3 today!

— DigitalOcean Storage Team

Looking forwards, looking backwards (2017/2018)

Published 2 Jan 2018 by inthemailbox in In the mailbox.

Yesterday, 1 January, I was reminded by the British Museum that the month is named after Janus, the Roman god who looks both backwards and forwards. The International Council on Archives (ICA) uses a stylised form of Janus for its logo, because archivists do the same thing. Archives are identified and appraised based on their ongoing value to the community and to the organisations and people that created them. Over time, they develop historicity, which leads to the common, but mistaken, belief that archives are about “old stuff”.

January 1 is also the time for looking back over the previous year, and making resolutions about the forthcoming year. Personally, I think the latter is foolish, because I subscribe to the aphorism that no plan survives contact with reality, and 2017 demonstrates that perfectly.

I started with grand plans for a new blog on the Pie Lunch Lot, my mother’s and her cronies’ answer to what we now call the FIFO lifestyle, without the benefit of modern social media. This would mean that I would take charge of my personal archives, and work within an historian’s framework. Yeah, right.

Blogs on this site were also few and far between. I left Curtin, and the luxury of reading and reviewing articles as part of my work there. Back at SRO, I’ve been involved with archival description and with developing our archival management system. This has had congruences with my private work, including a poster at the Association of Canadian Archivists conference in Ottawa (Disrupting description – Canada3) and developing a workshop on archival description for the ASA conference in Melbourne (of which more another time).

I also became the program chair for the ASA conference in Perth in 2018 – “Archives in a Blade Runner age”, which has led to the creation of another blogsite, this one on science fiction and archives. (Don’t forget to submit an abstract before February 28, and, yes, there will be no extensions.) And, I became a National Councillor for the ASA, which has its own steep learning curve.

Add in the usual chaos that is life, and there you have it. 2017 not as planned, 2018 already out of control 🙂

Looking forward to 2018

Published 24 Dec 2017 by Bron Gondwana in FastMail Blog.

This is the final post in the 2017 FastMail Advent Calendar. In the previous post we met Rik.

We’ve done it! 24 blog posts, one per day.

To begin this aspirational blog post, a goal. We plan to post more frequently overall next year. At least one post every month.

This should be fairly easy since we have a Pobox blog, a Topicbox blog and of course this FastMail blog.

One company

In 2018 we will continue the process of becoming a single company where everybody “owns” all our products, rather than two separate companies flying in close formation, each with their own products. Rik is driving the process of integrating development while Rob N ★ leads merging of operations.

We no longer wake somebody with automated notifications if there’s a person awake in a different timezone who can perform triage, leading to better sleep for everybody. We’re also distributing first-line support between the offices, and training support staff in all our products, for a closer working relationship between the people seeing emerging issues, and the people responding to them.

Our 4 products have their own branding, but internally we’re becoming a single team who love all our children equally (ssssh … I think we each still have our favourite)

Settling in to our new digs

FastMail Melbourne moved to a new office in the middle of the year, and the process was not entirely painless.

Special mention to Jamie who somehow didn’t go mad through all of this. What a welcome to the company – he’s just coming up to the completion of his first year with us, and when I asked him to take point on the office fit-out, I had no idea what I was asking him to do. I’m sure he had no idea either, or he wouldn’t have said yes!

Our office is a lovely space, just 50 metres from our old office, so we can still go to our favourite coffee places in the morning! We have a favourite place we normally go, but we can be fickle – if their coffee isn’t up to our snobby standards, Melbourne has plenty of nearby hipster options just waiting for our custom. This year we’ve mostly dragged people who used disposable paper cups into bringing reusable keep-cups instead. Reusable keep-cups are totally Melbourne.

The morning coffee run is our main regular social gathering, and a highlight of the day. Even non-coffee-drinkers join us for the walk and chat.

Improving our products

The world keeps changing, and no product can keep still and stay successful. But we’re not just reacting, we have plans for new features too!

Next year we will keep polishing Topicbox based on feedback from our early adopters. We also have some neat ideas for new features which will make it even more compelling for teams working together.

FastMail hasn’t seen many user-visible changes in the past year, largely because we’ve been focusing on getting JMAP finished and the interface ported over to use it. 3 Years since our first blog post about JMAP, we’re really close to a finished mail spec. 2018 will be the year of JMAP on the desktop, and then we can start adding new features that build upon the new protocol.

More and more people are accessing our products primarily on mobile devices. We have neglected our apps in 2017, and we will remedy that in 2018. Mobile experience is an explicit focus for us in the coming year, and we’ve engaged outside help to assist with our app development.

Continuing to support Open Source and the community

We fund large amounts of the development work going into the Cyrus IMAP server, as well as the many other open source projects we work on.

We have sponsored various conferences in the past year, and provide free email service to some groups that we feel are well aligned with our mission, like the Thunderbird project, one of the most well known open source email clients.

And of course we send people, and give our time to standards work and collaboration at IETF, M3AAGW and CalConnect.


This is always the most interesting thing to me when I follow discussions about issues that affect us and our customers. Privacy and security are key features for everybody, as are usability and speed. Ideally, as a customer, these things are invisible. You only notice speed when things get slow. You only notice usability when you’re struggling to achieve your goals. You only notice privacy and security when it turns out you didn’t have them.

Neil wrote a great post earlier in this advent series about our mindset around security. Security and usability are frequently in opposition – the most secure computer is one which is turned off and unplugged from the network. The problem is, it’s easy to believe that something is more secure just because it’s harder to use – that is rarely true.

For example if you pop up dialogs all the time to ask users to make security decisions, they will just click “Yes” without reading and actually be less secure than if asked rarely. Our preferred interaction is to perform the requested action immediately, but make undo simple, so the common case is painless. We also provide a restore from backup feature which allows recovery from most mistakes.

As we review our systems for GDPR compliance next year, we will have pragmatic and effective security in mind.

To 2018

The advent calendar is over, Christmas is almost here, and the New Year just a week away. 2018 will be an exciting year for us.

Thank you again for reading, and for your loyal support throughout 2017. We depend upon the existence of people who are willing to pay to be the customer rather than the product. We’re not the cheapest option for your email, but we firmly believe we are the best. We love what we do, and we love the direct relationship with our customers, payment for service provided.

FastMail the company is growing strongly. We have great people, great products, great customers, and funky new t-shirts.

Cheers! See you next year :)

Team Profile: Rik

Published 23 Dec 2017 by Helen Horstmann-Allen in FastMail Blog.

This is the twenty-third post in the 2017 FastMail Advent series. The previous post was about our response to the GDPR. We finish the series with a post looking forward to next year.

2017 has been a year of big changes for FastMail, team-wise. As Bron Gondwana stepped up to CEO, the role of CTO has been taken on by one of our new American colleagues, Ricardo Signes. We picked him up in 2015 when we acquired Pobox and Listbox, and he’s spent the bulk of his time since then building our newest team email product, Topicbox. Get to know Rik!

Photo of Ricardo Signes

What I work on

Historically, I have been the primary programmer on Pobox and Listbox, and I did a lot of work in the last few years building the framework of Topicbox. But nowadays, I spend most of my time coordinating the work of the development and operations teams, figuring out who’s doing what and whose work might be blocking whom, so that people aren’t sitting frustrated at their desks.

As CTO, I balance the technology requirements across different groups. Generally we don’t have people who want contradictory things, but sorting out work often requires determining invisible pre-requisites and doing that work first. It requires figuring out the way to get from here to there… And preferably after I’ve already figured out what people are likely to want next.

Figuring out what people will want next is often a natural by-product of talking to people about what they want. As we take things from the abstract to the concrete, I try to stay focused on the goals (and really understanding them!) rather than the suggested technical implementation they’ve requested. Time is often a consideration; a lot of times, just keeping in mind the next logical iteration of the solution you can get today is all the plan for the future you need.

How did you get involved with FastMail?

They bought me? I got involved with Pobox in 2005 when Dieter Pearcey heard me saying I was looking for somewhere else to hang my hat. He and I had debugged some problems earlier that year on IRC, so when he told me to apply, I did. About 8 years later, I met Bron at OSCON. We were having a beer when super-connector Paul Fenwick realized we worked at Pobox and FastMail, respectively, and asked if we were going to brawl. We did not; we ended up discussing the common problems and solutions of our load balancers and user migrator implementations. About a year after that, we started the long process of acquisition. A year after that, it happened. 16 months after that, I was the CTO.

I took a photo at the time, recording our meeting for posterity.

Bron and RJBS

What’s your favourite thing you’ve worked on this year?

In technical terms, it’s Topicbox. When building Topicbox, we treated it like a greenfield project. We didn’t reuse many of our standard components, but the technical decisions I made were based on years of using our tools and thinking about how I would do it if I did it from scratch. As many of those plans were borne out in successful technical solutions, it was really rewarding — a pleasure to build and see used.

But, more than that, I have loved organizing the technical team. It’s a really talented group of people, with many unique areas of expertise. Orchestrating all of them requires having a handle on what everyone is doing. Doing it successfully also requires I have at least a basic understanding of what everyone is working on. It is either an excuse or demand for me to be learning a little more all the time, which is great! It forces me to get off the same old path.

What’s your preferred mobile platform?

I use iOS. I don’t have really strong feelings about it. It has a bunch of things I prefer, but … I’ll use anything that has Spotify.

What other companies, people or projects inspire you?

The Rust language project is really inspirational. Like most technical projects, they’ve always striven to have a technically excellent product, but they also decided early on that they were unwilling to compromise on their community to get it. So their community is not the toxic, or even merely odious, kind that you can get in other projects with a lesser commitment to community.

Julia Evans, who is an endlessly prolific source of interesting and instructive material, and who is always positive in attitude, is the kind of technical role model I aspire to be. She says my favorite thing, which is that computers are not magical; you can start from first principles and figure out what is going on, always.

Companies are impressive for lots of reasons, but I’m pleased when companies doing interesting work make it clear what their values are, especially when you can see it’s true. They make it clear they have a direction beyond just making money. They promote that the direction they’ve chosen has value to them. They make it easy to guess what it would be like to work there, and what kind of work and behavior would be rewarded. Netflix and Stripe are two examples that come to mind; I hope I do my part to expose a similar ethos here at FastMail.

What’s your favourite FastMail feature?

I like FastMail’s push support, because it makes clear that FastMail really is fast. It looks like a super simple feature, but the technical details are way more complicated than they should be. It’s technically interesting and you can always get Rob N to tell a good story about it!

My favorite Pobox feature is the RSS feed of spam reports, which lets you review the small amount of mail Pobox isn’t quite sure about. I like it both because RSS is something that I wish had gotten wider adoption, and because I like having it in a place separate from my email or the web (which are the two other places you can review it).

My favorite Topicbox feature is organization-wide search! Topicbox makes it easy for team members to create new groups, which is awesome for making sure all the right people are included in a discussion. But as soon as you start having enough information that you can’t see in one screen, you want to search for it. The Topicbox search technology is based on FastMail’s, so it’s fast, thorough, and easy to refine. You find the full thread… and the conclusion. Organization-wide search is, to me, the best reason to move your organization’s email discussions to Topicbox. (And, yes, we can help you import from an archive or even a personal mailbox!)

What’s your favourite or most used piece of technology?

My bicycle! It embodies everything I think of as technology. It lets you solve a problem that you could probably solve without it, but much more efficiently. It also rewards curiosity. You don’t need to know how it works to use it. But it’s really easy to learn how to take it apart, fix it, and make it better. Also, like most of the technology I like, I don't use it as often as I'd like.

This isn't my bike. It's a photo I took while on a trip to the Australian office. It's a sculpture of three bicycles occupying the same space!

Three bicycles

What are you listening to / watching these days?

I’m finally catching up on Song-by-Song podcasts, which discusses every Tom Waits song, one per episode. But that means I’m listening to a lot of Tom Waits again too. It’s good to listen to full albums!

We talk a lot about music at FastMail, and we’ve gotten almost everyone on Spotify. We have a bot who tracks people’s Discover Weekly playlists, looking for duplicates, and determining who has compatible (and diverse!) musical tastes. I’ve found a bunch of good music that I wouldn’t have heard before because staffers have been listening. I also know who has consistent enough taste that I know I can always hit up their weekly playlist for, say, synth pop and 80s hits (Rob Mueller!).

What do you like to do outside of work?

I do coding projects outside of work, too, though less this year than in years past. I managed the perl5 project for many years, but now I'm just an active member of the peanut gallery.

I watch a lot of movies. I talk a lot about the horror movies I watch because they are the funniest to discuss, but I actually watch plenty of movies across all genres.

I run a D&D game, and I’ve been playing a lot of Mario Kart on my Nintendo Switch.

What's your favourite animal?

I used to have pet guinea pigs, so they’re up there! They’re my favorite animal that I would actually consider palling around with. But I’m also a fan of any animal that is really bizarre in some way. It reminds you that evolution goes in a lot of crazy ways.

Any FM staffers you want to brag on?

Everybody’s great! If I was going to call somebody out in particular, though, it would be Bron. We had reached an inflection point in terms of scale, where we needed to rethink the way we organized our work. Bron stepped up to make that happen, and we’re all better off for it.

What are you proudest of in your work?

In my technical work, over many years, I’m proudest we’ve been able to use a slowly evolving set of patterns without finding out they were fundamentally bankrupt. With Topicbox, we were able to test that theory in the biggest way — we started from scratch using those patterns as first principles, and it worked. So that was really rewarding.

On a larger scale than that, it’s always a blast to meet people in a professional setting who have heard of FastMail or Pobox. They will be excited to talk about what they know of the company, and often tell me they think it would be an exciting and great place to work. In large part, that's because of the people and culture we have, and I’m proud to have been part of making that the case!

GDPR: European Data Protection

Published 22 Dec 2017 by Bron Gondwana in FastMail Blog.

This is the twenty-second post in the 2017 FastMail Advent Calendar. The previous post was about our new monitoring infrastructure. In the next post we meet Rik, our intrepid CTO.

Some of you may already be aware of the upcoming GDPR legislation. We’ve certainly been getting support tickets and the occasional tweet asking about our plans.

General Data Protection Regulation

The GDPR is a European regulation which affects the processing and collection of data about all European residents, no matter where they are in the world, as well as data about any human physically present in Europe, regardless of residency.

In short – the GDPR affects almost everybody in the world, since Europeans are prone to traveling. It definitely affects FastMail: we sell our services worldwide and have many customers in the EU.

The big scary part of the GDPR is the fines for non-compliance – 4% of global revenue or €20,000,000 per offense, whichever is greater. They’re not playing around.

FastMail’s products have features that make us both a data controller and a data processor under the definitions of the GDPR.

The GDPR takes force on 25 May 2018, and FastMail intends to be ready.

Australian advice

Australia already has very strong privacy laws, which we take seriously. The Office of the Australian Information Commissioner gave guidance about GDPR for Australian businesses earlier this year, which details similarities and differences between the two laws.

The good news is that we can be GDPR-compliant without a conflict of law. Sadly this isn’t always the case in international law – there exist cases where a person has no option that does not result in committing a crime somewhere in the world.

In this case, it looks like Australia will be following Europe’s lead, with new laws like the Notifiable Data Breaches scheme coming into effect next year.

Interesting questions

While most parts of the GDPR are good and we implement them already, the European right to be forgotten raises interesting questions about who owns information about a person. Fairly clearly for our FastMail product, the private mailbox of a person is their own personal electronic memory, and an email you sent somebody doesn’t count as personal data that we, FastMail the company, hold about you. You shouldn’t be able to take that email back, certainly not just by asking us to do it.

On the other hand, Topicbox groups can be public. Clearly public group archives could be abused to host spam, phishing, or other nasties. The exact same issue already exists for files published as user websites.

Published information might need to be taken down - due to terms of service violation, DMCA request, GDPR-covered privacy request, or any other legal method. The tension between maintaining an accurate immutable record and allowing permanent removal of material that should never have been published is very real.

Finally, backups contain data for a time after it’s been deleted. Shredding every copy of something is actually really tricky, and guaranteeing that every possible copy has been removed is a tradeoff as well. I have personally dealt with an account for somebody who had obtained power of attorney for his father, who was no longer able to remember very well. The father’s email account at FastMail had been unused and unpaid for long enough that it had expired and the backups had been removed. It was very hard to tell this person that they had lost important family data – for somebody who had been a loyal FastMail customer for 10 years, no less.

Shredding backups is not always the right choice. We now keep backups longer for lapsed accounts (those where the user never explicitly asked us to delete anything) than for accounts where the user chooses to close the account. And yet … I've still had ex-users ask if I can dig up an old backup because they forgot to copy some important message before closing their account!

Supporting FastMail’s values

We blogged last year about our values. The GDPR’s requirements about privacy and consent to store and use data are very compatible with our values: “Your data belongs to you” and “We are good stewards of your data”.

We’re working on our support processes to make consent more explicit if we access your account to help you with an issue. As we audit our processes for GDPR next year, we will continue to focus on practical and usable methods to maintain our customers’ privacy.

Monitoring FastMail with Prometheus

Published 21 Dec 2017 by Rob N ★ in FastMail Blog.

This is the twenty-first post in the 2017 FastMail Advent Calendar. The previous post was a design spotlight on the Topicbox logo. The next post is about our response to the GDPR.

Monitoring is a critical part of any complex system. If you’re not monitoring your system properly, you can’t be sure that it’s working the way it’s supposed to, and it’s hard to figure out what’s going on when there is a problem.

For most of FastMail's life, it's been monitored by a set of scripts that run regularly on each machine, look for anything that seems wrong and, if a problem is found, report it. Most of the time that's by sending an email for engineers to look into later, but in extreme cases they would send an SMS and demand immediate attention.

These tests are fairly simple. The example I usually use is about disk space. Every time the monitor scripts run, they check the amount of space free on each mail disk. If a disk is more than 92% full, they emit a warning; more than 95%, they wake someone.
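A check like this is only a few lines of code. Here's a minimal sketch in Python (not our actual monitoring script; the thresholds are the ones described above, and the mount point is illustrative):

```python
import shutil

# Thresholds from the check described above: warn above 92% full, page above 95%.
def classify_disk(total_bytes, free_bytes, warn_pct=92, page_pct=95):
    """Return 'ok', 'warn' (email for later) or 'page' (wake someone up)."""
    pct_full = 100 * (total_bytes - free_bytes) / total_bytes
    if pct_full > page_pct:
        return "page"
    if pct_full > warn_pct:
        return "warn"
    return "ok"

# Applied to a real mount point (the path is illustrative):
usage = shutil.disk_usage("/")
status = classify_disk(usage.total, usage.free)
```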

The problem with this simple test is that it doesn't really say much about why the disk is filling or what the correct remedy is.

All the possible causes are important, but not all demand immediate attention, and each has a different method to resolve the problem. But we've just woken someone and told them "the disk is nearly full", and it's up to them to figure out what's going on.

As one of the lucky people that would get woken up by this, it wasn't long before I started to get a sense of what information we didn't have that could really help here.

Questions like these are hard to answer with a simple per-host "is anything wrong right now?" check. Importing and delivery happen on different hosts from where the mail is stored. Tracking rates of change means you need to know previous values, which means some storage for the check scripts. To improve things, we needed something new.

A central metrics store

During 2015 we started looking seriously at a product or service that could do the job for us. There were lots of possibilities, each with pros and cons. Things we particularly wanted:

We did a few experiments, and eventually settled on Prometheus, which ticks pretty much all of these boxes.

Prometheus is what's known as a time-series database, which basically means that it stores the value of different things at a given point in time. Each thing it stores is called a "metric", and has a numeric value.

Using our disk usage example again, we might store two values: the total size of a disk, and the amount of free space. So our metrics (in the Prometheus data format) might look like:

disk_size   1968480002048
disk_free   1236837498880

Now we have these raw values stored, we can use Prometheus' query language to understand more about these values. For example, we could use a simple query to get a "percent full" value:

100 - 100 * disk_free / disk_size     37.16789108382110

Because Prometheus is constantly checking and storing these values as they change, we can also do more advanced things based on the history. For example, we can use the deriv() function to find out how much the space on this disk changes each second (based on the last 15 minutes worth of values):

deriv(disk_free[15m])   -3205234.3553299494

We can also use a separate product, Grafana, to graph these values. This is a very boring one of this disk value:

Grafana disk usage graph

There's loads more things we can ask Prometheus about our metrics, and Grafana can graph pretty much everything we can query. Prometheus also comes with an alerting component, which can send emails, SMS messages and other stuff based on the results of queries. It's a fantastic piece of software and we're very happy with it.
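As a sketch of how the disk example above could become an alert, here's roughly what a rule looks like in the rule-file format of current Prometheus releases (the metric names are the illustrative ones from this post; the alert name, labels and timing are invented):

```yaml
groups:
  - name: disk
    rules:
      - alert: MailDiskNearlyFull
        # Same "percent full" expression as above, paging at 95%
        expr: 100 - 100 * disk_free / disk_size > 95
        # Only fire if the condition holds for 10 minutes
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "Mail disk on {{ $labels.instance }} is over 95% full"
```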

Gathering metrics

It's all very well having a place to store all these metrics, but we still have to get them out of our systems. Prometheus' model is very simple: every few seconds, it connects to very simple web servers all over your network and requests a list of all the metrics they have to give.

The idea is that all of your internal services will have an HTTP port and be able to provide metrics about what they're doing, but that's not always possible. Some software is too simple to carry its own metrics, or does keep its own metrics but presents them in some different way (often logfiles).

So, the Prometheus project and wider community have produced a huge number of "exporters", which interrogate applications or the OS to collect information about them, and then present those to Prometheus. We use a number of off-the-shelf exporters for off-the-shelf software we use (eg node_exporter for the OS, mysql_exporter for MySQL, etc), and we've written our own for a few things where an off-the-shelf exporter didn't exist or didn't do what we wanted (PowerDNS, Postfix, tinydns, etc).

The most common style of exporter we have at the moment is one that monitors an application logfile, extracts information about events that occurred and presents them to Prometheus as counts. Many of our existing monitoring scripts already did that, and most of our own internal applications log lots about what they're doing, so it's been a very good transition step.
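The shape of this style of exporter is simple. Here's a minimal, stdlib-only Python sketch of the idea (the log pattern and metric name are invented for illustration; real exporters would usually use the official prometheus_client library):

```python
import re
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical log pattern and metric name, purely for illustration.
DELIVERY_RE = re.compile(r"delivered message")
counts = Counter()

def watch_log(lines):
    """Count interesting events from an iterable of log lines."""
    for line in lines:
        if DELIVERY_RE.search(line):
            counts["mail_deliveries_total"] += 1

def render_metrics():
    """Render the counters in the Prometheus text exposition format."""
    return "".join(f"{name} {value}\n" for name, value in sorted(counts.items()))

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_metrics().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("", 9100), MetricsHandler).serve_forever()
# Prometheus would then be configured to scrape port 9100 every few seconds.
```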

One of the most interesting applications for us, of course, is the Cyrus mail server, the place where your mail is stored, searched and accessed. When we started with Prometheus there was no good way to see inside and find out what it was doing. It does log lots about what it's doing, and we do watch those logs for problems, but there are so many things it does internally that we don't have a lot of visibility on. One of the things about gathering metrics is that you want information available when you have new questions to ask, new things you want to know. So we wanted to bring out as much information from Cyrus as we could, much more than what was currently available in the logs.

Adding Prometheus metrics to Cyrus

So I sat down with Ellie, my local Cyrus developer, and had a chat about it.

It turns out that Cyrus once had SNMP support. SNMP is an older protocol for remotely monitoring and managing things. It's still in wide use with network equipment. Cyrus still had support for it, but large parts of it didn't appear to work. Ellie had been keen to understand it for a while and either fix it or remove it, and this seemed like a great opportunity.

We spent an afternoon pooling our knowledge, me with my basic understanding of how Prometheus worked and what Cyrus is like to operate, her with her knowledge of Cyrus internals and what kind of monitoring other users were asking for, and worked out a plan.

Cyrus already has a HTTP server (for CalDAV, CardDAV, etc) so getting it to serve metrics was not complicated. The toughest part is actually around Cyrus' architecture. The Cyrus model is a single process that accepts incoming connections, which it then hands off to another process for service. There's no central coordinating component for these processes; they just do their thing and use filesystem locks to make sure they don't get in each others' way when accessing mail data. That's part of how it gets its performance, so that’s great. The downside is that all these worker processes need to record metric events, so something somewhere needs to collect all the stats coming back from each process and combine it.

To do that, Ellie modified each Cyrus process to write its own stats out to a file, and then created a new Cyrus component called promstatsd which collects all of these, combines them and prepares a metrics data file ready for Prometheus to come and fetch.
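The aggregation idea can be sketched in a few lines. This is only an illustration of the pattern: the real promstatsd is part of Cyrus, is written in C, and does not use JSON or these file names:

```python
import json
from collections import Counter
from pathlib import Path

def write_proc_stats(path, stats):
    """Each worker process periodically dumps its own counters to a file."""
    Path(path).write_text(json.dumps(stats))

def aggregate(stats_dir):
    """Combine every per-process stats file into one set of totals."""
    totals = Counter()
    for stats_file in Path(stats_dir).glob("*.stats"):
        totals.update(json.loads(stats_file.read_text()))
    return totals

def render(totals):
    """Produce the metrics file for Prometheus to fetch."""
    return "".join(f"{name} {value}\n" for name, value in sorted(totals.items()))
```

For example, if two imapd processes each report their own connection count, the aggregator sums them into a single metric for the whole host.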

As for the metrics themselves, we chose to start with a few that already had internal counters inside Cyrus: active connection counts, message delivery counts and IMAP command counts. These aren't things we currently pay a lot of attention to, but they might be useful in the future (as noted above, we want to collect everything we can, just in case), and Cyrus already knows them, which made it easier to focus on adding the metrics support. They're also easy to test, since they are incremented when simple actions are taken: login, send a message, etc.

Cyrus is written in C, which is not widely used for server development anymore, so there were no good Prometheus metrics libraries available to use. Ellie wrote a very nice little metrics implementation for Cyrus that is designed to perform well and make it easy to add new metrics without adding overhead.

Put it all together, and we can now use Prometheus and Grafana to see what's happening inside Cyrus. For example, here's a graph of the number of active IMAP connections to a single Cyrus shard:

Grafana IMAP connection graph

Now that we have the basic framework in place, we have a lot more metrics we want to add, particularly for the httpd server that will be used for JMAP, and improvements to the way the metrics are expressed to Prometheus to give us more querying options (eg, only count CalDAV requests, excluding CardDAV requests).

Adding support to Cyrus has been pretty exciting, partly because of the general excitement of being able to better understand what our servers are doing, but also because it's really interesting to keep updating this ancient project to support new ways of working.

What's next?

We won't stop! We still have a long way to go to get all of our monitoring and alerting moved into Prometheus, but every week we move one or two little things over, or add one or two little new things, often finding minor problems or quirks that we didn't know about but can now see, and thus fix. Every time we do this we smile, because we learn a little more about how our systems really work and feel a little more confident that they're doing what they're supposed to. And with everything we know about our systems all available in one central place that everyone can see and work with, we’ll be able to sleep easy at night knowing that everything is running smoothly and any problems will be quickly noticed.

Design spotlight: how the Topicbox logo came to be

Published 20 Dec 2017 by Jamie Toyama in FastMail Blog.

This is the twentieth post in the 2017 FastMail Advent series. The previous post showed how you can easily set up your iPhone with your camera. The next post is about our new monitoring infrastructure.

Many of you would have seen our new Topicbox logo, but did you ever wonder how it came to be?

Before I permanently joined the FastMail Team, I had already been working with David and Neil to help out with some design work. This started with a FastMail website refresh and led to some preliminary work for an exciting new group email product, which eventually became Topicbox.

As we were creating a product and a brand identity from scratch it gave me the freedom to really help shape the design elements and in this post I’m going to concentrate on the Topicbox logo.

Logo design – or really, any general design process – will follow a number of collaborative steps to ensure the brief can be successfully achieved:

1. Kick off meeting

A logo should be distinctive, memorable and an expression of the company it represents. Every good design project starts with a kick-off meeting.

I wanted to learn as much as possible from the project team (initially Helen, Neil and David) – things like company values and culture, potential customers and competitors. By gathering up as much information as I could it would really help me to distil all of these elements into the brand identity.

recognisable brand logos from large companies

2. Discovery

During this process I took the time to further research the potential customers and competitors in the market place.

Having a better understanding of the customer – or our ideal customer – helped me to understand the types of people this logo had to engage with.

Looking at the competition was important: it allowed me to get a good understanding of what the other brands were doing, so I could then make sure my design was going to be different.

I then looked at the logo application – how and where will the logo be used? Is it going to appear in both print and digital?

Understanding any restrictions on the design of a logo up front means you have already solved potential problems before the logo is used across its various touchpoints.

For example, a mobile app icon is confined to a fairly small area, compared to a business card, which has fewer restrictions on things like size and operating systems.

breaking down the elements of the logo

3. Brain dump

This is where some of the initial design starts to take shape. I took all of the knowledge I had gathered up to this stage and started putting my thoughts and ideas together. From there I took these ideas – which could be a sketch or even a list of words – and started to develop them in Illustrator (design software).

I find that I can quickly get my initial thoughts down on paper, but then I start to break the core idea down using Illustrator. By taking the concepts of ‘group communication’, ‘email’, and ‘archive’ as key inspiration I got my draft designs together and presented these to Helen, David and Neil.

It’s worth noting that at this point I specifically presented these in black and white only. This is so we can focus on the logo design and not be emotionally swayed by colour.

At this stage all feedback is welcome. The feedback is important in allowing me to get a good sense of what’s working and where I need to spend more time developing a concept.

first round of topicbox logo samples

4. Refining concepts

This was probably the most time-consuming and longest part of the process. It involved me going back and forth with the team and refining the logos down to four key concepts.

During this stage I started to introduce colour and how this influenced the design of the logo. I also looked at refining the details within the logo – the curves, thickness of lines and the typography.

The process of refinement can be a challenging one! I remember thinking after a meeting with FastMail, how much further can I push this concept? However, only now when I look back on this do I understand that it was this constant need to keep pushing an idea that allowed me to get to the final execution of my design. It pays to never settle for ‘just good’.

second round of topicbox logo ideas in colour

5. Finally … a logo appears!

The Topicbox logo has been designed to express a friendly, inclusive and approachable design through the unique graphic style, typography and colours. There are a couple of key features worth noting in the final design of the Topicbox logo:

logo components: reading news + community groups + email

The final Topicbox logo

A black tshirt with Topicbox logo

Topicbox logo on a phone

Faster, easier setup on iPhones, iPads and Macs

Published 19 Dec 2017 by Neil Jenkins in FastMail Blog.

This is the nineteenth post in the 2017 FastMail Advent series. The previous post was about our usage of the modern web platform. The next post is a design spotlight on the Topicbox logo.

On Friday we launched a new feature to make it easier for our many users of Apple devices to set up their accounts with the built-in Mail, Contacts and Calendar apps.

While some people prefer the power and speed of our own app and webmail, others prefer the native integrations and offline support of the built-in apps on their iPhones, iPads and Macs. With our full support for push email, calendars and contacts, and custom tweaks for fast search, we aim to make FastMail the best service to use regardless of how you choose to read your mail.

Unfortunately, setting up your account can be tricky and error-prone. Proper support for autodiscovery of server settings is still sadly missing in most apps, including Apple’s (despite the fact they literally wrote the spec…), and you have to set up mail, contacts and calendar syncing separately.

However, now there is a better way thanks to a feature Apple provides called configuration profiles. A profile is a file which bundles all the settings needed for your account, and can easily be installed or removed from your phone or Mac.

Setting up your iPhone is now as easy as:

  1. Open the Settings → Password & Security screen on a PC or Mac and unlock the screen with your password (no awkward password entry on a mobile keyboard required!).
  2. Create a new app password. If you only want to set up mail and not calendars/contacts, you can choose that here.
  3. Open the Camera app on your iPhone and aim it at the QR code (2D barcode) on screen, then open the link when prompted.
  4. Tap Allow then Install to install the profile.

That’s it! Your mail, calendars and contacts are now all configured and ready to go. You’ll be up and running in no time, with nothing to type on your phone and no chance of messing up the settings. On a Mac the steps are very similar, but after creating the app password you’ll get a link to a file you just download and open to finish the setup. There’s further information in our documentation for iPhone/iPad and Mac.
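Under the hood, a configuration profile is just an XML property list. Here's a heavily trimmed sketch of what a mail payload looks like (the hostnames, identifiers and UUIDs below are placeholders, not FastMail's real values):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>PayloadType</key>       <string>Configuration</string>
  <key>PayloadIdentifier</key> <string>com.example.profile</string>
  <key>PayloadUUID</key>       <string>00000000-0000-0000-0000-000000000000</string>
  <key>PayloadVersion</key>    <integer>1</integer>
  <key>PayloadContent</key>
  <array>
    <dict>
      <key>PayloadType</key>       <string>com.apple.mail.managed</string>
      <key>PayloadIdentifier</key> <string>com.example.profile.mail</string>
      <key>PayloadUUID</key>       <string>00000000-0000-0000-0000-000000000001</string>
      <key>PayloadVersion</key>    <integer>1</integer>
      <key>EmailAccountType</key>  <string>EmailTypeIMAP</string>
      <key>IncomingMailServerHostName</key> <string>imap.example.com</string>
      <key>IncomingMailServerUsername</key> <string>user@example.com</string>
      <key>OutgoingMailServerHostName</key> <string>smtp.example.com</string>
    </dict>
  </array>
</dict>
</plist>
```

We generate a profile like this on the fly, pre-filled with your account details and app password, which is why there's nothing left to type on the device.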

We’re always looking for ways to make your email as hassle-free as possible, and we hope this saves many customers time and frustration in the future.

Using the modern web platform

Published 18 Dec 2017 by Chris Morgan in FastMail Blog.

This is the eighteenth post in the 2017 FastMail Advent Calendar. The previous post was a staff profile on Kourtnee, support agent. Next, we look at how we've made it easier to set up your iPhone, iPad or Mac.

The web platform is a steadily-evolving beast, and anyone who works on the web must pay attention to the state of things, or else be left behind. In the past couple of years, the web platform has been surging forwards with lots of nice new toys—some that make it easier for us to do our work, and some that make new features possible (such as offline support).

This post takes a look at some of the things we’ve been doing with it all this year, and some of what is to come.

The FastMail web interface is a complex app with a legacy going back quite a few years; Topicbox, on the other hand, is a smaller and newer code base, so we’ve been using it as a staging ground for various improvements throughout this past year—and it’s been going well. In the coming year we’ll be continuing to work on these improvements, and starting to apply the best parts of them to FastMail, as well as working on a few new projects.

Using new JavaScript features

After being dormant for quite a while, JavaScript language development has become active these last few years, with some syntactic/language features like ES6 classes and modules, and some library features like promises.

We’ve been progressively updating Overture, our library which is the foundation for the FastMail and Topicbox web interfaces, to use a lot of these new techniques, making our lives a bit easier. Various experimental work is still ongoing. The end result of all this is code that is easier to work with, so that we can implement new features more quickly and avoid making various common mistakes. Tooling that helps us maintain higher code quality is another win.

We’re also steadily updating the Topicbox code base to use some more modern conventions, and FastMail will follow later next year.

Service workers and offline support

The FastMail web UI was always well engineered to be able to cope with an unreliable internet connection, but it can’t yet start without an internet connection at all—offline support has only started making it into browsers in the past couple of years. I’ve really been looking forward to adding offline support to FastMail.

So, we plan to use service workers to implement offline support in Topicbox and then FastMail; already most browsers support service workers—Firefox and Chrome have it, Edge has an experimental implementation and Safari is working on it.

One less obvious benefit of offline support is much faster app startup time: rather than needing to contact the server a few times to get the code, account details, list of mailboxes, list of emails and so forth, your browser will already have all of these things stored locally (presuming you already logged into your email).

Browser compatibility

So far I’ve been talking about new browsers and new features; what about old browsers?

You’re safe: we actively monitor which browsers and tools people are using to access our services and take care to avoid breaking things for our customers. (As an example, in our recent calendar changes we diligently tested almost all the calendar clients our customers use.)

Over the coming year we will cease to support some very old browsers; we still support IE 8+ now, but in practice we have almost no one using IE 10 or older, and it holds back development in a few places (we want to be able to use SVG, for example), so we will be dropping support for those browsers. IE 11 will still be supported for the indefinite future (roughly 5% of our desktop customers use it).

For other old browsers that aren’t Internet Explorer: we use various techniques to make sure that things keep working without special effort, despite the new features we use; in Topicbox, for example, we use Bublé to translate our new JavaScript syntax into the lowest common denominator (that takes care of language features), and core-js to polyfill all the library features that we depend on.

(Fun fact: earlier this year we replaced our small collection of manually-selected polyfills with core-js; this actually restored support for IE 10, which Topicbox hadn’t formerly supported! But due to some flexbox, image and font rendering issues we decided not to actually support IE 10. But see the principle: polyfills do the heavy lifting of basic old-browser compatibility.)

We only load core-js if it’s necessary, so that up-to-date browsers aren’t penalised. Tech people may be interested in how we determine whether it’s needed or not, because I think we’ve come up with the most efficient check around:

if ( !Object.values ) {
    // Fetch and load core-js before continuing
}

Object.values is pretty much the most recently added feature we depend on, both in core-js and in browsers, so if the browser supports Object.values, it supports everything we need. (We checked this quite carefully.) This makes the overhead practically nil for modern browsers, while retaining support for older browsers at a very low cost. The browsers that need core-js under this test are: IE, Safari ≤ 10.0, iOS ≤ 10.2, Chrome ≤ 53, Firefox ≤ 46 and Edge ≤ 13. No current browser releases (excluding IE 11) need core-js.

Using new CSS features

As part of this goal of migrating things to the standard web platform, we’re also replacing Less with PostCSS, assisted by plugins like cssnext, so that what we’re writing is as close to normal CSS as possible, and we can slowly remove more and more of the magic build process over time, leaving just the code.

The most important new CSS feature that we’re using is custom properties, which allow you to write code like this:

:root {
    --color-brand-teal: #009688;
    --button-background: var(--color-brand-teal);
}

.v-Button {
    background-color: var(--button-background);
}

… and then, putting the variable declarations in a separate theme file, have the button style be equivalent to:

.v-Button {
    background-color: #009688;
}

FastMail’s existing theme system is heavy: it compiles an entirely new version of all the styles for each module (base, mail, calendar, &c.) with the theme’s variable declarations, and then reloads all its stylesheets when you change theme.

For now, we’re still compiling the variables out with cssnext in Topicbox, because IE doesn’t support them; but we think it’s feasible to leave the variables in, using native support in modern browsers, and performing the transformation in the browser via postcss-cssnext for old browsers. That way we achieve our goal of maintaining compatibility for old browsers, while still using fancy new features.

A few of our customers like to customise their themes, and this will help them, too; if you don’t like a particular shade in our dark theme, for example, you can override that specific colour in one easy and clearly-labelled place, and it’ll be updated in all the places that use that colour.
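With custom properties, such an override is a one-liner. For example (using the illustrative variable name from above; our real theme variables may be named differently):

```css
/* Swap the brand teal for a darker shade everywhere it's used */
:root {
    --color-brand-teal: #00796b;
}
```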

Looking to the future of the web

The future is bright! The web as a platform continues to improve, and FastMail and Topicbox continue to improve along with it. In this coming year we plan to deliver some major feature improvements, and we hope you’re looking forward to it as much as we are!

Review of Stepping Off from Limina.

Published 17 Dec 2017 by Tom Wilson in thomas m wilson.

This book review is a good summary of Stepping Off – thanks to Amy Budrikis. Read the review at Limina.


How would you change Archivematica's Format Policy Registry?

Published 15 Dec 2017 by Jenny Mitcham in Digital Archiving at the University of York.

A train trip through snowy Shropshire to get to Aberystwyth
This week the UK Archivematica user group fought through the snow and braved the winds and driving rain to meet at the National Library of Wales in Aberystwyth.

This was the first time the group had visited Wales and we celebrated with a night out at a lovely restaurant on the evening before our meeting. Our visit also coincided with the National Library cafe’s Christmas menu so we were treated to a generous Christmas lunch (and crackers) at lunch time. Thanks NLW!

As usual the meeting covered an interesting range of projects and perspectives from Archivematica users in the UK and beyond. As usual there was too much to talk about and not nearly enough time. Fortunately this took my mind off the fact I had damp feet for most of the day.

This post focuses on a discussion we had about Archivematica's Format Policy Registry or FPR. The FPR in Archivematica is a fairly complex beast, but a crucial tool for the 'Preservation Planning' step in digital archiving. It is essentially a database which allows users to define policies for handling different file formats (including the actions, tools and settings to apply to specific file types for the purposes of preservation or access). The FPR comes ready populated with a set of rules based on agreed best practice in the sector, but institutions are free to change these and add new tools and rules to meet their own requirements.

Jake Henry from the National Library of Wales kicked off the discussion by telling us about some work they had done to make the thumbnail generation for pdf files more useful. Instead of supplying a generic thumbnail image for all pdfs they wanted the thumbnail to actually represent the file in question. They made changes to the FPR to change the pdf thumbnail generation to use GhostScript.

NLW liked the fact that Archivematica converted pdf files to pdf/a but also wanted that same normalisation pathway to apply to existing pdf/a files. Just because a pdf/a file is already in a preservation file format it doesn’t mean it is a valid file. By also putting pdf/a files through a normalisation step they had more confidence that they were creating and preserving pdf/a files with some consistency.

Sea view from our meeting room!
Some institutions had not had any time to look in any detail at the default FPR rules. It was mentioned that there was trust in how the rules had been set up by Artefactual and that people didn’t feel expert enough to override these rules. The interface to the FPR within Archivematica itself is also not totally intuitive and requires quite a bit of time to understand. It was mentioned that adding a tool and a new rule for a specific file format in Archivematica is quite a complex task and not for the faint hearted...!

Discussion also touched on the subject of those files that are not identified. A file needs to be identified before a FPR rule can be set up for it. Ensuring files are identified in the first instance was seen to be a crucial step. Even once a format makes its way into PRONOM (TNA’s database of file formats) Artefactual Systems have to carry out extra work to get Archivematica to pick up that new PUID.

Unfortunately normalisation tools do not exist for all files and in many cases you may just have to accept that a file will stay in the format in which it was received. For example a Microsoft Word document (.doc) may not be an ideal preservation format but in the absence of open source command line migration tools we may just have to accept the level of risk associated with this format.

Moving on from this, we also discussed manual normalisations. This approach may be too resource intensive for many (particularly those of us who are implementing automated workflows) but others would see this as an essential part of the digital preservation process. I gave the example of the WordStar files I have been working with this year. These files are already obsolete and though there are other ways of viewing them, I plan to migrate them to a format more suitable for preservation and access. This would need to be carried out outside of Archivematica using the manual normalisation workflow. I haven’t tried this yet but would very much like to test it out in the future.

I shared some other examples that I'd gathered outside the meeting. Kirsty Chatwin-Lee from the University of Edinburgh had a proactive approach to handling the FPR on a collection-by-collection and PUID-by-PUID basis. She checks all of the FPR rules for the PUIDs she is working with as she transfers a collection of digital objects into Archivematica and ensures she is happy before proceeding with the normalisation step.

Back in October I'd tweeted to the wider Archivematica community to find out what people do with the FPR and had a few additional examples to share. For example, using Unoconv to convert office documents and creating PDF access versions of Microsoft Word documents. We also looked at some more detailed preservation planning documentation that Robert Gillesse from the International Institute of Social History had shared with the group.

We had a discussion about the benefits (or not) of normalising a compressed file (such as a JPEG) to an uncompressed format (such as TIFF). I had already mentioned in my presentation earlier that this default migration rule was turning 5GB of JPEG images into 80GB of TIFFs - and this is without improving the quality or the amount of information contained within the image. The same situation would apply to compressed audio and video which would increase even more in size when converted to an uncompressed format.

If storage space is at a premium (or if you are running this as a service and charging for storage space used) this could be seen as a big problem. We discussed the reasons for and against leaving this rule in the FPR. It is true that we may have more confidence in the longevity of TIFFs and see them as more robust in the face of corruption, but if we are doing digital preservation properly (checking checksums, keeping multiple copies etc) shouldn't corruption be easily spotted and fixed?
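The fixity checking referred to above is cheap to illustrate: a checksum stored at ingest and recomputed at audit time will expose any bit-level corruption. This is a minimal sketch of the idea, not how any particular repository implements it:

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Return the SHA-256 checksum of a byte string."""
    return hashlib.sha256(data).hexdigest()

# Store a checksum when the file is ingested...
original = b"payload of a preserved TIFF"
stored_checksum = sha256_of(original)

# ...then recompute it at audit time. Even a single flipped bit
# changes the digest, so silent corruption is spotted immediately
# and the damaged copy can be replaced from another replica.
corrupted = b"payload of a preserved TIFf"
assert sha256_of(original) == stored_checksum
assert sha256_of(corrupted) != stored_checksum
```

In practice the recomputation would run over every file on a schedule, which is exactly the kind of routine that makes corruption in an uncompressed TIFF no harder to catch than in a JPEG.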

Another reason we may migrate or normalise files is to restrict the file formats we are preserving to a limited set of known formats in the hope that this will lead to less headaches in the future. This would be a reason to keep on converting all those JPEGs to TIFFs.

The FPR is there to be changed and, given that not all organisations have exactly the same requirements, it is not surprising that we are starting to tweak it here and there. If we don’t understand it, don’t look at it and don’t consider changing it, perhaps we aren’t really doing our jobs properly.

However there was also a strong feeling in the room that we shouldn’t all be re-inventing the wheel. It is incredibly useful to hear what others have done with the FPR and the rationale behind their decisions.

Hopefully it is helpful to capture this discussion in a blog post, but this isn’t a sustainable way to communicate FPR changes for the longer term. There was a strong feeling in the room that we need a better way of communicating with each other around our preservation planning - the decisions we have made and the reasons for those decisions. This feeling was echoed by Kari Smith (MIT Libraries) and Nick Krabbenhoeft (New York Public Library) who joined us remotely to talk about the OSSArcFlow project - so this is clearly an international problem! This is something that Jisc are considering as part of their Research Data Shared Service project so it will be interesting to see how this might develop in the future.

Thanks to the UK Archivematica group meeting attendees for contributing to the discussion and informing this blog post.

Cakes, quizzes, blogs and advocacy

Published 4 Dec 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Last Thursday was International Digital Preservation Day and I think I needed the weekend to recover.

It was pretty intense...

...but also pretty amazing!

Amazing to see what a fabulous international community there is out there working on the same sorts of problems as me!

Amazing to see quite what a lot of noise we can make if we all talk at once!

Amazing to see such a huge amount of advocacy and awareness raising going on in such a small space of time!

International Digital Preservation Day was crazy but now I have had a bit more time to reflect, catch up...and of course read a selection of the many blog posts and tweets that were posted.

So here are some of my selected highlights:


Of course the highlights have to include the cakes and biscuits including those produced by Rachel MacGregor and Sharon McMeekin. Turning the problems that we face into something edible does seem to make our challenges easier to digest!

Quizzes and puzzles

A few quizzes and puzzles were posed on the day via social media - a great way to engage the wider world and have a bit of fun in the process.

There was a great quiz from the Parliamentary Archives (the answers are now available here) and a digital preservation pop quiz from Ed Pinsent of CoSector which started here. Also for those hexadecimal geeks out there, a puzzle from the DP0C Fellows at Oxford and Cambridge which came just at the point that I was firing up a hexadecimal viewer as it happens!

In a blog post called Name that item in...? Kirsty Chatwin-Lee at Edinburgh University encourages the digital preservation community to help her to identify a mysterious large metal disk found in their early computing collections. Follow the link to the blog to see a picture - I'm sure someone out there can help!

Announcements and releases

There were lots of big announcements on the day too. IDPD just kept on giving!

Of course the 'Bit List' (a list of digitally endangered species) was announced and I was able to watch this live. Kevin Ashley from the Digital Curation Centre discusses this in a blog post. It was interesting to finally see what was on the list (and then think further about how we can use this for further advocacy and awareness raising).

I celebrated this fact with some Fake News but to be fair, William Kilbride had already been on the BBC World Service the previous evening talking about just this so it wasn't too far from the truth!

New versions of JHOVE and VeraPDF were released as well as a new PRONOM release.  A digital preservation policy for Wales was announced and a new course on file migration was launched by CoSector at the University of London. Two new members also joined the Digital Preservation Coalition - and what a great day to join!


Some institutions did a roadshow or a pop up museum in order to spread the message about digital preservation more widely. This included the revival of the 'fish screensaver' at Trinity College Dublin and a pop up computer museum at the British Geological Survey.

Digital Preservation at Oxford and Cambridge blogged about their portable digital preservation roadshow kit. I for one found this a particularly helpful resource - perhaps I will manage to do something similar myself next IDPD!

A day in the life

Several institutions chose to mark the occasion by blogging or tweeting about the details of their day. This gives an insight into what we DP folks actually do all day and can be really useful, given that the processes behind digital preservation work are often less tangible and understandable than those used for physical archives!

I particularly enjoyed the nostalgia of following ex colleagues at the Archaeology Data Service for the day (including references to those much loved checklists!) and hearing from  Artefactual Systems about the testing, breaking and fixing of Archivematica that was going on behind the scenes.

The Danish National Archives blogged about 'a day in the life' and I was particularly interested to hear about the life-cycle perspective they have as new software is introduced, assessed and approved.

Exploring specific problems and challenges

Plans are my reality from Yvonne Tunnat of the ZBW Leibniz Information Centre for Economics was of particular interest to me as it demonstrates just how hard the preservation tasks can be. I like it when people are upfront and honest about the limitations of the tools or the imperfections of the processes they are using. We all need to share more of this!

In Sustaining the software that preserves access to web archives, Andy Jackson from the British Library tells the story of an attempt to maintain a community of practice around open source software over time and shares some of the lessons learned - essential reading for any of us that care about collaborating to sustain open source.

Kirsty Chatwin-Lee from Edinburgh University invites us to head back to 1985 with her as she describes their Kryoflux-athon challenge for the day. What a fabulous way to spend the day!

Disaster stories

Digital Preservation Day wouldn't be Digital Preservation Day without a few disaster stories too! Despite our desire to move beyond the 'digital dark age' narrative, it is often helpful to refer to worst-case scenarios when advocating for digital preservation.

Cees Hof from DANS in the Netherlands talks about the loss of digital data related to rare or threatened species in The threat of double extinction, Sarah Mason from Oxford University uses the recent example of the shutdown of DCist to discuss institutional risk, José Borbinha from Lisbon University, Portugal talks about his own experiences of digital preservation disaster and Neil Beagrie from Charles Beagrie Ltd highlights the costs of inaction.

The bigger picture

Other blogs looked at the bigger picture

Preservation as a present by Barbara Sierman from the National Library of the Netherlands is a forward thinking piece about how we could communicate and plan better in order to move forward.

Shira Peltzman from the University of California, Los Angeles tries to understand some of the results of the 2017 NDSA Staffing Survey in It's difficult to solve a problem if you don't know what's wrong.

David Minor from the University of San Diego Library, provides his thoughts on What we’ve done well, and some things we still need to figure out.

I enjoyed reading a post from Euan Cochrane from Yale University Library on The Emergence of “Digital Patinas”. A really interesting piece... and who doesn't like to be reminded of the friendly and helpful Word 97 paperclip?

In Towards a philosophy of digital preservation, Stacey Erdman from Beloit College, Wisconsin USA asks whether archivists are born or made and discusses her own 'archivist "gene"'.

So much going on and there were so many other excellent contributions that I missed.

I'll end with a tweet from Euan Cochrane which I thought nicely summed up what International Digital Preservation Day is all about and of course the day was also concluded by William Kilbride of the DPC with a suitably inspirational blog post.

Congratulations to the Digital Preservation Coalition for organising the day and to the whole digital preservation community for making such a lot of noise!

Wikidata Map November 2017

Published 3 Dec 2017 by addshore in Addshore.

It has only been 4 months since my last Wikidata map update post, but the difference on the map in these 4 months is much greater than the diff shown in my last post covering 9 months. The whole map is covered with pink (additions to the map). The main areas include Norway, Germany, Malaysia, South Korea, Vietnam and New Zealand to name just a few.

As with previous posts, varying sizes of the generated images can be found on Wikimedia Commons along with the diff image.

July to November in numbers

In the last 4 months (roughly speaking):

All of these numbers were roughly pulled out of graphs by eye. The graphs can be seen below:

Wikibase docker images

Published 3 Dec 2017 by addshore in Addshore.

This is a belated post about the Wikibase docker images that I recently created for the Wikidata 5th birthday. You can find the various images on docker hub and matching Dockerfiles on github. These images combined allow you to quickly create docker containers for Wikibase backed by MySQL and with a SPARQL query service running alongside updating live from the Wikibase install.

A setup was demoed at the first Wikidatacon event in Berlin on the 29th of October 2017 and can be seen at roughly 41:10 in the “demo of presents” video below.

The images

The ‘wikibase‘ image is based on the new official mediawiki image hosted on the docker store. The only current version, which is also the version demoed, is for MediaWiki 1.29. This image contains MediaWiki running on PHP 7.1 served by apache. Right now the image does some sneaky auto installation of the MediaWiki database tables, which might disappear in the future to make the image more generic.

The ‘wdqs‘ image is based on the official openjdk image hosted on the docker store. This image also only has one version, the current latest version of the Wikidata Query Service which is downloaded from maven. This image can be used to run the blazegraph service as well as run an updater that reads from the recent changes feed of a wikibase install and adds the new data to blazegraph.

The ‘wdqs-frontend‘ image hosts the pretty UI for the query service served by nginx. This includes auto completion and pretty visualizations. There is currently an issue which means the image will always serve examples for Wikidata which will likely not work on your custom install.

The ‘wdqs-proxy‘ image hosts an nginx proxy that restricts external access to the wdqs service, making it READONLY, and also enforces a time limit for queries (not currently configurable). This is very important: if the wdqs image is exposed directly to the world, people can also write to your blazegraph store.

You’ll also need a MySQL server for Wikibase to use; the default mysql or mariadb images work for this, and this is covered in the example below.

All of the wdqs images should probably be renamed as they are not specific to Wikidata (which is where the wd comes from), but right now the underlying repos and packages have the wd prefix and not a wb prefix (for Wikibase), so we will stick with them.

Compose example

The below example configures volumes for all locations with data that should / could persist. Wikibase is exposed on port 8181, with the query service UI on 8282 and the query service itself (behind the proxy) on 8989.

Each service has a network alias defined (that probably isn’t needed in most setups), but while running on WMCS it was required to get around some bad name resolving.

version: '3'
services:
  wikibase:
    image: wikibase/wikibase
    restart: always
    links:
      - mysql
    ports:
      - "8181:80"
    volumes:
      - mediawiki-images-data:/var/www/html/images
    depends_on:
      - mysql
    networks:
      default:
        aliases:
          - wikibase.svc
  mysql:
    image: mariadb
    restart: always
    volumes:
      - mediawiki-mysql-data:/var/lib/mysql
    environment:
      MYSQL_DATABASE: 'my_wiki'
      MYSQL_USER: 'wikiuser'
      MYSQL_PASSWORD: 'sqlpass'
    networks:
      default:
        aliases:
          - mysql.svc
  wdqs-frontend:
    image: wikibase/wdqs-frontend
    restart: always
    ports:
      - "8282:80"
    depends_on:
      - wdqs-proxy
    networks:
      default:
        aliases:
          - wdqs-frontend.svc
  wdqs:
    image: wikibase/wdqs
    restart: always
    build:
      context: ./wdqs/0.2.5
      dockerfile: Dockerfile
    volumes:
      - query-service-data:/wdqs/data
    command: /runBlazegraph.sh  # script name assumed; the path was truncated to "/" in the original post
    networks:
      default:
        aliases:
          - wdqs.svc
  wdqs-proxy:
    image: wikibase/wdqs-proxy
    restart: always
    environment:
      - PROXY_PASS_HOST=wdqs.svc:9999
    ports:
      - "8989:80"
    depends_on:
      - wdqs
    networks:
      default:
        aliases:
          - wdqs-proxy.svc
  wdqs-updater:
    image: wikibase/wdqs
    restart: always
    command: /runUpdate.sh  # script name assumed; the path was truncated to "/" in the original post
    depends_on:
      - wdqs
      - wikibase
    networks:
      default:
        aliases:
          - wdqs-updater.svc

volumes:
  mediawiki-mysql-data:
  mediawiki-images-data:
  query-service-data:


I’ll vaguely keep this section up to date with Qs & As, but if you don’t find your answer here, leave a comment, send an email or file a phabricator ticket.

Can I use these images in production?

I wouldn’t really recommend running any of these in ‘production’ yet as they are new and not well tested. Various things, such as upgrades for the query service and for mediawiki / wikibase, are also not yet documented very well.

Can I import data into these images from an existing wikibase / wikidata? (T180216)

In theory, although this is not documented. You’ll have to import everything using an XML dump of the existing mediawiki install; the configuration will also have to match on both installs. When importing using an XML dump the query service will not be updated automatically, and you will likely have to read the manual.

Where was the script that you ran in the demo video?

There is a copy in the github repo, but I can’t guarantee it works in all situations! It was specifically made for a WMCS debian jessie VM.


What shall I do for International Digital Preservation Day?

Published 30 Nov 2017 by Jenny Mitcham in Digital Archiving at the University of York.

I have been thinking about this question for a few months now and have only recently come up with a solution.

I wanted to do something big on International Digital Preservation Day. Unfortunately other priorities have limited the amount of time available and I am doing something a bit more low key. To take a positive from a negative I would like to suggest that as with digital preservation more generally, it is better to just do something rather than wait for the perfect solution to come along!

I am sometimes aware that I spend a lot of time in my own echo chamber - for example talking on Twitter and through this blog to other folks who also work in digital preservation. Though this is undoubtedly a useful two-way conversation, for International Digital Preservation Day I wanted to target some new audiences.

So instead of blogging here (yes I know I am blogging here too) I have blogged on the Borthwick Institute for Archives blog.

The audience for the Borthwick blog is a bit different to my usual readership. It is more likely to be read by users of our services at the Borthwick Institute and those who donate or deposit with us, perhaps also by staff working in other archives in the UK and beyond. Perfect for what I had planned.

In response to the tagline of International Digital Preservation Day ‘Bits Decay: Do Something Today’ I wanted to encourage as many people as possible to ‘Do Something’. This shouldn’t be just limited to us digital preservation folks, but to anyone anywhere who uses a computer to create or manage data.

This is why I decided to focus on Personal Digital Archiving. The blog post is called “Save your digital stuff!” (credit to the DPC Technology Watch Report on Personal Digital Archiving for this inspiring title - it was noted that at a briefing day hosted by the Digital Preservation Coalition (DPC) in April 2015, one of the speakers suggested that the term ‘personal digital archiving’ be replaced by the more urgent exhortation, ‘Save your digital stuff!’).

The blog post aimed to highlight the fragility of digital resources and then give a few tips on how to protect them. Nothing too complicated or technical, but hopefully just enough to raise awareness and perhaps encourage engagement. Not wishing to replicate all the great work that has already been done on Personal Digital Archiving, by the Library of Congress, the Paradigm project and others I decided to focus on just a few simple pieces of advice and then link out to other resources.

At the end of the post I encourage people to share information about any actions they have taken to protect their own digital legacies (of course using the #IDPD17 hashtag). If I inspire just one person to take action I'll consider it a win!

I'm also doing a 'Digital Preservation Takeover' of the Borthwick twitter account @UoYBorthwick. I lined up a series of 'fascinating facts' about the digital archives we hold here at the Borthwick and tweeted them over the course of the day.

OK - admittedly they won't be fascinating to everyone, but if nothing else it helps us to move further away from the notion that an archive is where you go to look at very old documents!

...and of course I now have a whole year to plan for International Digital Preservation Day 2018 so perhaps I'll be able to do something bigger and better?! I'm certainly feeling inspired by the range of activities going on across the globe today.

Preserving Google Drive: What about Google Sheets?

Published 29 Nov 2017 by Jenny Mitcham in Digital Archiving at the University of York.

There was lots of interest in a blog post earlier this year about preserving Google Docs.

Often the issues we grapple with in the field of digital preservation are not what you'd call 'solved problems' and that is what makes them so interesting. I always like to hear how others are approaching these same challenges so it is great to see so many comments on the blog itself and via Twitter.

This time I'm turning my focus to the related issue of Google Sheets. This is the native spreadsheet application for Google Drive.


Again, this is an application that is widely used at the University of York in a variety of different contexts, including for academic research data. We need to think about how we might preserve data created in Google Sheets for the longer term.

How hard can it be?

Quite hard actually - see my earlier post!

Exporting from Google Drive

For Google Sheets I followed a similar methodology to Google Docs: taking a couple of sample spreadsheets, downloading them in the formats that Google provides, then examining these exported versions to assess how well specific features of the spreadsheet were retained.

I used the File...Download as... menu in Google Sheets to test out the available export formats

The two spreadsheets I worked with were as follows:

Here is a summary of my findings:

Microsoft Excel - xlsx

I had high hopes for the xlsx export option - however, on opening the exported xlsx version of my flexisheet I was immediately faced with an error message telling me that the file contained unreadable content and asking whether I wanted to recover the contents.

This doesn't look encouraging...

Clicking 'Yes' on this dialogue box then allows the sheet to open and another message appears telling you what has been repaired. In this case it tells me that a formula has been removed.

Excel can open the file if it removes the formula

This is not ideal if the formula is considered to be worthy of preservation.

So clearly we already know that this isn't going to be a perfect copy of the Google sheet.

This version of my flexisheet looks pretty messed up. The dates and values look OK, but none of the calculated values are there - they are all replaced with "#VALUE".

The colours on the original flexisheet are important as they flag up problems and issues with the data entered. These however are not fully retained - for example, weekends are largely (but not consistently) marked as red and in the original file they are green (because it is assumed that I am not actually meant to be working weekends).

The XLSX export does however give a better representation of the simpler menu choices Google sheet. The data is accurate, and comments are partially present. Unfortunately though, replies to comments are not displayed and the comments are not associated with a date or time.

Open Document Format - ods

I tried opening the ODS version of the flexisheet in LibreOffice on a Macbook. There were no error messages (which was nice) but the sheet was a bit of a mess. There were similar issues to those that I encountered in the Excel export though it wasn't identical. The colours were certainly applied differently, neither entirely accurate to the original.

If I actually tried to use the sheet to enter more data, the formulas do not work - they do not calculate anything, though the formulas themselves appear to be retained. Any values that are calculated on the original sheet are not present.

Comments are retained (and replies to comments) but no date or time appears to be associated with them (note that the data may be there but just not displaying in LibreOffice).

I also tried opening the ODS file in Microsoft Office. On opening it the same error message was displayed to the one originally encountered in the XLSX version described above and this was followed by notification that “Excel completed file level validation and repair. Some parts of this workbook may have been repaired or discarded.” Unlike the XLSX file there didn't appear to be any additional information available about exactly what had been repaired or discarded - this didn't exactly fill me with confidence!

PDF document - pdf

When downloading a spreadsheet as a PDF you are presented with a few choices - for example:
  • Should the export include all sheets, just the current sheet or current selection (note that current sheet is the default response)
  • Should the export include the document title?
  • Should the export include sheet names?
To make the export as thorough as possible I chose to export all sheets and include document title and sheet names.

As you might expect this was a good representation of the values on the spreadsheet - a digital print if you like - but all functionality and interactivity was lost. In order to re-use the data, it would need to be copied and pasted or re-typed back into a spreadsheet application.

Note that comments within the sheet were not retained and also there was no option to export sheets that were hidden.

Web page - html

This gave an accurate representation of the values on the spreadsheet, but, similar to the PDF version, not in a way that really encourages reuse. Formulas were not retained and the resulting copy is just a static snapshot.

Interestingly, the comments in the menu choices example weren't retained. This surprised me because when using the html export option for Google documents one of the noted benefits was that comments were retained. Seems to be a lack of consistency here.

Another thing that surprised me about this version of the flexisheet was that it included hidden sheets (I hadn't until this point realised that there were hidden sheets!). I later discovered that the XLSX and ODS also retained the hidden sheets ...but they were (of course) hidden so I didn't immediately notice them! 

Tab delimited and comma separated values - tsv and csv

It is made clear on export that only the current sheet is exported so if using this as an export strategy you would need to ensure you exported each individual sheet one by one.

The tab delimited export of the flexisheet surprised me. In order to look at the data properly I tried importing it into MS Excel. It came up with a circular reference warning - were some of the dynamic properties of the sheets being somehow retained (albeit in a way that was broken)?

A circular reference warning when opening the tab delimited file in Microsoft Excel

Both of these formats did a reasonable job of capturing the simple menu choices data (though note that the comments were not retained) but neither did an acceptable job of representing the complex data within the flexisheet (given that the more complex elements such as formulas and colours were not retained).

What about the metadata?

I won't go into detail again about the other features of a Google Sheet that won't be saved with these export options - for example information about who created it and when and the complete revision history that is available through Google Drive - this is covered in a previous post. Given my findings when I interviewed a researcher here at the University of York about their use of Google Sheets, the inability of the export options to capture the version history will be seen as problematic for some use cases.
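One way the revision history *could* be captured outside the export menu is via the Google Drive v3 REST API, which exposes a revisions listing for each file. This is only a hedged sketch: the file id is a hypothetical placeholder, a real call needs OAuth credentials, and all this snippet does is build the request URL.

```python
from urllib.parse import urlencode

FILE_ID = "1AbCdEfGhIjK"  # hypothetical placeholder, not a real file id

def revisions_url(file_id, page_size=100):
    """Build the Drive v3 URL for listing a file's revision metadata."""
    query = urlencode({
        "pageSize": page_size,
        # Ask only for the fields relevant to a preservation record.
        "fields": "revisions(id,modifiedTime,lastModifyingUser)",
    })
    return "https://www.googleapis.com/drive/v3/files/%s/revisions?%s" % (file_id, query)
```

Fetching and storing that JSON alongside the exported file would at least preserve who changed the sheet and when, even though the exports themselves drop it.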

What is the best export format for Google Sheets?

The short answer is 'it depends'.

The export options available all have pros and cons and as ever, the most suitable one will very much depend on the nature of the original file and the properties that you consider to be most worthy of preservation.

  • If for example the inclusion of comments is an essential requirement, XLSX or ODS will be the only formats that retain them (with varying degrees of success). 
  • If you just want a static snapshot of the data in its final form, PDF will do a good job (you must specify that all sheets are saved), but note that if you want to include hidden sheets, HTML may be a better option. 
  • If the data is required in a usable form (including a record of the formula used) you will need to try XLSX or ODS but note that calculated values present in the original sheet may be missing. Similar but not identical results were noted with XLSX and ODS so it would be worth trying them both and seeing if either is suitable for the data in question.

It should be possible to export an acceptable version of the data for a simple Google Sheet but for a complex dataset it will be difficult to find an export option that adequately retains all features.

Exporting Google Sheets seems even more problematic and variable than Google Documents and for a sheet as complex as my flexisheet it appears that there is no suitable option that retains the functionality of the sheet as well as the content.

So, here's hoping that native Google Drive files appear on the list of World's Endangered Digital Species...due to be released on International Digital Preservation Day! We will have to wait until tomorrow to find out...

A disclaimer: I carried out the best part of this work about 6 months ago but have only just got around to publishing it. Since I originally carried out the exports and noted my findings, things may have changed!

Server failures in october and november 2017

Published 28 Nov 2017 by Pierrick Le Gall in The Blog.

The huge downtime at OVH that occurred on November 9th 2017 was quite like an earthquake for the European web. Of course, our service was impacted. But before that, we lived through the server failure of October 7th and another one on October 14th. Let’s describe and explain what happened.

Photo by Johannes Plenio on Unsplash


A) October 7th, the first server failure

On Saturday evening, October 7th 2017, our “reverse-proxy” server, the one through which all web traffic goes, crashed. OVH, our technical host, identified a problem on the motherboard and replaced it. Web traffic was routed to the spare server during the short downtime. A server failure without real gravity, without loss of data, but it announced the start of a painful series of technical problems.

B) October 14th, a more serious server failure

A week later, on October 14th, the very same “reverse-proxy” server saw its load climb so high that it was unable to deliver web pages… Web traffic was again switched to the spare server, in read-only mode for accounts hosted on this server. About 10 hours of investigation later, we were still not able to understand the origin of the problem. We had to decide to switch the spare server to write mode. This decision was difficult to make because it meant losing data produced between the last backup (1am) and the switch to the spare server (about 8am). In other words, for the accounts hosted on this server, the photos added during the night simply “disappeared” from their Piwigo.

This was the first time in our history that we switched a spare server to write mode. Unfortunately, another problem happened, related to the first one. To explain it, it is necessary to understand how the server infrastructure works.

In our infrastructure, servers work in pairs: a main server and its spare server. There are currently 4 pairs in production. The main server takes care of the “live operations”, while the spare server is synchronized with its main server every night and receives the web traffic, in read-only mode, during downtimes.

Normally, spare servers only allow read operations, i.e. you can visit albums or view photos, but not enter the administration or add photos.

One of the server pairs is what we call the “reverse-proxy”: all web traffic goes through this server and, depending on the Piwigo concerned, is routed to one pair or another. Normally the reverse-proxy is configured to point to the main servers, not the spare servers.

When a problem occurs on one of the main servers, we switch the traffic to its spare server. If the reverse-proxy server itself is concerned, we switch the Fail-Over IP address (IPFO): a mechanism that we manage in our OVH administration panel. For other servers, we change the reverse-proxy configuration.

That’s enough infrastructure detail… let’s go back to October 14th: we switched the IPFO to use the spare reverse-proxy server. Unfortunately, we met 2 problems in cascade:

  1. the spare reverse-proxy server, for one of the server pairs, pointed to the spare server
  2. this very spare server was configured in write mode instead of read-only

Why such an unexpected configuration?

Because we sometimes use the spare infrastructure for real-life tests. In this case, these were IPv6 tests.

What impact for users?

During the many hours when web traffic went through the spare reverse-proxy server, accounts hosted on the faulty server were returned to the state of the previous night (photos added during the night and morning had apparently disappeared), but users were able to keep adding photos. This state did not trigger any specific alert: the situation seemed “normal” to the users concerned and to our monitoring system. When the problem was detected, we changed the reverse-proxy configuration to point back to the main server. Consequence: all the photos added during the downtime apparently disappeared.

What actions have been taken after October 14th?

1) Checks on reverse-proxy configuration

A new script was pushed to production. It checks, very frequently, that the reverse-proxy is configured to send web traffic to main servers only.

2) Checks on write Vs read-only mode

Another script was pushed to production. This one checks that main servers are configured in write mode and spare servers in read-only mode.
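A minimal sketch of what these two sanity checks might look like; the server names, the config representation and the mode strings are all invented for illustration:

```python
# Hypothetical sketch of the two sanity checks described above.
# Server names, config shape and "rw"/"ro" modes are invented.

MAIN_SERVERS = {"main1", "main2", "main3", "main4"}

def check_proxy_targets(proxy_config):
    """Return reverse-proxy targets that are NOT main servers
    (the result should normally be empty)."""
    return [t for t in proxy_config.values() if t not in MAIN_SERVERS]

def check_write_modes(server_modes):
    """Return servers whose read/write mode is wrong: main servers
    must be writable ("rw"), spare servers read-only ("ro")."""
    bad = []
    for name, mode in server_modes.items():
        expected = "rw" if name in MAIN_SERVERS else "ro"
        if mode != expected:
            bad.append(name)
    return bad

# Example: a spare server accidentally left writable is flagged.
print(check_write_modes({"main1": "rw", "spare1": "rw"}))  # ['spare1']
```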

3) Isolate third-party web applications

The “non-vital” web applications, on which we have less expertise, were moved to a third-party server dedicated to this use: 2 WordPress blogs, the wiki, the forum and Piwik (visit analytics). Indeed, one possible cause of the server failure is that an application entered the 4th dimension or was under attack. Moving these applications onto an “isolated” server helps limit the impact of any future issue.

4) New backup system

The decision to switch a spare server to write mode, i.e. turn it into a main server, is a hard one to make. Indeed, it means giving up any hope of returning to the main server. This decision is difficult because it involves accepting a loss of data.

To make this decision simpler, two measures have been taken. First, define a time threshold after which we apply the switch: in our case, if the failure lasts more than 2 hours, we will switch. Second, backups must be more frequent than once a day: if the backups had been only 1 or 2 hours old, the decision would have been much easier!

In addition to the daily backup, we have added a new “rolling backups” system: every 15 minutes, a script analyzes each Piwigo on specific criteria (new/modified/deleted photos/users/albums/groups…). If anything has changed since the last backup, the script backs up the Piwigo (files + database), with synchronization to the spare server.
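The change-detection step might be sketched like this: fingerprint each account on the backup criteria and only back up accounts whose fingerprint moved. Everything here (names, data shapes) is an assumption for illustration:

```python
# Hedged sketch of the rolling-backup change detection described above.
# The account statistics and their shape are invented for illustration.

import hashlib
import json

def fingerprint(stats):
    """Hash the change-detection criteria (photo/user/album counts...)."""
    payload = json.dumps(stats, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def accounts_to_backup(current_stats, last_fingerprints):
    """Return accounts whose fingerprint changed since the last run,
    updating last_fingerprints in place."""
    changed = []
    for account, stats in current_stats.items():
        fp = fingerprint(stats)
        if last_fingerprints.get(account) != fp:
            changed.append(account)
            last_fingerprints[account] = fp
    return changed
```

In the real system a cron-style job would run something like this every 15 minutes and trigger a files + database backup (plus synchronization to the spare server) for each changed account.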

C) What about the giant downtime on the OVH network, on November 9th and 10th?

Being hosted at OVH, especially in the Strasbourg datacenter (France, Europe), we were greatly impacted by this downtime. First because our main reverse-proxy server is in Strasbourg: the datacenter failure put it completely out of order during the morning of November 9th (Central Europe time). Then because we could not switch the IP Fail-Over. Or rather, OVH allowed us to request it, but instead of requiring ~60 seconds, it took ~10 hours! Hours during which the accounts hosted behind the reverse-proxy server were in read-only mode.

Unlike the October 14th situation, we could not make the decision to switch the spare server to write mode, because an IPFO switch request was in progress and we had no idea how long OVH would take to apply it.

The infrastructure returned to its normal state on November 10th at 14:46, Paris time (France).

OVH has just provided compensation for these failures; we were waiting for it before publishing this blog post. The compensation is not much compared to the actual damage, but we will transfer it in full to our customers. After some very high-level calculations, 3 days of time credit were added to each account. It’s a small commercial gesture, but we think we have to pass it on to you as a symbol!

We are sorry for these inconveniences. As you can read in this blog post, we’ve improved our methods to mitigate risk in the future and to reduce the impact of any irreversible server failure.

My best books of 2017

Published 25 Nov 2017 by Tom Wilson in thomas m wilson.

My best books of 2017… Deeply insightful works from Yale University Press on geopolitics today, a history of consumerism in the West, a wave making read on the Anthropocene as a new era, a powerful explanation of the nature/nurture question for human identity by a very funny Californian, and a charming meander through the English […]


How do I promote a user automatically in Mediawiki and create a log of those promotions?

Published 24 Nov 2017 by sau226 in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I control a Mediawiki site. Here you can see users being automatically updated and added into the extended confirmed user group.

If I have a group called "util" where I just want to add relevant code to enable autopromotion with a log entry like that, make an edit, get promoted automatically into the group, and then remove that bit of code, would that be possible? Also, what code would I have to use to gain that level of access?

Is it possible to find broken link anchors in MediaWiki?

Published 24 Nov 2017 by Lyubomyr Shaydariv in Newest questions tagged mediawiki - Webmasters Stack Exchange.

Probably a simple question answered a million times, but I can't find an answer. MediaWiki can track missing pages and report them with Special:WantedPages. I'm not sure if it's possible, but can MediaWiki report broken anchors? Say I have a Foo page that refers to the Bar page like this: [[Bar#Name]]. Let's assume the Bar page lacks this section, so the Name anchor does not exist there; Special:WantedPages won't report this link as broken because the Bar page exists. Is there any way to find all broken anchors? Thanks in advance!


Published 16 Nov 2017 by timbaker in Tim Baker.

I recently made my public slam poetry debut at the Men of Letters event in Brisbane in the salubrious surrounds of one of Australia’s most iconic live music venues, the Zoo, in Fortitude Valley. Men of Letters is a spin off of the hugely successful...

Your Fun and Informative Guide to Consuming “Oil, Love & Oxygen”

Published 16 Nov 2017 by Dave Robertson in Dave Robertson.

The Paradox of Choice says that too many options can demotivate people, so here’s a short guide to the options for getting your ears on “Oil, Love & Oxygen”.

For the personal touch you can always get CDs at our shows. They come with a lush booklet of lyrics and credits, and the enchanting artwork of Frans Bisschops. Discounted digital download codes are also available for Bandcamp…

Bandcamp is a one-stop online shop for your album consumption needs. You can get a digital download in your choice of format, including high-resolution formats for “audiophiles and nerds”. If you go for one of the “lossless” formats such as ALAC, then you are getting the highest sound quality possible (higher than CD). Downloads also come with a digital version of the aforementioned booklet.

Bandcamp is also where you can place a mail-order for the CD if you want to get physical. Another feature of Bandcamp is fans can pay more than the minimum price if they want to support the artist.

The iTunes store is a great simple option for those in the Apple ecosystem, because it goes straight into the library on your device(s). You also get the same digital booklet as Bandcamp, and the audio for this release has been specially “Mastered for iTunes”. This means the sound quality is a bit better than most digital downloads (though not as good as the lossless formats available on Bandcamp).

This album was mastered by Ian Shepherd who has been a vigorous campaigner against the “loudness wars”. Did you ever notice that much, maybe most, music after the early 90s started to sound flat and bland? Well one reason was the use of “brick wall limiters” to increase average loudness, but this came at the expense of dynamics. I’m glad my release is not a casualty of this pointless war, but I digress.

Other Digital Download Services
The album is on many other services, so just search for “Oil, Love & Oxygen” on your preferred platform. These services don’t provide you the booklet though and are not quite as high sound quality as the above two.

Streaming (Spotify etc.)
The album is also available across all the major streaming platforms. While streaming is certainly convenient, it is typically low sound quality and pays tiny royalties to artists.

Vinyl and Tape
Interestingly these formats are seeing a bit of a resurgence around the world. I would argue this is not because they are inherently better than digital, but because digital is so often abused (e.g. the aforementioned loudness wars and the use of “lossy” formats like mp3). If you seriously want vinyl or tape though, let me know and I will consider getting old school!

Share the Love
If you like the album, then please consider telling friends, rating or reviewing the album on iTunes etc., liking our page on the book of face…

Short enough?





Published 11 Nov 2017 by mblaney in Tags from simplepie.

1.5.1 (#559)

* Revert sanitisation type change for author and category.

* Check if the Sanitize class has been changed and update the registry.
Also preference links in the headers over links in the body to
comply with WebSub specification.

* Improvements to mf2 feed parsing.

* Switch from regex to xpath for microformats discovery.

* 1.5.1 release.

* Remove PHP 5.3 from testing.

Security updates 1.3.3, 1.2.7 and 1.1.10 released

Published 7 Nov 2017 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

We just published updates to all stable versions from 1.1.x onwards delivering fixes for a recently discovered file disclosure vulnerability in Roundcube Webmail.

Apparently this zero-day exploit is already being used by hackers to read Roundcube’s configuration files. It requires a valid username/password as the exploit only works with a valid session. More details will be published soon under CVE-2017-16651.

The Roundcube series 1.0.x is not affected by this vulnerability but we nevertheless back-ported the fix in order to protect from yet unknown exploits.

See the full changelog for the corresponding version in the release notes on the Github download pages: v1.3.3, v1.2.7, v1.1.10 and v1.0.12

We strongly recommend updating all production installations of Roundcube with one of these versions.


In order to check whether your Roundcube installation has been compromised, check the access logs for requests like ?_task=settings&_action=upload-display&_from=timezone. As mentioned above, the file disclosure only works for authenticated users, and by finding such requests in the logs you should also be able to identify the account used for this unauthorized access. For mitigation we recommend changing all credentials to external services like the database or LDAP address books, and preferably also the des_key option in your config.
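As a quick way to perform the check above, one might scan access logs for that request pattern. A minimal sketch; the log layout is an assumption and your web server's log location will vary:

```python
# Sketch: scan access-log lines for the file-disclosure request pattern
# mentioned in the advisory. The log line format is an assumption;
# adapt to your web server's actual access-log layout.

import re

SUSPECT = re.compile(r"_task=settings&_action=upload-display&_from=timezone")

def suspicious_lines(log_lines):
    """Return the access-log lines matching the exploit request."""
    return [line for line in log_lines if SUSPECT.search(line)]
```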

Update 1.3.2 released

Published 30 Oct 2017 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

We proudly announce the second service release to update the stable version 1.3. It contains fixes to several bugs reported by our dear community members as well as translation updates synchronized from Transifex.

We also changed the wording for the setting that controls the time after which an opened message is marked as read. This previously only affected messages viewed in the preview panel, but now applies to all means of opening a message. That change came with 1.3.0 and apparently confused many users. Some translation work is still needed here.

See the full changelog in the release notes on the Github download page.

This release is considered stable and we recommend updating all production installations of Roundcube with this version. Download it from

Please do backup your data before updating!

Community goal: Modern and Global Text Input For Every User

Published 23 Oct 2017 by eike hein in blogs.

KDE Project:

A few months ago, I had the opportunity to give a talk on Input Methods in Plasma 5 at Akademy 2017 in lovely Almería in Spain. If you were interested in my talk but were unable to attend, there's now a video (and complementary slides) available courtesy of the Akademy conference team. Yay!

A big part of my talk was a long laundry list of issues we need to tackle in Plasma, other KDE software and the wider free desktop ecosystem. It's now time to take the next step and get started.

I've submitted the project, under the name Modern and Global Text Input For Every User, as part of the KDE community's new community goals initiative - a new thing we're trying exactly for challenges like this: goals that need community-wide, cross-project collaboration over a longer time period to achieve.

If you're interested in this work, make sure to read the proposal and add yourself at the bottom!

Understanding WordStar - check out the manuals!

Published 20 Oct 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Last month I was pleased to be able to give a presentation at 'After the Digital Revolution' about some of the work I have been doing on the WordStar 4.0 files in the Marks and Gran digital archive that we hold here at the Borthwick Institute for Archives. This event specifically focused on literary archives.

It was some time ago now that I first wrote about these files that were recovered from 5.25 inch floppy (really floppy) disks deposited with us in 2009.

My original post described the process of re-discovery, data capture and file format identification - basically the steps that were carried out to get some level of control over the material and put it somewhere safe.

I recorded some of my initial observations about the files but offered no conclusions about the reasons for the idiosyncrasies.

I’ve since been able to spend a bit more time looking at the files and investigating the creating application (WordStar) so in my presentation at this event I was able to talk at length (too long as usual) about WordStar and early word processing. A topic guaranteed to bring out my inner geek!

WordStar is not an application I had any experience with in the past. I didn’t start word processing until the early 90’s when my archaeology essays and undergraduate dissertation were typed up into a DOS version of Word Perfect. Prior to that I used a typewriter (now I feel old!).

WordStar by all accounts was ahead of its time. It was the first Word Processing application to include mail merge functionality. It was hugely influential, introducing a number of keyboard shortcuts that are still used today in modern word processing applications (for example control-B to make text bold). Users interacted with WordStar using their keyboard, selecting the necessary keystrokes from a set of different menus. The computer mouse (if it was present at all) was entirely redundant.

WordStar was widely used as home computing and word processing increased in popularity through the 1980’s and into the early 90’s. However, with the introduction of Windows 3.0 and Word for Windows in 1989, WordStar gradually fell out of favour (info from Wikipedia).

Despite this it seems that WordStar had a loyal band of followers, particularly among writers. Of course the word processor was the key tool of their trade so if they found an application they were comfortable with it is understandable that they might want to stick with it.

I was therefore not surprised to hear that others presenting at 'After the Digital Revolution' also had WordStar files in their literary archives. Clear opportunities for collaboration here! If we are all thinking about how to provide access to and preserve these files for the future then wouldn't it be useful to talk about it together?

I've already learnt a lot through conversations with the National Library of New Zealand who have been carrying out work in this area (read all about it here: Gattuso J, McKinney P (2014) Converting WordStar to HTML4. iPres.)

However, this blog post is not about defining a preservation strategy for the files; it is about better understanding them. My efforts have been greatly helped by finding a copy of both a WordStar 3 manual and a WordStar 4 manual online.

As noted in my previous post on this subject there were a few things that stand out when first looking at the recovered WordStar files and I've used the manuals and other research avenues to try and understand these better.

Created and last modified dates

The Marks and Gran digital archive consists of 174 files, most of which are WordStar files (and I believe them to be WordStar version 4).

Looking at the details that appear on the title pages of some of the scripts, the material appears to be from the period 1984 to 1987 (though not everything is dated).

However the system dates associated with the files themselves tell a different story. 

The majority of files in the archive have a creation date of 1st January 1980.

This was odd. Not only would that have been a very busy New Year's Day for the screen writing duo, but the timestamps on the files suggest that they were also working in the very early hours of the morning - perhaps unexpected when many people are out celebrating having just seen in the New Year!

This is the point at which I properly lost my faith in technical metadata!

In this period computers weren't quite as clever as they are today. When you switched them on they would ask you what date it was. If you didn't tell them, the PC would fall back to a system default... which just so happens to be 1st January 1980.

I was interested to see Abby Adams from the Harry Ransom Center, University of Texas at Austin (also presenting at 'After the Digital Revolution') flag up some similarly suspicious dates on files in a digital archive held at her institution. Her dates differed just slightly to mine, falling on the evening of the 31st December 1979. Again, these dates looked unreliable as they were clearly out of line with the rest of the collection.

This is the same issue as mine, but the differences relate to the timezone. There is further explanation here highlighted by David Clipsham when I threw the question out to Twitter. Thanks!
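As a side note, this fallback date is easy to screen for: any file timestamp within a day or so of midnight, 1 January 1980 (which also catches the timezone-shifted 31 December 1979 evening variant) is a strong hint that the system clock was never set. A small illustrative sketch:

```python
# Illustration: flag timestamps near the DOS fallback date of
# 1 January 1980, which suggests the clock was never set.
# The 24-hour tolerance also catches timezone-shifted variants
# that land on the evening of 31 December 1979.

from datetime import datetime

DOS_EPOCH = datetime(1980, 1, 1)

def looks_like_unset_clock(ts, tolerance_hours=24):
    """True if a timestamp falls within a day of 1 Jan 1980 00:00."""
    delta = abs((ts - DOS_EPOCH).total_seconds())
    return delta <= tolerance_hours * 3600

print(looks_like_unset_clock(datetime(1980, 1, 1, 2, 30)))  # True
```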


Another thing I had noticed about the files was the way that they were broken up into fragments. The script for a single episode was not saved as a single file but typically as 3 or 4 separate files. These files were named in such a way that it was clear that they were related and that the order that the files should be viewed or accessed was apparent - for example GINGER1, GINGER2 or PILOT and PILOTB.

This seemed curious to me - why not just save the document as a single file? The WordStar 4 manual didn't offer any clues but I found this piece of information in the WordStar 3 manual which describes how files should be split up to help manage the storage space on your diskettes:

From the WordStar 3 manual

Perhaps some of the files in the digital archive are from WordStar 3, or perhaps Marks and Gran had been previously using WordStar 3 and had just got into the habit of splitting a document into several files in order to ensure they didn't run out of space on their floppy disks.

I can not imagine working this way today! Technology really has come on a long way. Imagine trying to format, review or spell check a document that exists as several discrete files potentially sitting on different media!


One thing that stands out when browsing the disks is that all the filenames are in capital letters. DOES ANYONE KNOW WHY THIS WAS THE CASE?

File names in this digital archive were also quite cryptic. This is the 1980s, so filenames conform to the 8.3 limit: only 8 characters are allowed in a filename, and it *may* also include a 3 character file extension.

Note that the file extension really is optional and WordStar version 4 doesn’t enforce the use of a standard file extension. Users were encouraged to use those last 3 characters of the file name to give additional context to the file content rather than to describe the file format itself.

Guidance on file naming from the WordStar 4 manual

Some of the tools and processes we have in place to analyse and process the files in our digital archives use the file extension information to help understand the format. The file naming methodology described here therefore makes me quite uncomfortable!
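The 8.3 convention itself is easy to express as a quick check. A simplified sketch (the permitted character set here is narrower than what DOS actually allowed):

```python
# Simplified check for DOS 8.3 file names as described above: up to
# 8 characters, optionally followed by a dot and up to 3 characters
# of extension. Real DOS permitted a wider character set than this.

import re

EIGHT_DOT_THREE = re.compile(r"^[A-Z0-9_\-]{1,8}(\.[A-Z0-9_\-]{1,3})?$")

def is_8_3(name):
    return bool(EIGHT_DOT_THREE.match(name))

print(is_8_3("GINGER1"))       # True (no extension, as WordStar allowed)
print(is_8_3("PILOTB.BAK"))    # True
print(is_8_3("LONGFILENAME"))  # False (more than 8 characters)
```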

Marks and Gran tended not to use the file extension in this way (though there are a few examples of this in the archive). The majority of the WordStar files have no extension at all. The only really consistent use of file extensions related to their backup files.

Backup files

Scattered amongst the recovered data were a set of files with the extension BAK. This is clearly a file extension that WordStar creates and uses consistently. These files contained very similar content to other documents within the archive, typically with just a few differences. They appeared to be backup files of some sort, but I wondered whether they had been created automatically or by the writers themselves.

Again the manual was helpful in moving forward my understanding on this:

Backup files from the WordStar 4 manual

This backup procedure is also summarised with the help of a diagram in the WordStar 3 manual:

The backup procedure from WordStar 3 manual

This does help explain why there were so many backup files in the archive. I guess the next question is 'should we keep them?'. It does seem that they are an artefact of the application, rather than representing a conscious decision by the writers to back their files up at a particular point in time, and that may impact on their value. However, as discussed in a previous post on preserving Google documents, there could be some benefit in preserving revision history (even if only partial).

...and finally

My understanding of these WordStar files has come on in leaps and bounds by doing a bit of research and in particular through finding copies of the manuals.

The manuals even explain why alongside the scripts within the digital archive we also have a disk that contains a copy of the WordStar application itself. 

The very first step in the manual asks users to make a copy of the software:

I do remember having to do this sort of thing in the past! From WordStar 4 manual

Of course the manuals themselves are also incredibly useful in teaching me how to actually use the software. Keystroke based navigation is hardly intuitive to those of us who are now used to using a mouse, but I think that might be the subject of another blog post!

Crime and Punishment

Published 19 Oct 2017 by leonieh in State Library of Western Australia Blog.

Many Western Australians have a convict or pensioner guard in their ancestral family. The State Library has digitised some items from our heritage collections relating to convicts, the police and the early criminal justice system.


Convicts Tom the dealer, Davey Evans and Paddy Paternoster b2462917

Police Gazette of Western Australia, 1876-1900
The Police Gazettes include information under various headings including apprehensions (name of person arrested, arresting constable, charge and sentence), police appointments, tickets of leave, certificates of freedom, and conditional pardons issued to convicts. You may find physical descriptions of prisoners. Deserters from military service and escaped prisoners are sought. Mention is also made of expirees leaving the colony; inquests (where held, date, name and date of death of person, verdict); licences (publican, gallon, eating, boarding and lodging houses, railway refreshment rooms, wine and beer and spirit merchants, etc. giving name of licensee, name of hotel and town or district). There are listings for missing friends; prisoners discharged; people tried at Quarter Sessions (name, offence, district, verdict); and warrants issued. There are many reasons for a name to appear in the gazettes.

We thank the Friends of Battye Library and the Sholl Bequest, for supporting the digitising of the Police Gazettes.



A great resource for researching the broader experience of WA convicts is The convict system in Western Australia, 1850-1870 by Cherry Gertzel. This thesis explains the workings of the convict system, and explores the conditions under which the convicts lived and worked, their effect on the colony and, to some extent, the attitudes of colonists to the prisoners.


Another valuable publication is Further correspondence on the subject of convict discipline and transportation. This comprises official documents relating to the transportation of convicts to Australia, covering the period 1810-1865, and is bound in 8 volumes.
This set from our rare book collection gives an excellent background to the subject for anyone researching convicts or convict guards, with individuals (very) occasionally being named.
The easiest way to access this wonderful resource is to type "convict system" under Title in our catalogue and select State Library Online from the drop-down box. Once you’ve selected a volume, you can browse through the pages by placing your cursor on the edge of a page and clicking. If you have your sound turned on, this makes a very satisfying page-turning noise! If you want to search for names, scroll down and select the Download button. You can then save a searchable PDF version to your PC. The files are fairly large so you may need to be patient.

Return of the number of wives and families of ticket-of-leave holders to be sent out to Western Australia 1859

Return of the number of wives and families of ticket-of-leave holders to be sent out to Western Australia 1859 From: Further correspondence on the subject of convict discipline and transportation, 1859-1865 p.65. [vol.8]

There are several online diaries relating to convict voyages. The diary of convict John Acton Wroth, including copies of letters home, was kept during his transportation to Western Australia on the Mermaid in 1851 and for a while after his arrival. Wroth was only 17 years old at the time of his conviction; apparently he was enamoured of a young woman and resorted to fraud in order to find the means to impress her. The diary spans 1851-1853 and reveals one young man’s difficulty in finding himself far from the love and support of his family while accepting the circumstances he had brought upon himself. Wroth subsequently settled in Toodyay and became a respected resident, raising a large family and running several businesses as well as acting for some time as local school master.

Another interesting read is the transcript of the diary of John Gregg, carpenter on the convict ship York. This 1862 diary gives details of work each day, which was often difficult when the weather was foul and the carpenter sea-sick, and uncommon events such as attempts by convicts to escape –

“…the affair altogether must be admitted to reflect little credit on the military portion of the convict guard, for although the officer of the watch called loud and long for the guard, none were forthcoming until the prisoners were actually in custody.”


Diary of John Gregg, carpenter on the convict ship ‘York’, with definitions of nautical terms, compiled by Juliet Ludbrook.

A letter from a convict in Australia to a brother in England, originally published in the Cornhill Magazine, April 1866, contains insights into the experience of a more educated felon and some sharp observations on convict life as lived by him upon his arrival in Western Australia:

“…you can walk about and talk with your friends as you please. So long as there is no disturbance, there is no interference”


“…the bond class stand in the proportion of fully five-sevenths of the entire population, and are fully conscious of their power…”

Other miscellaneous convict-related items include:

Two posters listing convict runaways with details of their convictions and descriptions:
Return of convicts who have escaped from the colony, and whose absconding has been notified to this office between the 1st June, 1850, and the 31st of March, 1859
List of convicts who are supposed to have escaped the Colony (a broadsheet giving the name, number and description of 83 escaped convicts).

Parade state of the Enrolled Guard, 30 March 1887, on the occasion of the inspection of the guard by Sir Frederick Napier Broome, prior to disbandment.


Parade state of the Enrolled Guard… b1936163


British Army pensioners came out to Western Australia as convict guards. This document gives the following details for those still serving in 1887: rank, name, regiment, age, rate of pension, length of Army service, rank when pensioned, date of joining the Enrolled Guard, and medals and clasps.


Scale of remission for English convicts sentenced to penal servitude subsequent to 1 July 1857 is a table showing how much time of good behaviour convicts needed to accrue in order to qualify for privileges.

Certificate of freedom, 1869 [Certificates of freedom of convict William Dore]

This is just a small sample of convict-related material in the State Library collections that you can explore online. You can also visit the Battye Library of West Australian History to research individual convicts, policemen, pensioner guards or others involved in the criminal justice system.


“Why archivists need a shredder…”

Published 13 Oct 2017 by inthemailbox in In the mailbox.

Struggling to explain what it is that you do and why you do it? President of the Australian Society of Archivists, Julia Mant, gives it a red hot go in an interview for the University of Technology Sydney.


Why do I write environmental history?

Published 8 Oct 2017 by Tom Wilson in thomas m wilson.

Why bother to tell the history of the plants and animals that make up my home in Western Australia?  Partly it's about reminding us of what was here on the land before, and in some ways, could be here again. In answering this question I’d like to quote the full text of Henry David Thoreau’s […]

Come dine with the KDE e.V. board in Berlin in October!

Published 29 Sep 2017 by eike hein in blogs.

KDE Project:

As has become tradition in recent years, the KDE e.V. board will have an open dinner alongside its in-person meeting in Berlin, Germany on October 14th, at 7 PM.

We know there will be a lot of cool people in town next month, thanks to a KDE Edu development sprint, Qt World Summit, the GNOME Foundation hackfest and probably other events, and you're all invited to drop by and have a chat with us and amongst yourselves - and enjoy good food.

We're still picking out a location currently, so if you're interested in attending, please drop me a mail to pre-register and I will (space permitting) confirm and send details soon.

Oil, Love & Oxygen – Album Launch

Published 29 Sep 2017 by Dave Robertson in Dave Robertson.

“Oil, Love & Oxygen” is a collection of songs about kissing, climate change, cult 70s novels and more kissing. Recorded across ten houses and almost as many years, the album is a diverse mix of bittersweet indie folk, pop, rock and blues. The Kiss List bring a playful element to Dave Robertson’s songwriting, unique voice and percussive acoustic guitar work. This special launch night also features local music legends Los Porcheros, Dave Johnson, Sian Brown, Rachel Armstrong and Merle Fyshwick.

Tickets $15 through , or on the door if still available




Published 27 Sep 2017 by fabpot in Tags from Twig.


Published 27 Sep 2017 by fabpot in Tags from Twig.

The first UK AtoM user group meeting

Published 27 Sep 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Yesterday the newly formed UK AtoM user group met for the first time at St John's College Cambridge, and I was really pleased that a colleague and I were able to attend.
Bridge of Sighs in Autumn (photo by Sally-Anne Shearn)

This group has been established to provide the growing UK AtoM community with a much needed forum for exchanging ideas and sharing experiences of using AtoM.

The meeting was attended by about 15 people, though we were informed that there are nearly 50 people on the email distribution list. Interest in AtoM is certainly increasing in the UK.

As this was our first meeting, those who had made progress with AtoM were encouraged to give a brief presentation covering the following points:
  1. Where are you with AtoM (investigating, testing, using)?
  2. What do you use it for? (cataloguing, accessions, physical storage locations)
  3. What do you like about it/ what works?
  4. What don’t you like about it/ what doesn’t work?
  5. How do you see AtoM fitting into your wider technical infrastructure? (do you have separate location or accession databases etc?)
  6. What unanswered questions do you have?
It was really interesting to find out how others are using AtoM in the UK. A couple of attendees had already upgraded to the new 2.4 release so that was encouraging to see.

I'm not going to summarise the whole meeting but I made a note of people's likes and dislikes (questions 3 and 4 above). There were some common themes that came up.

Note that most users are still using AtoM 2.2 or 2.3; those who have moved to 2.4 haven't had much chance to explore it yet. It may be that some of these comments are already out of date and fixed in the new release.

What works?

AtoM seems to have lots going for it!

The words 'intuitive', 'user friendly', 'simple', 'clear' and 'flexible' were mentioned several times. One attendee described some user testing she carried out during which she found her users just getting on and using it without any introduction or explanation! Clearly a good sign!

Standards compliance was mentioned, as was the fact that consistency is enforced. When moving from unstructured finding aids to AtoM it really does help ensure that the right bits of information are included. The fact that AtoM highlights which mandatory fields are missing at the top of a page is really helpful when checking through your own or others' records.

The ability to display digital images was highlighted by others as a key selling point, particularly the browse by digital objects feature.

The way that different bits of the AtoM database interlink was a plus point that was mentioned more than once - this allows you to build up complex interconnecting records using archival descriptions and authority records and these can also be linked to accession records and a physical location.

The locations section of AtoM was thought to be 'a good thing' - for recording information about where in the building each archive is stored. This works well once you get your head around how best to use it.

Integration with Archivematica was mentioned by one user as being a key selling point for them - several people in the room were either using, or thinking of using Archivematica for digital preservation.

The user community itself and the quick and helpful responses to queries posted on the user forum were mentioned by more than one attendee. Also praised was the fact that AtoM is in continuous active development and very much moving in the right direction.

What doesn't work?

Several attendees mentioned the digital object functionality in AtoM. As well as being a clear selling point, it was also highlighted as an area that could be improved. The one-to-one relationship between an archival description and a digital object wasn't thought to be ideal, and there was some discussion about linking through to external repositories - it would be nice if items linked in this way could be displayed in the AtoM image carousel even where the URL doesn't end in a filename.

The typeahead search suggestions when you enter search terms were not thought to be helpful all of the time. Sometimes the closest matches do not appear in the list of suggested results.

One user mentioned that they would like a publication status that is somewhere in between draft and published. This would be useful for those records that are complete and can be viewed internally by a selected group of users who are logged in but are not available to the wider public.

More than one person mentioned that they would like to see a conservation module in AtoM.

There was some discussion about the lack of an audit trail for descriptions within AtoM. It isn't possible to see who created a record, when it was created and information about updates. This would be really useful for data quality checking, particularly when training new members of staff and volunteers.

Some concerns about scalability were mentioned - particularly for one user with a very large number of records within AtoM - the process of re-indexing AtoM can take three days.

When creating creator or other access points, the drop-down menu doesn’t display all the options, so this causes difficulties when trying to link to the right point or establish whether the desired record is in the system or not. This can be particularly problematic for common surnames, as several different records may exist.

There are some issues with the way authority records are created currently, with no automated way of creating a unique identifier and no ability to keep authority records in draft.

A comment about the lack of auto-save and the issue of the web form timing out and losing all of your work seemed to be a shared concern for many attendees.

Other things that were mentioned included an integration with Active Directory and local workarounds that had to be put in place to make finding aids bi-lingual.

Moving forward

The group agreed that it would be useful to keep a running list of these potential areas of development for AtoM and that perhaps in the future members may be able to collaborate to jointly sponsor work to improve AtoM. This would be a really positive outcome for this new network.

I was also able to present on a recent collaboration to enable OAI-PMH harvesting of EAD from AtoM and use it as an opportunity to try to drum up support for further development of this new feature. I had to try and remember what OAI-PMH stood for and think I got 83% of it right!

Thanks to St John's College Cambridge for hosting. I look forward to our next meeting which we hope to hold here in York in the Spring.

Moving a proof of concept into production? It's harder than you might think...

Published 20 Sep 2017 by Jenny Mitcham in Digital Archiving at the University of York.

My colleagues and I blogged a lot during the Filling the Digital Preservation Gap project, but I’m aware that I’ve gone a bit quiet on this topic since…

I was going to wait until we had a big success to announce, but follow on work has taken longer than expected. So in the meantime here is an update on where we are and what we are up to.


Just to re-cap, by the end of phase 3 of Filling the Digital Preservation Gap we had created a working proof of concept at the University of York that demonstrated that it is possible to create an automated preservation workflow for research data using PURE, Archivematica, Fedora and Samvera (then called Hydra!).

This is described in our phase 3 project report (and a detailed description of the workflow we were trying to implement was included as an appendix in the phase 2 report).

After the project was over, it was agreed that we should go ahead and move this into production.

Progress has been slower than expected. I hadn’t quite appreciated just how different a proof of concept is to a production-ready environment!

Here are some of the obstacles we have encountered (and in some cases overcome):

Error reporting

One of the key things that we have had to build in to the existing code in order to get it ready for production is error handling.

This was not a priority for the proof of concept. A proof of concept is really designed to demonstrate that something is possible, not to be used in earnest.

If errors happen and things stop working (which they sometimes do) you can just kill it and rebuild.

In a production environment we want to be alerted when something goes wrong so we can work out how to fix it. Alerts and errors are crucial to a system like this.

We are sorting this out by enabling Archivematica's own error handling and error catching within Automation Tools.

What happens when something goes wrong?

...and of course once things have gone wrong in Archivematica and you've fixed the underlying technical issue, you then need to deal with any remaining problems with your information packages in Archivematica.

For example, if the problems have resulted in failed transfers in Archivematica then you need to work out what you are going to do with those failed transfers. Although it is (very) tempting to just clear out Archivematica and start again, colleagues have advised me that it is far more useful to actually try and solve the problems and establish how we might handle a multitude of problematic scenarios if we were in a production environment!

So we now have scenarios in which an automated transfer has failed so in order to get things moving again we need to carry out a manual transfer of the dataset into Archivematica. Will the other parts of our workflow still work if we intervene in this way?

One issue we have encountered along the way is that though our automated transfer uses a specific 'datasets' processing configuration that we have set up within Archivematica, when we push things through manually it uses the 'default' processing configuration, which is not what we want.

We are now looking at how we can encourage Archivematica to use the specified processing configuration. As described in the Archivematica documentation, you can do this by including an XML file describing your processing configuration within your transfer.
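As a minimal sketch of what this could look like, the script below stages a processing configuration file into the root of a transfer directory before the transfer is started. The file name `processingMCP.xml` follows the Archivematica documentation; all the paths and the stand-in XML content are hypothetical examples, not our actual configuration.

```shell
#!/bin/sh
# Sketch: make a manual transfer use the same processing configuration
# as the automated 'datasets' workflow, by copying an exported config
# into the transfer root as processingMCP.xml before starting the transfer.

stage_processing_config() {
  config="$1"    # processing configuration exported from the dashboard
  transfer="$2"  # root of the transfer directory
  cp "$config" "$transfer/processingMCP.xml"
}

# Demonstration with temporary stand-in directories and dummy content:
work=$(mktemp -d)
echo '<processingMCP><preconfiguredChoices/></processingMCP>' > "$work/datasets.xml"
mkdir "$work/dataset-0042"
stage_processing_config "$work/datasets.xml" "$work/dataset-0042"
ls "$work/dataset-0042"
```

In a real deployment the copy step would sit in whatever wrapper script kicks off the manual transfer, so the configuration cannot be forgotten.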

It is useful to learn lessons like this outside of a production environment!

File size/upload

Although our project recognised that there would be a limit to the size of dataset that we could accept and process with our application, we didn't really bottom out what size dataset we intended to support.

It has now been agreed that we should reasonably expect the data deposit form to accept datasets of up to 20 GB in size. Anything larger than this would need to be handled in a different way.

Testing the proof of concept in earnest showed that it was not able to handle datasets of over 1 GB in size. Its primary purpose was to demonstrate the necessary integrations and workflow, not to handle larger files.

Additional (and ongoing) work was required to enable the web deposit form to work with larger datasets.


In testing the application we of course ended up trying to push some quite substantial datasets through it.

This was fine until everything abruptly seemed to stop working!

The problem was actually a fairly simple one but because of our own inexperience with Archivematica it took a while to troubleshoot and get things moving in the right direction again.

It turned out that we hadn’t allocated enough space in one of the bits of filestore that Archivematica uses for failed transfers (/var/archivematica/sharedDirectory/failed). This had filled up and was stopping Archivematica from doing anything else.

Once we knew the cause of the problem, the available space was increased, but then everything ground to a halt again because we had quickly used that up too. Increasing the space had got things moving, but while we were trying to demonstrate that it wasn't working, we had deposited several further datasets, which were waiting in the transfer directory and quickly blocked things up again.

On a related issue, one of the test datasets I had been using to see how well Research Data York could handle larger datasets was around 5 GB in size, consisting of about 2,000 JPEG images. One of the default normalisation tasks in Archivematica is to convert all of these JPEGs to TIFF.

Once this collection of JPEGs was converted to TIFF, the size of the dataset increased to around 80 GB. Until I witnessed this it hadn't really occurred to me that this could cause problems.

The solution - allocate Archivematica much more space than you think it will need!
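That growth can be estimated up front: an uncompressed 24-bit TIFF needs roughly width × height × 3 bytes per image. A back-of-envelope sketch (the pixel dimensions below are illustrative guesses, not measurements of the actual dataset):

```shell
#!/bin/sh
# Rough estimate of post-normalisation storage: an uncompressed 24-bit
# TIFF is approximately width x height x 3 bytes.

estimate_tiff_bytes() {
  width=$1; height=$2; count=$3
  echo $(( width * height * 3 * count ))
}

# e.g. 2,000 JPEGs at a hypothetical 3700x3600 pixels each:
bytes=$(estimate_tiff_bytes 3700 3600 2000)
echo "$(( bytes / 1024 / 1024 / 1024 )) GB"   # prints: 74 GB
```

Running a check like this against a sample of the incoming images would have flagged the 5 GB to 80 GB jump before the transfer filled the filestore.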

We also now have the filestore set up so that it will inform us when the space in these directories gets to 75% full. Hopefully this will allow us to stop the filestore filling up in the future.
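As a sketch of the kind of check involved, something like the following could run from cron against the Archivematica shared directory (the threshold is our 75% figure, but the directory argument in the demonstration call is just the current directory; a real deployment would point it at the shared directory and send a mail rather than echo):

```shell
#!/bin/sh
# Sketch: warn when a directory's filesystem passes a usage threshold,
# e.g. /var/archivematica/sharedDirectory at 75%.

check_usage() {
  dir="$1"; limit="$2"
  # df -P gives portable one-line-per-filesystem output; field 5 is "NN%".
  used=$(df -P "$dir" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
  if [ "$used" -ge "$limit" ]; then
    echo "WARNING: $dir is ${used}% full (threshold ${limit}%)"
  fi
}

# Demonstration against the current directory:
check_usage . 75
```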


The proof of concept did not undergo rigorous testing - it was designed for demonstration purposes only.

During the project we thought long and hard about the deposit, request and preservation workflows that we wanted to support, but we were always aware that once we had it in an environment that we could all play with and test, additional requirements would emerge.

As it happens, we have discovered that the workflow implemented is very true to that described in the appendix of our phase 2 report and does meet our needs. However, there are lots of bits of fine tuning required to enhance the functionality and make the interface more user friendly.

The challenge here is to try to carry out the minimum of work required to turn it into an adequate solution to take into production. There are so many enhancements we could make – I have a wish list as long as my arm – but until we better understand whether a local solution or a shared solution (provided by the Jisc Research Data Shared Service) will be adopted in the future it is not worth trying to make this application perfect.

Making it fit for production is the priority. Bells and whistles can be added later as necessary!

My thanks to all those who have worked on creating, developing, troubleshooting and testing this application and workflow. It couldn't have happened without you!

How do you deal with mass spam on MediaWiki?

Published 19 Sep 2017 by sau226 in Newest questions tagged mediawiki - Webmasters Stack Exchange.

What would be the best way to find a user's IP address on MediaWiki if all the connections are proxied through a Squid proxy server and you have access to all user rights?

I am a steward on a centralauth based wiki and we have lots of spam accounts registering and making 1 spam page each.

Can someone please tell me the best way to mass-block them, as I keep having to block each user individually and lock their accounts?


Published 18 Sep 2017 by timbaker in Tim Baker.

The author (centre) with Ruth and Ian Gawler

Recently a great Australian, a man who has helped thousands of others in their most vulnerable and challenging moments, a Member of the Order of Australia, quietly retired from a long and remarkable career of public service...

Harvesting EAD from AtoM: we need your help!

Published 18 Sep 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Back in February I published a blog post about a project to develop AtoM to allow EAD (Encoded Archival Description) to be harvested via OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting): “Harvesting EAD from AtoM: a collaborative approach”.

Now that AtoM version 2.4 is released (hooray!), containing the functionality we have sponsored, I thought it was high time I updated you on what has been achieved by this project, where more work is needed and how the wider AtoM community can help.

What was our aim?

Our development work had a few key aims:

  • To enable finding aids from AtoM to be exposed as EAD 2002 XML for others to harvest. The partners who sponsored this project were particularly keen to enable the Archives Hub to harvest their EAD.
  • To change the way that EAD was generated by AtoM in order to make it more scalable. Moving EAD generation from the web browser to the job scheduler was considered to be the best approach here.
  • To make changes to the existing DC (Dublin Core) metadata generation feature so that it also works through the job scheduler - making this existing feature more scalable and able to handle larger quantities of data

A screen shot of the job scheduler in AtoM - showing the EAD and
DC creation jobs that have been completed

What have we achieved?

The good

We believe that the EAD harvesting feature as released in AtoM version 2.4 will enable a harvester such as the Archives Hub to harvest our catalogue metadata from AtoM as EAD. As we add new top level archival descriptions to our catalogue, subsequent harvests should pick up and display these additional records. 

This is a considerable achievement and something that has been on our wishlist for some time. This will allow our finding aids to be more widely signposted. Having our data aggregated and exposed by others is key to ensuring that potential users of our archives can find the information that they need.

Changes have also been made to the way metadata (both EAD and Dublin Core) are generated in AtoM. This means that the solution going forward is more scalable for those AtoM instances that have very large numbers of records or large descriptive hierarchies.

The new functionality in AtoM around OAI-PMH harvesting of EAD and settings for moving XML creation to the job scheduler is described in the AtoM documentation.
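To illustrate what a harvester such as the Archives Hub would actually request, the OAI-PMH calls can be sketched as URLs. The verbs and parameters come from the OAI-PMH standard, but the base URL is a made-up example and the `oai_ead` metadataPrefix is an assumption to confirm against your own AtoM instance and its documentation:

```shell
#!/bin/sh
# Sketch of standard OAI-PMH requests a harvester might issue against an
# AtoM endpoint. BASE is a hypothetical URL, not a real repository.

BASE="https://archives.example.ac.uk/;oai"

# Discover which metadata formats the repository supports:
echo "${BASE}?verb=ListMetadataFormats"

# Harvest all records as EAD:
echo "${BASE}?verb=ListRecords&metadataPrefix=oai_ead"

# Incremental harvest; only useful once the endpoint exposes updates:
echo "${BASE}?verb=ListRecords&metadataPrefix=oai_ead&from=2017-09-01"
```

The third request is exactly the one that the missing update/deletion support (discussed below) currently undermines: without datestamp changes on edited records, an incremental harvest has nothing new to pick up.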

The not-so-good

Unfortunately the EAD harvesting functionality within AtoM 2.4 will not do everything we would like it to do. 

It does not at this point include the ability for the harvester to know when metadata records have been updated or deleted. It also does not pick up new child records that are added into an existing descriptive hierarchy. 

We want to be able to edit our records once within AtoM and have any changes reflected in the harvested versions of the data. 

We don’t want our data to become out of sync. 

So clearly this isn't ideal.

The task of enabling full harvesting functionality for EAD was found to be considerably more complex than first anticipated. This has no doubt been compounded by the hierarchical nature of EAD, which differs from the simplicity of the traditional Dublin Core approach.

The problems encountered are certainly not insurmountable, but lack of additional resources and timelines for the release of AtoM 2.4 stopped us from being able to finish off this work in full.

A note on scalability

Although the development work deliberately set out to consider issues of scalability, it turns out that scalability is actually on a sliding scale!

The National Library of Wales had the forethought to include one of their largest archival descriptions as sample data for inclusion in the version of AtoM 2.4 that Artefactual deployed for testing. Their finding aid for St David’s Diocesan Records is a very large descriptive hierarchy consisting of 33,961 individual entries. This pushed the capabilities of EAD creation (even when done via the job scheduler) and also led to discussions with The Archives Hub about exactly how they would process and display such a large description at their end even if EAD generation within AtoM were successful.

Some more thought and more manual workarounds will need to be put in place to manage the harvesting and subsequent display of large descriptions such as these.

So what next?

We are keen to get AtoM 2.4 installed at the Borthwick Institute for Archives over the next couple of months. We are currently on version 2.2 and would like to start benefiting from all the new features that have been introduced... and of course to test in earnest the EAD harvesting feature that we have jointly sponsored.

We already know that this feature will not fully meet our needs in its current form, but would like to set up an initial harvest with the Archives Hub and further test some of our assumptions about how this will work.

We may need to put some workarounds in place to ensure that we have a way of reflecting updates and deletions in the harvested data – either with manual deletes or updates or a full delete and re-harvest periodically.

Harvesting in AtoM 2.4 - some things that need to change

So we have a list of priority things that need to be improved in order to get EAD harvesting working more smoothly in the future:

In line with the OAI-PMH specification

  • AtoM needs to expose updates to the metadata to the harvester
  • AtoM needs to expose new records (at any level of description) to the harvester
  • AtoM needs to expose information about deletions to the harvester
  • AtoM also needs to expose information about deletions of DC metadata to the harvester (it has come to my attention during the course of this project that this isn’t happening at the moment)

Some other areas of potential work

I also wanted to bring together and highlight some other areas of potential work for the future. These are all things that were discussed during the course of the project but were not within the scope of our original development goals.

  • Harvesting of EAC (Encoded Archival Context) - this is the metadata standard for authority records. Is this something people would like to see enabled in the future? Of course this is only useful if you have someone who actually wants to harvest this information!
  • On the subject of authority records, it would be useful to change the current AtoM EAD template to use @authfilenumber and @source - so that an EAD record can link back to the relevant authority record in the local AtoM site. The ability to create rich authority records is such a key strength of AtoM, allowing an institution to weave rich interconnecting stories about their holdings. If harvesting doesn’t preserve this inter-connectivity then I think we are missing a trick!
  • EAD3 - this development work has deliberately not touched on the new EAD standard. Firstly, this would have been a much bigger job and secondly, we are looking to have our EAD harvested by The Archives Hub and they are not currently working with EAD3. This may be a priority area of work for the future.
  • Subject source - the subject source (for example "Library of Congress Subject Headings") doesn't appear in AtoM generated EAD at the moment even though it can be entered into AtoM - this would be a really useful addition to the EAD.
  • Visible elements - AtoM allows you to decide which elements you wish to display/hide in your local AtoM interface. With the exception of information relating to physical storage, the XML generation tasks currently do not take account of visible elements and will carry out an export of all fields. Further investigation of this should be carried out in the future. If an institution is using the visible elements feature to hide certain bits of information that should not be more widely distributed, they would be concerned if this information was being harvested and displayed elsewhere. As certain elements will be required in order to create valid EAD, this may get complicated!
  • ‘Manual’ EAD generation - the project team discussed the possibility of adding a button to the AtoM user interface so that staff users can manually kick-off EAD regeneration for a single descriptive hierarchy. Artefactual suggested this as a method of managing the process of EAD generation for large descriptive hierarchies. You would not want the EAD to regenerate with each minor tweak if a large archival description was undergoing several updates, however, you need to be able to trigger this task when you are ready to do so. It should be possible to switch off the automatic EAD re-generation (which normally triggers when a record is edited and saved) but have a button on the interface that staff can click when they want to initiate the process - for example when all edits are complete. 
  • As part of their work on this project, Artefactual created a simple script to help with the process of generating EAD for large descriptive hierarchies - it basically provides a way of finding out which XML files relate to a specific archival description so that EAD can be manually enhanced and updated if it is too large for AtoM to generate via the job scheduler. It would be useful to turn this script into a command-line task that is maintained as part of the AtoM codebase.

We need your help!

Although we believe we have something we can work with here and now, we are not under any illusions that this feature does all that it needs to in order to meet our requirements in the longer term. 

I would love to find out what other AtoM users (and harvesters) think of the feature. Is it useful to you? Are there other things we should put on the wishlist? 

There is a lot of additional work described in this post which the original group of project partners are unlikely to be able to fund on their own. If EAD harvesting is a priority to you and your organisation and you think you can contribute to further work in this area either on your own or as part of a collaborative project please do get in touch.


I’d like to finish with a huge thanks to those organisations who have helped make this project happen, either through sponsorship, development or testing and feedback.

Jason Scott Talks His Way Out of It: A Podcast

Published 14 Sep 2017 by Jason Scott in ASCII by Jason Scott.

Next week I start a podcast.

There’s a Patreon for the podcast with more information here.

Let me unpack a little of the thinking.

Over the last seven years, since I moved back to NY, I’ve had varying experiences of debt or huge costs weighing me down. Previously, I was making some serious income from a unix admin job, and my spending was direct but pretty limited. Since then, even with full-time employment (and I mean, seriously, a dream job), I’ve made some grandiose mistakes with taxes, bills and tracking down old obligations, which means I have some notable costs floating in the background.

Compound that with a new home I’ve moved to with real landlords that aren’t family and a general desire to clean up my life, and I realized I needed some way to make extra money that will just drop directly into the bill pit, never to really pass into my hands.

How, then, to do this?

I work very long hours for the Internet Archive, and I am making a huge difference in the world working for them. It wouldn’t be right or useful for me to take on any other job. I also don’t want to be doing something like making “stuff” that I sell or otherwise speculate into some market. Leave aside that I have these documentaries to finish, and time is short.

Then take into account that I can no longer afford to drop money going to anything other than a small handful of conferences that aren’t local to me (the NY-CT-NJ Tri-State area), and that people really like the presentations I give.

So, I thought, how about me giving basically a presentation once a week? What if I recorded me giving a sort of fireside chat or conversational presentation about subjects I would normally give on the road, but make them into a downloadable podcast? Then, I hope, everyone would be happy: fans get a presentation. I get away from begging for money to pay off debts. I get to refine my speaking skills. And maybe the world gets something fun out of the whole deal.

Enter a podcast, funded by a Patreon.

The title: Jason Talks His Way Out of It, my attempt to write down my debts and share the stories and thoughts I have.

I announced the Patreon on my 47th birthday. Within 24 hours, about 100 people had signed up, paying some small amount (or not small, in some cases) for each published episode. I had a goal of $250/episode to make it worthwhile, and we passed that handily. So it’s happening.

I recorded a prototype episode, and that’s up there, and the first episode of the series drops Monday. These are story-based presentations roughly 30 minutes long apiece, and I will continue to do them as long as it makes sense to.

Public speaking is something I’ve done for many, many years, and I enjoy it, and I get comments that people enjoy them very much. My presentation on That Awesome Time I Was Sued for Two Billion Dollars has passed 800,000 views on the various copies online.

I spent $40 improving my sound setup, which should work for the time being. (I already had a nice microphone and a SSD-based laptop which won’t add sound to the room.) I’m going to have a growing list of topics I’ll work from, and I’ll stay in communication with the patrons.

Let’s see what this brings.

One other thing: Moving to the new home means that a lot of quality of life issues have been fixed, and my goal is to really shoot forward finishing those two documentaries I owe people. I want them done as much as everyone else! And with less looming bills and debts in my life, it’ll be all I want to do.

So, back the new podcast if you’d like. It’ll help a lot.

Does Mediawiki encrypt logins by default as the browser sends them to the server?

Published 11 Sep 2017 by user1258361 in Newest questions tagged mediawiki - Server Fault.

Several searches only turned up questions about encrypting login info on the server side. Does Mediawiki encrypt logins after you type them in the browser and send them? (to prevent a man-in-the-middle from reading them in transit and taking over an account)

The Bounty of the Ted Nelson Junk Mail

Published 9 Sep 2017 by Jason Scott in ASCII by Jason Scott.

At the end of May, I mentioned the Ted Nelson Junk Mail project, where a group of people were scanning in boxes of mailings and pamphlets collected by Ted Nelson and putting them on the Internet Archive. Besides the uniqueness of the content, it was also unique in that we were trying to set it up to be self-sustaining from volunteer monetary contributions, and to compensate the scanners doing the work.

This entire endeavor has been wildly successful.

We are well past 18,000 pages scanned. We have taken in thousands in donations. And we now have three people scanning and one person entering metadata.

Here is the spreadsheet with transparency and donation information.

I highly encourage donating.

But let’s talk about how this collection continues to be amazing.

Always, there are the pure visuals. As we’re scanning away, we’re starting to see trends in what we have, and everything seems to go from the early 1960s to the early 1990s, a 30-year scope that encompasses a lot of companies and a lot of industries. These companies are trying to thrive in a whirlpool of competing attention, especially in certain technical fields, and they try everything from humor to class to rudimentary fear-and-uncertainty plays in the art.

These are exquisitely designed brochures, in many cases – obviously done by a firm or with an in-house group specifically tasked with making the best possible paper invitations and with little expense spared. After all, this might be the only customer-facing communication a company could have about its products, and might be the best convincing literature after the salesman has left or the envelope is opened.

Scanning at 600dpi has been a smart move – you can really zoom in and see detail, find lots to play with or study or copy. Everything is at this level, like this detail about a magnetic eraser that lets you see the lettering on the side.

Going after these companies for gender roles or other out-of-fashion jokes almost feels like punching down, but yeah, there’s a lot of it. Women draped over machines, assumptions that women will be doing the typing, and clunky humor about fulfilling your responsibilities as a (male) boss abounds. Cultural norms regarding what fears reigned in business or how companies were expected to keep on top of the latest trends are baked in there too.

The biggest obstacle going forward, besides bringing attention to this work, is going to be one of findability. The collection is not based on some specific subject matter other than what attracted Ted’s attention over the decades. He tripped lightly among aerospace, lab science, computers, electronics, publishing… nothing escaped his grasp, especially in technical fields.

If people are looking for pure aesthetic beauty, that is, “here’s a drawing of something done in a very old way” or “here are old fonts”, then this bounty is already, at 1,700 items, a treasure trove that could absorb weeks of your time. Just clicking around to items that on first blush seem to have boring title pages will often expand into breathtaking works of art and design.

I’m not worried about that part, frankly – these kind of sell themselves.

But there’s so much more to find among these pages, and as we’re now up to so many examples, it’s going to be a challenge to get researching folks to find them.

We have the keywording active, so you can search for terms like monitor, circuit, or hypercard and get more specific matches without concentrating on what the title says or what graphics appear on the front. The Archive has a full-text search, and so people looking for phrases will no doubt stumble into this collection.

But how easily will people even think to know about a wristwatch for the Macintosh from 1990, a closed circuit camera called the Handy Looky… or this little graphic, nestled away inside a bland software catalog:

…I don’t know. I’ll mention that this is actually twitter-fodder among archivists, who are unhappy when someone is described as “discovering” something in the archives, when it was obvious a person cataloged it and put it there.

But that’s not the case here. Even Kyle, who’s doing the metadata, is doing so in a descriptive fashion, and on a rough day of typing in descriptions, he might not particularly highlight unique gems in the pile (he often does, though). So, if you discover them in there, you really did discover them.

So, the project is deep, delightful, and successful. The main consideration is funding: we are paying the scanners $10/hr to scan, and the metadata work is $15/hr. They work fast and efficiently. We track them on the spreadsheet. But that means a single day of this work can cause a notable bill. We’re asking people on twitter to raise funds, but it never hurts to ask here as well. Consider donating to this project, because we may not know for years how much wonderful history is saved here.

Please share the jewels you find.

Update 1.2.6 released

Published 9 Sep 2017 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

This is a service and security update to the stable version 1.2. It contains some important bug fixes and improvements which we picked from the upstream branch.

See the full changelog in the release notes on the Github download page.

This release is considered stable and we recommend updating all production 1.2.x installations of Roundcube to this version. Download it from

Please do back up your data before updating!

4 Months!

Published 9 Sep 2017 by Jason Scott in ASCII by Jason Scott.

It’s been 4 months since my last post! That’s one busy little Jason summer, to be sure.

Obviously, I’m still around, with no lingering problems from the heart attack. My doctor told me that my heart is basically healed, and he wants more exercise out of me. My diet’s continued to be lots of whole foods, leafy greens and occasional shameful treats that don’t turn into a staple.

I spent a good month working with good friends to clear out the famous Information Cube, sorting out and mailing/driving away all the contents to other institutions, including the Internet Archive, the Strong Museum of Play, the Vintage Computer Federation, and parts worldwide.

I’ve moved homes, no longer living with my brother after seven up-and-down years of siblings sharing a house. It was time! We’re probably not permanently scarred! I love him very much. I now live in an apartment with very specific landlords with rules and an important need to pay them on time each and every month.

To that end, I’ve cut back on my expenses and will continue to, so it’s the end of me “just showing up” to pretty much any conferences that I’m not being compensated for, which will of course cut things down in terms of Jason appearances you can find me at.

I’ll still be making appearances as people ask me to go, of course – I love travel. I’m speaking in Amsterdam in October, as well as being an Emcee at the Internet Archive in October as well. So we’ll see how that goes.

What that means is more media ingestion work, and more work on the remaining two documentaries. I’m going to continue my goal of clearing my commitments before long, so I can choose what I do next.

What follows will be (I hope) lots of entries going deep into some subjects and about what I’m working on, and I thank you for your patience as I was not writing weblog entries while upending my entire life.

To the future!

Godless for God’s Sake: Now available for Kindle for just $5.99

Published 6 Sep 2017 by James Riemermann in


Godless for God’s Sake: Nontheism in Contemporary Quakerism

In this book edited by British Friend and author David Boulton, 27 Quakers from 4 countries and 13 yearly meetings tell how they combine active and committed membership in the Religious Society of Friends with rejection of traditional belief in the existence of a transcendent, personal and supernatural God.

For some, God is no more (but no less) than a symbol of the wholly human values of “mercy, pity, peace and love”. For others, the very idea of God has become an archaism.

Readers who seek a faith free of supernaturalism, whether they are Friends, members of other religious traditions or drop-outs from old-time religion, will find good company among those whose search for an authentic 21st century understanding of religion and spirituality has led them to declare themselves “Godless – for God’s Sake”.


Preface: In the Beginning…

1. For God’s Sake? An Introduction


David Boulton

2. What’s a Nice Nontheist Like You Doing Here?


Robin Alpern

3. Something to Declare


Philip Gross

4. It’s All in the Numbers

Joan D Lucas

5. Chanticleer’s Call: Religion as a Naturalist Views It

Os Cresson

6. Mystery: It’s What we Don’t Know

James T Dooley Riemermann

7. Living the Questions

Sandy Parker

8. Listening to the Kingdom

Bowen Alpern

9. The Making of a Quaker Nontheist Tradition

David Boulton and Os Cresson

10. Facts and Figures

David Rush

11. This is my Story, This is my Song…


Ordering Info

Links to forms for ordering online will be provided here as soon as they are available. In the meantime, contact the organizations listed below, using the book details at the bottom of this page.

QuakerBooks of Friends General Conference

(formerly FGC Bookstore)

1216 Arch St., Ste 2B

Philadelphia, PA 19107

215-561-1700 fax 215-561-0759

(this is the “Universalism” section of Quakerbooks, where the book is currently located)




Quaker Bookshop

173 Euston Rd London NW1 2BJ

020 7663 1030, fax 020 7663 1008


Those outside the United Kingdom and United States should be able to order through a local bookshop, quoting the publishing details below – particularly the ISBN number. In case of difficulty, the book can be ordered direct from the publisher’s address below.

Title: “Godless for God’s Sake: Nontheism in Contemporary Quakerism” (ed. David Boulton)

Publisher: Dales Historical Monographs, Hobsons Farm, Dent, Cumbria LA10 5RF, UK. Tel 015396 25321. Email

Retail price: £9.50 ($18.50). Prices elsewhere to be calculated on UK price plus postage.

Format: Paperback, full colour cover, 152 pages, A5

ISBN number: 0-9511578-6-8 (to be quoted when ordering from any bookshop in the world)

Konversation 2.x in 2018: New user interface, Matrix support, mobile version

Published 5 Sep 2017 by eike hein in blogs.

KDE Project:

It's time to talk about exciting new things in store for the Konversation project!

Konversation is KDE's chat application for communities. No matter whether someone is a newcomer seeking community, a seasoned participant in one, or a community administrator: our mission is to bring groups of people together, allow them to delight in each other's company, and support their pursuit of shared interests and goals.

One of the communities we monitor for changes to your needs is our own: KDE. Few things make a Konversation hacker happier than journeying to an event like Akademy in Almería, Spain and seeing our app run on many screens all around.

The KDE community has recently made progress defining what it wants out of a chat solution in the near future. To us, those initial results align very strongly with Konversation's mission and display a lot of overlap with the things it does well. However, they also highlight trends where the current generation of Konversation falls short, e.g. support for persistence across network jumps, mobile device support and better media/file handling.

This evolution in KDE's needs matches what we're seeing in other communities we cater to. Recently we've started a new development effort to try and answer those needs.

Enter Konversation 2.x

Konversation 2.x R&D mockup screenshot
Obligatory tantalizing sneak preview (click to enlarge)

Konversation 2.x will be deserving of the version bump, revamping the user interface and bringing the application to new platforms. Here's a rundown of our goals:

  • A more modern, cleaner user interface, built using Qt Quick and KDE's Kirigami technology
    • Adopting a responsive window layout, supporting more varied desktop use cases and putting us on a path towards becoming a desktop/mobile convergent application
    • Scaling to more groups with an improved tab switcher featuring better-integrated notifications and mentions
    • Redesigned and thoroughly cleaned-up settings, including often-requested per-tab settings
    • Richer theming, including a night mode and a small selection of popular chat text layouts for different needs
  • Improved media/file handling, including image sharing, a per-tab media gallery, and link previews
  • A reduced resource footprint, using less memory and battery power
  • Support for the Matrix protocol
  • Supporting a KDE-wide Global and Modern Text Input initiative, in particular for emoji input
  • Versions for Plasma Mobile and Android
  • Updating Konversation's web presence

Let's briefly expand on a few of those:


KDE's Kirigami user interface technology helps developers make applications that run well on both desktop and mobile form factors. While still a young project, it's already being put to good use in projects such as Peruse, Calligra Gemini, Gwenview, and others. When we tried it out, Kirigami quickly proved useful to us as well. We've been enjoying a great working relationship with the Kirigami team, with code flowing both ways. Check it out!

Design process

To craft the new user interface, we're collaborating with KDE's Visual Design Group. Within the KDE community, the VDG itself is a driver of new requirements for chat applications (as their collaboration workflows differ substantially from coding contributors). We've been combining our experience listening to many years of user feedback with their design chops, and this has led to an array of design mockups we've been working from so far. This is just the beginning, with many, many details left to hammer out together - we're really grateful for the help! :)


Currently we're focused on bringing more of the new UI online, proving it on top of our robust IRC backend. However, Matrix support will come next. While we have no plans to drop support for IRC, we feel the Matrix protocol has emerged as a credible alternative that retains many of IRC's best qualities while better supporting modern needs (and bridging to IRC). We're excited about what it will let us do and want to become your Matrix client of choice next year!

Work done so far

The screenshot shown above is sort of a functional R&D mockup of where we're headed with the new interface. It runs, it chats - more on how to try it out in a moment - but it's quite incomplete, wonky, and in a state of flux. Here's a few more demonstrations and explorations of what it can do:

Responsive window layout
Responsive window layout: Front-and-center vs. small-and-in-a-corner (click for smoother HD/YouTube)

Toggling settings mode
Friction-free switching to and from settings mode (click for smoother HD/YouTube)

Overlay context sidebar
Overlay context sidebar: Tab settings and media gallery will go here (click to enlarge)

See a gallery with an additional screenshot of the settings mode.

Trying it out

The work is being carried out on the wip/qtquick branch of konversation.git. It needs Qt 5.9 and the master branch of kirigami.git to build and run. We also have a Flatpak nightly package on the way soon, pending sorting out some dependency issues.

Be sure to check out this wiki page with build and testing instructions. You'll learn how to retrieve either the sources or the Flatpak, as well as a number of command line arguments that are key when test-driving.

Sneak preview of great neat-ness: It's possible to toggle between the old and new Konversation UIs at any time using the F10 key. This makes dogfooding at this early stage much more palatable!

Joining the fun

We're just starting out to use this workboard on KDE's Phabricator instance to track and coordinate tasks. Subscribe and participate! Phabricator is also the platform of choice to submit code contributions.

As noted above, Konversation relies on Kirigami and the VDG. Both projects welcome new contributors. Helping them out helps Konversation!

To chat with us, you can stop by the #konversation and #kde-vdg channels on freenode (using IRC or the Matrix bridge). Hop on and introduce yourself!

Side note: The Kirigami team plans to show up in force at the KDE Randa meeting this fall to hack on things the Konversation team is very much interested in, including expanding support for keyboard navigation in Kirigami UI. Check out the Randa fundraising campaign which e.g. enables KDE to bring more devs along, it's really appreciated!

Update 1.3.1 released

Published 3 Sep 2017 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

We just published the first service release to update the stable version 1.3, the result of some touching-up on the new features introduced with the 1.3.0 release. For example, it brings back the double-click behavior to open messages, which had been reduced to the list-only view. And because the switch to change the mail view layout was a bit hidden, we also added it to the preferences section.

The update also includes fixes to reported bugs and one potential XSS vulnerability as well as optimizations to smoothly run on the latest version of PHP.

See the full changelog in the release notes on the Github download page.

This release is considered stable and we recommend updating all production installations of Roundcube to this version. Download it from

Please do back up your data before updating!

MassMessage hits 1,000 commits

Published 28 Aug 2017 by legoktm in The Lego Mirror.

The MassMessage MediaWiki extension hit 1,000 commits today, following an update of the localization messages for the Russian language. MassMessage replaced a Toolserver bot that allowed sending a message to all Wikimedia wikis, by integrating it into MediaWiki and using the job queue. We also added some nice features like input validation and previewing. Through it, I became familiar with different internals of MediaWiki, including submitting a few core patches.

I made my first commit on July 20, 2013. It would get a full rollout to all Wikimedia wikis on November 19, 2013, after a lot of help from MZMcBride, Reedy, Siebrand, Ori, and other MediaWiki developers.

I also mentored User:wctaiwan, who worked on a Google Summer of Code project that added a ContentHandler backend to the extension, to make it easier for people to create and maintain page lists. You can see it used by The Wikipedia Signpost's subscription list.

It's still a bit crazy to think that I've been hacking on MediaWiki for over four years now, and how much it has changed my life in that much time. So here's to the next four years and next 1,000 commits to MassMessage!

Requiring HTTPS for my Toolforge tools

Published 26 Aug 2017 by legoktm in The Lego Mirror.

My Toolforge (formerly "Tool Labs") tools will now start requiring HTTPS, and redirecting any HTTP traffic. It's a little bit of common code for each tool, so I put it in a shared "toolforge" library.

from flask import Flask
import toolforge

app = Flask(__name__)
# Register the shared library's redirect hook so plain-HTTP requests
# are answered with a 302 to their HTTPS equivalent.
app.before_request(toolforge.redirect_to_https)
And that's it! Your tool will automatically be HTTPS-only now.

$ curl -I ""
HTTP/1.1 302 FOUND
Server: nginx/1.11.13
Date: Sat, 26 Aug 2017 07:58:39 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 281
Connection: keep-alive
X-Clacks-Overhead: GNU Terry Pratchett
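The shared helper amounts to very little code. As an illustration of the idea (a sketch, not the toolforge library's actual implementation), here is a framework-agnostic version written as stdlib-only WSGI middleware; the `require_https` name and the reliance on the `X-Forwarded-Proto` header set by the front proxy are assumptions for this sketch:

```python
def require_https(app):
    """WSGI middleware: 302-redirect plain-HTTP requests to HTTPS.

    Assumes the front proxy reports the original scheme via the
    X-Forwarded-Proto header, as reverse proxies commonly do.
    """
    def wrapper(environ, start_response):
        scheme = environ.get('HTTP_X_FORWARDED_PROTO',
                             environ.get('wsgi.url_scheme', 'http'))
        if scheme == 'http':
            host = environ.get('HTTP_HOST', 'localhost')
            location = 'https://' + host + environ.get('PATH_INFO', '/')
            start_response('302 FOUND', [('Location', location)])
            return [b'']
        # Already HTTPS: pass the request through untouched.
        return app(environ, start_response)
    return wrapper
```

Wrapping any WSGI app (Flask apps included, via `app.wsgi_app`) with this produces exactly the kind of 302 response shown in the curl output above.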

My DebConf 17 presentation - Bringing MediaWiki back into Debian

Published 25 Aug 2017 by legoktm in The Lego Mirror.

Full quality video available on Wikimedia Commons, as well as the slides.

I had a blast attending DebConf '17 in Montreal, and presented about my efforts to bring back MediaWiki into Debian. The talks I went to were all fantastic, and got to meet some amazing people. But the best parts about the conference was the laid-back atmosphere and the food. I've never been to another conference that had food that comes even close to DebConf.

Feeling very motivated, I have three new packages in the pipeline: LuaSandbox, uprightdiff, and libkiwix.

I hope to be at DebConf again next year!

Benchmarking with the NDSA Levels of Preservation

Published 18 Aug 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Anyone who has heard me talk about digital preservation will know that I am a big fan of the NDSA Levels of Preservation.

This is also pretty obvious if you visit me in my office – a print out of the NDSA Levels is pinned to the notice board above my PC monitor!

When talking to students and peers about how to get started in digital preservation in a logical, pragmatic and iterative way, I always recommend using the NDSA Levels to get started. Start at level 1 and move forward to the more advanced levels as and when you are able. This is a much more accessible and simple way to start addressing digital preservation than digesting some of the bigger and more complex certification standards and benchmarking tools.

Over the last few months I have been doing a lot of documentation work. Both ensuring that our digital archiving procedures are written down somewhere and documenting where we are going in the future.

As part of this documentation it seemed like a good idea to use the NDSA Levels:

Previously I have used the NDSA Levels in quite a superficial way – as a guide and a talking point, it has been quite a different exercise actually mapping where we stand.

It was not always straightforward to establish where we are and to unpick and interpret exactly what each level meant in practice. I guess this is one of the problems of using a relatively simple set of metrics to describe what is really quite a complex set of processes.

Without publishing the whole document that I've written on this, here is a summary of where I think we are currently. I'm also including some questions I've been grappling with as part of the process.

Storage and geographic location

Currently at LEVEL 2: 'know your data' with some elements of LEVEL 3 and 4 in place

See the full NDSA levels here

Four years ago we carried out a ‘rescue mission’ to get all digital data in the archives off portable media and on to the digital archive filestore. This now happens as a matter of course when born digital media is received by the archives.

The data isn’t in what I would call a proper digital archive but it is on a fairly well locked down area of University of York filestore.

There are three copies of the data available at any one time (not including the copy that is on original media within the strongrooms). The University stores two copies of the data on spinning disk. One at a data centre on one campus and the other at a data centre on another campus with another copy backed up to tape which is kept for 90 days.

I think I can argue that storage of the data on two different campuses counts as two different geographic locations, but these locations are both in York and only about 1 mile apart. I'm not sure whether they could be described as having different disaster threats, so I'm going to hold back from putting us at Level 3, though IT do seem to have systems in place to ensure that filestore is migrated on a regular schedule.


File fixity and data integrity

Currently at LEVEL 4: 'repair your data'

See the full NDSA levels here

Having been in this job for five years now, I can say with confidence that I have never once received file fixity information alongside data that has been submitted to us. Obviously if I did receive it I would check it on ingest, but I cannot envisage this scenario occurring in the near future! I do however create fixity information for all content as part of the ingest process.

I use a tool called Foldermatch to ensure that the digital data I have copied into the archive is identical to the original. Foldermatch allows you to compare the contents of two folders and one of the comparison methods (the one I use at ingest) uses checksums to do this.

Last year I purchased a write blocker for use when working with digital content delivered to us on portable hard drives and memory sticks. A check for viruses is carried out on all content that is ingested into the digital archive so this fulfills the requirements of level 2 and some of level 3.

Despite putting us at Level 4, I am still very keen to improve our processes and procedures around fixity. Fixity checks are carried out at intervals (several times a month) and these checks are logged but at the moment this is all initiated manually. As the digital archive gets bigger, we will need to re-think our approaches to this important area and find solutions that are scalable.
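The manual checks described above can be made scriptable (and eventually schedulable) with a checksum manifest comparison, which is essentially what a Foldermatch checksum comparison does. A minimal sketch in Python; the function names are my own for illustration, not part of any existing tool:

```python
import hashlib
from pathlib import Path

def checksum_manifest(folder):
    """Build a {relative path: SHA-256 digest} manifest for every file
    under a folder, walking subdirectories recursively."""
    manifest = {}
    root = Path(folder)
    for path in sorted(root.rglob('*')):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(root))] = digest
    return manifest

def folders_match(original, copy):
    """True when both folders contain the same files with identical
    checksums - a scripted equivalent of a Foldermatch comparison."""
    return checksum_manifest(original) == checksum_manifest(copy)
```

Storing the manifest alongside the archive would also allow later scheduled fixity checks to compare current checksums against those recorded at ingest, which is one route to making the process scale as the digital archive grows.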


Information Security

Currently at LEVEL 2: 'know your data' with some elements of LEVEL 3 in place

See the full NDSA levels here

Access to the digital archive filestore is limited to the digital archivist and IT staff who administer the filestore. If staff or others need to see copies of data within the digital archive filestore, copies are made elsewhere after appropriate checks are made regarding access permissions. The master copy is always kept on the digital archive filestore to ensure that the authentic original version of the data is maintained. Access restrictions are documented.

We are also moving towards the higher levels here. A recent issue reported on a mysterious change of last modified dates for .eml files has led to discussions with colleagues in IT, and I have been informed that an operating system upgrade for the server should include the ability to provide logs of who has done what to files in the archive.

It is worth pointing out that, as I don't currently have systems in place for recording PREMIS (preservation) metadata, I am currently taking a hands-off approach to preservation planning within the digital archive. Preservation actions such as file migration are few and far between and are recorded in a temporary way until a more robust system is established.


Metadata

Currently at LEVEL 3: 'monitor your data'

See the full NDSA levels here

We do OK with metadata currently (considering a full preservation system is not yet in place). Using DROID at ingest is helpful in fulfilling some of the requirements of levels 1 to 3 (essentially, having a record of what was received and where it is).

Our implementation of AtoM as our archival management system has helped fulfil some of the other metadata requirements. It gives us a place to store administrative metadata (who gave it to us and when) as well as providing a platform to surface descriptive metadata about the digital archives that we hold.

Whether or not we actually have descriptive metadata for digital archives will remain an issue. Much metadata for the digital archive can be generated automatically, but descriptive metadata isn't quite as straightforward. In some cases a basic listing is created for files within the digital archive (using Dublin Core as a framework) but this will not happen in all cases. Descriptive metadata typically will not be created until an archive is catalogued, which may come at a later date.

Our plans to implement Archivematica next year will help us get to Level 4 as this will create full preservation metadata for us as PREMIS.


File formats

Currently at LEVEL 2: 'know your data' with some elements of LEVEL 3 in place

See the full NDSA levels here

It took me a while to convince myself that we fulfilled Level 1 here! This is a pretty hard one to crack, especially if you have lots of different archives coming in from different sources, and sometimes with little notice. I think it is useful that the requirement at this level is prefaced with "When you can..."!

Thinking about it, we do do some work in this area - for example:

To get us to Level 2, as part of the ingest process we run DROID to get a list of file formats included within a digital archive. Summary stats are kept within a spreadsheet that covers all content within the digital archive so we can quickly see the range of formats that we hold and find out which archives they are in.
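The kind of summary described above can be pulled straight out of a DROID CSV export with a few lines of Python. This is a hypothetical sketch, not our actual workflow; the 'TYPE', 'PUID' and 'FORMAT_NAME' column names are assumptions based on recent DROID export profiles and may need adjusting:

```python
import csv
from collections import Counter

def format_summary(droid_csv_path):
    """Tally file formats from a DROID CSV export.

    Assumes the export contains 'TYPE', 'PUID' and 'FORMAT_NAME' columns;
    adjust the field names if your DROID profile differs.
    """
    counts = Counter()
    with open(droid_csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get("TYPE") == "Folder":
                continue  # count files only, not directories
            counts[(row.get("PUID", ""), row.get("FORMAT_NAME", ""))] += 1
    return counts
```

Running something like this per archive, and appending the totals to the spreadsheet, gives the quick range-of-formats view mentioned above.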

This should allow us to move towards Level 3 but we are not there yet. Some pretty informal and fairly ad hoc thinking goes into file format obsolescence but I won't go so far as to say that we 'monitor' it. I have an awareness of some specific areas of concern in terms of obsolete files (for example I've still got those WordStar 4.0 files and I really do want to do something with them!) but there are no doubt other formats that need attention that haven't hit my radar yet.

As mentioned earlier, we are not really doing migration right now - not until I have a better system for creating the PREMIS metadata, so Level 4 is still out of reach.



This has been a useful exercise and it is good to see where we need to progress. Going from using the Levels in the abstract to actually trying to apply them as a tool has been a bit challenging in some areas. I think additional information and examples would be useful to help clear up some of the questions that I have raised.

I've also found that even where we meet a level there are often other ways we could do things better. File fixity and data integrity looks like a strong area for us, but I am all too aware that I would like to find a more sustainable and scalable way to do this. This is something we'll be working on as we get Archivematica in place. Reaching Level 4 shouldn't lead to complacency!

An interesting blog post last year by Shira Peltzman from the UCLA Library talked about Expanding the NDSA Levels of Preservation to include an additional row focused on Access. This seems sensible given that the ability to provide access is the reason why we preserve archives. I would be keen to see this developed further so long as the bar wasn't set too high. At the Borthwick my initial consideration has been preservation - getting the stuff and keeping it safe - but access is something that will be addressed over the next couple of years as we move forward with our plans for Archivematica and AtoM.

Has anyone else assessed themselves against the NDSA Levels?  I would be keen to see how others have interpreted the requirements.

Botanical Wonderland events

Published 18 Aug 2017 by carinamm in State Library of Western Australia Blog.

From pressed seaweed, to wildflower painting, embroidery, to photography – botanical wonders have inspired and defined Western Australia. Hear from art historian, author, artist and curator Dr Dorothy Erickson in two events at the State Library of Western Australia.


Lecture: Professional women Artists in the Wildflower State by Dr Dorothy Erickson
Wednesday 23 August 2017 – 5:00-6:00 pm
Great Southern Room – State Library of Western Australia
Free. No bookings required

The first profession it was acceptable for middle-class women to practise was as an artist. They were the 'Angels in the Studio' at the time when gold was first being found in Western Australia. While a few Western Australian-born women were trained artists, many others came in the wake of the gold rushes, when Western Australia was the world's El Dorado. A number were entranced by the unique wildflowers and made them the mainstay of their careers. This talk will focus on the professional women artists in Western Australia from 1890 to WWI, with particular attention to those who painted our unique botanical wonderland.


Lilian Wooster Greaves was a prolific Western Australian wildflower artist: "no one else seems to be able to equal her skill in pressing and mounting wildflower specimens, in the form of panels, cards and booklets" – The West Australian, 21 May 1927. Portrait of Lilian Wooster Greaves from Out of Doors in WA, 1927, State Library of Western Australia 821A(W)GRE.

Floor Talk on Botanical Wonderland exhibition with Dr Dorothy Erickson
Friday 1 September 2017  – 1:00-1:30 pm
The Nook – State Library of Western Australia
Free. No bookings required.

Be inspired by the botanical wonders of Western Australia as Australian artist Dr Dorothy Erickson discusses some of the marvels on display in the exhibition.


Nature’s Showground, 1940. The Western Mail, State Library of Western Australia, 630.5WES.

Botanical Wonderland is a partnership between the Royal Western Australian Historical Society, the Western Australian Museum and the State Library of Western Australia. The exhibition is on display at the State Library until 24 September 2017.

Image: Acc 9131A/4: Lilian Wooster Greaves, pressed wildflower artwork, ‘Westralia’s Wonderful Wildflowers’, c1929

Running applications and unittests without "make install"

Published 15 Aug 2017 by dfaure in blogs.

KDE Project:

In our Akademy presentation, Kévin and I showed how important it is for a better developer story to be able to work on a KDE module without having to install it. It turns out that running unittests and running applications without installing the module at all is possible; it just needs a bit of effort to set things up correctly.

Once you require ECM version 5.38 (using find_package(ECM 5.38)), your libraries, plugins and executables will all go to the builddir's "bin" directory, instead of being built in the builddir where they are defined.
Remember to wipe out your builddir first, to avoid running outdated unit tests!
This change helps locating helper binaries, and plugins (depending on how they are loaded).

After doing that, see if this works:

  • make uninstall
  • ctest . (or run the application)

Oops, usually it doesn't work. Here's what you might have to do to fix things.

  • XMLGUI files: since KDE Frameworks 5.4, they can be embedded into a qrc file so that they can be found without being installed.
    The qrc should put the xmlgui file under ":/kxmlgui5/". You can use the script kde-dev-scripts/kf5/ to automate most of this change.
  • Uninstalled plugins can be found at runtime if they are installed into the same subdir of the "bin" dir as they will be in their final destination. For instance, the cmake line install(TARGETS kio_file DESTINATION ${KDE_INSTALL_PLUGINDIR}/kf5/kio) indicates that you want the uninstalled plugin to be in builddir/bin/kf5/kio, which can be done with the following line:
    set_target_properties(kio_file PROPERTIES LIBRARY_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/bin/kf5/kio")
    Qt uses the executable's current directory as one of the search paths for plugins, so this then works out of the box.
  • If ctest complains that it can't find the unittest executable, the fix is very simple: instead of the old syntax add_test(testname myexec) you want to use the newer syntax add_test(NAME testname COMMAND myexec)
  • Helper binaries for libraries: look for them locally first. Example from KIO:
    QString kioexec = QCoreApplication::applicationDirPath() + "/kioexec";
    if (!QFileInfo::exists(kioexec))
        kioexec = CMAKE_INSTALL_FULL_LIBEXECDIR_KF5 "/kioexec"; // this was the original line of code
  • Helper binaries for unittests: an easy solution is to just change the current directory to the bin dir, so that ./myhelper continues to work. This can be done with QDir::setCurrent(QCoreApplication::applicationDirPath());

There are two issues I haven't solved yet: trader queries that should find uninstalled desktop files, and QML components, like in kirigami. It seems that the only solution for the latter is to reorganize the source dir to have the expected layout "org/kde/kirigami.2/*"?

Update: this howto is now a wiki page.

Archival software survey

Published 8 Aug 2017 by inthemailbox in In the mailbox.

A few months ago, I asked my colleagues in the Archives Live 'Archives and recordkeeping software' group to undertake a short survey for me, looking at archival description and management systems in use in Australia. I used the free SurveyMonkey site (ten simple questions) and promoted the survey on the Archives Live site and via my personal Twitter account. I got 39 responses from a possible pool of 230 members in a four-week period.

The majority of respondents worked in a combination archive, taking both transfers from inhouse records creators as well as accepting donations or purchasing material for their collections (58.97%).  Small archives, with 2-4 staff (qualifications not specified), were slightly ahead of lone arrangers (48.7% and 30.7%). 11 were school archives and 7 from universities. There was a smattering of religious institutions, local council collections and government institutions, plus a couple of companies who held archives of their business.

Most archivists said they could use Excel and Word (92%), so it is not surprising that 25.6% of them created finding aids and other archival aids using Word documents and spreadsheets. However, the majority of finding aids are created using online systems and archive management software.

Software identified in responses to the survey included:

Both Tabularium and Archive Manager were created here in Australia and have good compliance with the Australian series system. Tabularium was created by David Roberts and distributed by State Records NSW; however, it is no longer maintained. Archive Manager was created for use with Windows PCs, and has recently been sold to the UK.

In looking at new software requirements, respondents expressed a remarkable degree of frustration with old, clunky software which was not properly maintained or could not be easily updated, either by themselves or by a provider. Ease of use, the ability to make collection content available online, and the ability to integrate digital components and work with an EDRMS or other records management system were all identified as requirements for a modern archival management system. Concerns were raised about making donor and other personal and confidential information available, so some degree of authority control and viewing permissions was also required.

Whether one system can meet all these requirements is yet to be seen. It may be better to focus on a range of systems that have some degree of interoperability and on standards for transferring data from one to the other. Either way, archivists in Australia are eager and ready to embrace new ways of working and for a new generation of archival software.



The mysterious case of the changed last modified dates

Published 31 Jul 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Today's blog post is effectively a mystery story.

Like any good story it has a beginning (the problem is discovered, the digital archive is temporarily thrown into chaos), a middle (attempts are made to solve the mystery and make things better, several different avenues are explored) and an end (the digital preservation community come to my aid).

This story has a happy ending (hooray) but also includes some food for thought (all the best stories do) and as always I'd be very pleased to hear what you think.

The beginning

I have probably mentioned before that I don't have a full digital archive in place just yet. While I work towards a bigger and better solution, I have a set of temporary procedures in place to ingest digital archives on to what is effectively a piece of locked down university filestore. The procedures and workflows are both 'better than nothing' and 'good enough' as a temporary measure and actually appear to take us pretty much up to Level 2 of the NDSA Levels of Preservation (and beyond in some places).

One of the ways I ensure that all is well in the little bit of filestore that I call 'The Digital Archive' is to run frequent integrity checks over the data, using a free checksum utility. Checksums (effectively unique digital fingerprints) for each file in the digital archive are created when content is ingested and these are checked periodically to ensure that nothing has changed. IT keep back-ups of the filestore for a period of three months, so as long as this integrity checking happens within this three month period (in reality I actually do this 3 or 4 times a month) then problems can be rectified and digital preservation nirvana can be seamlessly restored.
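The core of that fixity-checking loop is simple enough to sketch in Python. This is an illustration of the general technique rather than the actual utility in use; the SHA-256 algorithm and the manifest structure are assumptions:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute a SHA-256 checksum, reading in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest):
    """Compare current checksums against those recorded at ingest.

    `manifest` maps each file path to the checksum recorded when the file
    entered the archive; any mismatch is a fixity failure to investigate.
    """
    failures = []
    for path, recorded in manifest.items():
        if sha256_of(path) != recorded:
            failures.append(path)
    return failures
```

The important operational point is the one made above: the check has to run often enough that a failure can still be repaired from backup.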

Checksum checking is normally quite dull. Thankfully it is an automated process that runs in the background and I can just get on with my work and cheer when I get a notification that tells me all is well. Generally all is well, it is very rare that any errors are highlighted - when that happens I blog about it!

I have perhaps naively believed for some time that I'm doing everything I need to do to keep those files safe and unchanged - because if the checksum is the same then all is well. However, this month I encountered a problem...

I've been doing some tidying of the digital archive structure and alongside this have been gathering a bit of data about the archives, specifically looking at things like file formats, number of unidentified files and last modified dates.

Whilst doing this I noticed that one of the archives that I had received in 2013 contained 26 files with a last modified date of 18th January 2017 at 09:53. How could this be so if I have been looking after these files carefully and the checksums are the same as they were when the files were deposited?

The 26 files were all EML files - email messages exported from Microsoft Outlook. These were the only EML files within the whole digital archive. The files weren't all in the same directory and other files sitting in those directories retained their original last modified dates.

The middle

So this was all a bit strange...and worrying too. Am I doing my job properly? Is this something I should be bringing to the supportive environment of the DPC's Fail Club?

The last modified dates of files are important to us as digital archivists. This is part of the metadata that comes with a file. It tells us something about the file. If we lose this date are we losing a little piece of the authentic digital object that we are trying to preserve?

Instead of beating myself up about it I wanted to do three things:

  1. Solve the mystery (find out what happened and why)
  2. See if I could fix it
  3. Stop it happening again

So how could it have happened? Has someone tampered with these 26 files? Perhaps unlikely considering they all have the exact same date/time stamp, which to me suggests a more automated process. Also, the digital archive isn't widely accessible. Quite deliberately it is only really me (and the filestore administrators) who have access.

I asked IT whether they could explain it. Had some process been carried out across all filestores that involved EML files specifically? They couldn't think of a reason why this may have occurred. They also confirmed my suspicions that we have no backups of the files with the original last modified dates.

I spoke to a digital forensics expert from the Computer Science department and he said he could analyse the files for me and see if he could work out what had acted on them and also suggest a methodology of restoring the dates.

I have a record of the last modified dates of these 26 files when they arrived - the checksum tool that I use writes the last modified date to the hash file it creates. I wondered whether manually changing the last modified dates back to what they were originally was the right thing to do or whether I should just accept and record the change.

...but I decided to sit on it until I understood the problem better.
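For what it's worth, the mechanics of restoring a date are the easy part; the hard part is the archival decision. A hypothetical Python sketch, assuming the original timestamp has been recorded somewhere (such as the hash file) in ISO format, which is an assumption about the tool's output:

```python
import os
from datetime import datetime, timezone

def restore_mtime(path, recorded_iso):
    """Reset a file's last modified date to a previously recorded value.

    `recorded_iso` is the timestamp recorded at ingest, e.g. from the hash
    file written by the checksum tool; the ISO format and UTC interpretation
    are assumptions for this illustration.
    """
    dt = datetime.fromisoformat(recorded_iso).replace(tzinfo=timezone.utc)
    st = os.stat(path)
    # os.utime takes (atime, mtime); keep the current access time.
    os.utime(path, (st.st_atime, dt.timestamp()))
```

Of course, the very ease of this is part of the problem discussed later: if dates can be reset this simply, they can also be fudged.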

The end

I threw the question out to the digital preservation community on Twitter and as usual I was not disappointed!

In fact, along with a whole load of discussion and debate, Andy Jackson was able to track down what appears to be the cause of the problem.

He very helpfully pointed me to a thread on StackExchange which described the issue I was seeing.

It was a great comfort to discover that the cause of this problem was apparently a bug and not something more sinister. It appears I am not alone!

...but what now?

So now I think I know what caused the problem, but questions remain around how to catch issues like this more quickly (not six months after it has happened) and what to do with the files themselves.

IT have mentioned to me that an OS upgrade may provide us with better auditing support on the filestore. Being able to view reports on changes made to digital objects within the digital archive would be potentially very useful (though perhaps even that wouldn't have picked up this Windows bug?). I'm also exploring whether I can make particular directories read only and whether that would stop issues such as this occurring in the future.

If anyone knows of any other tools that can help, please let me know.

The other decision to make is what to do with the files themselves. Should I try and fix them? More interesting debate on Twitter on this topic and even on the value of these dates in the first place. If we can fudge them then so can others - they may have already been fudged before they got to the digital archive - in which case, how much value do they really have?

So should we try and fix last modified dates, or should we focus our attention on capturing and storing them within the metadata? The latter may be a more sustainable solution in the longer term, given their slightly slippery nature!

I know there are lots of people interested in this topic - just see this recent blog post by Sarah Mason and in particular the comments - When was that?: Maintaining or changing ‘created’ and ‘last modified’ dates. It is great that we are talking about real nuts and bolts of digital preservation and that there are so many people willing to share their thoughts with the community.

...and perhaps if you have EML files in your digital archive you should check them too!

Roundup: Welcome, on news, bad tools and great tools

Published 28 Jul 2017 by Carlos Fenollosa in Carlos Fenollosa — Blog.

I'm starting a series of posts with a summary of the most interesting links I found. The concept of "social bookmarks" has always been interesting, but no implementation is perfect. was probably the closest to a good enough service, but in the end, we all just post them to Twitter and Facebook for shares and likes.

Unfortunately, Twitter search sucks, and browser bookmarks rot quickly. That's why I'm trying this new model of social + local, not only for my readers but also for myself. Furthermore, writing a tapas-sized post is much faster than a well-thought one.

Hopefully, forcing myself to post periodically —no promises, though— will encourage me to write regular articles sometimes.

Anyway, these posts will try to organize links I post on my Twitter account and provide a bit more of context.

While other friends publish newsletters, I still believe RSS can work well, so subscribe to the RSS if you want to get these updates. Another option is to use one of the services which deliver feeds by email, like Feenbox which, by the way, may never leave alpha, so drop me an email if you want an invitation.


RTVE, the Spanish public TV, has uploaded a few Bit a bit episodes. It was a rad early-90s show that presented video games and the early Internet.

On news

I quit reading news 3 years ago. A recent article from Tobias Rose-Stockwell digs deep into how your fear and outrage are being sold for profit by the Media.

@xurxof recommended a 2012 article from Rolf Dobelli, Avoid News. Towards a Healthy News Diet

LTE > Fiber

I was having router issues and realized how my cellphone internet is sometimes more reliable than my home fiber.

It seems to be more common than you'd think - read the Twitter replies! XKCD also recently posted a comic on this.


There was a discussion on tools to journal your workday, which was one of the reasons that led me to try out these roundup posts.

New keyboard

I bought a Matias Clicky mechanical keyboard which sounds like a minigun. For all those interested in mechanical keyboards, you must watch Thomas's YouTube channel.

The new board doesn't have a nav cluster, so I configured Ctrl-HJKL to be the arrow keys. It takes a few days to get used to, but since then I've been using that combination even when I'm using a keyboard with arrow keys.

Slack eats CPU cycles

Slack was eating a fair amount of my CPU while my laptop was trying to build a Docker image and sync 3000 files on Dropbox. Matthew O'Riordan also wrote Where’s all my CPU and memory gone? The answer: Slack

Focus, focus, focus!

I'm a subscriber and use it regularly, especially when I'm working on the train or in a busy cafe.

musicForProgramming() is a free resource with a variety of music and also provides a podcast feed for updates.

Tags: roundup

Comments? Tweet  

My letter to the Boy Scouts of America

Published 25 Jul 2017 by legoktm in The Lego Mirror.

The following is a letter I just mailed to the Boy Scouts of America, following President Donald Trump's speech at the National Jamboree. I implore my fellow scouts to also contact the BSA to express their feelings.

25 July 2017

Boy Scouts of America
PO Box 152079
Irving, TX

Dear Boy Scouts of America,

Like many others I was extremely disappointed and disgusted to hear about the contents of President Donald Trump’s speech to the National Jamboree. Politics aside, I have no qualms with inviting the president, or having him speak to scouts. I was glad that some of the Eagle Scouts currently serving at high levels of our government were recognized for their accomplishments.

However above all, the Boy Scouts of America must adhere to the values of the Scout Law, and it was plainly obvious that the president’s speech did not. Insulting opponents is not “kindness”. Threatening to fire a colleague is not “loyal”. Encouraging boos of a former President is not “courteous”. Talking about fake news and media is not “trustworthy”. At the end of the day, the values of the Scout Law are the most important lesson we must instill in our youth – and President Trump showed the opposite.

The Boy Scouts of America must send a strong message to the public, and most importantly the young scouts that were present, that the president’s speech was not acceptable and does not embody the principles of the Boy Scouts of America.

I will continue to speak well of scouting and the program to all, but incidents like this will only harm future boys who will be dissuaded from joining the organization in the first place.

Kunal Mehta
Eagle Scout, 2012
Troop 294
San Jose, CA

How do I get my MediaWiki site to use templates? [closed]

Published 21 Jul 2017 by Cyberherbalist in Newest questions tagged mediawiki - Webmasters Stack Exchange.

My MediaWiki site is currently using v1.24.4.

I don't seem to have many templates installed, and some very important ones seem to be missing. For example, I can't use the Reference List template. If I do put references in an article, with {{reflist}} at the bottom, the template comes across as a redlink:


Are templates something that have to be installed separately? And if so, how do I go about it?

My site is hosted by DreamHost.

Building the Lego Saturn V rocket 48 years after the moon landing

Published 20 Jul 2017 by legoktm in The Lego Mirror.

Full quality video available on Wikimedia Commons.

On this day 48 years ago, three astronauts landed on the moon after flying there in a Saturn V rocket.

Today I spent four hours building the Lego Saturn V rocket - the largest Lego model I've ever built. Throughout the process I was constantly impressed with the design of the rocket, and how it all came together. The attention paid to the little details is outstanding, and made it such a rewarding experience. If you can find a place that has them in stock, get one. It's entirely worth it.

The rocket is designed to be separated into the individual stages, and the lander actually fits inside the rocket. Vertically, it's 3ft, and comes with three stands so you can show it off horizontally.

As a side project, I also created a timelapse of the entire build, using some pretty cool tools. After searching online how to have my DSLR take photos on a set interval and being frustrated with all the examples that used a TI-84 calculator, I stumbled upon gphoto2, which lets you control digital cameras. I ended up using a command as simple as gphoto2 --capture-image-and-download -I 30 to have it take and save photos every 30 seconds. The only negative part is that it absolutely killed the camera's battery, and within an hour I needed to switch the battery.

To stitch the photos together (after renaming them a bit), ffmpeg came to the rescue: ffmpeg -r 20 -i "%04d.jpg" -s hd1080 -vcodec libx264 time-lapse.mp4. Pretty simple in the end!
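That renaming step can be scripted too. A small sketch, assuming the camera's filenames sort into shooting order (the function name and extension handling are my own, not part of either tool):

```python
import os

def rename_sequential(directory, ext=".jpg"):
    """Rename photos to 0001.jpg, 0002.jpg, ... so ffmpeg's %04d pattern matches.

    Assumes the original filenames (e.g. IMG_0001.JPG) sort into the order
    the frames were shot.
    """
    frames = sorted(
        f for f in os.listdir(directory) if f.lower().endswith(ext)
    )
    for i, name in enumerate(frames, start=1):
        os.rename(
            os.path.join(directory, name),
            os.path.join(directory, f"{i:04d}{ext}"),
        )
```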

Song Club Showcase

Published 14 Jul 2017 by Dave Robertson in Dave Robertson.

While the finishing touches are being put on the album, I'm going solo with other Freo songwriters at the Fib.



Wikidata Map July 2017

Published 11 Jul 2017 by addshore in Addshore.

It’s been 9 months since my last Wikidata map update and once again we have many new noticeable areas appearing, including Norway, South Africa, Peru and New Zealand to name but a few. As with the last map generation post I once again created a diff image so that the areas of change are easily identifiable, comparing the data from July 2017 with that from my last post in October 2016.

The various sizes of the generated maps can be found on Wikimedia Commons:

Reasons for increases

If you want to have a shot at figuring out the cause of the increases in specific areas then take a look at my method described in the last post using the Wikidata Query Service.

People's discoveries so far:

I haven’t included the names of those that discovered reasons for areas of increase above, but if you find your discovery here and want credit just ask!

Preserving Google docs - decisions and a way forward

Published 7 Jul 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Back in April I blogged about some work I had been doing around finding a suitable export (and ultimately preservation) format for Google documents.

This post has generated a lot of interest and I've had some great comments both on the post itself and via Twitter.

I was also able to take advantage of a slot I had been given at last week's Jisc Research Data Network event to introduce the issue to the audience (who had really come to hear me talk about something else but I don't think they minded).

There were lots of questions and discussion at the end of this session, mostly focused on the Google Drive issue rather than the rest of the talk. I was really pleased to see that the topic had made people think. In a lightning talk later that day, William Kilbride, Executive Director of the Digital Preservation Coalition, mused on the subject of "What is data?". Google Drive was one of the examples he used, asking where does the data end and the software application start?

I just wanted to write a quick update on a couple of things - decisions that have been made as a result of this work and attempts to move the issue forward.

Decisions decisions

I took a summary of the Google docs data export work to my colleagues in a Research Data Management meeting last month in order to discuss a practical way forward for the institutional research data we are planning on capturing and preserving.

One element of the Proof of Concept that we had established at the end of phase 3 of Filling the Digital Preservation Gap was a deposit form to allow researchers to deposit data to the Research Data York service.

As well as the ability to enable researchers to browse and select a file or a folder on their computer or network, this deposit form also included a button to allow deposit to be carried out via Google Drive.

As I mentioned in a previous post, Google Drive is widely used at our institution. It is clear that many researchers are using Google Drive to collect, create and analyse their research data so it made sense to provide an easy way for them to deposit direct from Google Drive. I just needed to check out the export options and decide which one we should support as part of this automated export.

However, given the inconclusive findings of my research into export options it didn't seem that there was one clear option that adequately preserved the data.

As a group we decided the best way out of this imperfect situation was to ask researchers to export their own data from Google Drive in whatever format they consider best captures the significant properties of the item. By exporting themselves in a manual fashion prior to upload, this does give them the opportunity to review and check their files and make their own decision on issues such as whether comments are included in the version of their data that they upload to Research Data York.

So for the time being we are disabling the Google Drive upload button from our data deposit interface... which is a shame, because a certain amount of effort went into getting that working in the first place.

This is the right decision for the time being though. Two things need to happen before we can make this available again:

  1. Understanding the use case - We need to gain a greater understanding of how researchers use Google Drive and what they consider to be 'significant' about their native Google Drive files.
  2. Improving the technology - We need to make some requests to Google to make the export options better.

Understanding the use case

We've known for a while that some researchers use Google Drive to store their research data. The graphic below was taken from a survey we carried out with researchers in 2013 to find out about current practice across the institution. 

Of the 188 researchers who answered the question "Where is your digital research data stored (excluding back up copies)?" 22 mentioned Google Drive. This is only around 12% of respondents but I would speculate that over the last four years, use of Google Drive will have increased considerably as Google applications have become more embedded within the working practices of staff and students at the University.

Where is your digital research data stored (excluding back up copies)?

To understand the Google Drive use case today I really need to talk to researchers.

We've run a couple of Research Data Management teaching sessions over the last term. These sessions are typically attended by PhD students but occasionally a member of research staff also comes along. When we talk about data storage I've been asking the researchers to give a show of hands as to who is using Google Drive to store at least some of their research data.

About half of the researchers in the room raise their hand.

So this is a real issue. 

Of course what I'd like to do is find out exactly how they are using it: whether they are creating native Google Drive files or just using Google Drive as a storage location or filing system for data that they create in another application.

I did manage to get a bit more detail from one researcher who said that they used Google Drive as a way of collaborating on their research with colleagues working at another institution but that once a document has been completed they will export the data out of Google Drive for storage elsewhere. 

This fits well with the solution described above.

I also arranged a meeting with a Researcher in our BioArCh department. Professor Matthew Collins is known to be an enthusiastic user of Google Drive.

Talking to Matthew gave me a really interesting perspective on Google Drive. For him it has become an essential research tool. He and his colleagues use many of the features of the Google Suite of tools for their day to day work and as a means to collaborate and share ideas and resources, both internally and with researchers in other institutions. He showed me PaperPile, an extension to Google Drive that I had not been aware of. He uses this to manage his references and share them with colleagues. This clearly adds huge value to the Google Drive suite for researchers.

He talked me through a few scenarios of how they use Google. Some (such as the comments facility) I was very much aware of; others I've not used myself, such as using the Google APIs to visualise activity on a report prepared in Google Drive, showing a timeline of when different individuals edited the document. Now that looks like fun!

He also talked about the importance of the 'previous versions' information that is stored within a native Google Drive file. When working collaboratively it can be useful to be able to track back and see who edited what and when. 

He described a real scenario in which he had had to go back to a previous version of a Google Sheet to show exactly when a particular piece of data had been entered. I hadn't considered that the previous versions feature could be used to demonstrate that you made a particular discovery first. Potentially quite important in the competitive world of academic research.

For this reason Matthew considered the native Google Drive file itself to be "the ultimate archive" and "a virtual collaborative lab notebook". A flat, static export of the data would not be an adequate replacement.

He did however acknowledge that the data can only exist for as long as Google provides us with the facility and that there are situations where it is a good idea to take a static back up copy.

He mentioned that the precursor to Google Docs was a product called Writely (which he was also an early adopter of). Google bought Writely in 2006 after seeing the huge potential in this online word processing tool. Matthew commented that backwards compatibility became a problem when Google started making some fundamental changes to the way the application worked. This is perhaps the issue that is being described in this blog post: Google Docs and Backwards Compatibility.

So, I'm still convinced that even if we can't preserve a native Google Drive file perfectly in a static form, this shouldn't stop us having a go!

Improving the technology

Alongside trying to understand how researchers use Google Drive and what they consider to be significant and worthy of preservation, I have also been making some requests and suggestions to Google about their export options. There are a few ideas I've noted that would make it easier for us to archive the data.

I contacted the Google Drive forum and was told that as a Google customer I was able to log in and add my suggestions to Google Cloud Connect, so this I did... and what I asked for was as follows:

  • Please can we have a PDF/A export option?
  • Please could we choose whether or not to export comments ...and if we are exporting comments, could we choose whether historic/resolved comments are also exported?
  • Please can metadata be retained - specifically the created and last modified dates. (Author is a bit trickier - in Google Drive a document has an owner rather than an author. The owner probably is the author (or one of them) but not necessarily if ownership has been transferred).
  • I also mentioned a little bug relating to comment dates that I found when exporting a Google document containing comments out into docx format and then importing it back again.
Since I submitted these feature requests and comments in early May it has all gone very very quiet...

I have a feeling that ideas only get anywhere if they are popular ...and none of my ideas are popular ...because they do not lead to new and shiny functionality.

Only one of my suggestions (re comments) has received a vote by another member of the community.

So, what to do?

Luckily, since having spoken about my problem at the Jisc Research Data Network, two people have mentioned they have Google contacts who might be interested in hearing my ideas.

I'd like to follow up on this, but in the meantime it would be great if people could feedback to me. 

  • Are my suggestions sensible? 
  • Are there are any other features that would help the digital preservation community preserve Google Drive? I can't imagine I've captured everything...

The UK Archivematica group goes to Scotland

Published 6 Jul 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Yesterday the UK Archivematica group met in Scotland for the first time. The meeting was hosted by the University of Edinburgh and as always it was great to be able to chat informally to other Archivematica users in the UK and find out what everyone is up to.

The first thing to note was that since this group of Archivematica ‘explorers’ first met in 2015, real and tangible progress seems to have been made. This was encouraging to see, particularly at the University of Edinburgh. Kirsty Lee talked us through their Archivematica implementation (now in production) and the steps they are taking to ingest digital content.

One of the most interesting bits of her presentation was a discussion about appraisal of digital material and how to manage this at scale using the available tools. When using Archivematica (or other digital preservation systems) it is necessary to carry out appraisal at an early stage before an Archival Information Package (AIP) is created and stored. It is very difficult (perhaps impossible) to unpick specific files from an AIP at a later date.

Kirsty described how one of her test collections has been reduced from 5.9GB to 753MB using a combination of traditional and technical appraisal techniques. 

Appraisal is something that is mentioned frequently in digital preservation discussions. There was a group talking about just this a couple of weeks ago at the recent DPC unconference ‘Connecting the Bits’. 

As ever it was really valuable to hear how someone is moving forward with this in a practical way. 

It will be interesting to find out how these techniques can be applied at scale to some of the larger collections Kirsty intends to work with.

Kirsty recommended an article by Victoria Sloyan, Born-digital archives at the Wellcome Library: appraisal and sensitivity review of two hard drives which was helpful to her and her colleagues when formulating their approach to this thorny problem.

She also referenced the work that the Bentley Historical Library at University of Michigan have carried out with Archivematica and we watched a video showing how they have integrated Archivematica with DSpace. This approach has influenced Edinburgh’s internal discussions about workflow.

Kirsty concluded with something that rings very true for me (in fact I think I said it myself in the two presentations I gave last week!). Striving for perfection isn’t helpful; the main thing is just to get started and learn as you go along.

Rachel McGregor from the University of Lancaster gave an entertaining presentation about the UK Archivematica Camp that was held in York in April, covering topics as wide ranging as the weather, the food and finally feeling the love for PREMIS!

I gave a talk on work at York to move Archivematica and our Research Data York application towards production. I had given similar talks last week at the Jisc Research Data Network event and a DPC briefing day but I took a slightly different focus this time. I wanted to drill in a bit more detail into our workflow, the processing configuration within Archivematica and some problems I was grappling with. 

It was really helpful to get some feedback and solutions from the group on an error message I’d encountered whilst preparing my slides the previous day and to have a broader discussion on the limitations of web forms for data upload. This is what is so good about presenting within a small group setting like this as it allows for informality and genuinely productive discussion. As a result of this I over ran and made people wait for their lunch (very bad form I know!)

After lunch John Kaye updated the group on the Jisc Research Data Shared Service. This is becoming a regular feature of our meetings! There are many members of the UK Archivematica group who are not involved in the Jisc Shared Service so it is really useful to be able to keep them in the loop. 

It is clear that there will be a substantial amount of development work within Archivematica as a result of its inclusion in the Shared Service and features will be made available to all users (not just those who engage directly with Jisc). One example of this is containerisation which will allow Archivematica to be more quickly and easily installed. This is going to make life easier for everyone!

Sean Rippington from the University of St Andrews gave an interesting perspective on some of the comparison work he has been doing of Preservica and Archivematica. 

Both of these digital preservation systems are on offer through the Jisc Shared Service and as a pilot institution St Andrews has decided to test them side by side. Although he hasn’t yet got his hands on both, he was still able to offer some really useful insights on the solutions based on observations he has made so far. 

First he listed a number of similarities - for example alignment with the OAIS Reference Model, the migration-based approach, the use of microservices and many of the tools and standards that they are built on.

He also listed a lot of differences - some are obvious, for example one system is commercial and the other open source. This leads to slightly different models for support and development. He mentioned some of the additional functionality that Preservica has, for example the ability to handle emails and web archives and the inclusion of an access front end. 

He also touched on reporting. Preservica does this out of the box, whereas with Archivematica you will need to use a third party reporting system. He talked a bit about the communities that have adopted each solution and concluded that Preservica seems to have a broader user base (in terms of the types of institution that use it). The engaged, active and honest user community for Archivematica was highlighted as a specific selling point, as was the work of the Filling the Digital Preservation Gap project (thanks!).

Sean intends to do some more detailed comparison work once he has access to both systems and we hope he will report back to a future meeting.

Next up we had a collaborative session called ‘Room 101’ (even though our meeting had been moved to room 109). Considering we were encouraged to grumble about our pet hates, this session produced some useful nuggets.

After the coffee break we were joined (remotely) by several representatives from the OSSArcFlow project from Educopia and the University of North Carolina. This project is very new but it was great that they were able to share with us some information about what they intend to achieve over the course of the two year project. 

They are looking specifically at preservation workflows using open source tools (specifically Archivematica, BitCurator and ArchivesSpace) and they are working with 12 partner institutions who will all be using at least two of these tools. The project will not only provide training and technical support, but will fully document the workflows put in place at each institution. This information will be shared with the wider community. 

This is going to be really helpful for those of us who are adopting open source preservation tools, helping to answer some of those niggling questions such as how to fill the gaps and what happens when there are overlaps in the functionality of two tools.

We registered our interest in continuing to be kept in the loop about this project and we hope to hear more at a future meeting.

The day finished with a brief update from Sara Allain from Artifactual Systems. She talked about some of the new things that are coming in version 1.6.1 and 1.7 of Archivematica.

Before leaving Edinburgh it was a pleasure to be able to join the University at an event celebrating their progress in digital preservation. Celebrations such as this are pretty few and far between - perhaps because digital preservation is a task that doesn’t have an obvious end point. It was really refreshing to see an institution publicly celebrating the considerable achievements made so far. Congratulations to the University of Edinburgh!

Hot off the press…

Published 4 Jul 2017 by Tom Wilson in thomas m wilson.


Hot off the press…

Published 4 Jul 2017 by Tom Wilson in thomas m wilson.


Can't connect to MediaWiki on Nginx server [duplicate]

Published 4 Jul 2017 by Marshall S. Lee in Newest questions tagged mediawiki - Server Fault.

I downloaded and configured MediaWiki on the Ubuntu server. I'm running it on Nginx, so I opened the nginx.conf file and modified the server part as follows.

    server {
        listen 80;
        server_name;

        access_log /var/log/nginx/access-wiki.log;
        error_log /var/log/nginx/error-wiki.log;

        charset utf-8;
        passenger_enabled on;
        client_max_body_size 50m;

        location / {
            root /var/www/html/mediawiki;
            index index.php;
        }

        # pass the PHP scripts to FastCGI server
        location ~ \.php$ {
            root           html;
            fastcgi_pass   unix:/var/run/php/php7.0-fpm.sock;
            fastcgi_index  index.php;
            fastcgi_param  SCRIPT_FILENAME  /scripts$fastcgi_script_name;
            include        fastcgi_params;
        }

        # deny access to .htaccess files, if Apache's document root
        # concurs with nginx's one
        location ~ /\.ht {
            deny  all;
        }
    }

After editing, I restarted Nginx and now I am facing another problem. Every time I try to access the webpage with the domain above, I keep failing to reach the main page of MediaWiki; instead I receive a file which says the following.

 * This is the main web entry point for MediaWiki.
 * If you are reading this in your web browser, your server is probably
 * not configured correctly to run PHP applications!
 * See the README, INSTALL, and UPGRADE files for basic setup instructions
 * and pointers to the online documentation.
 * ----------
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 * @file

// Bail on old versions of PHP, or if composer has not been run yet to install
// dependencies. Using dirname( __FILE__ ) here because __DIR__ is PHP5.3+.
// @codingStandardsIgnoreStart MediaWiki.Usage.DirUsage.FunctionFound
require_once dirname( __FILE__ ) . '/includes/PHPVersionCheck.php';
// @codingStandardsIgnoreEnd
wfEntryPointCheck( 'index.php' );

require __DIR__ . '/includes/WebStart.php';

$mediaWiki = new MediaWiki();

Now, in the middle of the setup, I'm almost lost and have no idea how to work it out. I created a file hello.html in the root directory and accessed the page in my browser; this works. I do believe that the PHP configuration part is causing the errors, but I don't know how to fix it.
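For what it's worth, the symptom described (raw PHP source being returned) usually means the request was served as a static file instead of being passed to PHP-FPM; and even once it is passed, the SCRIPT_FILENAME above points at a /scripts prefix that does not exist. A hedged sketch of a corrected PHP location block, assuming the MediaWiki root and PHP-FPM socket shown in the config above, would be:

```nginx
location ~ \.php$ {
    root           /var/www/html/mediawiki;
    fastcgi_pass   unix:/var/run/php/php7.0-fpm.sock;
    fastcgi_index  index.php;
    # $document_root resolves to the MediaWiki directory above, so
    # PHP-FPM receives a script path that actually exists on disk
    fastcgi_param  SCRIPT_FILENAME  $document_root$fastcgi_script_name;
    include        fastcgi_params;
}
```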


Published 4 Jul 2017 by fabpot in Tags from Twig.

Roundcube Webmail 1.3.0 released

Published 25 Jun 2017 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

We proudly announce the stable version 1.3.0 of Roundcube Webmail, which is now available for download. With this milestone we introduce new features since the 1.2 version, plus security and deployment improvements and some code-cleanup.

IMPORTANT: The code-cleanup part brings major changes and possibly incompatibilities to your existing Roundcube installations. So please read the Changelog carefully and thoroughly test your upgrade scenario.

Please note that Roundcube 1.3

  1. no longer runs on PHP 5.3
  2. no longer supports IE < 10 and old versions of Firefox, Chrome and Safari
  3. requires an SMTP server connection to send mails
  4. uses jQuery 3.2 and will not work with current jQuery mobile plugin

With the release of Roundcube 1.3.0, the previous stable release branches 1.2.x and 1.1.x will switch into LTS low-maintenance mode, which means they will only receive important security updates but no further regular improvement updates.

Wikimedia Hackathon at home project

Published 24 Jun 2017 by legoktm in The Lego Mirror.

This is the second year I haven't been able to attend the Wikimedia Hackathon due to conflicts with my school schedule (I finish at the end of June). So instead I decided I would try and accomplish a large-ish project that same weekend, but at home. I'm probably more likely to get stuff done while at home because I'm not chatting up everyone in person!

Last year I converted OOjs-UI to use PHP 5.5's traits instead of a custom mixin system. That was a fun project for me since I got to learn about traits and do some non-MediaWiki coding, while still reducing our technical debt.

This year we had some momentum on MediaWiki-Codesniffer changes, so I picked up one of our largest tasks which had been waiting - to upgrade to the 3.0 upstream PHP_CodeSniffer release. Being a new major release there were breaking changes, including a huge change to the naming and namespacing of classes. My current diffstat on the open patch is +301, -229, so it is roughly the same size as last year. The conversion of our custom sniffs wasn't too hard, the biggest issue was actually updating our test suite.

We run PHPCS against test PHP files and verify the output matches the sniffs that we expect. Then we run PHPCBF, the auto-fixer, and check that the resulting "fixed" file is what we expect. The first wasn't too bad, as it just calls the relevant internal functions to run PHPCS, but the latter used to have PHPCBF output into a virtual filesystem, shell out to create a diff, and then try to put it back together. Now, we just get the output from the relevant PHPCS class and compare it to the expected test output.

This change was included in the 0.9.0 release of MediaWiki-Codesniffer and is in use by many MediaWiki extensions.

Emulation for preservation - is it for me?

Published 23 Jun 2017 by Jenny Mitcham in Digital Archiving at the University of York.

I’ve previously been of the opinion that emulation isn’t really for me.

I’ve seen presentations about emulation at conferences such as iPRES and it is fair to say that much of it normally goes over my head.

This hasn’t been helped by the fact that I’ve not really had a concrete use case for it in my own work - I find it so much easier to relate to and engage with a topic or technology if I can see how it might be directly useful to me.

However, for a while now I’ve been aware that emulation is what all the ‘cool kids’ in the digital preservation world seem to be talking about. From the very migration-heavy thinking of the 2000s it appears that things are now moving in a different direction.

This fact first hit my radar at the 2014 Digital Preservation Awards where the University of Freiburg won the OPF Award for Research and Innovation for their work on Emulation as a Service with bwFLA Functional Long Term Archiving and Access.

So I was keen to attend the DPC event Halcyon, On and On: Emulating to Preserve to keep up to speed... not only because it was hosted on the doorstep in the centre of my home town of York!

It was an interesting and enlightening day. As usual the Digital Preservation Coalition did a great job of getting all the right experts in the room (sometimes virtually) at the same time, and a range of topics and perspectives were covered.

After an introduction from Paul Wheatley we heard from the British Library about their experiences of doing emulation as part of their Flashback project. No day on emulation would be complete without a contribution from the University of Freiburg. We had a thought provoking talk via WebEx from Euan Cochrane of Yale University Library and an excellent short film created by Jason Scott from the Internet Archive. One of the highlights for me was Jim Boulton talking about Digital Archaeology - and that wasn’t just because it had ‘Archaeology’ in the title (honest!). His talk didn’t really cover emulation; it related more to that other preservation strategy that we don’t talk about much anymore - hardware preservation. However, many of the points he raised were entirely relevant to emulation - for example, how to maintain an authentic experience, how you define what the significant properties of an item actually are, and what decisions you have to make as a curator of the digital past. It was great to see how engaged the public were with his exhibitions and how people interacted with them.

The day left me with a number of themes and take-away thoughts.

Thinking about how this all relates to me and my work, I am immediately struck by two use cases.

Firstly research data - we are taking great steps forward in enabling this data to be preserved and maintained for the long term but will it be re-usable? For many types of research data there is no clear migration strategy. Emulation as a strategy for accessing this data ten or twenty years from now needs to be seriously considered. In the meantime we need to ensure we can identify the files themselves and collect adequate documentation - it is these things that will help us to enable reuse through emulators in the future.

Secondly, there are some digital archives that we hold at the Borthwick Institute from the 1980's. For example I have been working on a batch of WordStar files in my spare moments over the last few years. I'd love to get a contemporary emulator fired up and see if I could install WordStar and work with these files in their native setting. I've already gone a little way down the technology preservation route, getting WordStar installed on an old Windows 98 PC and viewing the files, but this isn't exactly contemporary. These approaches will help to establish the significant properties of the files and assess how successful subsequent migration strategies are....but this is a future blog post.

It was a fun event and it was clear that everybody loves a bit of nostalgia. Jim Boulton ended his presentation saying "There is something quite romantic about letting people play with old hardware".

We have come a long way and this is most apparent when seeing artefacts (hardware, software, operating systems, data) from early computing. Only this week whilst taking the kids to school we got into a conversation about floppy disks (yes, I know...). I asked the kids if they knew what they looked like and they answered "Yes, it is the save icon on the computer"(see Why is the save icon still a floppy disk?)...but of course they've never seen a real one. Clearly some obsolete elements of our computer history will remain in our collective consciousness for many years and perhaps it is our job to continue to keep them alive in some form.

Quick Method to wget my local wiki... need advice (without dumping mysql)

Published 23 Jun 2017 by WubiUbuntu980 Unity7 Refugee in Newest questions tagged mediawiki - Ask Ubuntu.

I need advice.

I have a webserver VM (LAN, not on the internet) with 2 wikis:

  • HomeWorkWiki
  • GameWiki

I want to wget only the HomeWorkWiki pages, without crawling into the GameWiki.

My goal is to just get the .html files (ignoring all other files, images etc.) with wget. (I don't want to do a mysqldump or MediaWiki export, but rather a wget for my (non-IT) boss, who just wants to double-click the html.)

How can I run wget so that it only crawls the HomeWorkWiki, and not the GameWiki, on this VM?
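A minimal sketch of such a wget invocation, assuming each wiki is served under its own path prefix (the host name and paths here are hypothetical):

```shell
# Crawl HomeWorkWiki only:
#   -r            recurse through links on each fetched page
#   --no-parent   never ascend above the starting directory, so nothing
#                 under /GameWiki/ is ever requested
#   -A htm,html   keep only HTML files, discarding images and other media
#   -k            rewrite links so the saved pages work when double-clicked
wget -r --no-parent -A htm,html -k http://webserver/HomeWorkWiki/
```

If the wiki uses index.php?title=... style URLs rather than directory-style paths, --no-parent will not separate the two wikis; wget's --reject-regex option matched against the other wiki's name would be needed instead.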


Using MediaWiki and external data, how can I show an image in a page, returned as a blob from a database?

Published 20 Jun 2017 by Masutatsu in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I'm creating a wiki (using MediaWiki) which pulls data from a MySQL instance, and uses this alongside a template to generate the page dynamically.

My MySQL instance contains images, stored in a field of type BLOB.

Is it possible for MediaWiki to render this BLOB data as the actual image to be shown on the page?

A typical week as a digital archivist?

Published 16 Jun 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Sometimes (admittedly not very often) I'm asked what I actually do all day. So at the end of a busy week being a digital archivist I've decided to blog about what I've been up to.


Today I had a couple of meetings. One specifically to talk about digital preservation of electronic theses submissions. I've also had a work experience placement in this week so have set up a metadata creation task which he has been busy working on.

When I had a spare moment I did a little more testing work on the EAD harvesting feature the University of York is jointly sponsoring Artefactual Systems to develop in AtoM. Testing this feature from my perspective involves logging into the test site that Artefactual has created for us and tweaking some of the archival descriptions. Once those descriptions are saved, I can take a peek at the job scheduler and make sure that new EAD files are being created behind the scenes for the Archives Hub to attempt to harvest at a later date.

This piece of development work has been going on for a few months now and communications have been technically quite complex so I'm also trying to ensure all the organisations involved are happy with what has been achieved and will be arranging a virtual meeting so we can all get together and talk through any remaining issues.

I was slightly surprised today to have a couple of requests to talk to the media. This has sprung from the news that the Queen's Speech will be delayed. One of the reasons for the delay relates to the fact that the speech has to be written on goat's skin parchment, which takes a few days to dry. I had previously been interviewed for an article entitled Why is the UK still printing its laws on vellum? and am now mistaken for someone who knows about vellum. I explained to potential interviewers that this is not my specialist subject!


In the morning I went to visit a researcher at the University of York. I wanted to talk to him about how he uses Google Drive in relation to his research. This is a really interesting topic to me right now as I consider how best we might be able to preserve current research datasets. Seeing how exactly Google Drive is used and what features the researcher considers to be significant (and necessary for reuse) is really helpful when thinking about a suitable approach to this problem. I sometimes think I work a little bit too much in my own echo chamber, so getting out and hearing different perspectives is incredibly valuable.

Later that afternoon I had an unexpected meeting with one of our depositors (well, there were two of them actually). I've not met them before but have been working with their data for a little while. In our brief meeting it was really interesting to chat and see the data from a fresh perspective. I was able to reunite them with some digital files that they had created in the mid 1980's, had saved on to floppy disk and had not been able to access for a long time.

Digital preservation can be quite a behind-the-scenes sort of job - we always give a nod to the reason why we do what we do (i.e. we preserve for future reuse), but actually seeing the results of that work unfold in front of your eyes is genuinely rewarding. I had rescued something from the jaws of digital obsolescence so it could now be reused and revitalised!

At the end of the day I presented a joint webinar for the Open Preservation Foundation called 'PRONOM in practice'. Alongside David Clipsham (The National Archives) and Justin Simpson (Artefactual Systems), I talked about my own experiences with PRONOM, particularly relating to file signature creation, and ending with a call to arms "Do try this at home!". It would be great if more of the community could get involved!

I was really pleased that the webinar platform worked OK for me this time round (always a bit stressful when it doesn't) and that I got to use the yellow highlighter pen on my slides.

In my spare moments (which were few and far between), I put together a powerpoint presentation for the following day...


I spent the day at the British Library in Boston Spa. I'd been invited to speak at a training event they regularly hold for members of staff who want to find out a bit more about digital preservation and the work of the team.

I was asked specifically to talk through some of the challenges and issues that I face in my work. I found this pretty easy - there are lots of challenges - and I eventually realised I had too many slides so had to cut it short! I suppose that is better than not having enough to say!

Visiting Boston Spa meant that I could also chat to the team over lunch and visit their lab. They had a very impressive range of old computers and were able to give me a demonstration of Kryoflux (which I've never seen in action before) and talk a little about emulation. This was a good warm up for the DPC event about emulation I'm attending next week: Halcyon On and On: Emulating to Preserve.

Still left on my to-do list from my trip is to download Teracopy. I currently use Foldermatch to check that files I have copied have remained unchanged. From the quick demo I saw at the British Library I think that Teracopy would be a simpler, one-step solution. I need to have a play with it and then think about incorporating it into the digital ingest workflow.
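Teracopy aside, the underlying check - that copied files are bit-for-bit identical to their originals - can be sketched with standard checksum tools (a minimal command-line illustration, not the Teracopy workflow itself; the paths here are hypothetical):

```shell
# Build a checksum manifest of everything under the source directory...
cd /media/deposit/source
find . -type f -exec sha256sum {} + > /tmp/manifest.sha256

# ...then, after copying, verify the copies against the manifest.
# 'sha256sum -c' reports any file whose content changed in transit.
cd /archive/ingest/copy
sha256sum -c /tmp/manifest.sha256
```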

Sharing information and collaborating with others working in the digital preservation field really is directly beneficial to the day to day work that we do!


Back in the office today and a much quieter day.

I extracted some reports from our AtoM catalogue for a colleague and did a bit of work with our test version of Research Data York. I also met with another colleague to talk about storing and providing access to digitised images.

In the afternoon I wrote another powerpoint presentation, this time for a forthcoming DPC event: From Planning to Deployment: Digital Preservation and Organizational Change.

I'm going to be talking about our experiences of moving our Research Data York application from proof of concept to production. We are not yet in production and some of the reasons why will be explored in the presentation! Again I was asked to talk about barriers and challenges and again, this brief is fairly easy to fit! The event itself is over a week away so this is unprecedentedly well organised. Long may it continue!


On Fridays I try to catch up on the week just gone and plan for the week ahead as well as reading the relevant blogs that have appeared over the week. It is also a good chance to catch up with some admin tasks and emails.

Lunch time reading today was provided by William Kilbride's latest blog post. Some of it went over my head but the final messages around value and reuse and the need to "do more with less" rang very true.

Sometimes I even blog myself - as I am today!

Was this a typical week - perhaps not, but in this job there is probably no such thing! Every week brings new ideas, challenges and surprises!

I would say the only real constant is that I've always got lots of things to keep me busy.

Five minutes with Kylie Howarth

Published 7 Jun 2017 by carinamm in State Library of Western Australia Blog.

Kylie Howarth is an award-winning Western Australian author, illustrator and graphic designer. Original illustrations and draft materials from her most recent picture book 1, 2, Pirate Stew (Five Mile Press) are currently showing in The Story Place Gallery.

We spent some time hearing from Kylie Howarth about the ideas and inspiration behind her work. Here’s what she had to say…


1, 2, Pirate Stew is all about the power of imagination and the joys of playing in a cardboard box. How do your real life experiences influence your picture book ideas? What role does imagination play?

The kids and I turned the box from our new BBQ into a pirate ship. We painted it together and made anchors, pirate hats and oars. They loved it so much they played in it every day for months… and so the idea for 1, 2, Pirate Stew was born. It eventually fell apart and so did our hot water system, so we used that box to build a rocket. Boxes live long lives around our place. I also cut them up and take them to school visits to do texture rubbings with the students.

Your illustrations for 1, 2, Pirate Stew are unique in that they incorporate painted textures created during backyard art sessions with your children. What encouraged you to do this? How do your children’s artworks inspire you?

I just love children’s paintings. They have an energy I find impossible to replicate. Including them in my book illustrations encourages kids to feel their art is important and that they can make books too. Kids sometimes find highly realistic illustrations intimidating and feel they could never do it themselves. During school and library visits, they love seeing the original finger paintings and potato stamp prints that were used in my books.

Through digital illustration you have blended hand drawings with painted textures. How has your background and training as a graphic designer influenced your illustrative style?

Being a graphic designer has certainly influenced the colour and composition of my illustrations. In 1, 2, Pirate Stew particularly the use of white space. Many illustrators and designers are afraid of white space but it can be such an effective tool; it allows the book to breathe. The main advantage though is that I have been able to design all my own book covers, select fonts and arrange the text layout.

Sometimes ideas for picture books evolve and change a lot when working with the publisher. Sometimes the ideas don’t change much at all. What was your experience when creating 1, 2, Pirate Stew? Was it similar or different to your previous books Fish Jam and Chip?

I worked with a fabulous editor, Karen Tayleur, on all three books. We tweaked the text for Fish Jam and Chip a little to make them sing as best we could. With 1, 2, Pirate Stew however, the text was based on the old nursery rhyme 1, 2, Buckle My Shoe. So there was little room to move as I was constrained to a limited number of syllables and each line had to rhyme. I think we only added one word. I did however further develop the illustrations from my original submission. Initially the characters' faces were a little more stylised so I refined them to be more universal. Creating the mini 3D character model helped me get them looking consistent from different angles throughout the book. I also took many photographs of my boys to sketch from.

1, 2, Pirate Stew – an exhibition is on display at the State Library of Western Australia until 22 June 2017. The exhibition is part of a series showcasing the diverse range of illustrative styles in picture books published by Western Australian authors and illustrators. For more information go to


Published 7 Jun 2017 by fabpot in Tags from Twig.

MediaWiki fails to show Ambox

Published 7 Jun 2017 by lucamauri in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I am writing you about the use of Template:Ambox in MediaWiki.

I have a hosted MediaWiki 1.28 installation that apparently works well at everything, but I can't get the boxes explained here to work properly.

As a test I implemented in this page the following code:

{{Ambox
| type       = notice
| text       = Text for a big box, for the top of articles.
| smalltext  = Text for the top of article sections.
}}

and I expected a nice box to show up. Instead I simply see the text Template:Ambox shown at the top of the page.
It seems like this template is not defined in MediaWiki, but, as far as I understood, this is built-in and in all examples I saw it seems it should work out-of-the-box.

I guess I miss something basic here, but it really escapes me: any help you might provide will be appreciated.



Ted Nelson’s Junk Mail (and the Archive Corps Pilot)

Published 31 May 2017 by Jason Scott in ASCII by Jason Scott.

I’ve been very lucky over the past few months to dedicate a few days here and there to helping legend Ted Nelson sort through his archives. We’ve known each other for a bunch of years now, but it’s always a privilege to get a chance to hang with Ted and especially to help him with auditing and maintaining his collection of papers, notes, binders, and items. It also helps that it’s in pretty fantastic shape to begin with.

Along with sorting comes some discarding – mostly old magazines and books; they're being donated wherever it makes sense to. Along with these items was junk mail that Ted got over the decades.

About that junk mail….

After glancing through it, I requested to keep it and take it home. There was a lot of it, and even going through it with a cursory view showed me it was priceless.

There are two kinds of people in the world – those who look at ephemera and consider it trash, and those who consider it gold.

I’m in the gold camp.

I’d already been doing something like this for years, myself – when I was a teenager, I circled so many reader service cards and pulled in piles and piles of flyers and mailings from companies so fleeting or so weird, and I kept them. These later became the reader service collection, which encapsulates it completely. There’s well over a thousand pages in that collection, which I’ve scanned myself.

Ted, basically, did what I was doing, but with more breadth, more variety, and with a few decades more time.

And because he was always keeping an eye out on many possibilities for future fields of study, he kept his mind (and mailbox) open to a lot of industries. Manufacturing, engineering, film-making, printing, and of course “computers” as expressed in a thousand different ways. The mail dates from the 1960s through to the mid 2000s, and it’s friggin’ beautiful.

Here’s where it gets interesting, and where you come in.

There’s now a collection of scanned mail from this collection up at the Internet Archive. It’s called Ted Nelson’s Junk Mail and you can see the hundreds of scanned pages that will soon become thousands and maybe tens of thousands of scanned pages.

They’re separated by mailing, and over time the metadata and the contents will get better, increase in size, and hopefully provide decades of enjoyment for people.

The project is being coordinated by Kevin Savetz, who has hired a temp worker to scan in the pages across each weekday, going through the boxes and doing the “easy” stuff (8.5×11 sheets) which, trust me, is definitely worth going through first. As they’re scanned, they’re uploaded, and (for now) I am running scripts to add them as items to the Junk Mail collection.

The cost of doing this is roughly $80 a day, during which hundreds of pages can be scanned. We’re refining the process as we go, and expect it to get even more productive over time.

So, here’s where Archive Corps comes in; this is a pilot program for the idea behind Archive Corps, which is providing a funnel for all the amazing stuff out there to get scanned. If you want to see more stuff come from the operation that Kevin is running, he has a PayPal address up at – the more you donate, the more days we are able to have the temp come in to scan.

I’m very excited to watch this collection grow, and see the massive variety of history that it will reveal. A huge thank-you to Ted Nelson for letting me take these items, and a thank-you to Kevin Savetz for coordinating.

Let’s enjoy some history!

Local illustration showcase

Published 30 May 2017 by carinamm in State Library of Western Australia Blog.

From digital illustration to watercolor painting and screen-printing, three very different styles of illustration highlight the diversity and originality of picture books published this year. 

In a series of exhibitions, The Story Place Gallery will showcase original artwork by Western Australian illustrators from the picture books 1, 2, Pirate Stew (Five Mile Press 2017), One Thousand Trees and Colour Me (Fremantle Press 2017).


7, 8, he took the bait © Kylie Howarth 2017

In 1, 2, Pirate Stew, Kylie Howarth has used a digital illustration process to merge her drawings, created using water-soluble pencils, with background textures painted by her two adventurous children Beau and Jack. Kylie Howarth’s playful illustrations of gentle colours, together with her entertaining rhyming verse, take readers on an imaginative adventure all about the joys of playing in a cardboard box. Illustrations from 1, 2, Pirate Stew are on display from 26 May – 22 June.


Among © Kyle Hughes-Odgers 2017

Kyle Hughes-Odgers’ distinctive illustrations blend geometric shapes, patterns and forms. In his watercolour illustrations for One Thousand Trees, he uses translucent colours and a restricted colour palette to explore the relationship between humankind and the environment. Shades of green browns and grey blues emphasise contrasts between urban and natural scenes. Kyle Hughes-Odgers places the words of the story within his illustrations to accentuate meaning. One Thousand Trees is on display from 24 June to 23 July.


If I was red © Moira Court

Moira Court’s bold illustrations for the book Colour Me (written by Ezekiel Kwaymullina) were created using a woodcut and screen-printing technique. Each final illustration is made from layers of silk screen prints created using hand cut paper stencils and transparent ink. Each screen print was then layered with a patchy, textural woodcut or linoleum print. Colours were printed one at a time to achieve a transparent effect. The story celebrates the power of each individual colour, as well as the power of their combination. Colour Me is on display from 26 July – 16 August.

Each exhibition in this series is curated especially for children and is accompanied by a story sharing area, self-directed activity, and discussion prompters for families.

A Lot of Doing

Published 28 May 2017 by Jason Scott in ASCII by Jason Scott.

If you follow this weblog, you saw there was a pause of a couple months. I’ve been busy! Better to do than to talk about doing.

A flood of posts are coming – they reflect accomplishments and thoughts of the last period of time, so don’t be freaked out as they pop up in your life very quickly.


Report on Wikimedia Hackathon 2017 in Vienna

Published 26 May 2017 by Niklas Laxström in It rains like a saavi.

Long time no post! Let’s fix that with another report from my travels. This one is mostly about work.

Wikimedia hackathon was held in Vienna in May 2017. It is an event where many MediaWiki developers meet and work together on various kinds of things – the more experienced developers helping the newcomers. For me this was one of the best events of this kind which I have attended, because it was very well organized and I had a good balance between working on things and helping others.

The main theme for me for this hackathon was This was great, because recently I have not had as much time to work on improving as I used to. But this does not mean there hasn’t been any progress; I just haven’t made any noise about it. For example, we have greatly increased automation for importing and exporting translations, new projects are being added, the operating systems have been updated, and so on. But let’s talk about what happened during the hackathon.

I worked with Nemo_bis and Siebrand (they did most of the work) to go over the backlog of support requests. We addressed more than half of the 101 open support requests, also by adding support for 7 locales. Relatedly, we also helped a couple of people to start translating or to contribute to the CLDR language database for their language.

Siebrand and I held an open post-mortem (anyone could join) about a 25-hour downtime that happened just before the event. There we reflected on how we handled the situation, and how we could do better in the future. The main take-aways are better communication (Twitter, a status page), a server upgrade and using it for increased redundancy, and periodically doing restoration practice to ensure we can restore quickly if the need arises.

Amir revived his old project that allows translating messages using a chat application (the prototype uses Telegram). Many people (at least Amir, Taras, Mt.Du and I) worked on different aspects of that project. I installed the MediaWiki OAuth extension (without which it would not be possible to do the translations using the correct user name), and gave over-the-shoulder help to the coders.

Hackathon attendees working on their computers. Photo CC-BY-SA 3.0 by Nemo_bis

As always, some bugs were found and fixed during the hackathon. I fixed an issue in Translate where the machine translation suggestions using the Apertium service hosted by Wikimedia Foundation were not showing up. I also reported an issue with our discussion extension (LiquidThreads) having two toolbars instead of one. This was quickly fixed by Bartosz and Ed.

Finally, I would like to advertise a presentation about MediaWiki best practices that I gave in the Fantastic MediaWikis track. It summarizes a few of the best practices I have come up with during my experience maintaining and many other MediaWiki sites. It has tips about deployment, job queue configuration and short main page URLs, and the slides are available.

As a small bonus, I finally updated my blog to use https, so that I could write and that you could read this post safely knowing that nobody else but me could have put all the bad puns in the post.

TV Interview on Stepping Off

Published 26 May 2017 by Tom Wilson in thomas m wilson.


Simon 0.4.90 beta released

Published 20 May 2017 by fux in blogs.

KDE Project:

The second version (0.4.90) towards Simon 0.5.0 is out in the wilds. Please download the source code, test it and send us feedback.

What we changed since the alpha release:

  • Bugfix: The download of Simon Base Models works again flawlessly (bug: 377968)
  • Fix detection of utterid APIs in Pocketsphinx

You can get it here:

Also in the works is an AppImage version of Simon for easy testing. We hope to deliver one for the beta release coming soon.

Known issues with Simon 0.4.90 are:

  • Some Scenarios available for download don't work anymore (BUG: 375819)
  • Simon can't add Arabic or Hebrew words (BUG: 356452)

We hope to fix these bugs and look forward to your feedback and bug reports and maybe to see you at the next Simon IRC meeting: Tuesday, 23rd of May, at 10pm (UTC+2) in #kde-accessibility on

About Simon
Simon is an open source speech recognition program that can replace your mouse and keyboard. The system is designed to be as flexible as possible and will work with any language or dialect. For more information take a look at the Simon homepage.

All accounts updated to version 2.9

Published 20 May 2017 by Pierrick Le Gall in The Blog.

17 days after Piwigo 2.9.0 was released and 4 days after we started to update, all accounts are now up-to-date.

Piwigo 2.9 and new design on administration pages


As you will learn from the release notes, your history will now be automatically purged to keep “only” the last 1 million lines. Yes, some of you, 176 to be exact, have more than 1 million lines, with a record set at 27 million lines!

Wikimedia Commons Android App Pre-Hackathon

Published 19 May 2017 by addshore in Addshore.

Wikimedia Commons Logo

The Wikimedia Commons Android App allows users to upload photos to Commons directly from their phone.

The website for the app details some of the features and the code can be found on GitHub.

A hackathon was organized in Prague to work on the app in the run up to the yearly Wikimedia Hackathon which is in Vienna this year.

A group of 7 developers worked on the app over a few days. As well as meeting and learning from each other, they also managed to work on various improvements, which I have summarised below.

2 factor authentication (nearly)

Work has been done towards allowing 2fa logins to the app.

Lots of the login & authentication code has been refactored and the app now uses the clientlogin API module provided by MediaWiki instead of the older login module.

When building to debug the 2fa input box will appear if you have 2fa login enabled, however the current production build will not show this box and simply display a message saying that 2fa is not currently supported. This is due to a small amount of session handling work that the app still needs.
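For illustration, the shape of the two login requests differs roughly like this (a sketch based on the MediaWiki action API; the user name, password and token values are placeholders, and the OATHToken field name assumes the OATHAuth extension provides the 2fa step):

```python
# Legacy action=login parameters (the module the app moved away from).
old_login = {
    "action": "login",
    "lgname": "ExampleUser",   # placeholder account name
    "lgpassword": "secret",    # placeholder password
    "lgtoken": "TOKEN",        # placeholder login token
}

# action=clientlogin parameters (the module the app now uses).
client_login = {
    "action": "clientlogin",
    "username": "ExampleUser",
    "password": "secret",
    "loginreturnurl": "https://example.org/",  # required for the interactive flow
    "logintoken": "TOKEN",
}

# If the server responds with status UI (e.g. asking for a 2FA code),
# the same login is continued with logincontinue plus the requested field.
client_login_2fa_step = {
    "action": "clientlogin",
    "logincontinue": "1",
    "OATHToken": "123456",     # 2FA code; field name from the OATHAuth extension
    "logintoken": "TOKEN",
}
```

This continuation step is the key difference: plain action=login has no way to ask for more information mid-login, whereas clientlogin can return intermediate UI requests, which is what makes 2fa support possible at all.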

Better menu & Logout

As development on the app was fairly non-existent between mid-2013 and 2016, the UI generally fell behind. This is visible in forms and buttons, as well as the app layout.

One significant push was made to drop the old style ‘burger’ menu from the top right of the app and replace it with a new slide-out menu drawer including a feature image and icons for menu items.

Uploaded images display limit

Some users have run into issues with the number of upload contributions that the app loads by default in the contributions activity. The default has always been 500, and this can cause memory exhaustion (OOM) and a crash on some memory-limited phones.

In an attempt to fix this and generally speed up the app, a recent-uploads limit has been added to the settings which will limit the number of images and image details that are displayed; however, the app will still fetch and store more than this on the device.

Nearby places enhancements

The nearby places enhancements probably account for the largest portion of development time at the pre hackathon. The app has always had a list of nearby places that don’t have images on commons but now the app also has a map!

The map is powered by the Mapbox SDK and the current beta uses the Mapbox tiles; however, part of the plan for the Vienna hackathon is to switch this to using the Wikimedia-hosted map tiles at

The map also contains clickable pins that provide a small pop up pulling information from Wikidata including the label and description of the item as well as providing two buttons to get directions to the place or read the Wikipedia article.

Image info coordinates & image date

Extra information has also been added to the image details view and the image date and coordinates of the image can now be seen in the app.

Summary of hackathon activity

The contributions and authors that worked on the app during the pre hackathon can be found on Github at the following link.

Roughly 66 commits were made between the 11th and 19th of May 2017 by 9 contributors.

Screenshot Gallery

AtoM Camp take aways

Published 12 May 2017 by Jenny Mitcham in Digital Archiving at the University of York.

The view from the window at AtoM Camp ...not that there was any time to gaze out of the window of course...

I’ve spent the last three days in Cambridge at AtoM Camp. This was the second ever AtoM Camp, and the first in Europe. A big thanks to St John’s College for hosting it and to Artefactual Systems for putting it on.

It really has been an interesting few days, with a packed programme and an engaged group of attendees from across Europe and beyond bringing different levels of experience with AtoM.

As a ‘camp counsellor’ I was able to take to the floor at regular intervals to share some of our experiences of implementing AtoM at the Borthwick, covering topics such as system selection, querying the MySQL database, building the community and overcoming implementation challenges.

However, I was also there to learn!

Here are some bits and pieces that I’ve taken away.

My first real take away is that I now have a working copy of the soon to be released AtoM 2.4 on my Macbook - this is really quite cool. I'll never again be bored on a train - I can just fire up Ubuntu and have a play!

Walk to Camp takes you over Cambridge's Bridge of Sighs

During the camp it was great to be able to hear about some of the new features that will be available in this latest release.

At the Borthwick Institute our catalogue is still running on AtoM 2.2 so we are pretty excited about moving to 2.4 and being able to take advantage of all of this new functionality.

Just some of the new features I learnt about that I can see an immediate use case for:

On day two of camp I enjoyed the implementation tours, seeing how other institutions have implemented AtoM and the tweaks and modifications they have made. For example it was interesting to see the shopping cart feature developed for the Mennonite Archival Image Database and the most popular image carousel feature on the front page of the Chinese Canadian Artifacts Project. I was also interested in some of the modifications the National Library of Wales have made to meet their own needs.

It was also nice to hear the Borthwick Catalogue described by Dan as “elegant”!

There was a great session on community and governance at the end of day two which was one of the highlights of the camp for me. It gave attendees the chance to really understand the business model of Artefactual (as well as alternatives to the bounty model in use by other open source projects). We also got a full history of the evolution of AtoM and saw the very first project logo and vision.

The AtoM vision hasn't changed too much but the name and logo have!

Dan Gillean from Artefactual articulated the problem of trying to get funding for essential and ongoing tasks, such as code modernisation. Two examples he used were updating AtoM to work with the latest version of Symfony and Elasticsearch - both of these tasks need to happen in order to keep AtoM moving in the right direction but both require a substantial amount of work and are not likely to be picked up and funded by the community.

I was interested to see Artefactual’s vision for a new AtoM 3.0 which would see some fundamental changes to the way AtoM works and a more up-to-date, modular and scalable architecture designed to meet the future use cases of the growing AtoM community.

Artefactual's proposed modular architecture for AtoM 3.0

There is no timeline for AtoM 3.0, and whether it goes ahead or not is entirely dependent on a substantial source of funding being available. It was great to see Artefactual sharing their vision and encouraging feedback from the community at this early stage though.

Another highlight of Camp: a tour of the archives of St John's College from Tracy Deakin

A session on data migrations on day three included a demo of OpenRefine from Sara Allain of Artefactual. I’d heard of this tool before but wasn’t entirely sure what it did or whether it would be of use to me. Sara demonstrated how it could be used to bash data into shape before import into AtoM. It seemed to be capable of doing all the things that I’ve previously done in Excel (and more) ...but without so much pain. I’ll definitely be looking to try this out when I next have some data to clean up.

Dan Gillean and Pete Vox from IMAGIZ talked through the process of importing data into AtoM. Pete focused on an example from Croydon Museum Service, whose data needed to be migrated from CALM. He talked through some of the challenges of the task and how he would approach this differently in future. It is clear that the complexities of data migration may be one of the biggest barriers to institutions moving to AtoM from an alternative system, but it was encouraging to hear that none of these challenges are insurmountable.

My final take away from AtoM Camp is a long list of actions - new things I have learnt that I want to read up on or try out for myself ...I best crack on!

Permission denied for files in www-data

Published 11 May 2017 by petergus in Newest questions tagged mediawiki - Ask Ubuntu.

I have image files being uploaded with mediawiki, and they are setting the owner as www-data. Viewing the files results in 403 forbidden. (all other site files owned by SITE_USER).

The SITE_USER and www-data are both in each others (secondary) groups.

What am I missing here?

EDIT: My Apache directives

DocumentRoot "/home/SITE_USER/public_html/"
# Alias for Wiki so images work
Alias /images "/home/SITE_USER/public_html/mediawiki/sites/images"    
<Directory "/home/SITE_USER/public_html/">
RewriteRule ^(.*)$ %{DOCUMENT_ROOT}//index.php [L]
# Enable the rewrite engine
RewriteEngine On
# Short url for wiki pages
RewriteRule ^/?wiki(/.*)?$ %{DOCUMENT_ROOT}/index.php [L]
# Redirect / to Main Page
RewriteRule ^/*$ %{DOCUMENT_ROOT}/index.php [L]
Options -Indexes +SymLinksIfOwnerMatch
allow from all
AllowOverride All Options=ExecCGI,Includes,IncludesNOEXEC,Indexes,MultiViews,SymLinksIfOwnerMatch
Require all granted
</Directory>

Maintenance report of April 28th 2017

Published 11 May 2017 by Pierrick Le Gall in The Blog. clients have already received this message. Many users told us they were happy to receive such details about our technical operations so but let’s make it more “public” with a blog post!

A. The short version

On April 27th 2017, we replaced one of our main servers. The replacement itself was successful. No downtime. The read-only mode lasted only 7 minutes, from 6:00 to 6:07 UTC.

While sending the notification email to our clients, we encountered difficulties with Gmail users. Solving this Gmail issue made the website unavailable for a few users for maybe an hour. Everything was back to normal within a few hours. Of course, no data was lost during this operation.

The new server and Piwigo are now good friends. They both look forward to receiving version 2.9 in the next few days 😉

B. Additional technical details

The notification message had already been sent to the first 390 users when we realized emails sent to Gmail addresses were returned in error. Indeed, Gmail now asks for a “reverse DNS IPv6”. Sorry for this very technical detail. We already had it on the old server so we added it on the new server. And then the problems started… Unfortunately the new server does not manage IPv6 the same way. A few users, on IPv6, told us they only saw an “Apache2 Debian Default Page” instead of their Piwigo. Here is the timeline:

Unfortunately adding or removing an IPv6 is not an immediate action. It relies on the “DNS propagation” which may take a few hours, depending on each user.

We took the rest of the day to figure out how to make Gmail accept our emails and web visitors see your Piwigo. Instead of “”, we now use a sub-domain of “” (Pigolabs is the company running the service) with an IPv6: no impact on web traffic.
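For readers unfamiliar with “reverse DNS for IPv6”: it is a PTR record in the ip6.arpa zone that maps the address back to a host name, with every hex nibble of the address written in reverse order. A hypothetical example, using the documentation address 2001:db8::25 and an invented host name:

```
; PTR record for 2001:db8::25 pointing back at the sending mail host
5.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.b.d.0.1.0.0.2.ip6.arpa. IN PTR mail.example.com.
```

Once the record has propagated, it can be checked with `dig -x 2001:db8::25`, which is presumably the lookup Gmail performs before accepting mail from an IPv6 sender.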

We also have a technical solution to handle IPv6 for web traffic. We have decided not to use it because IPv6 lacks an important feature, the FailOver. This feature, only available on IPv4, lets us redirect web traffic from one server to another in a few seconds without worrying about DNS propagation. We use it when a server fails and web traffic goes to a spare server.

In the end, the move did not go so smoothly and we sweated quite a bit this Friday, but everything came back to normal and the “Apache2 Debian Default Page” issue eventually affected only a few people!

At the J Shed

Published 7 May 2017 by Dave Robertson in Dave Robertson.

We can’t wait to play here again soon… in June… stay tuned! Photo by Alex Chapman



Semantic MediaWiki requires onoi/callback-container, but it can't be installed

Published 5 May 2017 by Сергей Румянцев in Newest questions tagged mediawiki - Server Fault.

I am trying to install the latest release of Semantic MediaWiki. When I run composer update, it returns the following:

> ComposerHookHandler::onPreUpdate
Loading composer repositories with package information
Updating dependencies (including require-dev)
Your requirements could not be resolved to an installable set of packages.

  Problem 1
    - mediawiki/semantic-media-wiki 2.4.x-dev requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.6 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.5 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.4 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.3 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.2 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.1 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - Installation request for mediawiki/semantic-media-wiki ~2.4.1 -> satisfiable by mediawiki/semantic-media-wiki[2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.x-dev].

I have even set minimum-stability to dev and prefer-stable to false. Nothing resolves the conflict.

This is not the first problem I have had with Composer. It previously returned an error because no version was set in the mediawiki/core package, which this version of SMW still requires. Surprisingly, that error did not appear this time.

And Composer doesn't see the required package versions: composer show onoi/callback-container lists only the stable version 2.0.
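For reference, my root composer.json settings look roughly like this (the SMW constraint matches the installation request in the error output; other entries are omitted):

```json
{
    "require": {
        "mediawiki/semantic-media-wiki": "~2.4.1"
    },
    "minimum-stability": "dev",
    "prefer-stable": false
}
```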

After upgrade to 14.04 I get "You don't have permission to access /wiki/ on this server."

Published 3 May 2017 by Finn Årup Nielsen in Newest questions tagged mediawiki - Ask Ubuntu.

After dist-upgrade to 14.04 I get "You don't have permission to access /wiki/ on this server." for a MediaWiki installation with an alias. /w/index.php is also failing.

So far I have seen a difference in configuration between 12.04 and 14.04, so I did:

cd /etc/apache2/sites-enabled
sudo ln -s ../sites-available/000-default.conf .

This fixed other problems, but not the MediaWiki problem.
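For context, 12.04 shipped Apache 2.2 while 14.04 ships Apache 2.4, which replaced the old access-control directives, so a likely culprit is a vhost stanza still using the 2.2 syntax. A sketch of the 2.4 form (the directory path is a placeholder and must match the wiki's Alias):

```apache
# Apache 2.2 style, no longer honoured by 2.4 without mod_access_compat:
#   Order allow,deny
#   Allow from all
<Directory /var/www/w>
    Require all granted
</Directory>
```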

How can we preserve Google Documents?

Published 28 Apr 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Last month I asked (and tried to answer) the question How can we preserve our wiki pages?

This month I am investigating the slightly more challenging issue of how to preserve native Google Drive files, specifically documents*.


At the University of York we work a lot with Google Drive. We have the G Suite for Education (formerly known as Google Apps for Education) and as part of this we have embraced Google Drive and it is now widely used across the University. For many (me included) it has become the tool of choice for creating documents, spreadsheets and presentations. The ability to share documents and directly collaborate are key.

So of course it is inevitable that at some point we will need to think about how to preserve them.

How hard can it be?

Quite hard actually.

The basic problem is that documents created in Google Drive are not really "files" at all.

The majority of the techniques and models that we use in digital preservation are based around the fact that you have a digital object that you can see in your file system, copy from place to place and package up into an Archival Information Package (AIP).

In the digital preservation community we're all pretty comfortable with that way of working.

The key challenge with stuff created in Google Drive is that it doesn't really exist as a file.

Always living in hope that someone has already solved the problem, I asked the question on Twitter and that really helped with my research.

Isn't the digital preservation community great?

Exporting Documents from Google Drive

I started off testing the different download options available within Google docs. For my tests I used two native Google documents. One was the working version of our Phase 1 Filling the Digital Preservation Gap report. This report was originally authored as a Google doc, was 56 pages long and consisted of text, tables, images, footnotes, links, formatted text, page numbers, colours etc (i.e. lots of significant properties I could assess). I also used another, simpler document for testing - this one was just basic text and tables but also included comments by several contributors.

I exported both of these documents into all of the different export formats that Google supports and assessed the results, looking at each characteristic of the document in turn and establishing whether or not I felt it was adequately retained.

Here is a summary of my findings, looking specifically at the Filling the Digital Preservation Gap phase 1 report document:

...but what about the comments?

My second test document was chosen so I could look specifically at the comments feature and how these were retained (or not) in the exported version.

  • docx - Comments are exported. On first inspection they appear to be anonymised, however this seems to be just how they are rendered in Microsoft Word. Having unzipped and dug into the actual docx file and looked at the XML file that holds the comments, it is clear that a more detailed level of information is retained - see images below. The placement of the comments is not always accurate. In one instance the reply to a comment is assigned to text within a subsequent row of the table rather than to the same row as the original comment.
  • odt -  Comments are included, are attributed to individuals and have a date and time. Again, matching up of comments with the right section of text is not always accurate - in one instance a comment and its reply are linked to the table cell underneath the one that they referenced in the original document.
  • rtf - Comments are included but appear to be anonymised when displayed in MS Word...I haven't dug around enough to establish whether or not this is just a rendering issue.
  • txt - Comments are retained but appear at the end of the document with a [a], [b] etc prefix - these letters appear in the main body text to show where the comments appeared. No information about who made the comment is preserved.
  • pdf - Comments not exported
  • epub - Comments not exported
  • html - Comments are present but appear at the end of the document with a code which also acts as a placeholder in the text where the comment appeared. References to the comments in the text are hyperlinks which take you to the right comment at the bottom of the document. There is no indication of who made the comment (not even hidden within the html tags).

A comment in original Google doc

The same comment in docx as rendered by MS Word

...but in the XML buried deep within the docx file structure - we do have attribution and date/time
(though clearly in a different time zone)

What about bulk export options?

Ed Pinsent pointed me to the Google Takeout Service which allows you to:
"Create an archive with your data from Google products"
[Google's words not mine - and perhaps this is a good time to point you to Ed's blog post on the meaning of the term 'Archive']

This is really useful. It allows you to download Google Drive files in bulk and to select which formats you want to export them as.

I tested this a couple of times and was surprised to discover that if you select pdf or docx (and perhaps other formats that I didn't test) as your export format of choice, the takeout service creates the file in the format requested and an html file which includes all comments within the document (even those that have been resolved). The content of the comments/responses including dates and times is all included within the html file, as are names of individuals.

The downside of the Google Takeout Service is that it only allows you to select folders and not individual files. There is another incentive for us to organise our files better! The other issue is that it will only export documents that you are the owner of - and you may not own everything that you want to archive!
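Where Takeout's owner-only restriction bites, the Drive v3 REST API offers per-file export of native Google documents. A minimal sketch (the file ID is a hypothetical placeholder, and actually fetching the URL would require OAuth credentials, which are omitted here):

```python
from urllib.parse import quote

# Export formats Google Drive supports for native Documents,
# mapped to the MIME types the API expects.
EXPORT_MIME = {
    "pdf": "application/pdf",
    "docx": "application/vnd.openxmlformats-officedocument"
            ".wordprocessingml.document",
    "odt": "application/vnd.oasis.opendocument.text",
    "rtf": "application/rtf",
    "txt": "text/plain",
    "html": "text/html",
    "epub": "application/epub+zip",
}

def build_export_url(file_id: str, fmt: str) -> str:
    """Return the Drive v3 export endpoint for a native Google Doc."""
    mime = EXPORT_MIME[fmt]
    return (
        f"https://www.googleapis.com/drive/v3/files/{quote(file_id)}"
        f"/export?mimeType={quote(mime, safe='')}"
    )

if __name__ == "__main__":
    # FILE_ID is a placeholder; pass the resulting URL to an
    # authenticated HTTP client to download the exported bytes.
    print(build_export_url("FILE_ID", "pdf"))
```

Note this exports one file per call, so unlike Takeout it can be pointed at individual documents shared with you rather than whole folders you own.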

What's missing?

Quite a lot actually.

The owner, creation and last modified dates of a document in Google Drive are visible when you click on Document details... within the File menu. Obviously this is really useful information for the archive but is lost as soon as you download it into one of the available export formats.

Creation and last modified dates as visible in Document details

Update: I was pleased to see that if using the Google Takeout Service to bulk export files from Drive, the last modified dates are retained, however on single file export/download these dates are lost and the last modified date of the file becomes the date that you carried out the export. 

Part of the revision history of my Google doc
But of course in a Google document there is more metadata. Similar to the 'Page History' that I mentioned when talking about preserving wiki pages, a Google document has a 'Revision history'.

Again, this *could* be useful to the archive. Perhaps not so much so for my document which I worked on by myself in March, but I could see more of a use case for mapping and recording the creative process of writing a novel for example. 

Having this revision history would also allow you to do some pretty cool stuff such as that described in this blog post: How I reverse engineered Google Docs to play back any document's keystrokes (thanks to Nick Krabbenhoft for the link).

It would seem that the only obvious way to retain this information would be to keep the documents in their original native Google format within Google Drive but how much confidence do we have that it will be safe there for the long term?


If you want to preserve a Google Drive document there are several options but no one-size-fits-all solution.

As always it boils down to what the significant properties of the document are. What is it we are actually trying to preserve?

  • If we want a fairly accurate but non interactive digital 'print' of the document, pdf might be the most accurate representation though even the pdf export can't be relied on to retain the exact pagination. Note that I didn't try and validate the pdf files that I exported and sadly there is no pdf/a export option.
  • If comments are seen to be a key feature of the document then docx or odt will be a good option but again this is not perfect. With the test document I used, comments were not always linked to the correct point within the document.
  • If it is possible to get the owner of the files to export them, the Google Takeout Service could be used. Perhaps creating a pdf version of the static document along with a separate html file to capture the comments.

A key point to note is that all export options are imperfect so it would be important to check the exported document against the original to ensure it accurately retains the important features.

Another option would be simply keeping them in their native format but trying to get some level of control over them - taking ownership and managing sharing and edit permissions so that they can't be changed. I've been speaking to one of our Google Drive experts in IT about the logistics of this. A Google Team Drive belonging to the Archives could be used to temporarily store and lock down Google documents of archival value whilst we wait and see what happens next. 

...I live in hope that export options will improve in the future.

This is a work in progress and I'd love to find out what others think.

* note, I've also been looking at Google Sheets and that may be the subject of another blog post

Security updates 1.2.5, 1.1.9 and 1.0.11 released

Published 27 Apr 2017 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

We just published updates to all stable versions 1.x delivering important bug fixes and improvements which we picked from the upstream branch.

The updates primarily fix a recently discovered vulnerability in the virtualmin and sasl drivers of the password plugin (CVE-2017-8114). Security-wise the update is therefore only relevant for those installations of Roundcube using the password plugin with either one of these drivers.

See the full changelog for the according version in the release notes on the Github download pages: v1.2.5, v1.1.9, v1.0.11

All versions are considered stable and we recommend updating all production installations of Roundcube to one of these versions.

As usual, don’t forget to back up your data before updating!

Legal considerations regarding hosting a MediaWiki site

Published 27 Apr 2017 by Oliver K in Newest questions tagged mediawiki - Webmasters Stack Exchange.

What legal considerations are there when creating a wiki using MediaWiki for people to use worldwide?

For example, I noticed there are privacy policies & terms and conditions; are these required to safeguard me from any legal battles?

Things I hope you learn in GLAM School

Published 27 Apr 2017 by inthemailbox in In the mailbox.

I’ve just realised that I haven’t blogged for a very long time, so lest you think me moribund, it’s time to start typing. I have a few things I want to say about collections software and the GLAMPeak project, as well as pulling some thoughts together on the Open Government initiative, so there will be some slightly more professional blogposts after this, I promise.

But today, to get the writing process back underway, I’m going to munge together two #GLAMBlogClub topics – hope, and what I wish they’d taught me in GLAM School. It’s been a few years since I was in GLAM school, but not that long since I left teaching. Reading through the blogs, though, reminded me very much of that long distant self, who wrote a letter to her lecturer, the lovely Peter Orlovich, bemoaning the gap between practice and theory. I also wrote one to the WA Museums Australia co-ordinator, Stephen Anstey, when I could not get a job for love or money.  And they basically said this:

It’s just not possible to learn all the things, all the technical details or peculiar ways that people reinvent the wheel, in just three or four, or one or two years. What you can learn, and what we hope you learn, is how to learn. GLAM school should provide you with a fundamental structure for understanding and implementing theory in practical ways.  The basic theoretical foundations for archival or library description, museum collection management or art history will remain, even as new theoretical concepts are added that build on what we know from the past. The way we implement those concepts will depend on our collections, our resources, our own strengths and weaknesses, but if you can learn, you can change, grow and adapt.

Be bold in your choices. GLAM school, like any good school, will have taught you how to read, research and analyse content. It will teach you how to express yourself in a range of communication styles and platforms. The tests and stresses that you experience at GLAM school will help you temper the way you respond to those stresses in the workplace.  We can, and do, try to provide experiences and examples in an environment where you are supported to fail, and to try again.

Do not put artificial limits on yourselves.

And, give yourselves hope. You have the skills, they just need sharpening and developing. Try, and try again.

Finally – “Keep interested in your own career, however humble;
it is a real possession in the changing fortunes of time.”

(Max Ehrmann, The Desiderata)

Release Candidate for version 1.3

Published 25 Apr 2017 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

We just published the feature-complete version for the next major version 1.3 of Roundcube webmail for final testing. After dropping support for older browsers and PHP versions and adding some new features like the widescreen layout, the release candidate finalizes that work and also fixes two security issues plus adds improvements to the Managesieve and Enigma plugins.

We also slightly polished the Larry theme to make it look a little less 2010. Still, the default theme doesn’t work on mobile devices but a fully responsive skin is currently being worked on.

As a reminder: if you’re installing the dependent package or running Roundcube directly from source, you now need to install the removed 3rd party javascript modules by executing the following install script:

$ bin/

With the upcoming stable release of 1.3.0 the old 1.x series will only receive important security fixes.

See the complete Changelog and download the new packages from

Please note that this is a release candidate and we recommend testing it in a separate environment. And don’t forget to back up your data before installing it.

mosh, the disconnection-resistant ssh

Published 22 Apr 2017 by Carlos Fenollosa in Carlos Fenollosa — Blog.

The second post on this blog was devoted to screen and how to use it to make persistent SSH sessions.

Recently I've started using mosh, the mobile shell. It's targeted at mobile users, for example laptop users who might get short disconnections while working on a train, and it also provides a small keystroke buffer to get rid of network lag.

It really has few drawbacks and if you ever ssh to remote hosts and get annoyed because your vim sessions or tail -F windows get disconnected, give mosh a try. I strongly recommend it.
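As a sketch of how mosh slots into an existing workflow (the host name is a placeholder; this hypothetical wrapper only prints the command it would run, since mosh may or may not be installed locally):

```shell
# Prefer mosh when it is available, otherwise fall back to plain ssh.
connect() {
    host="$1"
    if command -v mosh >/dev/null 2>&1; then
        echo "mosh $host"    # would run: mosh "$host"
    else
        echo "ssh $host"     # would run: ssh "$host"
    fi
}

connect example.org
```

In day-to-day use the invocation really is just `mosh user@host` wherever you would have typed `ssh user@host`, provided mosh-server is installed on the remote end.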

Tags: software, unix

Comments? Tweet  

In conversation with the J.S. Battye Creative Fellows

Published 19 Apr 2017 by carinamm in State Library of Western Australia Blog.

How can contemporary art lead to new discoveries about collections and ways of engaging with history?  Nicola Kaye and Stephen Terry will discuss this idea drawing from the experience of creating Tableau Vivant and the Unobserved.

In conversation with the J.S. Battye Creative Fellows
Thursday 27 April, 6pm
State Library Theatre.

Tableau Vivant and the Unobserved is the culmination of the State Library’s inaugural J.S. Battye Creative Fellowship.  The Creative Fellowship aims to enhance engagement with the Library’s heritage collections and provide new experiences for the public.

Tableau Vivant and the Unobserved visually questions how history is made, commemorated and forgotten. Through digital art installation, Nicola Kaye and Stephen Terry expose the unobserved and manipulate our perception of the past.  Their work juxtaposes archival and contemporary imagery to create an interactive experience for the visitor where unobserved lives from the archive collide with the contemporary world. The installation is showing at the State Library until 12 May 2017.

For more information visit:


Published 17 Apr 2017 by mblaney in Tags from simplepie.

Merge pull request #510 from mblaney/master

Version bump to 1.5 due to changes to Category class.

Interview on Stepping Off: Rewilding and Belonging in the South West

Published 14 Apr 2017 by Tom Wilson in thomas m wilson.

You can listen to a recent radio interview I did about my new book with Adrian Glamorgan here.

Wikimania submission: apt install mediawiki

Published 9 Apr 2017 by legoktm in The Lego Mirror.

I've submitted a talk to Wikimania titled apt install mediawiki. It's about getting the MediaWiki package back into Debian, and efforts to improve the overall process. If you're interested, sign up on the submissions page :)

Archivematica Camp York: Some thoughts from the lake

Published 7 Apr 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Well, that was a busy week!

Yesterday was the last day of Archivematica Camp York - an event organised by Artefactual Systems and hosted here at the University of York. The camp's intention was to provide a space for anyone interested in or currently using Archivematica to come together, learn about the platform from other users, and share their experiences. I think it succeeded in this, bringing together 30+ 'campers' from across the UK, Europe and as far afield as Brazil for three days of sessions covering different aspects of Archivematica.

Our pod on the lake (definitely a lake - not a pond!)
My main goal at camp was to ensure everyone found their way to the rooms (including the lakeside pod) and that we were suitably fuelled with coffee, popcorn and cake. Alongside these vital tasks I also managed to partake in the sessions, have a play with the new version of Archivematica (1.6) and learn a lot in the process.

I can't possibly capture everything in this brief blog post so if you want to know more, have a look back at all the #AMCampYork tweets.

What I've focused on below are some of the recurring themes that came up over the three days.


Archivematica is just one part of a bigger picture for institutions that are carrying out digital preservation, so it is always very helpful to see how others are implementing it and what systems they will be integrating with. A session on workflows in which participants were invited to talk about their own implementations was really interesting. 

Other sessions  also helped highlight the variety of different configurations and workflows that are possible using Archivematica. I hadn't quite realised there were so many different ways you could carry out a transfer! 

In a session on specialised workflows, Sara Allain talked us through the different options. One workflow I hadn't been aware of before was the ability to include checksums as part of your transfer. This sounds like something I need to take advantage of when I get Archivematica into production for the Borthwick. 

Justin talking about Automation Tools
A session on Automation Tools with Justin Simpson highlighted other possibilities - using Archivematica in a more automated fashion. 

We already have some experience of using Automation Tools at York as part of the work we carried out during phase 3 of Filling the Digital Preservation Gap, however I was struck by how many different ways these can be applied. Hearing examples from other institutions and for a variety of different use cases was really helpful.


The camp included a chance to play with Archivematica version 1.6 (which was only released a couple of weeks ago) as well as an introduction to the new Appraisal and Arrangement tab.

A session in progress at Archivematica Camp York
I'd been following this project with interest so it was great to be able to finally test out the new features (including the rather pleasing pie charts showing what file formats you have in your transfer). It was clear that there were a few improvements that could be made to the tab to make it more intuitive to use and to deal with things such as the ability to edit or delete tags, but it is certainly an interesting feature and one that I would like to explore more using some real data from our digital archive.

Throughout camp there was a fair bit of discussion around digital appraisal and at what point in your workflow this would be carried out. This was of particular interest to me being a topic I had recently raised with colleagues back at base.

The Bentley Historical Library who funded the work to create the new tab within Archivematica are clearly keen to get their digital archives into Archivematica as soon as possible and then carry out the work there after transfer. The addition of this new tab now makes this workflow possible.

Kirsty Lee from the University of Edinburgh described her own pre-ingest methodology and the tools she uses to help her appraise material before transfer to Archivematica. She talked about some tools (such as TreeSize Pro) that I'm really keen to follow up on.

At the moment I'm undecided about exactly where and how this appraisal work will be carried out at York, and in particular how this will work for hybrid collections so as always it is interesting to hear from others about what works for them.

Metadata and reporting

Evelyn admitting she loves PREMIS and METS
Evelyn McLellan from Artefactual led a 'Metadata Deep Dive' on day 2 and despite the title, this was actually a pretty interesting session!

We got into the details of METS and PREMIS and how they are implemented within Archivematica. Although I generally try not to look too closely at METS and PREMIS it was good to have them demystified. On the first day through a series of exercises we had been encouraged to look at a METS file created by Archivematica ourselves and try and pick out some information from it so these sessions in combination were really useful.

Across various sessions of the camp there was also a running discussion around reporting. Given that Archivematica stores such a detailed range of metadata in the METS file, how do we actually make use of this? Being able to report on how many AIPs have been created, how many files they contain and what size they are is useful. These are statistics that I currently collect (manually) on a quarterly basis and share with colleagues. Once Archivematica is in place at York, digging further into those rich METS files to find out which file formats are in the digital archive would be really helpful for preservation planning (among other things). There was discussion about whether reporting should be a feature of Archivematica or a job that should be done outside Archivematica.

In relation to the latter option, I described in one session how some of our phase 2 work of Filling the Digital Preservation Gap was designed to help expose metadata from Archivematica to a third-party reporting system. The Jisc Research Data Shared Service was also mentioned in this context as reporting outside of Archivematica will need to be addressed as part of this project.


As with most open source software, community is important. This was touched on throughout the camp and was the focus of the last session on the last day.

There was a discussion about the role of Artefactual Systems and the role of Archivematica users. Obviously we are all encouraged to engage and help sustain the project in whatever way we are able. This could be by sharing successes and failures (I was pleased that my blog got a mention here!), submitting code and bug reports, sponsoring new features (perhaps something listed on the development roadmap) or helping others by responding to queries on the mailing list. It doesn't matter - just get involved!

I was also able to highlight the UK Archivematica group and talk about what we do and what we get out of it. As well as encouraging new members to the group, there was also discussion about the potential for forming other regional groups like this in other countries.

Some of the Archivematica community - class of Archivematica Camp York 2017

...and finally

Another real success for us at York was having the opportunity to get technical staff at York working with Artefactual to resolve some problems we had with getting our first Archivematica implementation into production. Real progress was made and I'm hoping we can finally start using Archivematica for real at the end of next month.

So, that was Archivematica Camp!

A big thanks to all who came to York and to Artefactual for organising the programme. As promised, the sun shined and there were ducks on the lake - what more could you ask for?

Thanks to Paul Shields for the photos

Failover in local accounts

Published 7 Apr 2017 by MUY Belgium in Newest questions tagged mediawiki - Server Fault.

I would like to use MediaWiki for documentation with access privileges. I use the LdapAuthentication extension (here: ) to authenticate users against an LDAP directory.

For various reasons, authentication should continue working even if the LDAP server fails.

How can I set up a failover (for example, using the passwords in the local SQL database) so that the wiki remains accessible even if the LDAP infrastructure fails?
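One approach I am considering is the extension's local-domain fallback; a LocalSettings.php sketch (domain and server names are placeholders, and I have not verified this configuration against the extension's documentation):

```php
// Hypothetical LocalSettings.php fragment for LdapAuthentication
require_once "$IP/extensions/LdapAuthentication/LdapAuthentication.php";
$wgAuth = new LdapAuthenticationPlugin();

// Offer both the LDAP domain and a "local" pseudo-domain at login.
$wgLDAPDomainNames    = array( 'EXAMPLE', 'local' );
$wgLDAPServerNames    = array( 'EXAMPLE' => 'ldap.example.org' );
$wgLDAPEncryptionType = array( 'EXAMPLE' => 'tls' );

// Let accounts stored in MediaWiki's local SQL database log in via
// the "local" domain even when the LDAP server is unreachable.
$wgLDAPUseLocal = true;
```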

Shiny New History in China: Jianshui and Tuanshan

Published 6 Apr 2017 by Tom Wilson in thomas m wilson.

  The stones in this bridge are not all in a perfect state of repair.  That’s part of its charm.  I’m just back from a couple of days down at Jianshui, a historic town a few hours south of Kunming with a large city wall and a towering city gate.  The trip has made me reflect on […]

Update 1.0.10 released

Published 5 Apr 2017 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

We just published a security update to the LTS version 1.0. It contains some important bug fixes and security improvements backported from the master version.

It’s considered stable and we recommend updating all production installations of Roundcube to this version. Download it from

Please do back up before updating!

Tableau Vivant and the Unobserved

Published 30 Mar 2017 by carinamm in State Library of Western Australia Blog.

Still scene: Tableau Vivant and the Unobserved, 2016, Nicola Kaye, Stephen Terry.

Tableau Vivant and the Unobserved visually questions how history is made, commemorated and forgotten. Through digital art installation, Nicola Kaye and Stephen Terry expose the unobserved and manipulate our perception of the past.  Their work juxtaposes archival and contemporary imagery to create an experience for the visitor where unobserved lives from the archive collide with the contemporary world.

Tableau Vivant and the Unobserved is the culmination of the State Library’s inaugural J.S. Battye Creative Fellowship.  The Creative Fellowship aims to enhance engagement with the Library’s heritage collections and provide new experiences for the public.

Artists floor talk
Thursday 6 April, 6pm
Ground Floor Gallery, State Library of Western Australia.

Nicola Kaye and Stephen Terry walk you through Tableau Vivant and the Unobserved

In conversation with the J.S. Battye Creative Fellows
Thursday 27 April, 6pm
State Library Theatre.

How can contemporary art lead to new discoveries about collections and ways of engaging with history?  Nicola Kaye and Stephen Terry will discuss this idea drawing from the experience of creating Tableau Vivant and the Unobserved.

Tableau Vivant and the Unobserved is showing at the State Library from 4 April – 12 May 2017.
For more information visit:

Remembering Another China in Kunming

Published 29 Mar 2017 by Tom Wilson in thomas m wilson.

Last weekend I headed out for a rock climbing session with some locals and expats.  First I had to cross town, and while doing so I came across an old man doing water calligraphy by Green Lake.  I love the transience of this art: the beginning of the poem is starting to fade by the time he reaches […]

Week #11: Raided yet again

Published 28 Mar 2017 by legoktm in The Lego Mirror.

If you missed the news, the Raiders are moving to Las Vegas. The Black Hole is leaving Oakland (again) for a newer, nicer stadium in the desert. But let's talk about how we got here, and how different this is from the move of the San Diego Chargers to Los Angeles.

The current Raiders stadium is outdated and old. It needs renovating to keep up with other modern stadiums in the NFL. Owner Mark Davis isn't a multi-billionaire who could finance such a stadium. And the City of Oakland is definitely not paying for it. So the options left were to find outside financing for Oakland, or to find said financing somewhere else. And unfortunately it was the latter option that won out in the end.

I think it's unsurprising that more and more cities are refusing to put public money into stadiums that they will see no profit from - it makes no sense whatsoever.

Overall I think the Raider Nation will adapt and survive just as it did when they moved to Los Angeles. The Raiders still have an awkward two-to-three years left in Oakland, and with Derek Carr at the helm, it looks like they will be good ones.

Week #10: March Sadness

Published 23 Mar 2017 by legoktm in The Lego Mirror.

In California March Madness is really...March Sadness. The only Californian team that is still in is UCLA. UC Davis made it in but was quickly eliminated. USC and Saint Mary's both fell in the second round. Cal and Stanford didn't even make it in. At best we can root for Gonzaga, but that's barely it.

Some of us root for the schools we went to, but for those of us who grew up here and support local teams, we're left hanging. And it's not bias in the selection committee; those schools just aren't good enough.

On top of that we have a top-notch professional team in the Warriors, but our amateur players just don't pass muster.

So good luck to UCLA, represent California hella well. We somewhat believe in you.

Week #9: The jersey returns

Published 23 Mar 2017 by legoktm in The Lego Mirror.

And so it has been found. Tom Brady's jersey was in Mexico the whole time, stolen by a member of the press. And while it's great news for Brady, sports memorabilia fans, and the FBI, it doesn't look good for journalists. Journalists are given a lot of access to players, allowing them to obtain better content and get better interviews. It would not be surprising if the NFL responds to this incident by locking down the access that journalists are given. And that would be a real bummer.

I'm hoping this is seen as an isolated incident and that journalists as a whole are not punished for the offenses of one.

Enterprise plans, now official!

Published 23 Mar 2017 by Pierrick Le Gall in The Blog.

Although they have spent several years in the shadow of the standard plan, and have already been adopted by more than 50 organizations, it is time to officially introduce the Enterprise plans. They were designed for organizations, private or public, looking for a simple, affordable and yet complete tool to manage their collections of photos.

The main idea behind Enterprise is to democratize photo library management for organizations of all kinds and sizes. We are not targeting Fortune 500 companies, although some of them are already clients, but Fortune 5,000,000 companies! Enterprise plans can replace, at a reasonable cost, inadequate solutions relying on shared intranet folders, where photos are sometimes duplicated or deleted by mistake, without an appropriate permission system.

Introduction to Enterprise plans

Why announce these plans officially today? Because the current trend clearly shows that the Enterprise plans have found their market. Although semi-official, Enterprise plans represented nearly 40% of our revenue in February 2017! It is time to put these plans under the spotlight.

In practice, here is what changes with the Enterprise plans:

  1. they can be used by organizations, as opposed to the standard plan
  2. additional features, such as support for non-photo files (PDF, videos …)
  3. higher level of service (priority support, customization, presentation session)

Discover Enterprise

Please Help Us Track Down Apple II Collections

Published 20 Mar 2017 by Jason Scott in ASCII by Jason Scott.

Please spread this as far as possible – I want to reach folks who are far outside the usual channels.

The Summary: Conditions are very, very good right now for easy, top-quality, final ingestion of original commercial Apple II Software and if you know people sitting on a pile of it or even if you have a small handful of boxes, please get in touch with me to arrange the disks to be imaged. 

The rest of this entry says this in much longer, hopefully compelling fashion.

We are in a golden age for Apple II history capture.

For now, and it won’t last (because nothing lasts), an incredible amount of interest and effort and tools are all focused on acquiring Apple II software, especially educational and engineering software, and ensuring it lasts another generation and beyond.

I’d like to take advantage of that, and I’d like your help.

Here’s the secret about Apple II software: Copy Protection Works.

Copy protection, that method of messing up easy copying from floppy disks, turns out to have been very effective at doing what it is meant to do – slow down the duplication of materials so a few sales can eke by. For anything but the most compelling, most universally interesting software, copy protection did a very good job of ensuring that only the approved disks that went out the door are the remaining extant copies for a vast majority of titles.

As programmers and publishers laid logic bombs and coding traps and took the brilliance of watchmakers and used it to design alternative operating systems, they did so to ensure people wouldn’t take the time to actually make the effort to capture every single bit off the drive and do the intense and exacting work to make it easy to spread in a reproducible fashion.

They were right.

So, obviously it wasn’t 100% effective at stopping people from making copies of programs, or so many people who used the Apple II wouldn’t remember the games they played at school or at user-groups or downloaded from AE Lines and BBSes, with pirate group greetings and modified graphics.

What happened is that pirates and crackers did what was needed to break enough of the protection on high-demand programs (games, productivity) to make them work. They used special hardware modifications to “snapshot” memory and pull out a program. They traced the booting of the program by stepping through its code and then snipped out the clever tripwires that freaked out if something wasn’t right. They tied it up into a bow so that instead of a horrendous 140 kilobyte floppy, you could have a small 15 or 20 kilobyte program instead. They even put multiple cracked programs together on one disk so you could get a bunch of cool programs at once.

I have an entire section of TEXTFILES.COM dedicated to this art and craft.

And one could definitely argue that the programs (at least the popular ones) were “saved”. They persisted, they spread, they still exist in various forms.

And oh, the crack screens!

I love the crack screens, and put up a massive pile of them here. Let’s be clear about that – they’re a wonderful, special thing and the amount of love and effort that went into them (especially on the Commodore 64 platform) drove an art form (demoscene) that I really love and which still thrives to this day.

But these aren’t the original programs and disks, and in some cases, not the originals by a long shot. What people remember booting in the 1980s were often distant cousins to the floppies that were distributed inside the boxes, with the custom labels and the nice manuals.


On the left is the title screen for Sabotage. It’s a little clunky and weird, but it’s also something almost nobody who played Sabotage back in the day ever saw; they only saw the instructions screen on the right. The reason for this is that there were two files on the disk, one for starting the title screen and then the game, and the other was the game. Whoever cracked it long ago only did the game file, leaving the rest as one might leave the shell of a nut.

I don’t think it’s terrible these exist! They’re art and history in their own right.

However… the mistake, which I completely understand making, is to see programs and versions of old Apple II software up on the Archive and say “It’s handled, we’re done here.” You might be someone with a small stack of Apple II software, newly acquired or decades old, and think you don’t have anything to contribute.

That’d be a huge error.

It’s a bad assumption because there’s a chance the original versions of these programs, unseen since they were sold, is sitting in your hands. It’s a version different than the one everyone thinks is “the” version. It’s precious, it’s rare, and it’s facing the darkness.

There is incredibly good news, however.

I’ve mentioned some of these folks before, but there is now a powerful allegiance of very talented developers and enthusiasts who have been pouring an enormous amount of skills into the preservation of Apple II software. You can debate if this is the best use of their (considerable) skills, but here we are.

They have been acquiring original commercial Apple II software from a variety of sources, including auctions, private collectors, and luck. They've been duplicating the originals at the bit level, then going in and "silent cracking" the software so that it can be played in an emulator or via the web emulation system I've been so hot on, and not have any change in operation, except for not failing due to copy protection.

With a “silent crack”, you don’t take the credit, you don’t make it about yourself – you just make it work, and work entirely like it did, without yanking out pieces of the code and program to make it smaller for transfer or to get rid of a section you don’t understand.

Most prominent of these is 4AM, who I have written about before. But there are others, and they’re all working together at the moment.

These folks, these modern engineering-minded crackers, are really good. Really, really good.

They’ve been developing tools from the ground up that are focused on silent cracks, of optimizing the process, of allowing dozens, sometimes hundreds of floppies to be evaluated automatically and reducing the workload. And they’re fast about it, especially when dealing with a particularly tough problem.

Take, for example, the efforts required to crack Pinball Construction Set, and marvel not just that it was done, but that a generous and open-minded article was written explaining exactly what was being done to achieve this.

This group can be handed a stack of floppies, image them, evaluate them, and find which have not yet been preserved in this fashion.

But there’s only one problem: They are starting to run out of floppies.

I should be clear that there’s plenty left in the current stack – hundreds of floppies are being processed. But I also have seen the effort chug along and we’ve been going through direct piles, then piles of friends, and then piles of friends of friends. We’ve had a few folks from outside the community bring stuff in, but those are way more scarce than they should be.

I’m working with a theory, you see.

My theory is that there are large collections of Apple II software out there. Maybe someone’s dad had a store long ago. Maybe someone took in boxes of programs over the years and they’re in the basement or attic. I think these folks are living outside the realm of the “Apple II Community” that currently exists (and which is a wonderful set of people, be clear). I’m talking about the difference between a fan club for surfboards and someone who has a massive set of surfboards because his dad used to run a shop and they’re all out in the barn.

A lot of what I do is put groups of people together and then step back to let the magic happen. This is a case where this amazingly talented group of people are currently a well-oiled machine – they help each other out, they are innovating along this line, and Apple II software is being captured in a world-class fashion, with no filtering being done because it’s some hot ware that everyone wants to play.

For example, piles and piles of educational software has returned from potential oblivion, because it’s about the preservation, not the title. Wonderfully done works are being brought back to life and are playable on the Internet Archive.

So like I said above, the message is this:

Conditions are very, very good right now for easy, top-quality, final ingestion of original commercial Apple II Software and if you know people sitting on a pile of it or even if you have a small handful of boxes, please get in touch with me to arrange the disks to be imaged.

I’ll go on podcasts or do interviews, or chat with folks on the phone, or trade lots of e-mails discussing details. This is a very special time, and I feel the moment to act is now. Alliances and communities like these do not last forever, and we’re in a peak moment of talent and technical landscape to really make a dent in what are likely acres of unpreserved titles.

It’s 4am and nearly morning for Apple II software.

It’d be nice to get it all before we wake up.


Managing images on an open wiki platform

Published 19 Mar 2017 by Oliver K in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I'm developing a wiki using MediaWiki, and there are a few ways of implementing images in wiki pages, such as uploading them to the website itself, embedding them from external websites (which risks the images being removed or blocked), or requesting that others place an image.

Surely images may be difficult to manage as one day someone may upload a vulgar image and many people will then see it. How can I ensure vulgar images do not get through and that administrators aren't scarred for life after monitoring them?

Does the composer software have a command like python -m compileall ./

Published 18 Mar 2017 by jehovahsays in Newest questions tagged mediawiki - Server Fault.

I want to use Composer for a MediaWiki root folder with multiple directories that need Composer to install their dependencies, using a single command like composer -m installall ./. For example, if the root folder were all written in Python, I could use the command python -m compileall ./.
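
Composer has no built-in recursive equivalent of python -m compileall ./, but the behavior the question asks for can be sketched as a small wrapper. This is a hypothetical helper, not part of Composer or MediaWiki; a shell find loop would do the same job:

```python
import os
import subprocess

def find_composer_dirs(root):
    """Yield every directory under root that contains a composer.json,
    skipping the vendor/ trees that Composer itself creates."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != "vendor"]
        if "composer.json" in filenames:
            yield dirpath

def install_all(root):
    """Run `composer install` in each directory found -- the rough
    equivalent of a recursive compileall for PHP dependencies."""
    for d in find_composer_dirs(root):
        subprocess.run(["composer", "install"], cwd=d, check=True)

# Example (hypothetical path):
# install_all("/var/www/mediawiki")
```

Pruning vendor/ matters: extensions that have already been installed once would otherwise be re-scanned through their dependency trees.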

Hilton Harvest Earth Hour Picnic and Concert

Published 18 Mar 2017 by Dave Robertson in Dave Robertson.



Sandpapering Screenshots

Published 15 Mar 2017 by Jason Scott in ASCII by Jason Scott.

The collection I talked about yesterday was subjected to the Screen Shotgun, which does a really good job of playing the items, capturing screenshots, and uploading them into the item to allow people to easily see, visually, what they’re in for if they boot them up.

In general, the screen shotgun does the job well, but not perfectly. It doesn’t understand what it’s looking at, at all, and the method I use to decide the “canonical” screenshot is inherently shallow – I choose the largest filesize, because that tends to be the most “interesting”.
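
That "largest filesize" heuristic boils down to a one-liner. This is a hypothetical reduction of the Screen Shotgun's actual selection logic:

```python
import os

def canonical_screenshot(paths):
    """Pick the 'canonical' screenshot by the largest-filesize heuristic:
    busy, artifact-heavy frames compress worse, so the biggest image
    file tends to be the most visually 'interesting' one."""
    return max(paths, key=os.path.getsize)
```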

The bug in this is that if you have, say, these three screenshots:

…it’s going to choose the first one, because those middle-of-loading graphics for an animated title screen have tons of little artifacts, and the filesize is bigger. Additionally, the second is fine, but it’s not the “title”, the recognized “welcome to this program” image. So the best choice turns out to be the third.

I don’t know why I’d not done this sooner, but while waiting for 500 disks to screenshot, I finally wrote a program to show me all the screenshots taken for an item, and declare a replacement canonical title screenshot. The results have been way too much fun.

It turns out, doing this for Apple II programs in particular, where it’s removed the duplicates and is just showing you a gallery, is beautiful:

Again, the all-text “loading screen” in the middle, which is caused by blowing program data into screen memory, wins the “largest file” contest, but literally any other of the screens would be more appropriate.

This is happening all over the place: crack screens win over the actual main screen, the mid-loading noise of Apple II programs win over the final clean image, and so on.

Working with tens of thousands of software programs, primarily alone, means that I’m trying to find automation wherever I can. I can’t personally boot up each program and do the work needed to screenshot/describe it – if a machine can do anything, I’ll make the machine do it. People will come to me with fixes or changes if the results are particularly ugly, but it does leave a small amount that no amount of automation is likely to catch.

If you watch a show or documentary on factory setups and assembly lines, you’ll notice they can’t quite get rid of people along the entire line, especially the sign-off. Someone has to keep an eye to make sure it’s not going all wrong, or, even more interestingly, a table will come off the line and you see one person giving it a quick run-over with sandpaper, just to pare down the imperfections or missed spots of the machine. You still did an enormous amount of work with no human effort, but if you think that’s ready for the world with no final sign-off, you’re kidding yourself.

So while it does mean another hour or two looking at a few hundred screenshots, it’s nice to know I haven’t completely automated away the pleasure of seeing some vintage computer art, for my work, and for the joy of it.

Thoughts on a Collection: Apple II Floppies in the Realm of the Now

Published 15 Mar 2017 by Jason Scott in ASCII by Jason Scott.

I was connected with The 3D0G Knight, a long-retired Apple II pirate/collector who had built up a set of hundreds of floppy disks acquired from many different locations and friends decades ago. He generously sent me his entire collection to ingest into a more modern digital format, as well as the Internet Archive’s software archive.

The floppies came in a box without any sort of sleeves for them, with what turned out to be roughly 350 of them removed from “ammo boxes” by 3D0G from his parents’ house. The disks all had labels of some sort, and a printed index came along with it all, mapped to the unique disk ID/Numbers that had been carefully put on all of them years ago. I expect this was months of work at the time.

Each floppy is 140k of data on each side, and in this case, all the floppies had been single-sided and clipped with an additional notch with a hole punch to allow the second side to be used as well.

Even though they’re packed a little strangely, there was no damage anywhere, nothing bent or broken or ripped, and all the items were intact. It looked to be quite the bonanza of potentially new vintage software.

So, this activity is at the crux of the work going on with both the older software on the Internet Archive, as well as what I'm doing with web browser emulation and increasingly easy access to the works of old. The most important thing, over everything else, is to close the air gap – get the data off these disappearing floppy disks and into something online where people or scripts can benefit from them and research them. Almost everything else – scanning of cover art, ingestion of metadata, pulling together the history of a company or cross-checking what titles had which collaborators… that has nowhere near the expiration date of the magnetized coated plastic disks going under. This needs us and it needs us now.

The way that things currently work with Apple II floppies is to separate them into two classes: Disks that Just Copy, and Disks That Need A Little Love. The Little Love disks, when found, are packed up and sent off to one of my collaborators, 4AM, who has the tools and the skills to get data off particularly tenacious floppies, as well as doing "silent cracks" of commercial floppies to preserve what's on them as best as possible.

Doing the "Disks that Just Copy" is a mite easier. I currently have an Apple II system on my desk that connects via a USB-to-serial connection to my PC. There, I run a program called Apple Disk Transfer that basically turns the Apple into a Floppy Reading Machine, with a pretty interface and everything.

Apple Disk Transfer (ADT) has been around a very long time and knows what it’s doing – a floppy disk with no trickery on the encoding side can be ripped out and transferred to a “.DSK” file on the PC in about 20 seconds. If there’s something wrong with the disk in terms of being an easy read, ADT is very loud about it. I can do other things while reading floppies, and I end up with a whole pile of filenames when it’s done. The workflow, in other words, isn’t so bad as long as the floppies aren’t in really bad shape. In this particular set, the floppies were in excellent shape, except when they weren’t, and the vast majority fell into the “excellent” camp.

The floppy drive that sits at the middle of this looks like some sort of nightmare, but it helps to understand that with Apple II floppy drives, you really have to have the cover removed at all times, because you will be constantly checking the read head for dust, smudges, and so on. Unscrewing the whole mess and putting it back together for looks just doesn't scale. It's ugly, but it works.

It took me about three days (while doing lots of other stuff) but in the end I had 714 .dsk images pulled from both sides of the floppies, which works out to 357 floppy disks successfully imaged. Another 20 or so are going to get a once over but probably are going to go into 4am’s hands to get final evaluation. (Some of them may in fact be blank, but were labelled in preparation, and so on.) 714 is a lot to get from one person!

As mentioned, an Apple II 5.25″ floppy disk image is pretty much always 140k. The names of the floppies are mine, taken off the label, or added based on glancing inside the disk image after it's done. For a quick glance, I use either an Apple II emulator called AppleWin, or the fantastically useful Apple II disk image investigator CiderPress, which is frankly the gold standard for what should be out there for every vintage disk/cartridge/cassette image. As might be expected, labels don't always match contents. C'est la vie.

As for the contents of the disks themselves; this comes down to what the “standard collection” was for an Apple II user in the 1980s who wasn’t afraid to let their software library grow utilizing less than legitimate circumstances. Instead of an elegant case of shiny, professionally labelled floppy diskettes, we get a scribbled, messy, organic collection of all range of “warez” with no real theme. There’s games, of course, but there’s also productivity, utilities, artwork, and one-off collections of textfiles and documentation. Games that were “cracked” down into single-file payloads find themselves with 4-5 other unexpected housemates and sitting behind a menu. A person spending the equivalent of $50-$70 per title might be expected to have a relatively small and distinct library, but someone who is meeting up with friends or associates and duplicating floppies over a few hours will just grab bushels of strange.

The result of the first run is already up on the Archive: A 37 Megabyte .ZIP file containing all the images I pulled off the floppies. 

In terms of what will be of relevance to later historians, researchers, or collectors, that zip file is probably the best way to go – it’s not munged up with the needs of the Archive’s structure, and is just the disk images and nothing else.

This single .zip archive might be sufficient for a lot of sites (go git 'er!) but as mentioned infinite times before, there is a very strong ethic across the Internet Archive's software collection to make things as accessible as possible, and hence there are nearly 500 items in the "3D0G Knight Collection" besides the "download it all" item.

The rest of this entry talks about why it’s 500 and not 714, and how it is put together, and the rest of my thoughts on this whole endeavor. If you just want to play some games online or pull a 37mb file and run, cackling happily, into the night, so be it.

The relatively small number of people who have exceedingly hard opinions on how things “should be done” in the vintage computing space will also want to join the folks who are pulling the 37mb file. Everything else done by me after the generation of the .zip file is in service of the present and near future. The items that number in the hundreds on the Archive that contain one floppy disk image and interaction with it are meant for people to find now. I want someone to have a vague memory of a game or program once interacted with, and if possible, to find it on the Archive. I also like people browsing around randomly until something catches their eye and to be able to leap into the program immediately.

To those ends, and as an exercise, I’ve acquired or collaborated on scripts to do the lion’s share of analysis on software images to prep them for this living museum. These scripts get it “mostly” right, and the rough edges they bring in from running are easily smoothed over by a microscopic amount of post-processing manual attention, like running a piece of sandpaper over a machine-made joint.

Again, we started out with 714 disk images. The first thing done was to run them against a script that has hash checksums for every exposed Apple II disk image on the Archive, which now number over 10,000. Doing this dropped the "uniquely new" disk images from 714 to 667.

Next, I concatenated disk images that are part of the same product into one item: if a paint program has two floppy disk images for each of the sides of its disk, those become a single item. In one or two cases, the program spans multiple floppies, so 4-8 (and in one case, 14!) floppy images become a single item. Doing this dropped the total from 667 to 495 unique items. That’s why the number is significantly smaller than the original total.

Let’s talk for a moment about this.

Using hashes and comparing them is the roughest of rough approaches to de-duplicating software items. I do it with Apple II images because they tend to be self-contained (a single .dsk file) and because Apple II software has a lot of people involved in it. I'm not alone by any means in acquiring these materials and I'm certainly not alone in terms of work being done to track down all the unique variations and most obscure and nearly lost packages written for this platform. If I was the only person in the world (or one of a tiny sliver) working on this I might be super careful with each and every item to catalog it – but I'm absolutely not; I count at least a half-dozen operations involved in Apple II floppy image ingestion.

And as a bonus, it’s a really nice platform. When someone puts their heart into an Apple II program, it rewards them and the end user as well – the graphics can be charming, the program flow intuitive, and the whole package just gleams on the screen. It’s rewarding to work with this corpus, so I’m using it as a test bed for all these methods, including using hashes.

But hash checksums are seriously not the be-all for this work. Anything can make a hash different – an added file, a modified bit, or a compilation of already-on-the-archive-in-a-hundred-places files that just happen to be grouped up slightly different than others. That said, it’s not overwhelming – you can read about what’s on a floppy and decide what you want pretty quickly; gigabytes will not be lost and the work to track down every single unique file has potential but isn’t necessary yet.

(For the people who care, the Internet Archive generates three different hashes (md5, crc32, sha1) and lists the size of the file – looking across all of those for comparison is pretty good for ensuring you probably have something new and unique.)
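
That triple-hash comparison can be sketched in a few lines. This is a hypothetical reduction, not the Archive's actual scripts:

```python
import hashlib
import zlib

def disk_hashes(path):
    """Compute the three fingerprints the Archive lists for each file:
    md5, sha1, and crc32 (as an 8-digit hex string)."""
    with open(path, "rb") as f:
        data = f.read()
    return (
        hashlib.md5(data).hexdigest(),
        hashlib.sha1(data).hexdigest(),
        format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
    )

def uniquely_new(paths, known):
    """Keep only the disk images whose (md5, sha1, crc32) triple
    isn't already in the set of known fingerprints."""
    return [p for p in paths if disk_hashes(p) not in known]
```

Checking all three hashes plus the file size makes an accidental collision vanishingly unlikely, which is why it's "pretty good for ensuring you probably have something new and unique."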

Once the items are up there, the Screen Shotgun whips into action. It plays the programs in the emulator, takes screenshots, leafs off the unique ones, and then assembles it all into a nice package. Again, not perfect but left alone, it does the work with no human intervention and gets things generally right. If you see a screenshot in this collection, a robot did it and I had nothing to do with it.

This leads, of course, to sussing out which programs are a tad not-bootable, and by that I mean that they boot up in the emulator and the emulator sees them and all, but the result is not that satisfying:

On a pure accuracy level, this is doing exactly what it’s supposed to – the disk wasn’t ever a properly packaged, self-contained item, and it needs a boot disk to go in the machine first before you swap the floppy. I intend to work with volunteers to help with this problem, but here is where it stands.

The solution in the meantime is a Java program modified by Kevin Savetz, which analyzes the floppy disk image and prints all the disk information it can find, including the contents of BASIC programs and textfiles. Here's a non-booting disk where this worked out. The result is that this all gets ingested into the search engine of the Archive, and so if you're looking for a file within the disk images, there's a chance you'll be able to find it.

Once the robots have their way with all the items, I can go in and fix a few things, like screenshots that went south, or descriptions and titles that don’t reflect what actually boots up. The amount of work I, a single person, have to do is therefore reduced to something manageable.

I think this all works well enough for the contemporary vintage software researcher and end user. Perhaps that opinion is not universal.

What I can say, however, is that the core action here – of taking data away from a transient and at-risk storage medium and putting it into a slightly less transient, less at-risk storage medium – is 99% of the battle. To have the will to do it, to connect with the people who have these items around and to show them it’ll be painless for them, and to just take the time to shove floppies into a drive and read them, hundreds of times… that’s the huge mountain to climb right now. I no longer have particularly deep concerns about technology failing to work with these digital images, once they’re absorbed into the Internet. It’s this current time, out in the cold, unknown and unloved, that they’re the most at risk.

The rest, I’m going to say, is gravy.

I’ll talk more about exactly how tasty and real that gravy is in the future, but for now, please take a pleasant walk in the 3D0G Knight’s Domain.

The Followup

Published 14 Mar 2017 by Jason Scott in ASCII by Jason Scott.

Writing about my heart attack garnered some attention. I figured it was only right to fill in later details and describe what my current future plans are.

After the previous entry, I went back into the emergency room of the hospital I was treated at, twice.

The first time was because I “felt funny”; I just had no grip on “is this the new normal” and so just to understand that, I went back in and got some tests. They did an EKG, a blood test, and let me know all my stats were fine and I was healing according to schedule. That took a lot of stress away.

Two days later, I went in because I was having a marked shortness of breath, where I could not get enough oxygen in and it felt a little like I was drowning. Another round of tests, and one of the cardiologists mentioned a side effect of one of the drugs I was taking was this sort of shortness/drowning. He said it usually went away and the company claimed 5-7% of people got this side effect, but that they observed more like 10-15%. They said I could wait it out or swap drugs. I chose swap. After that, I’ve had no other episodes.

The hospital thought I should stay in Australia for 2 weeks before flying. Thanks to generosity from both MuseumNext and the ACMI, my hosts, that extra AirBnB time was basically paid for. MuseumNext also worked to help move my international flight ahead the weeks needed; a very kind gesture.

Kind gestures abounded, to be clear. My friend Rochelle extended her stay from New Zealand to stay an extra week; Rachel extended hers to match my new departure date. Folks rounded up funds and sent them along, which helped cover some additional costs. Visitors stopped by the AirBnB when I wasn’t really taking any walks outside, to provide additional social contact.

Here is what the blockage looked like, before and after. As I said, roughly a quarter of my heart wasn’t getting any significant blood and somehow I pushed through it for nearly a week. The insertion of a balloon and then a metal stent opened the artery enough for the blood flow to return. Multiple times, people made it very clear that this could have finished me off handily, and mostly luck involving how my body reacted was what kept me going and got me in under the wire.

From the responses to the first entry, it appears that a lot of people didn't know heart attacks could be a lingering, growing issue and not just a bolt of lightning that strikes in the middle of a show or while walking down the street. If nothing else, I'm glad that it's caused a number of people to be aware of how symptoms present themselves, as well as getting people to check their cholesterol, which I didn't see as a huge danger compared to other factors, and which turned out to be significant indeed.

As for drugs, I’ve got a once-a-day waterfall of pills for blood pressure, cholesterol, heart healing, anti-clotting, and my long-standing annoyance of gout (which I’ve not had for years thanks to the pills). I’m on some of them for the next few months, some for a year, and some forever. I’ve also been informed I’m officially at risk for another heart attack, but the first heart attack was my hint in that regard.

As I healed, and understood better what was happening to me, I got better remarkably quickly. There is a single tiny dot on my wrist from the operation, and another tiny dot where the IV was in my arm at other times. Rachel gifted me a more complicated Fitbit to replace the one I had, with the new one tracking sleep schedule and heart rate, just to keep an eye on it.

A day after landing back in the US, I saw a cardiologist at Mt. Sinai, one of the top doctors, who gave me some initial reactions to my charts and information: I’m very likely going to be fine, maybe even better than before. I need to take care of myself, and I am. If I was smoking or drinking, I’d have to stop, but since I’ve never had alcohol and I’ve never smoked, I’m already ahead of that game. I enjoy walking, a lot. I stay active. And as of getting out of the hospital, I am vegan for at least a year. Caffeine’s gone. Raw vegetables are in.

One might hesitate to put this all online, because the Internet is spectacularly talented at generating hatred and health advice. People want to help – it comes from a good place. But I’ve got a handle on it and I’m progressing well; someone hitting me up with a nanny-finger-wagging paragraph and 45 links isn’t going to help much. But go ahead if you must.

I failed to mention it before, but when this was all going down, my crazy family of the Internet Archive jumped in, everyone from Dad Brewster through to all my brothers and sisters scrambling to find me my insurance info and what they had on their cards, as I couldn’t find mine. It was sometime really late when I first pinged everyone with “something is not good” and everyone has been rather spectacular over there. Then again, they tend to be spectacular, so I sort of let that slip by. Let me rectify that here.

And now, a little bit on health insurance.

I had travel insurance as part of my health insurance with the Archive. That is still being sorted out, but a large deposit had to be put on the Archive’s corporate card as a down-payment during the sorting out, another fantastic generosity, even if it’s technically a loan. I welcome the coming paperwork and nailing down of financial brass tacks for a specific reason:

I am someone who once walked into an emergency room with no insurance (back in 2010), got a blood medication IV, stayed around a few hours, and went home, generating a $20,000 medical bill in the process. It got knocked down to $9k over time, and I ended up being thrown into a low-income program they had that allowed them to write it off (I think). That bill could have destroyed me, financially. Therefore, I’m super sensitive to the costs of medical care.

In Australia, it is looking like the heart operation and the 3 day hospital stay, along with all the tests and staff and medications, are going to round out around $10,000 before the insurance comes in and knocks that down further (I hope). In the US, I can’t imagine that whole thing being less than $100,000.

The biggest culture shock for me was how little any of the medical staff, be they doctors or nurses or administrators, cared about the money. They didn’t have any real info on what things cost, because pretty much everything is free there. I’ve equated it to asking a restaurant where the best toilets to use a few hours after your meal would be – they might have some random ideas, but nobody’s really thinking that way. It was a huge factor in my returning to the emergency room so willingly; each visit, all-inclusive, was $250 AUD, which is even less in US dollars. $250 is something I’ll gladly pay for peace of mind, and I did, twice. The difference in the experience is remarkable. I realize this is a hot button issue now, but chalk me up as another person for whom a life-changing experience came remarkably close to influencing where I might live in the future.

Dr. Sonny Palmer, who inserted my stent in the operating room.

I had a pile of plans and things to get done (documentaries, software, cutting down on my possessions, and so on), and I’ll be getting back to them. I don’t really have an urge to maintain some sort of health narrative on here, and I certainly am not in the mood to urge any lifestyle changes or preach a way of life to folks. I’ll answer questions if people have them from here on out, but I’d rather be known for something other than powering through a heart attack, and maybe, with some effort, I can do that.

Thanks again to everyone who has been there for me, online and off, in person and far away, over the past few weeks. I’ll try my best to live up to your hopes about what opportunities my second chance at life will give me.


Want to learn about Archivematica whilst watching the ducks?

Published 13 Mar 2017 by Jenny Mitcham in Digital Archiving at the University of York.

We are really excited to be hosting the first European Archivematica Camp here at the University of York next month - on the 4-6th April.

Don't worry - there will be no tents or campfires...but there may be some wildlife on the lake.

The Ron Cooke Hub on a frosty morning - hoping for some warmer weather for Camp!

The event is taking place at the Ron Cooke Hub over on our Heslington East campus. If you want to visit the beautiful City of York (OK, I'm biased!) and meet other European Archivematica users (or Archivematica explorers) this event is for you. Artefactual Systems will be leading the event and the agenda is looking very full and interesting.

I'm most looking forward to learning more about the workflows that other Archivematica users have in place or are planning to implement.

One of these lakeside 'pods' will be our breakout room

There are still places left and you can register for Camp here or contact the organisers at

...and if you are not able to attend in person, do watch this blog in early April as you can guarantee I'll be blogging after the event!

Through the mirror-glass: Capture of artwork framed in glass.

Published 13 Mar 2017 by slwacns in State Library of Western Australia Blog.


State Library’s collection material that is selected for digitisation comes to the Digitisation team in a variety of forms. This blog describes capture of artwork that is framed and encased within glass.

So let’s see how the item is digitised.


Two large framed original artworks from the picture book Teacup written by Rebecca Young and illustrated by Matt Ottley posed some significant digitisation challenges.

When artwork from the Heritage collection is framed in glass, the glass acts like a mirror and without great care during the capture process, the glass can reflect whatever is in front of it, meaning that the photographer’s reflection (and the reflection of capture equipment) can obscure the artwork.

This post shows how we avoided this issue during the digitisation of two large framed paintings, Cover illustration for Teacup and also page 4-5 [PWC/255/01] and The way the whales called out to each other [PWC/255/09].

Though it is sometimes possible to remove the artwork from its housing, there are occasions when this is not suitable. In this example, the decision was made to not remove the artworks from behind glass as the Conservation staff assessed that it would be best if the works were not disturbed from their original housing.

PWC/255/01 and PWC/255/09

The most critical issue was to be in control of the light. Rearranging equipment in the workroom allowed for the artwork to face a black wall, a method used by photographers to eliminate reflections.


We used black plastic across the entrance of the workroom to eliminate all unwanted light.


The next challenge was to set up the camera. For this shoot we used our Hasselblad H3D11 (a 39 megapixel camera with excellent colour fidelity).


Prior to capture, we gave the glass a good clean with an anti-static cloth. In the images below, you can clearly see the reflection caused by the mirror effect of the glass.


Since we don’t have a dedicated photographic studio we needed to be creative when introducing extra light to allow for the capture. Bouncing the light off a large white card prevented direct light from falling on the artwork and reduced a significant number of reflections. We also used a polarizing filter on the camera lens to reduce reflections even further.


Once every reflection was eliminated and the camera set square to the artwork, we could test colour balance and exposure.

In the image below, you can see that we made the camera look like ‘Ned Kelly’ to ensure any shiny metal from the camera body didn’t reflect in the glass. We used the camera’s computer controlled remote shutter function to further minimise any reflections in front of the glass.



The preservation file includes technically accurate colour and greyscale patches to allow for colour fidelity and a ruler for accurate scaling in future reproductions.


The preservation file and a cropped version for access were then ingested into the State Library’s digital repository. The repository allows for current access and future reproductions to be made.

From this post you can see the care and attention that goes into preservation digitisation, ‘Do it right, do it once’ is our motto.

Week #8: Warriors are on the right path

Published 12 Mar 2017 by legoktm in The Lego Mirror.

As you might have guessed due to the lack of previous coverage of the Warriors, I'm not really a basketball fan. But the Warriors are in an interesting place right now. After setting an NBA record for being the fastest team to clinch a playoff spot, Coach Kerr has started resting his starters and the Warriors have a three game losing streak. This puts the Warriors in danger of losing their first seed spot with the San Antonio Spurs only half a game behind them.

But I think the Warriors are doing the right thing. Last year the Warriors set the record for having the best regular season record in NBA history, but also became the first team in NBA history to have a 3-1 advantage in the finals and then lose.

No doubt there was immense pressure on the Warriors last year. It was just expected of them to win the championship, there really wasn't anything else.

So this year they can easily avoid a lot of that pressure by not being the best team in the NBA on paper. They shouldn't worry about being the top seed: just finish in the top four and play their best in the playoffs. Get some rest; they have a huge advantage over every other team simply by already being in the playoffs with so many games left to play.

How can we preserve our wiki pages

Published 10 Mar 2017 by Jenny Mitcham in Digital Archiving at the University of York.

I was recently prompted by a colleague to investigate options for preserving institutional wiki pages. At the University of York we use the Confluence wiki and this is available for all staff to use for a variety of purposes. In the Archives we have our own wiki space on Confluence which we use primarily for our meeting agendas and minutes. The question asked of me was how can we best capture content on the wiki that needs to be preserved for the long term? 

Good question and just the sort of thing I like to investigate. Here are my findings...

Space export

The most sensible way to approach the transfer of a set of wiki pages to the digital archive would be to export them using the export options available within the Space Tools.

The main problem with this approach is that a user will need to have the necessary permissions on the wiki space in order to be able to use these tools ...I found that I only had the necessary permissions on those wiki spaces that I administer myself.

There are three export options as illustrated below:

Space export options - available if you have the right permissions!


HTML export

Once you select HTML, there are two options - a standard export (which exports the whole space) or a custom export (which allows you to select the pages you would like included within the export).

I went for a custom export and selected just one section of meeting papers. Each wiki page is saved as an HTML file. DROID identifies these as HTML version 5. All relevant attachments are included in the download in their original format.

There are some really good things about this export option:
  • The inclusion of attachments in the export - these are often going to be as valuable to us as the wiki page content itself. Note that they were all renamed with a number that tied them to the page that they were associated with. It seemed that the original file name was however preserved in the linking wiki page text 
  • The metadata at the top of a wiki page is present in the HTML pages: ie Created by Jenny Mitcham, last modified by Jenny Mitcham on 31, Oct, 2016 - this is really important to us from an archival point of view
  • The links work - including links to the downloaded attachments, other wiki pages and external websites or Google Docs
  • The export includes an index page which can act as a table of contents for the exported files - this also includes some basic metadata about the wiki space


XML export

Again, there are two options here - either a standard export (of the whole space) or a custom export, which allows you to select whether or not you want comments to be exported and choose exactly which pages you want to export.

I tried the custom export. It seemed to work and also did export all the relevant attachments. The attachments were all renamed as '1' (with no file extension), and the wiki page content is all bundled up into one huge XML file.

On the plus side, this export option may contain more metadata than the other options (for example the page history) but it is difficult to tell as the XML file is so big and unwieldy and hard to interpret. Really it isn't designed to be usable. The main function of this export option is to move wiki pages into another instance of Confluence.


PDF export

Again you have the option to export the whole space or choose your pages. There are also other configurations you can make to the output but these are mostly cosmetic.

I chose the same batch of meeting papers to export as PDF and this produced a 111-page PDF document. The first page is a contents page which lists all the other pages alphabetically with hyperlinks to the right section of the document. It is hard to use the document as the wiki pages seem to run into each other without adequate spacing, and because of the linear nature of a PDF document you feel drawn to read it in the order it is presented (which in this case is not a logical order for the content). Attachments are not included in the download though links to the attachments are maintained in the PDF file and they do continue to resolve to the right place on the wiki. Creation and last modified metadata is also not included in the export.

Single page export

As well as the Space Export options in Confluence there are also single page export options. These are available to anyone who can access the wiki page so may be useful if people do not have necessary permissions for a space export.

I exported a range of test pages using the 'Export to PDF' and 'Export to Word' options.

Export to PDF

The PDF files created in this manner are version 1.4. Sadly no option to export as PDF/A, but at least version 1.4 is closer to the PDF/A standard than some, so perhaps a subsequent migration to PDF/A would be successful.

Export to Word

Surprisingly the 'Word' files produced by Confluence appear not to be Word files at all!

Double click on the files in Windows Explorer and they open in Microsoft Word no problem, but DROID identifies the files as HTML (with no version number) and reports a file extension mismatch (because the files have a .doc extension).

If you view the files in a text application you can clearly see the Content-Type marked as text/html and <html> tags within the document. Quick View Plus, however, views them as an Internet Mail Message with the following text displayed at the top of each page:

Subject: Exported From Confluence
1024x640 72 Print 90

All very confusing and certainly not giving me a lot of faith in this particular export format!
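For what it's worth, this kind of mismatch is easy to spot-check without DROID by peeking at the file's first bytes. A minimal sketch (the file name and sample content below are hypothetical stand-ins for a Confluence 'Word' export):

```python
# Spot-check whether a ".doc" file is really HTML in disguise by
# inspecting its first bytes, the same clue a format tool relies on.
def looks_like_html(path):
    with open(path, "rb") as f:
        head = f.read(1024).lstrip().lower()
    return head.startswith(b"<!doctype html") or b"<html" in head

# Create a fake "Word" export resembling what Confluence produces.
with open("exported_page.doc", "wb") as f:
    f.write(b"Subject: Exported From Confluence\n"
            b"Content-Type: text/html\n\n"
            b"<html><body>Meeting minutes</body></html>")

print(looks_like_html("exported_page.doc"))  # a .doc that is really HTML
```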


Both of these single page export formats do a reasonable job of retaining the basic content of the wiki pages - both versions include many of the key features I was looking for - text, images, tables, bullet points, colours. 

Where advanced formatting has been used to lay out a page using coloured boxes, the PDF version does a better job at replicating this than the 'Word' version. Whilst the PDF attempts to retain the original formatting, the 'Word' version displays the information in a much more linear fashion.

Links were also more usefully replicated in the PDF version. The absolute URL of all links, whether internal, external or to attachments was included within the PDF file so that it is possible to follow them to their original location (if you have the necessary permissions to view the pages). On the 'Word' versions, only external links worked in this way. Internal wiki links and links to attachments were exported as a relative link which become 'broken' once that page is taken out of its original context. 

The naming of the files that were produced is also worthy of comment. The 'Word' versions are given a name which mirrors the name of the page within the wiki space, but the naming of the PDF versions is much more useful, including the name of the wiki space itself, the page name and a date and timestamp showing when the page was exported.

Neither of these single page export formats retained the creation and last modified metadata for each page and this is something that it would be very helpful to retain.


So, if we want to preserve pages from our institutional wiki, what is the best approach?

The Space Export in HTML format is a clear winner. It reproduces the wiki pages in a reusable form that replicates the page content well. As HTML is essentially just ASCII text it is also a good format for long term preservation.

What impressed me about the HTML export was the fact that it retained the content, included basic creation and last modified metadata for each page and downloaded all relevant attachments, updating the links to point to these local copies.

What if someone does not have the necessary permissions to do a space export? My first suggestion would be that they ask for their permissions to be upgraded. If not, perhaps someone who does have necessary permissions could carry out the export?

If all else fails, the export of a single page using the 'Export as PDF' option could be used to provide ad hoc content for the digital archive. PDF is not the best preservation format but may be able to be converted to PDF/A. Note that any attachments would have to be exported separately and manually if this option were selected.

Final thoughts

A wiki space is a dynamic thing which can involve several different types of content - blog posts, labels/tags and comments can all be added to wiki spaces and pages. If these elements are thought to be significant then more work is required to see how they can be captured. It was apparent that comments could be captured using the HTML and XML exports and I believe blog posts can be captured individually as PDF files.

What is also available within the wiki platform itself is a very detailed Page History. Within each wiki page it is possible to view the Page History and see how a page has evolved over time - who has edited it and when those edits occurred. As far as I could see, none of the export formats included this level of information. The only exception may be the XML export but this was so difficult to view that I could not be sure either way.

So, there are limitations to all these approaches and as ever this goes back to the age old discussion about Significant Properties. What is significant about the wiki pages? What is it that we are trying to preserve? None of the export options preserve everything. All are compromises, but perhaps some are compromises we could live with.

China – Arrival in the Middle Kingdom

Published 9 Mar 2017 by Tom Wilson in thomas m wilson.

I’ve arrived in Kunming, the little red dot you can see on the map above.  I’m here to teach research skills to undergraduate students at Yunnan Normal University.  As you can see, I’ve come to a point where the foothills of the Himalayas fold up into a bunch of deep creases.  Yunnan province is the area of […]


Introducing Similarity Search at Flickr

Published 7 Mar 2017 by Clayton Mellina in

At Flickr, we understand that the value in our image corpus is only unlocked when our members can find photos and photographers that inspire them, so we strive to enable the discovery and appreciation of new photos.

To further that effort, today we are introducing similarity search on Flickr. If you hover over a photo on a search result page, you will reveal a “…” button that exposes a menu that gives you the option to search for photos similar to the photo you are currently viewing.

In many ways, photo search is very different from traditional web or text search. First, the goal of web search is usually to satisfy a particular information need, while with photo search the goal is often one of discovery; as such, it should be delightful as well as functional. We have taken this to heart throughout Flickr. For instance, our color search feature, which allows filtering by color scheme, and our style filters, which allow filtering by styles such as “minimalist” or “patterns,” encourage exploration. Second, in traditional web search, the goal is usually to match documents to a set of keywords in the query. That is, the query is in the same modality—text—as the documents being searched. Photo search usually matches across modalities: text to image. Text querying is a necessary feature of a photo search engine, but, as the saying goes, a picture is worth a thousand words. And beyond saving people the effort of so much typing, many visual concepts genuinely defy accurate description. Now, we’re giving our community a way to easily explore those visual concepts with the “…” button, a feature we call the similarity pivot.

The similarity pivot is a significant addition to the Flickr experience because it offers our community an entirely new way to explore and discover the billions of incredible photos and millions of incredible photographers on Flickr. It allows people to look for images of a particular style, it gives people a view into universal behaviors, and even when it “messes up,” it can force people to look at the unexpected commonalities and oddities of our visual world with a fresh perspective.

What is “similarity”?

To understand how an experience like this is powered, we first need to understand what we mean by “similarity.” There are many ways photos can be similar to one another. Consider some examples.

It is apparent that all of these groups of photos illustrate some notion of “similarity,” but each is different. Roughly, they are: similarity of color, similarity of texture, and similarity of semantic category. And there are many others that you might imagine as well.

What notion of similarity is best suited for a site like Flickr? Ideally, we’d like to be able to capture multiple types of similarity, but we decided early on that semantic similarity—similarity based on the semantic content of the photos—was vital to facilitate discovery on Flickr. This requires a deep understanding of image content for which we employ deep neural networks.

We have been using deep neural networks at Flickr for a while for various tasks such as object recognition, NSFW prediction, and even prediction of aesthetic quality. For these tasks, we train a neural network to map the raw pixels of a photo into a set of relevant tags, as illustrated below.

Internally, the neural network accomplishes this mapping incrementally by applying a series of transformations to the image, which can be thought of as a vector of numbers corresponding to the pixel intensities. Each transformation in the series produces another vector, which is in turn the input to the next transformation, until finally we have a vector that we specifically constrain to be a list of probabilities for each class we are trying to recognize in the image. To be able to go from raw pixels to a semantic label like “hot air balloon,” the network discards lots of information about the image, including information about appearance, such as the color of the balloon, its relative position in the sky, etc. Instead, we can extract an internal vector in the network before the final output.

For common neural network architectures, this vector—which we call a “feature vector”—has many hundreds or thousands of dimensions. We can’t necessarily say with certainty that any one of these dimensions means something in particular as we could at the final network output, whose dimensions correspond to tag probabilities. But these vectors have an important property: when you compute the Euclidean distance between these vectors, images containing similar content will tend to have feature vectors closer together than images containing dissimilar content. You can think of this as a way that the network has learned to organize information present in the image so that it can output the required class prediction. This is exactly what we are looking for: Euclidean distance in this high-dimensional feature space is a measure of semantic similarity. The graphic below illustrates this idea: points in the neighborhood around the query image are semantically similar to the query image, whereas points in neighborhoods further away are not.

This measure of similarity is not perfect and cannot capture all possible notions of similarity—it will be constrained by the particular task the network was trained to perform, i.e., scene recognition. However, it is effective for our purposes, and, importantly, it contains information beyond merely the semantic content of the image, such as appearance, composition, and texture. Most importantly, it gives us a simple algorithm for finding visually similar photos: compute the distance in the feature space of a query image to each index image and return the images with lowest distance. Of course, there is much more work to do to make this idea work for billions of images.
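The brute-force version of that simple algorithm fits in a few lines of NumPy. This is only a sketch, not the production pipeline: random vectors stand in for the network's feature vectors, and sizes are illustrative.

```python
import numpy as np

# Brute-force similarity search: rank every index image by the Euclidean
# distance between its feature vector and the query's feature vector.
rng = np.random.default_rng(0)
index = rng.normal(size=(10000, 256))   # feature vectors for 10k "images"
query = rng.normal(size=256)            # feature vector of the query image

dists = np.linalg.norm(index - query, axis=1)   # distance to every image
top5 = np.argsort(dists)[:5]                    # ids of the 5 most similar
print(top5, dists[top5])
```

Exhaustive ranking like this is exactly what becomes intractable at billions of images, which motivates the approximate methods below.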

Large-scale approximate nearest neighbor search

With an index as large as Flickr’s, computing distances exhaustively for each query is intractable. Additionally, storing a high-dimensional floating point feature vector for each of billions of images takes a large amount of disk space and poses even more difficulty if these features need to be in memory for fast ranking. To solve these two issues, we adopt a state-of-the-art approximate nearest neighbor algorithm called Locally Optimized Product Quantization (LOPQ).

To understand LOPQ, it is useful to first look at a simple strategy. Rather than ranking all vectors in the index, we can first filter a set of good candidates and only do expensive distance computations on them. For example, we can use an algorithm like k-means to cluster our index vectors, find the cluster to which each vector is assigned, and index the corresponding cluster id for each vector. At query time, we find the cluster that the query vector is assigned to and fetch the items that belong to the same cluster from the index. We can even expand this set if we like by fetching items from the next nearest cluster.
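A toy version of this filter-then-rank scheme can be sketched with a small NumPy k-means standing in for the real index. Sizes are illustrative, not Flickr's, and the Lloyd's loop is deliberately minimal:

```python
import numpy as np

# Filter-then-rank: cluster the index with k-means, keep an inverted list
# per cluster, and rank only the nearest clusters' members at query time.
rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 32))          # index of 5000 feature vectors
k = 50

# A few iterations of Lloyd's algorithm -- enough for a sketch.
centroids = X[rng.choice(len(X), k, replace=False)]
for _ in range(10):
    assign = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    for c in range(k):
        if (assign == c).any():
            centroids[c] = X[assign == c].mean(0)
assign = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)

# Inverted index: cluster id -> ids of the vectors assigned to it.
inverted = {c: np.where(assign == c)[0] for c in range(k)}

# Query: fetch the two nearest clusters' lists (the expanded candidate
# set mentioned above), then do exact distances on those candidates only.
q = rng.normal(size=32)
nearest = np.argsort(((centroids - q) ** 2).sum(-1))[:2]
candidates = np.concatenate([inverted[c] for c in nearest])
best = candidates[np.argmin(np.linalg.norm(X[candidates] - q, axis=1))]
print(len(candidates), int(best))
```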

This idea will take us far, but not far enough for a billions-scale index. For example, with 1 billion photos, we need 1 million clusters so that each cluster contains an average of 1000 photos. At query time, we will have to compute the distance from the query to each of these 1 million cluster centroids in order to find the nearest clusters. This is quite a lot. We can do better, however, if we instead split our vectors in half by dimension and cluster each half separately. In this scheme, each vector will be assigned to a pair of cluster ids, one for each half of the vector. If we choose k = 1000 to cluster both halves, we have k² = 1000 * 1000 = 1e6 possible pairs. In other words, by clustering each half separately and assigning each item a pair of cluster ids, we can get the same granularity of partitioning (1 million clusters total) with only 2 * 1000 distance computations with half the number of dimensions, for a total computational savings of 1000x. Conversely, for the same computational cost, we gain a factor of k more partitions of the data space, providing a much finer-grained index.
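The pair-of-cluster-ids bookkeeping can be sketched like this. Toy sizes throughout, and `kmeans` here is a minimal Lloyd's loop written for illustration, not a production routine:

```python
import numpy as np

# Split each vector in half, run k-means on each half separately, and key
# every vector by a pair of cluster ids: k*k cells from only 2*k comparisons.
rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 16))
k = 8
half = X.shape[1] // 2

def kmeans(data, k, iters=10):
    c = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        a = np.argmin(((data[:, None] - c[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (a == j).any():
                c[j] = data[a == j].mean(0)
    return c

c1 = kmeans(X[:, :half], k)              # codebook for the first half
c2 = kmeans(X[:, half:], k)              # codebook for the second half
ids1 = np.argmin(((X[:, :half][:, None] - c1[None]) ** 2).sum(-1), axis=1)
ids2 = np.argmin(((X[:, half:][:, None] - c2[None]) ** 2).sum(-1), axis=1)

cells = set(zip(ids1.tolist(), ids2.tolist()))
print(len(cells), "occupied cells out of", k * k)
```

Note how the occupied cells are typically fewer than k*k: that is the unbalance across clusters discussed below.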

This idea of splitting vectors into subvectors and clustering each split separately is called product quantization. When we use this idea to index a dataset it is called the inverted multi-index, and it forms the basis for fast candidate retrieval in our similarity index. Typically the distribution of points over the clusters in a multi-index will be unbalanced as compared to a standard k-means index, but this unbalance is a fair trade for the much higher resolution partitioning that it buys us. In fact, a multi-index will only be balanced across clusters if the two halves of the vectors are perfectly statistically independent. This is not the case in most real world data, but some heuristic preprocessing—like PCA-ing and permuting the dimensions so that the cumulative per-dimension variance is approximately balanced between the halves—helps in many cases. And just like the simple k-means index, there is a fast algorithm for finding a ranked list of clusters to a query if we need to expand the candidate set.

After we have a set of candidates, we must rank them. We could store the full vector in the index and use it to compute the distance to each candidate item, but this would incur a large memory overhead (for example, 256-dimensional vectors of 4-byte floats would require 1 TB for 1 billion photos) as well as a computational overhead. LOPQ solves these issues by performing another product quantization, this time on the residuals of the data. The residual of a point is the difference vector between the point and its closest cluster centroid. Given a residual vector and the cluster indexes, along with the corresponding centroids, we have enough information to reproduce the original vector exactly. Instead of storing the residuals directly, LOPQ product quantizes them, usually with a higher number of splits, and stores only the cluster indexes in the index. For example, if we split the vector into 8 splits and cluster each split with 256 centroids, we can store the compressed vector with only 8 bytes regardless of the original dimensionality (though a higher number of dimensions will result in higher approximation error).

With this lossy representation we can produce a reconstruction of a vector from the 8-byte code: we simply take each quantization code, look up the corresponding centroid, and concatenate these 8 centroids to produce the reconstruction. Likewise, we can approximate the distance from the query to an index vector by computing the distance between the query and the reconstruction. We can do this computation quickly for many candidate points by precomputing the squared differences from each split of the query to all of the centroids for that split. After computing this table, the squared distance for an index point is obtained by looking up the precomputed squared difference for each of its 8 indexes and summing them. This caching trick allows us to quickly rank many candidates without resorting to distance computations in the original vector space.
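The table-based ranking described above can be sketched as follows in Python with numpy. The array shapes, function name, and random data are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def adc_distances(query, centroids, codes):
    """Approximate squared distances from a query to PQ-coded points.

    centroids: (n_splits, n_centroids, sub_dim) codebooks, one per split
    codes:     (n_points, n_splits) centroid indexes for each stored point
    """
    n_splits, n_centroids, sub_dim = centroids.shape
    q = query.reshape(n_splits, sub_dim)
    # Precompute squared distances from each query split to every centroid.
    table = ((centroids - q[:, None, :]) ** 2).sum(axis=2)
    # A point's approximate distance is the sum of n_splits table lookups.
    return table[np.arange(n_splits), codes].sum(axis=1)

rng = np.random.default_rng(0)
centroids = rng.normal(size=(8, 256, 4))      # 8 splits, 256 centroids each
codes = rng.integers(0, 256, size=(1000, 8))  # 8-byte codes for 1000 points
query = rng.normal(size=32)
dists = adc_distances(query, centroids, codes)
```

By construction, each approximate distance equals the squared distance between the query and the concatenated-centroid reconstruction of that point, but only the small table is computed in the original vector space; ranking each candidate costs just 8 lookups and additions.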

LOPQ adds one final detail: for each cluster in the multi-index, LOPQ fits a local rotation to the residuals of the points that fall in that cluster. This rotation is simply a PCA that aligns the major directions of variation in the data to the axes, followed by a permutation that heuristically balances the variance across the splits of the product quantization. Note that this is the same preprocessing step that is usually performed at the top-level multi-index. It tends to make the approximate distance computations more accurate by mitigating errors introduced by assuming that each split of the vector in the product quantization is statistically independent of the other splits. Additionally, since a rotation is fit for each cluster separately, the rotations adapt to the local data distribution.
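A per-cluster rotation of this kind can be sketched as an eigendecomposition of the residuals' covariance. This is an illustration in Python/numpy, not the released implementation, and it omits the variance-balancing permutation mentioned above for brevity:

```python
import numpy as np

def fit_local_rotation(residuals):
    """Fit a PCA rotation to one cluster's residuals so that the major
    directions of variation are axis-aligned after rotation."""
    cov = np.cov(residuals, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]   # sort by decreasing variance
    return eigvecs[:, order].T          # rows = principal directions

rng = np.random.default_rng(0)
# Correlated 2D residuals: variance is spread across both axes.
residuals = rng.normal(size=(5000, 2)) @ np.array([[2.0, 1.0], [0.0, 0.5]])
R = fit_local_rotation(residuals)
rotated = residuals @ R.T               # axis-aligned residuals
```

After rotation the residual dimensions are decorrelated, which is exactly the property that makes the independence assumption of the subsequent product quantization less damaging.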

Below is a diagram from the LOPQ paper that illustrates the core ideas of LOPQ. K-means (a) is very effective at allocating cluster centroids, illustrated as red points, that target the distribution of the data, but it has other drawbacks at scale as discussed earlier. In the 2d example shown, we can imagine product quantizing the space with 2 splits, each with 1 dimension. Product Quantization (b) clusters each dimension independently and cluster centroids are specified by pairs of cluster indexes, one for each split. This is effectively a grid over the space. Since the splits are treated as if they were statistically independent, we will, unfortunately, get many clusters that are “wasted” by not targeting the data distribution. We can improve on this situation by rotating the data such that the main dimensions of variation are axis-aligned. This version, called Optimized Product Quantization (c), does a better job of making sure each centroid is useful. LOPQ (d) extends this idea by first coarsely clustering the data and then doing a separate instance of OPQ for each cluster, allowing highly targeted centroids while still reaping the benefits of product quantization in terms of scalability.

LOPQ is state-of-the-art for quantization methods, and you can find more information about the algorithm, as well as benchmarks, here. Additionally, we provide an open-source implementation in Python and Spark which you can apply to your own datasets. The algorithm produces a set of cluster indexes that can be queried efficiently in an inverted index, as described. We have also explored use cases that use these indexes as a hash for fast deduplication of images and large-scale clustering. These extended use cases are studied here.


We have described our system for large-scale visual similarity search at Flickr. Techniques for producing high-quality vector representations for images with deep learning are constantly improving, enabling new ways to search and explore large multimedia collections. These techniques are being applied in other domains as well to, for example, produce vector representations for text, video, and even molecules. Large-scale approximate nearest neighbor search has importance and potential application in these domains as well as many others. Though these techniques are in their infancy, we hope similarity search provides a useful new way to appreciate the amazing collection of images at Flickr and surface photos of interest that may have previously gone undiscovered. We are excited about the future of this technology at Flickr and beyond.


Yannis Kalantidis, Huy Nguyen, Stacey Svetlichnaya, Arel Cordero. Special thanks to the rest of the Computer Vision and Machine Learning team and the Vespa search team, which manages Yahoo’s internal search engine.

Thumbs.db – what are they for and why should I care?

Published 7 Mar 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Recent work I’ve been doing on the digital archive has made me think a bit more about those seemingly innocuous files that Windows (XP, Vista, 7 and 8) puts into any directory that has images in – Thumbs.db.

Getting your folder options right helps!
Windows uses a file called Thumbs.db to create little thumbnail images of any images within a directory. It stores one of these files in each directory that contains images and it is amazing how quickly they proliferate. Until recently I wasn’t aware I had any in my digital archive at all. This is because although my preferences in Windows Explorer were set to display hidden files, the "Hide protected operating system files" option also needs to be disabled in order to see files such as these.

The reason I knew I had all these Thumbs.db files was through a piece of DROID analysis work published last month. Thumbs.db ranked at number 12 in my list of the most frequently occurring file formats in the digital archive. I had 210 of these files in total. I mentioned at the time that I could write a whole blog post about this, so here it is!

Do I really want these in the digital archive? In my mind, what is in the ‘original’ folders within the digital archive should be what OAIS would call the Submission Information Package (SIP). Just those files that were given to us by a donor or depositor. Not files that were created subsequently by my own operating system.

Though they are harmless enough they can be a bit irritating. Firstly, when I’m trying to run reports on the contents of the archive, the number of files for each archive is skewed by the Thumbs.db files that are not really a part of the archive. Secondly, and perhaps more importantly, I was trying to create a profile of the dates of files within the digital archive (admittedly not an exact science when using last modified dates) and the span of dates for each individual archive that we hold. The presence of Thumbs.db files in each archive that contained images gave the false impression that all of the archives had had content added relatively recently, when in fact all that had happened was that a Thumbs.db file had automatically been added when I had transferred the data to the digital archive filestore. It took me a while to realise this - gah!

So, what to do? First I needed to work out how to stop them being created.

After a bit of googling I quickly established the fact that I didn’t have the necessary permissions to be able to disable this default behaviour within Windows so I called in the help of IT Services.

IT clearly thought this was a slightly unusual request, but made a change to my account which now stops these thumbnail images being created by me. Since I am the only person who has direct access to the born-digital material within the archive, this should solve that problem.

Now I can systematically remove the files. This means that they won’t skew any future reports I run on numbers of files and last modified dates.

Perhaps once we get a proper digital archiving system in place here at the Borthwick we won’t need to worry about these issues as we won’t directly interact with the archive filestore? Archivematica will package up the data into an AIP and put it on the filestore for me.

However, I will say that now IT have stopped the use of Thumbs.db from my account I am starting to miss them. This setting applies to my own working filestore as well as the digital archive. It turns out that it is actually incredibly useful to be able to see thumbnails of your image files before double clicking on them! Perhaps I need to get better at practicing what I preach and make some improvements to how I name my own image files – without a preview thumbnail, an image file *really* does benefit from a descriptive filename!

As always, I'm interested to hear how other people tackle Thumbs.db and any other system files within their digital archives.

This Month’s Writer’s Block

Published 7 Mar 2017 by Dave Robertson in Dave Robertson.




Published 6 Mar 2017 by timbaker in Tim Baker.

The image on the left was taken a year ago when I had to renew my driver’s license, so I am stuck with it for the next 10 years. I don’t mind so much as it reminds me how far I’ve come. The photo on...

Week #7: 999 assists and no more kneeling

Published 4 Mar 2017 by legoktm in The Lego Mirror.

Joe Thornton is one assist away from reaching 1,000 in his career. He's a team player - the recognition of scoring a goal doesn't matter to him, he just wants his teammates to score. And his teammates want him to achieve this milestone too, as shown by Sharks passing to Thornton and him passing back instead of them going directly for the easy empty netter.

Oh, and now that the trade deadline has passed with no movement on the goalie front, it's time for In Jones We Trust:

via /u/MisterrAlex on reddit

In other news, Colin Kaepernick announced that he's going to be a free agent and opted out of the final year of his contract. But in even bigger news, he said he will stop kneeling for the national anthem. I don't know if he is doing that to make himself more marketable, but I wish he would have stood (pun intended) with his beliefs.

Songs for the Beeliar Wetlands

Published 2 Mar 2017 by Dave Robertson in Dave Robertson.

The title track of the forthcoming Kiss List album has just been included on an awesome fundraising compilation of 17 songs by local songwriters for the Beeliar wetlands. All proceeds go to #rethinkthelink. Get it while it’s hot! You can purchase the whole album or just the songs you like.

Songs for the Beeliar Wetlands: Original Songs by Local Musicians (Volume 1) by Dave Robertson and The Kiss List