Sam's news

Here are some of the news sources I follow.

My main website is at https://samwilson.id.au/.


Revealed truths

Published 23 Apr 2018 in New Humanist Articles and Posts.

Many far-right politicians have also turned out to be highly superstitious. What’s the link between earthly and divine authority?

Installing MediaWiki on Mamp for Windows, fileinfo.dll is needed and not loading

Published 22 Apr 2018 by jbegley in Newest questions tagged mediawiki - Stack Overflow.

New to PHP; more of a tech than a developer.

I am using MAMP for Windows, (not WAMP, but the Windows build of MAMP). I am trying to install a local copy of MediaWiki on my Windows 10 box, (PHP v7.17/Apache). I get a nag on the startup script that fileinfo is required and missing. I have un-commented the fileinfo line in the PHP.INI file and restarted the server with no success. I have verified that the fileinfo.dll is in the PHP /EXT folder. Is there any way to verify that my expected version of PHP.INI is loading and that fileinfo.dll is loading?

I currently have Wordpress running on the same instance of MAMP with no issues. (Separate directory)

Thanks in advance.
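One quick way to check both things (which php.ini the web server actually loaded, and whether fileinfo is active) is a small diagnostic script in the web root. The following is only a sketch; the filename and output wording are made up for illustration:

<?php
// check.php - a minimal diagnostic sketch (filename is arbitrary).
// Open it in a browser via MAMP to see which php.ini the web server
// actually loaded and whether the fileinfo extension is available.
header('Content-Type: text/plain');

echo "Loaded php.ini:     " . (php_ini_loaded_file() ?: 'none') . "\n";
echo "Extra .ini files:   " . (php_ini_scanned_files() ?: 'none') . "\n";
echo "fileinfo extension: " . (extension_loaded('fileinfo') ? 'loaded' : 'NOT loaded') . "\n";

// Uncomment for the full report (extension_dir, loaded modules, etc.):
// phpinfo();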


Make invitation only private Mediawiki sign-up link work

Published 22 Apr 2018 by tobias47n9e in Newest questions tagged mediawiki - Stack Overflow.

How does one need to configure group permissions to make a private MediaWiki project (where only signed-up people can read and edit) work together with the InviteSignup extension?

I want only people with an invitation token to be able to go to Special:CreateAccount, but when I whitelist that page it obviously becomes reachable by anyone again. If the page is not on the whitelist, the signup with the token does not work. It might also be a bug?

The relevant settings in my LocalSettings.php are:

wfLoadExtension('InviteSignup');
$wgGroupPermissions['user']['invitesignup'] = true;

$wgGroupPermissions['*']['read']    = false;
$wgGroupPermissions['user']['read'] = true;
$wgGroupPermissions['user']['edit'] = true;
$wgWhitelistRead =  [
    #"Special:CreateAccount",
];

Together

Published 20 Apr 2018 by Matthew Roth in code.flickr.com.

Flickr is excited to be joining SmugMug!

We’re looking forward to some interesting and challenging engineering projects in the next year, and would love to have more great people join the team!

We want to talk to people who are interested in working on an inclusive, diverse team, building large-scale systems that are backing a much-loved product.

You can reach us by email at: iwanttowork@flickr.com

Read our announcement blog post and our extended Q&A for more details.

~The Flickr Team


2017 Year Review

Published 20 Apr 2018 by addshore in Addshore.

2017 has been a great year with continued work at WMDE on both technical wishes projects and also Wikibase / Wikidata related areas. Along the way I shared a fair amount of this through this blog, although not as much as I would have liked. Hopefully I’ll be slightly more active in 2018. Here are some fun stats:

Top 5 posts by page views in 2017 were:

  1. Guzzle 6 retry middleware
  2. Misled by PHPUnit at() method
  3. Wikidata Map July 2017
  4. Add Exif data back to Facebook images
  5. Github release download count – Chrome Extension

To make myself feel slightly better we can have a look at github and the apparent 1,203 contributions in 2017:

The post 2017 Year Review appeared first on Addshore.


The 2nd UK AtoM user group meeting

Published 20 Apr 2018 by Jenny Mitcham in Digital Archiving at the University of York.

I was pleased to be able to host the second meeting of the UK AtoM user group here in York at the end of last week. AtoM (or Access to Memory) is the Archival Management System that we use here at the Borthwick Institute and it seems to be increasing in popularity across the UK.

We had 18 attendees from across England, Scotland and Wales representing both archives and service providers. It was great to see several new faces and meet people at different stages of their AtoM implementation.

We started off with introductions and everyone had the chance to mention one recent AtoM triumph and one current problem or challenge. A good way to start the conversation and perhaps a way of considering future development opportunities and topics for future meetings.

Here is a selection of the successes that were mentioned:

  • Establishing a search facility that searches across two AtoM instances
  • Getting senior management to agree to establishing AtoM
  • Getting AtoM up and running
  • Finally having an online catalogue
  • Working with authority records in AtoM
  • Working with other contributors and getting their records displaying on AtoM
  • Using the API to drive another website
  • Upgrading to version 2.4
  • Importing legacy EAD into AtoM
  • Uploading finding aids into AtoM 2.4
  • Adding 1000+ URLs to digital resources into AtoM using a set of SQL update statements

...and here are some of the current challenges or problems users are trying to solve:
  • How to bar code boxes - can this be linked to AtoM?
  • Moving from CALM to AtoM
  • Not being able to see the record you want to link to when trying to select related records
  • Using the API to move things into an online showcase
  • Advocacy for taking the open source approach
  • Working out where to start and how best to use AtoM
  • Sharing data with the Archives Hub
  • How to record objects alongside archives
  • Issues with harvesting EAD via OAI-PMH
  • Building up the right level of expertise to be able to contribute code back to AtoM
  • Working out what to do when AtoM stops working
  • Discovering that AtoM doesn't enforce uniqueness in identifiers for archival descriptions

After some discussion about some of the issues that had been raised, Louise Hughes from the University of Gloucestershire showed us her catalogue and talked us through some of the decisions they had made as they set this up. 

The University of Gloucestershire's AtoM instance

She praised the digital object functionality and has been using this to add images and audio to the archival descriptions. She was also really happy with the authority records, in particular, being able to view a person and easily see which archives relate to them. She discussed ongoing work to enable records from AtoM to be picked up and displayed within the library catalogue. She hasn't yet started to use AtoM for accessioning but hopes to do so in the future. Adopting all the functionality available within AtoM needs time and thought and tackling it one step at a time (particularly if you are a lone archivist) makes a lot of sense.

Tracy Deakin from St John's College, Cambridge talked us through some recent work to establish a shared search page for their two institutional AtoM instances. One holds the catalogue of the college archives and the other is for the Special Collections Library. They had taken the decision to implement two separate instances of AtoM as they required separate front pages and the ability to manage the editing rights separately. However, as some researchers will find it helpful to search across both instances, a search page has been developed that accesses the Elasticsearch index of each site in order to cross-search.

The interface for a shared search across St John's College AtoM sites

Vicky Phillips from the National Library of Wales talked us through their processes for upgrading their AtoM instance to version 2.4 and discussed some of the benefits of moving to 2.4. They are really happy to have the full width treeview and the drag and drop functionality within it.

The upgrade has not been without its challenges though. They have had to sort out some issues with invalid slugs, ongoing issues due to the size of some of their archives (they think the XML caching functionality will help with this) and sometimes find that MySQL gets overwhelmed with the number of queries and needs a restart. They still have some testing to do around bilingual finding aids and have also been working on testing out the new functionality around OAI-PMH harvesting of EAD.

Following on from this I gave a presentation on upgrading AtoM to 2.4 at the Borthwick Institute. We are not quite there yet but I talked about the upgrade plan and process and some decisions we have made along the way. I won't say any more for the time being as I think this will be the subject of a future blog post.

Before lunch my colleague Charles Fonge introduced VIAF (Virtual International Authority File) to the group. This initiative will enable Authority Records created by different organisations across the world to be linked together more effectively. Several institutions may create an authority record about the same individual and currently it is difficult to allow these to be linked together when data is aggregated by services such as The Archives Hub. It is worth thinking about how we might use VIAF in an AtoM context. At the moment there is no place to store a VIAF ID in AtoM and it was agreed this would be a useful development for the future.

After lunch Justine Taylor from the Honourable Artillery Company introduced us to the topic of backup and disaster recovery of AtoM. She gave the group some useful food for thought, covering techniques and the types of data that would need to be included (hint: it's not solely about the database). This was particularly useful for those working in small institutions who don't have an IT department that just does all this for them as a matter of course. Some useful and relevant information on this subject can be found in the AtoM documentation.

Max Communications are a company who provide services around AtoM. They talked through some of their work with institutions and what services they can offer.  As well as being able to provide hosting and support for AtoM in the UK, they can also help with data migration from other archival management systems (such as CALM). They demonstrated their crosswalker tool that allows archivists to map structured data to ISAD(G) before import to AtoM.

They showed us an AtoM theme they had developed to allow Vimeo videos to be embedded and accessible to users. Although AtoM does have support for video, the files can be very large in size and there are large overheads involved in running a video server if substantial quantities are involved. Keeping the video outside of AtoM and managing the permissions through Vimeo provided a good solution for one of their clients.

They also demonstrated an AtoM plugin they had developed for Wordpress. Though they are big fans of AtoM, they pointed out that it is not the best platform for creating interesting narratives around archives. They were keen to be able to create stories about archives by pulling in data from AtoM where appropriate.

At the end of the meeting Dan Gillean from Artefactual Systems updated us (via Skype) about the latest AtoM developments. It was really interesting to hear about the new features that will be in version 2.5. Note that none of this is ever a secret - Artefactual make their road map and release notes publicly available on their wiki - however it is still helpful to hear it enthusiastically described.

The group was really pleased to hear about the forthcoming audit logging feature, the clever new functionality around calculating creation dates, and the ability for users to save their clipboard across sessions (and share them with the searchroom when they want to access the items). Thanks to those organisations that are funding this exciting new functionality. Also worth a mention is the slightly less sexy, but very valuable work that Artefactual is doing behind the scenes to upgrade Elasticsearch.

Another very useful meeting and my thanks go to all who contributed. It is certainly encouraging to see the thriving and collaborative AtoM community we have here in the UK.

Our next meeting will be in London in the autumn.


Back to the classroom - the Domesday project

Published 20 Apr 2018 by Jenny Mitcham in Digital Archiving at the University of York.

Yesterday I was invited to speak to a local primary school about my job. The purpose of the event was to inspire kids to work in STEM subjects (science, technology, engineering and maths) and I was faced with an audience of 10 and 11 year old girls.

One member of the audience (my daughter) informed me that many of the girls were only there because they had been bribed with cake.

This could be a tough gig!

On a serious note, there is a huge gender imbalance in STEM careers with women only making up 23% of the workforce in core STEM occupations. In talking to the STEM ambassador who was at this event, it was apparent that recruitment in engineering is quite hard, with not enough boys OR girls choosing to work in this area. This is also true in my area of work and is one of the reasons we are involved in the "Bridging the Digital Gap" project led by The National Archives. They note in a blog post about the project that:

"Digital skills are vital to the future of the archives sector ...... if archives are going to keep up with the pace of change, they need to attract members of the workforce who are confident in using digital technology, who not only can use digital tools, but who are also excited and curious about the opportunities and challenges it affords."

So why not try and catch them really young and get kids interested in our profession?

There were a few professionals speaking at the event and subjects were varied and interesting. We heard from someone who designed software for cars (who knew how many different computers are in a modern car?), someone who had to calculate exact mixes of seed to plant in Sites of Special Scientific Interest in order to encourage the right wild birds to nest there, a scientist who tested gelatin in sweets to find out what animal it was made from, an engineer who uses poo to heat houses....I had some pretty serious competition!

I only had a few minutes to speak so my challenge was to try and make digital preservation accessible, interesting and relevant in a short space of time. You could say that this was a bit of an elevator pitch to school kids.

Once I got thinking about this I had several ideas of different angles I could take.

I started off looking at the Mount School Archive that is held at the Borthwick. This is not a digital archive but was a good introduction to what archives are all about and why they are interesting and important. Up until 1948 the girls at this school created their own school magazine that is beautifully illustrated and gives a fascinating insight into what life was like at the school. I wanted to compare this with how schools communicate and disseminate information today and discuss some of the issues with preserving this more modern media (websites, twitter feeds, newsletters sent to parents via email).

Several powerpoint slides down the line I realised that this was not going to be short and snappy enough.

I decided to change my plans completely and talk about something that they may already know about, the Domesday Book.

I began by asking them if they had heard of the Domesday Book. Many of them had. I asked what they knew about it. They thought it was from 1066 (not far off!), someone knew that it had something to do with William the Conqueror, they guessed it was made of parchment (and they knew that parchment was made of animal skin). They were less certain of what it was actually for. I filled in the gaps for them.

I asked them whether they thought this book (that was over 900 years old) could still be accessed today and they weren't so sure about this. I was able to tell them that it is being well looked after by The National Archives and can still be accessed in a variety of ways. The main barrier to understanding the information is that it is written in Latin.

I talked about what the Domesday Book tells us about our local area. A search on Open Domesday tells us that Clifton only had 12 households in 1086. Quite different from today!

We then moved forward in time, to a period of history known as 'The 1980's' (a period that the children had recently been studying at school - now that makes me feel old!). I introduced them to the BBC Domesday Project of 1986. Without a doubt one of digital preservation's favourite case studies!

I explained how school children and communities were encouraged to submit information about their local areas. They were asked to include details of everyday life and anything they thought might be of interest to people 1000 years from then. People took photographs and wrote information about their lives and their local area. The data was saved on to floppy disks (what are they?) and posted to the BBC (this was before email became widely available). The BBC collated all the information on to laser disc (something that looks a bit like a CD but with a diameter of about 30cm).

I asked the children to consider the fact that the 900-year-old Domesday Book is still accessible and think about whether the 30-year-old BBC Domesday Project discs were equally accessible. In discussion this gave me the opportunity to finally mention what digital archivists do and why it is such a necessary and interesting job. I didn't go into much technical detail but all credit to the folks who actually rescued the Domesday Project data. There is lots more information here.

Using the Domesday Reloaded website I was then able to show them what information is recorded about their local area from 1986. There was a picture of houses being built, and narratives about how a nearby lake was created. There were pieces written by a local school child and a teacher describing their typical day. I showed them a piece that was written about 'Children's Crazes' which concluded with:

" Another new activity is break-dancing
 There is a place in York where you can
 learn how to break-dance. Break     
 dancing means moving and spinning on 
 the floor using hands and body. Body-
 popping is another dance craze where 
 the dancer moves like a robot."


Disappointingly the presentation didn't entirely go to plan - my PowerPoint only partially worked and the majority of my carefully selected graphics didn't display.

A very broken powerpoint presentation

There was thus a certain amount of 'winging it'!

This did however allow me to make the point that working with technology can be challenging as well as perhaps frustrating and exciting in equal measure!


W3C is in Singapore for Seamless Payments!

Published 20 Apr 2018 by J. Alan Bird in W3C Blog.

W3C Seamless Payments booth

The W3C Web Payments Working Group produced the Web Payment API, which has been adopted by the major Web browsers, and we’re continuing to see adoption by others in the Payments Ecosystem. With the rechartering of the Web Commerce Interest Group we’re seeing a lot of exciting new work starting in areas that are of interest to merchants, retailers, banks and others in the Commerce segment of the Payments Industry.

While our participation has been fairly broad in coverage, we would like to see more participation from the Commerce and Payments Industry in the Asian countries. To bring our message to this community W3C has chosen to participate in Seamless Payments Asia on 3 and 4 May 2018 in Singapore.

Our participation at this event is in three dimensions: I am chairing a Keynote Panel on the afternoon of the first day, we will have a booth in the Exhibition area, and I will be doing a presentation in the Exhibition Theater.

I’m really excited about our Panel! The topic is “Frictionless Commerce: How to Make Cross-Border Payments, e-Commerce and Retail Easier”. On that panel we have two W3C Members, Airbnb and Rakuten, and two non-Members in National Australia Bank and Rocket International. I think it will be an exciting conversation.

In the booth we are finalizing our agenda but our goal is to have over 10 demonstrations from W3C Members about how the work in various parts of W3C is impacting their products and business.

Last, but not least, I will be doing a presentation in the Exhibition Theater on the second morning at 10:00 AM on “Improving Payments on the Web – an update from W3C”.

If you or others from your company are going to be at the event, let’s get together! Contact either myself or Naomi Yoshizawa to set up a time.


Slim 3.10.0 released

Published 19 Apr 2018 in Slim Framework Blog.

We are delighted to release Slim 3.10.0. This version has a couple of minor new features and a couple of bug fixes.

The most noticeable improvement is that we now support $app->redirect('/from', '/to') to allow quick and easy redirecting of one path to another without having to write a route handler yourself. We have also added support for the SameSite flag in Slim\Http\Cookies.
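For anyone who hasn’t tried it yet, here is a minimal sketch of the new helper; the paths and the optional status code are just illustrative:

<?php
// A minimal Slim 3.10 app using the new redirect helper (paths are illustrative).
require __DIR__ . '/vendor/autoload.php';

$app = new \Slim\App();

// Redirect /old-report to /report without writing a route handler yourself.
// The third argument (HTTP status) is optional and defaults to 302.
$app->redirect('/old-report', '/report', 301);

$app->run();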

As usual, there are also some bug fixes, particularly we no longer override the Host header in the request if it’s already defined.

The full list of changes is here


Pittsburgh, We’ll See Yinz at RailsConf!

Published 18 Apr 2018 by Jaime Woo in The DigitalOcean Blog.

Pittsburgh, We’ll See Yinz at RailsConf!

RailsConf has left the desert and makes its way to Steel City April 17-19, 2018. We’ll have Sam Phippen presenting, and several DO-ers checking out talks and tending our booth. Here’s what you need to know about RailsConf 2018.

In Sam’s talk, “Quick and easy browser testing using RSpec and Rails 5.1,” you'll learn about the new system specs in RSpec, how to set them up, and what benefits they provide. It’s for anyone wanting to improve their RSpec suite with full-stack testing.

From the talk description:

Traditionally doing a full-stack test of a Rails app with RSpec has been problematic. The browser wouldn't automate, capybara configuration would be a nightmare, and cleaning up your DB was difficult. In Rails 5.1 the new 'system test' type was added to address this. With modern RSpec and Rails, testing every part of your stack including Javascript from a browser is now a breeze.

Make sure you don’t miss it, Thursday, April 19, from 10:50 AM-11:30 AM in the Spirit of Pittsburgh Ballroom. If you’re interested in RSpec, you might dig his talk from 2017, “Teaching RSpec to Play Nice with Rails.”

You can also catch us in the Exhibit Hall, at booth number 520. The Hall is on Level 2, in Hall A. We’ll be hanging at our booth Wednesday, April 18 from 9:30 AM-6:00 PM, and Thursday, April 19 from 9:30 AM-5:15 PM.

See you there, or, as they say in Pittsburgh, meechinsdahnair!


How Spotify's algorithm is transforming pop

Published 18 Apr 2018 in New Humanist Articles and Posts.

Spotify offers fans unparalleled access to music – but is it flattening culture into an incentivised blandness?

How to receive a notification when a wiki page has not been updated in a certain amount of time?

Published 18 Apr 2018 by NETGEAR R6220 in Newest questions tagged mediawiki - Stack Overflow.

I would like to receive a notification when a wiki page hasn't been updated for a certain amount of time. I am currently checking 'Special:AncientPages', which is good for checking manually, but is there a way of doing this automatically?

The notification could be via email or anything else; it doesn't really matter. I just need to find out whether this is possible or not.
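In the absence of a built-in feature, one rough option is a small script run from cron that asks the MediaWiki API for the timestamp of the latest revision of a page and alerts when it is too old. The following PHP sketch is only an illustration; the wiki URL, page title and threshold are placeholders:

<?php
// stale-page-check.php - a rough sketch, intended to be run from cron.
// The API URL, page title and threshold below are placeholders.
$api       = 'https://example.org/w/api.php';
$title     = 'Main Page';
$maxAgeSec = 30 * 24 * 3600; // 30 days

$url = $api . '?' . http_build_query([
    'action'  => 'query',
    'prop'    => 'revisions',
    'titles'  => $title,
    'rvprop'  => 'timestamp',
    'rvlimit' => 1,
    'format'  => 'json',
]);

$data  = json_decode(file_get_contents($url), true);
$page  = current($data['query']['pages']);
$stamp = strtotime($page['revisions'][0]['timestamp']);

if (time() - $stamp > $maxAgeSec) {
    // Replace this with mail(), a webhook call, or whatever suits.
    echo "$title has not been edited since " . date('Y-m-d', $stamp) . "\n";
}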


MediaWiki with two database servers

Published 18 Apr 2018 by Sam Wilson in Sam's notebook.

I’ve been trying to replicate locally a bug with MediaWiki’s GlobalPreferences extension. The bug is about the increased number of database reads that happen when the extension is loaded, and the increase happens not on the database table that stores the global preferences (as might be expected) but rather on the ‘local’ tables. However, locally I’ve had all of these running on the same database server, which makes it hard to watch the standard monitoring tools to see differences; so, I set things up on two database servers locally.

Firstly, this was a matter of starting a new MySQL server in a Docker container (accessible at 127.0.0.1:3305 and with its data in a local directory so I could destroy and recreate the container as required):

docker run -it -e MYSQL_ROOT_PASSWORD=pwd123 -p3305:3306 -v$PWD/mysqldata:/var/lib/mysql mysql

(Note that because we’re keeping local data, root’s password is only set on the first set-up, and so the MYSQL_ROOT_PASSWORD can be left off future invocations of this command.)

Then it’s a matter of setting up MediaWiki to use the two servers:

$wgLBFactoryConf = [
	'class' => 'LBFactory_Multi',
	'sectionsByDB' => [
		// Map of database names to section names.
		'mediawiki_wiki1' => 's1',
		'wikimeta' => 's2',
	],
	'sectionLoads' => [
		// Map of sections to server-name/load pairs.
		'DEFAULT' => [ 'localdb'  => 0 ],
		's1' => [ 'localdb'  => 0 ],
		's2' => [ 'metadb' => 0 ],
	],
	'hostsByName' => [
		// Map of server-names to IP addresses (and, in this case, ports).
		'localdb' => '127.0.0.1:3306',
		'metadb' => '127.0.0.1:3305',
	],
	'serverTemplate' => [
		'dbname'        => $wgDBname,
		'user'          => $wgDBuser,
		'password'      => $wgDBpassword,
		'type'          => 'mysql',
		'flags'         => DBO_DEFAULT,
		'max lag'       => 30,
	],
];
$wgGlobalPreferencesDB = 'wikimeta';
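
As a quick sanity check that the two sections really do resolve to different servers, something like the following can be run through maintenance/eval.php. This is only a sketch; the exact string returned by getServer() will depend on the load-balancer configuration above.

// Run via maintenance/eval.php; a sketch only.
$local  = wfGetDB( DB_REPLICA );                 // default section ('localdb', 127.0.0.1:3306)
$global = wfGetDB( DB_REPLICA, [], 'wikimeta' ); // 's2' section ('metadb', 127.0.0.1:3305)
echo $local->getServer() . "\n";
echo $global->getServer() . "\n";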

How can I check that my mediawiki in my virtual machine is not accessible from the internet?

Published 17 Apr 2018 by user8886193 in Newest questions tagged mediawiki - Stack Overflow.

I downloaded a virtual machine for MediaWiki from Bitnami. The installation is done and I can access the website from the host (my real operating system). During the installation I received the IP address for access.

How can I check that my mediawiki in my virtual machine is not accessible from the internet?

It is important to me that the guest and web server are only accessible by me and that there is no communication through the host, or the other way round.

(I am a beginner; I searched the internet and decided to ask my question here. If this is the wrong place, please point me to a more suitable place for this question.)


A short update on the web-platform-test project invitation

Published 17 Apr 2018 by Philippe le Hegaret in W3C Blog.

For those of you involved in the effort to build a cross-browser testsuite for the majority of the web platform, you are currently noticing that the project is moving to a new organization on GitHub. The move is intended to improve the management and handling of the WPT project on GitHub and Travis.

There is a myriad of services and repositories related to this project and the complex transition is happening gradually, led by Philip Jägenstedt. This week, past collaborators and teams are invited to join the new GitHub organization to prevent breaking read/write access in the future. If you’re looking into continuing your contributions, you should accept the invitation to make it easier for you after the transition is over.

This transition does not change or impact the W3C relationship with the project. It has been and will continue to be a consensus-driven open-source project with a mission to improve the Web through testing.


Episode 6: Daren Welsh and James Montalvo

Published 17 Apr 2018 by Yaron Koren in Between the Brackets: a MediaWiki Podcast.

Daren Welsh and James Montalvo are flight controllers and instructors at the Extravehicular Activity (EVA) group at the Johnson Space Center at NASA. They first set up MediaWiki for their group in 2011; since then, they have overseen the spread of MediaWiki throughout the flight operations directorate at Johnson Space Center. They have also done a significant amount of MediaWiki development, including, most recently, the creation of Meza, a Linux-based tool that allows for easy installation and maintenance of MediaWiki.

Links for some of the topics discussed:


How to force Mediawiki to write an ampersand in HTML without it being encoded as &amp;

Published 16 Apr 2018 by user1258361 in Newest questions tagged mediawiki - Stack Overflow.

Automatic encoding of & as &amp; breaks expected functionality for extensions that write HTML or JS if the script/HTML text being written depends on them.

For example, if you write a script with a boolean AND conditional, the && gets encoded as &amp;&amp;, which makes no sense.

https://www.mediawiki.org/wiki/Help:Formatting

Nowiki tags don't apply to this encoding. Is there an extension available that allows HTML output of plain &?


Why Specs Change: EPUB 3.2 and the Evolution of the Ebook Ecosystem

Published 16 Apr 2018 by Dave Cramer in W3C Blog.

Drawing of four species of Galapagos Finches, showing the different beak shapes

It takes much more than a village to make an ebook. Authors, publishers, developers, distributors, retailers, and readers must all work together. EPUB* requires authoring and validation tools as well as reading systems. The EPUB standard depends on the HTML and CSS standards, among others. There are millions of existing EPUB 2 and EPUB 3 files out there. Change anywhere is felt everywhere.

As this ecosystem evolves, the EPUB standard itself sometimes has to change to keep up. When the Web moved to HTML5, enabling better semantic markup and better accessibility, it was clear that EPUB could benefit. EPUB 3.0, which was released in October 2011, supported HTML5 as well as scripting and multimedia. EPUB could now be used for more kinds of books, better books, more accessible books. EPUB 3 was a big deal, significantly different from, and better than, EPUB 2. Today there’s no reason to use EPUB 2, and yesterday is the best day to start producing EPUB 3.

Sometimes the need for change comes from innovation inside the ebook world. As Apple and Amazon developed fixed-layout ebooks in the early 2010s, the IDPF knew they had to create a standard, to avoid fragmenting the marketplace. Sometimes specs just have bugs, or implementations discover an ambiguity. Some changes are large, like moving to HTML5, and some changes are small, like allowing multiple dc:source elements in EPUB 3.0.1. EPUB 3.0.1 was ultimately a maintenance release, incorporating the fixed-layout spec, slightly expanding what sorts of attributes were valid in EPUB, and fixing various bugs. Existing EPUB 3s didn’t need to change to support 3.0.1.

In 2016, the IDPF’s EPUB Working Group started working on a more substantive revision, which would become EPUB 3.1. The goal was to bring EPUB closer to the rest of the Web Platform, and make the spec simpler and easier to read. The former was done partly by trying to remove seldom-used features in EPUB that were not part of the larger Web, such as the epub:switch and epub:trigger elements. The Group also clarified the relationship with CSS, moving from an explicit profile of supported properties (which had little bearing on what was actually supported) to using the W3C’s own official definition of CSS, which evolves. It did the same with HTML, referring to the latest version of HTML5, whatever version that might be. But most of our ambitious ideas were scaled back or dropped, such as allowing the regular HTML serialization of HTML5 in EPUB. EPUB 3.1 was officially finished in January 2017, before the IDPF became part of the W3C.

But remember that the spec is only a part of the ecosystem. Two factors proved fatal to EPUB 3.1. First, there are hundreds of thousands of EPUB 3.0.X files already out there. EPUB 3.1 changed the value of the version attribute in the package file, and so those existing files would need to be edited to comply with the new spec, even if they didn’t use any of the removed features.

Second, the validation tool EpubCheck was never updated to support EPUB 3.1.  Unlike the web, the ebook ecosystem is highly dependent on formal validation. EpubCheck is the gatekeeper of the digital publishing world, the tool that verifies compliance with EPUB standards. But EpubCheck is in trouble. It’s maintained by a handful of volunteers, and has almost no resources. There’s a backlog of maintenance work and bug fixes to do. Fifteen months after the release of EPUB 3.1, it still is not supported by EpubCheck, and thus no one can distribute or sell EPUB 3.1 through the major retailers. The Publishing Business Group is currently working to ensure EpubCheck’s future. Stay tuned!

EPUB 3.1 was a good spec—better-organized, easier to understand, clearer about the relationship between EPUB and the underlying web technologies. The EPUB 3.0.1 features it removed were seldom used, and often unsupported. But after 3.1 was completed, many people decided that, even if almost no existing EPUB 3 content was rendered incompatible with the new spec (aside from the version attribute), the price was too high. Better to live with some obsolete features, and guarantee compatibility, than require too much change. EPUB was having its “don’t break the Web” moment.

Early this year, Makoto Murata and Garth Conboy proposed that we roll back some of the changes in EPUB 3.1. This updated spec would be known as EPUB 3.2. The goals were:

  1. Guarantee that any EPUB 3.0.1 publication conforms to EPUB 3.2.
  2. Ensure that EPUB 3.0.1 Reading systems would accept and render any EPUB 3.2 publication, although graceful fallback may sometimes be required.

If you already have EPUB 3 files, you don’t need to make any changes to existing content or workflow to adopt the forthcoming 3.2 spec. You just have a few more options, much like the change from 3.0 to 3.0.1. If you don’t already have EPUB 3 files, start now (making 3.0.1)! There’s no reason to wait.

EPUB 3.2 will still be based on EPUB 3.1, and keep many of the changes in 3.1 that don’t affect compatibility, such as referring to the latest versions of HTML5 and SVG, and using the official CSS Snapshot rather than the old profile. 3.2 will also continue to include WOFF2 and SFNT fonts as core media types. Perhaps most importantly, making EPUB 3.2 closer to EPUB 3.0.1 will require much less work to upgrade EpubCheck.


The W3C EPUB 3 Community Group has started to work on EPUB 3.2, with the explicit goal of remaining compatible with all existing EPUB 3.0.1 files, while retaining the best features of EPUB 3.1. I expect this work to take six months or so; others are more optimistic. When final, EPUB 3.2 will become a W3C Community Group Report, as Community Groups do not create W3C Recommendations.

We need your help! Join the EPUB 3 Community Group at https://www.w3.org/community/epub3/. It’s free, you don’t have to be a W3C member, and everyone is welcome. Much of the discussion of technical issues will happen on GitHub; our repository is at https://github.com/w3c/publ-epub-revision/.

You can look at the early drafts of our spec, too:

    1. EPUB 3.2 Overview
    2. EPUB 3.2 Specification
    3. EPUB Packages 3.2
    4. EPUB Content Documents 3.2
    5. EPUB Media Overlays 3.2
    6. EPUB Open Container Format

*EPUB® is an interchange and delivery format for digital publications, based on XML and Web Standards. An EPUB Publication can be thought of as a reliable packaging of Web content that represents a digital book, magazine, or other type of publication, and that can be distributed for online and offline consumption.


The wilderness in us: how human identity is formed by landscape

Published 16 Apr 2018 in New Humanist Articles and Posts.

We have always been shaped by the natural landscape, just as it has been shaped by our history.

Collapsing section heading in Mediawiki MobileFrontend

Published 15 Apr 2018 by Manu in Newest questions tagged mediawiki - Stack Overflow.

In MobileFrontend for MediaWiki, sections can be collapsed using the parameter $collapseSectionsByDefault = true. However, this only applies to H2 sections (first-level headings), not to lower-level sections (H3, H4, H5 and H6 headings).

However, this functionality seems to exist; I've found these lines at line 240 of extensions/MobileFrontend/resources/mobile.toggle/toggle.js (see here):

// Also allow .section-heading if some extensions like Wikibase
// want to toggle other headlines than direct descendants of $container.
$firstHeading = $container.find( '> h1,> h2,> h3,> h4,> h5,> h6,.section-heading' ).eq( 0 );
tagName = $firstHeading.prop( 'tagName' ) || 'H1';

How can I toggle headlines other than H2 in MobileFrontend 1.30 or 1.31?

Thanks in advance!


Concurrent Python Wikipedia Package Requests

Published 14 Apr 2018 by delhics in Newest questions tagged mediawiki - Stack Overflow.

I am making a python application which uses the python Wikipedia package to retrieve the body text of 3 different Wikipedia pages. However, I am noticing very slow performance when retrieving the articles one at a time. Is there a method that I can use to retrieve the body text of 3 Wikipedia pages in parallel?


Firefox Add-on to skip mobile Wikipedia redirect

Published 14 Apr 2018 by legoktm in The Lego Mirror.

Skip Mobile Wikipedia on Firefox Add-ons

Lately, I've been reading Wikipedia on my phone significantly more than I used to. I get 15 minutes on the train each morning, which makes for some great reading time. But when I'm on my phone, Wikipedia redirects to the mobile website. I'm sure there are some people out there who love it, but it's not for me.

There's a "Desktop" button at the bottom of the page, but it's annoying and inconvenient. So I created my first Firefox Add-on, "Skip Mobile Wikipedia". It rewrites all requests to the mobile Wikipedia website to the standard canonical domain, and sets a cookie to prevent any further redirects. It works on the standard desktop Firefox and on Android.

Install the Add-on and view the source code.


April Community DO-ers: Meetup Edition

Published 13 Apr 2018 by Daniel Zaltsman in The DigitalOcean Blog.

April Community DO-ers: Meetup Edition

On the six-year voyage toward becoming the cloud platform for developers and their teams, we have received tremendous support from the larger developer community. We’ve seen hundreds of Meetups organized, pull requests submitted, tutorials written, and Q&As contributed, with even more ongoing activity. To show our appreciation, last month we introduced a new way to highlight some of our most active community contributors - our Community DO-ers!

Community DO-ers help make the community better through the content they create and the value they add. In addition to the Community homepage, we’ll regularly highlight Community DO-ers on the blog, Infrastructure as a Newsletter, social media, and to our growing internal community. In March, we were excited to bring you the trio of Marko, Mateusz, and Peter. This month, with a focus on our global Meetup community, we have three new individuals for you to get to know and celebrate with us. Without further ado, meet April’s featured Community DO-ers:

Aditya Patawari (@adityapatawari)

Aditya is an early adopter and advocate of DigitalOcean, so it’s no surprise that he became the first organizer of our second largest Meetup group, based in Bangalore. He has been producing Meetups since 2016 and has served as a speaker and panelist at consecutive DigitalOcean TIDE conferences. His talk on foolproofing business through infrastructure gap analysis was well received at TIDE New Delhi, and we later invited him to conduct an online webinar on setting up a multi-tier web application with Ansible. We’re extremely proud and excited to be working with him because of his passion for education and for helping the wider community.

Samina Fu (@sufuf3149)

For the second month running, we are proud to highlight the work of our active Taiwan community. Specifically, we are excited to recognize Samina Fu, a Network and Systems Engineering graduate of National Chiao Tung University in Taiwan. Samina is a co-organizer of our Hsinchu community, which she has been bringing together since early 2017. She helped to organize our first of 120 Hacktoberfest Meetups last year, and works closely with Peter Hsu (who we highlighted last month) as a core contributor to the CDNJS project.

David Endersby (@davidendersby1)

When David filled out our Meetup Organizer Application Form in September 2016, we didn’t know he would go on to lead one of our largest and most active Meetup communities. Since early 2017, David has worked hard to develop a blueprint for successfully running a new Meetup community, covering everything from starting out, to finding speakers, to time management, choosing a location, feeding attendees, and more. His efforts have produced a wealth of content and he has an ambitious plan for 2018. If you’re interested in joining, he welcomes you with open arms!


Aditya’s, Samina’s, and David’s efforts exemplify the qualities we are proud to see in our community. They all have a knack for educating the community (off- and online), promoting both learning and community collaboration. But there are so many others we have yet to recognize! We look forward to highlighting more of our amazing community members in the months to come.

Are you interested in getting more involved in the DigitalOcean community? Here are a few places to start:

Know someone who fits the profile? Nominate a member to be recognized in the comments!


MediaWiki CSS not loading, MIME type error

Published 13 Apr 2018 by user3462317 in Newest questions tagged mediawiki - Stack Overflow.

My MediaWiki install on Gandi.net is having CSS problems: The main page works fine. However, all of the other pages are unstyled, as though the browser can't access the CSS.

I've tried using the console to debug in Chrome and get the following error message:

Refused to apply style from 'http://jollof.mariadavydenko.com/wiki/load.php?debug=false&lang=en&modules=mediawiki.feedlink%2Chelplink%2CsectionAnchor%7Cmediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.skinning.content.externallinks%7Cmediawiki.skinning.interface%7Cmediawiki.special.changeslist%7Cmediawiki.special.changeslist.enhanced%2Clegend%7Cskins.monobook.styles&only=styles&skin=monobook' because its MIME type ('text/html') is not a supported stylesheet MIME type, and strict MIME checking is enabled.

I am running PHP version 5.6 and MySQL version 5.7. I've tried the load.php .htaccess fix recommended for these symptoms but it doesn't work -- load.php loads just fine. Any help would be greatly appreciated.
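
One way to narrow this down is to check what Content-Type load.php actually returns for the failing stylesheet URL. A minimal Python sketch, assuming a shortened version of the module list from the error message above:

# Minimal check of what load.php returns for the failing stylesheet URL.
# CSS should come back as text/css; an HTML error page would explain the
# "not a supported stylesheet MIME type" message.
from urllib.error import HTTPError
from urllib.request import urlopen

url = ("http://jollof.mariadavydenko.com/wiki/load.php"
       "?debug=false&lang=en&modules=mediawiki.skinning.interface&only=styles&skin=monobook")

try:
    with urlopen(url) as response:
        print(response.status, response.headers.get("Content-Type"))
        print(response.read(200).decode("utf-8", errors="replace"))
except HTTPError as err:
    print(err.code, err.headers.get("Content-Type"))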


Correct robots.txt structure? (Mediawiki)

Published 13 Apr 2018 by Filip Torphage in Newest questions tagged mediawiki - Stack Overflow.

I've been checking around in different sites' robots.txt files and stumbled upon something I didn't expect in MediaWiki's robots.txt. From what I've read so far, you can write a robots.txt file like the one below:

Disallow: foo
Noindex: bar

I then wonder if:

Disallow: /wiki/Category:Noindexed_pages

is correct structure for a robots.txt file, or at least for MediaWiki's purposes. I'd also like to know whether Noindexed_pages can be anything or whether it is static.

The last snippet was taken from a Wikipedia article about MediaWiki's robots.txt.
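
One way to sanity-check a rule like this is to feed it to Python's standard-library robots.txt parser. Note the parser only understands standard directives such as Disallow, so a non-standard Noindex line is simply ignored; a minimal sketch, with a User-agent line added because the parser expects one:

# Sanity check with Python's standard-library robots.txt parser.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /wiki/Category:Noindexed_pages",
])

print(rp.can_fetch("*", "/wiki/Category:Noindexed_pages"))      # False: blocked by the rule
print(rp.can_fetch("*", "/wiki/Category:Some_other_category"))  # True: not covered by the rule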


The sugar you've never heard of, hope for degenerative eye disorders, and some light relief

Published 12 Apr 2018 by in New Humanist Articles and Posts.

Chemistry, Biology, Physics: Three scientists talk through big recent developments in their fields.

W3C’s WAI-ACT Project Identified as ‘Key Innovator’

Published 10 Apr 2018 by Shadi Abou-Zahra in W3C Blog.

Today the new EU Innovation Radar was launched to recognize high potential innovations and innovators in EU-funded research. Among them is the W3C WAI-ACT Project (grant 287725 of the 7th Framework Programme), which was ranked among the top ten ‘high capacity innovation projects’ (third place in SME ranking) in the JRC Science and Policy Report of 2014. The project was recognized for innovation in ‘Practical guidance on evaluation of Web Accessibility Initiative guidelines’.

The workflow diagram above depicts five sequential steps: 1. Define the evaluation scope; 2. Explore the target website; 3. Select a representative sample; 4. Audit the selected sample and 5. Report the findings. Each step has an arrow to the next step, and arrows back to all prior steps. This illustrates how evaluators proceed from one step to the next, and may return to any preceding step in the process as new information is revealed to them during the evaluation process.

The WAI-ACT Project carried out its work in 2011-2014 through existing W3C working groups under the W3C consensus process, with broad involvement from different key stakeholders. Results of the project include:

The WAI-ACT Project also included dedicated efforts to engage with the community and liaise we related efforts in Europe and internationally. For example, the project organized the W3C Workshop on Referencing and Applying WCAG 2.0 in Different Contexts, to better support the uptake of WCAG 2.0 internationally.

WAI-ACT was a multi-partner project led by W3C through ERCIM as its European host. The project partners included:

WAI-ACT is part of a series of EU-funded projects led by the W3C Web Accessibility Initiative (WAI). The latest in this series is the currently on-going WAI-Tools Project, which in many ways builds on the efforts of the WAI-ACT Project and later W3C efforts to provide more guidance on accessibility conformance evaluation.

We would like to take the opportunity of this recognition to thank the European Commission for their support over many years, the project partners, the W3C working groups, and the broader community, which made this work happen. I look forward to continued open collaboration through the W3C process.


Untitled

Published 10 Apr 2018 by Sam Wilson in Sam's notebook.

I find autogenerated API docs for Javascript projects (e.g.) so much more useful than those for PHP projects.


Morning joy

Published 9 Apr 2018 by Sam Wilson in Sam's notebook.

I love the morning time, while the brain is still sharp enough to focus on one thing and get it done, but dull enough not to remember the other things and derail everything with panic about there being too much to do. The morning is when the world properly exists, and is broad and friendly.


How can I display spaces in Media Wiki Page Title but not in the URL?

Published 9 Apr 2018 by David Ruess in Newest questions tagged mediawiki - Stack Overflow.

Desired Result: if someone types in example.com/w/John1:1-5 then I'd like the page title on that page to show John 1:1-5.

I realize I could create a page at example.com/w/John_1:1-5, but I don't want users to have to type the underscore.

Is there a way to do this without creating a redirect?

Thanks!


Mediawiki MathJax use with Template:math

Published 9 Apr 2018 by CatMan in Newest questions tagged mediawiki - Stack Overflow.

With the SimpleMathJax extension installed on MediaWiki 1.27, I would like to provide offline access to some Wikipedia page code. The code uses MathML tags and a math template. Both MediaWiki and SimpleMathJax were installed with the default settings, and only the default MediaWiki:Common.css and MediaWiki:Common.js content is in place.

All math tags are working fine. However, I am seeing strange artifacts when trying to use e.g. the following expressions

<math>A = 42</math>      // base for (1)
<math>A {=} 42</math>    // base for (2)
<math>'''V'''</math>     // base for (3)

with a template using the code

{{math|A = 42}}          // (1)
{{math|A {=} 42}}        // (2)
{{math|'''V'''}}         // (3)

The versions using the <math> tags are working as expected. This shows that SimpleMathJax is installed correctly, I would guess.

The original template code from Wikipedia's "Template:Math" did not do anything in my installation, so I used this code for the Template:Math

{{#tag:math|{{{1}}}}}

(For reproducing the problem in Mediawiki, simply create a new page named "Template:math" and copy the code above. Then add the template code above to any page and check it with "Preview")

Quite a few things work with this template, e.g. {{math|{{!}}\alpha_{minor}-q^4{{!}}}}. So it can not be totally wrong. However, for the examples above I get the following output:

1                      // for (1)
A[[:Template :=]]42    // for (2)
'''V'''                // for (3)

On the web I found that (1) should fail because, inside a template, the '=' character is interpreted by the parser; it needs to be written as '{{=}}'. But (2) shows that this does not work: the two curly brackets seem to be interpreted as a template. The other parts are OK. In (3) I would have expected 'V' in bold. There are other cases where the template fails as well, e.g. italics and <sup>..</sup>.

My Question: What is wrong or what is the proper template code to get the MathML tags working with the {{math|}} syntax?


Maintain mediawiki pages templates

Published 9 Apr 2018 by Kosho-b in Newest questions tagged mediawiki - Stack Overflow.

I'm working with pywikibot to create and update several pages from serialized python objects.

Those Python objects update once a week. After each update I want to run a bot that takes the current state of each object and writes it to a specific wiki page; at the moment I'm only talking about updating template arguments.

I can translate the Python objects into the required template arguments, and I'm searching for a convenient library to work with. I ran into these problems:

  1. Logging the diff between the old and new template arguments.
  2. Saving the arguments as pretty-printed output (each one on a new line, and so on, for future manual editing).
  3. When creating a new page based on a known template, I didn't find a way to get a Python object with the current template arguments and build the page from it.

I checked those libraries:

  1. pywikibot - working with templates is very hard and not intuitive (extract_templates_and_params_regex_simple & glue_template_and_params).
  2. mwparserfromhell - parsed_page.filter_templates() is a good start, but I can't see the diff in an easy way and I still need to create the template for new pages manually.
  3. Wikipedia\mwclient - neither seems to give any advantage for working with templates.

Thank you.
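
For what it's worth, a minimal mwparserfromhell sketch along these lines, assuming a hypothetical template name and parameters, could update arguments and log the diff before saving with pywikibot:

# Sketch: update template arguments with mwparserfromhell and log the diff.
# Template name, parameters and page text are hypothetical.
import difflib
import mwparserfromhell

old_text = "{{Weather station|name=Foo|temperature=12}}"
new_args = {"temperature": "17", "humidity": "60"}

code = mwparserfromhell.parse(old_text)
for template in code.filter_templates():
    if template.name.matches("Weather station"):
        for key, value in new_args.items():
            template.add(key, value)  # adds the parameter, or overwrites it if it already exists

new_text = str(code)

# Problem 1: log the diff between the old and new wikitext.
diff = difflib.unified_diff(old_text.splitlines(), new_text.splitlines(), lineterm="")
print("\n".join(diff))

# new_text could then be saved with pywikibot, e.g. page.text = new_text; page.save("update template args")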


Email doesn’t disappear

Published 9 Apr 2018 by Bron Gondwana in FastMail Blog.

More and more often we are seeing stories like this one from Facebook about who has control over your messages on closed platforms.

I keep saying in response: email is your electronic memory. Your email is your copy of a conversation. Nobody, from the lowliest spammer to the grand exulted CEO of a massive company, can remove or change the content of an email message they have sent to you.

At first glance, Facebook Messenger seems to work the same way. You can delete your copy of any message in a conversation, but the other parties keep their unchanged copy. However, it turns out that insiders with privileged access can change history for somebody else, creating an effect similar to gaslighting where you can no longer confirm your recollection of what was once said.

In short, centralised social networks are not a safe repository for your electronic memory. They can change their policies and retroactively change messages underneath you.

With email, it’s all based on open standards, and you can choose a provider you trust to retain messages for you.

FastMail is a provider you can trust

We have built our business on a very simple proposition: we proudly charge money in exchange for providing a service. This means our loyalties are not split. We exist to serve your needs.

Our top three values are all about exactly this. You are our customer, your data belongs to you, we are good stewards of your data.

The right to remember, and the right to forget

We provide tools to allow you to implement rules around retention (for example, you can have your Trash folder automatically purge messages after 30 days), but we don’t ever remove messages without your consent and intent.

If you do delete messages, we don’t destroy them immediately, because our experience has shown that people make mistakes. We allow a window of between one and two weeks in which deleted messages can be recovered (see technical notes at the end of this post for exact details).

Since 2010, our self-service tool has allowed you to restore those recently deleted messages. We don't charge using this service, it’s part of making sure that decisions about your data are made by you, and helping you recover gracefully from mistakes.

Because we only scan message content to build the indexes that power our great search tools and (on delivery) for spam protection – once messages are deleted, they’re really gone. You have the right to forget emails you don’t want to keep.

You’re in control

Thanks as always to our customers who choose what to remember, and what to forget. It’s your email, and you are in control of its lifecycle. Our role is to provide the tools to implement your will.

Nobody else decides how long you keep your email for, and nobody can take back a message they’ve sent you. Your email, your memory, your choice.

An Update

Since I started drafting this article, Facebook have doubled down on the unsend feature, saying that they will make it possible for anybody to remove past messages.

While it's definitely more equitable, I still don't think this is a good idea. People will work around it by screenshotting conversations, and it just makes the platform more wasteful of everybody's time and effort. Plus it's much easier to fake a screenshot than to fake up a live Facebook Messenger interface while scrolling back to show messages.

There are really a lot of bad things about unreliable messaging systems, which is exactly what Wired has to say about this rushed and poorly thought-out feature. Stick with email for important communications.


Technical notes:

We currently purge messages every Sunday when the server load is lowest – and only messages which were deleted over a week ago. Therefore the exact calculation for message retention is one week plus the time until the next Sunday plus however long it takes the server to get to your mailbox as it scans through all the mailboxes containing purged messages. Deleting files is surprisingly expensive on most filesystems, which is why we save it until the servers are least busy.

We also have backups, which may retain deleted messages for longer based on repack schedules, but which can’t be automatically restored with messages that were deleted longer than two weeks ago.


Defenders of the Earth

Published 9 Apr 2018 by in New Humanist Articles and Posts.

Environmental activists used to enjoy greater freedom than most­, but now they are under attack, from Modi’s India to Trump’s America

cardiParty 2018-04 Melbourne Open Mic Night

Published 8 Apr 2018 by Justine in newCardigan.

a GLAMRous storytelling event 20 April 2018 6.30pm

Find out more...


Mediawiki search form autocomplete not working on my custom skin

Published 7 Apr 2018 by Jose in Newest questions tagged mediawiki - Stack Overflow.

Hello gurus of Mediawiki,

I am having trouble modifying one of my custom MediaWiki skins (1.26). I followed the MediaWiki skinning guide to create a search form within my BaseTemplate, and I am using the provided API method makeSearchInput to create the search input box. But for some reason, it's not doing the autocomplete it is supposed to do. I have looked into other MediaWiki skin examples and tried to duplicate their settings to see if I can get it to work, but nothing really helped.

 <form class="mw-search" role="form" id="searchform" action="<?php $this->text('wgScript'); ?>">
            <?php
              echo $this->makeSearchInput(array('id' => 'searchInput'));

              echo Html::hidden( 'title', $this->get( 'searchtitle' ) );
             ?>
 </form>

When I look at the network activity for the other skins where autocomplete works, I can see requests being sent to api.php each time I type a character into the input box. But on my own custom skin nothing is sent; it doesn't even seem to attempt the query. I have been searching online without any luck in discovering what the problem is. Since it works with the other skins on the same server, it's probably not a global setting I'm missing, but something in the skin configuration. I am not trying to do any fancy modification, so I must be doing something silly. I have been struggling with this for many hours, so now I am here asking for help...

Does anyone have any idea on what could be causing this? Any help would be very very much appreciated.

Sincerely,


How to maintain mediawiki pags with pywikibot?

Published 6 Apr 2018 by Kosho-b in Newest questions tagged mediawiki - Stack Overflow.

I'm working with MediaWiki + pywikibot. I've written a few bots to run against my MediaWiki instance; they are based on local data I have, which updates once a week.

I want this maintenance to be as automatic as possible, which means running some of my scripts/bots every so often and sending me emails on errors. Alongside that I'll want to run the "regular" bots that come with pywikibot.

The solution I know of is to use an automation server, say Jenkins, but I want to make sure there is nothing purpose-built before I go with that solution.

Is there anything that interfaces with pywikibot and fulfils the requirements above? How do the big MediaWiki-based wikis do this?

Thank you.
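
As a rough illustration of the email-on-error part, a minimal wrapper sketch that cron (or Jenkins) could run on a schedule, assuming hypothetical bot script names, email addresses, and a local mail relay:

# Sketch: run pywikibot maintenance scripts and email the error output on failure.
import smtplib
import subprocess
from email.message import EmailMessage

BOTS = ["update_templates.py", "touch_pages.py"]  # hypothetical bot scripts

for bot in BOTS:
    result = subprocess.run(["python", "pwb.py", bot], capture_output=True, text=True)
    if result.returncode != 0:
        msg = EmailMessage()
        msg["Subject"] = "pywikibot job failed: " + bot
        msg["From"] = "bots@example.org"   # hypothetical addresses
        msg["To"] = "me@example.org"
        msg.set_content(result.stderr[-2000:])   # last part of the error output
        with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay
            smtp.send_message(msg)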


Untitled

Published 6 Apr 2018 by Sam Wilson in Sam's notebook.

I want a login-by-emailed-link feature for MediaWiki, so it’s easier to log in from mobile.


Wikidata Map March 2018

Published 6 Apr 2018 by addshore in Addshore.

It’s time for the first 2018 installation of the Wikidata Map. It has been roughly 4 months since the last post, which compared July 2017 to November 2017. Here we will compare November 2017 to March 2018. For anyone new to this series of posts you can check back at the progression of these maps by looking at the posts on the series page.

Each Wikidata Item with a Coordinate Location (P625) will have a single pixel dot. The more Items present, the more pixel dots and the more the map will glow in that area. The pixel dots are plotted on a totally black canvas, so any land mass outline simply comes from the mass of dots. You can find the raw data for these maps and all historical maps on Wikimedia Tool Labs.

Looking at the two maps below (the more recent map being on the right), it is hard to see the differences by eye, which is why I'll use ImageMagick to generate a comparison image. Previous comparisons have used Resemble.js.

ImageMagick has a compare tool that can highlight areas of change in another colour, and soften the unchanged areas of the image. The image below highlights the changed areas in violet while fading everything that remains unchanged between the two images. As a result, all areas highlighted in violet have either had Items added or removed. These areas can then be compared with the originals to confirm that these areas are in fact additions.
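
As a rough illustration (not necessarily the exact command used for these maps), an equivalent comparison can be driven from Python, assuming ImageMagick's compare binary is installed and using hypothetical file names:

# Sketch: highlight changed pixels in violet and fade unchanged areas.
import subprocess

result = subprocess.run([
    "compare",
    "-highlight-color", "violet",          # paint changed pixels violet
    "-lowlight-color", "rgba(0,0,0,0.2)",  # fade the unchanged areas
    "wikidata-map-2017-11.png",
    "wikidata-map-2018-03.png",
    "diff.png",
])
# compare exits 0 for similar images and 1 for dissimilar ones, so a return
# code of 1 here is expected rather than an error.
print(result.returncode)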

If you want to try comparing two maps, or two other images, using ImageMagick, then you can try out https://online-image-comparison.com/, which allows you to do this online!

What has changed?

The main areas of change that are visible on the diff are:

There is a covering of violet across the entire map, but these are the key areas.

If you know the causes for these areas of greatest increase, or think I have missed something important, then leave a comment below and I’ll be sure to update this post with links to the projects and or users.

Files on Commons

All sizes of the Wikidata map for March have been uploaded to Wikimedia Commons.

The post Wikidata Map March 2018 appeared first on Addshore.


New MediaWiki extension: AutoCategoriseUploads

Published 5 Apr 2018 by Sam Wilson in Sam's notebook.

New MediaWiki extension: AutoCategoriseUploads. It “automatically adds categories to new file uploads based on keyword metadata found in the file. The following metadata types are supported: XMP (many file types, including JPG, PNG, PDF, etc.); IPTC (JPG); ID3 (MP3)”.

Unfortunately there’s no code yet in the repository, so there’s nothing to test. Sounds interesting though.


integrate existing PHP session with MediaWiki

Published 5 Apr 2018 by Saleh Altahini in Newest questions tagged mediawiki - Stack Overflow.

I have a site and want to integrate a Wiki into it.

I know I can change my code to register and/or set wiki cookies when the user logs in, but this will slow the system down, especially since not every user will visit the wiki.

Is there a way to make the wiki check whether a PHP session exists and automatically show users who are logged in on the main site as logged in on the wiki too?

I tried looking into SessionManager and AuthManager, but the documentation is too complicated for me since it's my first time working with MediaWiki. If anyone can point me to the right part of the docs, it will be very much appreciated.


Disparity between Wikipedia's "What links here" count and backlink count using recommended tool

Published 5 Apr 2018 by user1200 in Newest questions tagged mediawiki - Stack Overflow.

I am trying to retrieve a list of backlinks for a list of pages in the English Wikipedia database. I first tried using the MediaWiki API to collect all of the links, using the blcontinue parameter; however, when I queried certain pages (e.g., Canada) there was an inordinate number of backlinks, i.e. many, many thousand.

When I look at the "What links here" page for Canada and exclude redirects, there again seems to be an inordinate number (https://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Canada&namespace=0&limit=5000&hideredirs=1). I decided that, for now, I could make do with the total rather than the full list of links, so I used the recommended tool (https://en.wikipedia.org/wiki/Help:What_links_here#Number_of_links) and queried the API for Canada, non-redirects (the default namespace is 0), effectively replicating the above query. Here's the documentation, https://dispenser.info.tm/~dispenser/cgi-bin/backlinkscount.py, and here's some sample R code:

library(httr)

bl_url <- "https://dispenser.info.tm/~dispenser/cgi-bin/backlinkscount.py"
query_param <- list(
  title = "Canada",
  filterredir = "nonredirects"
)

bbl <- GET(bl_url, query = query_param)

num_bl <- as.numeric(content(bbl))

> num_bl
[1] 353

here's the url produced by the call to the api:

https://dispenser.info.tm/~dispenser/cgi-bin/backlinkscount.py?title=Canada&filterredir=nonredirects

So the total returned is 353, much fewer than on the "what links here"

Am I missing something obvious?
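
As a cross-check, the backlink count can also be computed directly against the MediaWiki API by following the continuation parameters; a minimal Python sketch (assuming the requests library) that counts non-redirect backlinks in the main namespace:

# Cross-check: count non-redirect backlinks to "Canada" via the MediaWiki API.
import requests

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "format": "json",
    "list": "backlinks",
    "bltitle": "Canada",
    "blnamespace": 0,
    "blfilterredir": "nonredirects",
    "bllimit": "max",
}

total = 0
while True:
    data = requests.get(API, params=params).json()
    total += len(data["query"]["backlinks"])
    if "continue" not in data:
        break
    params.update(data["continue"])  # follow blcontinue to the next batch

print(total)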


nginx and mediawiki on a subdirectory of a different server

Published 4 Apr 2018 by It support in Newest questions tagged mediawiki - Stack Overflow.

So, Marketing has requested that our wiki, currently at wiki.example.com, needs to be at www.example.com/wiki.

Now, www.example.com and wiki.example.com are two different servers; not only that, www.example.com runs nginx and wiki.example.com runs Apache2.

I need to be sure wiki.example.com keeps working, so I cannot touch its LocalSettings to adapt it (unless I create a copy?), and no combination of proxy_pass, rewrites, etc. has helped me through this, so I'm asking for help :)

If anyone asks, I can list all the different options I have tried, from location /wiki to location ~ ^/wiki(.*)? ... Anything I've found, I've tried (even proxy_redirect, which IMHO doesn't make much sense here).

I see the problem, though. nginx sends the request to Apache+MediaWiki, which converts the URLs and sends the response back... then nginx doesn't know how to handle that, and I get a plausible-looking URL (http://www.example.com/wiki/index.php/Main_Page) but a 404 error.

I just deleted all the configuration and am starting from scratch again; any idea/comment will be greatly appreciated.

Edit> Currently using this:

location /wiki {
    access_log /home/example/logs/wiki-access.log combined;
    error_log /home/example/logs/wiki-error.log;

    try_files $uri $uri/ @rewrite;
}
location @rewrite {
    rewrite ^/(.*)$ /index.php?title=$1&$args;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host www.example.com;
    proxy_pass http://wiki.example.com;
    proxy_redirect http://wiki.example.com /wiki;
}

location / {
    # try to serve file directly, fallback to front controller
    try_files $uri /index.php$is_args$args;
}

This forwards the whole page to http://wiki.example.com. That's not what I want :(


The madness of the quantum universe

Published 4 Apr 2018 by in New Humanist Articles and Posts.

The fundamental building blocks of matter have a strange dual nature.

See My Hat! new exhibition for children and families coming soon

Published 3 Apr 2018 by carinamm in State Library of Western Australia Blog.

SeeMyHat_JGP

Studio portrait of Ella Mackay wearing a paper hat, 1915, State Library of Western Australia, 230179PD

Featuring photographs and picture books from the State Library collections, this exhibition is designed especially for children and families. Dress hats, uniform hats and fancy dress hats are just some of the millinery styles to explore. Children and their families have the opportunity to make a hat and share a picture book together.

See My Hat! will be on display in the Story Place Gallery, Mezzanine floor from Tuesday 10 April – Wednesday 11 July.


Episode 5: Brian Wolff

Published 3 Apr 2018 by Yaron Koren in Between the Brackets: a MediaWiki Podcast.

Brian Wolff (username Bawolff) works in the Security team at the Wikimedia Foundation, and has been doing MediaWiki and MediaWiki extension development since 2009.   Links for some of the topics discussed:

The enduring appeal of Armageddon

Published 3 Apr 2018 by in New Humanist Articles and Posts.

Why is our popular culture so obsessed with the end of the world?

Mentoring Engineers Through An Engineering Fellowship Program

Published 3 Apr 2018 by Tom Spiegelman in The DigitalOcean Blog.

For two years, I’ve managed the Infrastructure Engineering (“Infra”) team at DigitalOcean. We’re responsible for managing all servers and machines up to the application layer. This includes hardware spec, firmware, component validation, base OS, configuration management, and hardware management (hardware management database).

In addition to my core responsibilities managing the Infra team, I wanted to foster an environment where mentorship was possible and worked with colleagues to create the Infrastructure Engineering Fellowship Program. It’s an immersive program where DigitalOcean employees from other teams “join” the Infra team for two weeks. Employees with fundamental Linux knowledge and some configuration management experience are eligible to participate.

“Fellows”—as they are known—are invited to a private Slack channel with fellowship alum. They work through JIRA tickets assigned to the team (all while pairing with Infra team engineers), attend team stand-ups, and finally, pick a project to work on for the two week duration. Additionally, fellows meet with me at the start and end of each week to discuss what they worked on and to answer questions they have. To date, we’ve had nine people complete the fellowship and we continue to open the fellowship up to other engineers at DO.

How the Fellowship Started

This program started as a cross-team training experience between my team and the Tier-2 Cloud Operations team (the 24/7 team responsible for uptime on our servers and services), since both of our teams interacted with each other on a daily basis. After a few successful trials with the Cloud Operations team, we realized that there were several other teams that were interested in learning what we do and wanted to take advantage of the fellowship program. We have now had people from five different teams sign up and participate in the program.

My team gets so much more out of the fellowship than we put in. First, we build camaraderie between the wider organization and my team. Individuals we previously only worked with through JIRA and Slack now have a personal relationship with the team and are more eager to engage and work with us. My team also gains a better perspective on what other teams work on and go through day to day, which helps us build better tools and workflows to support them. Finally, it is a great way to recruit: engineers who were hired onto my team came through the fellowship program.

Growing people internally is one of the greatest things I have done in my career. I have had three people join my team from inside the company, and they have been very successful in their new roles. In a perfect world, we would pair every senior engineer on the team with one engineer still early in their career. In my experience, when looking at Tuckman's stages of group development, you will have the best performing team when you have mentors and mentees going through the four stages (forming, storming, norming, and performing) together as a team.

Tuckman's stages of group development. Photo credit: Tutorials Point

Managing the Fellowship Program

One of the things that we keep top of mind is sustainability. Although two weeks isn’t very long, properly mentoring someone takes a lot of time, and we want to make sure no one feels overwhelmed by the experience. We currently take on just one fellow at a time, and we cater the program to each participant. For example, if a fellow is more interested in hardware than big data, they might pair with our integration team who is charged with managing hardware and firmware, rather than our DevOps-focused team.

There are a few benefits of managing the fellowship this way. One, we can iterate quickly since the program lasts just two weeks. And two, we can focus our energies on mentoring just one person at a time to limit straining the team’s bandwidth. Based on feedback from past fellows, we’ve changed how we handle our 1:1s with engineers and code pairing sessions. We now conduct 1:1s with specific goals in mind. Each fellow is asked to give feedback at the very end of the program to help us guide future fellows.

That said, the same benefits are in some ways ongoing challenges. Working with each fellow individually takes up my time, but it also affects the engineers on my team. They need to take time out of their busy schedules to pair with the fellow, which breaks their usual workflow and compels them to walk through projects step by step. This means something that might take them an hour ends up taking most of a day.

That said, we’re able to make this work because we work on a number of tasks and projects at any given time. If a team is working on one long-term project, the time it takes to explain the project to someone won’t actually yield any benefit in a two-week long program. The fellowship program (and programs like it) really need to be catered to the participant and the team that they are embedding with.

What Makes It Worthwhile

As I pointed out earlier, pairing engineers with more senior engineers leads to better performing teams. Furthermore, there is an even stronger connection when you pair engineers that have proprietary or historical knowledge from inside the company. I am a firm believer that if strong minded, eager-to-learn engineers exist within the company, you shouldn’t hire from outside the company. Creating infrastructure that supports mentorship leads to strong engineers, strong teams, and a strong company.

I love seeing people continue to have conversations and work on projects with my team after the fellowship is over. It is simply amazing to see, and I give all the credit to the engineers on my team. Every one of them is eager to pass on knowledge that they have, and they’ve embraced the fellowship and its goals. The fellowship wouldn’t have been successful if my team didn’t share the same beliefs around mentorship and its cross-team benefits that I have.

Future of the Fellowship

When I started my career in IT, I had an amazing mentor (shout out to Rob Lahnemann) who really took me under his wing and taught me everything he could about programming, Linux, and networking. My manager at the time (shout out to Eric Austin) set this up and put me in a place to succeed as a mentee. This experience really influenced what I believe it means to be a good manager. Pairing engineers eager to learn with senior engineers is a key factor in any successful team. In the current engineering community, it is not uncommon to find engineers who are not encouraged to share their knowledge or are not given the time to be a mentor. But in my opinion, growing as an engineer means being a mentor.

In the future, I would love to see the program become more of a revolving door, with people doing more work with the Infrastructure Engineering team and going through the fellowship program multiple times (hopefully sometimes for longer than two weeks). I would also love to encourage programs like this, both inside and outside DigitalOcean. One of my biggest goals and drivers in writing this is to inspire similar programs in the industry as a whole. My career and pace of growth were directly influenced by a strong mentor, so my passion for encouraging more mentor/mentee relationships in the industry is high.

Tom Spiegelman is an Infrastructure Engineering Manager at DigitalOcean. He has an awesome dog, a great team, and is married to the amazing Chantal Spiegelman. He is passionate about all things tech, specifically infrastructure. You can find him on LinkedIn or on Twitter.


From 0 to Kubernetes cluster with Ingress on custom VMs

Published 2 Apr 2018 by addshore in Addshore.

While working on a new Mediawiki project, and trying to setup a Kubernetes cluster on Wikimedia Cloud VPS to run it on, I hit a couple of snags. These were mainly to do with ingress into the cluster through a single static IP address and some sort of load balancer, which is usually provided by your cloud provider. I faffed around with various NodePort things, custom load balancer setups and ingress configurations before finally getting to a solution that worked for me using ingress and a traefik load balancer.

Below you’ll find my walk through, which works on Wikimedia Cloud VPS. Cloud VPS is an openstack powered public cloud solution. The walkthrough should also work for any other VPS host or a bare metal setup with few or no alterations.

Step 0 – Have machines to run Kubernetes on

This walkthrough will use 1 master and 4 nodes, but the principle should work with any other setup (single master single node OR combined master and node).

In the below setup m1.small and m1.medium are VPS flavours on Wikimedia Cloud VPS. m1.small has 1 CPU, 2 GB mem and 20 GB disk; m1.medium has 2 CPU, 4 GB mem and 40 GB disk. Each machine was running debian-9.3-stretch.

One of the nodes needs to have a publicly accessible IP address (Floating IP in on Wikimedia Cloud VPS). In this walkthrough we will assign this to the first node, node-01. Eventually all traffic will flow through this node.

If you have firewalls around your machines (as is the case with Wikimedia Cloud VPS) then you will also need to setup some firewall rules. The ingress rules should probably be slightly stricter as the below settings will allow ingress on any port.

Make sure you turn swap off, or you will get issues with kubernetes further down the line (I’m not sure if this is actually the correct way to do this, but it worked for my testing):

sudo swapoff -a
sudo sed -i '/ swap /d' /etc/fstab

Step 1 – Install packages (Docker & Kubernetes)

You need to run the following on ALL machines.

These instructions basically come from the docs for installing kubeadm, specifically, the docker and kube cli tools section.

If these machines are new, make sure you have updated apt:

sudo apt-get update

And install some basic packages that we need as part of this install step:

sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common

Next add the Docker and Kubernetes apt repos to the sources and update apt again:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb https://download.docker.com/linux/$(. /etc/os-release; echo "$ID") $(lsb_release -cs) stable"
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update

Install Docker:

sudo apt-get install -y docker-ce=$(apt-cache madison docker-ce | grep 17.03 | head -1 | awk '{print $3}')

Install the Kube packages:

sudo apt-get install -y kubelet kubeadm kubectl

You can make sure that everything installed correctly by checking the docker and kubeadm version on all machines:

docker --version
kubeadm version

Step 2.0 – Setup the Master

Setup the cluster with a CIDR range by running the following:

sudo kubeadm init --pod-network-cidr=10.244.0.0/16

The init command will spit out a token, you can choose to copy this now, but don’t worry, we can retrieve it later.

At this point you can choose to update your own user .kube config so that you can use kubectl from your own user in the future:

mkdir -p $HOME/.kube
rm -f $HOME/.kube/config
sudo cp -if /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Setup a Flannel virtual network:

sudo sysctl net.bridge.bridge-nf-call-iptables=1
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml

These yml files are coming directly from the coreos/flannel git repository on GitHub and you can easily pin these files at a specific commit (or run them from your own copies). I used kube-flannel.yml and kube-flannel-rbac.yml

Step 2.1 – Setup the Nodes

Run the following for networking to be correctly setup on each node:

sudo sysctl net.bridge.bridge-nf-call-iptables=1

In order to connect the nodes to the master you need to get the join command by running the following on the master:

sudo kubeadm token create --print-join-command

Then run this join command (the one output by the command above) on each of the nodes. For example:

sudo kubeadm join 10.68.17.50:6443 --token whverq.hwixqd5mb5dhjz1f --discovery-token-ca-cert-hash sha256:d15bb42ebb761691e3c8b49f31888292c9978522df786c4jui817878a48d79b4

Step 2.2 – Setup the Ingress (traefik)

On the master, mark node-01 with a label stating that it has a public IP address:

kubectl label nodes node-01 haspublicip=true --overwrite

And apply a traefik manifest:

kubectl apply -f https://gist.github.com/addshore/a29affcf75868f018f2f586c0010f43d

This manifest is coming from a gist on GitHub. Of course you should run this from a local static copy really.

Step 3.0 – Setup the Kubernetes Dashboard

This isn’t really required, at this stage your kubernetes cluster should already be working, but for testing things and visualizing the cluster the kubernetes dashboard can be a nice bit of eye candy.

You can use this gist deployment manifest to run the dashboard.

Note: You should alter the Ingress configuration at the bottom of the manifest. Ingress is currently set to kubernetes-dashboard.k8s-example.addshore.com and kubernetes-dashboard-secure.k8s-example.addshore.com. Some basic authentication is also added with the username “dashuser” and password “dashpass”

Step 3.1 – Setup a test service (guids)

Again, your cluster should all be set up at this point, but if you want a simple service to play around with, you can use the alexellis2/guid-service Docker image, which was used in the blog post "Kubernetes on bare-metal in minutes".

You can use this gist deployment manifest to run the service.

Note: You should alter the Ingress configuration at the bottom of the manifest. Ingress is currently set to guids.k8s-example.addshore.com.

This service returns simple GUIDs, including the container name that guid was generated from. For example:

$ curl http://guids.k8s-example.addshore.com/guid
{"guid":"fb426500-4668-439d-b324-6b34d224a7df","container":"guids-5b7f49454-2ct2b"}

Automating this setup

While setting up my own Kubernetes cluster using the steps above, I actually used the Python library and command-line tool called Fabric.

This allowed me to minimize my entire installation and setup to a few simple commands:

fab provision
fab initCluster
fab setupIngressService
fab deployDashboard
fab deployGuids

I might write a blog post about this in the future, until then fabric is definitely worth a read. I much prefer it to other tools (such as ansible) for fast prototyping and repeatability.
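
For illustration, one of those tasks might look roughly like this with the Fabric 1.x API (hypothetical host names; not the actual fabfile used for this setup):

# fabfile.py: a rough sketch of one task using the Fabric 1.x API.
from fabric.api import env, sudo, task

env.hosts = ["k8s-master-01", "k8s-node-01", "k8s-node-02"]  # hypothetical host names
env.use_ssh_config = True

@task
def provision():
    """Install Docker and the Kubernetes packages on every host (step 1)."""
    # Docker/Kubernetes apt repository setup from step 1 omitted for brevity.
    sudo("apt-get update")
    sudo("apt-get install -y apt-transport-https ca-certificates curl software-properties-common")
    sudo("apt-get install -y kubelet kubeadm kubectl")

Running "fab provision" then executes the task over SSH against every host listed in env.hosts.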

Other notes

This setup was tested roughly 1 hour before writing this blog post with some brand new VMs, and everything went swimmingly; however, that doesn't mean things will go perfectly for you.

I don’t think I ever correctly set swap to remain off for any of the machines.

If a machine goes down, it will not rejoin the cluster automatically; you will have to rejoin it manually (the last part of step 2.1).

The post From 0 to Kubernetes cluster with Ingress on custom VMs appeared first on Addshore.


cardiCast Episode 30 – Annika Kristensen

Published 2 Apr 2018 by Justine in newCardigan.

Melbourne February 2018 cardiParty

Recorded live

Our February Melbourne cardiParty was held at ACCA, the Australian Centre for Contemporary Art, with Senior Curator Annika Kristensen taking us on a special tour of the exhibition Unfinished Business: Perspectives on art and feminism.
Annika concentrated her discussion on a number of key artworks in the exhibition.

https://acca.melbourne

newcardigan.org
glamblogs.newcardigan.org

 

Music by Professor Kliq ‘Work at night’ Movements EP.
Sourced from Free Music Archive under a Creative Commons licence.


v2.4.8

Published 2 Apr 2018 by fabpot in Tags from Twig.


GLAM Blog Club April 2018

Published 1 Apr 2018 by Hugh Rundle in newCardigan.

April is here, Daylight Savings is over, and we’re all happy to be on Easter holidays. Happiness was our theme for March and as usual the newCardigan community shared some great blog posts.

Rebecca was first in with a post about her amazing trip to the Anna Amalia Bibliothek. Kara, meanwhile, told us about the moment she realised that librarianship would make her happier than search engine optimisation. Our own happiness specialist Anne, shares seven tips to make you a happier librarian. The Andrews took us on a long digression about archiving Twitch streams, took happiness in other people’s happiness, and waxed lyrical about …books! Stacey’s happiness comes from climbing (literal) cliffs, whereas Clare likes to play word games and think about utopias. Alissa loves being a librarian and she’s not even sorry, whereas I am sorry that my GLAM Blog Club post in March was actually on our February theme: Watch. Nik loves looking at photos of people (hello Instagram!), whilst Lydia’s personal Happiness Project turns out to be her profession: lucky Lydia! Lucinda found happiness at Geelong Gallery’s Kylie on Stage exhibition, whilst Michaela, despite being overseas at an amazing conference, still found time to blog about the surprising happiness that comes from eating real poutine. Finally, new GLAM Blog Clubber Donna finds happiness in libraries.

For April, our theme is Control. Are you, perhaps, a little bit of a control freak? Or are you more interested in finding ways that GLAM institutions can hand control back to the communities we serve? Do you hope to one day take control as a CEO or Manager of an institution? Or are you just trying to find a way to control your email inbox? Let us all know!

Most importantly, make sure you use a controlled vocabulary before you publish your blog post. Use the tag GLAM Blog Club in your post, and #GLAMBlogClub for any social media posts linking to it. And of course if you haven’t done so yet, remember to register your blog at Aus GLAM Blogs. Happy blogging!


Include and parse file (hosted on external site) as wikitext with MediaWiki

Published 30 Mar 2018 by Punknoodles in Newest questions tagged mediawiki - Stack Overflow.

With MediaWiki, is there any way to include a text file hosted on another site/server and parse that file as wikitext? Is there any way to include a text file at all?


How to import and save all articles in my own MediaWiki database?

Published 30 Mar 2018 by coding-jewelry in Newest questions tagged mediawiki - Stack Overflow.

I have installed MediaWiki on my own Ubuntu server, with MySQL as database.

What I want to do is import all English Wikipedia articles (text only, no images) and save them in my own MediaWiki database.
How can I do this? If you have any experience with this, please let me know how to do it.


cardiParty 2018-04 with Eddie Marcus

Published 29 Mar 2018 by Andrew Kelly in newCardigan.

Join Eddie Marcus (the sharp mind behind the Dodgy Perth blog) for the shortest heritage pub trail ever. Eddie will explore the Greek and Roman architecture of three iconic Northbridge pubs. 6:30pm, Friday 13 April.

Find out more...


Digital preservation begins at home

Published 29 Mar 2018 by Jenny Mitcham in Digital Archiving at the University of York.

A couple of things happened recently to remind me of the fact that I sometimes need to step out of my little bubble of digital preservation expertise.

It is a bubble in which I assume that everyone knows what language I'm speaking, in which everyone knows how important it is to back up your data, knows where their digital assets are stored, how big they might be and even what file formats they hold.

But in order to communicate with donors and depositors I need to move outside that bubble otherwise opportunities may be missed.

A disaster story

Firstly a relative of mine lost their laptop...along with all their digital photographs, documents etc.

I won't tell you who they are or how they lost it for fear of embarrassing them...

It wasn’t backed up...or at least not in a consistent way.

How can this have happened?

I am such a vocal advocate of digital preservation and do try and communicate outside my echo chamber (see for example my blog for International Digital Preservation Day "Save your digital stuff!") but perhaps I should take this message closer to home.

Lesson #1:

Digital preservation advocacy should definitely begin at home

When a back up is not a back up...

In a slightly delayed response to this sad event I resolved to help another family member ensure that their data was 'safe'. I was directed to their computer and a portable hard drive that is used as their back up. They confessed that they didn’t back up their digital photographs very often...and couldn’t remember the last time they had actually done so.

I asked where their files were stored on the computer and they didn’t know (well at least, they couldn’t explain it to me verbally).

They could however show me how they get to them, so from that point I could work it out. Essentially everything was in ‘My Documents’ or ‘My Pictures’.

Lesson #2:

Don’t assume anything. Just because someone uses a computer regularly it doesn’t mean they know where they put things.

Having looked firstly at what was on the computer and then what was on the hard drive it became apparent that the hard drive was not actually a ‘back up’ of the PC at all, but contained copies of data from a previous PC.

Nothing on the current PC was backed up and nothing on the hard drive was backed up.

There were however multiple copies of the same thing on the portable hard drive. I guess some people might consider that a back up of sorts but certainly not a very robust one.

So I spent a bit of time ensuring that there were 2 copies of everything (one on the PC and one on the portable hard drive) and promised to come back and do it again in a few months time.

Lesson #3:

Just because someone says they have 'a back up' it does not mean it actually is a back up.

Talking to donors and depositors

All of this made me re-evaluate my communication with potential donors and depositors.

Not everyone is confident in communicating about digital archives. Not everyone speaks the same language or uses the same words to mean the same thing.

In a recent example of this, someone who was discussing the transfer of a digital archive to the Borthwick talked about a 'database'. I prepared myself to receive a set of related tables of structured data alongside accompanying documentation to describe field names and table relationships, however, as the conversation evolved it became apparent that there was actually no database at all. The term database had simply been used to describe a collection of unstructured documents and images.

I'm taking this as a timely reminder that I should try and leave my assumptions behind me when communicating about digital archives or digital housekeeping practices from this point forth.


The challenge of calendaring

Published 29 Mar 2018 by David Gurvich in FastMail Blog.

We often focus on email functionality as it is the main focus of our product. However, FastMail has two other components - calendaring and contacts.

In this post we’re focusing on our calendar.

While calendaring has become an integral part of our flagship service, our calendar feature was only introduced in 2014, making it still relatively young in the history of FastMail. Remember we’ve been around since 1999, which might equate to around 100 in modern tech years…

Just like with email, providing a calendar function presents its own challenges. In short, doing calendaring well is, well, hard. One of the main reasons is that standards related to calendaring are still all over the place. We're working hard on making these standards more consistent so that we can improve online calendaring for everyone.

One of our core values is a commitment to open standards. We’re not looking to create a walled garden by developing proprietary technology where your data is locked down to one source or provider.

By continuing to use CalDAV and iCalendar, FastMail helps drive open standards in online calendaring, and helps you use your information as you choose, syncing between different service providers and devices (as with email).

The data in your FastMail calendars are stored in open formats and can be downloaded or backed up using any number of standard tools that speak standard protocols.

Community-minded calendaring

We are responsible members of many open source communities. We use, create, sponsor and contribute back to a number of projects, including the Cyrus email server.

A significant part of FastMail’s infrastructure runs on Cyrus, the open source email communication technology that was initially developed at CMU.

Right now one of our biggest projects is implementing JMAP as a new standard, which will help to extend the functionality of calendaring and replace CalDAV.

In order for us to live our values we also invest in our people. And when it comes to calendaring we’ve got a great team that helps us to improve and advance calendaring for all of our users, and hopefully the internet in general.

Ken Murchison, one of our calendar experts, was crucial to getting calendaring off the ground. Without Ken, calendaring in Cyrus may never have happened.

When Cyrus lacked any calendaring functionality it was Ken, then a CMU employee, who took up a casual challenge as a pet project and managed to build a calendaring function with very basic features.

Ken is quick to point out that part of Cyrus' ongoing calendaring development was made possible by attending CalConnect and meeting and speaking with other developers.

Photo of Ken presenting at CalConnect

Ken met Bron around the 2.5 release of Cyrus, and this fortuitous meeting has laid the foundation for several improvements to the calendar and ongoing CalConnect attendances (and of course, Ken becoming a permanent member of the FastMail team).

For the last few years FastMail has been a member of CalConnect and attending this conference really is important to our ongoing development. Robert, another important part of our calendar team, recently wrote about the importance of CalConnect to FastMail.

Looking ahead

We’re hoping to see JMAP recognized as a new standard during 2018 and once this is fully implemented it will help to see many more improvements across email, calendars and contacts.

At a top level this will help to continually improve backend, performance, scheduling and subscriptions.

At a feature level we’re already testing out some exciting new technology. One of these is ‘consensus scheduling’ (recently discussed at CalConnect), which takes the original scheduling functionality and enables a client to send multiple time options for a meeting or appointment to a group of people. So instead of going back and forth to confirm a meeting time, it can all be done within the calendar.

Another feature we’ve started to explore is a polling function that could eventually be applied to things such as meeting confirmations for service providers, further reducing the reliance on telephone-based appointment making. Currently, a formal RFC is underway to help implement a standard.

We’re looking forward to introducing ongoing calendar improvements and features into FastMail and we’ll formally announce these as they enter our production environment.

A special event on the calendar

Earlier this year Ken was the ninth recipient of the CalConnect Distinguished Service Award.

Photo of the service award trophy

This award is a testament to Ken’s dedication to improving calendaring specification and standards. He is also the author of several RFCs and specifications, which have helped to define calendaring for users the world over.

Reflecting on his achievement, Ken remains as modest as ever, “it’s this interaction with other developers (in attending CalConnect) that is so important, testing and banging out code together.”

Ken’s achievements in the calendaring space are immense and he continues to help improve calendaring for all of us.

As our CEO Bron noted, “CyrusIMAP now has best-in-the-world implementation of calendars and contacts due to Ken’s involvement in CalConnect.”

Well done Ken!


Speaker Profile: Donna Edwards

Published 28 Mar 2018 by Rebecca Waters in DDD Perth - Medium.

Donna Edwards presenting at DDD Perth 2017 (DDD Perth Flickr Album)

Donna Edwards, a well known figure in the Perth software industry, presented at DDD Perth 2017 on Attraction and retention strategies for Women in Tech. She is the Events Manager for Women in Technology WA (WiTWA), on the committee for SQL Saturday, and VP of the Central Communicator Toastmasters club. I asked Donna about her experiences at DDD Perth.

From Director of ACR, to General Manager at Ignia and, more recently, State Delivery Manager at Readify, you have 20-odd years’ experience in the IT industry. Can you tell me a little about your career to date?

I’ve worked in different roles within the IT industry from sales, to crawling under desks setting up PCs, to phone support and even installing hardware and software for people. In the past ten years I’ve focused on culture and business growth. My passion has always been creating awesome places to work, winning high quality work and growing a phenomenal team. More than anything I believe life is too short to not love what you do — so follow what you love and everything will work out 😊

Words to live by right there. You’re a seasoned presenter on speaker panels; I’ve seen you speak at a number of events. Was DDD Perth one of your first solo presentations at a conference?

Yes I really enjoy panels and have done quite a few previously however DDD was my first solo presentation (over ten minutes long). Getting selected for a 45 minute slot was a huge achievement and pretty scary I have to admit 😊

What helped you decide to submit a talk to DDD Perth?

I knew that DDD was trying to attract more women presenters after 2016 and I’d never actually submitted for a conference before so I saw it as a challenge! My partner was also submitting so we actually spent a day whilst we were on a cruise sitting out on the deck writing out submissions 😊 we both submitted two talks. I certainly didn’t expect to get selected and was probably hoping not to haha!

That sounds like a bit of a #BeBoldForChange pledge from International Women’s Day 2017. Have you a #PressForProgress goal for 2018?

For me, it’s always about doing more and continuing to strive to be better personally as well as achieving more for the community each year. This year I am about to take on another three committee roles as well as continuing to focus on taking the WiTWA events to another level. We’ve sold out our last three events, hitting record numbers of attendees (200). It is super exciting to see the level of engaged women in our tech community. Just this week I shared four tech events with all-female panels / speakers, which is brilliant to see! And it will only get bigger and better 😊

Back to DDD Perth…Did you enjoy the day? How about presenting?

The day was fantastic. I got to hear some brilliant talks from the likes of Patima and Nathan and also got roped into being on a panel with them later in the day! There was a great vibe and everyone seemed to be really enjoying themselves along with lots of familiar faces as is the Perth IT industry 😊 Presenting was actually super fun! We had a few technical issues so it started a bit late which made me a little nervous but once I got started I thoroughly enjoyed the experience. I had done LOADS of practice so I felt pretty comfortable with the slides and content which definitely saved me! It didn’t help that I was in a job interview process and the two potential bosses were both watching my presentation — no pressure. I must have done ok cause I got the job 😉

Oh Wow! That’s an interesting point. As someone who makes hiring decisions for the company you work for, do you like seeing presentations and the like on a curriculum vitae?

Absolutely - whether they get involved in community events by either presenting or volunteering is a huge positive when I am choosing between applicants.

What are you looking forward to seeing in DDD Perth 2018?

The level of diversity for 2017 was great so I’m keen to see that remain or improve for 2018. I’m pretty sure it will be even bigger and better after last year sold out so that’s super exciting! More great sponsors no doubt and hopefully an even bigger after party (which means it will be huge). Finally looking forward to learning a lot — the best thing about DDD is the variety of awesome speakers and topics so you can really tailor the day for what you are interested in.

Thanks for chatting to us, Donna!


Book review: Xenofeminism

Published 28 Mar 2018 by in New Humanist Articles and Posts.

Rejecting the claim that science and technology are inherently masculine or patriarchal, Xenofeminism looks at attempts to repurpose technology to liberate women.

"The market gives the illusion of being egalitarian"

Published 28 Mar 2018 by in New Humanist Articles and Posts.

Q&A with academic and Marxism expert David Harvey.

Publishing @ W3C goes to ebookcraft

Published 28 Mar 2018 by Tzviya Siegman in W3C Blog.

For many of us who work with ebooks, the highlight of our year is ebookcraft in Toronto. ebookcraft is a two-day conference devoted to ebook production, sponsored by Booknet Canada. The fifth edition was held last week, and it was a veritable who’s who of Publishing @ W3C.

Why do we love ebookcraft? It’s full of “practical tips and forward-thinking inspiration.” It’s impeccably organized, by the wizardly Lauren Stewart and her team. It’s warm and welcoming. There are cookies. More than half the speakers are women. It really is about making beautiful, accessible ebooks. Of course, that requires standards. The ebook world has suffered more than most, with interoperability being a dream rather than a reality. Many of the presenters are involved with standards work at W3C.

The first day of ebookcraft was devoted to workshops, where longer talks and smaller audiences allow for in-depth coverage of various topics. Naomi Kennedy (Penguin Random House) kicked off the day speaking about “Images in Ebooks,” addressing approaches to format, size, and color with the ever-popular Bill the Cat.

Romain Deltour (DAISY) asked his audience “Is Your EPUB Accessible?” I found out that mine was almost there but not quite (and I wrote some of the specs he was featuring, uh-oh!). Romain walked us through concepts such as how information gets from HTML to the user, what assistive technologies are, how to figure out if your content has accessibility support, and how to test your files. Romain is one of the developers behind Ace by DAISY, a command-line program to check EPUBs for accessibility, and he did a demo for us. Ace by DAISY is based on the EPUB Accessibility 1.0 spec.

There was a panel over lunch called “Everybody’s Working on the Weekend,” about volunteerism in digital publishing. The panelists were from Booknet Canada, some of the wonderful planners of the conference. Many of them also devote their time to standards development at Booknet Canada and other organizations. When it was time for audience participation, it was pretty clear that publishing is a world of volunteers. Everyone wants to help, but there’s a serious shortage of time and resources, given busy day jobs. And standards work can be daunting at first—we need to find ways to gently welcome newcomers.

Deborah Kaplan picked up after lunch with ”WAI-ARIA in Practice.” She walked us through ARIA best practices, perhaps most importantly when NOT to use ARIA. She also opened our eyes to the wide world of keyboard navigation and gave us a hefty reading list for learning more.

Peter Krautzberger spoke about MathML: his talk, “Equation Rendering in ebooks,” offered an overview of the options available for equational content in EPUB. We looked at equations in SVG and MathML and many options for making them accessible.

Conference organizer Laura Brady participated in a panel with the NNELS (National Network of Equitable Library Services) called “We Tear Apart Your Ebooks.” The panel discussed the NNELS open system for sharing accessible publications. Once a book is in the NNELS system, it can be shared throughout Canada. Authorized users request accessible publications, and the NNELS team works to make them accessible. Laura recently audited several publishers in Canada to assess their level of accessibility (really not that great) and trained them to get much better.

On Day 2, we shifted from workshops to the big room. Who better to kick off the day than Liisa McCloy-Kelley, co-chair of the Publishing Business Group? Liisa’s topic was “Laser Focus: Don’t Get Distracted by that Shiny Object.” Liisa gave us a short tour of the history of ebooks and EPUB (and made sure we knew how to spell it). Publishing, reading, and writing have changed a lot over the years. We all get caught up in “shiny objects” that might catch our attention briefly, but it’s important to explore why you want to pursue one. Is it because a feature is cool? Is someone asking you to add it? Are you fixing something that’s annoying? Do you have a unique solution? There are many questions to ask that can help you decide whether you should implement a change, and when (and if) you will make the change. There are some issues that the entire industry must address. We need to stop making proprietary formats and embrace standards. Focus on improving image quality as screen quality improves. We should consider the external contexts provided by reading systems, how voice, AR, and VR might affect our content, and be patient.

The highlight of the day was Rachel Comerford’s “epub and chill” talk. Somehow Rachel managed to compare online dating with ebooks. The whole room was chanting “expose your metadata, not yourself.” The rules for dating and ebooks are pretty similar: 1. Remember Your Audience 2. Use Standards 3. Be Transparent 4. Don’t Play Hard to Get. I strongly recommend checking out the video when it becomes available.

Karen Myers (W3C) and I spoke about standards in Publishing@W3C in a talk entitled “Great Expectations—The Sequel.” We offered a brief history of Publishing@W3C and a deep dive into the work happening in the EPUB3 Community Group, the Publishing Business Group, and the Publishing Working Group. We offered a quick tour of the cast of characters that makes up the rest of the W3C. We shared some highlights from groups such as WOFF, WAI, and Verifiable Claims that could be of real interest and value to the publishing community. We spoke about how to get involved and how to stay current.

Dave Cramer (co-chair of the EPUB 3 CG) and Jiminy Panoz went on an “Excellent CSS Adventure.” You’ll have to watch the video for Dave’s biblical opening. Dave and Jiminy explained the magic of CSS with some great tips, from the power of selectors and the cascade to the mysteries of pseudo-elements and inline layout.

Benjamin Young and I discussed an HTML-First Workflow at Wiley. We spoke briefly of Wiley’s 200+ year history of publishing books and journals. We have recently begun exploring an HTML-first workflow for our journal articles that looks at content apart from metadata. We have focused on layers of material. The content is in HTML. Metadata is in RDFa. Style is achieved with CSS, and internal processing is accomplished using HTML’s data-* attribute. The Wiley team that is working on this project began with a set of technical requirements with the goal of improving output. It is still a work in progress, but we heard that lots of people are ready to dive into HTML now.
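As a rough illustration of that layering (a hypothetical fragment, not Wiley’s actual markup; the schema.org terms and the data-stage attribute name are just examples), the pieces fit together something like this:

<article vocab="http://schema.org/" typeof="ScholarlyArticle">
  <!-- Metadata carried inline as RDFa -->
  <h1 property="headline">An Example Journal Article</h1>
  <p typeof="Person" property="author">
    <span property="name">A. N. Author</span>
  </p>
  <!-- Internal processing hooks live in data-* attributes; styling stays in external CSS -->
  <section property="articleBody" data-stage="copyedited">
    <p>Body text of the article…</p>
  </section>
</article>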

Ben Dugas offered his perspective as an ebook retailer at the End of the Conveyer Belt. Ben works in Content Operations at Kobo. His team looks at all the files that pass through Kobo’s pipeline. To summarize, content creation is hard, spec creation is hard, content QA is hard, and building support is hard. My favorite part of Ben’s presentation was when he pointed out that it takes a little time to get used to standards work, but once they got used to our quirks, they realized they had actual opinions and it was okay to offer them. Ben’s advice is to move on to EPUB 3 (and beyond), use epubcheck and Ace, test across platforms, think about the reader, and not accept the status quo. Sound advice.

If you’re involved in the creation of ebooks, be sure to come to ebookcraft in 2019! In the meantime, you can see what people said about ebookcraft on social media, follow @ebookcraft on Twitter, and eagerly await the videos of this year’s conference.

Many thanks to Dave Cramer for his thoughtful editing of this post.


How to Conduct Effective Code Reviews

Published 28 Mar 2018 by Billie Cleek in The DigitalOcean Blog.

A code review, at its core, is a conversation about a set of proposed changes. Early in my career, I viewed code reviews as a mostly technical exercise that should be devoid of non-technical concerns. I now see them as one of the few opportunities to concurrently learn and teach while also strengthening my relationship with my peers and colleagues.

My team, Delivery, has been working together for at least six months (some much longer), but only two members work in the New York City office while the rest are spread across North America. Because of our familiarity with each other, most of our daily interactions take place via text or video chat. Code reviews are often short, but we also go out of our way to communicate when we are stating an opinion or being nit-picky.

Most software developers are expected to participate in code reviews, and yet few are offered any training or guidance on conducting and participating in an effective code review. Participants attempt to find the most appropriate solution to a problem given the constraints of time, effort, and skills of all involved. But how do we have that conversation? What does an effective conversation look like? And what are the challenges of participating in a code review, and how can you overcome them?

Whether your tool of choice is GitHub, GitLab, Gerrit, or another tool, the goal of this article is to help you get as much out of the code review process as possible.

What Are Code Reviews For?

Code reviews happen in a wide range of contexts, and often the skills and depth of experience of participants vary widely. On open source projects, for example, participants may not have any sort of personal relationship with each other. Indeed, they may never communicate outside of the code review process. At the other end of the spectrum are code reviews where the participants have daily face-to-face interactions, such as when everyone works at the same company. A good participant will adjust how they participate in a code review according to their knowledge of the other participants.

While it is important to adjust one's communication style in accordance with the intended recipient, how to adjust is influenced by three primary factors: the purpose of the code review, the intended audience, and one's relationship to the audience.

Identifying the Purpose of a Code Review

Code reviews serve both technical and cultural purposes: finding bugs before they're integrated, identifying security concerns, ensuring style consistency with the existing codebase, maintaining code quality, training, fostering a greater sense of ownership, and giving other maintainers an opportunity to get familiar with the code before it's integrated are just some of the reasons you may be asked to participate in code reviews. Make sure you know why you're participating in a code review beforehand.

Regardless of why you’re conducting a code review, it is important to respect the purposes that code reviews serve for the codebase. If the only purpose of a code review is to check for security concerns, then drop whatever personal concerns you may have about coding style or naming patterns. Unfortunately, it is not uncommon for the purpose of code reviews to be poorly defined or non-existent. In that case, once you've determined that the proposed changes are necessary and add value, I'd suggest reviewing for correctness, bug identification, and security concerns. Secondary to those concerns may be overall quality and long term maintainability of the proposed changes.

Submissions: What to Include

Code reviews typically start with a contributor submitting a proposed set of changes to the project. The submission should include:

Depending on the complexity of the changes, reviewers may find an overview of the trade-offs the submitter made in the patch helpful in order to better understand why the patch is the most appropriate of the possible alternatives.

Written communication about technical subjects can be difficult: people have limited time, and each of us is on a journey of confronting challenges and personal growth. In code reviews every participant has a role to play, each with its own set of objectives:

Regardless of your role in the review process, respect that others may be at a different place in their journey, and assume that all participants are engaging in the process in good faith and because of shared values and goals. The process is easiest when one assumes that all other participants are doing their utmost to help you succeed and get better.

Here's an example of a pull request from our team where I asked for clarification, discussed my concerns, and ultimately landed on a compromise that made the submission better and easier to maintain, all while gaining personal knowledge of the subject at hand:

Example of how my team communicates in our code reviews.

Knowing Your Audience

Start by reading all the code. As a reviewer, recognize that the submitter gave their time and energy and tried to improve the product in some way. As you read and strive to understand the patch, record your questions and concerns privately so that you understand the full context before providing any feedback. As mentioned previously, make an honest effort to restrict your feedback to the purposes for which the code review is being conducted.

Prepare and submit your feedback after reading and understanding the changes. Be gracious. Try to keep your comments focused on the code and the solution it offers; avoid digressing into unrelated matters. If you see something surprising, ask questions. If you don't have a strong history with a submitter, go the extra mile to communicate your good intentions. It's OK to use emojis to communicate tone. Strive to begin fostering a healthy, productive relationship with this new contributor.

Your feedback in code reviews is one of the primary ways to build a community of developers eager to contribute to your project. By nurturing a strong community, you will promote a quality product. Especially for open source maintainers, an authentic, explicit “thank you for the contribution” or other nice words can go a long way towards making people feel appreciated and fostering a supportive community.

Take the feedback, evaluate it, and decide what to do next. For submitters, it can be difficult to read criticism of the code you have written. When a reviewer asks for changes, they are doing so for the same reason a patch author submits a patch: a genuine desire to improve the product. Remind yourself that feedback about code is not personal. You may decide to accept the feedback and change something. Or you may decide that there was a misunderstanding, and that some requested changes are unwarranted or would simply be wrong or add no value. It’s OK to push back.

Developing a Partnership Through Code Reviews

When there is an asymmetric level of experience between the submitter and reviewer, use the opportunity to mentor. As a reviewer with more experience than the submitter, you may choose to accept that submitter's patch as-is and then improve upon it, contacting the submitter to let them know about your changes later. In a professional setting, such an approach isn't always feasible. Have the conversation in the open so that observers (i.e. other readers) can learn too, but reach out for a more personal touch if the extent of feedback is becoming overwhelming in written form. In my experience, patches submitted by someone significantly more experienced than the reviewer are usually accepted as-is or with only very minor changes requested.

When you're thinking out loud, make it clear to the reader so that they do not think you are asking for a change when you are merely evaluating a possibility. If you're nitpicking, explain your reasons for doing so. On our team, we often preface nit-picky comments with (nit) in order to help contributors recognize these types of comments. This usually serves as a signal that the contributor can ignore that feedback if they want. Without that distinction, the nitpicks are not distinguishable from the feedback that the reviewer feels more strongly about. For all participants: when you're unsure about something, ask, and err on the side of clarity and friendliness.

A successful code review will result in a higher quality change, strengthen the relationship between reviewer and submitter, and increase the understanding that everyone involved has of the project. Code reviews are not just a formality that requires a rubber stamp before being merged; they are an essential aspect of modern software development that provides real value to projects and teams by promoting good software engineering practices.

Conclusion

Through code reviews, I've learned to be more gracious and more understanding about the personal challenges and technical struggles that everyone experiences. I have learned to more thoughtfully examine the trade-offs that we all make when writing software. I hope the ideas presented here can help you grow your community and increase your effectiveness.

Billie Cleek is a Senior Software Engineer on the Delivery team where he supports internal tools to provide a consistent deployment surface for DigitalOcean's microservices. In his spare time, Billie is a maintainer of vim-go, infrequent contributor to other open source projects, and can be found working on his 100-year-old house or in the forests of the Pacific Northwest regardless of the weather. You may also find Billie on GitHub and Twitter.


mod-rewrite include actual files in a MediaWiki URL

Published 27 Mar 2018 by finnrayment in Newest questions tagged mediawiki - Stack Overflow.

I just setup a local MediaWiki server and have been able to use mod-rewrite to do the following:

http://10.0.0.160/wiki/index.php?Main_Page

Into:

http://10.0.0.160/wiki/Main_Page

This works well now and I am happy with it; however, any URL that matches a real file or folder is served as that actual file, and not as the wiki article. For example, MediaWiki installations contain a LocalSettings.php file at the root directory, and thus when I do the following:

http://10.0.0.160/wiki/LocalSettings.php

I am directed to the actual file and not:

http://10.0.0.160/wiki/index.php?LocalSettings.php

There must be a way to do it: when I go to Wikipedia and type in en.wikipedia.org/LocalSettings.php, I get a redirect response to the relevant page, so it doesn't go to their settings file.

The same issue happens with folders too.

I set up the redirecting with a .htaccess file in the main directory of the MediaWiki installation, which contains the following directives as shown in a MediaWiki tutorial:

RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^(.*)$ %{DOCUMENT_ROOT}/wiki/index.php [L]

Finally there is the path setup for my MediaWiki:

$wgScriptPath = "/wiki";
$wgArticlePath = "/wiki/$1";

Me not knowing much about Apache means I don't have a clue on where to go with this, so if more information is needed, please feel free to ask.
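One common way around this collision, and roughly what Wikipedia itself does, is to keep the MediaWiki files under one path (e.g. /w/) and rewrite a separate, purely virtual article path (e.g. /wiki/) to index.php, so an article URL can never match a real file. A rough, untested sketch of that layout (the /w/ and /wiki/ paths are assumptions, not the setup described above):

# .htaccess in the document root, assuming MediaWiki itself is installed in /w/
RewriteEngine On
# Send every /wiki/... URL to the real entry point, even when a file of that name exists
RewriteRule ^/?wiki(/.*)?$ %{DOCUMENT_ROOT}/w/index.php [L]

The matching path settings in LocalSettings.php would then be $wgScriptPath = "/w"; and $wgArticlePath = "/wiki/$1";.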


Expanding Participation in W3C – a new Membership Level!

Published 27 Mar 2018 by J. Alan Bird in W3C Blog.

As W3C continues to evolve the breadth and depth of our work, we need to continue to address how we’re packaging and pricing our Membership options. In 2012 we added a tier designed for Startup organizations, which has enabled a significant number of these organizations to join our work. In 2014 we introduced the Introductory Industry Membership to allow organizations with a singular focus on a specific industry segment’s use cases to have a seat at the table.  Again, by all measures this has been a successful program. With the IDPF combination, we put a Transitional Publishing Industry Membership together to allow former IDPF Members a way to engage with us at a rate that is based on their prior IDPF fees with the understanding that it would lead to becoming regular Members at the end of that program. As with the other two, we’ve seen a significant number of organizations join the work at W3C via this program.

In December 2017 there was a discussion between W3C and a significant publishing company which showed that the step up from TPI to regular Membership was too big for that publisher. We found this informative, as they are one of the bigger organizations in the Publishing Industry. Based on that conversation, and on exploratory discussions with potential Members, the W3C Business Development Team, and our current Membership, we defined a new Membership level aimed at public organizations that have revenues between $50M and $500M USD.

Today, I’m pleased to announce that W3C is moving ahead with that as a trial program to determine whether this offering will be successful in attracting new Members to W3C. To determine if you qualify for this program, please go to this site. If you do qualify, the Membership Application System is ready for you! If you have any questions about the program, please don’t hesitate to send me a note at abird@w3.org.

Cheers,

J. Alan Bird, W3C Global Business Development Leader


Midwest Heritage of Western Australia

Published 27 Mar 2018 by Sam Wilson in Sam's notebook.

Midwest Heritage of Western Australia is a terrific database of records of graves and deceased people in the mid-west region of WA.


Untitled

Published 27 Mar 2018 by Sam Wilson in Sam's notebook.

I joined newCardigan today.


Like cats and dogs: why humans keep pets

Published 26 Mar 2018 by in New Humanist Articles and Posts.

Domesticated animals have been part of human society for tens of thousands of years. But can we really call it friendship?

How to read the text of an old MediaWiki page from a SQL backup?

Published 26 Mar 2018 by Peter in Newest questions tagged mediawiki - Stack Overflow.

I have a SQL backup of an old wiki (approx. MediaWiki 1.16) and would like to reuse the content of some of its pages for a newer project. Instead of reinstalling the complete wiki, I would like to fetch some parts of the old content from these SQL files, but I have not found the content pages in my SQL backup. My guess is that page content is stored in the "text" table, but this table's rows are binary coded, so I'm not able to directly read the content.

Is my guess right that wiki page content is in the "text" table? If yes, how can I read the row content?
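For what it's worth, in a schema of that vintage the wikitext does live in the text table's old_text column, joined to pages via the revision table, with old_flags describing the encoding. A rough, untested query sketch, assuming a MySQL dump with the default empty table prefix and that only the latest revision of main-namespace pages is wanted:

SELECT p.page_title,
       t.old_flags,
       t.old_text
FROM page AS p
JOIN revision AS r ON r.rev_id = p.page_latest
JOIN `text`  AS t  ON t.old_id = r.rev_text_id
WHERE p.page_namespace = 0;  -- 0 = main (article) namespace

-- If old_flags contains 'gzip', old_text is compressed and must be decompressed
-- (e.g. with PHP's gzinflate()) before it is readable wikitext; a plain 'utf-8'
-- flag means the blob is already the raw wikitext.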


AggregateIQ Brexit and SCL

Published 25 Mar 2018 by addshore in Addshore.

UPDATE 02/04/2018: Looks like AggregateIQ may have had a contract with Cambridge Analytica, but didn’t disclose it because of an NDA… But it was all spoilt by an insecure GitLab instance.  https://nakedsecurity.sophos.com/2018/03/28/cambridge-analyticas-secret-coding-sauce-allegedly-leaked/


I wonder why AggregateIQ state that they have never entered a contract with Cambridge Analytica, but don’t mention SCL. Except they do mention they have never been part of SCL or Cambridge Analytica…

Channel 4 report on Brexit and AggregateIQ

From the AggregateIQ website & press release:

AggregateIQ is a digital advertising, web and software development company based in Canada. It is and has always been 100% Canadian owned and operated. AggregateIQ has never been and is not a part of Cambridge Analytica or SCL. Aggregate IQ has never entered into a contract with Cambridge Analytica. Chris Wylie has never been employed by AggregateIQ.
AggregateIQ works in full compliance within all legal and regulatory requirements in all jurisdictions where it operates. It has never knowingly been involved in any illegal activity. All work AggregateIQ does for each client is kept separate from every other client.

Links

The post AggregateIQ Brexit and SCL appeared first on Addshore.


How to use Mediawiki to work with graph database

Published 24 Mar 2018 by Pooja Harsh Upadhyay in Newest questions tagged mediawiki - Stack Overflow.

I am trying to integrate a graph database in content.rdf.u8 format with MediaWiki. So far I have followed the MediaWiki manual: https://www.mediawiki.org/wiki/Manual:What_is_MediaWiki%3F

I am stuck on the integration part. Can anyone please suggest some ways/manuals/tutorials for using MediaWiki with a graph database?

Thank you.


Love is

Published 24 Mar 2018 by jenimcmillan in Jeni McMillan.

Lovers

I am passing through countries, discarding them like forgotten lovers. Now when I think about love, I have many more things to say. I think love is a vulnerability, a willingness to trust someone with a precious heart. To be so child-like and joyous that dancing and singing is a natural state. A heightened awareness of the beloved. A look, a tiny movement, a sigh, a tremor, a breath, a heartbeat, these are the signs that reveal the inner state. But love passes, in the same way that cities fade into the distance as I travel across Europe. That is what you tell me. And so, I continue my journey.

‘Take your joy and spread it across the world, he wrote.

At least begin with a smile and hug yourself, she thought.’


Resurrecting a MediaWiki instance

Published 24 Mar 2018 by in Posts on The bugalore.

This was my first time backing up and setting up a new MediaWiki (-vagrant) instance from scratch so I decided to document it in the hope that future me might find it useful. We (teams at Wikimedia) often use MediaWiki-Vagrant instances on Labs, err, Cloud VPS to test and demonstrate our projects. It’s also pretty handy to be able to use it when one’s local dev environment is out of order (way more common than you’d think).

Stories Behind the Songs

Published 24 Mar 2018 by Dave Robertson in Dave Robertson.

Every song has a story. Here’s a little background on the writing and recording of each of the songs on Oil, Love & Oxygen. It is sometimes geeky, sometimes political and usually personal, though I reserve the right to be coy when I choose!

  1. Close Your Mouth is a funny one to start with, because it’s the most vague in terms of meaning – I think there were ideas floating around in my head about over-thinking in relationships, but it is not about anything specific. The “bed” of this track was a live take with drums and semi-electric guitar using just a pair of ribbon microphones – very minimalist! There is some beautiful crazy saxophone from Professor Merle in the background of the mix at the 1:02 minute mark.
  2. Good Together is one of my oldest songs, and the recording of it started eight years ago! It features catchy accordion from Cat Kohn (now Melbourne based) and a dreamy electric guitar solo from Ken Williford (now works for NASA). The lyrics are fairly direct storytelling, so I don’t feel the need to elaborate.
  3. Oil, Love & Oxygen. I’ve been banging on about the climate crisis for more than twenty years, and this is the song where I most directly address the emotional side of it. For the lyric writing nerds: I used a triplet syllable stress pattern in the verses. The choir part was an impromptu gathering of friends at the end of a house concert. I first played this song as a duo with Marie O’Dwyer who plays the piano part on this version. The almost subliminal organ part is Rachel playing a 1960s electric organ she found on the side of the road.
  4. The Relation Ship I wrote this on the ferry to Rotto. The “pain body” concept in the chorus comes from Eckhart Tolle’s book A New Earth, and is similar to the sankhara concept in Vipassana. For the A capella intro I experimented with double tracking the band singing together around a mid (omni) / side (ribbon) microphone setup, without using headphones.
  5. Perfect as Cats. As a kid I was fascinated by the big cats, especially snow leopards. This song is not about snow leopards. The drums and bass here were the only parts of the album recorded in a purpose-built studio (the old Shanghai Twang). Ben Franz plays the double bass and Rob Binelli the drums (one of the six drummers on the album!).
  6. Dull Ache. Sometimes I wished I lived in Greece, Italy, The Philippines, Costa Rica, Mexico, Ecuador, Nigeria or Spain. The common theme here is the siesta! I’m not at my best in the mid arvo, partly because my sensitive eyes get weary in our harsh sun. Around 4 or 5pm the world becomes a softer place to me, and my mojo returns. This song is also more generally about existential angst and depression. Always reach out for support when you need it – it is not easy dealing with these crazy grey soft things behind our eyes. I love Rob’s crazy guitars on the second half of this song – they are two full takes panned either side without edits.
  7. Kissing and Comedy was inspired by a quote from Tom Robbins’s novel Even Cowgirls Get The Blues: “Maybe the human animal has contributed really nothing to the universe but kissing and comedy–but by God that’s plenty.” I wrote it on the Overland Train. The drums are a single playful take by Angus Diggs, recorded in Dave Johnson’s bedroom with my trusty pair of ribbon mics, and the song was built up from there.
  8. Now That We’ve Kissed was co-written with Ivy Penny and is about being kissed by famous people (which I haven’t) and the implications of kisses in general. The things that “come from a kiss” were literally phoned in by friends.
  9. Rogue State was written in 2007, just prior to the Australian federal election and the Bali Climate Change Conference. It reflects on Australia’s sabotage of progress on climate change at the Kyoto conference in 1997, as documented in books such as Guy Pearse’s “High & Dry: John Howard, Climate Change and the Selling of Australia’s Future” and Clive Hamilton’s “Scorcher”. I had no intention of putting this old song on the album, until the last minute when I decided it was still sadly relevant given so many politicians still show a lack of respect and understanding of science and the planet that supports us. The recording of the song was also an excuse to feature a bit of Peter Grayling cello magic.
  10. Montreal was the first song I wrote on ukulele, though I ended up recording it with guitar. Sian Brown, who helped greatly with recording my vocals for the album, makes a harmony cameo at the end of the song. As for the lyrics, it’s a fairly obvious bittersweet love song.
  11. I Stood You Up is my account of attending a fantastic music camp called Rhythm Song, and kicking myself for not following through with a potential jam with songwriting legend Kristina Olsen. One of her pieces of advice to performers is to make their audience laugh to balance out the sadder songs in a set. The song was written in a mad rush two hours before a Song Club when I thought “What music can I write quickly?… Well I don’t have a blues song yet!”. This version was largely recorded prior to The Kiss List taking shape, so it features multiple guest musicians who are listed in the liner notes.
  12. Measuring the Clouds I wrote for my Dad’s birthday a few years ago. He used to be a Weather Observer in the 60s, sending up the big balloons etc. from many locations around WA such as Cocos Island. He had a beautiful eccentric sense of humour and would answer the phone with “Charlie’s Chook House and Chicken Factory, Chief Chook speaking”. The musical challenge I set myself with this song was to use a five bar pattern in the verse. A cello part was recorded, but was dropped in the mixing when I decided it made the song feel too heavy and I wanted it to feel light and airy.



New design for /TR

Published 22 Mar 2018 by Philippe le Hegaret in W3C Blog.

Eager to better represent what the Web is, we are happy to introduce a redesigned version of /TR, which lists All Standards and Drafts.

It is responsive and works fine on mobile devices. Compared to the previous design, this version is a single view that uses search and filters, and displays indicators of maturity level for each document. The default view shows the latest version of upcoming work and W3C Recommendations.

Users can now search based on title, tag, maturity level and version (latest, upcoming, editor’s draft, aka nightly). Read more about the Search Criteria.

A number of improvements are in the pipeline, as well as smaller enhancements and tweaks (notably, going over the list of more than a thousand specifications to ensure the right tags are defined).

We welcome feedback and are looking forward to it.


gitgraph.js and codepen.io for git visualization

Published 22 Mar 2018 by addshore in Addshore.

I was looking for a new tool for easily visualizing git branches and workflows to try and visually show how Gerrit works (in terms of git basics) to clear up some confusions. I spent a short while reading stackoverflow, although most of the suggestions weren’t really any good as I didn’t want to visualize a real repository, but a fake set of hypothetical branches and commits.

I was suggested Graphviz by a friend, and quickly found webgraphviz.com which was going in the right direction, but this would require me to learn how to write DOT graph files.

Eventually I found gitgraph.js, which is a small JavaScript library for visualizing branching ‘things’, such as git (well, mainly git, hence the name), and producing graphics such as the one below.

In order to rapidly prototype with gitgraph I set up a blueprint codepen.io pen with the following HTML …

<html>
  <head>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/gitgraph.js/1.11.4/gitgraph.css" />
    <script src="https://cdnjs.cloudflare.com/ajax/libs/gitgraph.js/1.11.4/gitgraph.min.js"></script>
  </head>
  <body><canvas id="graph"></canvas></body>
</html>

… and the following JS …

var graph = new GitGraph({
  template: "metro", // or blackarrow
  orientation: "vertical",
  elementId: 'graph',
  mode: "extended", // or compact if you don't want the messages  
});

var master = graph.branch("master");
master.commit( { message: "Initial Commit" });

… to render the rather simple single commit branch below …

Styling can be adjusted by passing a template into the GitGraph object …

var myTemplateConfig = {
    colors: ["#008fb5", "#979797", "#f1c109", "#33cc33"],
          branch: {
            lineWidth: 3,
            spacingX: 30,
            labelRotation: 0
          },
  commit: {
        spacingY: 40,
        dot: {
          size: 10
        },
        message: {
          displayAuthor: false,
          displayBranch: true,
          displayHash: true,
          font: "normal 14pt Arial"
        }
    }
    
};
var myTemplate = new GitGraph.Template( myTemplateConfig );

var graph = new GitGraph({
  orientation: "vertical",
  elementId: 'graph',
  mode: "extended", // or compact if you don't want the messages
  template: myTemplate
});

… which would render …

The blueprint codepen for this style can be found at https://codepen.io/addshore/pen/xWdZXQ.

With this blueprint setup I now have a starting point for further visualizations using gitgraph and codepen comparing Gerrit and GitHub, for example below comparing a merged pull request consisting of two commits, the second of which contains fixes for the first, vs a single change in Gerrit that has 2 separate versions.
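As a rough sketch of the GitHub half of that comparison (untested, built from the same gitgraph.js 1.x calls as the blueprint above; the branch and commit names are illustrative): a feature branch gets two commits, the second fixing up the first, and is then merged back into master.

var graph = new GitGraph({
  template: "metro",
  orientation: "vertical",
  elementId: "graph",
  mode: "extended"
});

var master = graph.branch("master");
master.commit({ message: "Initial Commit" });

// The pull request branch: an original change plus a fix-up commit
var feature = graph.branch("feature");
feature.commit({ message: "Add the new feature" });
feature.commit({ message: "Fixes for the previous commit" });

// Merging the pull request back into master
feature.merge(master);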

Keep an eye out on this blog for any more progress I make with this.

The post gitgraph.js and codepen.io for git visualization appeared first on Addshore.


DigitalOcean Currents: March 2018

Published 22 Mar 2018 by Ryan Quinn in The DigitalOcean Blog.

Currents is back with our third report on the developer experience. This February we asked 5,993 participants about their thoughts on hot topics like artificial intelligence and machine learning, new ways of working with codebases and services like continuous integration and delivery, and important issues like the European Union’s General Data Protection Regulation (GDPR) and the FCC’s decision on net neutrality.

Among our findings this quarter:

If you work in a larger organization, chances are you are using CI/CD

While only 45% of developers in organizations with five employees or fewer are using continuous integration (CI), and only 35% are using continuous delivery (CD), developers report that the likelihood of using these technologies increases with the size of the organization. This is somewhat intuitive, as many of the benefits of these methods provide ways for groups of developers to work together. In large organizations with over 1,000 employees, 68% of developers report using continuous integration and 52% are using continuous delivery.

Developers strongly disagree with the US FCC’s recent decision on net neutrality

Worldwide, the developers we surveyed voiced a strong opinion against the repeal of net neutrality in the US by the FCC. Among those in the United States this opinion was even more pronounced with 83% of developers against the decision and only 3.6% in favor of the change.

Adoption of the GDPR in Europe has many developers working to ensure compliance

Thirty-seven percent of the developers we surveyed reported that their teams were currently working to prepare for the GDPR. Unsurprisingly, developers in European countries are leading in this regard, with 58% of respondents in the Netherlands, 62% in Belgium, and 68% in Sweden stating their teams were actively working to ensure GDPR compliance. The United Kingdom saw the most engagement at 70%.


DigitalOcean Currents is published quarterly, highlighting the latest trends among developers.

If you would like to be among the first to receive Currents each quarter, sign up here. You’ll receive the latest report once it is released, share your ideas on what topics we should cover, and participate in our next survey.

Read more about these and other findings in the full report. Download the full Currents report here.


ConfirmAccount Extension Fix Failed

Published 21 Mar 2018 by johny why in Newest questions tagged mediawiki - Stack Overflow.

I attempted a fix on ConfirmAccount extension, but my fix did not seem to work. Any suggestions?

Details:

"pruning of old requests will not trigger often, so old rejected requests may persist." https://www.mediawiki.org/wiki/Extension:ConfirmAccount#Known_issues

This behavior prevents previously rejected usernames/emails from requesting an account again. After an admin rejects an account request, the same username/email cannot submit another request. The error on the second attempt is:

Username is already in use in a pending account request.

We want to enable re-requests. To fix this, I want to force a prune after every rejection, to clear the request cache.

It appears that, currently, pruning occurs in file \extensions\ConfirmAccount\frontend\specialpages\actions\ConfirmAccount_body.php

# Every 30th view, prune old deleted items
if ( 0 == mt_rand( 0, 29 ) ) {
ConfirmAccount::runAutoMaintenance();
}

Therefore, the function runAutoMaintenance appears to be the pruning function. runAutoMaintenance lives in \ConfirmAccount\backend\ConfirmAccount.class.php

class ConfirmAccount {
    /**
     * Move old stale requests to rejected list. Delete old rejected requests.
     */
    public static function runAutoMaintenance() {
    ...

In order to call runAutoMaintenance after every reject-action, I think the call to runAutoMaintenance should be placed in function rejectRequest, in file \extensions\ConfirmAccount\business\AccountConfirmSubmission.php

Specifically, I think it can go directly under:

# Clear cache for notice of how many account requests there are
ConfirmAccount::clearAccountRequestCountCache();

Maybe pruning should also happen after Accept, Hold, and Spam actions. Unsure. For now, pruning after Reject should handle the original problem.

I attempted the above fix, and it did not seem to work. I'm at a loss.

Can someone help determine why this fix did not work?

Original code:

protected function rejectRequest( IContextSource $context ) {
....
# Clear cache for notice of how many account requests there are
ConfirmAccount::clearAccountRequestCountCache();
....

New code:

protected function rejectRequest( IContextSource $context ) {
....
# Clear cache for notice of how many account requests there are
ConfirmAccount::clearAccountRequestCountCache();
# Prune
ConfirmAccount::runAutoMaintenance();
....

On the second request, I'm still getting "Username is already in use in a pending account request."


How do I check how much memory a Mediawiki instance has available to it?

Published 21 Mar 2018 by user1258361 in Newest questions tagged mediawiki - Server Fault.

Before anyone posts something about checking php.ini, bear in mind there are all sorts of ways it could be overridden. Where's the admin page or panel that lists the amount of RAM available to MediaWiki?

(Due diligence: Searches turned up nothing. Proof in links below)

https://www.google.com/search?q=mediawiki+admin+panel&ie=utf-8&oe=utf-8&client=firefox-b-1 only relevant link is https://www.mediawiki.org/wiki/Manual:System_administration which contains nothing about memory or RAM

https://www.google.com/search?q=mediawiki+admin+UI+how+much+memory+is+allocated&ie=utf-8&oe=utf-8&client=firefox-b-1 again nothing

https://www.google.com/search?q=mediawiki+how+to+check+how+much+memory+is+allocated&ie=utf-8&oe=utf-8&client=firefox-b-1 again, nothing. First link suggests increasing amount of RAM but that isn't useful if my php.ini is being ignored for unknown reasons
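As far as I know MediaWiki has no admin page that reports this, so one low-tech sketch of a workaround (the filename is hypothetical, and the standard PHP calls below are the whole trick): drop a small script next to MediaWiki's index.php and load it in a browser, since that shows what the web server's PHP is actually using (CLI PHP often reads a different php.ini).

<?php
// check-memory.php (hypothetical filename): place beside MediaWiki's index.php,
// open it in a browser, then delete it afterwards.
echo 'Loaded php.ini: ' . php_ini_loaded_file() . "<br>\n";
echo 'memory_limit: ' . ini_get( 'memory_limit' ) . "<br>\n";

Note that MediaWiki can also raise the limit itself at runtime via its $wgMemoryLimit setting, so the effective value may differ from what php.ini says.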


Applying FlaggedRevs to individual MediaWiki articles

Published 21 Mar 2018 by Kevin Dufendach in Newest questions tagged mediawiki - Stack Overflow.

We have FlaggedRevs working for a MediaWiki at work, and we have successfully enabled it only for a specific namespace, but we would also like the option of specifying individual pages for review even if they are not in that namespace. Wikipedia has done this on several pages (e.g. https://en.wikipedia.org/wiki/Linux), referred to as "Pending changes protection," but it isn't clear to me how they tag those pages as pending changes pages. There's a "{{pp-pc1}}" (https://en.wikipedia.org/wiki/Template:Pp-pc1) template included, but I can't tell if that's what does it or if there's something else that's going on in the background.

I thought setting $wgFlaggedRevsProtection = true in LocalSettings.php might give me that option (per https://www.mediawiki.org/wiki/Help:Pending_changes), but I still don't see an option.

The extension "Approved_Revs" is an alternative extension that allows pages (or templates) to be tagged with the magic word __APPROVEDREVS__. I'm considering switching to this extension just for this feature. Does anyone know how to do this appropriately with FlaggedRevs?

Thanks!
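For reference, a minimal LocalSettings.php sketch of the per-page, pending-changes-style setup being described (the namespace list is only an example; with $wgFlaggedRevsProtection enabled, individual pages are then opted in through the protect/stabilization interface rather than by namespace):

// Namespaces in which FlaggedRevs is allowed to operate at all
$wgFlaggedRevsNamespaces = [ NS_MAIN, NS_PROJECT ];

// Per-page opt-in: review only applies to pages an admin explicitly
// "protects" (stabilizes), instead of to every page in the namespaces above
$wgFlaggedRevsProtection = true;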


jquery tags manager typehead not loading every time

Published 21 Mar 2018 by vijaybir singh in Newest questions tagged mediawiki - Stack Overflow.

I have implemented jquery-tag-manager in MediaWiki as below.

Library loading in the head section:

<script src="/skins/Vector/fancybox/jquery-1.10.1.min.js"></script>
<link rel="stylesheet" type="text/css" href="/Vector/custom_css/tagmanager.min.css">
<script type="text/javascript" src="/Vector/custom_css/tagmanager.min.js"></script>
<script type="text/javascript" src="/Vector/custom_css/bootstrap3-typeahead.min.js"></script>

Text field to implement the tags manager:

<input type="text" autocomplete="off" id="relatedArticlesText" name="tags" placeholder="Articles" class="form-control typeahead tm-input tm-input-info"/>

Script implementation as below.

<script type="text/javascript">
$(document).ready(function(event) {
            var tagApi = $(".tm-input").tagsManager({
                    delimiters: [44], prefilled:["abc","xyz"]
            });
            jQuery(".typeahead").typeahead({
                name: 'tags',
                displayKey: 'name',
                minLength: 3,
                source: function (query, process) {
                  return mediaWiki.loader.using('mediawiki.api', function() {
                            (new mediaWiki.Api()).get({
                                    action: 'relatedArticles',
                                    format: 'json',
                                    query: query
                            }).done(function(data) {
                                return process(data[0]);
                            });
                    });
                },
              afterSelect :function (item){
                  tagApi.tagsManager("pushTag", item);

              }
            });
});
</script>

The above code worked fine a few times, but sometimes it throws an error (screenshot of the error not included here).


Who's a senior developer?

Published 21 Mar 2018 by in Posts on The bugalore.

Something at work today prompted me to get thinking about what people generally mean when they say they are/someone is a senior developer. There are some things which are a given - long-term technical experience, fairly good knowledge of complex languages and codebases, past experience working on products and so on. But in my opinion, there are a fair number of things which we don’t really talk about but are important skills a “senior” developer must possess to actually deserve that title.

Spike in Adam Conover Wikipedia page views | WikiWhat Epsiode 4

Published 21 Mar 2018 by addshore in Addshore.

This post relates to the WikiWhat YouTube video entitled “Adam Conover Does Not Like Fact Checking | WikiWhat Epsiode 4” by the channel Cntrl+Alt+Delete. It would appear that the video went slightly viral over the past few days, so let’s take a quick look at the impact that had on the Wikipedia page views for Adam’s article.

The video was published back in January, and although the viewing metrics are behind closed doors this video has had a lot of activity in the past 5 days (judging by the comments).

It is currently the most viewed video in the WikiWhat series at 198,000 views, while the other 3 videos (John Bradley, Kate Upton & Lawrence Gillard Jr.) only have 6,000 views between them.

The sharp increase in video views translates rather well into Wikipedia page views for the Adam Conover article.

Generated at https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&start=2018-02-28&end=2018-03-20&pages=Adam_Conover|Talk:Adam_Conover|User:Adam_Conover|User_talk:Adam_Conover

Interestingly this doesn’t just show a page view increase for the article, but also the talk page and Adam Conover’s user pages, all of which are shown in the video.

It’s a shame that 200,000 YouTube views only translate to roughly 15,000 views on Wikipedia, but it’s still interesting to see the effect videos such as this can have on the visibility of the site.

You can watch the page views for any Wikipedia page using the Page views tool.

The post Spike in Adam Conover Wikipedia page views | WikiWhat Epsiode 4 appeared first on Addshore.


Introducing Dashboard: View Your Infrastructure At A Glance

Published 21 Mar 2018 by Josh Viney in The DigitalOcean Blog.


Simplifying the developer experience in the cloud has been a priority for DigitalOcean since we launched Droplets in 2013. As our product capabilities grow, we're taking great care to ensure that using DigitalOcean to run your applications remains as easy and intuitive as possible.

Today, we’re announcing the Control Panel Dashboard, the first of many Control Panel updates planned for 2018 as part of our mission to make it simple for development teams to operate and scale production applications in the cloud.

Introducing The Dashboard

Every day as we talk to developers, read feedback from the community, and witness the amazing applications being launched on our platform, the message that rings the clearest is that everyone values simplicity and ease of use. Visualizing, understanding, and controlling your cloud infrastructure in a single place is not inherently simple or easy, and it can get significantly more difficult as complexity increases.

The release of the new Dashboard is specifically meant to help you quickly access your existing resources and key account-related information, while highlighting additional products and features we think you’ll find useful when deploying scalable, production-ready infrastructure.

For existing users, the Dashboard replaces the Droplets page as the new default home page of the Control Panel. It provides “at-a-glance” visibility into active resources, like Droplets, Spaces, Load Balancers, Domains, Floating IPs, month-to-date current billing usage, shortcuts to team management, and other common tasks without having to navigate to different, often hard-to-find, sections of the Control Panel.

A look at the new Control Panel Dashboard.

Additionally, we’ve made changes to the top and bottom navigation to expose more helpful links to our status page, Community tutorials, API docs, and the support portal. All with the goal of surfacing more ways to help keep your applications running smoothly without overloading the UI.

The Dashboard is just the beginning. We have many more updates planned this year, and we can’t do it without your continued feedback. When you log in to take a look, please leave us some feedback using the little megaphone icon in the bottom right corner of the Control Panel. Or get early access to upcoming features by completing this survey.

The new Control Panel Dashboard is available starting today and will roll out to all DigitalOcean users over the course of the week. Stay tuned for more UI updates in the future!


Cambridge Analytica, #DeleteFacebook, and adding EXIF data back to your photos

Published 20 Mar 2018 by addshore in Addshore.

Back in 2016 I wrote a short hacky script for taking HTML from facebook data downloads and adding any data possible back to the image files that also came with the download. I created this as I wanted to grab all of my photos from Facebook and be able to upload them to Google Photos and have Google automatically slot them into the correct place in the timeline. Recent news articles about Cambridge Analytica and the harvesting of Facebook data have led to many people deciding to leave the platform, so I decided to check back with my previous script and see if it still worked, and make it a little easier to use.

Step #1 – Move it to Github

Originally I hadn’t really planned on anyone else using the script, in fact I still don’t really plan on it. But let’s keep code in Github not on aging blog posts.

https://github.com/addshore/facebook-data-image-exif

Step #2 – Docker

The previous version of the script had hard coded paths, and required a user to modify the script, and also download things such as the ExifTool before it would work.

Now the Github repo contains a Dockerfile that includes the script and all necessary dependencies.

If you have Docker installed, running the script is now as simple as docker run --rm -it -v //path/to/facebook/export/photos/directory://input facebook-data-image-exif.

Step #3 – Update the script for the new format

As far as I know, the format of the facebook data dump downloads is not documented anywhere. The format totally sucks; it would be quite nice to have some JSON included, or anything slightly more structured than HTML.

The new format moved the location of the HTML files for each photo album, but luckily the format of the HTML remained mostly the same (or at least the crappy parsing I created still worked).

The new data download did however do something odd with the image sources. Instead of loading them from the local directory (all of the data you have just downloaded) the srcs would still point to the facebook CDN. Not sure if this was intentional, but it’s rather crappy. I imagine if you delete your whole facebook account these static HTML files will actually stop working. Sounds like someone needs to write a little script for this…

Step #4 – Profit!

Well, no profit, but hopefully some people can make use of this again, especially those currently fleeing facebook.

You can find the “download a copy” of my data link at the bottom of your facebook settings.

I wonder if there are any public figures for the rate of facebook account deactivations and deletions…

The post Cambridge Analytica, #DeleteFacebook, and adding EXIF data back to your photos appeared first on Addshore.


Episode 4: Bernhard Krabina

Published 20 Mar 2018 by Yaron Koren in Between the Brackets: a MediaWiki Podcast.

Bernhard Krabina is a researcher and consultant for KDZ, the Centre for Public Administration Research, a Vienna, Austria-based nonprofit that focuses on improving and modernizing technology-based solutions in government at all levels within Europe. He has been involved with MediaWiki in government for the last 10 years.

Links for some of the topics discussed:


v2.4.7

Published 20 Mar 2018 by fabpot in Tags from Twig.


v1.35.3

Published 20 Mar 2018 by fabpot in Tags from Twig.


How can I save a whole MediaWiki wiki without shell access?

Published 15 Mar 2018 by Slagathor in Newest questions tagged mediawiki - Stack Overflow.

I'd like to save a whole wiki site (not Wikipedia, but a smaller wiki), with all of its text and media. I'm using a Windows system. I'd prefer the simplest solution.


How We Support Remote Employees at DigitalOcean

Published 14 Mar 2018 by Amanda Brazzell in The DigitalOcean Blog.


Remote culture at DigitalOcean is one of my favorite things to talk about when discussing my job. When I first joined the company in June of 2015, there was already a substantial percentage of existing remote employees (better known as our “remotees”). Working with the remotees wasn’t initially a part of my function, but as a member of the Employee Experience Team, I gradually found myself getting to know many of them more personally. I learned about their experiences as distributed employees, some of their pain points, and how it influences their engagement.

Since I've never been remote, I educated myself on best practices for companies with remote employees and how we could expand our top-notch employee experience to those outside of our HQ.

Two and a half years later, our remotee population totals over 200 employees, making up over 50% of our employees, and our program has grown to support both the needs of our business and those who work remotely. To date, remotees score higher in engagement than any other subgroup at the company. This has been attributed to the attention and effort we have actively given to support the remotee experience.

Here’s what we learned and how we adjusted our efforts to better support the remotee experience:

Remote Communication

“Watercooler talk” is an important aspect of working in-office, and it’s a practice that companies seeking to become more remote-friendly have trouble replicating. Being able to easily communicate with other colleagues helps improve team bonds and makes people feel part of the company fabric. At DO, we use several different mediums to avoid having remotees excluded from conversation and risking having information fall through the cracks:

Remote-inclusive Programs

While most of our teams at DigitalOcean are comprised of both in-office and remote employees, there is definite value in giving teams the opportunity to get together in person at different times during the year. Here are the processes we have in place to ensure teams get face time:

Perks for Remotees

While some companies see working from home as a perk in and of itself, we recreate many of the in-office perks and make them available to remotees. This is key to building a cohesive company culture and experience, and one where remotees feel engaged with the company at large.

Our remotes are able to participate in our workstation program, where they get access to different monitors, mouse/keyboards, and trackpads for their home offices, as well as credit up to $100 for headphones of their choice. The equivalent of our commuter benefit for in-house employees is providing remotes a credit toward the cost of either their monthly internet bill or their monthly coworking space membership. Additionally, remotes can opt into a monthly subscription snack box (because snacks are awesome!). Finally, DO covers travel and per diem costs, and provides accommodation at our corporate apartments for remotee visits to HQ.

"Love is What Makes Us Great"

DigitalOcean’s employee experience programs strives to be inclusive of all of our employees. We do this by keeping both the needs of in-office and remote employees in mind, and by adjusting our programs as needed to ensure they can change and scale with our growing organization. Removing obstacles to communication between people in our offices and remotes is essential for building cohesion across teams and to help everyone be the most productive employee they can be, no matter where they’re located.

Apply For a Job @ DO


Amanda Brazzell is DigitalOcean’s Office Experience Team Lead. She has helped build an effective Remote Experience program that drives dispersed employee engagement and job satisfaction. Amanda is a California native who moved to NYC without having ever visited the city before, and has been at DO since 2015.


How should optional or empty values be handled in Semantic Mediawiki?

Published 14 Mar 2018 by Topsy in Newest questions tagged mediawiki - Stack Overflow.

I am setting up some templates for a Semantic Mediawiki implementation. Template params are being fed into annotations. However, the values are optional; there's not always going to be a value in every field. This causes trouble with some data types. Specifically, if I have

{{#set:
| Has phone={{{phone}}}
}}

I will get an error of the form "URIs of the form *** are not allowed", where *** is either {{{phone}}} or whatever default value I try to drop in there. It seems impossible for datatypes like phone or email to be empty. I cannot figure out how to support empty values for these fields in my templates. What is the correct pattern to use for null values in SMW annotations?
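A commonly used pattern (offered here only as a hedged sketch, not from the original question) is to give the parameter an empty default so the literal text {{{phone}}} is never passed to #set, and to emit the annotation only when the value is non-empty:

{{#if: {{{phone|}}} |
  {{#set:
  | Has phone={{{phone|}}}
  }}
}}

With the empty default alone ({{{phone|}}}), SMW simply skips blank values; the surrounding #if just makes the intent explicit.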


How to best add extensions when using official docker image for MediaWiki?

Published 13 Mar 2018 by Streamline in Newest questions tagged mediawiki - Stack Overflow.

We are using the official MediaWiki Docker image and want to be able to add additional MediaWiki extensions.

Questions:

  1. What is the recommended next step here if we are currently using the docker-compose file below where we mount volumes on the host? Is it to build a new image that wraps the official image? Is there an example somewhere of this modified new image for adding a mediawiki extension?
  2. Or can we just mount an extensions volume on the host in the current docker-compose and, if needed, make any adjustments to the LocalSettings.php?

This link on the docker website refers to adding PHP extensions and libraries, but it's not clear to me whether it is meant to cover adding MediaWiki-specific extensions, since it clearly says "PHP Extensions". Or should this documentation page have actually said "MediaWiki Extensions", even though that implies they are written in PHP?

Here is our current docker-compose file entry for mediawiki:

mediawiki:
  image: mediawiki
  container_name: mediawiki_production
  mem_limit: 4g
  volumes:
    - /var/www/mediawiki/uploads:/var/www/html/uploads
    - /var/www/mediawiki/LocalSettings.php:/var/www/html/LocalSettings.php
  environment:
    - MEDIAWIKI_DB_NAME=
    - MEDIAWIKI_DB_HOST=
    - MEDIAWIKI_DB_USER=
    - MEDIAWIKI_DB_PASSWORD=
    - VIRTUAL_HOST=wiki.exmaple.com
    - TERM=xterm
  restart: always
  network_mode: bridge

The extensions we are considering first off (though we would like a scalable solution for more later) are not part of the official image.

Any examples of a downstream docker image that uses the official mediawiki image as its "FROM" to include a mediawiki extension(s), plus an updated docker-compose (if both are required), would be helpful. Perhaps it would also be good to explain what needs to change if the mediawiki extension itself relies on PHP extensions or libraries that are not already included in the base image, versus adding a mediawiki extension that doesn't rely on any additional PHP extensions or libraries.
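For what it's worth, here is a minimal downstream-image sketch under stated assumptions (the 1.30 tag, the REL1_30 branch, and the CheckUser extension are placeholders rather than anything taken from the question); it wraps the official image and clones one extension into the extensions directory:

# Dockerfile sketch: extend the official image with one extra extension
FROM mediawiki:1.30
RUN apt-get update \
    && apt-get install -y --no-install-recommends git \
    && git clone --depth 1 -b REL1_30 \
        https://github.com/wikimedia/mediawiki-extensions-CheckUser.git \
        /var/www/html/extensions/CheckUser \
    && rm -rf /var/lib/apt/lists/*
# The mounted LocalSettings.php still has to enable it:
#   wfLoadExtension( 'CheckUser' );

In docker-compose, the image: mediawiki line would then point at this image instead (either a build: entry or a tag you push), and the rest of the file can stay as it is.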


Serve MediaWiki Translate subpages from different subdomains with a single installation with Apache

Published 13 Mar 2018 by mapple in Newest questions tagged mediawiki - Stack Overflow.

I'm running a mediawiki installation with a "nice" URL htaccess to show https://example.com/Wiki/Page

I'd like to enable localization and translation on subdomains.

I'm thinking https://fr.example.com/Wiki/Page displays content from https://example.com/Wiki/Page/fr, and https://fr.example.com/Wiki/Page2 displays content from https://example.com/Wiki/Page2/fr

I think that should be achievable with .htaccess, but I'm not a mediawiki expert. Is there anyone out there who can help with enabling mediawiki translations on a subdomain like the above? I think it's just an .htaccess question, but I'm not sure what it might break with mediawiki too :)
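Purely as a hedged sketch (it assumes fr.example.com points at the same document root as the main site, that mod_rewrite is enabled, and that the existing nice-URL rule maps /Wiki/$1 to index.php; it has not been tested against the Translate extension), one extra rule placed before the generic nice-URL rule could express that mapping:

RewriteEngine On
# When the request arrives on the fr. subdomain, internally rewrite
# /Wiki/Page to the translated subpage Page/fr of the same wiki.
RewriteCond %{HTTP_HOST} ^fr\.example\.com$ [NC]
RewriteCond %{REQUEST_URI} !/fr$
RewriteRule ^Wiki/(.+)$ index.php?title=$1/fr [L,QSA]

Other language subdomains would need analogous rules, or a single rule that captures the language code from %{HTTP_HOST}.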


Facebook Inc. starts cannibalizing Facebook

Published 13 Mar 2018 by Carlos Fenollosa in Carlos Fenollosa — Blog.

Xataka is probably the biggest Spanish blogging company. I have always admired them, from my amateur perspective, for their ability to make a business out of writing blogs.

That is why, when they invited me to contribute with an article about the decline of Facebook, I couldn't refuse. Here it is.

Facebook se estanca, pero Zuckerberg tiene un plan: el porqué de las adquisiciones millonarias de WhatsApp e Instagram, or Facebook is stagnating, but Zuckerberg has a plan: the reason behind the billion dollar acquisitions of WhatsApp and Instagram.

Tags: facebook, internet, mobile

Comments? Tweet  


Vaporous perfection

Published 12 Mar 2018 by jenimcmillan in Jeni McMillan.


Clouds, so impermanent, advise her that reality is a mere dream. The illusion of solidity in their shape and comforting forms is exactly that, illusion, disappearing as temperature changes, wind blows or night extinguishes day. Why would a cloud be other than this? I marvel at such simplicity. I will endeavour to leave clouds to their journey, not fall in love with them in any other way than to share their pleasure of being vaporous perfection.


Publishing WG Telco, 2018-03-12: F2F Planning, Issue Tracking

Published 12 Mar 2018 by Tzviya Siegman in W3C Blog.

See minutes online for a more detailed record of the discussions.

F2F Planning

We are planning to have several tasks accomplished in advance of our Spring F2F in Toronto. The Affordances TF plans to review open affordances issues, compile technical requirements, and begin work on a draft before the meeting. The WAM TF is planning on focusing on direct requests to be made of groups outside of PWG.

Issue Closing

The group worked through a backlog of issues on GitHub, planning to close many issues that have been addressed by the current draft or newer issues.


How do I track down the source definition of a custom hook event in a Mediawiki extension?

Published 12 Mar 2018 by user1258361 in Newest questions tagged mediawiki - Stack Overflow.

Here's an example:

https://phabricator.wikimedia.org/diffusion/EPFM/browse/master/?grep=BeforeFreeTextSubst

A Mediawiki extension where Hooks::run( 'PageForms::BeforeFreeTextSubst', ...) gets invoked, but there's no other record or trace of where the hook is defined. If there were some mapping of strings/names to functions it would be registered somewhere else, and if it were a function name it should show up somewhere else.

I'm seeing this with a few other function hook events.
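For background, a hedged sketch of the mechanism (the handler signature below is illustrative and not taken from the Page Forms source): the Hooks::run() call site is effectively where a custom hook event is defined, and anything that wants to react to it registers a callable against that name, for example in LocalSettings.php or an extension's setup code:

// Illustrative registration only; the real argument list is whatever the
// Hooks::run( 'PageForms::BeforeFreeTextSubst', [ ... ] ) call site passes.
$wgHooks['PageForms::BeforeFreeTextSubst'][] = function ( &$freeText ) {
    // inspect or modify the value before Page Forms substitutes it
    return true; // returning true lets any other handlers run as well
};

So if no extension on a given wiki registers a handler, grepping will only ever find the run() call itself.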


Celebrating the anniversary of the Web

Published 12 Mar 2018 by Coralie Mercier in W3C Blog.

graphic with the text: Join W3C in celebrating the 29th anniversary of the World Wide Web

On 12 March we celebrate an event which has both historically changed the world and is key to creating and empowering its future.

In March 1989,  while at CERN, Tim Berners-Lee wrote his memo “Information Management: A Proposal” which outlined the World Wide Web.

Today we celebrate a Web that is:

W3C CEO Jeff Jaffe noted: “There are very few innovations that have truly changed everything. The Web is the most impactful innovation of our time. Tim created the W3C so that all stakeholders in the Web can collectively be stewards of the Web. We are focused on the future of the Web and, with our Members and the public, are building a Web which is more secure; more responsive to user needs; more powerful for business needs; and more universally accessible.”

In a Web Foundation letter on the anniversary Tim states: “I remain committed to making sure the web is a free, open, creative space — for everyone. That vision is only possible if we get everyone online, and make sure the web works for people.” Tim continues: “Today, I want to challenge us all to have greater ambitions for the web. I want the web to reflect our hopes and fulfill our dreams, rather than magnify our fears and deepen our divisions.

Tim also notes: “As the late internet activist John Perry Barlow, once said: “a good way to invent the future is to predict it”. It may sound utopian, it may sound impossible to achieve after the setbacks of the last two years, but I want us to imagine that future and build it.”  This is inspiring, indeed; both generally but also for the W3C and the Web.

W3C is grateful to its Director and founder Tim Berners-Lee for his incredibly powerful and innovative invention. Today we celebrate our history with the Web as well as its future as, together with our members, editors and contributors, we are helping to shape what it will become.

If you feel so inclined, join us in celebrating by sharing in comment your Web success story, or write your own blog post, or tweet about #HappyBirthdayWWW.

Happy 29th Web anniversary!

 


Budapest Blues

Published 11 Mar 2018 by jenimcmillan in Jeni McMillan.

budapest

It’s Sunday and I’m in the most beautiful city in the world.

Cigarette butts crushed into broken tiles.

At my feet is another death, in the street,

Broken buildings and hollow dreams.

I’m in her arms like a stillborn child.

Feeling nothing, it seems,

But old.


Introducing Community DO-ers: March Edition

Published 7 Mar 2018 by Andrew Starr-Bochicchio in The DigitalOcean Blog.


Here at DigitalOcean one of our core values is "our community is bigger than just us". From our support of the broader open source community to making our tutorials as platform agnostic as possible, we believe that contributing knowledge and resources to the community benefits not just ourselves but all members – past, present, and future.

We never could have anticipated the amazing amount of support we've received in return. You’ve built open source tools using our API, hosted Meetups across the globe, shared your DigitalOcean stories, and so much more. We wouldn’t be where we are today without you.

We're now six years into this journey and want to start recognizing our members more regularly. So today we are excited to highlight some of our most active Community contributors—our DO-ers!

Marko Mudrinić (@xmudrii)

It’s hard to overstate just how lucky we are to have people like Marko in our Community; he’s an all around rockstar whose contributions span from ocean to ocean. Marko is one of the most prolific users on our Community Q&A platform, where he helps users learn about and build on DigitalOcean. He’s written tutorials on topics like Prometheus and Go, but also puts that knowledge into practice. He is the most active contributor to doctl, our open source command line interface, and has worked extensively on DigitalOcean support in Kubicorn to help users get up and running with Kubernetes.

Mateusz Papiernik (@maticomp)

Mateusz's passion for giving back to the Community inspires us. He has been sharing his technical expertise with us for many years, which you can enjoy in the dozens of tutorials he has published on topics from ProxySQL to Nginx optimization. With even more in the works, he has already helped hundreds of thousands of readers. His genuine enthusiasm and drive to aid others shines through in his writing and his collaboration with our editorial team.

Peter Hsu (@peterdavehello)

Peter is an open source enthusiast who is always going above and beyond. He has traveled across Taiwan to share DigitalOcean with his community—from COSCUP in Taipei to MOPCON in Kaohsiung. As the maintainer of the CDNJS (a free, public, and open-source CDN service), he helps to power millions of websites across the globe. Closer to home, he is an organizer of the DigitalOcean Meetup group in Hsinchu, Taiwan, which is quickly approaching 600 members. With nine events in 2017—including the first Hacktoberfest event of the year—it’s one of our most active Meetups!

Marko, Mateusz, and Peter exemplify some of the best qualities found in our community. All three share our enthusiasm for open source and passion for knowledge-sharing. But they’re not alone! We look forward to recognizing more of our amazing Community members in the coming months.

Are you interested in getting more involved in the DigitalOcean Community? Here are a few places to start:


cardiParty 2018-03-Perth - Museum of Perth

Published 7 Mar 2018 by Andrew Kelly in newCardigan.

Reece Harley, Executive Director and founder, will give an introductory talk about Museum of Perth, giving background info about the museum, and the current exhibition. 6pm, Friday 16 March.

Find out more...


Episode 3: Mike Cariaso

Published 6 Mar 2018 by Yaron Koren in Between the Brackets: a MediaWiki Podcast.

Mike Cariaso is the co-founder of SNPedia, a MediaWiki-based repository of genomic information (founded in 2006), and the creator of Promethease, personal genetic analysis software that uses SNPedia's data.

Links for some of the topics discussed:


Digital Publishing Summit in Berlin, 16th and 17th of May

Published 6 Mar 2018 by Ivan Herman in W3C Blog.

The Digital Publishing Summit Europe will be held in Berlin on the 16th and 17th of May, 2018. The event is organized by EDRLab, Paris, with the support of W3C as part of the Publishing@W3C effort. This event aims at strengthening a true spirit of cooperation in the publishing industry worldwide, pushing for a massive adoption of open standards and software for reaching true interoperability on the market, and promoting a high level of accessibility for ebooks and other types of publications.

What is particularly interesting in this event is that it mixes market discussions with technical presentations—but never too technical. The organizers expect that up to 200 decision-makers and technical experts from the publishing industry will participate in the event.

You can check the program on the event’s website (some evolution is still possible). The European publishing industry will be obviously present, with, e.g., Martijn Roelandse, Head of Publishing Innovation at Springer Nature as keynote speaker; Hermann Eckel, COO of tolino media; Michaela Philipzen, Head of production at Ullstein; Virginie Clayssen, Chief Innovation officer at Editis; or Maureen Pennock, Head of Digital Publishing at the British Library. Solution providers working in the realm of EPUB and Web Publications will be there also, ranging from open-source providers like the Coko Foundation and Readium to large companies like Microsoft and Google; they will present diverse developments relative to Publishing and the Web. Scholarly publishing, archival of publications, digitization of out-of-commerce books, accessibility of ebooks, European strategy for improving innovation and interoperability: the topics of discussion will be wide during these one and half days.

Detailed information and admission tickets are available on the Website of the summit. W3C members benefit from a 30€ discount for the event, and an additional early bird discount of 50€ is available until March 11th.


cardiCast episode 29 – Adam Trainer

Published 5 Mar 2018 by Justine in newCardigan.

Perth February 2018 cardiParty

Recorded live

Adam Trainer, curator of the exhibition Alternative Frequencies: 40 Years of RTRFM which celebrates four decades of the state’s longest running FM community radio station, discusses the history of the station, the curation process and the exhibition’s relationship to SLWA’s broader local music collections.

newcardigan.org
glamblogs.newcardigan.org

Music by Professor Kliq ‘Work at night’ Movements EP.
Sourced from Free Music Archive under a Creative Commons licence.


Self-hosted websites are doomed to die

Published 3 Mar 2018 by Sam Wilson in Sam's notebook.

I keep wanting to be able to recommend the ‘best’ way for people (who don’t like command lines) to get research stuff online. Is it Flickr, Zenodo, Internet Archive, Wikimedia, and Github? Or is it a shared hosting account on Dreamhost, running MediaWiki, WordPress, and Piwigo? I’d rather the latter! Is it really that hard to set up your own website? (I don’t think so, but I probably can’t see what I can’t see.)

Anyway, even if running your own website, one should still be putting stuff on Wikimedia projects. And even if not using it for everything, Flickr is a good place for photos (in Australia) because you can add them to the Australia in Pictures group and they’ll turn up in searches on Trove. The Internet Archive, even if not a primary and cited place for research materials, is a great place to upload wikis’ public page dumps. So it really seems that the remaining trouble with self-hosting websites is that they’re fragile and subject to complete loss if you abandon them (i.e. stop paying the bills).

My current mitigation to my own sites’ reliance on me is to create annual dumps in multiple formats, including uploading public stuff to IA, and printing some things, and burning all to Blu-ray discs that get stored in polypropylene sleeves in the dark in places I can forget to throw them out. (Of course, I deal in tiny amounts of data, and no video.)

What was it Robert Graves said in I, Claudius about the best way to ensure the survival of a document being to just leave it sitting on ones desk and not try at all to do anything special — because it’s all perfectly random anyway as to what persists, and we can not influence the universe in any meaningful way?


v2.4.6

Published 3 Mar 2018 by fabpot in Tags from Twig.


v1.35.2

Published 3 Mar 2018 by fabpot in Tags from Twig.


Untitled

Published 2 Mar 2018 by Sam Wilson in Sam's notebook.

I think I am learning to love paperbacks. (Am hiding in New Editions this morning.)


v2.4.5

Published 2 Mar 2018 by fabpot in Tags from Twig.


Untitled

Published 2 Mar 2018 by Sam Wilson in Sam's notebook.

This seems cool: https://tilde.town/


v1.35.1

Published 2 Mar 2018 by fabpot in Tags from Twig.


The Deep End Podcast Ep. 14: Making Sense of It All with Attentive

Published 1 Mar 2018 by Hollie Haggans in The DigitalOcean Blog.


There’s such a thing as “too much information”, especially for companies scaling out their sales operations. That’s why Attentive was born in 2015: to help sales teams make their increasing pipelines simpler to manage. Indeed, the small, Portugal-based team is itself focused on scaling, having participated in accelerator programs like Techstars.

In this episode, Attentive founder and CTO Pedro Araújo talks about what it takes to build a tech product from the ground up. Discover their approach to running an engineering team, from adopting new open source technologies, to onboarding junior developers and learning about cloud infrastructure.

Subscribe to the The Deep End Podcast on iTunes and Spotify, or listen to the latest episode on SoundCloud below:

Hollie Haggans heads up Global Partnerships for DigitalOcean’s Hatch program. She is passionate about startups and cold brew coffee. Get in touch with questions at hatch@digitalocean.com.


Weather Report

Published 1 Mar 2018 by jenimcmillan in Jeni McMillan.


It is Minus 11 in Berlin.

Heart rate slow.

Breath freezing.

It’s Minus 12 in Berlin.

Heart is warming.

Breath responding.

I think of the Life, Death, Rebirth cycle.

Again and again and again.


Thank you Clarissa Pinkola Estés.


GLAM Blog Club March 2018

Published 28 Feb 2018 by Hugh Rundle in newCardigan.

February’s theme ‘watch’ is a wrap! It’s really interesting how many different ways the theme has been approached. If you haven’t already, make sure to check out everyone’s blogs.

Stacey, in DIY with videos, explains how much she has learnt by watching video tutorials via YouTube and Lynda.com, including knitting and using software programs.

Philip’s blog, Something to watch out for; or, Info-feudalism? Not on my watch; or, Tricknology 2.0, argues for a need to watch that “We may be heading for an age of info-feudalism, where we are back to the Dark Ages in terms of having reliable empirical evidence of the wider world, and trust for such mediated information lies with hierarchical structures of authority (in the academic sense, i.e., the ability to make authoritative statements) that replicate the feudal system in form.

Sarah is Watching GLAM activism on Twitter, witnessing the break of neutrality in the GLAM sector. Sarah aims to progress from watching activism, to being involved in activism herself.

Happier Librarian lists three YA lit books with Indigenous themes to watch out for: Sounds That Sound Like Blood; Clancy of the Undertow; and Becoming Kirrali Lewis in Indigenous YA fiction to watch out for….

Clare shares examples of LGBTIQ+ digital storytelling and other digital history-related projects to watch in Queer eye for the librarian ally: Go from LGBTIQ+ collection developed to community development and back again.

In my blog, Private moments, I state that a key aspect of the job of the archivist is to watch private moments in the correspondence, diary entries, works in process, contained within archives.

Andrew is feeling frustrated about the state of things in the GLAM sector, “If you like most only watch, then my dear Angelheaded Hipsters, watch what others do (or don’t).”

Alissa’s second cardi party was at ACCA at the Unfinished Business: Perspectives on art and feminism, in her blog Art // attack, Alissa describes the urge to go from watching to creating. “Despite having the artistic capability of a garden snail I was filled with a strange compulsion to do art. Watching art created by other people suddenly wasn’t enough. I didn’t know what I might do—I had no experience of doing it. I had this incredible need to express myself, artistically. To create, somehow. To be more than words.” I’m really sorry that you had a sub-acute panic episode at the cardi party lunch. Thankful for you sharing that experience, and I’m also thankful that you felt better the next day and visited NGV Triennial. Art is good for your wellbeing, and sharing your feelings is also important. Thanks Alissa.

Michaela’s We are volcanoes, like Sarah’s blog, is about going from watching to acting, to be the change. “One way we can enact change immediately is in who we choose to cite, or give voice to. In our papers, our talks, our blogs. We  also nudge the revolution along every time we buy a book or watch a movie or a play written by someone who isn’t a cisgendered pale male.” 

In On the look out… Clare shares some library trends we should watch out for, so make sure to read her blog to find out more.

Thank you to everyone for your blogs, it’s wonderful to read your ideas, projects, and learn more about what’s going on in GLAM.

Introducing our theme for March, it’s happiness! What makes you happy? Your job? Visiting your favourite library? Getting lost in an exhibition? We look forward to reading your blogs about happiness.

Don’t forget to give your blog post the correct metadata: tag your post ‘GLAM Blog Club‘ within your blogging software and share it on social media using the tag #glamblogclub – don’t get the two mixed up! Using the right terms helps us enormously with these roundup posts. If you have not yet done so, you can also register your blog at glamblogs.newcardigan.org – if you have a Pocket account you can also connect it to the app so you never miss a post.


cardiParty 2018-03-Melbourne - c3 Gallery

Published 28 Feb 2018 by Hugh Rundle in newCardigan.

Katie Paine, c3 Gallery Manager, will give an introductory talk about c3, giving background info about the community gallery space, and she will also explain the current exhibitions. 6:30pm, Friday 9 March.

Find out more...


Conference at UWA – Home 2018

Published 26 Feb 2018 by Tom Wilson in thomas m wilson.

I’ll be presenting a paper at the following conference in July 2018.  It will be looking at the theme of aspirations for home ownership from the perspective of Big History.  Hope to see you there.

Onward and Upward Together

Published 22 Feb 2018 by Ben Uretsky in The DigitalOcean Blog.

As we turn the page on 2017, I’m proud to share that DigitalOcean had another tremendous year of rapid growth and strong profitability, a combination which few tech companies have achieved at our scale. We are rapidly approaching $200M in annual recurring revenue and are looking forward to celebrating our 6th anniversary next month. The key to our success is our disruptive offering — a cloud computing platform that is engineered with simplicity at the core — and our vibrant, growing developer community. We see a substantial and growing market need, and believe that DigitalOcean is perfectly positioned to lead this category in the years ahead.

While we have enjoyed great success since I co-founded the company in 2012, I believe we have barely scratched the surface. I’ve been reflecting on our next phase of growth and what it will take to reach our full potential, and it’s become clear to me that now is the right time to identify my successor as CEO of DigitalOcean.

I recognize where my strengths lie and where others will have more experience to give. With all of the exciting opportunities in front of us, including the possibility of an IPO — a long-term goal we have frequently discussed internally — I feel a new seasoned executive will be best to guide the company through the next chapter of our journey. We have engaged a leading search firm to help us find a great leader. One that will be inspirational, able to scale our operations beyond 1,000 people, evolve our go-to-market strategy, and help us reach our audacious vision. Someone who can build a global brand that could potentially help us become a publicly-traded company with the simplest cloud platform for developers to run applications of any size.

Once we’ve identified this person, I’ll be taking on a new role as Chairman of the Board, which will allow me to support our company vision and strategy while working closely with the new CEO.

When Moisey, Mitch, Alec, Jeff, and I started the company in 2012, we left our families and friends in New York to join the Techstars program in Colorado. We slept on bunk beds and worked relentlessly pretty much every day until midnight. Finding product-market fit didn’t happen overnight and it took months of iterating and refining our product offering. We had 400 users when we graduated from the Techstars program, and while we knew we had developed something special, trying to raise venture capital at that time was a real uphill battle. We heard many “no’s” from investors along the way, but believed in our long-term vision.

After returning to a small office in New York City, we launched the first SSD virtual machine service with unprecedented price-to-performance on January 15th, 2013. We instantly went from signing up a couple of new users per day to more than 100. I vividly remember sitting at our kitchen table with the co-founding team, having to manually install SSDs into our servers to keep up with the demand. It’s been a humbling journey to say the least, and I could not have imagined the growth, success, and scale we would achieve only five years later. DigitalOcean has accomplished so many incredible things over the years and I know that our product, people, and operations have never been stronger.

Aug 9, 2012 - Mitch, Alec, Moisey, me and Jeff walking on stage for Techstars demo day

We have raised $123M from some of the world’s leading VCs that share our belief that the developer will lead the continuing technology revolution. Today, we have a team of 400-plus employees around the world with growing offices in New York, Cambridge, Mass., and Bangalore. Our user base has grown with us and last year we crossed one million users from almost every country in the world. Over the last few years, our product went from a single offering, Droplet, to a complete cloud platform. We are extremely proud to be one of the largest and fastest-growing cloud providers in the world.

I’ve always said that putting the business first and doing what is right for DigitalOcean is my highest priority. I’m making this decision knowing that DigitalOcean’s best days are still to come. We have never been in a better position to begin this transition. We have a great leadership team in place, the business has very strong momentum, and we are a clear leader in our industry. I’m confident that our new CEO will be able to rapidly build on this strong foundation.

No matter who our next leader is, one thing that definitely won’t change is our unwavering commitment to delivering the industry’s simplest cloud computing platform, while building one of the world’s largest developer communities. All of the core elements that have contributed to our success — the powerful simplicity of the product, the dedication and talent of the team, and the passionate community of developers that we serve — will remain the same.

I am tremendously excited about DigitalOcean’s future and the milestones ahead. I want to thank everyone who has helped turn our dream and passion into reality. The skills I have learned and friendships I have made while helping to build this company will last me a lifetime, for which I will be forever grateful and I couldn’t be more excited for the journey ahead.

Onward and upward together,
Ben Uretsky


Missing

Published 20 Feb 2018 by jenimcmillan in Jeni McMillan.

Trees

Sometimes I just miss people. I want to hold them in my arms and feel their heart beat. I want to look into their souls. Share stories. Linger in all the delicious ways. This isn’t lust. There are many ways to be in the world. Lust has its place. But the kind of desire I speak of is a love so deep that it may only last a second yet find perfection. The willingness to be absolutely present. This is not a contradiction. The longing is a sweetness, something that poetry holds hands with and prose takes a long walk through aimless streets.


Episode 2: Niklas Laxström

Published 20 Feb 2018 by Yaron Koren in Between the Brackets: a MediaWiki Podcast.

Niklas Laxström is the creator and co-maintainer of translatewiki.net, the site where MediaWiki and most of its extensions (along with other software, like OpenStreetMap) gets translated into hundreds of languages. Niklas also works for the Wikimedia Foundation as part of the Language team, where he helps to develop code related to translation and internationalization, most notably the Translate extension.

Links for some of the topics discussed:


Volunteer Spotlight: David Schokker

Published 20 Feb 2018 by Rebecca Waters in DDD Perth - Medium.

View from the DDD Perth 2017 Speaker and Sponsor party (David Schokker)

Volunteers are the lifeblood of DDD Perth. In order to pull off such a conference, we need volunteers on the ground before, during and after the big day. We simply couldn’t do it without them.

This week, I spent some time chatting with one of the many volunteers of DDD Perth, David Schokker.

battlepanda (@battlepanda_au) | Twitter

David, how did you come across DDD?

I was introduced to DDD by a fellow volunteer. As I have done other large scale events I felt that I could help with DDD and share my experience.

How did you help out on the day?

I was one of the event photographers, responsible for documenting the various interesting things that happen on the day. (Ed: you can check out photos from the day, taken by David and others, over on Flickr)

DDD Perth 2017

What was the most memorable part of your volunteering experience?

The overwhelming amount of appreciation not only myself but for the entire volunteer team. The personal gratitude is why I do events like this.

Would you recommend volunteering at DDD? Why?

Of course, the team is wonderful and diverse.
Not only do you get to help make an event such as DDD happen, but you also get a chance to mingle with some of the best people in their fields of expertise.

Did you meet and mingle with anyone that was particularly awesome?

Yeah meeting Kris (@web_goddess) was amazing, I got to hang out with her before her preso so it was unique to meet a new friend before seeing what they excel in. It really opened my eyes to how strong of a person she is and what great things she does with the community.

Will you be volunteering in 2018?

If you guys want me, of course!

David, I promise, we want you.


Volunteer Spotlight: David Schokker was originally published in DDD Perth on Medium, where people are continuing the conversation by highlighting and responding to this story.


Meet the DigitalOcean Brand Design Team

Published 20 Feb 2018 by Stephanie Morillo in The DigitalOcean Blog.


As a company, we’ve always cared about contributing to developer culture in an authentic way, and one of the ways we do that is by adding moments of visual delight to everything we do, whether it's a Community tutorial, an interaction in the control panel, or a T-shirt at a conference. That is why, from the very beginning, DigitalOcean put an emphasis on building out a Brand Design team comprised of not just proficient graphic designers, but brilliant illustrators as well.

The Brand Designers at DigitalOcean are challenged every single day to transform extremely technical and esoteric content into approachable and friendly touch points. Lead Visual Designer Masami Kubo says, “We believe these technologies should be accessible to everyone, and a part of that is acknowledging and celebrating the diverse and quirky personality behind the humans that build these amazing things. Visuals and branding throughout the cloud computing industry are often disregarded or unconsidered, so it’s a unique opportunity for us as designers to bring that culture to life.”

We interviewed DO’s Brand (Visual) Designers Kasia Bojanowska, Masami Kubo, Pat Raubo, and Alex Mostov to learn more about their design process, how they illustrate technical concepts, and where they turn to for inspiration.

How do you approach technical topics as illustrators?

Masami: We’ve been illustrating technical topics for years, so the challenge now is how to keep it fresh and relevant. However, if we push the imagery too conceptual or meta, we run the risk of none of it making any sense to our audience. My approach now is to identify the primary action or message behind complex concepts, and focus on making that one thing really clear. I like to start minimal, then add elements sparingly to not distract from the primary message.

Alex: I came to the DigitalOcean team without much technical knowledge. In some ways I think this has actually been an advantage in creating conceptual illustrations. I create images that help me understand the concepts. I think and hope that inherently makes them more intuitive to others, too.

Where do you draw inspiration from for your designs?

Kasia: When starting a new project I definitely try to spend a good chunk of time looking for inspirations. Google image search, Pinterest, Dribbble, Behance are all wonderful resources for that. We have a few shared pinterest boards with stuff we like. I also get really inspired when I see great work being made by others on our team.

Pat: One of the benefits of working with a team of such enormously talented designers is that I draw inspiration from them and their work all the time. Masami and Kasia both do amazing work, and I’ve learned a great deal from both of them, as well as from Alex. I try to seek out inspiration from a number of things. Some have a pretty clear association with the kind of work we do at DO, like design and illustration done specifically for tech, but I also draw from editorial illustration, film, comics, and book covers, among other sources.

Illustrations by Kasia Bojanowska, Patricia Raubo, & Alex Mostov

How do you come up with new ideas for similar technical topics?

Masami: I think it actually helps for imagery with similar technical topics to have a common thread of imagery, so as to build a visual association. We have strict style guides for most of our platforms and campaigns, but some of these style guides allow for permutation in aesthetics to avoid looking too repetitive over time.

Pat: I like to first do some research to understand the basic concept of what I’m going to illustrate, and then add to my notes with simple schematics and/or sketches to see if there’s anything I can pull from those for the final visuals.

Alex: I will often try to think about representing a topic in a different kind of space or world. For examples if I create an image for a topic in a 2D space, the next time I will try to figure out how I could represent that same concept in a 3D space or from a different perspective.

What is one of your favorite projects you’ve worked on at DO thus far?

Pat: I worked on a series of illustrations for our Employee Handbook, which meant drawing a team of cute sea creatures in an office setting. I really enjoyed working on that project, and it was great to see people respond to the illustrations in such a positive way.

Masami: My favorite projects are often also the most challenging ones. And usually the more ambitious they are, the more compromises on vision I’ve had to make. But some of the most exciting stuff I’ve worked on here is the art direction and design of our office spaces, in collaboration with architects, fabricators, and our People team. I was expected to transform the space into a branded and navigable experience. It’s still a work in progress, but I love the challenge of designing for physical spaces.

Murals by Alex Mostov & Masami Kubo

What was one of the most challenging projects you’ve worked on at DO?

Kasia: Redesigning the DO logo was definitely the biggest challenge for me. The process was pretty high pressure but I was allowed enough time to really let myself explore and dig in deep. In this case having a supportive team to brainstorm and keep motivation high through all of the iterations was essential.

Masami: We did a design refresh of the marketing site a year ago, and it went through a lot of changes and push backs. The task was simple—refresh the designs and clean up the performance—but it involved approval from every department and stakeholder in the company. I was doing everything from art direction, web design layouts, and spot illustration. I learned a ton about project management and designing within web accessibility standards, thanks to Una Kravets. I felt creatively drained after the project was finished, and didn’t think it would be possible to revisit it with new ideas. Surprisingly, I am now leading a complete design overhaul for the marketing site, and I feel more equipped than ever to tackle all the challenges and make something more beautiful and smart than last year.


Sometimes you create visual assets that are targeted at a very specific audience, and you have to balance things like humor with cultural sensitivities. How does localization factor into your designs?

Masami: Part of our job is being aware and sensitive to any imagery that might have harmful or negative impacts to our community. We are fortunate to have a diverse employee base that cares about these things, so the more opinions we can gather, the better. We try to treat branding the same in any other countries as we do here. However, we do want to highlight our growing global coverage, so one way we approach this is to celebrate the unique design culture local to these countries. For example, the Frankfurt datacenter launch campaign featured designs inspired by Bauhaus Constructivist design. For the Bangalore datacenter launch, we created stylized renditions of local architecture. Being a developer from another country doesn’t necessarily mean you have vastly different tastes or interests, so it’s important for companies and designers to address these things authentically.

How do you create different kinds of content while maintaining brand consistency?

Kasia: For illustrations, we keep a consistent color palette. We have a list of prompts to help us throughout the process, but we do not have a very strict style guide when it comes to editorial illustration. We tend to have more fun and variation with all of our community and conference designs. However, we are definitely more strict about stylistic consistency when it comes to our website design.

Like much of DO, the Brand Design team is distributed across the world. What systems or processes do you have in place that allow for open communication and collaboration?

Pat: One of our team members, Kasia, is based in Poland, so we have a time difference of six hours between us. We started to make a habit of doing our daily stand ups and critiques early in the day to make sure we were all able to benefit from them. We have a private Slack channel which we use to stay in contact, to brainstorm, and to share ideas on projects.

Where do you see the DO brand going?

Masami: When I first joined DigitalOcean in 2014, the company was breaking into the cloud computing world by differentiating itself as friendly and accessible. At the time that meant being extra illustrative and bubbly with our designs. We wanted to let the developer community know that their content and culture deserves this kind of attention. That attitude and core value is still what drives every decision, but our aesthetics have matured and evolved just as our products and features have grown. The brand now has a diverse voice ranging from playful and young to mature and sophisticated, all under the same goal of enabling the developer community. I think this range directly reflects the diversity of users we want to speak to.

Alex: I really like DO’s brand evolution because I feel like the changes are made based on need and effectiveness rather than just trying to make a splash. I think the brand will continue to change in this deliberate way as the community and product develop. I also hope it will always maintain the sense of playfulness that I think makes DO special.

What is your best advice for designers just starting out?

Pat: I would encourage aspiring creative folks of any stripe to always stay curious (as cliched as it may sound, it’s advice I’ve followed that I feel has served me well) and seek out inspiration from a range of sources (museums, books, online communities, whatever floats your boat!), because you never know what’s going to be the seed that becomes the root of a fantastic idea. Feeding your mind will give you perspective and enrich your work.

That said, don’t wait around for inspiration to strike, either! It’s best not to be too precious about your work. Just sit down, make the thing, and make it to suit your standards. Then, when you think it’s done, work on it just a little bit more. Keep learning, and push yourself a bit more with each new project.


Do you enjoy our designers' creations? Download desktop wallpapers from some of their favorite illustrations.


How to use toggleToc() in a MediaWiki installation

Published 18 Feb 2018 by lucamauri in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I admin a wiki site running MediaWiki 1.29 and I have a problem collapsing the TOC on pages.

I would be interested in keeping the Contents box, but loading the page with it collapsed by default.

It appears there is a simple solution here https://www.mediawiki.org/wiki/Manual_talk:Table_of_contents#Improved_Solution, but I fail to implement it and I have no idea where the error is, hopefully someone can help.

I integrated the code as explained and checked that MediaWiki:Common.js is used by the site.

During page rendering, I checked that the JavaScript code is loaded and executed, but it appears to fail because

ReferenceError: toggleToc is not defined

I also checked this page https://www.mediawiki.org/wiki/ResourceLoader/Migration_guide_(users)#MediaWiki_1.29, but in the table there is an empty cell where it should be explained how to migrate toggleToc();. I am not even entirely sure it should be migrated.

Any help on this topic will be appreciated.

Thanks

Luca
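As an aside, a minimal MediaWiki:Common.js sketch that collapses the TOC on load without the removed toggleToc() global (the #toc .togglelink selector is an assumption about the toggle link core adds around 1.29, not a verified fix):

/* Collapse the table of contents by default by triggering core's own
   [hide] link once the page is ready (selector is an assumption). */
$( function () {
    var $toggle = $( '#toc .togglelink' );
    if ( $toggle.length ) {
        $toggle.click();
    }
} );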


How to use mw.site.siteName in Module:Asbox

Published 17 Feb 2018 by Rob Kam in Newest questions tagged mediawiki - Webmasters Stack Exchange.

Exporting Template:Stub from Wikipedia for use on a non-WMF wiki, it transcludes the Scribunto Module:Asbox, which has on line 233:

' is a [[Wikipedia:stub|stub]]. You can help Wikipedia by [',

Substituting Wikipedia with the magic word {{SITENAME}} doesn't work here. How can I replace Wikipedia with the comparable Lua value mw.site.siteName, so that pages transcluding the stub template show the local wiki name instead?
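One hedged way to adapt that line (a sketch of the string concatenation only; the surrounding table in Module:Asbox stays as it is) is to splice in the mw.site.siteName property:

-- Sketch: build the stub notice from the local wiki's name rather than
-- the hard-coded word "Wikipedia" (assumes the project namespace matches the site name)
' is a [[' .. mw.site.siteName .. ':stub|stub]]. You can help ' ..
    mw.site.siteName .. ' by [',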


cardiCast episode 28 – Clare McKenzie and Kate Byrne – ILN

Published 15 Feb 2018 by Justine in newCardigan.

newCardigan interviews … Clare McKenzie and Kate Byrne.

 

During NLS8 the fabulous Andrew Kelly conducted several interviews. This is the last of the series.

Andrew was lucky enough to chat with Clare McKenzie, Manager, Scholarly Content at the University of Wollongong (UOW) Library, and Kate Byrne, Research Platforms Product Manager at the software company, Symplectic.

 

Clare and Kate, along with Alyson Dalby, created and ran the International Librarians Network.

With Andrew, they discuss the ILN and learning to say no.

newcardigan.org

glamblogs.newcardigan.org

@newcardigan

Music by Professor Kliq ‘Work at night’ Movements EP.
Sourced from Free Music Archive under a Creative Commons licence.


Feel the love for digital archives!

Published 15 Feb 2018 by Jenny Mitcham in Digital Archiving at the University of York.

Yesterday was Valentine's Day.

I spent most of the day at work thinking about advocacy for digital preservation. I've been pretty quiet this month, beavering away at a document that I hope might help persuade senior management that digital preservation matters. That digital archives are important. That despite their many flaws and problems, we should look after them as best we can.

Yesterday I also read an inspiring blog post by William Kilbride: A foot in the door is worth two on the desk. So many helpful messages around digital preservation advocacy in here but what really stuck with me was this:

"Digital preservation is not about data loss, it’s about coming good on the digital promise. It’s not about the digital dark age, it’s about a better digital future."

Perhaps we should stop focusing on how flawed and fragile and vulnerable digital archives are, and instead celebrate all that is good about them! Let's feel the love for digital archives!

So whilst cycling home (in the rain) I started thinking about Valentine's cards that celebrate digital archives. Then with a glass of bubbly in one hand and a pen in the other I sketched out some ideas.


Let's celebrate that obsolete media that is still in good working order (against all odds)

Even file migration can be romantic...

A card to celebrate all that is great about Broadcast WAV format

Everybody loves a well-formed XML file

I couldn't resist creating one for all you PREMIS fans out there



I was also inspired by a Library of Congress blog post by Abbie Grotke that I keep going back to: Dear Husband: I’m So Sorry for Your Data Loss. I've used these fabulous 'data loss' cards several times over the years to help illustrate the point that we need to look after our digital stuff.



I'm happy for you to use these images if you think they might help with your own digital preservation advocacy. An acknowledgement is always appreciated!

I don't think I'll give up my day job just yet though...

Best get back to the more serious advocacy work I have to do today.






Email is your electronic memory

Published 14 Feb 2018 by Bron Gondwana in FastMail Blog.

From the CEO’s desk.

Sometimes you write planned blog posts, sometimes events in the news are a prompt to re-examine your values. This is one of those second times.

Gmail and AMP

Yesterday, Google announced that Gmail will use AMP to make emails dynamic, up-to-date and actionable. At first that sounds like a great idea. Last week’s news is stale. Last week’s special offer from your favourite shop might not be on sale any more. The email is worthless to you now. Imagine if it could stay up-to-date.

TechCrunch wrote about AMP in Gmail and then one of their columnists wrote a followup response about why it might not be a good idea – which led to a lot of discussion on Hacker News.

Devin, the TechCrunch columnist, used the word static. In the past I have used the word immutable. I think “immutable” is more precise, though maybe less plain and simple language than “static” – because I don’t really care about how dynamic and interactive email becomes – usability is great, I’m all in favour.

But unchanging-ness... that’s really important. In fact, it’s the key thing about email. It is the biggest thing that email has over social networking or any of the hosted chat systems.

An email which is just a wrapper for content pulled from a website is no longer an unchangeable copy of anything.

To be totally honest, email already has a problem with mutability – an email which is just a wrapper around remotely hosted images can already be created, though FastMail offers you the option of turning them off or restricting them to senders in your address book. Most sites and email clients offer an option to block remote images by default, both for privacy and because they can change after being delivered (even more specifically, an email with remote images can totally change after being content scanned).

Your own memory

The email in your mailbox is your copy of what was said, and nobody else can change it or make it go away. The fact that the content of an email can’t be edited is one of the best things about POP3 and IMAP email standards. I admit it annoyed me when I first ran into it – why can’t you just fix up a message in place – but the immutability is the real strength of email. You can safely forget the detail of something that you read in an email, knowing that when you go back to look at it, the information will be exactly the same.

Over time your mailbox becomes an extension of your memory – a trusted repository of history, in the way that an online news site will never be. Regardless of the underlying reasons, it is a fact that websites can be “corrected” after you read them, tweets can be deleted and posts taken down.

To be clear, often things are taken down or edited for good reasons. The problem is, you can read something online, forward somebody a link to it or just go back later to re-read it, and discover that the content has changed since you were last there. If you don’t have perfect memory (I sure don’t!) then you may not even be sure exactly what changed – just be left with a feeling that it’s not quite how you remember it.

Right now, email is not like that. Email is static, immutable, unchanging. That’s really important to me, and really important to FastMail. Our values are very clear – your data belongs to you, and we promise to be good stewards of your data.

I'm not going to promise that FastMail will “never implement AMP” because compatibility is also important to our users, but we will proceed cautiously and skeptically on any changes that allow emails to mutate after you’ve seen them.

An online datastore

Of course, we’re a hosted “cloud” service. If we turned bad, we could start silently changing your email. The best defence against any cloud service doing that is keeping your own copies, or at least digests of them.

Apart from trusting us, and our multiple replicas and backups of every email, we make it very easy to keep your own copies of messages:

  1. Full standards-compliant access to email. You can use IMAP or POP3 to download messages. IMAP provides the triple of “foldername / uidvalidity / uid” as a unique key for every message. Likewise we provide CalDAV and CardDAV access to the raw copies of all your calendars and contacts.

  2. Export in useful formats. Contacts can be exported in multiple formats and calendars as standard ICS files. It’s rather hidden, but at the bottom of the Folders screen there’s a link called “Mass delete or remove duplicates”, and that screen also lets you download entire folders as a zip file.

  3. Working towards new standards for email. Our team is working hard on JMAP and will be participating in a hackathon at IETF in London in March to test interoperability with other implementations.

  4. We also provide a DIGEST.SHA1 non-standard fetch item via IMAP that allows you to fetch the SHA1 of any individual email. It’s not a standard though. We plan to offer something similar via JMAP, but for any attachment or sub-part of emails as well.

Your data, your choice

We strongly believe that our customers stay with us because we’re the best, not because it’s hard to leave. If for any reason you want to leave FastMail, we make it as easy as possible to migrate your email away. Because it’s all about trust – trust that we will keep your email confidential, trust that we will make your email easy to access, and trust that every email will be exactly the same, every time you come back to read it.

Thank you to our customers for choosing us, and staying with us. If you’re not our customer yet, please do grab yourself a free trial account and check out our product. Let us know via support or Twitter whether you decide to stay, and particularly if you decide not to! The only thing we don’t want to hear is “it should be free” – we’re not interested in that discussion; we provide a good service and we proudly charge for it so that you are our customer, not our product.

And if you’re not ready to move all your email, you can get a lot of the same features for a whole group of people using Topicbox – a shared memory without having to change anything except the “To:” line in the emails you send!

Cheers,

Bron.


Make a Lasting Impact with "Write for DOnations"

Published 14 Feb 2018 by Mark Drake in The DigitalOcean Blog.


“Our community is bigger than just us” — As DigitalOcean (DO) employees, we aim to keep this value at the front of our minds in all our work. Since the company was founded in 2012, we’ve worked hard to build a vibrant, engaging Community where everybody from beginners to professionals can learn from one another about working in the cloud.

It’s important to us that the Community emulates the best that tech has to offer by serving as a welcoming place where members can share their ideas and experiences. This is what led us to introduce the Write for DigitalOcean program. Write for DO gives Community members an opportunity to build their brand, develop their writing skills, and get paid for contributing to DigitalOcean’s collection of tutorials on open-source software deployment, configuration, and development.

We’re always looking for new ways to give back to the Community. To that end, we’re excited to announce some updates to the Write for DigitalOcean program and reintroduce it as “Write for DOnations” (currently in beta — the full program launch is coming later this year).

There are two main changes that we are excited to share:

The Write for DOnations beta program will follow the same editorial structure as Write for DO:

At the end of this review process, the author’s tutorial will be published on the Community website and they will receive their payout. The author will then get to choose the nonprofit(s) that will receive their matching donation. Donations will be processed through Bright Funds, and authors’ donations can either go to a single tech-focused nonprofit or be evenly split between a group of nonprofits that share similar missions. Please note that the charitable contributions made by DigitalOcean through this program are not tax-deductible to the authors.

Since its launch, the Write for DigitalOcean program has allowed authors to share their diverse technical knowledge with the world while also improving their writing skills and growing their personal brand. Our team is always on the lookout for fresh content our community will love. To get a sense of which tutorial topics we’re particularly interested in, take a look at our suggested topics page.

Although Write for DOnations is still in development, we’re excited to help our Community authors make a real impact by donating to fantastic organizations that are working to shape the world of tech for the better.

We are actively seeking feedback to inform the full release of the new Write for DOnations program. Check out the program’s FAQ page for more details, and please share any questions or comments about the Write for DOnations beta launch in the comments below, or reach out to us directly at writefordonations@digitalocean.com.


The Deep End Podcast Ep #13: From Prototype to Internet of Things with Muzzley

Published 13 Feb 2018 by Hollie Haggans in The DigitalOcean Blog.


A vision, a small prototype, and a PowerPoint presentation: that’s how Muzzley, a platform for interacting between Internet of Things (IoT) devices, was born three years ago. Today the Muzzley team works to solve a pain point for smart home consumers: managing their IoT devices from one interface, with minimum hassle. But they also place importance on transparency, privacy, and protecting their customers’ data.

In this episode, Muzzley co-founders, Domingo Bruges and Sasha Dewitt, discuss how Muzzley’s tech stack evolved to support a product that integrates with different vendors. They share insight into how they manage the data generated by consumer IoT devices, and how they approach consumer privacy and data production.

Subscribe to the The Deep End Podcast on iTunes, and listen to the latest episode on SoundCloud below:

Hollie Haggans heads up Global Partnerships for DigitalOcean’s Hatch program. She is passionate about startups and cold brew coffee. Get in touch with questions at hatch@digitalocean.com.


MySQL socket disappears

Published 9 Feb 2018 by A Child of God in Newest questions tagged mediawiki - Server Fault.

I am running Ubuntu 16.04 LTS with MySQL Server for MediaWiki 1.30.0, along with Apache2 and PHP 7.0. The installation was successful and I managed to get everything running. Then I started installing extensions for MediaWiki. Everything was fine until I installed the VisualEditor extension, which requires both Parsoid and RESTBase, so I installed those alongside VisualEditor. Then I went to check my wiki and saw this message (the database name for the wiki is "bible"):

Sorry! This site is experiencing technical difficulties.

Try waiting a few minutes and reloading.

(Cannot access the database: Unknown database 'bible' (localhost))

Backtrace:

#0 /var/www/html/w/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1028): Wikimedia\Rdbms\Database->reportConnectionError('Unknown databas...')

#1 /var/www/html/w/includes/libs/rdbms/loadbalancer/LoadBalancer.php(670): Wikimedia\Rdbms\LoadBalancer->reportConnectionError()

#2 /var/www/html/w/includes/GlobalFunctions.php(2858): Wikimedia\Rdbms\LoadBalancer->getConnection(0, Array, false)

#3 /var/www/html/w/includes/user/User.php(493): wfGetDB(-1)

#4 /var/www/html/w/includes/libs/objectcache/WANObjectCache.php(892): User->{closure}(false, 3600, Array, NULL)

#5 /var/www/html/w/includes/libs/objectcache/WANObjectCache.php(1012): WANObjectCache->{closure}(false, 3600, Array, NULL)

#6 /var/www/html/w/includes/libs/objectcache/WANObjectCache.php(897): WANObjectCache->doGetWithSetCallback('global:user:id:...', 3600, Object(Closure), Array, NULL)

#7 /var/www/html/w/includes/user/User.php(520): WANObjectCache->getWithSetCallback('global:user:id:...', 3600, Object(Closure), Array)

#8 /var/www/html/w/includes/user/User.php(441): User->loadFromCache()

#9 /var/www/html/w/includes/user/User.php(405): User->loadFromId(0)

#10 /var/www/html/w/includes/session/UserInfo.php(88): User->load()

#11 /var/www/html/w/includes/session/CookieSessionProvider.php(119): MediaWiki\Session\UserInfo::newFromId('1')

#12 /var/www/html/w/includes/session/SessionManager.php(487): MediaWiki\Session\CookieSessionProvider->provideSessionInfo(Object(WebRequest))

#13 /var/www/html/w/includes/session/SessionManager.php(190): MediaWiki\Session\SessionManager->getSessionInfoForRequest(Object(WebRequest))

#14 /var/www/html/w/includes/WebRequest.php(735): MediaWiki\Session\SessionManager->getSessionForRequest(Object(WebRequest))

#15 /var/www/html/w/includes/session/SessionManager.php(129): WebRequest->getSession()

#16 /var/www/html/w/includes/Setup.php(762): MediaWiki\Session\SessionManager::getGlobalSession()

#17 /var/www/html/w/includes/WebStart.php(114): require_once('/var/www/html/w...')

#18 /var/www/html/w/index.php(40): require('/var/www/html/w...')

#19 {main}

I checked the error logs in MySQL, and the error message said that the database was trying to be accessed without a password. I restarted my computer and restarted Apache, Parsoid, RESTBase, and MySQL, but I could not successfully restart MySQL. I checked the error log by typing the command journalctl -xe and saw that it failed to start because it couldn't write to /var/lib/mysql/. I went to StackExchange to see if I could find a solution, and one answer said to use the command mysql -u root -p. I did, typed in the password, and it gave this error:

ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

I also checked its status by typing sudo mysqladmin status, which said:

mysqladmin: connect to server at 'localhost' failed error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)' Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!

I wanted to verify that it existed, but upon browsing to the location of the socket, I found it was not there. I saw an answer about a missing MySQL socket which said to use the touch command to create the socket and another file. I did so and still had the same issues. I went back to the directory and found the two files to be missing, so I created them again with the touch command and watched the folder to see what happens. After about half a minute, the folder seems to be deleted and recreated. I get kicked out of the folder into its parent directory, and when I go back in the files are gone.

Does anybody know why this is happening, or at least how I can fix this and get MySQL back up and running?
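
As an aside for anyone hitting the same wall: a Unix socket can't be created with touch; mysqld creates it itself when it starts successfully, so the real question is why the daemon won't start (the complaint about not being able to write to /var/lib/mysql points at ownership or permissions on the data directory). A few hedged checks for a stock Ubuntu 16.04 install; the paths below are the package defaults and may differ on your system.

sudo systemctl status mysql                          # is the daemon running at all?
sudo journalctl -u mysql --no-pager | tail -n 50     # why did the last start attempt fail?
grep -E 'socket|datadir' /etc/mysql/mysql.conf.d/mysqld.cnf
ls -ld /var/lib/mysql /var/run/mysqld                # both should be owned by mysql:mysql
sudo chown -R mysql:mysql /var/lib/mysql /var/run/mysqld
sudo systemctl restart mysql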


cardiCast episode 27 – Stephanie Milsom – Data Mining and Art

Published 8 Feb 2018 by Justine in newCardigan.

Melbourne December 2017 cardiParty

Recorded live

Stephanie Milsom discusses her recent exhibition, All of It, a collection of every piece of information Google has collected about her, every piece of data that she has left behind. Stephanie discusses data as art, digital privacy, and just what her relationship with her own data is.

stephaniemilsom.com

newcardigan.org
glamblogs.newcardigan.org

Music by Professor Kliq ‘Work at night’ Movements EP.
Sourced from Free Music Archive under a Creative Commons licence.


Global Diversity Call for Proposals Day

Published 7 Feb 2018 by Rebecca Waters in DDD Perth - Medium.

Photo by Hack Capital on Unsplash

February 3rd, 2018: Global Diversity Call for Proposals (CFP) Day. Around the globe, over 50 cities across 23 countries participated by running CFP workshops.

The workshops were aimed at first time would-be speakers, from any field (technology focus not required). Mentors were available to help with proposals, provide speaking advice and share their enthusiasm to get newcomers up on stage.

Workshops were held in Brisbane, Melbourne, Perth and Sydney in Australia, by some of the most vocal supporters of diversity in technology in the country.

In Perth, Fenders and DDD Perth run proposal-writing workshops to help reduce the barrier to submitting, so it made sense for us to join in this February fun and encourage a whole new group of potential conference speakers to get up and share their knowledge!

The workshop in Perth was well attended with participants from different backgrounds, both personally and professionally, coming together to work on their proposals. Mentors from Fenders and DDD Perth brought their children down and the entire building at Meerkats was filled with excitement (and snacks!).

A special mention goes to the company who provided the space, Meerkats; complete with breakout rooms and comfy couches, our supportive DDD Dad was able to entertain the children easily and the workshop participants could move around the space and find a quiet corner to work in when the workshop required.

Personally, I can’t wait to see the 13 proposals that were written get accepted at local user groups, conferences and maybe further afield.

Mark Lockett’s open source software talk captured my interest (and not just mine, I’m sure). Rosemary Lynch’s experience with online publishing for organisations is eye-opening, and Amy Kapernick’s take on failure is sure to be a hit in the future.

If these topics sound interesting, be sure to follow these people and lend your support for their first foray into speaking in the WA community!

If you missed this workshop, be sure to follow DDD Perth on Twitter or join our mailing list as we will be holding more workshops throughout the year to help aspiring speakers on their way!


Global Diversity Call for Proposals Day was originally published in DDD Perth on Medium, where people are continuing the conversation by highlighting and responding to this story.


Page-specific skins in MediaWiki?

Published 7 Feb 2018 by Alexander Gorelyshev in Newest questions tagged mediawiki - Webmasters Stack Exchange.

Is there a way to force a particular skin to be applied while displaying specific MediaWiki articles?

In my wiki many articles will have a "flip" version with alternative content (think "good" and "evil" perspectives of the same topic). I was thinking about using namespaces to separate these versions, but I need a definitive way to visually contrast them.
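
One hedged direction: core has no per-page skin setting (beyond appending ?useskin=<name> to a URL for a single view), but the RequestContextCreateSkin hook lets LocalSettings.php choose a skin per request. A rough sketch, assuming the alternative pages live in a custom namespace; the namespace IDs, the "Mirror" name and the MonoBook skin are placeholders rather than anything from the original question.

# Sketch for LocalSettings.php; namespace IDs and names are made up.
define( 'NS_MIRROR', 3000 );
define( 'NS_MIRROR_TALK', 3001 );
$wgExtraNamespaces[NS_MIRROR] = 'Mirror';
$wgExtraNamespaces[NS_MIRROR_TALK] = 'Mirror_talk';

$wgHooks['RequestContextCreateSkin'][] = function ( $context, &$skin ) {
	$title = $context->getTitle();
	if ( $title && $title->inNamespace( NS_MIRROR ) ) {
		$skin = 'monobook'; // core resolves the skin name to a Skin object
	}
	return true; // let any other hooks run as normal
};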


Episode 1: Cindy Cicalese

Published 6 Feb 2018 by Yaron Koren in Between the Brackets: a MediaWiki Podcast.

Cindy Cicalese was a Principal Software Systems Engineer at MITRE for many years, where, among other things, she oversaw the creation and maintenance of over 70 MediaWiki installations, as well as development of many MediaWiki extensions. Last year she joined the Wikimedia Foundation as Product Manager for the MediaWiki Platform team.

Links for some of the topics discussed:


A Practical Droplet Performance Comparison

Published 6 Feb 2018 by Reynold Harbin in The DigitalOcean Blog.


Benchmarks are a common way to measure and compare the performance of cloud compute servers. While standardized benchmarks are useful for establishing a consistent, broad set of comparison metrics, it can be useful and more practical to compare the performance of the actual tasks you run most often on your servers as well.

For example, how much time could you save when running your app's automated test scripts if you used a more powerful cloud server?

We compared the performance of Standard and Optimized Droplets when doing just this. Specifically, we used the basic React Boilerplate app, which includes a comprehensive set of testing scripts covering 99% of the project. Because the tests are CPU-intensive, we chose test execution time as our comparison metric for the two different Droplet configurations.

Server Setup and Testing Methodology

For the default environment, we used a Standard $40 Droplet, which is configured with 4 vCPUs (Intel Xeon CPU E5-2650L v3 @ 1.80GHz), 8GB of RAM, and 160GB of SSD storage.

For the comparison environment, we used an Optimized $40 Droplet, which is configured with 2 dedicated vCPUs (Intel Xeon CPU E5-2697A v4 @ 2.60GHz), 4GB of RAM, and 25GB of SSD storage.

Both Droplets were running Ubuntu 16.04, and we set both up using the following procedure.

After initial setup to create a non-root user and basic firewall, we verified the CPU architecture using lscpu. We installed Node.js using the PPA to get a recent version of Node.js that includes npm, the Node.js package manager, which we needed to execute the test scripts. Finally, we installed React Boilerplate by cloning the react-boilerplate repository and running npm run setup to install its dependencies.
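
For reference, that setup boils down to a handful of commands. This is a hedged reconstruction; the NodeSource script version is our assumption, and any recent Node.js release that bundles npm will do.

curl -sL https://deb.nodesource.com/setup_8.x | sudo -E bash -   # add the Node.js package repository
sudo apt-get install -y nodejs                                   # installs node and npm
git clone https://github.com/react-boilerplate/react-boilerplate.git
cd react-boilerplate
npm run setup                                                    # install the project's dependencies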

At this point, we had everything we needed to run the tests. To measure the time it takes to execute them, we used the utility program time, which summarizes the time and system resource usage for a given program command.

As a baseline, we first compared Droplet performance when running React Boilerplate's test suite with its default settings using time npm test.

Because npm uses a test framework that can use all available processors, we also ran a single CPU comparison to better understand the impact of CPU on performance. For the single CPU comparison, we ran time npm test -- --runInBand to force all of the automated tests to run sequentially. This test is relevant for applications that are not designed to use multiple CPUs, where a more powerful processor can improve performance.

Additionally, we found that setting the number of worker nodes to match the number of vCPUs on the server yielded the fastest overall test execution time, so we compared the best case setup on both servers as well. For the vCPU-specific comparison, we ran time npm test -- --maxWorkers=4 for the Standard Droplet (which has 4 vCPUs) and time npm test -- --maxWorkers=2 for the Optimized Droplet (which has 2 vCPUs).
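
Put together, the three timing runs on each Droplet looked like this (worker counts match each Droplet's vCPUs, so 4 on the Standard Droplet and 2 on the Optimized one):

time npm test                      # default: the test framework uses all available CPUs
time npm test -- --runInBand       # sequential: forces the tests onto a single CPU
time npm test -- --maxWorkers=4    # Standard Droplet; use --maxWorkers=2 on the Optimized Droplet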

We ran each of these tests five times on each server to look at the average execution time over a larger sample size.

So, how did the Standard and Optimized Droplets perform?

Results

Here's an example (truncated for length) of the output from time npm test on the Optimized Droplet:

> react-boilerplate@3.5.0 pretest /home/perfaccount/react-boilerplate
> npm run test:clean && npm run lint
[...]

 PASS  app/containers/App/tests/index.test.js
 PASS  app/containers/LocaleToggle/tests/index.test.js
[...]
 PASS  app/containers/HomePage/tests/actions.test.js

Test Suites: 76 passed, 76 total  
Tests:       289 passed, 289 total  
Snapshots:   4 passed, 4 total  
Time:        14.725s, estimated 33s  
Ran all test suites.  
---------------------------------|----------|----------|----------|----------|----------------|
File                             |  % Stmts | % Branch |  % Funcs |  % Lines |Uncovered Lines |  
---------------------------------|----------|----------|----------|----------|----------------|
All files                        |      100 |      100 |      100 |      100 |                |  
 app                             |      100 |      100 |      100 |      100 |                |
  configureStore.js              |      100 |      100 |      100 |      100 |                |
[...]
  sagaInjectors.js               |      100 |      100 |      100 |      100 |                |
---------------------------------|----------|----------|----------|----------|----------------|

real    0m22.380s  
user    0m23.512s  
sys    0m0.884s  

The output we’re interested in is real time, which is the actual elapsed wall-clock time it took to execute the tests. In this example, the test script completed in 22.380 seconds.

These are our results showing the average execution time across multiple runs:

(Results chart: average execution time for each test configuration on the Standard and Optimized Droplets.)

The Optimized Droplet outperformed the Standard Droplet in all tests, but as we explain in the next section, this isn't the only factor to consider when choosing the right configuration for your use case.

Conclusions

When comparing cloud servers with the goal of optimizing price-to-performance and resources, it's important to test the applications that you plan to run on the server in addition to comparing standard benchmarks.

In measuring the execution times of the react-boilerplate project's automated tests, our results showed a small improvement of 4.9% when using a $40 Optimized Droplet compared to a $40 Standard Droplet. For applications that perform similarly and do not take full advantage of all CPUs, choosing the $40 Standard Droplet may be a better choice because of its additional memory (8GB vs 4GB) and larger SSD (160GB vs 25GB).

However, the Optimized Droplet executed 37.3% faster when running the tests sequentially. For compute-intensive applications that use a single vCPU, this difference may be significant enough to choose the Optimized Droplet for the same price as the Standard Droplet.

If your application can run in a clustered mode with a specific number of CPU resources, you may be able to optimize price to resources by using a Standard Plan with more CPU, RAM and SSD versus a lower number of higher powered CPUs. We saw the best performance on both Droplets when we set the number of application instances to match the number of available vCPUs, where Optimized Droplets still outperformed Standard Droplets by a significant 21.7%, though the additional RAM and SSD in Standard Droplets may be preferable.

The tests performed in this article are not designed to be comprehensive, but are tailored to the types of applications that typically consume time and CPU resources. To maximize price-to-performance and resources for your applications, you can test various Droplet configurations and measure execution times of the typical jobs you place on your servers.

Test Droplets for Your Apps


How do i edit Login Required Page [closed]

Published 5 Feb 2018 by jehovahsays in Newest questions tagged mediawiki - Webmasters Stack Exchange.

On my private MediaWiki, view and read permissions are set to false.
My website visitors see:
Please login to view other pages.
What I need to do is edit the login link located in this error message.


According to You

Published 2 Feb 2018 by Rebecca Waters in DDD Perth - Medium.

Ian Hughes presenting at DDD Perth 2017

Following DDD Perth 2017, a bunch of attendees gave the internet their take on our conference. Apart from becoming one of the top five trending topics in Australia for conference day…

…there were also some blushingly complimentary posts about the conference.

If perchance you missed one, we’ve rounded them up for you. Take a look!

Kris Howard

Thank you again to the DDD Perth organisers for inviting me to participate in this wonderful event!

Kris, we’re the ones who are thanking you for delivering such an inspiring locknote. Catch the video of Kris’ talk here.

DDD Perth 2017

Dash Digital

They put on a great event that was well-priced at only $50 a ticket, and included top-notch speakers, lunch and networking opportunities at the afterparty.

What we love about this round up from Dash is the coverage of three of our speakers, Kris Howard, Nathan Jones and Will Webster.

Dash developers do #DDDPerth

Nathan Jones

It was a credit to the organisers and a shout-out to the sponsors for helping make it happen.

We love how Nathan invites future conversations and connections. As he says in the blog post, his DM’s are open! You can catch Nathan’s talk here.

DDD Perth Wrap-Up

Gaia Resources

Having the opportunity to hear some of Perth’s best developers talk about their experiences and recommendations is an excellent opportunity to get away from our day-to-day work and find some fresh perspective.

Fun fact: Six developers from Gaia turned out to represent at DDD Perth. Talk about commitment to learning!

Serverless Architecture at the Small Scale - Gaia Resources

LiveHire

Thank you, DDD Perth for creating such a worthwhile and exciting event.

LiveHire is a continued proud sponsor of DDD Perth and we couldn’t do what we do without them. Thanks for the round up!

LiveHire @ Developer Developer Developer! Perth Event 👏 #DDDPerth

Amy Kapernick

Despite the small price tag and the fact that I had to get up early on a Saturday, not only did it not disappoint but I was blown away by the number of supporters, the calibre of the speakers and the overall experience.

Amy’s glowing review blew us out of the water. Thanks for the kind words, Amy - you keep us motivated for 2018!

Amy Goes to Perth

Donna Edwards

I was also really impressed with the diversity at the conference and how the committee had really made a huge effort to attract more women speakers

Donna’s talk was extremely informative; you can check it out here. Donna herself organises large scale events and we really value her feedback on DDD!

DDD Perth 2017

Did we miss a post? Let us know so we can include it.


According to You was originally published in DDD Perth on Medium, where people are continuing the conversation by highlighting and responding to this story.


How To Code in Python: A DigitalOcean eBook

Published 1 Feb 2018 by Lisa Tagliaferri in The DigitalOcean Blog.


We have always been community-focused at DigitalOcean. On our Community site, we offer a few ways that developers can connect with each other, through sharing projects, learning about meetups, or answering questions. Additionally, we have over 1,800 technical tutorials, written by both external community members and internal technical writers, that have been designed to support the learning pathways of software engineers and system administrators as they develop their skills and scale their projects.

Since joining the DigitalOcean Community team, I have focused on curriculum development and technical writing related to Python software development. Today, I am happy to share that we are repackaging the “How To Code in Python 3” tutorial series as an eBook that can serve as both a teaching tool for beginners and a point of reference for more seasoned developers.

Our goal in making this tutorial series available in an eBook format is to facilitate access to this educational content. This is especially significant for people with limited internet access, long commutes without wifi, or who primarily access written material from mobile devices. Our hope is that the people who will benefit from this eBook will become more knowledgeable about how coding works, and thereby increase the number of technology stakeholders, decision makers, and knowledge producers who can work to build better software for everyone. By offering a new format of this content, we would like to drive engagement with and interest in software development across broader and more diverse communities.

Creating an eBook

This eBook project came about during a DigitalOcean company-wide Hackathon. Hackathons offer a great environment to test out projects that teams have been thinking about taking on, but have not been able to devote the time and resources to during a regular work week. Our team, which we nicknamed Bookworms, consisted of Brian Boucheron (Technical Writer), Kasia Bojanowska (Senior Visual Designer), and myself.

Brian was our eBook developer. He used pandoc, GNU Make, and Perl scripting to automate the eBook creation process from the original tutorial markdown. For some final stylistic choices, he has done some hand crafting along the way, but has worked to ensure that the eBook can be read as its user desires across devices. We intend to release relevant source code in a repository for others to extend and modify.

Kasia has done a lot of the design work that sets DigitalOcean’s tutorials and brand apart, and has conceived of a new vibrant cover for this eBook. Designs and imagery that invite readers in are an instrumental element of book conception, and Kasia’s dynamic image inspires curiosity and playfulness.

Since the Hackathon, I have worked to ensure that this eBook is made publicly available from major eBook distributors, is catalogued in libraries, and made available as an open educational resource in schools and universities.

What Is an Open Educational Resource?

Open educational resources (OERs) are texts or digital assets that can be used for teaching, learning, and research. What is significant about them is that they are openly accessible and openly licensed. At DigitalOcean, we use a Creative Commons License on all of our tutorials so that others can freely translate our technical content to other languages to encourage learning.

Each version of the eBook that is made publicly available will have a separate ISBN in order to facilitate access to the book. I have been working with the librarians at the City University of New York’s Brooklyn College and Graduate Center in order to catalogue the eBook and make it available for students as an open educational resource. If you would like to see this eBook in your library, share this WorldCat link with your local librarian.

By having this eBook available in libraries and within OER repositories, more students will be able to access computer programming learning material without having to pay textbook prices for that privilege.

We hope that readers who learn from or reference this eBook will be empowered to make their own contributions to open-source code via software and documentation pull requests or repository maintenance. Our community is bigger than just us, and building software together can make sure that everyone has an opportunity to participate in the technology we use every day.

You can now download the free eBook in one of the following formats:

Lisa Tagliaferri is the manager of Community Content at DigitalOcean. In addition to writing about Python, Lisa helps people find solutions through technology and infrastructure. Holding a PhD from CUNY, Lisa has a continued interest in interdisciplinary research, and is committed to community building through education. Find her on Twitter @lisaironcutter.


Cleaning up your inbox

Published 31 Jan 2018 by David Gurvich in FastMail Blog.

With email forming such a big part of our life it’s possible you had a New Year’s resolution to clean up your inbox.

Perhaps you spent last year, or even previous years, at the mercy of your unruly inbox? Or maybe you’ve come back to your email account after some time off and been overwhelmed with cleaning out all those emails.

Putting aside any regular email blasts from friends or family (read on for how to manage that), it’s likely that a lot of your inbox spam or clutter is from marketing lists you have signed up to.

What once seemed like an invitation too good to ignore might now be taking over your email life, so that every time you visit your inbox you’re confronted with more and more emails.

Types of unwanted email

Unwanted email may come in several forms and can include:

  1. Marketing lists - from retailers and organisations.
  2. Social media notifications – linked to an account you’ve already set up.
  3. Spam – communication from people you have no prior relationship with.

So let’s take a look at each of those kinds of unwanted mail in more detail and the best way to keep their effect on your inbox to a minimum.

1. Marketing lists

Imagine you signed up to a marketing list some years ago for a particular retailer. Maybe at a certain period in time you were really interested in throw pillows. But in the intervening years you’ve forgotten about ever signing up to this list and are wondering why your inbox keeps filling up with offers for something you don’t want, in emails you don’t want to receive. Now you simply find these emails annoying – and consider them to be spam.

Unsubscribe from a list

So how do you stop receiving all of those throw pillow emails? Well, rather than using the 'Report Spam' button the best thing to do is to manually unsubscribe from the list you once signed up to.

Most lists are required by law to include an unsubscribe link somewhere within the body of the email; often it is located in the footer. If you can't see an unsubscribe link you may need to contact the sender directly to request removal.

Find lists

There are a few ways you can audit your inbox for lists. The first is to use the 'Mailing lists' tab button. (Note that this is not visible if your screen layout is configured to use the reading pane.)

Image of filter buttons in the UI with the mailing list button selected

You can click on this to quickly filter your inbox by senders. Then you can go through and decide what you want to keep and what you want to unsubscribe from.

The other way to find a known list is to use our search toolbar and look for it by name.

2. Social media notifications

These days there seems to be a never-ending list of social media platforms to use. Most of us would be aware of, or likely use, some or all of the biggest platforms such as Twitter, Facebook and LinkedIn.

And while social media can be great for staying in touch and promoting your business, notifications are often linked to the email address you set up your account with.

At times this can be convenient, however as these platforms continue to evolve you might find you have endless social media notifications taking over your inbox too.

Switching off notifications at the source

The good thing is that these notifications can be turned off, or managed, directly from the user settings for each individual social media platform you are using.

Visiting the ‘Settings’ or ‘Help’ menu of any social media platform you use should give you step-by-step instructions on how to control what gets sent to your inbox.

3. Spam

At FastMail we define spam as unsolicited mail sent to a large number of users in an attempt to get you to click on a link to buy a product, market an otherwise dubious service, scam you out of some money or install a virus or malware on your computer.

We’re often asked why would you keep receiving certain emails if they had previously been marked as spam?

For example, you may have previously received email you consider to be spam and reported the sender using the 'Report Spam' button. However, some days later you find another email from the same sender in your inbox, rather than it automatically being moved to your Spam folder upon delivery.

There are a few reasons for this. The first is that at some stage you likely consented to receiving these emails (in some form) so that tells our systems you do want to receive these emails (and we’re all about making sure you receive your email).

The second reason is to do with how our spam filters work. You can choose a range of settings to ensure spam filtering works the best for your needs. We’ve talked about this previously, but essentially you train the spam filter.

Everybody's spam is different. When you report spam that's slipped through our filters, or non-spam that we've mistakenly classified, we feed this information into a database that's tuned just for you. We also automatically train this with spam you've deleted permanently from your spam folder, and non-spam you've moved to your Archive folder or replied to.

And while we never sell email addresses, nor disclose email addresses at our site to anyone else, there are other instances where unscrupulous marketers may have placed you on mailing lists you didn’t consent to – let’s just call them spammers – using a range of methods to spam you.


Taking action

FastMail gives you the power to control your inbox, using a range of features to manage which mail comes to you.

Block the sender

If you can't unsubscribe or switch off notifications, you can block a particular sender by setting up a rule to permanently discard a message upon delivery. We do recommend sending mail into a folder when first setting up the rule, because mail discarded in this way is gone forever: we can't even get it back from backups.

If you have lots of senders you want to block, add them to a group in your addressbook, then create a rule to discard or file mail from that group. You can also create a contact for a wildcard on the whole domain in this group: this will also block mail being sent from any address at that domain.

Mail you want to keep

If you want to never block certain senders, add them to your address book. This also means mail from these trusted senders bypass any spam checking. This might be a good option for online retailers you regularly use, making sure you receive any correspondence straight to your inbox.

Using rules to keep mail organised

Sometimes you still might want to receive email from particular senders but not have these messages taking over our inbox.

We recently wrote about organising your mail with rules and this is ideal for any correspondence that you still want but maybe not at the expense of your day-to-day inbox experience.

When you’re viewing a message you can use the 'More' button, then 'Add rule from Message…' option to directly create a new Rule for that particular mail. For example, you might send all mail from online retailers to a folder called ‘Purchases’.

image showing the Add Rule function when viewing a message in the FastMail web interface

Welcome to your streamlined inbox

So now, rather than waiting for your inbox to fill up and then manually batch-deleting every few weeks or months, you can take back control today!

And whether you want to completely unsubscribe from lists or set up rules, the choice is up to you.

Either way, this year you may finally get to utter the words, “I finally unsubscribed from those throw pillow emails”, making 2018 the year you bring more peace and control to your inbox.


My back to school free software toolkit

Published 30 Jan 2018 by legoktm in The Lego Mirror.

The 2018 spring semester started last Wednesday. I think I've set up a pretty good free software toolkit for a successful year:

Total software cost: $0


The Journey

Published 26 Jan 2018 by jenimcmillan in Jeni McMillan.

I’m on a bus. Denmark has faded into the distance and now I’m passing through wind-generator-infested fields on the way to Berlin. You know I care about climate change. I’ve even vowed not to get on a plane again, so that could be very bad news for anyone expecting me back soon. I guess there’s always sea travel, but I can’t decide what worries me more… pirates or seasickness. I’ll start by doing laps of the sauna. (I know that doesn’t make sense, but they’re great.)
News trickles through to the remote corners of the world where I’ve been thigh deep in snow, that Australia has been experiencing a heatwave. When I was in Russia someone told me that Sydney had 48 degrees that day. He wasn’t Russian. In general, they’re not friendly with foreigners, unless one is in a sparse, white-tiled community bathhouse with a crowd of large, naked women. Trust me, it was fabulous. If only I had my sketchbook and charcoal.
Along with breathtaking architecture, cheap hostels that were once palaces and some golden photo opportunities, the lack of smiles was a constant during my three weeks in post-Soviet Russia.
When I arrived in Stockholm, laughter surprised me and the variety of different backgrounds was striking. What a relief to be amongst other humans who could laugh even when life isn’t perfect. It was still minus 5, the metro crowded and I was a foreigner. Of course I loved Russia, but a huge thank you to the Swedes, Norwegians and Danish people for being you. I had a fabulous time and I’m sure I’ll go back for my friend’s wedding in August, assuming I manage the next round of paperwork in France.
I’m making my way back to France slowly. There’s a whole mini-series in my dental tourism escapades that happens before I get there. Hello Budapest... I don’t require being picked up at the airport or help with a discounted hotel, but a bus and hostel will be fine to get me to your lovely dental suites. 12 February. Stay tuned.
In the meantime, Berlin with its politics, art, contact improvisation and some lovely friends are less than an hour away. I’m excited! The bus is approaching Frankfurt and it’s time I started looking out the window.
Take care, smile and give hugs. It’s a wonderful gift.
PS I didn’t pose naked in the snow but I did take the photograph.

How can I allow sysops and specific users to hide spam articles in MediaWiki?

Published 24 Jan 2018 by jehovahsays in Newest questions tagged mediawiki - Webmasters Stack Exchange.

My website is powered by MediaWiki Version 1.29.1.

Sometimes the Recent Changes results page becomes cluttered with spam articles that I wish to hide from the results page. How can I allow specific users to hide them?

Keep in mind, I don't need spam protection and I only need to know how to hide spam articles from the results page.
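
One hedged approach using core alone is RevisionDelete: grant the relevant rights to sysops and, if you like, to a dedicated group you add trusted users to. Hidden revisions still appear in Recent Changes, but struck out, with their content, summary and author hidden from ordinary readers. A LocalSettings.php sketch (the "cleanup" group name is made up):

# Sketch for LocalSettings.php; the "cleanup" group is a placeholder name.
$wgGroupPermissions['sysop']['deleterevision']   = true; # hide/unhide individual revisions
$wgGroupPermissions['sysop']['deletelogentry']   = true; # hide/unhide individual log entries
$wgGroupPermissions['cleanup']['deleterevision'] = true;
$wgGroupPermissions['cleanup']['deletelogentry'] = true;
# Add trusted users to the group through Special:UserRights.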


Meet the Team

Published 23 Jan 2018 by Rebecca Waters in DDD Perth - Medium.

The 2017 DDD Perth team (credit: David Schokker)

The last 6 months at DDD Perth have seen us hit some amazing milestones:

DDD Perth started in 2015, and two dedicated people, Robert Daniel Moore and Matt Davies, have been driving the conference forward each and every year. In 2017, we spread the load amongst a wider group, and in 2018 we’ve formalised the conference with the incorporation of DDD WA Inc. It’s the DDD Perth committee you know and love, with a new name (for the association only) and a big ol’ cup of motivation for the future.

So, let’s meet the team involved in DDD Perth for 2018.

Lee Campbell

Lee Campbell (@LeeRyanCampbell) | Twitter

Lee describes himself as an angry, messy, intolerant dev who has a compelling need to contribute to the community to make up for his sins. He’s a freelance developer, author and presenter with extensive experience in concurrent programming, composite applications, web and integration technologies.
Lee’s vision for DDD WA is a platform where we can grow the broad base of junior members of the community so that they can challenge the seniors and, at the other end of the spectrum, provide the content and challenging ideas that can help our seniors become world leaders.

Rebecca Waters

Rebecca Waters | Professional Profile | LinkedIn

Rebecca is a software engineer and project manager who feeds off the enthusiasm of others and contributes widely to the Perth software industry. She is a mentor in and outside of her company to junior developers and other professionals, an ex-officio board member of the Australian Industry and Defence Network of WA, and committee member of the Australian Computer Society — Women chapter.
Rebecca’s vision for DDD WA is to be the place you want to be at, and DDD Perth the conference you can’t afford to miss.

Rob Moore

Robert Daniel Moore (@robdmoore) | Twitter

Rob is a Principal Consultant for Readify. He’s Passionate about leading software teams to deliver business value.
Rob wants an inclusive and accessible conference for the Perth software industry, and will work hard to make DDD Perth that conference!

Ian Hughes

Ian Hughes (@ian_hughes) | Twitter

Ian Hughes likes code, science, travel, beer, and footy. He’s a Principal Consultant with Readify during the week, primarily in Agile, DevOps and Mobile; he does other stuff on the weekend, like trying to bring the amazing developer community in WA together to talk about and share their experiences at DDD Perth and the Agile Perth Meetup.

Rob Crowley

Rob Crowley (@robdcrowley) | Twitter

Rob is a software consultant, developer and team lead with a passion for delivering systems that perform at scale. Rob has over 15 years of experience building distributed systems on the web stack and has read more RFCs than he cares to admit. He has spoken at various conferences around Australia such as Elastic{ON}, DDD Melbourne and NDC Sydney and brings a wealth of experience to DDD WA.

Aidan Morgan

Aidan Morgan (@aidanjmorgan) | Twitter

Aidan likes whiskey and making things. He is an experienced CTO and is currently the head of engineering at Boundlss. He is most passionate about machine learning, Agile and the Perth Startup community, but is mainly interested in getting things done. He also hates talking about himself in the third person.
Aidan is passionate about DDD Perth because he likes connecting people and learning more about the cool things that are going on in the Perth software community.

Matt Ward

matt ward (@mattyjward) | Twitter

Matt is a technical lead at Bankwest and full stack developer. Matt is passionate about Junior Developers and is involved in the juniordev.io movement in Perth as well as DDD Perth.

Ashley Aitken

Ashley Aitken (@AshleyAitken) | Twitter

Ashley is an entrepreneur, software engineer, IT trainer and academic, and family man. He’s trained software developers and IT professionals for companies like IBM, Apple, and HP, and organised and presented at IT conferences around the world. He’s also a big fan of Lean Startup with Customer Development and runs the Lean Startup Perth meetup.
Ashley’s vision for DDD Perth is for it to encourage and support WA software developers to lead the world in software development practices. He believes we don’t have to just follow, we can set the pace and direction and DDD Perth will play a big part in that.

Marjan Zia

Tweets with replies by Marjan Zia (@zia_marjan) | Twitter

Marjan is a passionate software developer who feels lucky to work in the finance industry. Her main hobby is to build applications that bring benefits to the end users. She is a very customer focused person who puts the customers at the heart of her development cycle. Her main career goal is to become a tech-lead to aid development teams with designing software packages.
Her main goal being part of DDD is to help organise events that WA Tech lovers want to attend and get great benefit from.

Derek Bingham

Derek Bingham (@deekob) | Twitter

Derek is a journeyman developer, building software applications over the past 20 years in many stacks you can and can’t imagine.
Witnessing the inclusivity and diversity that DDD has brought to the Perth community has been inspirational and he hopes to make a contribution to that continuing. Currently plying his trade at Amazon Web Services.

Jake Ginnivan

Jake Ginnivan (@JakeGinnivan) | Twitter

Jake is the Digital Development Manager for Seven West Media and an Open Source Software enthusiast. Jake’s a seasoned presenter and workshop facilitator. Jake has spoken most recently at NDC Sydney, NDC London, and DDD Perth. He brings a wealth of experience to DDD WA.

Andrea Chagas

Andrea Chagas (@andrealchagas) | Twitter

Andrea Chagas is a mobile developer at Bankwest. She is a tech enthusiast who is passionate about collaboration with colleagues. She is constantly cooking up new ideas and ways to do things.


Meet the Team was originally published in DDD Perth on Medium, where people are continuing the conversation by highlighting and responding to this story.


The Full BBS Documentary Interviews are Going Online

Published 23 Jan 2018 by Jason Scott in ASCII by Jason Scott.

This year, the full 250 hours of interviews I conducted for the BBS Documentary are going online at the Internet Archive.

There’s already a collection of them up, from when I first set out to do this. Called “The BBS Documentary Archive”, it’s currently 32 items from various interviews, including a few clip farms and full interviews of a bunch of people who sat with me back in the years of 2002-2004 to talk about all manner of technology and bulletin board history.

That collection, as it currently stands, is a bit of an incomplete mess. Over the course of this project, it will become a lot less so. I’ll be adding every minute of tape I can recover from my storage, as well as fixing up metadata where possible. Naturally you will be asked to help as well.

A bit of background for people coming into this cold: I shot a movie called “BBS: The Documentary” which ended up being an eight-episode mini-series. It tried to be the first and ultimately the last large documentary about bulletin board systems, those machines hooked up to phone lines that lived far and wide from roughly 1978 to the 2000s. They were brilliant and weird, and they’re one of the major examples of life going online. They laid the foundation for a population that used the Internet and the Web, and I think they’re terribly interesting.

I was worried that we were going to never get The Documentary On BBSes and so I ended up making it. It’s already 10 years and change since the movie came out, and there’s not been another BBS Documentary, so I guess this is it. My movie was very North American-centric and didn’t go into blistering detail about Your Local BBS Scene, and some people resented that, but I stand by both decisions; just getting the whole thing done required a level of effort and energy I’m sure I’m not capable of any more.

Anyway, I’m very proud of that movie.

I’m also proud of the breadth of interviews – people who pioneered BBSes in the 1970s, folks who played around in scenes both infamous and obscure, and experts in areas of this story that would never, ever have been interviewed by any other production. This movie has everything: Vinton Cerf (co-creator of the Internet) along with legends of Fidonet like Tom Jennings and Ken Kaplan and even John Madill, who drew the FidoNet dog logo. We’ve got ANSI kids and Apple II crackers and writers of a mass of the most popular BBS software packages. The creator of .QWK packets and multiple members of the Cult of the Dead Cow. There’s so much covered here that I just think would never, ever be immortalized otherwise.

And the movie came out, and it sold really well, and I open licensed it, and people discover it every day and play it on YouTube or pull out the package and play the original DVDs. It’s a part of culture, and I’m just so darn proud of it.

Part of the reason the movie is watchable is because I took the 250 hours of footage and made it 7.5 hours in total. Otherwise… well….

…unless, of course, you’re a maniac, and you want to watch me talking with people about subjects decades in the past and either having it go really well or fall completely apart. The shortest interview is 8 minutes. The longest is five hours. There are legions of knowledge touched on in these conversations, stuff that can be a starting point for a bunch of research that would otherwise be out of options to even find what the words are.

Now, a little word about self-doubt.

When I first started uploading hours of footage of BBS Documentary interviews to the Internet Archive, I was doing it from my old job, and I had a lot going on. I’d not done much direct work with Internet Archive and didn’t know anything about what was going on behind the scenes or how things worked or frankly much about the organization in any meaningful amount. I just did it, and sent along something like 20 hours of footage. Things were looking good.

Then, reviews.

Some people started writing a few scathing responses to the uploads, pointing out how rough they were, my speech patterns, the interview style, and so on. Somehow, I let that get into my head, and so, with so much else to do, I basically walked away from it.

12 years later (12 years!) I’m back, and circumstances have changed.

I work for the Archive, I’ve uploaded hundreds of terabytes of stuff, and the BBS documentary rests easily on its laurels of being a worthwhile production. Comments by randos about how they wish I’d done some prettify-ing of the documentary “raw” footage don’t even register. I’ve had to swim upstream through a cascade of poor responses to things I’ve done in public since then – they don’t get at me. It took some time to get to this place of comfort, which is why I bring it up. For people who think of me as some bulletproof soul, let it be known that “even I” had to work up to that level, even when sitting on something like BBS Documentary and years of accomplishment. And those randos? Never heard from them again.

The interview style I used in the documentary raw footage should be noted because it’s deliberate: they’re conversations. I sometimes talk as much as the subjects. It quickly became obvious that people in this situation of describing BBS history would have aspects that were crystal clear, but would also have a thousand little aspects lost in fuzzy clouds of memory. As I’d been studying BBSes intensely for years at this point, it would often take me telling them some story (and often the same stories) to trigger a long-dormant tale that they would fly with. In many cases, you can see me shut up the second people talk, because that was why I was talking in the first place. I should have known people might not get that, and I shouldn’t have listened to them so long ago.

And from these conversations come stories and insights that are priceless. Folks who lived this life in their distant youth have all sorts of perspectives on this odd computer world and it’s just amazing that I have this place and collection to give them back to you.

But it will still need your help.

Here’s the request.

I lived this stupid thing; I really, really want to focus on putting a whole bunch of commitments to bed. Running the MiniDV recorder is not too hard for me, and neither is the basic uploading process, which I’ve refined over the years. But having to listen to myself for hundreds of hours using whatever time I have on earth left… it doesn’t appeal to me at all.

And what I really don’t want to do, beyond listening to myself, is enter the endless amount of potential metadata, especially about content. I might be inspired to here and there, especially with old friends or interviews I find joyful every time I see them again. But I can’t see myself doing this for everything, and I think metadata on "subjects covered" and "when was this all held" is vital for the collection to have use. So I need volunteers to help me. I run a Discord server for people collaborating with me and I have a bunch of other ways to be reached. I’m asking for help here – turning this all into something useful beyond just existing is a vital step that I think everyone can contribute to.

If you think you can help with that, please step forward.

Otherwise… step back – a lot of BBS history is about to go online.

 


The Undiscovered

Published 19 Jan 2018 by Jason Scott in ASCII by Jason Scott.

There’s a bit of a nuance here; this entry is less about the specific situation I’m talking about, than about the kind of situation it is.

I got pulled into this whole thing randomly, when someone wrote me to let me know it was going along. Naturally, I fired into it with all cylinders, but after a while, I figured out very good people were already on it, by days, and so I don’t actually have to do much of anything. That works for me.

It went down like this.

MOS Technology designed the 6502 chip which was in a mass of home computers in the 1970s and 1980s. (And is still being sold today.) The company, founded in 1969, was purchased in 1976 by Commodore (they of the 64 and Amiga) and became their chip production arm. A lot of the nitty gritty details are in the Wikipedia page for MOS. This company, now a subsidiary, lived a little life in Pennsylvania throughout the 1980s as part of the Commodore family. I assume people went to work, designed things, parked in the parking lot, checked out prototypes, responded to crazy Commodore administration requests… the usual.

In 1994, Commodore went out of business and its pieces were bought by various groups. In the case of the MOS Technology building, it was purchased by various management and probably a little outside investment, and became a new company, called GMT Microelectronics. GMT did whatever companies like that do, until 2001, when they were shut down by the Environmental Protection Agency because it turns out they kind of contaminated the groundwater and didn’t clean it up very well.

Then the building sat, a memory to people who cared about the 6502 (like me), to former employees, and probably nobody else.

Now, welcome to 2017!

The building has gotten a new owner who wants to turn the property into something useful. To do this, they basically have to empty it, raze the building to the ground, clean the ground, and then build a new building. Bravo, developer. Remember, this building has sat for 16 years, unwanted and unused.

The sign from the GMT days still sits outside, unchanged and just aged from when the building was once that business. Life has certainly gone on. By the way, these photos are all from Doug Crawford of the Vintage Computing Federation, who took this tour in late 2017.

Inside, as expected, it is a graffiti and firepit shitshow, the result of years of kids and others camping out in the building’s skeletal remains and probably whiling away the weekends hanging out.

And along with these pleasant scenes of decay and loss are some others involving what Doug thought were “Calcium Deposits” and which I personally interpret as maybe I never need to set foot in this building at any point in my future life and probably will have to burn any clothing I wear should I do so.

But damn if Doug didn’t make the journey into this environmentally problematic deathtrap to document it, and he even brought a guest of some renown related to Commodore history: Bil Herd, one of the designers of the Commodore 128.

So, here’s what I want to get to: In this long-abandoned building, decades past prime and the province of trespassers and neglect, there turns out to have been quite a bit of Commodore history lying about.

There are unquestionably some unusually neat items here – old printed documentation, chip wafers, and those magnetic tapes of who knows what; maybe design work or something else that needed storage.

So here’s the thing: the person who was cleaning up this building for demolition was put into some really weird situations – he wanted people to know this was here, and maybe offer it up to collectors, but as the blowback happened from folks when he revealed he’d been throwing stuff out, he was thrown into a defensive position and ultimately ended up sticking with looking into selling it, like salvage.

I think there are two lessons here:

  1. There’s no question there’s caches of materials out there, be they in old corporate offices, warehouses, storerooms, or what have you, that are likely precious windows into bygone technology. There’s an important lesson in not assuming “everything” is gone and maybe digging a bit deeper. That means contacting places, inquiring with simple non-badgering questions, and being known as someone interested in some aspect of history so people might contact you about opportunities going forward.
  2. Being a shouty toolbox about these opportunities will not improve the situation.

I am lucky enough to be offered a lot of neat materials in a given month; people contact me about boxes, rooms and piles that they’re not sure what the right steps are. They don’t want to be lectured or shouted at; they want ideas and support as they work out their relationship to the material. These are often commercial products now long-gone and there’s a narrative that old automatically means “payday at auction” and that may or may not be true; but it’s a very compelling narrative, especially when times are hard.

So much has been saved and yes, a lot has been lost. But if the creators of the 6502 can have wafers and materials sitting around for 20 years after the company closed, I think there’s some brightness on the horizon for a lot of other “lost” materials as well.


User Dictionaries – a Fundamental Design Flaw

Published 18 Jan 2018 by Andy Mabbett in Andy Mabbett, aka pigsonthewing.

I have just had to add several words to the user dictionary for the spell-checker in Notepad++, that I have already added to my user dictionary in LibreOffice, and to my user dictionary in (all under Windows 10 – does this happen with user dictionaries under Unix & Mac operating systems?).

Notepad++ spell-checker, not recognising the word 'Mabbett'

Under , a user should not have to accept a word’s spelling more than once.

User dictionaries should not be in a “walled garden” within an application. They should exist at operating-system level, or more specifically, at user-account level.

Or, until Microsoft (and other operating system vendors) implement this, applications — at least, open source applications like those listed above — should make their user dictionaries accessible to each other.
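In the meantime, a determined user can approximate that sharing by syncing the word lists themselves. Below is a minimal sketch in Python, assuming (and it is an assumption) that each application stores its user dictionary as a plain one-word-per-line text file; the paths are placeholders rather than the real locations, and real formats (LibreOffice’s .dic header, for instance) may need extra care:

from pathlib import Path

# Placeholder paths -- point these at wherever each application actually
# keeps its user dictionary; formats are assumed to be word-per-line.
DICTIONARIES = [
    Path.home() / "AppData/Roaming/Notepad++/plugins/config/Hunspell/en_GB.usr",
    Path.home() / "AppData/Roaming/LibreOffice/4/user/wordbook/standard.dic",
]

def read_words(path):
    """Read a word-per-line dictionary, skipping blanks and header-ish lines."""
    if not path.exists():
        return set()
    words = set()
    for line in path.read_text(encoding="utf-8", errors="ignore").splitlines():
        line = line.strip()
        if line and " " not in line and not line.startswith(("#", "---", "lang:", "type:", "OOo")):
            words.add(line)
    return words

# Union of every application's word list, appended back to each file.
merged = set().union(*(read_words(p) for p in DICTIONARIES))
for path in DICTIONARIES:
    missing = sorted(merged - read_words(path))
    if missing and path.exists():
        with path.open("a", encoding="utf-8") as fh:
            fh.write("\n".join(missing) + "\n")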

Some issues to consider: users with dictionaries in more than one language; security.

Prior art: I raised a Notepad++ ticket about this. It was (not unreasonably) closed, with a pointer to this DSpellCheck ticket on the same subject.

The post User Dictionaries – a Fundamental Design Flaw appeared first on Andy Mabbett, aka pigsonthewing.


Edit existing wiki page with libreoffice writer

Published 18 Jan 2018 by Rafiek in Newest questions tagged mediawiki - Ask Ubuntu.

I've read about sending a page to a mediawiki using libre writer.
But is it possible to call up an existing wiki page, edit it and send it back to the wiki?
If so, how is this done?


Use remote Tomcat/Solr for BlueSpice ExtendedSearch

Published 15 Jan 2018 by Dominic P in Newest questions tagged mediawiki - Webmasters Stack Exchange.

Is it possible to configure the BlueSpice ExtendedSearch extension to connect to a remote Apache Tomcat/Solr instance instead of installing all of that on the same machine that runs BlueSpice?

I looked through the install guide for ExtendedSearch, but I couldn't find any mention of this as an option.

Any ideas?


THIS IS NOT A WELLNESS BLOG

Published 15 Jan 2018 by timbaker in Tim Baker.

I have found myself recently in the unfamiliar and uncomfortable position of defending natural therapies. This is not a role I ever foresaw for myself. I understand the rigorous scientific process of developing and testing theories, assessing evidence and requiring proof. I studied science into...

New year, new tool - TeraCopy

Published 12 Jan 2018 by Jenny Mitcham in Digital Archiving at the University of York.

For various reasons I'm not going to start 2018 with an ambitious to do list as I did in 2017 ...I've still got to do much of what I said I was going to do in 2017 and my desk needs another tidy!

In 2017 I struggled to make as much progress as I would have liked - that old problem of having too much to do and simply not enough hours in the day.

So it seems like a good idea to blog about a new tool I have just adopted this week to help me use the limited amount of time I've got more effectively!

The latest batch of material I've been given to ingest into the digital archive consists of 34 CD-ROMs and I've realised that my current ingest procedures were not as efficient as they could be. Virus checking, copying files over from 1 CD and then verifying the checksums is not very time consuming, but when you have to do this 34 times, you do start to wonder whether your processes could be improved!

In my previous ingest processes, copying files and then verifying checksums had been a two stage process. I would copy files over using Windows Explorer and then use FolderMatch to confirm (using checksums) that my copy was identical to the original.

But why use a two stage process when you can do it in one go?

The dialog that pops up when you copy
I'd seen TeraCopy last year whilst visiting The British Library (thanks Simon!) so decided to give it a go. It is a free file transfer utility with a focus on data integrity.

So, I've installed it on my PC. Now, whenever I try and copy anything in Windows it pops up and asks me whether I want to use TeraCopy to make my copy.

One of the nice things about this is that this will also pop up when you accidentally click and drop a directory into another directory in Windows Explorer (who hasn't done that at least once?) and gives you the opportunity to cancel the operation.

When you copy with TeraCopy it doesn't just copy the files for you, but also creates checksums as it goes along and then at the end of the process verifies that the checksums are the same as they were originally. Nice! You need to tweak the settings a little to get this to work.
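For anyone without TeraCopy to hand, the same idea – checksum the source, copy, checksum the copy, compare – can be sketched in a few lines of Python. This is only an illustration of the workflow, not how TeraCopy itself is implemented:

import hashlib
import shutil
from pathlib import Path

def sha256(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-256 without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_and_verify(source_dir, dest_dir):
    """Copy every file from source_dir to dest_dir and confirm the checksums match."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    for src in source_dir.rglob("*"):
        if src.is_file():
            dst = dest_dir / src.relative_to(source_dir)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copies contents and timestamps
            if sha256(src) != sha256(dst):
                raise IOError(f"Checksum mismatch after copying {src}")
            print(f"verified {src} -> {dst}")

# Example: copy_and_verify("D:/", "E:/digital-archive/accession-2018-01")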


TeraCopy busy copying some files for me and creating checksums as it goes


When copying and verifying is complete it tells you how many files it has
verified and shows matching checksums for both copies - job done!

So, this has made the task of copying data from 34 CDs into the digital archive a little bit less painful and has made my digital ingest process a little bit more efficient.

...and that from my perspective is a pretty good start to 2018!


Keeping it real

Published 4 Jan 2018 by jenimcmillan in Jeni McMillan.

It’s true, cities are not places for wild goats. It’s difficult to reflect amongst the chaos of a human built landscape, unless it’s on our generation of narcissists. This is not personal. Who hasn’t taken a gratuitous selfie once in a while? I’m right in there, fudging the edges with art in my heart. You know me, I adore a good self-portrait, usually without clothes.

Now that I have a backpack instead of a room and a bank account that dives gracefully toward the abyss, I’ve crossed borders and fallen in love with a number of foreign places. All on the cheap. Hitch-hiking. Sleeping in my tent. Washing under greek waterfalls or in post-soviet sauna houses. Wherever I find myself, there are people with smartphones. We Insta and Facebook, Gab, Google+, MySpace, LinkedIn, Pinterest, Reddit, Tumblr, Twitter, Viber, VK, WeChat, Weibo, WhatsApp, Wikia, Snapchat and YouTube. Sometimes we even email friends who are detoxing from social media overload. Then we write blogs.

Yesterday I went to The Winter Palace of Peter the Great. Yes, I’m in St Petersburg where people rarely smile, unless they are really happy. That can be infuriating but somehow, in a world of manufactured happiness and political turmoil, perhaps it is a good thing.


Looking forwards, looking backwards (2017/2018)

Published 2 Jan 2018 by inthemailbox in In the mailbox.

Yesterday, 1 January, I was reminded by the British Museum that the month is named after Janus, the Roman god. The International Council on Archives (ICA) uses a stylised form of Janus for their logo, because archivists do the same thing. Archives are identified and appraised based on their ongoing value to the community and to the organisations and people that created them. Over time, they develop historicity, which leads to the common, but mistaken, belief that archives are about “old stuff”.

January 1 is also the time for looking back over the previous year, and making resolutions about the forthcoming year. Personally, I think the latter is foolish, because I subscribe to the aphorism that no plan survives contact with reality, and 2017 demonstrates that perfectly.

I started with grand plans for a new blog on the Pie Lunch Lot, my mother and her cronies’ answer to what we now call the FIFO lifestyle, without benefit of modern social media. This would mean that I would take charge of my personal archives, and work within an historian’s framework. Yeah, right.

Blogs on this site were also few and far between. I left Curtin, and the luxury of reading and reviewing articles as part of my work there. Back at SRO, I’ve been involved with archival description and with developing our archival management system. This has had congruences with my private work, including  a poster at the Association of Canadian Archivists conference in Ottawa (Disrupting description – Canada3) and developing a workshop on archival description for the ASA conference in Melbourne (of which more another time).

I also became the program chair for the ASA conference in Perth in 2018 – “Archives in a Blade Runner age”, which has led to the creation of another blogsite, this one on science fiction and archives. (Don’t forget to submit an abstract before February 28, and, yes, there will be no extensions.) And, I became a National Councillor for the ASA, which has its own steep learning curve.

Add in the usual chaos that is life, and there you have it. 2017 not as planned, 2018 already out of control 🙂


Looking forward to 2018

Published 24 Dec 2017 by Bron Gondwana in FastMail Blog.

This is the final post in the 2017 FastMail Advent Calendar. In the previous post we met Rik.

We’ve done it! 24 blog posts, one per day.

To begin this aspirational blog post, a goal. We plan to post more frequently overall next year. At least one post every month.

This should be fairly easy since we have a Pobox blog, a Topicbox blog and of course this FastMail blog.

One company

In 2018 we will continue the process of becoming a single company where everybody “owns” all our products, rather than two separate companies flying in close formation, each with their own products. Rik is driving the process of integrating development while Rob N ★ leads merging of operations.

We no longer wake somebody with automated notifications if there’s a person awake in a different timezone who can perform triage, leading to better sleep for everybody. We’re also distributing first-line support between the offices, and training support staff in all our products, for a closer working relationship between the people seeing emerging issues, and the people responding to them.

Our 4 products have their own branding, but internally we’re becoming a single team who love all our children equally (ssssh … I think we each still have our favourite)

Settling in to our new digs

FastMail Melbourne moved to a new office in the middle of the year, and the process was not entirely painless.

Special mention to Jamie who somehow didn’t go mad through all of this. What a welcome to the company – he’s just coming up to the completion of his first year with us, and when I asked him to take point on the office fit-out, I had no idea what I was asking him to do. I’m sure he had no idea either, or he wouldn’t have said yes!

Our office is a lovely space, just 50 metres from our old office, so we can still go to our favourite coffee places in the morning! We have a favourite place we normally go, but we can be fickle – if their coffee isn’t up to our snobby standards, Melbourne has plenty of nearby hipster options just waiting for our custom. This year we’ve mostly dragged people who used disposable paper cups into bringing reusable keep-cups instead. Reusable keep-cups are totally Melbourne.

The morning coffee run is our main regular social gathering, and a highlight of the day. Even non-coffee-drinkers join us for the walk and chat.

Improving our products

The world keeps changing, and no product can keep still and stay successful. But we’re not just reacting, we have plans for new features too!

Next year we will keep polishing Topicbox based on feedback from our early adopters. We also have some neat ideas for new features which will make it even more compelling for teams working together.

FastMail hasn’t seen many user-visible changes in the past year, largely because we’ve been focusing on getting JMAP finished and the interface ported over to use it. Three years since our first blog post about JMAP, we’re really close to a finished mail spec. 2018 will be the year of JMAP on the desktop, and then we can start adding new features that build upon the new protocol.

More and more people are accessing our products primarily on mobile devices. We have neglected our apps in 2017, and we will remedy that in 2018. Mobile experience is an explicit focus for us in the coming year, and we’ve engaged outside help to assist with our app development.

Continuing to support Open Source and the community

We fund large amounts of the development work going into the Cyrus IMAP server, as well as the many other open source projects we work on.

We have sponsored various conferences in the past year, and provide free email service to some groups that we feel are well aligned with our mission, like the Thunderbird project, one of the most well known open source email clients.

And of course we send people, and give our time, to standards work and collaboration at IETF, M3AAWG and CalConnect.

Pragmatism

This is always the most interesting thing to me when I follow discussions about issues that affect us and our customers. Privacy and security are key features for everybody, as are usability and speed. Ideally, as a customer, these things are invisible. You only notice speed when things get slow. You only notice usability when you’re struggling to achieve your goals. You only notice privacy and security when it turns out you didn’t have them.

Neil wrote a great post earlier in this advent series about our mindset around security. Security and usability are frequently in opposition – the most secure computer is one which is turned off and unplugged from the network. The problem is, it’s easy to believe that something is more secure just because it’s harder to use – that is rarely true.

For example if you pop up dialogs all the time to ask users to make security decisions, they will just click “Yes” without reading and actually be less secure than if asked rarely. Our preferred interaction is to perform the requested action immediately, but make undo simple, so the common case is painless. We also provide a restore from backup feature which allows recovery from most mistakes.

As we review our systems for GDPR compliance next year, we will have pragmatic and effective security in mind.

To 2018

The advent calendar is over, Christmas is almost here, and the New Year just a week away. 2018 will be an exciting year for us.

Thank you again for reading, and for your loyal support throughout 2017. We depend upon the existence of people who are willing to pay to be the customer rather than the product. We’re not the cheapest option for your email, but we firmly believe we are the best. We love what we do, and we love the direct relationship with our customers, payment for service provided.

FastMail the company is growing strongly. We have great people, great products, great customers, and funky new t-shirts.

Cheers! See you next year :)


Team Profile: Rik

Published 23 Dec 2017 by Helen Horstmann-Allen in FastMail Blog.

This is the twenty-third post in the 2017 FastMail Advent series. The previous post was about our response to the GDPR. We finish the series with a post looking forward to next year.


2017 has been a year of big changes for FastMail, team-wise. As Bron Gondwana stepped up to CEO, the role of CTO has been taken on by one of our new American colleagues, Ricardo Signes. We picked him up in 2015 when we acquired Pobox and Listbox, and he’s spent the bulk of his time since then building our newest team email product, Topicbox. Get to know Rik!

Photo of Ricardo Signes

What I work on

Historically, I have been the primary programmer on Pobox and Listbox, and I did a lot of work in the last few years building the framework of Topicbox. But nowadays, I spend most of my time coordinating the work of the development and operations teams, figuring out who’s doing what and whose work might be blocking whom, so that people aren’t sitting frustrated at their desks.

As CTO, I balance the technology requirements across different groups. Generally we don’t have people who want contradictory things, but sorting out work often requires determining invisible pre-requisites and doing that work first. It requires figuring out the way to get from here to there… And preferably after I’ve already figured out what people are likely to want next.

Figuring out what people will want next is often a natural by-product of talking to people about what they want. As we take things from the abstract to the concrete, I try to stay focused on the goals (and really understanding them!) rather than the suggested technical implementation they’ve requested. Time is often a consideration; a lot of times, just keeping in mind the next logical iteration of the solution you can get today is all the plan for the future you need.

How did you get involved with FastMail?

They bought me? I got involved with Pobox in 2005 when Dieter Pearcey heard me saying I was looking for somewhere else to hang my hat. He and I had debugged some problems earlier that year on IRC, so when he told me to apply, I did. About 8 years later, I met Bron at OSCON. We were having a beer when super-connector Paul Fenwick realized we worked at Pobox and FastMail, respectively, and asked if we were going to brawl. We did not; we ended up discussing the common problems and solutions of our load balancers and user migrator implementations. About a year after that, we started the long process of acquisition. A year after that, it happened. 16 months after that, I was the CTO.

I took a photo at the time, recording our meeting for posterity.

Bron and RJBS

What’s your favourite thing you’ve worked on this year?

In technical terms, it’s Topicbox. When building Topicbox, we treated it like a greenfield project. We didn’t reuse many of our standard components, but the technical decisions I made were based on years of using our tools and thinking about how I would do it if I did it from scratch. As many of those plans were borne out in successful technical solutions, it was really rewarding — a pleasure to build and see used.

But, more than that, I have loved organizing the technical team. It’s a really talented group of people, with many unique areas of expertise. Orchestrating all of them requires having a handle on what everyone is doing. Doing it successfully also requires I have at least a basic understanding of what everyone is working on. It is either an excuse or demand for me to be learning a little more all the time, which is great! It forces me to get off the same old path.

What’s your preferred mobile platform?

I use iOS. I don’t have really strong feelings about it. It has a bunch of things I prefer, but … I’ll use anything that has Spotify.

What other companies, people or projects inspire you?

The Rust language project is really inspirational. Like most technical projects, they’ve always striven to have a technically excellent product, but they also decided early on that they were unwilling to compromise on their community to get it. So, the community is not the toxic, or even merely odious, community that you can get in other projects with a lesser commitment to community.

Julia Evans, who is an endlessly prolific source of interesting and instructive material, and who is always positive in attitude, is the kind of technical role model I aspire to be. She says my favorite thing, which is that computers are not magical; you can start from first principles and figure out what is going on, always.

Companies are impressive for lots of reasons, but I’m pleased when companies doing interesting work make it clear what their values are, especially when you can see it’s true. They make it clear they have a direction beyond just making money. They promote that the direction they’ve chosen has value to them. They make it easy to guess what it would be like to work there, and what kind of work and behavior would be rewarded. Netflix and Stripe are two examples that come to mind; I hope I do my part to expose a similar ethos here at FastMail.

What’s your favourite FastMail feature?

I like FastMail’s push support, because it makes clear that FastMail really is fast. It looks like a super simple feature, but the technical details are way more complicated than they should be. It’s technically interesting and you can always get Rob N to tell a good story about it!

My favorite Pobox feature is the RSS feed of spam reports, which lets you review the small amount of mail Pobox isn’t quite sure about. I like it both because RSS is something that I wish had gotten wider adoption, and because I like having it in a separate place than my email or the web (which are the two other places you can review it.)

My favorite Topicbox feature is organization-wide search! Topicbox makes it easy for team members to create new groups, which is awesome for making sure all the right people are included in a discussion. But as soon as you start having enough information that you can’t see in one screen, you want to search for it. The Topicbox search technology is based on FastMail’s, so it’s fast, thorough, and easy to refine. You find the full thread… and the conclusion. Organization-wide search is, to me, the best reason to move your organization’s email discussions to Topicbox. (And, yes, we can help you import from an archive or a even a personal mailbox!)

What’s your favourite or most used piece of technology?

My bicycle! It embodies everything I think of as technology. It lets you solve a problem that you could probably solve without it, but much more efficiently. It also rewards curiosity. You don’t need to know how it works to use it. But it’s really easy to learn how to take it apart, fix it, and make it better. Also, like most of the technology I like, I don't use it as often as I'd like.

This isn't my bike. It's a photo I took while on a trip to the Australian office. It's a sculpture of three bicycles occupying the same space!

Three bicycles

What are you listening to / watching these days?

I’m finally catching up on Song-by-Song podcasts, which discusses every Tom Waits song, one per episode. But that means I’m listening to a lot of Tom Waits again too. It’s good to listen to full albums!

We talk a lot about music at FastMail, and we’ve gotten almost everyone on Spotify. We have a bot who tracks people’s Discover Weekly playlists, looking for duplicates, and determining who has compatible (and diverse!) musical tastes. I’ve found a bunch of good music that I wouldn’t have heard before because staffers have been listening. I also know who has consistent enough taste that I know I can always hit up their weekly playlist for, say, synth pop and 80s hits (Rob Mueller!).

What do you like to do outside of work?

I do coding projects outside of work, too, though less this year than in years past. I used to manage the perl5 project for many years, but now I'm just an active member of the peanut gallery.

I watch a lot of movies. I talk a lot about the horror movies I watch because they are the funniest to discuss, but I actually watch plenty of movies across all genres.

I run a D&D game, and I’ve been playing a lot of Mario Kart on my Nintendo Switch.

What's your favourite animal?

I used to have pet guinea pigs, so they’re up there! They’re my favorite animal that I would actually consider palling around with. But I’m also a fan of any animal that is really bizarre in some way. It reminds you that evolution goes in a lot of crazy ways.

Any FM staffers you want to brag on?

Everybody’s great! If I was going to call somebody out in particular, though, it would be Bron. We had reached an inflection point in terms of scale, where we needed to rethink the way we organized our work. Bron stepped up to make that happen, and we’re all better off for it.

What are you proudest of in your work?

In my technical work, over many years, I’m proudest we’ve been able to use a slowly evolving set of patterns without finding out they were fundamentally bankrupt. With Topicbox, we were able to test that theory in the biggest way — we started from scratch using those patterns as first principles, and it worked. So that was really rewarding.

On a larger scale than that, it’s always a blast to meet people in a professional setting who have heard of FastMail or Pobox. They will be excited to talk about what they know of the company, and often tell me they think it would be an exciting and great place to work. In large part, that's because of people and culture we have, and I’m proud to have been part of making that the case!


GDPR: European Data Protection

Published 22 Dec 2017 by Bron Gondwana in FastMail Blog.

This is the twenty-second post in the 2017 FastMail Advent Calendar. The previous post was about our new monitoring infrastructure. In the next post we meet Rik, our intrepid CTO.


Some of you may already be aware of the upcoming GDPR legislation. We’ve certainly been getting support tickets and the occasional tweet asking about our plans.

General Data Protection Regulation

The GDPR is a European regulation which affects the processing and collection of data about all European residents, no matter where they are in the world, as well as data about any human physically present in Europe, regardless of residency.

In short – the GDPR affects almost everybody in the world, since Europeans are prone to traveling. It definitely affects FastMail, who sell our services worldwide, and have many customers in the EU.

The big scary part of the GDPR is the fines for non-compliance – 4% of global revenue or €20,000,000 per offence, whichever is greater. They’re not playing around.

FastMail’s products have features that make us both a data controller and a data processor under the definitions of the GDPR.

The GDPR comes into force on 25 May 2018, and FastMail intends to be ready.

Australian advice

Australia already has very strong privacy laws, which we take seriously. The Office of the Australian Information Commissioner gave guidance about GDPR for Australian businesses earlier this year, which details similarities and differences between the two laws.

The good news is that we can be GDPR-compliant without a conflict of law. Sadly this isn’t always the case in international law – there exist cases where a person can have no option that does not result in them committing a crime according to the law somewhere in the world.

In this case, it looks like Australia will be following Europe’s lead, with new laws like the Notifiable Data Breaches scheme coming into effect next year.

Interesting questions

While most parts of the GDPR are good and we implement them already, the European right to be forgotten raises interesting questions about who owns information about a person. Fairly clearly for our FastMail product, the private mailbox of a person is their own personal electronic memory and an email you sent somebody doesn’t count as personal data that we, FastMail the company, hold about you. You shouldn’t be able to take that email back, certainly not just by asking us to do it.

On the other hand, Topicbox groups can be public. Clearly public groups archives could be abused to host spam, phishing, or other nasties. The exact same issue already exists for files published as user websites.

Published information might need to be taken down - due to terms of service violation, DMCA request, GDPR-covered privacy request, or any other legal method. The tension between maintaining an accurate immutable record and allowing permanent removal of material that should never have been published is very real.

Finally, backups contain data for a time after it’s been deleted. Shredding every copy of something is actually really tricky, and guaranteeing that every possible copy has been removed is a tradeoff as well. I have personally dealt with an account for somebody who had obtained power of attorney for his father, who was no longer able to remember very well. The father’s email account at FastMail had been unused and unpaid for long enough that it had expired and the backups had been removed. It was very hard to tell this person that they had lost important family data – for somebody who had been a loyal FastMail customer for 10 years, no less.

Shredding backups is not always the right choice. We now keep backups longer for lapsed accounts – those where the user had not explicitly asked us to delete anything – than for accounts where the user chooses to close the account. And yet … I've still had ex-users ask if I can dig up an old backup because they forgot to copy some important message before closing their account!

Supporting FastMail’s values

We blogged last year about our values. The GDPR’s requirements about privacy and consent to store and use data are very compatible with our values: “Your data belongs to you” and “We are good stewards of your data”.

We’re working on our support processes to make consent more explicit if we access your account to help you with an issue. As we audit our processes for GDPR next year, we will continue to focus on practical and usable methods to maintain our customers’ privacy.


Monitoring FastMail with Prometheus

Published 21 Dec 2017 by Rob N ★ in FastMail Blog.

This is the twenty-first post in the 2017 FastMail Advent Calendar. The previous post was a design spotlight on the Topicbox logo. The next post is about our response to the GDPR.


Monitoring is a critical part of any complex system. If you’re not monitoring your system properly, you can’t be sure that it’s working the way it’s supposed to, and it’s hard to figure out what’s going on when there is a problem.

For most of FastMail's life, it's been monitored by a set of scripts that run regularly on each machine, look for anything that seems wrong and, if a problem is found, report it. Most of the time that's by sending an email for engineers to look into later, but in extreme cases they would send an SMS and demand immediate attention.

These tests are fairly simple. The example I usually use is about disk space. Every time the monitor scripts run, they would check the amount of space free on each mail disk. If it's more than 92% full, it would emit a warning. More than 95%, it would wake someone.
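In rough outline, each of those checks boiled down to something like the sketch below (an illustration of the idea, not FastMail's actual script):

import shutil
from typing import Optional

WARN_PERCENT = 92
PAGE_PERCENT = 95

def check_disk(mount_point: str) -> Optional[str]:
    """Classic threshold check: return 'page', 'warn' or None for one mail disk."""
    usage = shutil.disk_usage(mount_point)
    percent_full = 100 * (usage.total - usage.free) / usage.total
    if percent_full > PAGE_PERCENT:
        return "page"   # wake someone up right now
    if percent_full > WARN_PERCENT:
        return "warn"   # send an email to look at later
    return None

# e.g. check_disk("/mnt/mail1")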

The problem with this simple test is that it doesn't really say much about why the disk is filling or what the correct remedy is.

All of these cases are important, but not all demand immediate attention, and all have different methods to resolve the problem. But we've just woken someone and told them "the disk is nearly full", and it's up to them to figure out what's going on.

As one of the lucky people that would get woken up by this, it wasn't long before I started to get a sense of what information we didn't have that could really help here.

These are hard questions to answer with a simple per-host "is anything wrong right now?" check. Imports and delivery happen on different hosts from where the mail is stored. Tracking rates of change means you need to know previous values, which means some storage for the check scripts. To improve things, we needed something new.

A central metrics store

During 2015 we started looking seriously at a product or service that could do the job for us. There were lots of possibilities, each with pros and cons. Things we particularly wanted:

We did a few experiments, and eventually settled on Prometheus, which ticks pretty much all of these boxes.

Prometheus is what's known as a time-series database, which basically means that it stores the value of different things at a given point in time. Each thing it stores is called a "metric", and has a numeric value.

Using our disk usage example again, we might store two values: the total size of a disk, and the amount of free space. So our metrics (in the Prometheus data format) might look like:

disk_size   1968480002048
disk_free   1236837498880

Now we have these raw values stored, we can use Prometheus' query language to understand more about these values. For example, we could use a simple query to get a "percent full" value:

100 - 100 * disk_free / disk_size     37.16789108382110

Because Prometheus is constantly checking and storing these values as they change, we can also do more advanced things based on the history. For example, we can use the deriv() function to find out how much the space on this disk changes each second (based on the last 15 minutes worth of values):

deriv(disk_free[15m])   -3205234.3553299494

We can also use a separate product, Grafana, to graph these values. This is a very boring one, showing this disk's values:

Grafana disk usage graph

There's loads more things we can ask Prometheus about our metrics, and Grafana can graph pretty much everything we can query. Prometheus also comes with an alerting component, which can send emails, SMS messages and other stuff based on the results of queries. It's a fantastic piece of software and we're very happy with it.

Gathering metrics

It's all very well having a place to store all these metrics, but we still have to get them out of our systems. Prometheus' model is very simple: every few seconds, it connects to very simple web servers all over your network and requests a list of all the metrics they have to give.

The idea is that all of your internal services will have a HTTP port and be able to provide metrics about what they're doing, but that's not always possible. Some software is too simple to carry its own metrics, or does do its own metrics but presents them in some different way (often logfiles).

So, the Prometheus project and wider community have produced a huge number of "exporters", which interrogate applications or the OS to collect information about them, and then present those to Prometheus. We use a number of off-the-shelf exporters for off-the-shelf software we use (eg node_exporter for the OS, mysql_exporter for MySQL, etc), and we've written our own for a few things where an off-the-shelf exporter didn't exist or didn't do what we wanted (PowerDNS, Postfix, tinydns, etc).

The most common style of exporter we have at the moment is one that monitors an application logfile, extracts information about events that occurred and presents them to Prometheus as counts. Many of our existing monitoring scripts already did that, and most of our own internal applications log lots about what they're doing, so it's been a very good transition step.
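As an illustration of that log-tailing style, here is a minimal exporter using the prometheus_client Python library. The logfile path, port and line format are invented for the example rather than taken from our real exporters:

import re
import time
from prometheus_client import Counter, start_http_server

LOG_PATH = "/var/log/exampled.log"      # hypothetical application logfile
EVENT_RE = re.compile(r"event=(\w+)")   # hypothetical log line format

events_total = Counter("exampled_events_total",
                       "Events seen in the exampled logfile",
                       ["event"])

def follow(path):
    """Yield new lines appended to a logfile, like `tail -f`."""
    with open(path) as fh:
        fh.seek(0, 2)  # start at the end so we only count new events
        while True:
            line = fh.readline()
            if not line:
                time.sleep(0.5)
                continue
            yield line

if __name__ == "__main__":
    start_http_server(9123)  # Prometheus scrapes http://host:9123/metrics
    for line in follow(LOG_PATH):
        match = EVENT_RE.search(line)
        if match:
            events_total.labels(event=match.group(1)).inc()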

One of the most interesting applications for us, of course, is the Cyrus mail server, the place where your mail is stored, searched and accessed. When we started with Prometheus there was no good way to see inside and find out what it was doing. It does log lots about what it's doing, and we do watch those logs for problems, but there's so many things it does internally that we don't have a lot of visibility on. One of the things about gathering metrics is that you want information available when you have new questions to ask, new things you want to know. So we wanted to bring out as much information from Cyrus as we could, much more than what was currently available in the logs.

Adding Prometheus metrics to Cyrus

So I sat down with Ellie, my local Cyrus developer, and had a chat about it.

It turns out that Cyrus once had SNMP support. SNMP is an older protocol for remotely monitoring and managing things. It's still in wide use with network equipment. Cyrus still had support for it, but large parts of it didn't appear to work. Ellie had been keen to understand it for a while and either fix it or remove it, and this seemed like a great opportunity.

We spent an afternoon pooling our knowledge, me with my basic understanding of how Prometheus worked and what Cyrus is like to operate, her with her knowledge of Cyrus internals and what kind of monitoring other users were asking for, and worked out a plan.

Cyrus already has a HTTP server (for CalDAV, CardDAV, etc) so getting it to serve metrics was not complicated. The toughest part is actually around Cyrus' architecture. The Cyrus model is a single process that accepts incoming connections, which it then hands off to another process for service. There's no central coordinating component for these processes; they just do their thing and use filesystem locks to make sure they don't get in each other's way when accessing mail data. That's part of how it gets its performance, so that’s great. The downside is that all these worker processes need to record metric events, so something somewhere needs to collect all the stats coming back from each process and combine them.

To do that, Ellie modified each Cyrus process to write its own stats out to a file, and then created a new Cyrus component called promstatsd which collects all of these, combines them and prepares a metrics data file ready for Prometheus to come and fetch.
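Reduced to a toy sketch in Python, the collection step looks something like this – the real promstatsd is written in C inside Cyrus, and the file locations and line format here are made up purely to show the idea of summing per-process counters into one file for Prometheus:

import glob
from collections import Counter

def combine_stats(stats_dir="/var/run/cyrus/stats", out_path="/var/run/cyrus/metrics.txt"):
    """Sum per-process 'metric value' lines into a single file for Prometheus to scrape."""
    totals = Counter()
    for stat_file in glob.glob(f"{stats_dir}/*.stat"):
        with open(stat_file) as fh:
            for line in fh:
                name, value = line.split()
                totals[name] += float(value)
    with open(out_path, "w") as fh:
        for name, value in sorted(totals.items()):
            fh.write(f"{name} {value}\n")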

As for the metrics themselves, we chose to start with a few that already had internal counters inside Cyrus: active connection counts, message delivery counts and IMAP command counts. These aren't things we currently pay a lot of attention to, but they might be useful in the future (as noted above, we want to collect everything we can, just in case), and Cyrus already knows them, which made it easier to focus on adding the metrics support. They're also easy to test, since they are incremented when simple actions are taken: login, send a message, etc.

Cyrus is written in C, which is not widely used for server development anymore, so there were no good Prometheus metrics libraries available to use. Ellie wrote a very nice little metrics implementation for Cyrus that is designed to perform well and make it easy to add new metrics without adding overhead.

Put it all together, and we can now use Prometheus and Grafana to see what's happening inside Cyrus. For example, here's a graph of the number of active IMAP connections to a single Cyrus shard:

Grafana IMAP connection graph

Now that we have the basic framework in place, we have a lot more metrics we want to add, particularly for the httpd server that will be used for JMAP, and improvements to the way the metrics are expressed to Prometheus to give us more querying options (eg, only count CalDAV requests, excluding CardDAV requests).

Adding support to Cyrus has been pretty exciting, partly because of the general excitement of being able to better understand what our servers are doing, but also because it's really interesting to keep updating this ancient project to support new ways of working.

What's next?

We won't stop! We still have a long way to go to get all of our monitoring and alerting moved into Prometheus, but every week we move one or two little things over, or add one or two little new things, often finding minor problems or quirks that we didn't know about but can now see, and thus fix. Every time we do this we smile, because we learn a little more about how our systems really work and feel a little more confident that they're doing what they're supposed to. And with everything we know about our systems all available in one central place that everyone can see and work with, we’ll be able to sleep easy at night knowing that everything is running smoothly and any problems will be quickly noticed.


Design spotlight: how the Topicbox logo came to be

Published 20 Dec 2017 by Jamie Toyama in FastMail Blog.

This is the twentieth post in the 2017 FastMail Advent series. The previous post showed how you can easily set up your iPhone with your camera. The next post is about our new monitoring infrastructure.


Many of you would have seen our new Topicbox logo, but did you ever wonder how it came to be?

Before I permanently joined the FastMail Team, I had already been working with David and Neil to help out with some design work. This started with a FastMail website refresh and led to some preliminary work for an exciting new group email product, which eventually became Topicbox.

As we were creating a product and a brand identity from scratch it gave me the freedom to really help shape the design elements and in this post I’m going to concentrate on the Topicbox logo.

Logo design – or really, any general design process – will follow a number of collaborative steps to ensure the brief can be successfully achieved:

1. Kick off meeting

A logo should be distinctive, memorable and an expression of the company it represents and every good design project starts with a kick-off meeting.

I wanted to learn as much as possible from the project team (initially Helen, Neil and David) – things like company values and culture, potential customers and competitors. Gathering up as much information as I could would really help me to distil all of these elements into the brand identity.

recognisable brand logos from large companies

2. Discovery

During this process I took the time to further research the potential customers and competitors in the market place.

Having a better understanding of the customer – or our ideal customer – helped me to understand the types of people this logo had to engage with.

Looking at the competition was important: it allowed me to get a good understanding of what the other brands were doing, so I could then make sure my design was going to be different.

I then looked at the logo application – how and where will the logo be used? Is it going to appear in both print and digital?

Understanding any restrictions that might apply to the design of a logo means you have already worked through possible solutions before the logo is used across its various touchpoints.

For example, a mobile app icon is kept to a fairly small area, compared to that of a business card, which will have fewer restrictions on things like size and operating systems.

breaking down the elements of the logo

3. Brain dump

This is where some of the initial design starts to take shape. I took all of the knowledge I had gathered up to this stage and started putting my thoughts and ideas together. From there I took these ideas – which could be a sketch or even a list of words – and started to develop them in Illustrator (design software).

I find that I can quickly get my initial thoughts down on paper, but then I start to break the core idea down using Illustrator. By taking the concepts of ‘group communication’, ‘email’, and ‘archive’ as key inspiration I got my draft designs together and presented these to Helen, David and Neil.

It’s worth noting that at this point I specifically presented these in black and white only. This is so we could focus on the logo design and not be emotionally swayed by colour.

At this stage all feedback is welcome. The feedback is important in allowing me to get a good sense of what’s working and where I need to spend more time developing a concept.

first round of topicbox logo samples

4. Refining concepts

This was probably the most time-consuming and longest part of the process. It involved me going back and forth with the team and refining the logos down to four key concepts.

During this stage I started to introduce colour and to look at how it influenced the design of the logo. I also looked at refining the details within the logo – the curves, thickness of lines and the typography.

The process of refinement can be a challenging one! I remember thinking after a meeting with FastMail, how much further can I push this concept? However, only now when I look back on this do I understand that it was this constant need to keep pushing an idea that allowed me to get to the final execution of my design. It pays to never settle for ‘just good’.

second round of topicbox logo ideas in colour

5. Finally … a logo appears!

The Topicbox logo has been designed to feel friendly, inclusive and approachable through its unique graphic style, typography and colours. There are a couple of key features worth noting in the final design of the Topicbox logo:

logo components: reading news + community groups + email

The final Topicbox logo

A black tshirt with Topicbox logo

Topicbox logo on a phone


Review of Stepping Off from Limina.

Published 17 Dec 2017 by Tom Wilson in thomas m wilson.

This book review is a good summary of Stepping Off – thanks to Amy Budrikis. Read the review at Limina.

How would you change Archivematica's Format Policy Registry?

Published 15 Dec 2017 by Jenny Mitcham in Digital Archiving at the University of York.

A train trip through snowy Shropshire to get to Aberystwyth
This week the UK Archivematica user group fought through the snow and braved the winds and driving rain to meet at the National Library of Wales in Aberystwyth.

This was the first time the group had visited Wales and we celebrated with a night out at a lovely restaurant on the evening before our meeting. Our visit also coincided with the National Library cafe’s Christmas menu so we were treated to a generous Christmas lunch (and crackers) at lunch time. Thanks NLW!

As usual the meeting covered an interesting range of projects and perspectives from Archivematica users in the UK and beyond. As usual there was too much to talk about and not nearly enough time. Fortunately this took my mind off the fact I had damp feet for most of the day.

This post focuses on a discussion we had about Archivematica's Format Policy Registry or FPR. The FPR in Archivematica is a fairly complex beast, but a crucial tool for the 'Preservation Planning' step in digital archiving. It is essentially a database which allows users to define policies for handling different file formats (including the actions, tools and settings to apply to a specific file type for the purposes of preservation or access). The FPR comes ready populated with a set of rules based on agreed best practice in the sector, but institutions are free to change these and add new tools and rules to meet their own requirements.

Jake Henry from the National Library of Wales kicked off the discussion by telling us about some work they had done to make the thumbnail generation for pdf files more useful. Instead of supplying a generic thumbnail image for all pdfs they wanted the thumbnail to actually represent the file in question. They changed the pdf thumbnail generation rule in the FPR to use Ghostscript.
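
The post doesn't spell out the exact command NLW configured, but a thumbnail rule of this kind generally boils down to a single Ghostscript call that rasterises the first page of the pdf. A minimal sketch (the flags and file names here are illustrative, not NLW's actual FPR rule):

# render page 1 of the pdf as a small PNG to use as a thumbnail
gs -q -dNOPAUSE -dBATCH -sDEVICE=png16m -r72 \
   -dFirstPage=1 -dLastPage=1 \
   -sOutputFile="thumbnail.png" "input.pdf"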

NLW liked the fact that Archivematica converted pdf files to pdf/a but also wanted that same normalisation pathway to apply to existing pdf/a files. Just because a pdf/a file is already in a preservation file format it doesn’t mean it is a valid file. By also putting pdf/a files through a normalisation step they had more confidence that they were creating and preserving pdf/a files with some consistency.

Sea view from our meeting room!
Some institutions had not had any time to look in any detail at the default FPR rules. It was mentioned that there was trust in how the rules had been set up by Artefactual and that people didn’t feel expert enough to override these rules. The interface to the FPR within Archivematica itself is also not totally intuitive and requires quite a bit of time to understand. It was mentioned that adding a tool and a new rule for a specific file format in Archivematica is quite a complex task and not for the faint-hearted...!

Discussion also touched on the subject of those files that are not identified. A file needs to be identified before an FPR rule can be set up for it. Ensuring files are identified in the first instance was seen to be a crucial step. Even once a format makes its way into PRONOM (TNA’s database of file formats), Artefactual Systems have to carry out extra work to get Archivematica to pick up that new PUID.
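
As an aside, it is easy to check how a file identifies against the current PRONOM signatures outside of Archivematica using a tool such as Siegfried. A quick sketch, assuming Siegfried is installed (the file and directory names are placeholders):

# report the PRONOM PUID matched for a single file
sf mystery-file.doc

# or produce machine-readable output for a whole directory
sf -json /path/to/transfer > identification.json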

Unfortunately normalisation tools do not exist for all files and in many cases you may just have to accept that a file will stay in the format in which it was received. For example a Microsoft Word document (.doc) may not be an ideal preservation format but in the absence of open source command line migration tools we may just have to accept the level of risk associated with this format.

Moving on from this, we also discussed manual normalisations. This approach may be too resource intensive for many (particularly those of us who are implementing automated workflows) but others would see this as an essential part of the digital preservation process. I gave the example of the WordStar files I have been working with this year. These files are already obsolete and though there are other ways of viewing them, I plan to migrate them to a format more suitable for preservation and access. This would need to be carried out outside of Archivematica using the manual normalisation workflow. I haven’t tried this yet but would very much like to test it out in the future.

I shared some other examples that I'd gathered outside the meeting. Kirsty Chatwin-Lee from the University of Edinburgh had a proactive approach to handling the FPR on a collection-by-collection and PUID-by-PUID basis. She checks all of the FPR rules for the PUIDs she is working with as she transfers a collection of digital objects into Archivematica and ensures she is happy before proceeding with the normalisation step.

Back in October I'd tweeted to the wider Archivematica community to find out what people do with the FPR and had a few additional examples to share. For example, using Unoconv to convert office documents and creating PDF access versions of Microsoft Word documents. We also looked at some more detailed preservation planning documentation that Robert Gillesse from the International Institute of Social History had shared with the group.
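
For anyone who hasn't come across it, Unoconv wraps LibreOffice so that these conversions can be scripted from the command line. A hedged sketch of the kind of command such a rule might run (assuming unoconv and LibreOffice are installed; the file name is a placeholder):

# create a PDF access copy of a Word document
unoconv -f pdf report.doc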

We had a discussion about the benefits (or not) of normalising a compressed file (such as a JPEG) to an uncompressed format (such as TIFF). I had already mentioned in my presentation earlier that this default migration rule was turning 5GB of JPEG images into 80GB of TIFFs - and this is without improving the quality or the amount of information contained within the image. The same situation would apply to compressed audio and video which would increase even more in size when converted to an uncompressed format.

If storage space is at a premium (or if you are running this as a service and charging for storage space used) this could be seen as a big problem. We discussed the reasons for and against leaving this rule in the FPR. It is true that we may have more confidence in the longevity of TIFFs and see them as more robust in the face of corruption, but if we are doing digital preservation properly (checking checksums, keeping multiple copies etc) shouldn't corruption be easily spotted and fixed?

Another reason we may migrate or normalise files is to restrict the file formats we are preserving to a limited set of known formats in the hope that this will lead to fewer headaches in the future. This would be a reason to keep on converting all those JPEGs to TIFFs.

The FPR is there to be changed, and given that not all organisations have exactly the same requirements it is not surprising that we are starting to tweak it here and there. If we don’t understand it, don’t look at it and don’t consider changing it, perhaps we aren’t really doing our jobs properly.

However there was also a strong feeling in the room that we shouldn’t all be re-inventing the wheel. It is incredibly useful to hear what others have done with the FPR and the rationale behind their decisions.

Hopefully it is helpful to capture this discussion in a blog post, but this isn’t a sustainable way to communicate FPR changes for the longer term. There was a strong feeling in the room that we need a better way of communicating with each other around our preservation planning - the decisions we have made and the reasons for those decisions. This feeling was echoed by Kari Smith (MIT Libraries) and Nick Krabbenhoeft (New York Public Library) who joined us remotely to talk about the OSSArcFlow project - so this is clearly an international problem! This is something that Jisc are considering as part of their Research Data Shared Service project so it will be interesting to see how this might develop in the future.

Thanks to the UK Archivematica group meeting attendees for contributing to the discussion and informing this blog post.

The Squeeze

Published 8 Dec 2017 by jenimcmillan in Jeni McMillan.

 

I’m about to squeeze this hulk between solid stone buildings that have withstood two world wars and four hundred years of seasonal change, love and laughter in the Aveyron. I’m not the first. This is the through road between wheat fields on high and the ancient moulins along the river that ground the grain to fine flour for the communal bread ovens. Tractors, horses, wagons, and more recently cars and the occasional truck have traversed this route. Today I’m driving the old Mercedes.

I’ve been in Europe for six months now. Hitching my way around France and Greece, meeting the strange, the interesting and the humorous along my way. Striding with backpack or pedalling the tiny trails that connect villages. I don’t drive cars. I’m on the wrong side of the road, the wrong side of the car, and I’m trying to brake with my right foot on the pedal. Sure I have been granted a temporary French permit to drive, but do I really want to exchange a life of adventurous travel for the easy option? I will decide after I have safely parked the car on the wild and wintery hill-top back at my friend’s house.

 

 


Cakes, quizzes, blogs and advocacy

Published 4 Dec 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Last Thursday was International Digital Preservation Day and I think I needed the weekend to recover.

It was pretty intense...

...but also pretty amazing!

Amazing to see what a fabulous international community there is out there working on the same sorts of problems as me!

Amazing to see quite what a lot of noise we can make if we all talk at once!

Amazing to see such a huge amount of advocacy and awareness raising going on in such a small space of time!

International Digital Preservation Day was crazy but now I have had a bit more time to reflect, catch up...and of course read a selection of the many blog posts and tweets that were posted.

So here are some of my selected highlights:

Cakes

Of course the highlights have to include the cakes and biscuits, including those produced by Rachel MacGregor and Sharon McMeekin. Turning the problems that we face into something edible does seem to make our challenges easier to digest!

Quizzes and puzzles

A few quizzes and puzzles were posed on the day via social media - a great way to engage the wider world and have a bit of fun in the process.


There was a great quiz from the Parliamentary Archives (the answers are now available here) and a digital preservation pop quiz from Ed Pinsent of CoSector which started here. Also for those hexadecimal geeks out there, a puzzle from the DP0C Fellows at Oxford and Cambridge which came just at the point that I was firing up a hexadecimal viewer as it happens!

In a blog post called Name that item in...? Kirsty Chatwin-Lee at Edinburgh University encourages the digital preservation community to help her to identify a mysterious large metal disk found in their early computing collections. Follow the link to the blog to see a picture - I'm sure someone out there can help!

Announcements and releases

There were lots of big announcements on the day too. IDPD just kept on giving!

Of course the 'Bit List' (a list of digitally endangered species) was announced and I was able to watch this live. Kevin Ashley from the Digital Curation Centre discusses this in a blog post. It was interesting to finally see what was on the list (and then think further about how we can use this for further advocacy and awareness raising).

I celebrated this fact with some Fake News but to be fair, William Kilbride had already been on the BBC World Service the previous evening talking about just this so it wasn't too far from the truth!

New versions of JHOVE and VeraPDF were released as well as a new PRONOM release.  A digital preservation policy for Wales was announced and a new course on file migration was launched by CoSector at the University of London. Two new members also joined the Digital Preservation Coalition - and what a great day to join!

Roadshows

Some institutions did a roadshow or a pop up museum in order to spread the message about digital preservation more widely. This included the revival of the 'fish screensaver' at Trinity College Dublin and a pop up computer museum at the British Geological Survey.

Digital Preservation at Oxford and Cambridge blogged about their portable digital preservation roadshow kit. I for one found this a particularly helpful resource - perhaps I will manage to do something similar myself next IDPD!

A day in the life

Several institutions chose to mark the occasion by blogging or tweeting about the details of their day. This gives an insight into what we DP folks actually do all day and can be really useful given that the processes behind digital preservation work are often less tangible and understandable than those used for physical archives!

I particularly enjoyed the nostalgia of following ex colleagues at the Archaeology Data Service for the day (including references to those much loved checklists!) and hearing from  Artefactual Systems about the testing, breaking and fixing of Archivematica that was going on behind the scenes.

The Danish National Archives blogged about 'a day in the life' and I was particularly interested to hear about the life-cycle perspective they have as new software is introduced, assessed and approved.

Exploring specific problems and challenges

Plans are my reality from Yvonne Tunnat of the ZBW Leibniz Information Centre for Economics was of particular interest to me as it demonstrates just how hard the preservation tasks can be. I like it when people are upfront and honest about the limitations of the tools or the imperfections of the processes they are using. We all need to share more of this!

In Sustaining the software that preserves access to web archives, Andy Jackson from the British Library tells the story of an attempt to maintain a community of practice around open source software over time and shares some of the lessons learned - essential reading for any of us that care about collaborating to sustain open source.

Kirsty Chatwin-Lee from Edinburgh University invites us to head back to 1985 with her as she describes their Kryoflux-athon challenge for the day. What a fabulous way to spend the day!

Disaster stories

Digital Preservation Day wouldn't be Digital Preservation Day without a few disaster stories too! Despite our desire to move beyond the 'digital dark age' narrative, it is often helpful to refer to worst-case scenarios when advocating for digital preservation.

Cees Hof from DANS in the Netherlands talks about the loss of digital data related to rare or threatened species in The threat of double extinction, Sarah Mason from Oxford University uses the recent example of the shutdown of DCist to discuss institutional risk, José Borbinha from Lisbon University, Portugal talks about his own experiences of digital preservation disaster and Neil Beagrie from Charles Beagrie Ltd highlights the costs of inaction.

The bigger picture

Other blogs looked at the bigger picture:

Preservation as a present by Barbara Sierman from the National Library of the Netherlands is a forward thinking piece about how we could communicate and plan better in order to move forward.

Shira Peltzman from the University of California, Los Angeles tries to understand some of the results of the 2017 NDSA Staffing Survey in It's difficult to solve a problem if you don't know what's wrong.

David Minor from the University of San Diego Library, provides his thoughts on What we’ve done well, and some things we still need to figure out.

I enjoyed reading a post from Euan Cochrane from Yale University Library on The Emergence of “Digital Patinas”. A really interesting piece... and who doesn't like to be reminded of the friendly and helpful Word 97 paperclip?

In Towards a philosophy of digital preservation, Stacey Erdman from Beloit College, Wisconsin USA asks whether archivists are born or made and discusses her own 'archivist "gene"'.




So much going on and there were so many other excellent contributions that I missed.

I'll end with a tweet from Euan Cochrane which I thought nicely summed up what International Digital Preservation Day is all about. The day was of course also concluded by William Kilbride of the DPC with a suitably inspirational blog post.



Congratulations to the Digital Preservation Coalition for organising the day and to the whole digital preservation community for making such a lot of noise!




Wikidata Map November 2017

Published 3 Dec 2017 by addshore in Addshore.

It has only been 4 months since my last Wikidata map update post, but the difference on the map in these 4 months is much greater than the diff shown in my last post covering 9 months. The whole map is covered with pink (additions to the map). The main areas include Norway, Germany, Malaysia, South Korea, Vietnam and New Zealand to name just a few.

As with previous posts varying sizes of the images generated can be found on Wikimedia Commons along with the diff image:

July to November in numbers

In the last 4 months (roughly speaking):

All of these numbers were roughly pulled out of graphs by eye. The graphs can be seen below:

The post Wikidata Map November 2017 appeared first on Addshore.


Wikibase docker images

Published 3 Dec 2017 by addshore in Addshore.

This is a belated post about the Wikibase docker images that I recently created for the Wikidata 5th birthday. You can find the various images on docker hub and matching Dockerfiles on github. These images combined allow you to quickly create docker containers for Wikibase backed by MySQL and with a SPARQL query service running alongside updating live from the Wikibase install.

A setup was demoed at the first WikidataCon event in Berlin on the 29th of October 2017 and can be seen at roughly 41:10 in the 'demo of presents' video below.

The images

The ‘wikibase‘ image is based on the new official mediawiki image hosted on the docker store. The only current version, which is also the version demoed, is for MediaWiki 1.29. This image contains MediaWiki running on PHP 7.1 served by apache. Right now the image does some sneaky auto installation of the MediaWiki database tables, which might disappear in the future to make the image more generic.

The ‘wdqs‘ image is based on the official openjdk image hosted on the docker store. This image also only has one version, the current latest version of the Wikidata Query Service which is downloaded from maven. This image can be used to run the blazegraph service as well as run an updater that reads from the recent changes feed of a wikibase install and adds the new data to blazegraph.

The ‘wdqs-frontend‘ image hosts the pretty UI for the query service served by nginx. This includes auto completion and pretty visualizations. There is currently an issue which means the image will always serve examples for Wikidata which will likely not work on your custom install.

The ‘wdqs-proxy‘ image hosts an nginx proxy that restricts external access to the wdqs service meaning it is READONLY and also has a time limit for queries (not currently configurable). This is very important as if the wdqs image is exposed directly to the world then people can also write to your blazegraph store.

You’ll also need to have a MySQL server set up for Wikibase to use; you can use the default mysql or mariadb images for this, as covered in the example below.

All of the wdqs images should probably be renamed as they are not specific to Wikidata (which is where the wd comes from), but right now the underlying repos and packages have the wd prefix and not a wb prefix (for Wikibase) so we will stick to them.

Compose example

The below example configures volumes for all locations with data that should / could persist. Wikibase is exposed on port 8181, with the query service UI on 8282 and the query service itself (behind the proxy) on 8989.

Each service has a network alias defined (that probably isn’t needed in most setups), but while running on WMCS it was required to get around some bad name resolving.

version: '3'

services:
  wikibase:
    image: wikibase/wikibase
    restart: always
    links:
      - mysql
    ports:
     - "8181:80"
    volumes:
      - mediawiki-images-data:/var/www/html/images
    depends_on:
    - mysql
    networks:
      default:
        aliases:
         - wikibase.svc
  mysql:
    image: mariadb
    restart: always
    volumes:
      - mediawiki-mysql-data:/var/lib/mysql
    environment:
      MYSQL_DATABASE: 'my_wiki'
      MYSQL_USER: 'wikiuser'
      MYSQL_PASSWORD: 'sqlpass'
      MYSQL_RANDOM_ROOT_PASSWORD: 'yes'
    networks:
      default:
        aliases:
         - mysql.svc
  wdqs-frontend:
    image: wikibase/wdqs-frontend
    restart: always
    ports:
     - "8282:80"
    depends_on:
    - wdqs-proxy
    networks:
      default:
        aliases:
         - wdqs-frontend.svc
  wdqs:
    image: wikibase/wdqs
    restart: always
    build:
      context: ./wdqs/0.2.5
      dockerfile: Dockerfile
    volumes:
      - query-service-data:/wdqs/data
    command: /runBlazegraph.sh
    networks:
      default:
        aliases:
         - wdqs.svc
  wdqs-proxy:
    image: wikibase/wdqs-proxy
    restart: always
    environment:
      - PROXY_PASS_HOST=wdqs.svc:9999
    ports:
     - "8989:80"
    depends_on:
    - wdqs
    networks:
      default:
        aliases:
         - wdqs-proxy.svc
  wdqs-updater:
    image: wikibase/wdqs
    restart: always
    command: /runUpdate.sh
    depends_on:
    - wdqs
    - wikibase
    networks:
      default:
        aliases:
         - wdqs-updater.svc

volumes:
  mediawiki-mysql-data:
  mediawiki-images-data:
  query-service-data:
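
Assuming the above is saved as docker-compose.yml and you have docker and docker-compose installed, bringing the whole stack up should then just be a case of the following (a rough sketch of expected usage rather than tested documentation):

# start wikibase, mysql, the query service, proxy, UI and updater in the background
docker-compose up -d

# follow the wikibase container logs while MediaWiki installs
docker-compose logs -f wikibase

# Wikibase:          http://localhost:8181
# Query service UI:  http://localhost:8282
# Query service:     http://localhost:8989 (read only, via the proxy)

# stop everything again (the named volumes keep the data)
docker-compose down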

Questions

I’ll vaguely keep this section up to date with Qs & As, but if you don’t find your answer here, leave a comment, send an email or file a Phabricator ticket.

Can I use these images in production?

I wouldn’t really recommend running any of these in ‘production’ yet as they are new and not well tested. Various things, such as upgrades for the query service and for mediawiki / wikibase, are also not yet documented very well.

Can I import data into these images from an existing wikibase / wikidata? (T180216)

In theory yes, although this is not documented. You’ll have to import everything using an XML dump of the existing mediawiki install, and the configuration will also have to match on both installs. When importing from an XML dump the query service will not be updated automatically, and you will likely have to read the manual.

Where was the script that you ran in the demo video?

There is a copy in the github repo called setup.sh, but I can’t guarantee it works in all situations! It was specifically made for a WMCS debian jessie VM.

Links

The post Wikibase docker images appeared first on Addshore.


What shall I do for International Digital Preservation Day?

Published 30 Nov 2017 by Jenny Mitcham in Digital Archiving at the University of York.

I have been thinking about this question for a few months now and have only recently come up with a solution.

I wanted to do something big on International Digital Preservation Day. Unfortunately other priorities have limited the amount of time available and I am doing something a bit more low key. To take a positive from a negative I would like to suggest that as with digital preservation more generally, it is better to just do something rather than wait for the perfect solution to come along!

I am sometimes aware that I spend a lot of time in my own echo chamber - for example talking on Twitter and through this blog to other folks who also work in digital preservation. Though this is undoubtedly a useful two-way conversation, for International Digital Preservation Day I wanted to target some new audiences.

So instead of blogging here (yes I know I am blogging here too) I have blogged on the Borthwick Institute for Archives blog.

The audience for the Borthwick blog is a bit different to my usual readership. It is more likely to be read by users of our services at the Borthwick Institute and those who donate or deposit with us, perhaps also by staff working in other archives in the UK and beyond. Perfect for what I had planned.

In response to the tagline of International Digital Preservation Day ‘Bits Decay: Do Something Today’ I wanted to encourage as many people as possible to ‘Do Something’. This shouldn’t be just limited to us digital preservation folks, but to anyone anywhere who uses a computer to create or manage data.

This is why I decided to focus on Personal Digital Archiving. The blog post is called “Save your digital stuff!” (credit to the DPC Technology Watch Report on Personal Digital Archiving for this inspiring title - it was noted that at a briefing day hosted by the Digital Preservation Coalition (DPC) in April 2015, one of the speakers suggested that the term ‘personal digital archiving’ be replaced by the more urgent exhortation, ‘Save your digital stuff!’).

The blog post aimed to highlight the fragility of digital resources and then give a few tips on how to protect them. Nothing too complicated or technical, but hopefully just enough to raise awareness and perhaps encourage engagement. Not wishing to replicate all the great work that has already been done on Personal Digital Archiving, by the Library of Congress, the Paradigm project and others I decided to focus on just a few simple pieces of advice and then link out to other resources.

At the end of the post I encourage people to share information about any actions they have taken to protect their own digital legacies (of course using the #IDPD17 hashtag). If I inspire just one person to take action I'll consider it a win!

I'm also doing a 'Digital Preservation Takeover' of the Borthwick twitter account @UoYBorthwick. I lined up a series of 'fascinating facts' about the digital archives we hold here at the Borthwick and tweeted them over the course of the day.



OK - admittedly they won't be fascinating to everyone, but if nothing else it helps us to move further away from the notion that an archive is where you go to look at very old documents!

...and of course I now have a whole year to plan for International Digital Preservation Day 2018 so perhaps I'll be able to do something bigger and better?! I'm certainly feeling inspired by the range of activities going on across the globe today.



Preserving Google Drive: What about Google Sheets?

Published 29 Nov 2017 by Jenny Mitcham in Digital Archiving at the University of York.

There was lots of interest in a blog post earlier this year about preserving Google Docs.

Often the issues we grapple with in the field of digital preservation are not what you'd call 'solved problems' and that is what makes them so interesting. I always like to hear how others are approaching these same challenges so it is great to see so many comments on the blog itself and via Twitter.

This time I'm turning my focus to the related issue of Google Sheets. This is the native spreadsheet application for Google Drive.

Why?

Again, this is an application that is widely used at the University of York in a variety of different contexts, including for academic research data. We need to think about how we might preserve data created in Google Sheets for the longer term.


How hard can it be?

Quite hard actually - see my earlier post!


Exporting from Google Drive

For Google Sheets I followed a similar methodology to Google Docs: taking a couple of sample spreadsheets, downloading them in the formats that Google provides, then examining the exported versions to assess how well specific features of the spreadsheet were retained.

I used the File...Download as... menu in Google Sheets to test out the available export formats
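
As an aside, the same renditions can also be requested programmatically via the Google Drive API's files.export endpoint, which may be useful if you have a lot of sheets to capture. A hedged sketch, assuming you already have the file ID and a valid OAuth 2.0 access token with Drive scope (both are placeholders below):

# request an ODS rendition of a Google Sheet from the Drive v3 export endpoint
FILE_ID="your-sheet-file-id"
ACCESS_TOKEN="your-oauth2-access-token"
curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -o flexisheet.ods \
  "https://www.googleapis.com/drive/v3/files/${FILE_ID}/export?mimeType=application/x-vnd.oasis.opendocument.spreadsheet"

# swap the mimeType parameter for other renditions, e.g. application/pdf or text/csv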

The two spreadsheets I worked with were as follows:


Here is a summary of my findings:

Microsoft Excel - xlsx

I had high hopes for the xlsx export option - however, on opening the exported xlsx version of my flexisheet I was immediately faced with an error message telling me that the file contained unreadable content and asking whether I wanted to recover the contents.

This doesn't look encouraging...

Clicking 'Yes' on this dialogue box then allows the sheet to open and another message appears telling you what has been repaired. In this case it tells me that a formula has been removed.


Excel can open the file if it removes the formula

This is not ideal if the formula is considered to be worthy of preservation.

So clearly we already know that this isn't going to be a perfect copy of the Google sheet.

This version of my flexisheet looks pretty messed up. The dates and values look OK, but none of the calculated values are there - they are all replaced with "#VALUE".

The colours on the original flexisheet are important as they flag up problems and issues with the data entered. These however are not fully retained - for example, weekends are largely (but not consistently) marked as red and in the original file they are green (because it is assumed that I am not actually meant to be working weekends).

The XLSX export does however give a better representation of the more simple menu choices Google sheet. The data is accurate, and comments are present in a partial way. Unfortunately though, replies to comments are not displayed and the comments are not associated with a date or time.


Open Document Format - ods

I tried opening the ODS version of the flexisheet in LibreOffice on a Macbook. There were no error messages (which was nice) but the sheet was a bit of a mess. There were similar issues to those that I encountered in the Excel export though it wasn't identical. The colours were certainly applied differently, neither entirely accurate to the original.

If I actually tried to use the sheet to enter more data, the formulas do not work - they do not calculate anything, though the formulas themselves do appear to be retained. Any values that are calculated on the original sheet are not present.

Comments are retained (and replies to comments) but no date or time appears to be associated with them (note that the data may be there but just not displaying in LibreOffice).

I also tried opening the ODS file in Microsoft Office. On opening it, the same error message was displayed as the one originally encountered in the XLSX version described above, and this was followed by a notification that “Excel completed file level validation and repair. Some parts of this workbook may have been repaired or discarded.” Unlike the XLSX file, there didn't appear to be any additional information available about exactly what had been repaired or discarded - this didn't exactly fill me with confidence!

PDF document - pdf

When downloading a spreadsheet as a PDF you are presented with a few choices - for example:
  • Should the export include all sheets, just the current sheet or current selection (note that current sheet is the default response)
  • Should the export include the document title?
  • Should the export include sheet names?
To make the export as thorough as possible I chose to export all sheets and include document title and sheet names.

As you might expect this was a good representation of the values on the spreadsheet - a digital print if you like - but all functionality and interactivity was lost. In order to re-use the data, it would need to be copied and pasted or re-typed back into a spreadsheet application.

Note that comments within the sheet were not retained and also there was no option to export sheets that were hidden.

Web page - html

This gave an accurate representation of the values on the spreadsheet, but, similar to the PDF version, not in a way that really encourages reuse. Formulas were not retained and the resulting copy is just a static snapshot.

Interestingly, the comments in the menu choices example weren't retained. This surprised me because when using the html export option for Google documents one of the noted benefits was that comments were retained. Seems to be a lack of consistency here.

Another thing that surprised me about this version of the flexisheet was that it included hidden sheets (I hadn't until this point realised that there were hidden sheets!). I later discovered that the XLSX and ODS also retained the hidden sheets ...but they were (of course) hidden so I didn't immediately notice them! 

Tab delimited and comma separated values - tsv and csv

It is made clear on export that only the current sheet is exported so if using this as an export strategy you would need to ensure you exported each individual sheet one by one.

The tab delimited export of the flexisheet surprised me. In order to look at the data properly I tried importing it into MS Excel, which came up with a circular reference warning - were some of the dynamic properties of the sheets somehow being retained (albeit in a way that was broken)?

A circular reference warning when opening the tab delimited file in Microsoft Excel

Both of these formats did a reasonable job of capturing the simple menu choices data (though note that the comments were not retained) but neither did an acceptable job of representing the complex data within the flexisheet (given that the more complex elements such as formulas and colours were not retained).

What about the metadata?

I won't go into detail again about the other features of a Google Sheet that won't be saved with these export options - for example information about who created it and when and the complete revision history that is available through Google Drive - this is covered in a previous post. Given my findings when I interviewed a researcher here at the University of York about their use of Google Sheets, the inability of the export options to capture the version history will be seen as problematic for some use cases.

What is the best export format for Google Sheets?

The short answer is 'it depends'.

The export options available all have pros and cons and as ever, the most suitable one will very much depend on the nature of the original file and the properties that you consider to be most worthy of preservation.


  • If for example the inclusion of comments is an essential requirement, XLSX or ODS will be the only formats that retain them (with varying degrees of success). 
  • If you just want a static snapshot of the data in its final form, PDF will do a good job (you must specify that all sheets are saved), but note that if you want to include hidden sheets, HTML may be a better option. 
  • If the data is required in a usable form (including a record of the formula used) you will need to try XLSX or ODS but note that calculated values present in the original sheet may be missing. Similar but not identical results were noted with XLSX and ODS so it would be worth trying them both and seeing if either is suitable for the data in question.


It should be possible to export an acceptable version of the data for a simple Google Sheet but for a complex dataset it will be difficult to find an export option that adequately retains all features.

Exporting Google Sheets seems even more problematic and variable than Google Documents and for a sheet as complex as my flexisheet it appears that there is no suitable option that retains the functionality of the sheet as well as the content.

So, here's hoping that native Google Drive files appear on the list of World's Endangered Digital Species...due to be released on International Digital Preservation Day! We will have to wait until tomorrow to find out...



A disclaimer: I carried out the best part of this work about 6 months ago but have only just got around to publishing it. Since I originally carried out the exports and noted my findings, things may have changed!




Server failures in October and November 2017

Published 28 Nov 2017 by Pierrick Le Gall in The Piwigo.com Blog.

The huge downtime at OVH that occurred on November 9th 2017 was quite like an earthquake for the European web. Of course Piwigo.com was impacted. But before that, we experienced a server failure on October 7th and another one on October 14th. Let’s describe and explain what happened.

Photo by Johannes Plenio on Unsplash


A) October 7th, the first server failure

On October 7th 2017, on the Saturday evening, our “reverse-proxy” server, the one through which all web traffic goes, crashed. OVH, our technical host, identified a problem with the motherboard and replaced it. Web traffic was routed to the spare server during the short downtime. A server failure of no real consequence, with no data loss, but one that marked the start of a painful series of technical problems.

B) October 14th, a more serious server failure

A week later, on October 14th, the very same “reverse-proxy” server saw its load climb so high that it was unable to deliver web pages… Web traffic was again switched to the spare server, in read-only mode for accounts hosted on this server. About 10 hours of investigation later, we were still not able to understand the origin of the problem, and we had to decide to switch the spare server to write mode. This decision was difficult to take because it meant losing data produced between the last backup (1am) and the switch to the spare server (about 8am). In other words, for the accounts hosted on this server, the photos added during the night simply “disappeared” from their Piwigo.

This was the first time in the history of Piwigo.com that we had switched a spare server to write mode. Unfortunately, another problem occurred, related to the first one. To explain this problem, it is necessary to understand how the Piwigo.com server infrastructure works.

On the Piwigo.com infrastructure, servers work in pairs: a main server and its spare server. There are currently 4 pairs in production. The main server takes care of the “live operations”, while the spare server is synchronized with its main server every night and receives the web traffic in read-only during downtimes.

Normally, spare servers only allow read operations: you can visit albums or view photos, but you cannot enter the administration pages or add photos.

One of the server pairs is what we call the “reverse-proxy”: all web traffic for *.piwigo.com goes through this server and, depending on the Piwigo concerned, is routed to one pair or another. Normally the reverse-proxy is configured to point to the main servers, not the spare servers.

When a problem occurs on one of the main servers, we switch the traffic to its spare server. If the reverse-proxy server itself is concerned, we switch the Fail-Over IP address (IPFO), a mechanism that we manage in our OVH administration panel. For other servers, we change the reverse-proxy configuration.

That’s enough infrastructure detail… let’s go back to October 14th: we switched the IPFO to use the spare reverse-proxy server. Unfortunately, we ran into 2 problems in cascade:

  1. the spare reverse-proxy server, for one of the server pairs, pointed to the spare server
  2. this very spare server was configured in write mode instead of read-only

Why such an unexpected configuration?

Because we sometimes use the spare infrastructure for real-life tests. In this case, these were IPv6 tests.

What impact for users?

During the many hours when web traffic went through the spare reverse-proxy server, accounts hosted on the faulty server were back in the state of the previous night, where photos added during the night and morning had apparently disappeared, but users were able to keep adding photos. This state did not trigger any specific alert: the situation seemed “normal” to the users concerned and to our monitoring system. When the problem was detected, we changed the reverse-proxy configuration to point back to the main server. Consequence: all the photos added during the downtime apparently disappeared.

What actions have been taken after October 14th?

1) Checks on reverse-proxy configuration

A new script was pushed to production. It checks frequently that the reverse-proxy is configured to send web traffic to the main servers only.

2) Checks on write Vs read-only mode

Another script was pushed to production. This one checks that the main servers are configured in write mode and the spare servers are in read-only mode.
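For illustration only (this is not our actual monitoring code), such checks could look roughly like the sketch below, assuming each backend exposes a small status page reporting its hostname and whether it is writable - both the URLs and the status page are hypothetical:

<?php
// Illustrative sketch, not the real Piwigo.com scripts.
// Assumes a hypothetical /server-status.php on each backend returning JSON
// such as {"host": "main-1", "writable": true}.
$expected = [
    'https://pair1-main.example.net/server-status.php'  => ['main-1',  true],
    'https://pair1-spare.example.net/server-status.php' => ['spare-1', false],
];

foreach ($expected as $url => [$host, $writable]) {
    $status = json_decode(file_get_contents($url), true);
    if (($status['host'] ?? null) !== $host || ($status['writable'] ?? null) !== $writable) {
        // mail() stands in for whatever alerting channel is actually used
        mail('admin@example.net', 'Infrastructure check failed', $url . ' reported ' . json_encode($status));
    }
}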

3) Isolate third-party web applications

The “non-vital” web applications, on which we have less expertise, were moved to a third-party server dedicated to this use: 2 WordPress blogs, the wiki, the forum and Piwik (visit analytics). Indeed, one of the possible causes of the server failure is that an application entered the 4th dimension or was under attack. Moving these applications onto an “isolated” server helps to limit the impact of any future issue.

4) New backup system

The decision to switch a spare server to write mode, i.e. to turn it into a main server, is a hard one to take. It means giving up any hope of returning to the main server, and that in turn means accepting a loss of data.

To make this decision simpler, two measures have been taken. First, we defined a time threshold after which we apply the switch: in our case, if the failure lasts more than 2 hours, we will switch. Second, backups must be more frequent than once a day: if the backups were only 1 or 2 hours old, the decision would be much easier!

In addition to the daily backup, we have added a new “rolling backups” system: every 15 minutes, a script analyzes each Piwigo against specific criteria (new/modified/deleted photos/users/albums/groups…). If anything has changed since the last backup, the script backs up that Piwigo (files + database) and synchronizes the backup to the spare server.
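To give an idea, a much simplified version of such a rolling backup could look like the sketch below; the paths, database names and change-detection shortcut are placeholders, and the real script inspects the database criteria listed above:

<?php
// Simplified sketch of a 15-minute rolling backup - hypothetical paths and names.
// Intended to be run from cron: */15 * * * * php rolling_backup.php
$galleries = ['alice', 'bob']; // one entry per hosted Piwigo

foreach ($galleries as $gallery) {
    $marker = "/var/backups/$gallery.last";
    $lastBackup = file_exists($marker) ? (int) file_get_contents($marker) : 0;

    // Cheap change detection: newest file mtime under the gallery directory.
    $lastChange = (int) shell_exec("find /var/www/$gallery -type f -printf '%T@\\n' | sort -n | tail -1");

    if ($lastChange > $lastBackup) {
        shell_exec("mysqldump piwigo_$gallery | gzip > /var/backups/$gallery.sql.gz");
        shell_exec("rsync -a /var/www/$gallery/ /var/backups/$gallery/files/");
        shell_exec("rsync -a /var/backups/$gallery/ spare:/var/backups/$gallery/");
        file_put_contents($marker, (string) time());
    }
}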

C) What about the giant downtime on the OVH network, on November 9th and 10th?

Being hosted at OVH, and especially in the Strasbourg datacenter (France), we were heavily impacted by this downtime. First because our main reverse-proxy server is in Strasbourg: the datacenter failure put Piwigo.com completely out of order during the morning of November 9th (Central European time). Then because we could not switch the IP Fail-Over. Or rather, OVH allowed us to do it, but instead of requiring ~60 seconds, it took ~10 hours! Hours during which the accounts hosted on the reverse-proxy server were in read-only mode.

Unlike the October 14th situation, we could not take the decision to switch the spare server to write mode, because an IPFO switch request was in progress and we had no idea how long it would take OVH to apply it.

The Piwigo.com infrastructure has returned to its normal state on November 10th at 14:46, Paris time (France).

OVH has just provided compensation for these failures; we were waiting for it before publishing this blog post. The compensation is not much compared to the actual damage, but we will pass it on in full to our customers. After some very high-level calculations, 3 days of time credit were added to each account. It’s a small commercial gesture, but we think we should pass it on to you as a symbol!

We are sorry for these inconveniences. As you read in this blog post, we’ve improved our methods to mitigate risk in the future and reduce the impact of an irreversible server failure.


Delirious Sky

Published 27 Nov 2017 by jenimcmillan in Jeni McMillan.


It is a delicious moment,

Delirious sky.

The sun burning deeply,

Her skin starts to fry.

She gathers her senses,

Surrounded by life.

When death beckons shyly,

She submits to his knife.

It’s only a metaphor,

We grow and we die,

And laugh at the Present,

The Goddess on High.


Dreaming

Published 26 Nov 2017 by jenimcmillan in Jeni McMillan.


She sat by the river,

Dreaming.

Singing to water birds,

And frogs in the slime.

Distant places alive in her mind.

It’s not so hard, called the grasses wild,

You’re rooted to earth,

This isn’t your fault.

It’s a breathing, crumbling, uplifting result.

Her thoughts began shifting,

She rustled her leaves.

Wind carried her desires,

And soon there was peace.

The elements colluded,

Earth, water, air and fire.

She picked up her roots,

And flew to the sky.


Slim 3.9.1 (and 3.9.2) released

Published 26 Nov 2017 by in Slim Framework Blog.

After the release of 3.9.0, a regression and an unexpected side-effect of a bug fix were noticed.

Firstly, you could not clear the user’s password when using Uri::withUserInfo(''), so this is fixed in #2332.

Secondly, we discovered that return $response->withHeader('Location', '/login'); no longer redirected in a browser. This isn’t a surprise, as the 302 status code isn’t explicitly set; developers were relying on a feature of PHP’s header() function that set 302 for them. This side-effect was causing other issues such as #1730, so it was fixed in 3.9.0. To mitigate the effect of this change, 3.9.1 includes #2345, which sets the status code to 302 when you add a Location header if the status code is currently 200. This change will not be forward-ported to 4.x though.
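If you want code that will keep behaving the same on Slim 4, the safest pattern is to set the status explicitly rather than relying on this fallback; a minimal example:

<?php
require 'vendor/autoload.php';

$app = new \Slim\App();

// Set the 302 explicitly instead of relying on the Location-header fallback
// added in 3.9.1 (that fallback will not be forward-ported to Slim 4).
$app->get('/old-path', function ($request, $response) {
    return $response
        ->withStatus(302)
        ->withHeader('Location', '/login');
});

$app->run();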

The full list of changes is here

Update: Shortly after the release of 3.9.1, it was discovered that #2342 should not have been merged as it breaks BC, so this PR was reverted in 3.9.2.


My best books of 2017

Published 25 Nov 2017 by Tom Wilson in thomas m wilson.

My best books of 2017… Deeply insightful works from Yale University Press on geopolitics today, a history of consumerism in the West, a wave making read on the Anthropocene as a new era, a powerful explanation of the nature/nurture question for human identity by a very funny Californian, and a charming meander through the English […]

How do I promote a user automatically in Mediawiki and create a log of those promotions?

Published 24 Nov 2017 by sau226 in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I control a Mediawiki site. Here you can see users being automatically updated and added into the extended confirmed user group.

If I have a group called "util", would it be possible to add the relevant code to enable autopromotion with a log entry like that, make an edit so that I get promoted automatically into the group, and then remove that bit of code again? Also, what code would I have to use to gain that level of access?
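For what it's worth, a sketch of the sort of LocalSettings.php configuration that might achieve this (the edit-count and account-age conditions below are placeholders): groups listed in $wgAutopromoteOnce are granted once, on edit, and the promotion is recorded in the user rights log.

# LocalSettings.php - illustrative sketch only; adjust the conditions to taste.
$wgAutopromoteOnce['onEdit']['util'] = [ '&',
    [ APCOND_EDITCOUNT, 1 ],   // at least one edit
    [ APCOND_AGE, 86400 ],     // account at least one day old
];
# Also surface these promotions in recent changes.
$wgAutopromoteOnceLogInRC = true;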


Is it possible to find broken link anchors in MediaWiki?

Published 24 Nov 2017 by Lyubomyr Shaydariv in Newest questions tagged mediawiki - Webmasters Stack Exchange.

Probably a simple question answered a million times, but I can't find an answer. MediaWiki can track missing pages and report them with Special:WantedPages. I'm not sure if it's possible, but can MediaWiki report broken anchors? Say I have the Foo page that refers to the Bar page like this: [[Bar#Name]]. Let's assume the Bar page lacks this section, so the Name anchor does not exist there; Special:WantedPages won't report this link as broken because the Bar page itself exists. Is there any way to find all broken anchors? Thanks in advance!
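As far as I know there is no built-in report for this, but the parse API does expose each page's section anchors, so a small script can at least check whether a given [[Bar#Name]] target exists. A rough sketch, assuming the wiki's api.php is readable anonymously; enumerating the links to check (from a dump or the pagelinks table) is left out:

<?php
// Rough sketch: does the section anchor of a [[Page#Anchor]] link exist?
function anchorExists(string $api, string $page, string $anchor): bool {
    $url = $api . '?action=parse&format=json&prop=sections&page=' . rawurlencode($page);
    $data = json_decode(file_get_contents($url), true);
    foreach ($data['parse']['sections'] ?? [] as $section) {
        // 'anchor' is the id used in the rendered heading, e.g. "Name"
        if (($section['anchor'] ?? '') === $anchor) {
            return true;
        }
    }
    return false;
}

var_dump(anchorExists('https://example.org/w/api.php', 'Bar', 'Name'));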


SLAM POETRY DAD

Published 16 Nov 2017 by timbaker in Tim Baker.

I recently made my public slam poetry debut at the Men of Letters event in Brisbane in the salubrious surrounds of one of Australia’s most iconic live music venues, the Zoo, in Fortitude Valley. Men of Letters is a spin off of the hugely successful...

Your Fun and Informative Guide to Consuming “Oil, Love & Oxygen”

Published 16 Nov 2017 by Dave Robertson in Dave Robertson.

The Paradox of Choice says that too many options can demotivate people, so here’s a short guide to the options for getting your ears on “Oil, Love & Oxygen”.

Gigs
For the personal touch you can always get CDs at our shows. They come with a lush booklet of lyrics and credits, and the enchanting artwork of Frans Bisschops. Discounted digital download codes are also available for Bandcamp…

Bandcamp
Bandcamp is a one-stop online shop for your album consumption needs. You can get a digital download in your choice of format, including high-resolution formats for “audiophiles and nerds”. If you go for one of the “lossless” formats such as ALAC, then you are getting the highest sound quality possible (higher than CD). Downloads also come with a digital version of the aforementioned booklet.

Bandcamp is also where you can place a mail-order for the CD if you want to get physical. Another feature of Bandcamp is fans can pay more than the minimum price if they want to support the artist.

iTunes
The iTunes store is a great simple option for those in the Apple ecosystem, because it goes straight into the library on your device(s). You also get the same digital booklet as Bandcamp, and the audio for this release has been specially “Mastered for iTunes”. This means the sound quality is a bit better than most digital downloads (though not as good as the lossless formats available on Bandcamp).

This album was mastered by Ian Shepherd who has been a vigorous campaigner against the “loudness wars”. Did you ever notice that much, maybe most, music after the early 90s started to sound flat and bland? Well, one reason was the use of “brick wall limiters” to increase average loudness, but this came at the expense of dynamics. I’m glad my release is not a casualty of this pointless war, but I digress.

Other Digital Download Services
The album is on many other services, so just search for “Oil, Love & Oxygen” on your preferred platform. These services don’t provide you the booklet though and are not quite as high sound quality as the above two.

Streaming (Spotify etc.)
The album is also available across all the major streaming platforms. While streaming is certainly convenient, it is typically low sound quality and pays tiny royalties to artists.

Vinyl and Tape
Interestingly these formats are seeing a bit of a resurgence around the world. I would argue this is not because they are inherently better than digital, but because digital is so often abused (e.g. the aforementioned loudness wars and the use of “lossy” formats like mp3). If you seriously want vinyl or tape though, let me know and I will consider getting old school!

Share the Love
If you like the album, then please consider telling friends, rating or reviewing the album on iTunes etc., liking our page on the book of face…

Short enough?

Share




1.5.1

Published 11 Nov 2017 by mblaney in Tags from simplepie.

1.5.1 (#559)

* Revert sanitisation type change for author and category.

* Check if the Sanitize class has been changed and update the registry.
Also preference links in the headers over links in the body to
comply with WebSub specification.

* Improvements to mf2 feed parsing.

* Switch from regex to xpath for microformats discovery.

* 1.5.1 release.

* Remove PHP 5.3 from testing.


Slim 3.9.0 released

Published 4 Nov 2017 by in Slim Framework Blog.

We are delighted to release Slim 3.9.0. As Slim 3 is stable, this version mostly contains bug fixes.

Probably the most noticeable changes are that we now allow any HTTP method name in the Request object and the Uri now correctly encodes the user information, which will ensure user names and passwords with reserved characters such as @ will work as you expect. Also in the HTTP component, the Request’s getParams() now allows you to provide a list of the parameters you want returned, allowing you to filter for a specific set.
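For example, inside a route callback the new filtering looks something like this:

<?php
require 'vendor/autoload.php';

$app = new \Slim\App();

$app->post('/register', function ($request, $response) {
    $all  = $request->getParams();                      // everything, as before
    $some = $request->getParams(['username', 'email']); // only these two keys
    return $response->withJson($some);
});

$app->run();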

As usual, there are also some bug fixes, particularly around the output buffering setting, and you can now use any HTTP method you want without getting an error.

The full list of changes is here


Understanding WordStar - check out the manuals!

Published 20 Oct 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Last month I was pleased to be able to give a presentation at 'After the Digital Revolution' about some of the work I have been doing on the WordStar 4.0 files in the Marks and Gran digital archive that we hold here at the Borthwick Institute for Archives. This event specifically focused on literary archives.

It was some time ago now that I first wrote about these files that were recovered from 5.25 inch floppy (really floppy) disks deposited with us in 2009.

My original post described the process of re-discovery, data capture and file format identification - basically the steps that were carried out to get some level of control over the material and put it somewhere safe.

I recorded some of my initial observations about the files but offered no conclusions about the reasons for the idiosyncrasies.

I’ve since been able to spend a bit more time looking at the files and investigating the creating application (WordStar) so in my presentation at this event I was able to talk at length (too long as usual) about WordStar and early word processing. A topic guaranteed to bring out my inner geek!

WordStar is not an application I had any experience with in the past. I didn’t start word processing until the early 90’s when my archaeology essays and undergraduate dissertation were typed up into a DOS version of Word Perfect. Prior to that I used a typewriter (now I feel old!).

WordStar by all accounts was ahead of its time. It was the first Word Processing application to include mail merge functionality. It was hugely influential, introducing a number of keyboard shortcuts that are still used today in modern word processing applications (for example control-B to make text bold). Users interacted with WordStar using their keyboard, selecting the necessary keystrokes from a set of different menus. The computer mouse (if it was present at all) was entirely redundant.

WordStar was widely used as home computing and word processing increased in popularity through the 1980’s and into the early 90’s. However, with the introduction of Windows 3.0 and Word for Windows in 1989, WordStar gradually fell out of favour (info from Wikipedia).

Despite this it seems that WordStar had a loyal band of followers, particularly among writers. Of course the word processor was the key tool of their trade so if they found an application they were comfortable with it is understandable that they might want to stick with it.

I was therefore not surprised to hear that others presenting at 'After the Digital Revolution' also had WordStar files in their literary archives. Clear opportunities for collaboration here! If we are all thinking about how to provide access to and preserve these files for the future then wouldn't it be useful to talk about it together?

I've already learnt a lot through conversations with the National Library of New Zealand who have been carrying out work in this area (read all about it here: Gattuso J, McKinney P (2014) Converting WordStar to HTML4. iPres.)

However, this blog post is not about defining a preservation strategy for the files it is about better understanding them. My efforts have been greatly helped by finding a copy of both a WordStar 3 manual and a WordStar 4 manual online.

As noted in my previous post on this subject there were a few things that stand out when first looking at the recovered WordStar files and I've used the manuals and other research avenues to try and understand these better.


Created and last modified dates

The Marks and Gran digital archive consists of 174 files, most of which are WordStar files (and I believe them to be WordStar version 4).

Looking at the details that appear on the title pages of some of the scripts, the material appears to be from the period 1984 to 1987 (though not everything is dated).

However the system dates associated with the files themselves tell a different story. 

The majority of files in the archive have a creation date of 1st January 1980.

This was odd. Not only would that have been a very busy New Year's Day for the screen writing duo, but the timestamps on the files suggest that they were also working in the very early hours of the morning - perhaps unexpected when many people are out celebrating having just seen in the New Year!

This is the point at which I properly lost my faith in technical metadata!

In this period computers weren't quite as clever as they are today. When you switched them on they would ask you what date it was. If you didn't tell them the date, the PC would fall back to a system default ....which just so happens to be 1st January 1980.

I was interested to see Abby Adams from the Harry Ransom Center, University of Texas at Austin (also presenting at 'After the Digital Revolution') flag up some similarly suspicious dates on files in a digital archive held at her institution. Her dates differed just slightly from mine, falling on the evening of the 31st December 1979. Again, these dates looked unreliable as they were clearly out of line with the rest of the collection.

This is the same issue as mine, but the differences relate to the timezone. There is further explanation here highlighted by David Clipsham when I threw the question out to Twitter. Thanks!


Fragmentation

Another thing I had noticed about the files was the way that they were broken up into fragments. The script for a single episode was not saved as a single file but typically as 3 or 4 separate files. These files were named in such a way that it was clear that they were related and that the order that the files should be viewed or accessed was apparent - for example GINGER1, GINGER2 or PILOT and PILOTB.

This seemed curious to me - why not just save the document as a single file? The WordStar 4 manual didn't offer any clues but I found this piece of information in the WordStar 3 manual which describes how files should be split up to help manage the storage space on your diskettes:

From the WordStar 3 manual




Perhaps some of the files in the digital archive are from WordStar 3, or perhaps Marks and Gran had been previously using WordStar 3 and had just got into the habit of splitting a document into several files in order to ensure they didn't run out of space on their floppy disks.

I can not imagine working this way today! Technology really has come on a long way. Imagine trying to format, review or spell check a document that exists as several discrete files potentially sitting on different media!


Filenames

One thing that stands out when browsing the disks is that all the filenames are in capital letters. DOES ANYONE KNOW WHY THIS WAS THE CASE?

File names in this digital archive were also quite cryptic. This is the 1980’s so filenames conform to the 8.3 limit: only 8 characters are allowed in a filename and it *may* also include a 3 character file extension.

Note that the file extension really is optional and WordStar version 4 doesn’t enforce the use of a standard file extension. Users were encouraged to use those last 3 characters of the file name to give additional context to the file content rather than to describe the file format itself.

Guidance on file naming from the WordStar 4 manual
Some of the tools and processes we have in place to analyse and process the files in our digital archives use the file extension information to help understand the format. The file naming methodology described here therefore makes me quite uncomfortable!

Marks and Gran tended not to use the file extension in this way (though there are a few examples of this in the archive). The majority of the WordStar files have no extension at all. The only really consistent use of file extensions related to their backup files.


Backup files

Scattered amongst the recovered data were a set of files with the extension BAK. This is clearly a file extension that WordStar creates and uses consistently. These files contained very similar content to other documents within the archive, but typically with just a few differences. They appeared to be backup files of some sort, but I wondered whether they had been created automatically or by the writers themselves.

Again the manual was helpful in moving forward my understanding on this:

Backup files from the WordStar 4 manual

This backup procedure is also summarised with the help of a diagram in the WordStar 3 manual:


The backup procedure from WordStar 3 manual


This does help explain why there were so many backup files in the archive. I guess the next question is 'should we keep them?'. It does seem that they are an artefact of the application rather than representing a conscious decision by the writers to back their files up at a particular point in time, and that may impact on their value. However, as discussed in a previous post on preserving Google documents, there could be some benefit in preserving revision history (even if only partial).



...and finally

My understanding of these WordStar files has come on in leaps and bounds by doing a bit of research and in particular through finding copies of the manuals.

The manuals even explain why alongside the scripts within the digital archive we also have a disk that contains a copy of the WordStar application itself. 

The very first step in the manual asks users to make a copy of the software:


I do remember having to do this sort of thing in the past! From WordStar 4 manual


Of course the manuals themselves are also incredibly useful in teaching me how to actually use the software. Keystroke based navigation is hardly intuitive to those of us who are now used to using a mouse, but I think that might be the subject of another blog post!




Crime and Punishment

Published 19 Oct 2017 by leonieh in State Library of Western Australia Blog.

Many Western Australians have a convict or pensioner guard in their ancestral family. The State Library has digitised some items from our heritage collections relating to convicts, the police and the early criminal justice system.


Convicts Tom the dealer, Davey Evans and Paddy Paternoster b2462917

Police Gazette of Western Australia, 1876-1900
The Police Gazettes include information under various headings including apprehensions (name of person arrested, arresting constable, charge and sentence), police appointments, tickets of leave, certificates of freedom, and conditional pardons issued to convicts. You may find physical descriptions of prisoners. Deserters from military service and escaped prisoners are sought. Mention is also made of expirees leaving the colony; inquests (where held, date, name and date of death of person, verdict); licences (publican, gallon, eating, boarding and lodging houses, railway refreshment rooms, wine and beer and spirit merchants, etc. giving name of licensee, name of hotel and town or district). There are listings for missing friends; prisoners discharged; people tried at Quarter Sessions (name, offence, district, verdict); and warrants issued. There are many reasons for a name to appear in the gazettes.

We thank the Friends of Battye Library and the Sholl Bequest, for supporting the digitising of the Police Gazettes.


 

A great resource for researching the broader experience of WA convicts is The convict system in Western Australia, 1850-1870 by Cherry Gertzel. This thesis explains the workings of the convict system, and explores the conditions under which the convicts lived and worked, their effect on the colony and, to some extent, the attitudes of colonists to the prisoners.


Another valuable publication is Further correspondence on the subject of convict discipline and transportation. This comprises official documents relating to the transportation of convicts to Australia, covering the period 1810-1865, and is bound in 8 volumes.
This set from our rare book collection gives an excellent background to the subject for anyone researching convicts or convict guards, with individuals (very) occasionally being named.
The easiest way to access this wonderful resource is to type convict system under Title in our catalogue and select State Library Online from the drop-down box. Once you’ve selected a volume, you can browse through the pages by placing your cursor on the edge of a page and clicking. If you have the volume turned on, this makes a very satisfying page-turning noise! If you want to search for names, scroll down and select the Download button. You can then save a searchable PDF version to your PC. The files are fairly large so you may need to be patient.


Return of the number of wives and families of ticket-of-leave holders to be sent out to Western Australia 1859 From: Further correspondence on the subject of convict discipline and transportation, 1859-1865 p.65. [vol.8]

There are several online diaries relating to convict voyages. The diary, including copies of letters home, of convict John Acton Wroth was kept during his transportation to Western Australia on the Mermaid in 1851 and for a while after his arrival. Wroth was only 17 years old at the time of his conviction. Apparently he was enamoured of a young woman and resorted to fraud in order to find the means to impress her. The diary spans 1851-1853 and it reveals one young man’s difficulty in finding himself far from the love and support of his family while accepting of the circumstance he has brought upon himself. Wroth subsequently settled in Toodyay and became a respected resident, raising a large family and running several businesses as well as acting for some time as local school master.

Another interesting read is the transcript of the diary of John Gregg, carpenter on the convict ship York. This 1862 diary gives details of work each day, which was often difficult when the weather was foul and the carpenter sea-sick, and uncommon events such as attempts by convicts to escape –

“…the affair altogether must be admitted to reflect little credit on the military portion of the convict guard, for although the officer of the watch called loud and long for the guard, none were forthcoming until the prisoners were actually in custody.”


Diary of John Gregg, carpenter on the convict ship ‘York’, with definitions of nautical terms, compiled by Juliet Ludbrook.


A letter from a convict in Australia to a brother in England, originally published in the Cornhill Magazine, April 1866, contains insights into the experience of a more educated felon and some sharp observations on convict life as lived by him upon his arrival in Western Australia-

“…you can walk about and talk with your friends as you please. So long as there is no disturbance, there is no interference”

and

“…the bond class stand in the proportion of fully five-sevenths of the entire population, and are fully conscious of their power…”

Other miscellaneous convict -related items include:

Two posters listing convict runaways with details of their convictions and descriptions:
Return of convicts who have escaped from the colony, and whose absconding has been notified to this office between the 1st June, 1850, and the 31st of March, 1859
and
List of convicts who are supposed to have escaped the Colony (a broadsheet giving the name, number and description of 83 escaped convicts).


Parade state of the Enrolled Guard, 30 March 1887, on the occasion of the inspection of the guard by Sir Frederick Napier Broome, prior to disbandment.


Parade state of the Enrolled Guard… b1936163

 

British Army pensioners came out to Western Australia as convict guards. This document gives the following details for those still serving in 1887:- rank, name, regiment, age, rate of pension, length of Army service, rank when pensioned, date of joining the Enrolled Guard, medals and clasps.


Scale of remission for English convicts sentenced to penal servitude subsequent to 1 July 1857  is a table showing how much time in good behaviour convicts needed to accrue in order to qualify for privileges.

Certificate of freedom, 1869 [Certificates of freedom of convict William Dore]

This is just a small sample of convict-related material in the State Library collections that you can explore online. You can also visit the Battye Library of West Australian History to research individual convicts, policemen, pensioner guards or others involved in the criminal justice system.

 


“Why archivists need a shredder…”

Published 13 Oct 2017 by inthemailbox in In the mailbox.

Struggling to explain what it is that you do and why you do it? President of the Australian Society of Archivists, Julia Mant, gives it a red hot go in an interview for the University of Technology Sydney: https://itunes.apple.com/au/podcast/glamcity/id1276048279?mt=2

https://player.whooshkaa.com/player/playlist/show/1927?visual=true&sharing=true

 


Google Books and Mein Kampf

Published 10 Oct 2017 by Karen Coyle in Coyle's InFormation.

I hadn't looked at Google Books in a while, or at least not carefully, so I was surprised to find that Google had added blurbs to most of the books. Even more surprising (although perhaps I should say "troubling") is that no source is given for the book blurbs. Some at least come from publisher sites, which means that they are promotional in nature. For example, here's a mildly promotional text about a literary work, from a literary publisher:



This gives a synopsis of the book, starting with:

"Throughout a single day in 1892, John Shawnessy recalls the great moments of his life..." 

It ends by letting the reader know that this was a bestseller when published in 1948, and calls it a "powerful novel."

The blurb on a 1909 version of Darwin's The Origin of Species is mysterious because the book isn't a recent publication with an online site providing the text. I do not know where this description comes from, but because the  entire thrust of this blurb is about the controversy of evolution versus the Bible (even though Darwin did not press this point himself) I'm guessing that the blurb post-dates this particular publication.


"First published in 1859, this landmark book on evolutionary biology was not the first to deal with the subject, but it went on to become a sensation -- and a controversial one for many religious people who could not reconcile Darwin's science with their faith."
That's a reasonable view to take of Darwin's "landmark" book but it isn't what I would consider to be faithful to the full import of this tome.

The blurb on Hitler's Mein Kampf is particularly troubling. If you look at different versions of the book you get both pro- and anti- Nazi sentiments, neither of which really belong  on a site that claims to be a catalog of books. Also note that because each book entry has only one blurb, the tone changes considerably depending on which publication you happen to pick from the list.


First on the list:
"Settling Accounts became Mein Kampf, an unparalleled example of muddled economics and history, appalling bigotry, and an intense self-glorification of Adolf Hitler as the true founder and builder of the National Socialist movement. It was written in hate and it contained a blueprint for violent bloodshed."

Second on the list:
"This book has set a path toward a much higher understanding of the self and of our magnificent destiny as living beings part of this Race on our planet. It shows us that we must not look at nature in terms of good or bad, but in an unfiltered manner. It describes what we must do if we want to survive as a people and as a Race."
That's horrifying. Note that both books are self-published, and the blurbs are the ones that I find on those books in Amazon, perhaps indicating that Google is sucking up books from the Amazon site. There is, or at least at one point there was, a difference between Amazon and Google Books. Google, after all, scanned books in libraries and presented itself as a search engine for published texts; Amazon will sell you Trump's tweets on toilet paper. The only text on the Google Books page still claims that Google Books is about search: "Search the world's most comprehensive index of full-text books." Libraries partnered with Google with lofty promises of gains in scholarship:
"Our participation in the Google Books Library Project will add significantly to the extensive digital resources the Libraries already deliver. It will enable the Libraries to make available more significant portions of its extraordinary archival and special collections to scholars and researchers worldwide in ways that will ultimately change the nature of scholarship." Jim Neal, Columbia University
I don't know how these folks now feel about having their texts intermingled with publications they would never buy and described by texts that may come from shady and unreliable sources.

Even leaving aside the grossest aspects of the blurbs and Google's hypocrisy about its commercialization of its books project, adding blurbs to the book entries with no attribution and clearly not vetting the sources is extremely irresponsible. It's also very Google to create sloppy algorithms that illustrate their basic ignorance of the content they are working with -- in this case, the world's books.


Why do I write environmental history?

Published 8 Oct 2017 by Tom Wilson in thomas m wilson.

Why bother to tell the history of the plants and animals that make up my home in Western Australia? Partly it's about reminding us of what was here on the land before, and in some ways, could be here again. In answering this question I’d like to quote the full text of Henry David Thoreau’s […]

Oil, Love & Oxygen – Album Launch

Published 29 Sep 2017 by Dave Robertson in Dave Robertson.

“Oil, Love & Oxygen” is a collection of songs about kissing, climate change, cult 70s novels and more kissing. Recorded across ten houses and almost as many years, the album is a diverse mix of bittersweet indie folk, pop, rock and blues. The Kiss List bring a playful element to Dave Robertson’s songwriting, unique voice and percussive acoustic guitar work. This special launch night also features local music legends Los Porcheros, Dave Johnson, Sian Brown, Rachel Armstrong and Merle Fyshwick.

Tickets $15 through https://www.trybooking.com/SDCA, or on the door if still available.





v2.4.4

Published 27 Sep 2017 by fabpot in Tags from Twig.


v1.35.0

Published 27 Sep 2017 by fabpot in Tags from Twig.


The first UK AtoM user group meeting

Published 27 Sep 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Yesterday the newly formed UK AtoM user group met for the first time at St John's College Cambridge and I was really pleased that myself and a colleague were able to attend.
Bridge of Sighs in Autumn (photo by Sally-Anne Shearn)

This group has been established to provide the growing UK AtoM community with a much needed forum for exchanging ideas and sharing experiences of using AtoM.

The meeting was attended by about 15 people though we were informed that there are nearly 50 people on the email distribution list. Interest in AtoM is certainly increasing in the UK.

As this was our first meeting, those who had made progress with AtoM were encouraged to give a brief presentation covering the following points:
  1. Where are you with AtoM (investigating, testing, using)?
  2. What do you use it for? (cataloguing, accessions, physical storage locations)
  3. What do you like about it/ what works?
  4. What don’t you like about it/ what doesn’t work?
  5. How do you see AtoM fitting into your wider technical infrastructure? (do you have separate location or accession databases etc?)
  6. What unanswered questions do you have?
It was really interesting to find out how others are using AtoM in the UK. A couple of attendees had already upgraded to the new 2.4 release so that was encouraging to see.

I'm not going to summarise the whole meeting but I made a note of people's likes and dislikes (questions 3 and 4 above). There were some common themes that came up.

Note that most users are still using AtoM 2.2 or 2.3; those who have moved to 2.4 haven't had much chance to explore it yet. It may be that some of these comments are already out of date and fixed in the new release.


What works?


AtoM seems to have lots going for it!

The words 'intuitive', 'user friendly', 'simple', 'clear' and 'flexible' were mentioned several times. One attendee described some user testing she carried out during which she found her users just getting on and using it without any introduction or explanation! Clearly a good sign!

Its standards compliance was mentioned, as was the fact that it enforces consistency. When moving from unstructured finding aids to AtoM it really does help ensure that the right bits of information are included. The fact that AtoM highlights which mandatory fields are missing at the top of a page is really helpful when checking through your own or others' records.

The ability to display digital images was highlighted by others as a key selling point, particularly the browse by digital objects feature.

The way that different bits of the AtoM database interlink was a plus point that was mentioned more than once - this allows you to build up complex interconnecting records using archival descriptions and authority records and these can also be linked to accession records and a physical location.

The locations section of AtoM was thought to be 'a good thing' - for recording information about where in the building each archive is stored. This works well once you get your head around how best to use it.

Integration with Archivematica was mentioned by one user as being a key selling point for them - several people in the room were either using, or thinking of using Archivematica for digital preservation.

The user community itself and the quick and helpful responses to queries posted on the user forum were mentioned by more than one attendee. Also praised was the fact that AtoM is in continuous active development and very much moving in the right direction.


What doesn't work?


Several attendees mentioned the digital object functionality in AtoM. As well as being a clear selling point, it was also highlighted as an area that could be improved. The one-to-one relationship between an archival description and a digital object wasn't thought to be ideal and there was some discussion about linking through to external repositories - it would be nice if items linked in this way could be displayed in the AtoM image carousel even where the URL doesn't end in a filename.

The typeahead search suggestions when you enter search terms were not thought to be helpful all of the time. Sometimes the closest matches do not appear in the list of suggested results.

One user mentioned that they would like a publication status that is somewhere in between draft and published. This would be useful for those records that are complete and can be viewed internally by a selected group of users who are logged in but are not available to the wider public.

More than one person mentioned that they would like to see a conservation module in AtoM.

There was some discussion about the lack of an audit trail for descriptions within AtoM. It isn't possible to see who created a record, when it was created and information about updates. This would be really useful for data quality checking, particularly when training new members of staff and volunteers.

Some concerns about scalability were mentioned - particularly for one user with a very large number of records within AtoM - the process of re-indexing AtoM can take three days.

When creating creator or access points, the drop-down menu doesn't display all the options, so this causes difficulties when trying to link to the right point or establish whether the desired record is in the system or not. This can be particularly problematic for common surnames as several different records may exist.

There are some issues with the way authority records are created currently, with no automated way of creating a unique identifier and no ability to keep authority records in draft.

A comment about the lack of auto-save and the issue of the web form timing out and losing all of your work seemed to be a shared concern for many attendees.

Other things that were mentioned included an integration with Active Directory and local workarounds that had to be put in place to make finding aids bi-lingual.


Moving forward


The group agreed that it would be useful to keep a running list of these potential areas of development for AtoM and that perhaps in the future members may be able to collaborate to jointly sponsor work to improve AtoM. This would be a really positive outcome for this new network.

I was also able to present on a recent collaboration to enable OAI-PMH harvesting of EAD from AtoM and use it as an opportunity to try to drum up support for further development of this new feature. I had to try and remember what OAI-PMH stood for and think I got 83% of it right!

Thanks to St John's College Cambridge for hosting. I look forward to our next meeting which we hope to hold here in York in the Spring.


Moving a proof of concept into production? it's harder than you might think...

Published 20 Sep 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Myself and colleagues blogged a lot during the Filling the Digital Preservation Gap Project but I’m aware that I’ve gone a bit quiet on this topic since…

I was going to wait until we had a big success to announce, but follow on work has taken longer than expected. So in the meantime here is an update on where we are and what we are up to.

Background


Just to re-cap, by the end of phase 3 of Filling the Digital Preservation Gap we had created a working proof of concept at the University of York that demonstrated that it is possible to create an automated preservation workflow for research data using PURE, Archivematica, Fedora and Samvera (then called Hydra!).

This is described in our phase 3 project report (and a detailed description of the workflow we were trying to implement was included as an appendix in the phase 2 report).

After the project was over, it was agreed that we should go ahead and move this into production.

Progress has been slower than expected. I hadn’t quite appreciated just how different a proof of concept is to a production-ready environment!

Here are some of the obstacles we have encountered (and in some cases overcome):

Error reporting


One of the key things that we have had to build in to the existing code in order to get it ready for production is error handling.

This was not a priority for the proof of concept. A proof of concept is really designed to demonstrate that something is possible, not to be used in earnest.

If errors happen and things stop working (which they sometimes do) you can just kill it and rebuild.

In a production environment we want to be alerted when something goes wrong so we can work out how to fix it. Alerts and errors are crucial to a system like this.

We are sorting this out by enabling Archivematica's own error handling and error catching within Automation Tools.


What happens when something goes wrong?


...and of course once things have gone wrong in Archivematica and you've fixed the underlying technical issue, you then need to deal with any remaining problems with your information packages in Archivematica.

For example, if the problems have resulted in failed transfers in Archivematica then you need to work out what you are going to do with those failed transfers. Although it is (very) tempting to just clear out Archivematica and start again, colleagues have advised me that it is far more useful to actually try and solve the problems and establish how we might handle a multitude of problematic scenarios if we were in a production environment!

So we now have scenarios in which an automated transfer has failed and, in order to get things moving again, we need to carry out a manual transfer of the dataset into Archivematica. Will the other parts of our workflow still work if we intervene in this way?

One issue we have encountered along the way is that though our automated transfer uses a specific 'datasets' processing configuration that we have set up within Archivematica, when we push things through manually it uses the 'default' processing configuration which is not what we want.

We are now looking at how we can encourage Archivematica to use the specified processing configuration. As described in the Archivematica documentation, you can do this by including an XML file describing your processing configuration within your transfer.
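
As an illustration of the approach described in the Archivematica documentation, here is a minimal sketch (in Python) of how a saved processing configuration might be dropped into a transfer directory before starting a manual transfer. The file and directory paths here are hypothetical examples, so check them (and the expected filename) against your own installation.

#!/usr/bin/env python3
"""Sketch: place a saved processing configuration at the root of a transfer
so Archivematica uses it instead of the default configuration.
The paths below are examples only."""

import shutil
from pathlib import Path

# Hypothetical locations - adjust for your own setup
SAVED_CONFIG = Path("/etc/archivematica/datasetsProcessingMCP.xml")  # exported 'datasets' config
TRANSFER_DIR = Path("/home/deposits/failed-dataset-123")             # dataset to re-transfer manually

def prepare_manual_transfer(transfer_dir: Path, config: Path) -> None:
    """Copy the processing configuration into the transfer as processingMCP.xml."""
    if not transfer_dir.is_dir():
        raise FileNotFoundError(f"Transfer directory not found: {transfer_dir}")
    shutil.copy(config, transfer_dir / "processingMCP.xml")

if __name__ == "__main__":
    prepare_manual_transfer(TRANSFER_DIR, SAVED_CONFIG)
    print(f"Copied {SAVED_CONFIG} into {TRANSFER_DIR}; start the transfer as usual.")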

It is useful to learn lessons like this outside of a production environment!


File size/upload


Although our project recognised that there would be a limit to the size of dataset that we could accept and process with our application, we didn't really bottom out what size dataset we intended to support.

It has now been agreed that we should reasonably expect the data deposit form to accept datasets of up to 20 GB in size. Anything larger than this would need to be handled in a different way.

Testing the proof of concept in earnest showed that it was not able to handle datasets of over 1 GB in size. Its primary purpose was to demonstrate the necessary integrations and workflow, not to handle larger files.

Additional (and ongoing) work was required to enable the web deposit form to work with larger datasets.


Space


In testing the application we of course ended up trying to push some quite substantial datasets through it.

This was fine until everything abruptly seemed to stop working!

The problem was actually a fairly simple one but because of our own inexperience with Archivematica it took a while to troubleshoot and get things moving in the right direction again.

It turned out that we hadn’t allocated enough space in one of the bits of filestore that Archivematica uses for failed transfers (/var/archivematica/sharedDirectory/failed). This had filled up and was stopping Archivematica from doing anything else.

Once we knew the cause of the problem the available space was increased, but then everything ground to a halt again because we had quickly used that up too. Increasing the space had got things moving, but of course while we had been trying to demonstrate that it wasn't working, we had deposited several further datasets which were waiting in the transfer directory and quickly blocked things up again.

On a related issue, one of the test datasets I had been using to see how well Research Data York could handle larger datasets was c.5 GB in size, consisting of about 2,000 JPEG images. Of course one of the default normalisation tasks in Archivematica is to convert all of these JPEGs to TIFF.

Once this collection of JPEGs was converted to TIFF the size of the dataset increased to around 80 GB. Until I witnessed this it hadn't really occurred to me that this could cause problems.

The solution - allocate Archivematica much more space than you think it will need!

We also now have the filestore set up so that it will inform us when the space in these directories gets to 75% full. Hopefully this will allow us to stop the filestore filling up in the future.
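
Our alerting is configured on the filestore itself, but purely as an illustration, a check like the following minimal Python sketch (with an assumed path and a placeholder alert) could be run from cron to warn when the shared directory passes 75% usage.

#!/usr/bin/env python3
"""Illustrative sketch: warn when the Archivematica shared directory
passes a usage threshold. The path and the alert mechanism are placeholders."""

import shutil
import sys

WATCHED_PATH = "/var/archivematica/sharedDirectory"  # includes the failed/ directory mentioned above
THRESHOLD = 0.75  # warn at 75% full

def usage_fraction(path: str) -> float:
    total, used, _free = shutil.disk_usage(path)
    return used / total

if __name__ == "__main__":
    fraction = usage_fraction(WATCHED_PATH)
    if fraction >= THRESHOLD:
        print(f"WARNING: {WATCHED_PATH} is {fraction:.0%} full", file=sys.stderr)
        sys.exit(1)  # non-zero exit so cron or monitoring can flag it
    print(f"OK: {WATCHED_PATH} is {fraction:.0%} full")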


Workflow


The proof of concept did not undergo rigorous testing - it was designed for demonstration purposes only.

During the project we thought long and hard about the deposit, request and preservation workflows that we wanted to support, but we were always aware that once we had it in an environment that we could all play with and test, additional requirements would emerge.

As it happens, we have discovered that the workflow implemented is very true to that described in the appendix of our phase 2 report and does meet our needs. However, there are lots of bits of fine tuning required to enhance the functionality and make the interface more user friendly.

The challenge here is to try to carry out the minimum of work required to turn it into an adequate solution to take into production. There are so many enhancements we could make – I have a wish list as long as my arm – but until we better understand whether a local solution or a shared solution (provided by the Jisc Research Data Shared Service) will be adopted in the future it is not worth trying to make this application perfect.

Making it fit for production is the priority. Bells and whistles can be added later as necessary!





My thanks to all those who have worked on creating, developing, troubleshooting and testing this application and workflow. It couldn't have happened without you!




How do you deal with mass spam on MediaWiki?

Published 19 Sep 2017 by sau226 in Newest questions tagged mediawiki - Webmasters Stack Exchange.

What would be the best way to find a user's IP address on MediaWiki if all the connections were proxied through a Squid proxy server and you have access to all user rights?

I am a steward on a centralauth based wiki and we have lots of spam accounts registering and making 1 spam page each.

Can someone please tell me what the best way to mass block them is, as I keep having to block each user individually and lock their accounts?


HAPPY RETIREMENT, MR GAWLER

Published 18 Sep 2017 by timbaker in Tim Baker.

The author (centre) with Ruth and Ian Gawler Recently a great Australian, a man who has helped thousands of others in their most vulnerable and challenging moments, a Member of the Order of Australia, quietly retired from a long and remarkable career of public service....

Harvesting EAD from AtoM: we need your help!

Published 18 Sep 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Back in February I published a blog post about a project to develop AtoM to allow EAD (Encoded Archival Description) to be harvested via OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting): “Harvesting EAD from AtoM: a collaborative approach”.

Now that AtoM version 2.4 is released (hooray!), containing the functionality we have sponsored, I thought it was high time I updated you on what has been achieved by this project, where more work is needed and how the wider AtoM community can help.


What was our aim?


Our development work had a few key aims:

  • To enable finding aids from AtoM to be exposed as EAD 2002 XML for others to harvest. The partners who sponsored this project were particularly keen to enable the Archives Hub to harvest their EAD.
  • To change the way that EAD was generated by AtoM in order to make it more scalable. Moving EAD generation from the web browser to the job scheduler was considered to be the best approach here.
  • To make changes to the existing DC (Dublin Core) metadata generation feature so that it also works through the job scheduler - making this existing feature more scalable and able to handle larger quantities of data

A screenshot of the job scheduler in AtoM, showing the EAD and DC creation jobs that have been completed

What have we achieved?

The good

We believe that the EAD harvesting feature as released in AtoM version 2.4 will enable a harvester such as the Archives Hub to harvest our catalogue metadata from AtoM as EAD. As we add new top level archival descriptions to our catalogue, subsequent harvests should pick up and display these additional records. 

This is a considerable achievement and something that has been on our wishlist for some time. This will allow our finding aids to be more widely signposted. Having our data aggregated and exposed by others is key to ensuring that potential users of our archives can find the information that they need.

Changes have also been made to the way metadata (both EAD and Dublin Core) are generated in AtoM. This means that the solution going forward is more scalable for those AtoM instances that have very large numbers of records or large descriptive hierarchies.

The new functionality in AtoM around OAI-PMH harvesting of EAD and settings for moving XML creation to the job scheduler is described in the AtoM documentation.
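
For anyone wanting to try this out, a harvest is just a series of HTTP requests against the AtoM OAI-PMH endpoint. The sketch below (Python) shows the general shape of such a request; the base URL is a made-up example and the oai_ead metadataPrefix is my assumption about how AtoM labels EAD 2002, so confirm both with a ListMetadataFormats request against your own instance.

#!/usr/bin/env python3
"""Minimal OAI-PMH client sketch for testing an AtoM endpoint.
BASE_URL is hypothetical and 'oai_ead' is an assumed metadataPrefix."""

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://atom.example.ac.uk/index.php/;oai"  # hypothetical AtoM OAI-PMH endpoint

def oai_request(verb: str, **params: str) -> ET.Element:
    """Issue a single OAI-PMH request and return the parsed XML response."""
    query = urllib.parse.urlencode({"verb": verb, **params})
    with urllib.request.urlopen(f"{BASE_URL}?{query}") as response:
        return ET.fromstring(response.read())

if __name__ == "__main__":
    # 1. See which metadata formats the repository advertises
    formats = oai_request("ListMetadataFormats")
    print(ET.tostring(formats, encoding="unicode")[:500])

    # 2. Harvest records as EAD (assumed prefix: oai_ead)
    records = oai_request("ListRecords", metadataPrefix="oai_ead")
    print(ET.tostring(records, encoding="unicode")[:500])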

The not-so-good

Unfortunately the EAD harvesting functionality within AtoM 2.4 will not do everything we would like it to do. 

It does not at this point include the ability for the harvester to know when metadata records have been updated or deleted. It also does not pick up new child records that are added into an existing descriptive hierarchy. 

We want to be able to edit our records once within AtoM and have any changes reflected in the harvested versions of the data. 

We don’t want our data to become out of sync. 

So clearly this isn't ideal.

The task of enabling full harvesting functionality for EAD was found to be considerably more complex than first anticipated. This has no doubt been compounded by the hierarchical nature of EAD, which differs from the simplicity of the traditional Dublin Core approach.

The problems encountered are certainly not insurmountable, but a lack of additional resources and the timelines for the release of AtoM 2.4 stopped us from being able to finish off this work in full.

A note on scalability


Although the development work deliberately set out to consider issues of scalability, it turns out that scalability is actually on a sliding scale!

The National Library of Wales had the forethought to include one of their largest archival descriptions as sample data for inclusion in the version of AtoM 2.4 that Artefactual deployed for testing. Their finding aid for St David’s Diocesan Records is a very large descriptive hierarchy consisting of 33,961 individual entries. This pushed the capabilities of EAD creation (even when done via the job scheduler) and also led to discussions with The Archives Hub about exactly how they would process and display such a large description at their end even if EAD generation within AtoM were successful.

Some more thought and more manual workarounds will need to be put in place to manage the harvesting and subsequent display of large descriptions such as these.

So what next?


We are keen to get AtoM 2.4 installed at the Borthwick Institute for Archives over the next couple of months. We are currently on version 2.2 and would like to start benefiting from all the new features that have been introduced... and of course to test in earnest the EAD harvesting feature that we have jointly sponsored.

We already know that this feature will not fully meet our needs in its current form, but would like to set up an initial harvest with the Archives Hub and further test some of our assumptions about how this will work.

We may need to put some workarounds in place to ensure that we have a way of reflecting updates and deletions in the harvested data – either with manual deletes or updates or a full delete and re-harvest periodically.

Harvesting in AtoM 2.4 - some things that need to change


So we have a list of priority things that need to be improved in order to get EAD harvesting working more smoothly in the future:


In line with the OAI-PMH specification

  • AtoM needs to expose updates to the metadata to the harvester
  • AtoM needs to expose new records (at any level of description) to the harvester
  • AtoM needs to expose information about deletions to the harvester
  • AtoM also needs to expose information about deletions to DC metadata to the harvester (it has come to my attention during the course of this project that this isn’t happening at the moment) 

Some other areas of potential work


I also wanted to bring together and highlight some other areas of potential work for the future. These are all things that were discussed during the course of the project but were not within the scope of our original development goals.

  • Harvesting of EAC (Encoded Archival Context) - this is the metadata standard for authority records. Is this something people would like to see enabled in the future? Of course this is only useful if you have someone who actually wants to harvest this information!
  • On the subject of authority records, it would be useful to change the current AtoM EAD template to use @authfilenumber and @source - so that an EAD record can link back to the relevant authority record in the local AtoM site. The ability to create rich authority records is such a key strength of AtoM, allowing an institution to weave rich interconnecting stories about their holdings. If harvesting doesn’t preserve this inter-connectivity then I think we are missing a trick!
  • EAD3 - this development work has deliberately not touched on the new EAD standard. Firstly, this would have been a much bigger job and secondly, we are looking to have our EAD harvested by The Archives Hub and they are not currently working with EAD3. This may be a priority area of work for the future.
  • Subject source - the subject source (for example "Library of Congress Subject Headings") doesn't appear in AtoM generated EAD at the moment even though it can be entered into AtoM - this would be a really useful addition to the EAD.
  • Visible elements - AtoM allows you to decide which elements you wish to display/hide in your local AtoM interface. With the exception of information relating to physical storage, the XML generation tasks currently do not take account of visible elements and will carry out an export of all fields. Further investigation of this should be carried out in the future. If an institution is using the visible elements feature to hide certain bits of information that should not be more widely distributed, they would be concerned if this information was being harvested and displayed elsewhere. As certain elements will be required in order to create valid EAD, this may get complicated!
  • ‘Manual’ EAD generation - the project team discussed the possibility of adding a button to the AtoM user interface so that staff users can manually kick-off EAD regeneration for a single descriptive hierarchy. Artefactual suggested this as a method of managing the process of EAD generation for large descriptive hierarchies. You would not want the EAD to regenerate with each minor tweak if a large archival description was undergoing several updates, however, you need to be able to trigger this task when you are ready to do so. It should be possible to switch off the automatic EAD re-generation (which normally triggers when a record is edited and saved) but have a button on the interface that staff can click when they want to initiate the process - for example when all edits are complete. 
  • As part of their work on this project, Artefactual created a simple script to help with the process of generating EAD for large descriptive hierarchies - it basically provides a way of finding out which XML files relate to a specific archival description so that EAD can be manually enhanced and updated if it is too large for AtoM to generate via the job scheduler. It would be useful to turn this script into a command-line task that is maintained as part of the AtoM codebase.

We need your help!


Although we believe we have something we can work with here and now, we are not under any illusions that this feature does all that it needs to in order to meet our requirements in the longer term. 

I would love to find out what other AtoM users (and harvesters) think of the feature. Is it useful to you? Are there other things we should put on the wishlist? 

There is a lot of additional work described in this post which the original group of project partners are unlikely to be able to fund on their own. If EAD harvesting is a priority to you and your organisation and you think you can contribute to further work in this area either on your own or as part of a collaborative project please do get in touch.


Thanks


I’d like to finish with a huge thanks to those organisations who have helped make this project happen, either through sponsorship, development or testing and feedback.




Jason Scott Talks His Way Out of It: A Podcast

Published 14 Sep 2017 by Jason Scott in ASCII by Jason Scott.

Next week I start a podcast.

There’s a Patreon for the podcast with more information here.

Let me unpack a little of the thinking.

Through the last seven years, since I moved back to NY, I’ve had pretty variant experiences of debt or huge costs weighing me down. Previously, I was making some serious income from a unix admin job, and my spending was direct but pretty limited. Since then, even with full-time employment (and I mean, seriously, a dream job), I’ve made some grandiose mistakes with taxes, bills and tracking down old obligations that means I have some notable costs floating in the background.

Compound that with a new home I’ve moved to with real landlords that aren’t family and a general desire to clean up my life, and I realized I needed some way to make extra money that will just drop directly into the bill pit, never to really pass into my hands.

How, then, to do this?

I work very long hours for the Internet Archive, and I am making a huge difference in the world working for them. It wouldn’t be right or useful for me to take on any other job. I also don’t want to be doing something like making “stuff” that I sell or otherwise speculate into some market. Leave aside I have these documentaries to finish, and time has to be short.

Then take into account that I can no longer afford to drop money going to anything other than a small handful of conferences that aren’t local to me (the NY-CT-NJ Tri-State area), and that people really like the presentations I give.

So, I thought, how about me giving basically a presentation once a week? What if I recorded me giving a sort of fireside chat or conversational presentation about subjects I would normally give on the road, but make them into a downloadable podcast? Then, I hope, everyone would be happy: fans get a presentation. I get away from begging for money to pay off debts. I get to refine my speaking skills. And maybe the world gets something fun out of the whole deal.

Enter a podcast, funded by a Patreon.

The title: Jason Talks His Way Out of It, my attempt to write down my debts and share the stories and thoughts I have.

I announced the Patreon on my 47th birthday. Within 24 hours, about 100 people had signed up, paying some small amount (or not small, in some cases) for each published episode. I had a goal of $250/episode to make it worthwhile, and we passed that handily. So it’s happening.

I recorded a prototype episode, and that’s up there, and the first episode of the series drops Monday. These are story-based presentations roughly 30 minutes long apiece, and I will continue to do them as long as it makes sense to.

Public speaking is something I’ve done for many, many years, and I enjoy it, and I get comments that people enjoy them very much. My presentation on That Awesome Time I Was Sued for Two Billion Dollars has passed 800,000 views on the various copies online.

I spent $40 improving my sound setup, which should work for the time being. (I already had a nice microphone and a SSD-based laptop which won’t add sound to the room.) I’m going to have a growing list of topics I’ll work from, and I’ll stay in communication with the patrons.

Let’s see what this brings.

One other thing: Moving to the new home means that a lot of quality of life issues have been fixed, and my goal is to really shoot forward finishing those two documentaries I owe people. I want them done as much as everyone else! And with less looming bills and debts in my life, it’ll be all I want to do.

So, back the new podcast if you’d like. It’ll help a lot.


An Eventlogging adventure

Published 14 Sep 2017 by in Posts on The bugalore.

What the heck is eventlogging? EventLogging is a MediaWiki extension which lets us log events, such as how users interact with a certain feature (client-side logging), or capture the state of a system (user, permissions etc.) when a certain event happens (server-side logging). There are three parts to logging an event: the schema, the code and the log data. I won’t be going into the details of that because there’s a detailed guide for it.

Does Mediawiki encrypt logins by default as the browser sends them to the server?

Published 11 Sep 2017 by user1258361 in Newest questions tagged mediawiki - Server Fault.

Several searches only turned up questions about encrypting login info on the server side. Does Mediawiki encrypt logins after you type them in the browser and send them? (to prevent a man-in-the-middle from reading them in transit and taking over an account)


The Bounty of the Ted Nelson Junk Mail

Published 9 Sep 2017 by Jason Scott in ASCII by Jason Scott.

At the end of May, I mentioned the Ted Nelson Junk Mail project, where a group of people were scanning in boxes of mailings and pamphlets collected by Ted Nelson and putting them on the Internet Archive. Besides the uniqueness of the content, the project was also unusual in that we were trying to set it up to be self-sustaining from volunteer monetary contributions, and to compensate the scanners doing the work.

This entire endeavor has been wildly successful.

We are well past 18,000 pages scanned. We have taken in thousands in donations. And we now have three people scanning and one person entering metadata.

Here is the spreadsheet with transparency and donation information.

I highly encourage donating.

But let’s talk about how this collection continues to be amazing.

Always, there are the pure visuals. As we’re scanning away, we’re starting to see trends in what we have, and everything seems to go from the early 1960s to the early 1990s, a 30-year scope that encompasses a lot of companies and a lot of industries. These companies are trying to thrive in a whirlpool of competing attention, especially in certain technical fields, and they try everything from humor to class to rudimentary fear-and-uncertainty plays in the art.

These are exquisitely designed brochures, in many cases – obviously done by a firm or with an in-house group specifically tasked with making the best possible paper invitations and with little expense spared. After all, this might be the only customer-facing communication a company could have about its products, and might be the best convincing literature after the salesman has left or the envelope is opened.

Scanning at 600dpi has been a smart move – you can really zoom in and see detail, find lots to play with or study or copy. Everything is at this level, like this detail about a magnetic eraser that lets you see the lettering on the side.

Going after these companies for gender roles or other out-of-fashion jokes almost feels like punching down, but yeah, there’s a lot of it. Women draped over machines, assumptions that women will be doing the typing, and clunky humor about fulfilling your responsibilities as a (male) boss abounds. Cultural norms regarding what fears reigned in business or how companies were expected to keep on top of the latest trends are baked in there too.

The biggest obstacle going forward, besides bringing attention to this work, is going to be one of findability. The collection is not based on some specific subject matter other than what attracted Ted’s attention over the decades. He tripped lightly among aerospace, lab science, computers, electronics, publishing… nothing escaped his grasp, especially in technical fields.

If people are looking for pure aesthetic beauty, that is, “here’s a drawing of something done in a very old way” or “here are old fonts”, then this bounty is already, at 1,700 items, a treasure trove that could absorb weeks of your time. Just clicking around to items that on first blush seem to have boring title pages will often expand into breathtaking works of art and design.

I’m not worried about that part, frankly – these kind of sell themselves.

But there’s so much more to find among these pages, and as we’re now up to so many examples, it’s going to be a challenge to get researching folks to find them.

We have the keywording active, so you can search for terms like monitor, circuit, or hypercard and get more specific matches without concentrating on what the title says or what graphics appear on the front. The Archive has a full-text search, and so people looking for phrases will no doubt stumble into this collection.

But how easily will people even think to know about a wristwatch for the Macintosh from 1990, a closed circuit camera called the Handy Looky… or this little graphic, nestled away inside a bland software catalog:

…I don’t know. I’ll mention that this is actually twitter-fodder among archivists, who are unhappy when someone is described as “discovering” something in the archives, when it was obvious a person cataloged it and put it there.

But that’s not the case here. Even Kyle, who’s doing the metadata, is doing so in a descriptive fashion, and on a rough day of typing in descriptions, he might not particularly highlight unique gems in the pile (he often does, though). So, if you discover them in there, you really did discover them.

So, the project is deep, delightful, and successful. The main consideration is funding: we are paying the scanners $10/hr to scan and $15/hr for metadata work. They work fast and efficiently. We track them on the spreadsheet. But that means a single day of this work can generate a notable bill. We’re asking people on Twitter to raise funds, but it never hurts to ask here as well. Consider donating to this project, because we may not know for years how much wonderful history is saved here.

Please share the jewels you find.


Blog? Bleurgh.

Published 9 Sep 2017 by in Posts on The bugalore.

The what, the why, the when, the who, the how: I’ve been coding for a while now, mostly on Wikimedia projects. Every time I got stuck on a bug or came across something I didn’t know, I’d learn something new, fix it and move on. While this works great for me, it’s a wealth of knowledge that I’m keeping all to myself. I wanted to be able to pass on the intriguing lessons I learnt to other people who might want to hear the stories I brought back from the pits I fell into (no, not like Bruce Wayne).

4 Months!

Published 9 Sep 2017 by Jason Scott in ASCII by Jason Scott.

It’s been 4 months since my last post! That’s one busy little Jason summer, to be sure.

Obviously, I’m still around, so no lingering heart attack or other problems. My doctor told me that my heart is basically healed, and he wants more exercise out of me. My diet has continued to be lots of whole foods, leafy greens and occasional shameful treats that don’t turn into a staple.

I spent a good month working with good friends to clear out the famous Information Cube, sorting out and mailing/driving away all the contents to other institutions, including the Internet Archive, the Strong Museum of Play, the Vintage Computer Federation, and parts worldwide.

I’ve moved homes, no longer living with my brother after seven up-and-down years of siblings sharing a house. It was time! We’re probably not permanently scarred! I love him very much. I now live in an apartment with very specific landlords with rules and an important need to pay them on time each and every month.

To that end, I’ve cut back on my expenses and will continue to, so it’s the end of me “just showing up” to pretty much any conferences that I’m not being compensated for, which will of course cut things down in terms of Jason appearances you can find me at.

I’ll still be making appearances as people ask me to go, of course – I love travel. I’m speaking in Amsterdam in October, and I’ll also be an emcee at the Internet Archive that same month. So we’ll see how that goes.

What that means is more media ingestion work, and more work on the remaining two documentaries. I’m going to continue my goal of clearing my commitments before long, so I can choose what I do next.

What follows will be (I hope) lots of entries going deep into some subjects and about what I’m working on, and I thank you for your patience as I was not writing weblog entries while upending my entire life.

To the future!


Godless for God’s Sake: Now available for Kindle for just $5.99

Published 6 Sep 2017 by Nontheist Friends in NontheistFriends.org.


Godless for God’s Sake: Nontheism in Contemporary Quakerism

In this book edited by British Friend and author David Boulton, 27 Quakers from 4 countries and 13 yearly meetings tell how they combine active and committed membership in the Religious Society of Friends with rejection of traditional belief in the existence of a transcendent, personal and supernatural God.

For some, God is no more (but no less) than a symbol of the wholly human values of “mercy, pity, peace and love”. For others, the very idea of God has become an archaism.

Readers who seek a faith free of supernaturalism, whether they are Friends, members of other religious traditions or drop-outs from old-time religion, will find good company among those whose search for an authentic 21st century understanding of religion and spirituality has led them to declare themselves “Godless – for God’s Sake”.

Contents

Preface: In the Beginning…

1. For God’s Sake? An Introduction – David Boulton

2. What’s a Nice Nontheist Like You Doing Here? – Robin Alpern

3. Something to Declare – Philip Gross

4. It’s All in the Numbers – Joan D Lucas

5. Chanticleer’s Call: Religion as a Naturalist Views It – Os Cresson

6. Mystery: It’s What we Don’t Know – James T Dooley Riemermann

7. Living the Questions – Sandy Parker

8. Listening to the Kingdom – Bowen Alpern

9. The Making of a Quaker Nontheist Tradition – David Boulton and Os Cresson

10. Facts and Figures – David Rush

11. This is my Story, This is my Song…

Ordering Info

Links to forms for ordering online will be provided here as soon as they are available. In the meantime, contact the organizations listed below, using the book details at the bottom of this page.

QuakerBooks of Friends General Conference

(formerly FGC Bookstore)

1216 Arch St., Ste 2B

Philadelphia, PA 19107

215-561-1700 fax 215-561-0759

http://www.quakerbooks.org/get/333011

(this is the “Universalism” section of Quakerbooks, where the book is currently located)

or

The Quaker Bookshop

173 Euston Rd London NW1 2BJ

020 7663 1030, fax 020 7663 1008 bookshop@quaker.org.uk

 

Those outside the United Kingdom and United States should be able to order through a local bookshop, quoting the publishing details below – particularly the ISBN number. In case of difficulty, the book can be ordered direct from the publisher’s address below.

Title: “Godless for God’s Sake: Nontheism in Contemporary Quakerism” (ed. David Boulton)

Publisher: Dales Historical Monographs, Hobsons Farm, Dent, Cumbria LA10 5RF, UK. Tel 015396 25321. Email davidboulton1@compuserve.com.

Retail price: £9.50 ($18.50). Prices elsewhere to be calculated on UK price plus postage.

Format: Paperback, full colour cover, 152 pages, A5

ISBN number: 0-9511578-6-8 (to be quoted when ordering from any bookshop in the world)


MassMessage hits 1,000 commits

Published 28 Aug 2017 by legoktm in The Lego Mirror.

The MassMessage MediaWiki extension hit 1,000 commits today, following an update of the localization messages for the Russian language. MassMessage replaced a Toolserver bot that allowed sending a message to all Wikimedia wikis, by integrating it into MediaWiki and using the job queue. We also added some nice features like input validation and previewing. Through it, I became familiar with different internals of MediaWiki, including submitting a few core patches.

I made my first commit on July 20, 2013. It would get a full rollout to all Wikimedia wikis on November 19, 2013, after a lot of help from MZMcBride, Reedy, Siebrand, Ori, and other MediaWiki developers.

I also mentored User:wctaiwan, who worked on a Google Summer of Code project that added a ContentHandler backend to the extension, to make it easier for people to create and maintain page lists. You can see it used by The Wikipedia Signpost's subscription list.

It's still a bit crazy to think that I've been hacking on MediaWiki for over four years now, and how much it has changed my life in that much time. So here's to the next four years and next 1,000 commits to MassMessage!


Requiring HTTPS for my Toolforge tools

Published 26 Aug 2017 by legoktm in The Lego Mirror.

My Toolforge (formerly "Tool Labs") tools will now start requiring HTTPS, and redirecting any HTTP traffic. It's a little bit of common code for each tool, so I put it in a shared "toolforge" library.

from flask import Flask
import toolforge

app = Flask(__name__)
# Register the shared helper so every incoming request is checked (and redirected) first
app.before_request(toolforge.redirect_to_https)

And that's it! Your tool will automatically be HTTPS-only now.

$ curl -I "http://tools.wmflabs.org/mwpackages/"
HTTP/1.1 302 FOUND
Server: nginx/1.11.13
Date: Sat, 26 Aug 2017 07:58:39 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 281
Connection: keep-alive
Location: https://tools.wmflabs.org/mwpackages/
X-Clacks-Overhead: GNU Terry Pratchett
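
For the curious, a before_request hook like this needs only a few lines. Here is a minimal sketch of what such a helper could look like (the actual toolforge library may well differ; this assumes Flask and the X-Forwarded-Proto header set by the Toolforge web proxy):

from flask import Flask, redirect, request

app = Flask(__name__)

def redirect_to_https():
    """Send any plain-HTTP request to its HTTPS equivalent."""
    # Behind the proxy, the original scheme arrives in X-Forwarded-Proto.
    proto = request.headers.get('X-Forwarded-Proto', 'http')
    if proto != 'https':
        return redirect(request.url.replace('http://', 'https://', 1), code=302)
    return None  # returning None lets the request continue to the normal view

app.before_request(redirect_to_https)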

My DebConf 17 presentation - Bringing MediaWiki back into Debian

Published 25 Aug 2017 by legoktm in The Lego Mirror.

Full quality video available on Wikimedia Commons, as well as the slides.

I had a blast attending DebConf '17 in Montreal, and presented about my efforts to bring MediaWiki back into Debian. The talks I went to were all fantastic, and I got to meet some amazing people. But the best parts of the conference were the laid-back atmosphere and the food. I've never been to another conference that had food that comes even close to DebConf's.

Feeling very motivated, I have three new packages in the pipeline: LuaSandbox, uprightdiff, and libkiwix.

I hope to be at DebConf again next year!


Benchmarking with the NDSA Levels of Preservation

Published 18 Aug 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Anyone who has heard me talk about digital preservation will know that I am a big fan of the NDSA Levels of Preservation.

This is also pretty obvious if you visit me in my office – a print out of the NDSA Levels is pinned to the notice board above my PC monitor!

When talking to students and peers about how to get started in digital preservation in a logical, pragmatic and iterative way, I always recommend using the NDSA Levels to get started. Start at level 1 and move forward to the more advanced levels as and when you are able. This is a much more accessible and simple way to start addressing digital preservation than digesting some of the bigger and more complex certification standards and benchmarking tools.

Over the last few months I have been doing a lot of documentation work. Both ensuring that our digital archiving procedures are written down somewhere and documenting where we are going in the future.

As part of this documentation it seemed like a good idea to use the NDSA Levels:



Previously I have used the NDSA Levels in quite a superficial way – as a guide and a talking point; it has been quite a different exercise actually mapping where we stand.

It was not always straightforward to establish where we are and to unpick and interpret exactly what each level meant in practice. I guess this is one of the problems of using a relatively simple set of metrics to describe what is really quite a complex set of processes.

Without publishing the whole document that I've written on this, here is a summary of where I think we are currently. I'm also including some questions I've been grappling with as part of the process.

Storage and geographic location

Currently at LEVEL 2: 'know your data' with some elements of LEVEL 3 and 4 in place

See the full NDSA levels here


Four years ago we carried out a ‘rescue mission’ to get all digital data in the archives off portable media and on to the digital archive filestore. This now happens as a matter of course when born digital media is received by the archives.

The data isn’t in what I would call a proper digital archive but it is on a fairly well locked down area of University of York filestore.

There are three copies of the data available at any one time (not including the copy that is on original media within the strongrooms). The University stores two copies of the data on spinning disk: one at a data centre on one campus and the other at a data centre on another campus, with a further copy backed up to tape which is kept for 90 days.

I think I can argue that storage of the data on two different campuses is two different geographic locations but these locations are both in York and only about 1 mile apart. I'm not sure whether they could be described as having different disaster threats so I'm going to hold back from putting us at Level 3 though IT do seem to have systems in place to ensure that filestore is migrated on a regular schedule.

Questions:



File fixity and data integrity

Currently at LEVEL 4: 'repair your data'

See the full NDSA levels here


Having been in this job for five years now I can say with confidence that I have never once received file fixity information alongside data that has been submitted to us. Obviously if I did receive it I would check it on ingest, but I can not envisage this scenario occurring in the near future! I do however create fixity information for all content as part of the ingest process.

I use a tool called Foldermatch to ensure that the digital data I have copied into the archive is identical to the original. Foldermatch allows you to compare the contents of two folders and one of the comparison methods (the one I use at ingest) uses checksums to do this.
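
The underlying idea is simple enough to sketch in a few lines of Python (Foldermatch itself is an off-the-shelf tool and works differently under the hood; the helper names below are made up purely for illustration): hash every file under each folder and compare the two sets of relative paths and digests.

import hashlib
from pathlib import Path

def folder_digests(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.md5(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob('*')) if p.is_file()
    }

def folders_match(original, copy):
    """True only if both folders hold identical files with identical content."""
    return folder_digests(original) == folder_digests(copy)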

Last year I purchased a write blocker for use when working with digital content delivered to us on portable hard drives and memory sticks. A check for viruses is carried out on all content that is ingested into the digital archive so this fulfills the requirements of level 2 and some of level 3.

Despite putting us at Level 4, I am still very keen to improve our processes and procedures around fixity. Fixity checks are carried out at intervals (several times a month) and these checks are logged but at the moment this is all initiated manually. As the digital archive gets bigger, we will need to re-think our approaches to this important area and find solutions that are scalable.

Questions:




Information Security

Currently at LEVEL 2: 'know your data' with some elements of LEVEL 3 in place

See the full NDSA levels here


Access to the digital archive filestore is limited to the digital archivist and IT staff who administer the filestore. If staff or others need to see copies of data within the digital archive filestore, copies are made elsewhere after appropriate checks are made regarding access permissions. The master copy is always kept on the digital archive filestore to ensure that the authentic original version of the data is maintained. Access restrictions are documented.

We are also moving towards the higher levels here. A recent issue reported on a mysterious change of last modified dates for .eml files has led to discussions with colleagues in IT, and I have been informed that an operating system upgrade for the server should include the ability to provide logs of who has done what to files in the archive.

It is worth pointing out that I don't currently have systems in place for recording PREMIS (preservation) metadata, so I am currently taking a hands-off approach to preservation planning within the digital archive. Preservation actions such as file migration are few and far between and are recorded in a temporary way until a more robust system is established.


Metadata

Currently at LEVEL 3: 'monitor your data'

See the full NDSA levels here


We do OK with metadata currently (considering a full preservation system is not yet in place). Using DROID at ingest helps fulfil some of the requirements of levels 1 to 3 (essentially, having a record of what was received and where it is).

Our implementation of AtoM as our archival management system has helped fulfil some of the other metadata requirements. It gives us a place to store administrative metadata (who gave us it and when) as well as providing a platform to surface descriptive metadata about the digital archives that we hold.

Whether we actually have descriptive metadata or not for digital archives will remain an issue. Much metadata for the digital archive can be generated automatically, but descriptive metadata isn't quite as straightforward. In some cases a basic listing is created for files within the digital archive (using Dublin Core as a framework), but this will not happen in all cases. Descriptive metadata typically will not be created until an archive is catalogued, which may come at a later date.

Our plans to implement Archivematica next year will help us get to Level 4 as this will create full preservation metadata for us as PREMIS.

Questions:




File formats

Currently at LEVEL 2: 'know your data' with some elements of LEVEL 3 in place

See the full NDSA levels here


It took me a while to convince myself that we fulfilled Level 1 here! This is a pretty hard one to crack, especially if you have lots of different archives coming in from different sources, and sometimes with little notice. I think it is useful that the requirement at this level is prefaced with "When you can..."!

Thinking about it, we do do some work in this area - for example:

To get us to Level 2, as part of the ingest process we run DROID to get a list of file formats included within a digital archive. Summary stats are kept within a spreadsheet that covers all content within the digital archive so we can quickly see the range of formats that we hold and find out which archives they are in.
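
As an illustration of how summary statistics like these can be pulled together (this is not our actual workflow; it simply assumes DROID's CSV export with 'PUID' and 'FORMAT_NAME' columns, which may vary between DROID versions):

import csv
from collections import Counter

def summarise_droid_export(csv_path):
    """Count how many files fall under each identified format in a DROID CSV export."""
    counts = Counter()
    with open(csv_path, newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            puid = row.get('PUID') or 'unidentified'
            name = row.get('FORMAT_NAME') or ''
            counts[(puid, name)] += 1
    return counts

# e.g. list the formats in an export, most common first
for (puid, name), n in summarise_droid_export('droid_export.csv').most_common():
    print(n, puid, name)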

This should allow us to move towards Level 3 but we are not there yet. Some pretty informal and fairly ad hoc thinking goes into  file format obsolescence but I won't go as far as saying that we 'monitor' it. I have an awareness of some specific areas of concern in terms of obsolete files (for example I've still got those WordStar 4.0 files and I really do want to do something with them!) but there are no doubt other formats that need attention that haven't hit my radar yet.

As mentioned earlier, we are not really doing migration right now - not until I have a better system for creating the PREMIS metadata, so Level 4 is still out of reach.

Questions:




Conclusions

This has been a useful exercise and it is good to see where we need to progress. Going from using the Levels in the abstract and actually trying to apply them as a tool has been a bit challenging in some areas. I think additional information and examples would be useful to help clear up some of the questions that I have raised.

I've also found that even where we meet a level there are often other ways we could do things better. File fixity and data integrity looks like a strong area for us but I am all too aware that I would like to find a more sustainable and scalable way to do this. This is something we'll be working on as we get Archivematica in place. Reaching Level 4 shouldn't lead to complacency!

An interesting blog post last year by Shira Peltzman from the UCLA Library talked about Expanding the NDSA Levels of Preservation to include an additional row focused on Access. This seems sensible given that the ability to provide access is the reason why we preserve archives. I would be keen to see this developed further so long as the bar wasn't set too high. At the Borthwick my initial consideration has been preservation - getting the stuff and keeping it safe - but access is something that will be addressed over the next couple of years as we move forward with our plans for Archivematica and AtoM.

Has anyone else assessed themselves against the NDSA Levels?  I would be keen to see how others have interpreted the requirements.







Botanical Wonderland events

Published 18 Aug 2017 by carinamm in State Library of Western Australia Blog.

From pressed seaweed, to wildflower painting, embroidery, to photography – botanical wonders have inspired and defined Western Australia. Hear from art historian, author, artist and curator Dr Dorothy Erickson in two events at the State Library of Western Australia.


Lecture: Professional women Artists in the Wildflower State by Dr Dorothy Erickson
Wednesday 23 August 2017 – 5:00-6:00 pm
Great Southern Room – State Library of Western Australia
Free. No bookings required

The first profession considered acceptable for middle-class women to practise was that of the artist. They were the ‘Angels in the Studio’ at the time when gold was first being found in Western Australia. While a few Western Australian-born women were trained artists, many others came in the wake of the gold rushes, when Western Australia was the world’s El Dorado. A number were entranced by the unique wildflowers and made them the mainstay of their careers. This talk will focus on the professional women artists in Western Australia from 1890 to WWI, with particular attention to those who painted our unique botanical wonderland.


Lilian Wooster Greaves was a prolific Western Australian wildflower artist: “no one else seems to be able to equal her skill in pressing and mounting wildflower specimens, in the form of panels, cards and booklets” – The West Australian, 21 May 1927. Portrait of Lilian Wooster Greaves, Out of Doors in WA, 1927, State Library of Western Australia 821A(W)GRE.


Floor Talk on Botanical Wonderland exhibition with Dr Dorothy Erickson
Friday 1 September 2017  – 1:00-1:30 pm
The Nook – State Library of Western Australia
Free. No bookings required.

Be inspired by the botanical wonders of Western Australia as Australian artist Dr Dorothy Erickson discusses some of the marvels on display in the exhibition.


Nature’s Showground, 1940. The Western Mail, State Library of Western Australia, 630.5WES.

Botanical Wonderland is a partnership between the Royal Western Australian Historical Society, the Western Australian Museum and the State Library of Western Australia. The exhibition is on display at the State Library until 24 September 2017.

Image: Acc 9131A/4: Lilian Wooster Greaves, pressed wildflower artwork, ‘Westralia’s Wonderful Wildflowers’, c1929



On reading Library Journal, September, 1877

Published 8 Aug 2017 by Karen Coyle in Coyle's InFormation.

Among the many advantages of retirement is the particular one of idle time. And I will say that as a librarian one could do no better than to spend some of that time communing with the history of the profession. The difficulty is that it is so rich, so familiar in many ways, that it is hard to move through it quickly. Here is just a fraction of the potential value to be found in the September issue of volume two of Library Journal.* Admittedly this is a particularly interesting number because it reports on the second meeting of the American Library Association.

For any student of library history it is especially interesting to encounter certain names as living, working members of the profession.



Other names reflect works that continued on, some until today, such as Poole and Bowker, both names associated with long-running periodical indexes.

What is particularly striking, though, is how many of the topics of today were already being discussed then, although obviously in a different context. The association was formed, at least in part, to help librarianship achieve the status of a profession. Among the topics discussed were educating the public about the role of libraries and librarians, as well as providing education so that there would be a group of professionals to take the jobs that needed that professional knowledge. There was work to be done to convince state legislatures to support state and local libraries.

One of the first acts of the American Library Association when it was founded in 1876 (as reported in the first issue of Library Journal) was to create a Committee on Cooperation. This is the seed for today's cooperative cataloging efforts as well as other forms of sharing among libraries. In 1877, undoubtedly encouraged by the participation of some members of the publishing community in ALA, there was hope that libraries and publishers would work together to create catalog entries for in-print works.
This is one hope of the early participants that we are still working on, especially the desire that such catalog copy would be "uniform." Note that there were also discussions about having librarians contribute to the periodical indexes of R. R. Bowker and Poole, so the cooperation would flow in both directions.

The physical organization of libraries also was of interest, and a detailed plan for a round (actually octagonal) library design was presented.
His conclusion, however, shows a difference in our concepts of user privacy.
Especially interesting to me are the discussions of library technology. I was unaware of some of the emerging technologies for reproduction such as the papyrograph and the electric pen. In 1877, the big question, though, was whether to employ the new (but as yet un-perfected) technology of the typewriter in library practice.

There was some pooh-poohing of this new technology, but some members felt it might be reaching a state of usefulness.


"The President" in this case is Justin Winsor, Superintendent of the Boston Library, then president of the American Library Association. Substituting more modern technologies, I suspect we have all taken part in this discussion during our careers.

Reading through the Journal evokes a strong sense of "plus ça change..." but I admit that I find it all rather reassuring. The historical beginnings give me a sense of why we are who we are today, and what factors are behind some of our embedded thinking on topics.


* Many of the early volumes are available from HathiTrust, if you have access. Although the texts themselves are public domain, these are Google-digitized books and are not available without a login. (Don't get me started!) If you do not have access to those, most of the volumes are available through the Internet Archive. Select "text" and search on "library journal". As someone without HathiTrust institutional access I have found most numbers in the range 1-39, but am missing (hint, hint): 5/1880; 8-9/1887-88; 17/1892; 19/1894; 28-30/1903-1905; 34-37;1909-1912. If I can complete the run I think it would be good to create a compressed archive of the whole and make that available via the Internet Archive to save others the time of acquiring them one at a time. If I can find the remainder that are pre-1923 I will add those in.


Archival software survey

Published 8 Aug 2017 by inthemailbox in In the mailbox.

A few months ago, I asked my colleagues in the Archives Live Archives and recordkeeping software group to undertake a short survey for me, looking at archival description and management systems in use in Australia. I used the free SurveyMonkey site (ten simple questions) and promoted the survey on the Archives Live site and via my personal twitter account. I got 39 responses from a possible pool of 230 members, in a four week period.

The majority of respondents worked in a combination archive, taking both transfers from inhouse records creators as well as accepting donations or purchasing material for their collections (58.97%).  Small archives, with 2-4 staff (qualifications not specified), were slightly ahead of lone arrangers (48.7% and 30.7%). 11 were school archives and 7 from universities. There was a smattering of religious institutions, local council collections and government institutions, plus a couple of companies who held archives of their business.

Most archivists said they could use Excel and Word (92%), so it is not surprising that 25.6% of them created finding aids and other archival documentation using Word documents and spreadsheets. However, the majority of finding aids are created using online systems and archive management software.

Software identified in responses to the survey included:

Both Tabularium and Archive Manager were created here in Australia and have good compliance with the Australian series system.   Tabularium was created by David Roberts and distributed by State Records NSW; however, it is no longer maintained. Archive Manager was created for use with Windows PCs, and has recently been sold to the UK.

In looking at new software requirements, respondents expressed a remarkable degree of frustration with old, clunky software which was not properly maintained or could not be easily updated, either by themselves or by a provider. Ease of use, the ability to make collection content available online, to integrate digital components and to work with an EDRMS or other records management system were all identified as requirements for a modern archival management system. Concerns were raised about making donor and other personal and confidential information available, so some degree of authority control and viewing permissions was also required.

Whether one system can meet all these requirements is yet to be seen. It may be better to focus on a range of systems that have some degree of interoperability and on standards for transferring data from one to the other. Either way, archivists in Australia are eager and ready to embrace new ways of working and for a new generation of archival software.

 

 


The mysterious case of the changed last modified dates

Published 31 Jul 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Today's blog post is effectively a mystery story.

Like any good story it has a beginning (the problem is discovered, the digital archive is temporarily thrown into chaos), a middle (attempts are made to solve the mystery and make things better, several different avenues are explored) and an end (the digital preservation community come to my aid).

This story has a happy ending (hooray) but also includes some food for thought (all the best stories do) and as always I'd be very pleased to hear what you think.

The beginning

I have probably mentioned before that I don't have a full digital archive in place just yet. While I work towards a bigger and better solution, I have a set of temporary procedures in place to ingest digital archives on to what is effectively a piece of locked down university filestore. The procedures and workflows are both 'better than nothing' and 'good enough' as a temporary measure and actually appear to take us pretty much up to Level 2 of the NDSA Levels of Preservation (and beyond in some places).

One of the ways I ensure that all is well in the little bit of filestore that I call 'The Digital Archive' is to run frequent integrity checks over the data, using a free checksum utility. Checksums (effectively unique digital fingerprints) for each file in the digital archive are created when content is ingested and these are checked periodically to ensure that nothing has changed. IT keep back-ups of the filestore for a period of three months, so as long as this integrity checking happens within this three month period (in reality I actually do this 3 or 4 times a month) then problems can be rectified and digital preservation nirvana can be seamlessly restored.
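
To make the mechanics concrete, a fixity check of this sort boils down to something like the sketch below. It is purely illustrative (I use an off-the-shelf checksum utility rather than a script) and assumes a manifest of previously recorded SHA-256 hashes, one 'hash  relative/path' pair per line:

import hashlib
from pathlib import Path

def sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest_path):
    """Return the files whose current hash no longer matches the manifest."""
    base = Path(manifest_path).parent
    failures = []
    for line in Path(manifest_path).read_text().splitlines():
        recorded, rel = line.split(maxsplit=1)
        if sha256(base / rel) != recorded:
            failures.append(rel)
    return failures  # an empty list means all is well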

Checksum checking is normally quite dull. Thankfully it is an automated process that runs in the background and I can just get on with my work and cheer when I get a notification that tells me all is well. Generally all is well, it is very rare that any errors are highlighted - when that happens I blog about it!

I have perhaps naively believed for some time that I'm doing everything I need to do to keep those files safe and unchanged, because if the checksum is the same then all is well. However, this month I encountered a problem...

I've been doing some tidying of the digital archive structure and alongside this have been gathering a bit of data about the archives, specifically looking at things like file formats, number of unidentified files and last modified dates.

Whilst doing this I noticed that one of the archives that I had received in 2013 contained 26 files with a last modified date of 18th January 2017 at 09:53. How could this be so if I have been looking after these files carefully and the checksums are the same as they were when the files were deposited?

The 26 files were all EML files - email messages exported from Microsoft Outlook. These were the only EML files within the whole digital archive. The files weren't all in the same directory and other files sitting in those directories retained their original last modified dates.

The middle

So this was all a bit strange...and worrying too. Am I doing my job properly? Is this something I should be bringing to the supportive environment of the DPC's Fail Club?

The last modified dates of files are important to us as digital archivists. This is part of the metadata that comes with a file. It tells us something about the file. If we lose this date are we losing a little piece of the authentic digital object that we are trying to preserve?

Instead of beating myself up about it I wanted to do three things:

  1. Solve the mystery (find out what happened and why)
  2. See if I could fix it
  3. Stop it happening again
So how could it have happened? Has someone tampered with these 26 files? That seems unlikely, considering they all have the exact same date/time stamp, which to me suggests a more automated process. Also, the digital archive isn't widely accessible. Quite deliberately, it is only really me (and the filestore administrators) who have access.

I asked IT whether they could explain it. Had some process been carried out across all filestores that involved EML files specifically? They couldn't think of a reason why this may have occurred. They also confirmed my suspicions that we have no backups of the files with the original last modified dates.

I spoke to a digital forensics expert from the Computer Science department and he said he could analyse the files for me and see if he could work out what had acted on them and also suggest a methodology of restoring the dates.

I have a record of the last modified dates of these 26 files when they arrived - the checksum tool that I use writes the last modified date to the hash file it creates. I wondered whether manually changing the last modified dates back to what they were originally was the right thing to do or whether I should just accept and record the change.

...but I decided to sit on it until I understood the problem better.

The end

I threw the question out to the digital preservation community on Twitter and as usual I was not disappointed!




In fact, along with a whole load of discussion and debate, Andy Jackson was able to track down what appears to be the cause of the problem.


He very helpfully pointed me to a thread on StackExchange which described the issue I was seeing.

It was a great comfort to discover that the cause of this problem was apparently a bug and not something more sinister. It appears I am not alone!

...but what now?

So now I think I know what caused the problem, but questions remain around how to catch issues like this more quickly (not six months after they have happened) and what to do with the files themselves.

IT have mentioned to me that an OS upgrade may provide us with better auditing support on the filestore. Being able to view reports on changes made to digital objects within the digital archive would be potentially very useful (though perhaps even that wouldn't have picked up this Windows bug?). I'm also exploring whether I can make particular directories read only and whether that would stop issues such as this occurring in the future.

If anyone knows of any other tools that can help, please let me know.

The other decision to make is what to do with the files themselves. Should I try and fix them? More interesting debate on Twitter on this topic and even on the value of these dates in the first place. If we can fudge them then so can others - they may have already been fudged before they got to the digital archive - in which case, how much value do they really have?


So should we try and fix last modified dates, or should we focus our attention on capturing and storing them within the metadata? The latter may be a more sustainable solution in the longer term, given their slightly slippery nature!
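
For what it is worth, either route is only a few lines of Python. The sketch below is illustrative only (the date shown is a made-up example of the sort of value recorded by the checksum tool at ingest): it captures a file's current timestamps as metadata, and can reset the modification time to a previously recorded value.

import os
from datetime import datetime, timezone
from pathlib import Path

def record_timestamps(path):
    """Capture a file's current access and modification times as ISO strings."""
    st = Path(path).stat()
    return {
        'accessed': datetime.fromtimestamp(st.st_atime, tz=timezone.utc).isoformat(),
        'modified': datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
    }

def restore_modified(path, iso_date):
    """Reset the last modified time to a previously recorded value,
    e.g. '2013-05-14T10:32:00+00:00' (a made-up example)."""
    mtime = datetime.fromisoformat(iso_date).timestamp()
    atime = Path(path).stat().st_atime  # leave the access time as it is
    os.utime(path, (atime, mtime))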

I know there are lots of people interested in this topic - just see this recent blog post by Sarah Mason and in particular the comments - When was that?: Maintaining or changing ‘created’ and ‘last modified’ dates. It is great that we are talking about real nuts and bolts of digital preservation and that there are so many people willing to share their thoughts with the community.

...and perhaps if you have EML files in your digital archive you should check them too!



The mysterious case of the changed last modified dates

Published 31 Jul 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Today's blog post is effectively a mystery story.

Like any good story it has a beginning (the problem is discovered, the digital archive is temporarily thrown into chaos), a middle (attempts are made to solve the mystery and make things better, several different avenues are explored) and an end (the digital preservation community come to my aid).

This story has a happy ending (hooray) but also includes some food for thought (all the best stories do) and as always I'd be very pleased to hear what you think.

The beginning

I have probably mentioned before that I don't have a full digital archive in place just yet. While I work towards a bigger and better solution, I have a set of temporary procedures in place to ingest digital archives on to what is effectively a piece of locked down university filestore. The procedures and workflows are both 'better than nothing' and 'good enough' as a temporary measure and actually appear to take us pretty much up to Level 2 of the NDSA Levels of Preservation (and beyond in some places).

One of the ways I ensure that all is well in the little bit of filestore that I call 'The Digital Archive' is to run frequent integrity checks over the data, using a free checksum utility. Checksums (effectively unique digital fingerprints) for each file in the digital archive are created when content is ingested and these are checked periodically to ensure that nothing has changed. IT keep back-ups of the filestore for a period of three months, so as long as this integrity checking happens within this three month period (in reality I actually do this 3 or 4 times a month) then problems can be rectified and digital preservation nirvana can be seamlessly restored.

Checksum checking is normally quite dull. Thankfully it is an automated process that runs in the background and I can just get on with my work and cheer when I get a notification that tells me all is well. Generally all is well, it is very rare that any errors are highlighted - when that happens I blog about it!

I have perhaps naively believed for some time that I'm doing everything I need to do to keep those files safe and unchanged because if the checksum is the same then all is well, however this month I encountered a problem...

I've been doing some tidying of the digital archive structure and alongside this have been gathering a bit of data about the archives, specifically looking at things like file formats, number of unidentified files and last modified dates.

Whilst doing this I noticed that one of the archives that I had received in 2013 contained 26 files with a last modified date of 18th January 2017 at 09:53. How could this be so if I have been looking after these files carefully and the checksums are the same as they were when the files were deposited?

The 26 files were all EML files - email messages exported from Microsoft Outlook. These were the only EML files within the whole digital archive. The files weren't all in the same directory and other files sitting in those directories retained their original last modified dates.

The middle

So this was all a bit strange...and worrying too. Am I doing my job properly? Is this something I should be bringing to the supportive environment of the DPC's Fail Club?

The last modified dates of files are important to us as digital archivists. This is part of the metadata that comes with a file. It tells us something about the file. If we lose this date are we losing a little piece of the authentic digital object that we are trying to preserve?

Instead of beating myself up about it I wanted to do three things:

  1. Solve the mystery (find out what happened and why)
  2. See if I could fix it
  3. Stop it happening again
So how could it have happened? Has someone tampered with these 26 files? Perhaps unlikely considering they all have the exact same date/time stamp which to me suggests a more automated process. Also, the digital archive isn't widely accessible. Quite deliberately it is only really me (and the filestore administrators) who have access.

I asked IT whether they could explain it. Had some process been carried out across all filestores that involved EML files specifically? They couldn't think of a reason why this may have occurred. They also confirmed my suspicions that we have no backups of the files with the original last modified dates.

I spoke to a digital forensics expert from the Computer Science department and he said he could analyse the files for me and see if he could work out what had acted on them and also suggest a methodology of restoring the dates.

I have a record of the last modified dates of these 26 files when they arrived - the checksum tool that I use writes the last modified date to the hash file it creates. I wondered whether manually changing the last modified dates back to what they were originally was the right thing to do or whether I should just accept and record the change.
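
For reference, the mechanics of putting a date back are simple enough - something along these lines (a Python sketch; the file path and the recorded ISO 8601 date shown are made-up examples, not taken from the archive):

  import os
  from datetime import datetime

  def restore_modified_date(path, recorded_iso):
      # Reset the last modified date to the recorded value,
      # keeping the current access time untouched.
      original = datetime.fromisoformat(recorded_iso).timestamp()
      atime = os.stat(path).st_atime
      os.utime(path, (atime, original))

  restore_modified_date('archive/correspondence/message01.eml', '2013-06-14T10:22:00')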

...but I decided to sit on it until I understood the problem better.

The end

I threw the question out to the digital preservation community on Twitter and as usual I was not disappointed!

In fact, along with a whole load of discussion and debate, Andy Jackson was able to track down what appears to be the cause of the problem.


He very helpfully pointed me to a thread on StackExchange which described the issue I was seeing.

It was a great comfort to discover that the cause of this problem was apparently a bug and not something more sinister. It appears I am not alone!

...but what now?

So now I think I know what caused the problem, but questions remain around how to catch issues like this more quickly (not six months after they have happened) and what to do with the files themselves.

IT have mentioned to me that an OS upgrade may provide us with better auditing support on the filestore. Being able to view reports on changes made to digital objects within the digital archive would be potentially very useful (though perhaps even that wouldn't have picked up this Windows bug?). I'm also exploring whether I can make particular directories read only and whether that would stop issues such as this occurring in the future.
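
As a rough illustration of the read-only idea, a script could clear the write bit on everything under the archive root (a Python sketch; on Windows this only sets the read-only attribute, so it is no substitute for proper NTFS permissions or auditing):

  import os
  import stat

  def make_read_only(root):
      # Walk the archive and remove the write permission from every file.
      for dirpath, _dirnames, filenames in os.walk(root):
          for name in filenames:
              path = os.path.join(dirpath, name)
              mode = os.stat(path).st_mode
              os.chmod(path, mode & ~stat.S_IWRITE)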

If anyone knows of any other tools that can help, please let me know.

The other decision to make is what to do with the files themselves. Should I try and fix them? There was more interesting debate on Twitter on this topic, and even on the value of these dates in the first place. If we can fudge them then so can others - they may have already been fudged before they got to the digital archive - in which case, how much value do they really have?


So should we try and fix last modified dates, or should we focus our attention on capturing and storing them within the metadata? The latter may be a more sustainable solution in the longer term, given their slightly slippery nature!

I know there are lots of people interested in this topic - just see this recent blog post by Sarah Mason and in particular the comments - When was that?: Maintaining or changing ‘created’ and ‘last modified’ dates. It is great that we are talking about the real nuts and bolts of digital preservation and that there are so many people willing to share their thoughts with the community.

...and perhaps if you have EML files in your digital archive you should check them too!



Roundup: Welcome, on news, bad tools and great tools

Published 28 Jul 2017 by Carlos Fenollosa in Carlos Fenollosa — Blog.

I'm starting a series of posts with a summary of the most interesting links I found. The concept of "social bookmarks" has always been interesting, but no implementation is perfect. del.icio.us was probably the closest to a good enough service, but in the end, we all just post them to Twitter and Facebook for shares and likes.

Unfortunately, Twitter search sucks, and browser bookmarks rot quickly. That's why I'm trying this new model of social + local, not only for my readers but also for myself. Furthermore, writing a tapas-sized post is much faster than a well-thought-out one.

Hopefully, forcing myself to post periodically —no promises, though— will encourage me to write regular articles sometimes.

Anyway, these posts will try to organize links I post on my Twitter account and provide a bit more context.

While other friends publish newsletters, I still believe RSS can work well, so subscribe to the RSS if you want to get these updates. Another option is to use some of the services which deliver feeds by email, like Feenbox, which, by the way, may never leave alpha, so drop me an email if you want an invitation.

Nostalgia

RTVE, the Spanish public TV, has uploaded a few Bit a bit episodes. It was a rad early-90s show that presented video games and the early Internet.

On news

I quit reading news 3 years ago. A recent article from Tobias Rose-Stockwell digs deep into how your fear and outrage are being sold for profit by the Media.

@xurxof recommended a 2012 article from Rolf Dobelli, Avoid News. Towards a Healthy News Diet

LTE > Fiber

I was having router issues and realized how my cellphone internet is sometimes more reliable than my home fiber.

It seems to be more common than you'd think - read the Twitter replies! XKCD also recently posted a comic on this.

Journaling

There was a discussion on Lobste.rs on tools to journal your workday, which was one of the reasons that led me to try out these roundup posts.

New keyboard

I bought a Matias Clicky mechanical keyboard, which sounds like a minigun. For all those interested in mechanical keyboards, you must watch Thomas's YouTube channel.

The new board doesn't have a nav cluster, so I configured Ctrl-HJKL to be the arrow keys. It takes a few days to get used to, but since then I've been using that combination even when I'm using a keyboard with arrow keys.

Slack eats CPU cycles

Slack was eating a fair amount of my CPU while my laptop was trying to build a Docker image and sync 3000 files on Dropbox. Matthew O'Riordan also wrote Where’s all my CPU and memory gone? The answer: Slack

Focus, focus, focus!

I'm a brain.fm subscriber and use it regularly, especially when I'm working on the train or in a busy cafe.

musicForProgramming() is a free resource with a variety of music and also provides a podcast feed for updates.

Tags: roundup

Comments? Tweet  


My letter to the Boy Scouts of America

Published 25 Jul 2017 by legoktm in The Lego Mirror.

The following is a letter I just mailed to the Boy Scouts of America, following President Donald Trump's speech at the National Jamboree. I implore my fellow scouts to also contact the BSA to express their feelings.

25 July 2017

Boy Scouts of America
PO Box 152079
Irving, TX
75015-2079

Dear Boy Scouts of America,

Like many others I was extremely disappointed and disgusted to hear about the contents of President Donald Trump’s speech to the National Jamboree. Politics aside, I have no qualms with inviting the president, or having him speak to scouts. I was glad that some of the Eagle Scouts currently serving at high levels of our government were recognized for their accomplishments.

However above all, the Boy Scouts of America must adhere to the values of the Scout Law, and it was plainly obvious that the president’s speech did not. Insulting opponents is not “kindness”. Threatening to fire a colleague is not “loyal”. Encouraging boos of a former President is not “courteous”. Talking about fake news and media is not “trustworthy”. At the end of the day, the values of the Scout Law are the most important lesson we must instill in our youth – and President Trump showed the opposite.

The Boy Scouts of America must send a strong message to the public, and most importantly the young scouts that were present, that the president’s speech was not acceptable and does not embody the principles of the Boy Scouts of America.

I will continue to speak well of scouting and the program to all, but incidents like this will only harm future boys who will be dissuaded from joining the organization in the first place.

Sincerely,
Kunal Mehta
Eagle Scout, 2012
Troop 294
San Jose, CA


How do I get my MediaWiki site to use templates? [closed]

Published 21 Jul 2017 by Cyberherbalist in Newest questions tagged mediawiki - Webmasters Stack Exchange.

My MediaWiki site is currently using v1.24.4.

I don't seem to have many templates installed, and some very important ones seem to be missing. For example, I can't use the Reference List template. If I do put references in an article, with {{reflist}} at the bottom, the template comes across as a redlink:

Template:Reflist

Are templates something that have to be installed separately? And if so, how do I go about it?

My site is hosted by DreamHost.


Building the Lego Saturn V rocket 48 years after the moon landing

Published 20 Jul 2017 by legoktm in The Lego Mirror.

Full quality video available on Wikimedia Commons.

On this day 48 years ago, three astronauts landed on the moon after flying there in a Saturn V rocket.

Today I spent four hours building the Lego Saturn V rocket - the largest Lego model I've ever built. Throughout the process I was constantly impressed with the design of the rocket, and how it all came together. The attention paid to the little details is outstanding, and made it such a rewarding experience. If you can find a place that has them in stock, get one. It's entirely worth it.

The rocket is designed to be separated into the individual stages, and the lander actually fits inside the rocket. Vertically, it's 3ft, and comes with three stands so you can show it off horizontally.

As a side project, I also created a timelapse of the entire build, using some pretty cool tools. After searching online for how to have my DSLR take photos at a set interval, and being frustrated with all the examples that used a TI-84 calculator, I stumbled upon gphoto2, which lets you control digital cameras. I ended up using a command as simple as gphoto2 --capture-image-and-download -I 30 to have it take and save photos every 30 seconds. The only negative part is that it absolutely killed the camera's battery, and within an hour I needed to switch the battery.

To stitch the photos together (after renaming them a bit), ffmpeg came to the rescue: ffmpeg -r 20 -i "%04d.jpg" -s hd1080 -vcodec libx264 time-lapse.mp4. Pretty simple in the end!
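
The renaming step might be as simple as something like this (a Python sketch; it assumes the camera's JPEGs are the only .jpg files in the current directory and that alphabetical order matches shooting order):

  import os

  # Rename camera output into the sequential 0001.jpg, 0002.jpg, ...
  # pattern that the ffmpeg command above expects.
  frames = sorted(f for f in os.listdir('.') if f.lower().endswith('.jpg'))
  for index, name in enumerate(frames, start=1):
      os.rename(name, '%04d.jpg' % index)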


Song Club Showcase

Published 14 Jul 2017 by Dave Robertson in Dave Robertson.

While the finishing touches are being put on the album, I’m going solo with other Freo songwriters at the Fib.

Share


Wikidata Map July 2017

Published 11 Jul 2017 by addshore in Addshore.

It’s been 9 months since my last Wikidata map update and once again we have many new noticeable areas appearing, including Norway, South Africa, Peru and New Zealand to name but a few. As with the last map generation post, I once again created a diff image so that the areas of change are easily identifiable, comparing the data from July 2017 with that from my last post in October 2016.

The various sizes of the generated maps can be found on Wikimedia Commons:

Reasons for increases

If you want to have a shot at figuring out the cause of the increases in specific areas then take a look at my method described in the last post using the Wikidata Query Service.
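
To give a flavour of the kind of query involved, the sketch below counts items with coordinates inside a bounding box via the public query service endpoint (a Python sketch; the box, roughly covering New Zealand, is just an example and not taken from the original method):

  import requests

  QUERY = """
  SELECT (COUNT(?item) AS ?count) WHERE {
    SERVICE wikibase:box {
      ?item wdt:P625 ?coord .
      bd:serviceParam wikibase:cornerSouthWest "Point(166.0 -47.5)"^^geo:wktLiteral .
      bd:serviceParam wikibase:cornerNorthEast "Point(179.0 -34.0)"^^geo:wktLiteral .
    }
  }
  """

  # Query the Wikidata Query Service and print the item count for the box.
  response = requests.get(
      'https://query.wikidata.org/sparql',
      params={'query': QUERY, 'format': 'json'},
  )
  print(response.json()['results']['bindings'][0]['count']['value'])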

People's discoveries so far:

I haven’t included the names of those that discovered reasons for areas of increase above, but if you find your discovery here and want credit just ask!

The post Wikidata Map July 2017 appeared first on Addshore.


The Work

Published 9 Jul 2017 by Karen Coyle in Coyle's InFormation.

I've been on a committee that was tasked by the Program for Cooperative Cataloging folks(*) to help them understand some of the issues around works (as defined in FRBR, RDA, BIBFRAME, etc.). There are huge complications, not the least being that we all are hard-pressed to define what a work is, much less how it should be addressed in some as-yet-unrealized future library system. Some of what I've come to understand may be obvious to you, especially if you are a cataloger who provides authority data for your own catalog or the shared environment. Still, I thought it would be good to capture these thoughts. Of course, I welcome comments and further insights on this.



There are at least four different meanings to the term work as it is being discussed in library venues.

"Work-ness"

First there is the concept that every resource embodies something that could be called a "work" and that this work is a human creation. The idea of the work probably dates back as far as the recognition that humans create things, and that those things have meaning. There is no doubt that there is "work-ness" in all created things, although prior to FRBR there was little attempt to formally define it as an aspect of bibliographic description. It entered into cataloging consciousness in the 20th century: Patrick Wilson saw works as families of resources that grow and branch with each related publication;[1] Richard Smiraglia looked at works as a function of time;[2] and Seymour Lubetzky seems to have been the first to insist on viewing the work as intellectual content separate from the physical piece.[3]

"Work Description"

Second, there is the work in the bibliographic description: the RDA cataloging rules define the attributes or data elements that make up the work description, like the names of creators and the subject matter of the resource. Catalogers include these elements in descriptive cataloging even when the work is not defined as a stand-alone entity, as in the case of doing RDA cataloging in a MARC21 record environment. Most of the description of works is not new; creators and subjects have been assigned to cataloged items for a century or more. What is changed is that conceptually these are considered to be elements of the work that is inherent in the resource that is being cataloged but not limited to the item in hand.

It is this work description that is addressed in FRBR. The FRBR document of 1998 describes the scope of its entities to be solely bibliographic,  specifically excluding authority data:
"The present study does not analyse those additional data associated with persons, corporate bodies, works, and subjects that are typically recorded only in authority records."
Notably, FRBR is silent on the question of whether the work description is unique within the catalog, which would be implied by the creation of a work authority "record".

"Work Decision"

Next there is the work decision: this is the situation when a data creator determines whether the work to be described needs a unique and unifying entry within the stated cataloging environment to bring together exemplars of the same work that may be described differently. If so, the cataloger defines the authoritative identity for the work and provides information that distinguishes that work from all other works, and that brings together all of the variations of that work. The headings ("uniform titles") that are created also serve to disambiguate expressions of the same work by adding dates, languages, and other elements of the expression. To back all of this up, the cataloger gives evidence of his/her decision, primarily what sources were consulted that support the decision.

In today's catalog, a full work decision, resulting in a work authority record, is done for only a small number of works, with the exception of musical works where such titles are created for nearly all. The need to make the work decision may vary from catalog to catalog and can depend on whether the library holds multiple expressions of the work or other works that may need clarification in the catalog. Note that there is nothing in FRBR that would indicate that every work must have a unique description, just that works should be described. However, some have assumed that the FRBR work is always a representation of a unique creation. I don't find that expressed in FRBR or in the FRBR-LRM.

"Work Entity"

Finally there is the work entity: this is a data structure that encapsulates the description of the work. This data structure could be realized in any number of different encodings, such as ISO 2709 (the underlying record structure for MARC21), RDF, XML, or JSON. The latter two can also accommodate linked data in the form of RDFXML or JSON-LD.

Here we have a complication in our current environment because the main encodings of bibliographic data, MARC21 and BIBFRAME, both differ from the work concept presented in FRBR and in the RDA cataloging rules, which follow FRBR fairly faithfully. With a few exceptions, MARC21 does not distinguish work elements from expression or manifestation elements. Encoding RDA-defined data in the MARC21 "unit record" can be seen as proof of the conceptual nature of the work (and expression and manifestation) as defined in FRBR.

BIBFRAME, the proposed replacement for MARC21, has re-imagined the bibliographic work entity, departing from the entity breakdown in FRBR by defining a BIBFRAME work entity that tends to combine elements from FRBR's work and expression. However, where FRBR claims a neat division between the entities, with no overlapping descriptive elements, BIBFRAME 2.0 is being designed as a general bibliographic model, not an implementation of FRBR. (Whether or not BIBFRAME achieves this goal is another question.)

The diagrams in the 1998 FRBR report imply that there would be a work entity structure. However, the report also states unequivocally that it is not defining a data format.(**) In keeping with 1990's library technology, FRBR anticipates that each entity may have an identifier, but the identifier is a descriptive element (think: ISBN), not an anchor for all of the data elements of the entity (think: IRI).

As we see with the implementation of RDA cataloging in the MARC21 environment, describing a work conceptually does not require the use of a separate work "record." Whether work decisions are required for every cataloged manifestation is a cataloging decision; whether work entities are required for every work is a data design decision. That design decision should be based on the services that the system is expected to render.  The "entity" decision may or may not require any action on the part of the cataloger depending on the interface in which cataloging takes place. Just as today's systems do not store the MARC21 data as it appears on the cataloger's screen, future systems will have internal data storage formats that will surely differ from the view in the various user interfaces.

"The Upshot"

We can assume that every human-created resource has an aspect of work-ness, but this doesn't always translate well to bibliographic description nor to a work entity in bibliographic data. Past practice in relation to works differs significantly from, say, the practice in relation to agents (persons, corporate bodies) for whom one presumes that the name authority control decision is always part of the cataloging workflow. Instead, work "names" have been inconsistently developed (with exceptions, such as in music materials). It is unclear if, in the future, every work description will be assumed to have undergone a "work name authority" analysis, but even more unreliable is any assumption that can be made about whether an existing bibliographic description without a uniform title has had its "work-ness" fully examined.

This latter concern is especially evident in the transformations of current MARC21 cataloging into either RDA, BIBFRAME, or schema.org. From what I have observed, the transformations do not preserve the difference between a manifestation title that does not have a formal uniform title to represent the work, and those titles that are currently coded in MARC21 fields 130, 240, or the $t of an author/title field. Instead, where a coded uniform title is not available in the MARC21 record, the manifestation title is copied to the work title element. This means that the fact that a cataloger has carefully crafted a work title for the resource is lost. Even though we may agree that the creation of work titles has been inconsistent at best, copying transcribed titles to the work title entity wherever no uniform title field is present in the MARC21 record seems to be a serious loss of information. Or perhaps I should put this as a question: in the absence of a uniform title element, can we assume that the transcribed title is the appropriate work title?
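
One way to get a feel for how often this matters in a given batch of records is to check each record for a coded uniform title before trusting the transcribed title - a sketch using the pymarc library (the filename and the exact set of fields checked are illustrative assumptions, not a description of any particular transformation):

  from pymarc import MARCReader

  with open('records.mrc', 'rb') as handle:  # example filename
      for record in MARCReader(handle):
          # Look for a cataloger-supplied uniform title (130 or 240),
          # or an author/title added entry carrying a $t.
          uniform = record.get_fields('130', '240')
          analytic_t = [f for f in record.get_fields('700', '710', '711')
                        if f.get_subfields('t')]
          if uniform or analytic_t:
              source = 'coded uniform title present'
          else:
              source = 'transcribed 245 title only'
          print(record['245'], '->', source)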

To conclude, I guess I will go ahead and harp on a common nag of mine, which is that copying data from one serialization to another is not the transformation that will help us move forward. The "work" is very complex; I would feel less concerned if we had a strong and shared concept of what services we want the work to provide in the future, which should help us decide what to do with the messy legacy that we have today.


Footnotes

* Note that in 1877 there already was a "Co-operation committee" of the American Library Association, tasked with looking at cooperative cataloging and other tasks. That makes this a 140-year-old tradition.
"Of the standing committees, that on co-operation will probably prove the most important organ of the Association..." (see more at link)

** If you want more about what FRBR is and is not, I will recommend my book "FRBR: Before and After" (open access copy) for an in-depth analysis. If you want less, try my SWIB talk "Mistakes Have Been Made" which gets into FRBR at about 13:00, but you might enjoy the lead-up to that section.

References

[1] Wilson, Patrick. Two Kinds of Power : an Essay on Bibliographical Control. University of California Publications: Librarianship. Berkeley, Los Angeles, London: University of California Press, 1978.
[2] Smiraglia, Richard. The Nature of “a Work”; Implications for the Organization of Knowledge. Lanham: Scarecrow Press, 2001.
[3] Lubetzky, Seymour. Principles of Cataloging. Final Report. Phase I. In: Seymour Lubetzky: Writings on the Classical Art of Cataloging. Edited by Elaine Svenonius and Dorothy McGarry. Englewood, CO: Libraries Unlimited, 2001.


Preserving Google docs - decisions and a way forward

Published 7 Jul 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Back in April I blogged about some work I had been doing around finding a suitable export (and ultimately preservation) format for Google documents.

This post has generated a lot of interest and I've had some great comments both on the post itself and via Twitter.

I was also able to take advantage of a slot I had been given at last week's Jisc Research Data Network event to introduce the issue to the audience (who had really come to hear me talk about something else but I don't think they minded).

There were lots of questions and discussion at the end of this session, mostly focused on the Google Drive issue rather than the rest of the talk. I was really pleased to see that the topic had made people think. In a lightning talk later that day, William Kilbride, Executive Director of The Digital Preservation Coalition, mused on the subject of "What is data?". Google Drive was one of the examples he used, asking where does the data end and the software application start?

I just wanted to write a quick update on a couple of things - decisions that have been made as a result of this work and attempts to move the issue forward.

Decisions decisions

I took a summary of the Google docs data export work to my colleagues in a Research Data Management meeting last month in order to discuss a practical way forward for the institutional research data we are planning on capturing and preserving.

One element of the Proof of Concept that we had established at the end of phase 3 of Filling the Digital Preservation Gap was a deposit form to allow researchers to deposit data to the Research Data York service.

As well as the ability to enable researchers to browse and select a file or a folder on their computer or network, this deposit form also included a button to allow deposit to be carried out via Google Drive.

As I mentioned in a previous post, Google Drive is widely used at our institution. It is clear that many researchers are using Google Drive to collect, create and analyse their research data so it made sense to provide an easy way for them to deposit direct from Google Drive. I just needed to check out the export options and decide which one we should support as part of this automated export.
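
For anyone curious about the mechanics, exporting a native Google document programmatically goes through the Drive API's export call, which converts the document into a chosen format at download time. A rough sketch with the v3 Python client is below; the credentials file, document id and choice of Word format are illustrative assumptions rather than a description of our deposit form:

  from google.oauth2 import service_account
  from googleapiclient.discovery import build

  creds = service_account.Credentials.from_service_account_file(
      'service-account.json',  # hypothetical credentials file
      scopes=['https://www.googleapis.com/auth/drive.readonly'])
  drive = build('drive', 'v3', credentials=creds)

  FILE_ID = '1aBcD-example'  # hypothetical document id

  # Fetch the metadata that a flat export loses (created/modified dates).
  meta = drive.files().get(
      fileId=FILE_ID, fields='name, createdTime, modifiedTime').execute()

  # Export the document as docx; other target MIME types are available.
  content = drive.files().export(
      fileId=FILE_ID,
      mimeType='application/vnd.openxmlformats-officedocument.wordprocessingml.document'
  ).execute()

  with open(meta['name'] + '.docx', 'wb') as out:
      out.write(content)
  print(meta['createdTime'], meta['modifiedTime'])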

However, given the inconclusive findings of my research into export options it didn't seem that there was one clear option that adequately preserved the data.

As a group we decided the best way out of this imperfect situation was to ask researchers to export their own data from Google Drive in whatever format they consider best captures the significant properties of the item. By exporting the data themselves in a manual fashion prior to upload, they have the opportunity to review and check their files and make their own decisions on issues such as whether comments are included in the version of their data that they upload to Research Data York.

So for the time being we are disabling the Google Drive upload button from our data deposit interface....which is a shame because a certain amount of effort went into getting that working in the first place.

This is the right decision for the time being though. Two things need to happen before we can make this available again:


  1. Understanding the use case - We need to gain a greater understanding of how researchers use Google Drive and what they consider to be 'significant' about their native Google Drive files.
  2. Improving the technology - We need to make some requests to Google to make the export options better.


Understanding the use case

We've known for a while that some researchers use Google Drive to store their research data. The graphic below was taken from a survey we carried out with researchers in 2013 to find out about current practice across the institution. 

Of the 188 researchers who answered the question "Where is your digital research data stored (excluding back up copies)?" 22 mentioned Google Drive. This is only around 12% of respondents but I would speculate that over the last four years, use of Google Drive will have increased considerably as Google applications have become more embedded within the working practices of staff and students at the University.

Where is your digital research data stored (excluding back up copies)?

To understand the Google Drive use case today I really need to talk to researchers.

We've run a couple of Research Data Management teaching sessions over the last term. These sessions are typically attended by PhD students but occasionally a member of research staff also comes along. When we talk about data storage I've been asking the researchers to give a show of hands as to who is using Google Drive to store at least some of their research data.

About half of the researchers in the room raise their hand.

So this is a real issue. 

Of course what I'd like to do is find out exactly how they are using it: whether they are creating native Google Drive files or just using Google Drive as a storage location or filing system for data that they create in another application.

I did manage to get a bit more detail from one researcher who said that they used Google Drive as a way of collaborating on their research with colleagues working at another institution but that once a document has been completed they will export the data out of Google Drive for storage elsewhere. 

This fits well with the solution described above.

I also arranged a meeting with a Researcher in our BioArCh department. Professor Matthew Collins is known to be an enthusiastic user of Google Drive.

Talking to Matthew gave me a really interesting perspective on Google Drive. For him it has become an essential research tool. He and his colleagues use many of the features of the Google Suite of tools for their day to day work and as a means to collaborate and share ideas and resources, both internally and with researchers in other institutions. He showed me PaperPile, an extension to Google Drive that I had not been aware of. He uses this to manage his references and share them with colleagues. This clearly adds huge value to the Google Drive suite for researchers.

He talked me through a few scenarios of how they use Google - some, such as the comments facility, I was very much aware of. Others I've not used myself, such as using the Google APIs to visualise activity on preparing a report in Google Drive - showing a timeline of when different individuals edited the document. Now that looks like fun!

He also talked about the importance of the 'previous versions' information that is stored within a native Google Drive file. When working collaboratively it can be useful to be able to track back and see who edited what and when. 

He described a real scenario in which he had had to go back to a previous version of a Google Sheet to show exactly when a particular piece of data had been entered. I hadn't considered that the previous versions feature could be used to demonstrate that you made a particular discovery first. Potentially quite important in the competitive world of academic research.

For this reason Matthew considered the native Google Drive file itself to be "the ultimate archive" and "a virtual collaborative lab notebook". A flat, static export of the data would not be an adequate replacement.

He did however acknowledge that the data can only exist for as long as Google provides us with the facility and that there are situations where it is a good idea to take a static back up copy.

He mentioned that the precursor to Google Docs was a product called Writely (which he was also an early adopter of). Google bought Writely in 2006 after seeing the huge potential in this online word processing tool. Matthew commented that backwards compatibility became a problem when Google started making some fundamental changes to the way the application worked. This is perhaps the issue that is being described in this blog post: Google Docs and Backwards Compatibility.

So, I'm still convinced that even if we can't preserve a native Google Drive file perfectly in a static form, this shouldn't stop us having a go!

Improving the technology

Alongside trying to understand how researchers use Google Drive and what they consider to be significant and worthy of preservation, I have also been making some requests and suggestions to Google around their export options. There are a few ideas I've noted that would make it easier for us to archive the data.

I contacted the Google Drive forum and was told that as a Google customer I was able to log in and add my suggestions to Google Cloud Connect so this I did...and what I asked for was as follows:

  • Please can we have a PDF/A export option?
  • Please could we choose whether or not to export comments ...and if we are exporting comments, can we choose whether historic/resolved comments are also exported?
  • Please can metadata be retained - specifically the created and last modified dates. (Author is a bit trickier - in Google Drive a document has an owner rather than an author. The owner probably is the author (or one of them) but not necessarily if ownership has been transferred).
  • I also mentioned a little bug relating to comment dates that I found when exporting a Google document containing comments out into docx format and then importing it back again.
Since I submitted these feature requests and comments in early May it has all gone very very quiet...

I have a feeling that ideas only get anywhere if they are popular ...and none of my ideas are popular ...because they do not lead to new and shiny functionality.

Only one of my suggestions (re comments) has received a vote by another member of the community.

So, what to do?

Luckily, since having spoken about my problem at the Jisc Research Data Network, two people have mentioned they have Google contacts who might be interested in hearing my ideas.

I'd like to follow up on this, but in the meantime it would be great if people could feed back to me.

  • Are my suggestions sensible? 
  • Are there any other features that would help the digital preservation community preserve Google Drive? I can't imagine I've captured everything...
