Sam's news

Here are some of the news sources I follow.

My main website is at https://samwilson.id.au/.


Team Profile: David

Published 10 Dec 2017 by Nicola Nye in FastMail Blog.

This is the tenth post in the 2017 Fastmail Advent Calendar. The previous post was about managing a lock on non-existent rows.

Check back tomorrow for more goodies.


The next victim in our series of team profiles is David Gurvich.

David Photo

Name: David Gurvich

Role: Marketing

Time working for FastMail: 2.5 years


What I work on

I work on marketing. What does that cover? I look at all the customer touchpoints, which when I came on board was just the FastMail product, but that now extends to our full product suite. As someone with a less technical background than many of the team, I try to represent the everyday user.

I work on the websites and videos. I write customer communications, develop partnerships and sponsorships, and provide strong coffee recommendations.

How did you get involved with FastMail?

I'd actually been a FastMail user for about 10 years. One day I happened to be reading the FastMail blog and saw the job ad for a technical writer. Although I interviewed for that role, FastMail felt they could make better use of my marketing experience and built a new role for me to help spread the message of their email excellence to the world.

What’s your favourite thing you’ve worked on this year?

The CoinJar customer story video.

I like telling the human story: that's where our product really shines. I'm passionate about finding out how we help everyday people. The CoinJar story explains how we help a company grow their business through providing reliable email.

What’s your favourite FastMail feature?

I use the search functionality all the time, both in basic mode and the advanced search. It makes finding email easy, and fast.

I also use the rules a lot: they make my inbox less overwhelming!

What’s your favourite or most used piece of technology?

My phone! Podcasts and photos fill my phone.

What’s your preferred mobile platform?

iOS. I tried Android once but I had to go back.

What are you listening to / watching these days?

I am a huge fan of podcasts which brighten my long commute.

Music-wise, I enjoy chillout music and 80s Australian Rock.

I also enjoy cooking and food shows.

Cooking

My favourite dish to make is slow cooked ribs. I'm also on a mission to perfect my pizza dough.

What do you like to do outside of work?

I like to stay fit with cycling and running: this lets me eat all the great food!

Any FM staffers you want to brag on?

Nicola, who also has a passion for words. She's great at explaining to customers how wonderful our products are.

Rob N who, from day 1, has been patient, giving freely of his time to explain anything and everything. He doesn't realize how good he is at marketing!

Jamie and I met years ago, and I enjoy working with him so much that I got him to apply to work with us at FastMail.

What is your proudest moment at FastMail?

Every day being part of a team with a company that's growing and building a great future. It's such an exciting journey shaping the future of email!

... And beating Jamie at Go-Karting.


You can also find David on Twitter at @dgurvoir


MySQL and an atomic 'check ... on none insert'

Published 9 Dec 2017 by Rob Mueller in FastMail Blog.

This is the ninth post in the 2017 Fastmail Advent Calendar. The previous post was an insight into our activities at the IETF. The next post is an interview with David, our Marketing Manager.

Stay tuned tomorrow for the next update.


This post could have been titled:

"Why SELECT ... FOR UPDATE doesn't work on non-existent rows like you might think it does".

FastMail has been a long-term user of the MySQL database server, and in particular the InnoDB storage engine. InnoDB provides many core database features (ACID transactions, row-level locking, etc.).

One thing that comes up from time to time is the desire to do the following sequence atomically:

  1. Check if a row with a particular id exists in a table
  2. If yes, fetch the row data
  3. If not, perform an expensive calculation (e.g. external web request) to get the data and insert it into the table

The important point is you want to do this atomically. That is: if one process checks for the row, but another process is already doing the "expensive calculation", it blocks until that process completes and has inserted the row into the table.

So if you have a table:

CREATE TABLE foo (
    Id INT NOT NULL PRIMARY KEY,
    Data BLOB
);

The primary key ensures Id is unique.

One thought would be that you could do a SELECT and then an INSERT IGNORE if the row doesn't exist. INSERT IGNORE doesn't return an error if the insert would result in a duplicate key, but this approach does mean that the "expensive calculation" might be done multiple times.
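To make this concrete, here is a minimal sketch of that sequence against the foo table above (the id and data values are just placeholders):

-- Two sessions race through this sequence for the same Id.
SELECT Data FROM foo WHERE Id = 1;

-- No row found, so each session does the expensive calculation itself and
-- then inserts. INSERT IGNORE silently drops the duplicate instead of
-- raising an error, but the expensive work has still been done twice.
INSERT IGNORE INTO foo (Id, Data) VALUES (1, 'expensive result');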

After looking at some MySQL documentation, you might think a transaction with a SELECT ... FOR UPDATE statement is what you want. This would check for the data row, and lock the gap where the row would go if it doesn't exist. Excellent!
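In SQL, the hoped-for pattern would look something like this (again only a sketch, using the foo table above):

START TRANSACTION;

-- The hope: if no row with Id = 1 exists, this locks the gap so nobody
-- else can insert it while we do the expensive calculation.
SELECT Data FROM foo WHERE Id = 1 FOR UPDATE;

-- No row found: do the expensive calculation, then insert and commit.
INSERT INTO foo (Id, Data) VALUES (1, 'expensive result');
COMMIT;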

Unfortunately this doesn't work.

Quoting directly from an answer by Heikki Tuuri (the original developer of InnoDB) at http://forums.mysql.com/read.php?22,10506,11347#msg-11347:

READ COMMITTED and REPEATABLE READ only affect plain SELECTs, not SELECT ... FOR UPDATE.

In the 4.1.8 execution:

1. mysql> select * from T where C = 42 for update;
2. mysql> select * from T where C = 42 for update;
-- The above select doesn't block now.
1. mysql> insert into T set C = 42;
-- The above insert blocks.
2. mysql> insert into T set C = 42;
ERROR 1213: Deadlock found when trying to get lock; Try restarting transaction

The explanation is that the X lock on the 'gap' set in SELECT ... FOR UPDATE is purely 'inhibitive'. It blocks inserts by other users to the gap. But it does not give the holder of the lock a permission to insert.

Why the inhibitiveness: if we have three index records:

aab <first gap> aba <second gap> acc

there are two gaps there. Suppose user 1 locks the first gap, and user 2 locks the second gap.

But if 'aba' is delete-marked, purge can remove it, and these two gaps merge. Then BOTH user 1 and user 2 have an exclusive lock on the same gap. This explains why a lock on gap does not give a user a permission to insert. There must not be locks by OTHER users on the gap, only then is the insert allowed.

Best regards,

Heikki, Oracle Corp./Innobase Oy

Although this post is over 12 years old, it appears to be still relevant to InnoDB today, as any quick testing will show.

In fact, in some common scenarios it's even worse than you might expect. Say you have one table with an auto-increment primary key. You then use that newly created id to calculate and insert "expensive to calculate" data into another table with the same id. The first SELECT ... FOR UPDATE locks the gap for any ids after it as well.

Say table T has no values of C greater than 41.

t1> select * from T where C = 42 for update;
t2> select * from T where C = 43 for update;
t1> insert into T set C = 42;
-- The above insert blocks.
t2> insert into T set C = 43;
ERROR 1213: Deadlock found when trying to get lock; Try restarting transaction

There's some more commentary about this issue online, and a handful of possible workarounds. Sometimes this is rather frustrating, but there don't appear to be any other good options.
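As one illustration of the kind of workaround that comes up (a sketch only, not a recommendation from the quoted thread), you can serialise the check-and-insert on a MySQL named advisory lock with GET_LOCK/RELEASE_LOCK rather than relying on the gap lock:

-- Take an advisory lock keyed on the id we care about; this blocks for up
-- to 10 seconds if another session already holds it.
SELECT GET_LOCK('foo-insert-1', 10);

-- Re-check now that we hold the lock.
SELECT Data FROM foo WHERE Id = 1;

-- Only if the row is still missing: do the expensive calculation, then
INSERT INTO foo (Id, Data) VALUES (1, 'expensive result');

SELECT RELEASE_LOCK('foo-insert-1');

The advisory lock is independent of the table, so it sidesteps the gap-merging behaviour described above, at the cost of every writer having to agree on the lock names and remember to release them.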

Why FastMail uses MySQL InnoDB

For the curious: when we originally started developing FastMail in 2000, we started with PostgreSQL. However, at the time PostgreSQL had a limit of 8kB of data per row. Because (again, at the time) we stored emails in the database, this was not viable and we switched to MySQL InnoDB. As these things go, it was only a short while later that we switched to IMAP and Cyrus for email storage.

Sometimes the state of a product is the sum of a series of seemingly inconsequential decisions at the time.


MediaWiki installed with nginx reverse proxy configuration

Published 8 Dec 2017 by Daniel Gao in Newest questions tagged mediawiki - Stack Overflow.

I would like to install MediaWiki in the following way:

  1. Download and extract all the MediaWiki installation files to the location that the wiki.gaobo.org nginx server block points to;
  2. Set up a reverse proxy configuration in the gao.bo nginx server block so that gao.bo/wiki is equivalent to wiki.gaobo.org;
  3. Launch the installation from gao.bo/wiki so that initially the server URL is gao.bo/wiki instead of wiki.gaobo.org.

Questions:

  1. Is the design above actually implementable?
  2. If so, how do I implement Step 2?

FastMail at the IETF

Published 8 Dec 2017 by Bron Gondwana in FastMail Blog.

This is the eighth post in the 2017 FastMail Advent series. The previous post was about how people use Topicbox, our new product which happens to be based on the JMAP protocol! The following post looks at an interesting problem we had with MySQL locks and what we learned from it.


One of our core company values is a commitment to interoperability through open source and open standards. FastMail believes very strongly in contributing to a healthy email ecosystem. We want people to become our customers, and to remain our customers, because they love our service and because we’re good at what we do – not because they have no choice.

When we started work on JMAP, “a better way to email”, we wanted to bring our advances in email protocol design to the rest of the world. We did this not only for altruistic reasons, but for the very selfish reason that too many new and popular email clients are tied to a specific email service (and not FastMail!), because the open protocols are too hard to use or lacking in features. With JMAP, we intend to make the easiest way to build the best experience be based on open standards.

Chartering a working group

After a couple of years of talking about JMAP at conferences and finding others who wanted to work with us, it was time to bring JMAP to the Internet Engineering Task Force. The IETF is the body that produces what are whimsically known as RFCs, or “Request for Comments” documents, and if you want something to become an Internet standard, the RFC standards track is the way to go.

There are various areas within the IETF, and JMAP naturally fell under the scope of the Applications and Real Time Area.

Since there was no active working group within ART which could take on JMAP, a new working group was proposed. JMAP-the-IETF-working-group was officially chartered just before the 98th IETF meeting in Chicago in March 2017. I was appointed as one of the chairs of the working group, with an experienced co-chair (Barry) to mentor me. We also found co-authors from outside FastMail for each of the standards track documents the working group plans to produce.

It’s important that the final JMAP be a community consensus and not dominated by our company. We are as keen as everyone else to see JMAP improved by input from the most experienced email people in the world.

Many changes

It became clear upon joining the IETF that there were many people with experience running very different email infrastructures than we run, as well as people from other areas such as Security, HTTP, and Authentication who could bring their own expertise to the standard.

We have a very active issue tracker, with (at the time of writing) 81 closed issues and 19 still in progress, covering just the core and mail specifications. There are also in-progress documents for calendar and contacts which are awaiting work on the data representation. Robert already wrote about his work within the CalConnect group to standardise a representation for calendar events in a JMAP-compatible way, and I’ll be helping that work its way through the calext working group where it best fits.

Robert has also taken the lead on implementing JMAP in the open source Cyrus IMAPd server, while I work on keeping the JMAP Proxy up to date in between my other responsibilities.

The great thing about the IETF is that all the discussions happen in the open, so everybody is welcome to follow along on the mailing list.

JMAP has now had 3 working group sessions, one at each of the three IETF meetings in 2017, where the interested parties all get together to discuss the remaining issues face-to-face. We will be holding another session at IETF 101, London in March 2018 - hopefully with complete core and mail specifications, and starting to work on our next milestones.

Extra tasks

While JMAP is working on a brand new protocol for the future, there’s another working group which was chartered towards the end of 2017 to continue to improve the existing IMAP and Sieve standards - Email mailstore and eXtensions To Revise or Amend (EXTRA). If there’s one thing the IETF loves more than anything else, it’s a good acronym for a working group name. Bonus points if you can work a pun in. I sometimes suspect that the primary criterion for starting a new working group is whether the name is cool enough.

Anyway... I’m also a chair on the EXTRA working group, and EXTRA and JMAP authors are keeping an eye on each others’ work to ensure that it remains possible to support both protocols in the same server.

Hackathons

Over the past few years, the IETF has been hosting a hackathon immediately before the main conference. This gives an opportunity for developers and authors to work together to test ideas and interoperability throughout the standards process.

In Chicago (IETF98) we arrived too late to participate in the hackathon.

In Prague (IETF99) we had no specific plans for the hackathon, but we did arrive in time to join. Since most of the mail people were sitting at the ARC working group table, we joined them. While Neil worked on JMAP standards documents, I hacked an initial implementation of ARC into the perl Mail::DKIM module and Authentication Milter.

I handed my initial experiments over to Marc to be tidied up and made production grade, and it is now being used to authenticate and seal all incoming mail at FastMail, as well as being available as open source to everybody else. The OpenARC implementation from the Trusted Domain Project was just released today and we expect to continue working on ARC both at IETF and elsewhere over the coming year.

At the recent IETF100 in Singapore just last month, Neil and I updated both the client and proxy to the latest spec changes, and made good progress in testing the new submission object. My day job includes significantly less programming these days, so the hackathon was a nice opportunity to focus on code for a few uninterrupted hours!

At IETF101 in London we plan to spend the hackathon with JMAP implementors on interoperability testing of a mostly complete set of core and mail drafts.


Topicbox: the solution for team email!

Published 7 Dec 2017 by Nicola Nye in FastMail Blog.

This is the seventh post in the 2017 Fastmail Advent Calendar. The previous post looked at the development and philosophy behind our account recovery tool. The following post looks at our work at the IETF, including development of the JMAP standard which Topicbox is built on.


FastMail’s newest product is Topicbox, which we 💙 LOVE 💙 using. Like many organizations, we were overwhelmed with group communication - so we built a service to fix it. Topicbox harnesses the best of email to help teams communicate better. It helps each of us manage what hits our inbox and what we can leave until later, and gives us a central place to search for something half-remembered.

Topicbox website on devices

But enough about us. This is a post about you.

Who else uses Topicbox? Here's a quick overview of just five types of organizations we serve with Topicbox that benefit from team email management. No more email overload. No more long cc lists. And so much more than a mere mailing list.

Team communication in business

Companies often have several teams: Sales, Engineering and Admin, for example. Each team has its own role to play. Lots of discussion occurs within the team, with crossover points to other teams.

Sending email to all staff fills everyone's inboxes with mail that isn't relevant to them, but managing a CC list inevitably means someone gets left off by accident.

Topicbox lets you set up a group for each team, helping you keep your email organized. There's just one email address for each team, and you can add or remove people in one central location.

The groups are discoverable in the Topicbox group directory, allowing members to see what's happening in other teams, but without having to get overwhelmed by email.

You can even set up groups per project, where communication needs to happen across team members.

Consultancy with long-lived customer projects

You're a business that does projects for customers which are more than just a short-term engagement. You need somewhere to keep the communication on the project alive. But the project may need different staff involved over time, as it matures from sale to delivery to troubleshooting and maintenance.

A Topicbox group per customer project means that the entire history of the project is available for searching online (only to those who have access!) enabling more rapid onboarding of staff as they join a project mid-flight.

The flexible privacy controls in Topicbox let you set which groups are only available to staff, which groups a user from a particular client company can see, and which groups are visible to the world (if any).

Public and private group types on Topicbox

Community group with lots of members

Open source software groups, user and hobby groups, conferences, events and other community-based organizations that need to engage with the public can use Topicbox.

Create as many publicly visible groups as you need to manage communication among your members, while keeping one private group hidden for discussion among the organizers.

You can control who can send mail to the group, whether it's anyone in the world, just the group members, or only the administrators (to broadcast news), while the discussions are visible on the web without anyone needing to log in to see them.

For an example of this in action, take a look at the illumos Topicbox instance. Illumos is a free open source Unix operating system.

Group email among timezones

Topicbox is a fantastic alternative to Slack (a chat platform) when you have team members around the world.

It is frustrating to spend the first chunk of your day reading what your colleagues on the other side of the planet spoke about while you were sleeping. Chat tends to have a high ratio of instant situational matters (who wants coffee, check out this cute cat picture, did you respond to that urgent customer request yet) to issues of greater long-term importance (a new feature rolled out - please tell your customers; need feedback on a blog post by the end of the week).

And while you could just skip all the backscroll, what if there is something in there that's important?

Topicbox to the rescue. Keep the local chatter in the chat program; the important communication that needs thinking time and cross-timezone interaction can go to a Topicbox group.

Need to have a real-time conversation about a Topicbox matter? You can link to any Topicbox message and if the recipient has permission to see the post, you can bring them up to speed instantly.

Homeowners association / Body corporate

Many countries have a collection of people who pay fees towards the maintenance and upkeep of shared property features: a communal garden, or the structure and plumbing of a building containing a number of apartments.

Having a mailing list allows the group to talk to one another and coordinate efforts without having to remember each individual member's email address. Even more helpful is that Topicbox gives you a searchable archive on the web of all communication. Everyone can see what was agreed upon, by whom, and when, even if they've only just joined the association.

Keen to try it out?

Just like FastMail, we offer a free one-month trial for a Topicbox organization.

Contact our sales team at support@topicbox.com if you have further questions, and we'd be glad to help you out.


Security case study: Account Recovery

Published 6 Dec 2017 by Rob Mueller in FastMail Blog.

This is the sixth post in the 2017 FastMail Advent Calendar. The previous post was about the FastMail security mindset and this post outlines an example of how our philosophy turns into reality. The next post is about different ways people use Topicbox.


Yesterday we talked about how we don’t like the idea of security theatre. Instead we develop measures which meaningfully improve the confidentiality, availability and/or integrity of our customers’ data.

Today we're going to look at a great example of this: when we re-did our password, 2FA and account recovery systems.

Our original password system with alternative logins was a classic "evolved" system that began over 10 years ago. Parts were upgraded over time and additions were bolted on. The world we started in 15 years ago was completely different to today. We were an early adopter of something as simple as SSL encryption for all protocols (web, IMAP, SMTP, etc.); this was relatively rare at the time. Back then attackers were less sophisticated and online identities weren't as important. Stolen accounts were rare.

But all of that has changed. Online attackers are much more sophisticated, and the things they want vary enormously.

About two years ago, we decided to completely redo our password and recovery mechanisms. We designed an entirely new system, carefully considering all the cases.

The first result of this was a completely new authentication system. We moved all protocol passwords (e.g. IMAP, SMTP, CalDAV, etc.) to separate application passwords, which allow us to have different access controls for each password.

We also completely redesigned how we store user web passwords to allow us to easily store passwords hashed with any hash function we choose. This lets us change over time to keep up with best practice.

Account recovery is a vital part of our security process. Users get locked out for a variety of reasons, from lost passwords, to hacked accounts (most often caused by phishing or password reuse). We want to make sure we are recovering an account to a legitimate user, and not to a malicious attacker attempting to gain access.

When considering our security policy, we tried to enumerate a set of situations that accounts might find themselves in. Quoting some of the thinking from the initial design document:

The primary cases where account recovery is required are:

  1. Locked accounts. (Hacker steals password and sends spam.)
  2. Forgotten password, most commonly immediately after changing it (or creating the account for the first time).

In more detail, here are the following scenarios where we would like to allow recovery. In all the following, U = User, pw = password, H = Hacker:

  1. U changes pw (no 2FA) and immediately doesn't know what they changed it to.
  2. U hasn't used service for a while and can't remember what pw is.
  3. U has 2FA and loses Yubikey/U2F/Phone.
  4. U has 2FA and forgets password, or changes it and can't remember new one.
  5. PW stolen. H changes pw. U tries to recover.
  6. PW stolen. H changes pw + recovery options.
  7. PW stolen. H changes pw + recovery options + adds 2FA. U tries to recover.
  8. PW stolen. H adds 2FA. U tries to recover.
  9. PW stolen. H uses it to send spam. Account locked. User has no recovery options.

Conversely, we must deny recovery to anyone else. When in doubt, we should side with denying access. In more detail, here are some of the scenarios that must be taken into account:

  1. PW stolen. U changes pw immediately. H tries to recover with old pw.
  2. PW stolen. U adds 2FA. H tries to recover with current pw.
  3. PW stolen. H accesses account. U changes pw. H tries to recover with old pw.
  4. PW stolen. H goes through reset process but waits at end screen. User changes pw or adds 2FA. H then tries to change pw at end screen.
  5. PW stolen. Account locked. H tries to unlock.
  6. PW stolen, but account has 2FA. H tries to recover.
  7. Phone stolen. H has access to email account via IMAP/app + any 2F token + SMS.
  8. PW stolen. H changes recovery. U changes pw + recovery. H tries to recover.

With that as a starting point, we then proceeded to design a completely automated system to meet all these needs. We are acutely aware of the risks of social engineering, and the goal of the new tool was to completely remove human interaction in as many cases as possible.

No automated system can resolve 100% of cases, and this type of system must err on the side of denying access in cases where there is sufficient doubt. Where users contact support because they were unable to regain access to their account through the recovery tool, we have a policy in place to escalate to senior security team members. This lets us review each case and further refine our automated system.

There is a tension between keeping your account secured while maintaining recovery in the face of everyday life events such as: password loss, device reset, or changing phone numbers. We believe with our current system we have found a reasonable balance. Users who elect to enable two-step verification can weight the balance towards higher confidentiality at the expense of potential loss of recovery. We make sure that recovering your account if you have 2FA enabled is no less stringent than login and still requires two independent factors.

As an example of one of the many small tradeoffs that occur: some users have questioned why we require people to have a recovery phone number in place when they set up a 2FA option, as phone numbers are known to be insecure. With 2FA set up, a phone number alone is not enough to perform an account recovery: a correct password is still required. Someone compromising your phone alone won't be able to gain access to your account. In fact, if they tried, we'd email you letting you know that there was a failed attempt to reset your password, likely alerting you to the fact that your phone number had been stolen!

This is a good balance. If you really don't want your phone as a recovery option, you can remove it after adding the 2FA, but this is risky. We've seen several people do this, and then lose their 2FA device (they lost their phone, or their TOTP app got reset, etc), or even change their password and immediately forget it, without a recovery option. If that happens, you will lose access to your account permanently. Setting up 2FA without a recovery option suggests that you value security over availability at all costs, and it's up to you to ensure you never lose your password or 2FA token.

Working on the design and implementation of the new security and account recovery system took well over a year with many discussions along the way. Real world security is hard, but we believe we've now got one of the best systems in place.


Untitled

Published 6 Dec 2017 by Sam Wilson in Sam's notebook.

2.5 of 7 m of ripping done.


Searching for an English identity

Published 5 Dec 2017 by in New Humanist Articles and Posts.

In our uncertain times, there is a new demand for stories of England – but this search is desperate and confused.

The FastMail Security Mindset

Published 5 Dec 2017 by Neil Jenkins in FastMail Blog.

This is the fifth post in the 2017 FastMail Advent Calendar. The previous post was about what we are up to at CalConnect. The next post is an example of our philosophy in action with a look at our revamp of 2FA, passwords and recovery.


“Security” is a word that gets bandied around a lot in the IT world, often with little actual thought or substance behind its use. The phrase “we take your privacy and security seriously” is the preamble to many a mea culpa from companies who, frankly, didn’t.

FastMail has always been an engineering-focused company, from the top down. As such there is a strong culture of no-bullshit, and an intense dislike of security theatre.

Our approach to security is to proactively develop and adopt any measures which meaningfully improve the confidentiality, availability or integrity of our customers’ data. We are not interested in implementing things that sound good in marketing spiel but don’t actually help, or may even actively hurt, our customers’ security. We also strongly believe that usability is part of security; to be secure, we need to make it easy to stay safe and hard to get it wrong.

As an example of this mindset, we were one of the early adopters of opportunistic TLS encryption of SMTP connections when sending and receiving mail. This prevents a passive man-in-the-middle attacker from snooping on your data, making mass surveillance much harder.

This even protects against interception of metadata; someone watching our outbound connections might just know FastMail connected to Gmail, for example. There’s a lot of email sent between us by many different users, so observing this connection would not leak much information. (Interestingly, this is where there is safety in numbers: if you and your intended recipient both hosted your own email on individual servers, then encrypting the connection doesn’t really hide who the message is from or to!)

Supporting encrypted SMTP meaningfully improved the confidentiality of our customers’ data, without impacting our users’ workflow. And there’s still more we can do in this area! Initiatives like MTA-STS will allow us to further protect against active man-in-the-middle attacks on mail delivery, and all without impacting usability.

Just as important as what we do do is what we don’t. For example, we don’t do full message encryption (e.g. PGP) in the browser. In theory it means you “don’t have to trust us”. However in reality, every time you open your email you would be trusting the code delivered to your browser. If the server were compromised, it could easily be made to return code that intercepted and sent back your password next time you logged in; it could even just do this for specific users. It is very unlikely that a user would notice.

We therefore don’t believe this offers a meaningful increase in security, and can be actively harmful in a number of ways. It reduces availability, because if you forget your password we cannot help you recover access to your own mail. It makes phishing (by far the biggest cause of compromised accounts) much harder to filter out.

It can also be seriously dangerous when users misunderstand the security characteristics. For example, if you were a journalist working undercover in certain countries, you may justifiably require secure, anonymous communication. “Encrypted email” sounds like just the thing you need. But if your mail host doesn’t proxy images to hide your IP, someone could simply send you a message which when opened made your device connect directly to their servers. This reveals your IP address, which can often be used to fairly precisely determine your location, and sends cookies that may allow them to correlate your email address with visits to other sites on the web. That’s a much bigger risk.

Ultimately, security is a process, not a checkbox. We are always looking for further measures that will help secure our customers’ sensitive data. But we don’t do stuff just to check a marketing box. It may lose us a few customers enticed by razzle-dazzle claims, but we feel better about the integrity of our service.


Untitled

Published 4 Dec 2017 by Sam Wilson in Sam's notebook.

I am so excited about this cup of tea.


AI in Practice

Published 4 Dec 2017 by Alejandro (Alex) Jaimes in The DigitalOcean Blog.

AI in Practice

This is the final installment in a three-part series on artificial intelligence by DigitalOcean’s Head of R&D, Alejandro (Alex) Jaimes. Read the first post about the state of AI, and the second installment about how data and models feed computing.

So what does AI as a service mean for hobbyists, professional developers, engineering teams, the open source community, and companies today?

Starting an AI (or machine learning) project can be a daunting task at any level, and the steps will differ depending on the context. It’s important to note that sophisticated algorithms are not a requirement for AI, and more often than not solutions may be simple. Even the most basic machine learning algorithm can do a decent job on some problems, and once a process is set up, more sophisticated iterations are possible.

An alternative is starting with sophisticated algorithms—as long as there’s a good understanding of what those algorithms do and it’s “easy” to get them up and running. You don’t want to start your first iteration setting a large number of parameters you don’t understand.

There are some exceptions and, arguably, choices that depend on many factors, including level of expertise, but in general it’s feasible to start small, build, and iterate quickly—you want to build an initial solution that demonstrates value. Even if it’s imperfect, setting up a process and obtaining data gets you off the ground. It’s imperative, however, to ask the right questions, focus on the solution and the needs of whoever will be using whatever you build, and be resourceful and creative in combining data, models, and open source frameworks. This applies, in different ways, to each of the players mentioned above: hobbyists, professional developers, engineering teams, the open source community, and companies.

The Future

The field is evolving extremely quickly, and one could argue that most of the research being published consists mainly of experimentation, either applying known deep learning architectures to new problems or tweaking parameters. It’s clear, however, that efforts—and progress—are being made in areas such as transfer learning, reinforcement learning, and unsupervised learning, among others. In terms of hardware, it’s too early to say, but it’s very positive to see new developments in the space.

Perhaps more important than advancements in algorithms, we can expect advances in how AI augments human abilities. There will be a much tighter integration between humans and machines than what computing has created thus far. For hobbyists, professional developers, engineering teams, the open source community and companies, this really translates to having a strong human-centered focus.

Conclusion

I’ve referred to AI throughout this series, but most of my examples relate to machine learning. One of the key differences between the two is that true AI applications will have an even stronger focus on user interaction and experience. At the end of the day, it’s the applications we build that will make a difference, AI or not. How “smart” the system is, or what algorithms it uses, won’t matter.

Alejandro (Alex) Jaimes is Head of R&D at DigitalOcean. Alex enjoys scuba diving and started coding in Assembly when he was 12. In spite of his fear of heights, he's climbed a peak or two, gone paragliding, and ridden a bull in a rodeo. He's been a startup CTO and advisor, and has held leadership positions at Yahoo, Telefonica, IDIAP, FujiXerox, and IBM TJ Watson, among others. He holds a Ph.D. from Columbia University.
Learn more by visiting his personal website or LinkedIn profile. Find him on Twitter: @tinybigdata.


Cakes, quizzes, blogs and advocacy

Published 4 Dec 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Last Thursday was International Digital Preservation Day and I think I needed the weekend to recover.

It was pretty intense...

...but also pretty amazing!

Amazing to see what a fabulous international community there is out there working on the same sorts of problems as me!

Amazing to see quite what a lot of noise we can make if we all talk at once!

Amazing to see such a huge amount of advocacy and awareness raising going on in such a small space of time!

International Digital Preservation Day was crazy but now I have had a bit more time to reflect, catch up...and of course read a selection of the many blog posts and tweets that were posted.

So here are some of my selected highlights:

Cakes

Of course the highlights have to include the cakes and biscuits, including those produced by Rachel MacGregor and Sharon McMeekin. Turning the problems that we face into something edible does seem to make our challenges easier to digest!

Quizzes and puzzles

A few quizzes and puzzles were posed on the day via social media - a great way to engage the wider world and have a bit of fun in the process.


There was a great quiz from the Parliamentary Archives (the answers are now available here) and a digital preservation pop quiz from Ed Pinsent of CoSector which started here. Also, for the hexadecimal geeks out there, there was a puzzle from the DPOC Fellows at Oxford and Cambridge, which arrived just as I happened to be firing up a hexadecimal viewer!

In a blog post called Name that item in...? Kirsty Chatwin-Lee at Edinburgh University encourages the digital preservation community to help her to identify a mysterious large metal disk found in their early computing collections. Follow the link to the blog to see a picture - I'm sure someone out there can help!

Announcements and releases

There were lots of big announcements on the day too. IDPD just kept on giving!

Of course the 'Bit List' (a list of digitally endangered species) was announced, and I was able to watch this live. Kevin Ashley from the Digital Curation Centre discusses this in a blog post. It was interesting to finally see what was on the list (and to think about how we can use it for further advocacy and awareness raising).

I celebrated this fact with some Fake News, but to be fair, William Kilbride had already been on the BBC World Service the previous evening talking about just this, so it wasn't too far from the truth!

New versions of JHOVE and VeraPDF were released, as well as a new PRONOM release. A digital preservation policy for Wales was announced and a new course on file migration was launched by CoSector at the University of London. Two new members also joined the Digital Preservation Coalition - and what a great day to join!

Roadshows

Some institutions did a roadshow or a pop up museum in order to spread the message about digital preservation more widely. This included the revival of the 'fish screensaver' at Trinity College Dublin and a pop up computer museum at the British Geological Survey.

Digital Preservation at Oxford and Cambridge blogged about their portable digital preservation roadshow kit. I for one found this a particularly helpful resource - perhaps I will manage to do something similar myself next IDPD!

A day in the life

Several institutions chose to mark the occasion by blogging or tweeting about the details of their day. This gives an insight into what we DP folks actually do all day, and can be really useful given that the processes behind digital preservation work are often less tangible and understandable than those used for physical archives!

I particularly enjoyed the nostalgia of following ex-colleagues at the Archaeology Data Service for the day (including references to those much-loved checklists!) and hearing from Artefactual Systems about the testing, breaking and fixing of Archivematica that was going on behind the scenes.

The Danish National Archives blogged about 'a day in the life' and I was particularly interested to hear about the life-cycle perspective they have as new software is introduced, assessed and approved.

Exploring specific problems and challenges

Plans are my reality from Yvonne Tunnat of the ZBW Leibniz Information Centre for Economics was of particular interest to me as it demonstrates just how hard the preservation tasks can be. I like it when people are upfront and honest about the limitations of the tools or the imperfections of the processes they are using. We all need to share more of this!

In Sustaining the software that preserves access to web archives, Andy Jackson from the British Library tells the story of an attempt to maintain a community of practice around open source software over time and shares some of the lessons learned - essential reading for any of us that care about collaborating to sustain open source.

Kirsty Chatwin-Lee from Edinburgh University invites us to head back to 1985 with her as she describes their Kryoflux-athon challenge for the day. What a fabulous way to spend the day!

Disaster stories

Digital Preservation Day wouldn't be Digital Preservation Day without a few disaster stories too! Despite our desire to move beyond the 'digital dark age' narrative, it is often helpful to refer to worst-case scenarios when advocating for digital preservation.

Cees Hof from DANS in the Netherlands talks about the loss of digital data related to rare or threatened species in The threat of double extinction, Sarah Mason from Oxford University uses the recent example of the shutdown of DCist to discuss institutional risk, José Borbinha from Lisbon University, Portugal talks about his own experiences of digital preservation disaster and Neil Beagrie from Charles Beagrie Ltd highlights the costs of inaction.

The bigger picture

Other blogs looked at the bigger picture

Preservation as a present by Barbara Sierman from the National Library of the Netherlands is a forward thinking piece about how we could communicate and plan better in order to move forward.

Shira Peltzman from the University of California, Los Angeles tries to understand some of the results of the 2017 NDSA Staffing Survey in It's difficult to solve a problem if you don't know what's wrong.

David Minor from the University of San Diego Library, provides his thoughts on What we’ve done well, and some things we still need to figure out.

I enjoyed reading a post from Euan Cochrane from Yale University Library on The Emergence of “Digital Patinas”. A really interesting piece... and who doesn't like to be reminded of the friendly and helpful Word 97 paperclip?

In Towards a philosophy of digital preservation, Stacey Erdman from Beloit College, Wisconsin USA asks whether archivists are born or made and discusses her own 'archivist "gene"'.




So much going on and there were so many other excellent contributions that I missed.

I'll end with a tweet from Euan Cochrane which I thought nicely summed up what International Digital Preservation Day is all about and of course the day was also concluded by William Kilbride of the DPC with a suitably inspirational blog post.



Congratulations to the Digital Preservation Coalition for organising the day and to the whole digital preservation community for making such a lot of noise!



What we are up to at CalConnect

Published 4 Dec 2017 by Robert Stepanek in FastMail Blog.

This is the fourth post in the 2017 FastMail Advent Calendar. The previous post was a staff profile on Jamie, the man who makes us look good. The next post is about the FastMail security mindset.


Standards and CalConnect

Three years ago we launched our calendar service, and since then we've continually been working on it, making many improvements along the way.

analemmatic sundial clock

In calendaring, as in email, we are committed to open standards, and we stay true to the specifications that enable calendar synchronisation between different service providers and devices. These are currently CalDAV (and its myriad of extensions) as well as the calendar data format iCalendar. When we started working on the calendar product, it was clear to us that we wanted to help shape and improve online calendaring. That's why we chose to join CalConnect.

CalConnect is The Calendaring and Scheduling Consortium, and the capital The is well deserved. Consisting of both large-scale productivity suite providers as well as boutique calendar app developers and startups, it's the place to learn what the industry is working on in online calendaring. Any upcoming calendar RFC standard is first discussed (at length!) within the group, both on the mailing lists and at meetings. The moment a proposal starts to become an internet standard, there's a good chance that it is already supported by a bunch of CalConnect members.

The conferences tend to happen three times a year (in the US, Europe and Asia) and they're a great opportunity to catch up on the latest developments; or just to geek out at dinner about why a one-day event actually lasts for 50 hours; or how to deal with non-Gregorian calendars.

Since we joined CalConnect, we've been regular visitors to the conferences. Over two years ago Neil started heading the TC-API technical committee, in which we proposed doing for calendaring something similar to what we are doing for email with JMAP. When Ken joined our Cyrus IMAP team at FastMail last summer, we not only recruited the developer who had single-handedly added CalDAV support to our open-source server backends, but also gained one of the most prolific people at CalConnect. Ken chairs the TC-CALENDAR technical committee, which covers most of the current developments in CalDAV.

The TC-API working group

In TC-API, we are working on defining a new interchange data model and format for calendar events.

While iCalendar is supported by most calendaring systems, it comes with a couple of gotchas that regularly trip up implementors. Most notoriously, embedded timezones, as well as a variety of time formats and types, prove to be a challenge for developers new to calendaring.

In addition, just as CalDAV is a chatty protocol, the iCalendar format isn't particularly well suited for mobile and web client communication. As a consequence, our own calendaring clients speak a custom-built interchange format to save network resources and battery, and many other calendar service providers have come up with their own formats.

What most of them have in common is that they are based on JSON, restrict themselves to timezones defined by IANA and introduce proprietary features around scheduling as well as rich media support. To address this, Neil first proposed a new calendar format as part of JMAP, and when he got too busy with the email parts, I took over to evolve the calendaring sections into what's now proposed as JSCalendar at the IETF.

Being a draft, it's not the end of the story, but just as we keep the Cyrus IMAP server up to date with the proposed format, a couple of other vendors are building their products around it. If you are interested in helping out or want status updates, contact us at draft-jenkins-jscalendar@ietf.org or raise a JSCalendar-labeled issue on GitHub.

The TC-CALENDAR working group

Meanwhile, Ken and the TC-CALENDAR working group are working on a whole load of improvements to CalDAV, be that managed attachments (together with Cyrus Daboo from Apple and Arnaud Quillaud from Oracle), enhanced CalDAV synchronisation, or extensions to calendar event VALARMs.

Together with Peter from Ribose, Ken is currently working on VPATCH, which should help address the aforementioned network resource issues that we also try to tackle in JSCalendar. Generally, you'll be hard pressed to find a document in the standards draft pipeline to which Ken hasn't contributed in one way or another. There's a joke at the conferences that while the group keeps on discussing a standards proposal, Ken has already hacked it into the Cyrus IMAP server. As with any good joke, there's a grain of truth to it.

What's next?

That's the current state of FastMail at CalConnect. But while we are busy with our current work, we are already preparing for the next steps. Neil and Bron are pushing to make JMAP happen not only for email exchange but also as a generic RPC protocol, and we can't wait to shuffle JSCalendar-formatted payloads over it.

Also, there's that other thing besides email and calendars: contacts! The current vCard interchange format comes with its own set of issues that not only we but many other providers are working around. There is, of course, now a group at CalConnect drafting a new way to exchange contacts and address books, and we are keen to join the discussion! As always, all of our efforts ultimately show up as one feature or another in our FastMail calendar and contacts products, to make our customers happy.


Apocalyptic populism

Published 4 Dec 2017 by in New Humanist Articles and Posts.

From Donald Trump to Brexit, the establishment is under fierce attack. But political populism is not simply a challenge to the neoliberal order – it is a product of it.

Monday MediaWiki

Published 3 Dec 2017 by Sam Wilson in Sam's notebook.

Monday morning, hot and humid, and the rain’s been falling all night (nearly 5 mm!). It’s one of those lovely days when you can look out to the ocean and stand on the limestone and feel this place.

I’m reading through the position statements that have been accepted for the Wikimedia Developer Summit in January. It’s great to read other people’s ideas in this form. I think there’s not really enough of that, in MediaWiki development: it’s hard to get an idea of other people’s ‘big picture’ thoughts of what the future should hold.


Wikidata Map November 2017

Published 3 Dec 2017 by addshore in Addshore.

It has only been 4 months since my last Wikidata map update post, but the difference on the map in these 4 months is much greater than the diff shown in my last post covering 9 months. The whole map is covered with pink (additions to the map). The main areas include Norway, Germany, Malaysia, South Korea, Vietnam and New Zealand to name just a few.

As with previous posts, varying sizes of the generated images can be found on Wikimedia Commons, along with the diff image.

July to November in numbers

In the last 4 months (roughly speaking):

All of these numbers were roughly pulled out of graphs by eye. The graphs can be seen below:


Modifying this if-statement that's in LocalSettings.php

Published 3 Dec 2017 by Hausa Dictionary in Newest questions tagged mediawiki - Stack Overflow.

I wanted to redirect all the redlinks to a search page and found this: https://stackoverflow.com/a/29331597/8409515 which works great.

However, I would like the redlink that appears on the search page to no longer point to itself, since it's already on that page. How can I accomplish that?


Wikibase docker images

Published 3 Dec 2017 by addshore in Addshore.

This is a belated post about the Wikibase Docker images that I recently created for the Wikidata 5th birthday. You can find the various images on Docker Hub and the matching Dockerfiles on GitHub. Combined, these images allow you to quickly create Docker containers for Wikibase, backed by MySQL, with a SPARQL query service running alongside and updating live from the Wikibase install.

A setup was demoed at the first WikidataCon event in Berlin on the 29th of October 2017, and can be seen at roughly 41:10 in the 'demo of presents' video below.

The images

The ‘wikibase‘ image is based on the new official mediawiki image hosted on the Docker store. The only current version, which is also the version demoed, is for MediaWiki 1.29. This image contains MediaWiki running on PHP 7.1, served by Apache. Right now the image does some sneaky auto-installation of the MediaWiki database tables, which might disappear in the future to make the image more generic.

The ‘wdqs‘ image is based on the official openjdk image hosted on the Docker store. This image also only has one version, the current latest version of the Wikidata Query Service, which is downloaded from Maven. This image can be used to run the Blazegraph service, as well as to run an updater that reads from the recent changes feed of a Wikibase install and adds the new data to Blazegraph.

The ‘wdqs-frontend‘ image hosts the pretty UI for the query service, served by nginx. This includes auto-completion and pretty visualizations. There is currently an issue which means the image will always serve the examples for Wikidata, which will likely not work on your custom install.

The ‘wdqs-proxy‘ image hosts an nginx proxy that restricts external access to the wdqs service, meaning it is READONLY, and also enforces a time limit for queries (not currently configurable). This is very important: if the wdqs image is exposed directly to the world, people can also write to your Blazegraph store.

You’ll also need a MySQL server set up for Wikibase to use; the default mysql or mariadb images work for this, and this is also covered in the example below.

All of the wdqs images should probably be renamed, as they are not specific to Wikidata (which is where the wd comes from), but right now the underlying repos and packages have the wd prefix rather than a wb prefix (for Wikibase), so we will stick with them.

Compose example

The example below configures volumes for all locations with data that should / could persist. Wikibase is exposed on port 8181, with the query service UI on 8282 and the query service itself (behind the proxy) on 8989.

Each service has a network alias defined (that probably isn’t needed in most setups), but while running on WMCS it was required to get around some bad name resolving.

version: '3'

services:
  wikibase:
    image: wikibase/wikibase
    restart: always
    links:
      - mysql
    ports:
     - "8181:80"
    volumes:
      - mediawiki-images-data:/var/www/html/images
    depends_on:
    - mysql
    networks:
      default:
        aliases:
         - wikibase.svc
  mysql:
    image: mariadb
    restart: always
    volumes:
      - mediawiki-mysql-data:/var/lib/mysql
    environment:
      MYSQL_DATABASE: 'my_wiki'
      MYSQL_USER: 'wikiuser'
      MYSQL_PASSWORD: 'sqlpass'
      MYSQL_RANDOM_ROOT_PASSWORD: 'yes'
    networks:
      default:
        aliases:
         - mysql.svc
  wdqs-frontend:
    image: wikibase/wdqs-frontend
    restart: always
    ports:
     - "8282:80"
    depends_on:
    - wdqs-proxy
    networks:
      default:
        aliases:
         - wdqs-frontend.svc
  wdqs:
    image: wikibase/wdqs
    restart: always
    build:
      context: ./wdqs/0.2.5
      dockerfile: Dockerfile
    volumes:
      - query-service-data:/wdqs/data
    command: /runBlazegraph.sh
    networks:
      default:
        aliases:
         - wdqs.svc
  wdqs-proxy:
    image: wikibase/wdqs-proxy
    restart: always
    environment:
      - PROXY_PASS_HOST=wdqs.svc:9999
    ports:
     - "8989:80"
    depends_on:
    - wdqs
    networks:
      default:
        aliases:
         - wdqs-proxy.svc
  wdqs-updater:
    image: wikibase/wdqs
    restart: always
    command: /runUpdate.sh
    depends_on:
    - wdqs
    - wikibase
    networks:
      default:
        aliases:
         - wdqs-updater.svc

volumes:
  mediawiki-mysql-data:
  mediawiki-images-data:
  query-service-data:

Questions

I’ll vaguely keep this section up to date with Qs & As, but if you don’t find your answer here, leave a comment, send an email or file a Phabricator ticket.

Can I use these images in production?

I wouldn’t really recommend running any of these in ‘production’ yet, as they are new and not well tested. Various things, such as upgrades for the query service and upgrades for MediaWiki / Wikibase, are also not yet documented very well.

Can I import data into these images from an existing wikibase / wikidata? (T180216)

In theory yes, although this is not documented. You’ll have to import everything using an XML dump of the existing MediaWiki install, and the configuration will also have to match on both installs. When importing from an XML dump the query service will not be updated automatically, and you will likely have to read the manual.

Where was the script that you ran in the demo video?

There is a copy in the github repo called setup.sh, but I can’t guarantee it works in all situations! It was specifically made for a WMCS debian jessie VM.

Links


Team Profile: Jamie

Published 3 Dec 2017 by David Gurvich in FastMail Blog.

This is the third post in the 2017 Fastmail Advent Calendar. The previous post was about Ways to use calendars at FastMail. The next post is about What we are up to at CalConnect.


For our first FastMail team profile of this year’s Advent Calendar we’re speaking with one of our newer team members, Jamie, who is our Senior Graphic Designer.

Jamie is responsible for helping to keep all of our branding and marketing collateral in check and is an integral part of our team.

Jamie Photo

Name: Jamie Toyama

Role: Senior Graphic Designer

Time working for FastMail: One year

Background

I have been a Designer for over 19 years now. After finishing uni I got my first job at a small company in Melbourne called IS Advertising. There I worked closely with the Studio Manager, Douglas – he taught me a lot of valuable things about our industry, things I still use now.

I then left Australia and went to the UK for five years, working for Transport for London, and on my return to Melbourne I started freelancing, meeting fellow FastMailer David in my travels at another software company, MYOB. Little did I know at the time that we would be working with each other again in the near future!

I have been designing campaigns for digital and print, but mainly digital over the last 10 years now. I love designing for brands and I’m constantly driven by creative problem solving.

Describe a typical day at FastMail

If it is anything that involves ‘design’ then I’m involved, but it’s not always about digital design – I could be working on designing logos, t-shirts, product icons, product UI and even cycling gear (many of the team cycle to work and we’re looking to get our own kit soon).

Topicbox has been the priority for the majority of this year – boosting the customer experience through website and product UI design improvements.

Most recently I have been looking at re-designing the Topicbox blog, improving the page structure and tweaking the content on offer. I started by researching and collecting relevant information on blogs (things like purpose, design possibilities and audience) and then took these learnings and ideas, and started to apply these to my designs. I’m now at a stage where I am about to present these concepts back to the team for further development. (Stay tuned for a new look soon!)

Favourite thing worked on this year?

The Topicbox logo was something that I am proud to have designed this year – it evolved into a well thought-out logo that has helped build a strong foundation for the Topicbox brand.

It wasn’t an easy road, but working with passionate people who push for success helps you to elevate yourself and your thinking to get a successful end result.

We also moved into a new office this year and I was heavily involved with that process too – from liaising with interior designers to managing trades on site. Considering the enormity of this project, a few of the team may have thought that I didn’t enjoy the office fit-out – and others may think it could have broken me (if anyone has done a home renovation they can relate to this!) – but truth be told it was a thrilling rollercoaster ride, full of great learnings and insights. Now it’s time to write ‘The Dummies Guide to Office Fitouts’!

Are you an Android, iOS or other user?

That’s a crazy question for a designer, iOS of course!

Favourite piece of tech/hardware or software?

My iPhone – not only do I use it to help test designs at work, I have been using it to help track the kilometres when I cycle into work.

This year, I'm also taking action by raising funds for prostate cancer, testicular cancer, mental health and suicide prevention through the Movember Foundation.

For my Movember Move Challenge, I set a target of 480kms to cycle during the month of November, which I have achieved. (You can donate to Jamie’s cause and help him make a difference too).

What’s your favourite FastMail feature?

This isn’t a product feature as such, but I would have to say the latest business cards; they are looking great. They have a Matte Laminate Anti-Scuff with a Spot Gloss UV applied to the FM logo (in non-designer speak this means they look very impressive!)

What are you listening to/reading/watching these days?

I have loved watching Stranger Things 2 and seeing all the 80s references, which was awesome. Now I have started watching Ozark.

I also really like listening to local artists, but I can’t go past listening to Melodrama by Lorde.

What do you like to do when you’re not designing?

Besides drinking coffee, sport has been a big part of my life, although I’m doing more watching than playing these days.

After I damaged my meniscus, I pretty much had to give up all team sports. I thought my prowess as an amateur athlete was all but over, but I found my mojo again on my bicycle!

I also ‘try’ to grow the perfect lawn. However, most of my time is spent with my two beautiful children Joshua (5yrs) and Adele (1yr).

The reality is I’m never really ‘not’ designing – if it’s not for work, it’s for friends and family.

What’s the best thing about working at FastMail?

I have enjoyed the diversity in projects that are thrown my way. We are a small team so you get to work on things that you probably wouldn’t normally be involved with in a much larger company (managing an office fit-out and move) and for this reason it keeps me engaged and excited to get my hands dirty.

What might (someone) be surprised to know about you?

When I was 15, I went on an exchange program with American Field Service (AFS) to Fukuoka, Japan for 12 months. I lived with two host families and went to a Japanese high school. It was tough being away from home as a 15 year old, without the family support you would normally take for granted. I learnt a lot about myself, and thought it was one of the best things I did growing up.


You can also find Jamie on @jamietoyama


Can't log in or disable extension mediawiki

Published 2 Dec 2017 by Alex Swak in Newest questions tagged mediawiki - Stack Overflow.

Yesterday I activated the InputBox extension. Today I can't log in to my account or edit/create pages:

"There seems to be a problem with your login session; this action has been canceled as a precaution against session hijacking. Go back to the previous page, reload that page and then try again."

I tried deactivating the extension by deleting wfLoadExtension( 'InputBox' ); from LocalSettings.php, but that gave me this error:

"[WiL8XrmwKzsAACaYCMUAAABG] 2017-12-02 19:17:51: Fatal exception of type "Wikimedia\Rdbms\DBQueryError""

What am I supposed to do?
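
For reference, the LocalSettings.php lines I'm experimenting with look roughly like this (just a sketch of my local file; the two debug settings are only there to surface the underlying database error, as the error page itself suggests):

// LocalSettings.php (sketch)
// wfLoadExtension( 'InputBox' );   // commented out rather than deleted
$wgShowExceptionDetails = true;     // show the full exception message
$wgShowDBErrorBacktrace = true;     // include a backtrace for database errors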

I'm using:

MediaWiki 1.29.1

PHP 5.6.32

MySQL 5.7

This is my website. I don't use caching.

I tried:


can't make Wikibase work

Published 2 Dec 2017 by theorist in Newest questions tagged mediawiki - Stack Overflow.

I installed the Wikibase client extension without a repository as per these instructions, but now the whole wiki isn't working. Every page says there are problems on the website because it can't access the database.

Is something missing? How do I make the client work with wikidata.org?


Ways to use calendars

Published 2 Dec 2017 by Nicola Nye in FastMail Blog.

This is the second post in the 2017 Fastmail Advent Calendar. The previous post was a Retrospective looking back at our 2017 predictions from the end of 2016. The next post is an Interview with Jamie our Senior Graphic Designer.


Of all our products, only FastMail has any built-in calendaring (... so far ...), and there's a number of different ways you can put one to use.

sample calendar showing events

Types of calendars

Personal calendar

When people think of using calendars, this is what they usually mean.

A calendar for yourself, to record your appointments. You can invite others to your events, which causes a special kind of email to be sent to your guests. When your guest responds, it sends a special rsvp email in return which will update your calendar.

You can set reminders on your events via email or a popup so you don't forget what's coming up.

More information about personal calendars.

Subscribed calendar

A read-only calendar you subscribe to from somewhere else for information like public holidays, or whether someone else is free or busy. This calendar comes in the form of a URL which points to a .ics file. You can't update this calendar, just see its events.

You can't invite people to these events.

Setting alarms may or may not work, depending on how the external calendar provider manages their calendar events.

More information about subscribed calendars.

Synchronised calendar

Use a synchronised calendar when you have a calendar hosted on another service (such as iCloud, or Google) but you want to be able to see and edit the events from your FastMail interface.

Use this if you have a different calendar provider and you're not ready to migrate your calendar to FastMail yet. Use this if you have a shared calendar with someone else and want to be able to make edits in each other's calendars (not just sending invites), for notifying about family events and the like.

Any changes made to these events in FastMail or on the other server will be mirrored in both locations.

You can make new events in these calendars from FastMail and invite others to your events providing you have an identity and alias that matches your remote calendar host.

Reminders can be set from FastMail or the remote server and both instances will send out the alert.

More information about synchronised calendars.

Shared calendar

For multi-user FastMail accounts, you can share a calendar with some or all of your colleagues, choosing whether they can see the events, see and edit events, or whether they can do anything you can do, on that calendar.

FastMail shared calendars currently only support "secretary" mode. This means any event made by someone else in your shared calendar is owned by you: invitations are emailed out using your address, and rsvps are returned to your email account.

Each user can set their own alarm for any event, however, whether they are invited or not.

More information about shared calendars.

Accessing your calendar from your mobile device

All calendars are visible from your mobile device, either directly through the FastMail apps, or synchronised to an app on your phone, tablet or desktop. We have instructions on how to configure your calendar client.

Calendar settings

General

Event defaults

Invitations

Summary

If you'd like to find out more, read our help on calendaring, which has tips on troubleshooting, date formatting and keyboard shortcuts to help you navigate your calendar faster. (Like using "t" to go to today. )

We can also help you out if you're looking to import your calendar or migrate your calendar from another provider.


MediaWiki: Warning: session_write_close(): Failed to write session data (user)

Published 1 Dec 2017 by Ben Parsons in Newest questions tagged mediawiki - Stack Overflow.

I have been getting the following error message from MediaWiki since upgrading to 1.29.2. PHP version: 5.6.32 (cgi-fcgi).

Warning: session_write_close(): Failed to write session data (user). Please verify that the current setting of session.save_path is correct (/tmp) in [PATH TO]/includes/session/SessionManager.php on line 469

Looking at SessionManager.php line 469, I see this line:

session_write_close();

in this function:

public function shutdown() {
    if ( $this->allSessionBackends ) {
        $this->logger->debug( 'Saving all sessions on shutdown' );
        if ( session_id() !== '' ) {
            // @codeCoverageIgnoreStart
            session_write_close();
        }
        // @codeCoverageIgnoreEnd
        foreach ( $this->allSessionBackends as $backend ) {
            $backend->shutdown();
        }
    }
}

How can I resolve this?
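
One avenue I'm considering (a minimal sketch only, assuming the warning really is about PHP being unable to write its session files under cgi-fcgi; the path below is made up) is to point PHP at a directory the web user can definitely write to, or to let MediaWiki keep session data in its object cache instead:

// LocalSettings.php (sketch)
ini_set( 'session.save_path', '/var/lib/php/mediawiki-sessions' ); // must exist and be writable by the web user

// or, since MediaWiki 1.27+ manages sessions through SessionManager,
// store session data in the database-backed object cache instead:
$wgSessionCacheType = CACHE_DB;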


Standalone Mediawiki parsers to render content

Published 30 Nov 2017 by Neel Vasa in Newest questions tagged mediawiki - Stack Overflow.

Is there a standalone MediaWiki parser that takes content in MediaWiki syntax and returns HTML, without depending on a running MediaWiki instance? The use case is to call the standalone parser by passing in MediaWiki-syntax content, and rendering the returned HTML output in a browser.

My content uses templates, a few extensions (such as Toggle display), magic words and parser functions (such as #if, #titleparts) quite extensively. Hence, a basic MediaWiki parser won't satisfy my use case.

I have already briefly gone through the alternate parsers list, but the only one marked as 'full support' (Parsoid) seems to require a Mediawiki instance running to work.

It would also be really helpful if you could share your experience, if you have tried doing something like this, and what issues you ran into.


What shall I do for International Digital Preservation Day?

Published 30 Nov 2017 by Jenny Mitcham in Digital Archiving at the University of York.

I have been thinking about this question for a few months now and have only recently come up with a solution.

I wanted to do something big on International Digital Preservation Day. Unfortunately other priorities have limited the amount of time available and I am doing something a bit more low key. To take a positive from a negative I would like to suggest that as with digital preservation more generally, it is better to just do something rather than wait for the perfect solution to come along!

I am sometimes aware that I spend a lot of time in my own echo chamber - for example talking on Twitter and through this blog to other folks who also work in digital preservation. Though this is undoubtedly a useful two-way conversation, for International Digital Preservation Day I wanted to target some new audiences.

So instead of blogging here (yes I know I am blogging here too) I have blogged on the Borthwick Institute for Archives blog.

The audience for the Borthwick blog is a bit different to my usual readership. It is more likely to be read by users of our services at the Borthwick Institute and those who donate or deposit with us, perhaps also by staff working in other archives in the UK and beyond. Perfect for what I had planned.

In response to the tagline of International Digital Preservation Day ‘Bits Decay: Do Something Today’ I wanted to encourage as many people as possible to ‘Do Something’. This shouldn’t be just limited to us digital preservation folks, but to anyone anywhere who uses a computer to create or manage data.

This is why I decided to focus on Personal Digital Archiving. The blog post is called “Save your digital stuff!” (credit to the DPC Technology Watch Report on Personal Digital Archiving for this inspiring title - it was noted that at a briefing day hosted by the Digital Preservation Coalition (DPC) in April 2015, one of the speakers suggested that the term ‘personal digital archiving’ be replaced by the more urgent exhortation, ‘Save your digital stuff!’).

The blog post aimed to highlight the fragility of digital resources and then give a few tips on how to protect them. Nothing too complicated or technical, but hopefully just enough to raise awareness and perhaps encourage engagement. Not wishing to replicate all the great work that has already been done on Personal Digital Archiving, by the Library of Congress, the Paradigm project and others I decided to focus on just a few simple pieces of advice and then link out to other resources.

At the end of the post I encourage people to share information about any actions they have taken to protect their own digital legacies (of course using the #IDPD17 hashtag). If I inspire just one person to take action I'll consider it a win!

I'm also doing a 'Digital Preservation Takeover' of the Borthwick twitter account @UoYBorthwick. I lined up a series of 'fascinating facts' about the digital archives we hold here at the Borthwick and tweeted them over the course of the day.



OK - admittedly they won't be fascinating to everyone, but if nothing else it helps us to move further away from the notion that an archive is where you go to look at very old documents!

...and of course I now have a whole year to plan for International Digital Preservation Day 2018 so perhaps I'll be able to do something bigger and better?! I'm certainly feeling inspired by the range of activities going on across the globe today.


A Father’s daughter

Published 30 Nov 2017 by in New Humanist Articles and Posts.

In "Priestdaddy", the poet Patricia Lockwood has written a hilarious and revealing account of growing up with a Roman Catholic priest for a dad.

Untitled

Published 29 Nov 2017 by Sam Wilson in Sam's notebook.

I can finally post comments on blogs! For years I’ve been blocked by Akismet, which is used all over the place, but after a friendly email exchange with their helpdesk I seem to once again be a dog on the ‘net.


How to specify a docker container database to a app running in docker with docker-compose.yml?

Published 29 Nov 2017 by Cristian in Newest questions tagged mediawiki - Stack Overflow.

Context

There is this docker-compose.yml:

version: '3'
services:
  mediawiki:
    image: mediawiki
    restart: always
    ports:
      - 8080:80
    links:
      - database
    volumes:
      - /var/www/html/images
      # After initial setup, download LocalSettings.php to the same directory as
      # this yaml and uncomment the following line and use compose to restart
      # the mediawiki service
      # - ./LocalSettings.php:/var/www/html/LocalSettings.php
  database:
    image: mariadb
    restart: always
    environment:
      # @see https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/DefaultSettings.php
      MYSQL_DATABASE: my_wiki
      MYSQL_USER: wikiuser
      MYSQL_PASSWORD: example
      MYSQL_RANDOM_ROOT_PASSWORD: yes

When I run docker ps I get:

89db8794029a  mysql:latest "docker-entrypoint..."  ...  0.0.0.0:8083->3306/tcp   some-mysql

This is a mysql docker container running.

Question

How can I modify the docker-compose.yml so that the database service points to the MySQL docker container (89db8794029a) that is already running?


Preserving Google Drive: What about Google Sheets?

Published 29 Nov 2017 by Jenny Mitcham in Digital Archiving at the University of York.

There was lots of interest in a blog post earlier this year about preserving Google Docs.

Often the issues we grapple with in the field of digital preservation are not what you'd call 'solved problems' and that is what makes them so interesting. I always like to hear how others are approaching these same challenges so it is great to see so many comments on the blog itself and via Twitter.

This time I'm turning my focus to the related issue of Google Sheets. This is the native spreadsheet application for Google Drive.

Why?

Again, this is an application that is widely used at the University of York in a variety of different contexts, including for academic research data. We need to think about how we might preserve data created in Google Sheets for the longer term.


How hard can it be?

Quite hard actually - see my earlier post!


Exporting from Google Drive

For Google Sheets I followed a similar methodology to Google Docs: taking a couple of sample spreadsheets and downloading them in the formats that Google provides, then examining these exported versions to assess how well specific features of the spreadsheet were retained.

I used the File...Download as... menu in Google Sheets to test out the available export formats

The two spreadsheets I worked with were as follows:

  • a complex 'flexisheet' used for recording flexible working hours, which includes formulas, calculated values, conditional colours and (as I later discovered) hidden sheets
  • a simpler 'menu choices' sheet containing straightforward data along with comments and replies to comments

Here is a summary of my findings:

Microsoft Excel - xlsx

I had high hopes for the xlsx export option - however, on opening the exported xlsx version of my flexisheet I was immediately faced with an error message telling me that the file contained unreadable content and asking whether I wanted to recover the contents.

This doesn't look encouraging...

Clicking 'Yes' on this dialogue box then allows the sheet to open and another message appears telling you what has been repaired. In this case it tells me that a formula has been removed.


Excel can open the file if it removes the formula

This is not ideal if the formula is considered to be worthy of preservation.

So clearly we already know that this isn't going to be a perfect copy of the Google sheet.

This version of my flexisheet looks pretty messed up. The dates and values look OK, but none of the calculated values are there - they are all replaced with "#VALUE".

The colours on the original flexisheet are important as they flag up problems and issues with the data entered. These, however, are not fully retained - for example, weekends are largely (but not consistently) marked as red, whereas in the original file they are green (because it is assumed that I am not actually meant to be working weekends).

The XLSX export does however give a better representation of the more simple menu choices Google sheet. The data is accurate, and comments are present in a partial way. Unfortunately though, replies to comments are not displayed and the comments are not associated with a date or time.


Open Document Format - ods

I tried opening the ODS version of the flexisheet in LibreOffice on a Macbook. There were no error messages (which was nice) but the sheet was a bit of a mess. There were similar issues to those that I encountered in the Excel export though it wasn't identical. The colours were certainly applied differently, neither entirely accurate to the original.

If I actually try to use the sheet to enter more data, the formulas do not work - they do not calculate anything, though the formulas themselves appear to be retained. Any values that are calculated on the original sheet are not present.

Comments are retained (and replies to comments) but no date or time appears to be associated with them (note that the data may be there but just not displaying in LibreOffice).

I also tried opening the ODS file in Microsoft Office. On opening it, the same error message was displayed as the one originally encountered in the XLSX version described above, and this was followed by a notification that “Excel completed file level validation and repair. Some parts of this workbook may have been repaired or discarded.” Unlike the XLSX file, there didn't appear to be any additional information available about exactly what had been repaired or discarded - this didn't exactly fill me with confidence!

PDF document - pdf

When downloading a spreadsheet as a PDF you are presented with a few choices - for example:
  • Should the export include all sheets, just the current sheet or the current selection? (Note that current sheet is the default response.)
  • Should the export include the document title?
  • Should the export include sheet names?
To make the export as thorough as possible I chose to export all sheets and include document title and sheet names.

As you might expect this was a good representation of the values on the spreadsheet - a digital print if you like - but all functionality and interactivity was lost. In order to re-use the data, it would need to be copied and pasted or re-typed back into a spreadsheet application.

Note that comments within the sheet were not retained and also there was no option to export sheets that were hidden.

Web page - html

This gave an accurate representation of the values on the spreadsheet, but, similar to the PDF version, not in a way that really encourages reuse. Formulas were not retained and the resulting copy is just a static snapshot.

Interestingly, the comments in the menu choices example weren't retained. This surprised me because when using the html export option for Google documents one of the noted benefits was that comments were retained. Seems to be a lack of consistency here.

Another thing that surprised me about this version of the flexisheet was that it included hidden sheets (I hadn't until this point realised that there were hidden sheets!). I later discovered that the XLSX and ODS also retained the hidden sheets ...but they were (of course) hidden so I didn't immediately notice them! 

Tab delimited and comma separated values - tsv and csv

It is made clear on export that only the current sheet is exported so if using this as an export strategy you would need to ensure you exported each individual sheet one by one.

The tab delimited export of the flexisheet surprised me. In order to look at the data properly I tried importing it into MS Excel. It came up with a circular reference warning - were some of the dynamic properties of the sheets being somehow retained (albeit in a way that was broken)?

A circular reference warning when opening the tab delimited file in Microsoft Excel

Both of these formats did a reasonable job of capturing the simple menu choices data (though note that the comments were not retained) but neither did an acceptable job of representing the complex data within the flexisheet (given that the more complex elements such as formulas and colours were not retained).

What about the metadata?

I won't go into detail again about the other features of a Google Sheet that won't be saved with these export options - for example information about who created it and when and the complete revision history that is available through Google Drive - this is covered in a previous post. Given my findings when I interviewed a researcher here at the University of York about their use of Google Sheets, the inability of the export options to capture the version history will be seen as problematic for some use cases.

What is the best export format for Google Sheets?

The short answer is 'it depends'.

The export options available all have pros and cons and as ever, the most suitable one will very much depend on the nature of the original file and the properties that you consider to be most worthy of preservation.


  • If for example the inclusion of comments is an essential requirement, XLSX or ODS will be the only formats that retain them (with varying degrees of success). 
  • If you just want a static snapshot of the data in its final form, PDF will do a good job (you must specify that all sheets are saved), but note that if you want to include hidden sheets, HTML may be a better option. 
  • If the data is required in a usable form (including a record of the formula used) you will need to try XLSX or ODS but note that calculated values present in the original sheet may be missing. Similar but not identical results were noted with XLSX and ODS so it would be worth trying them both and seeing if either is suitable for the data in question.


It should be possible to export an acceptable version of the data for a simple Google Sheet but for a complex dataset it will be difficult to find an export option that adequately retains all features.

Exporting Google Sheets seems even more problematic and variable than Google Documents and for a sheet as complex as my flexisheet it appears that there is no suitable option that retains the functionality of the sheet as well as the content.

So, here's hoping that native Google Drive files appear on the list of World's Endangered Digital Species...due to be released on International Digital Preservation Day! We will have to wait until tomorrow to find out...



A disclaimer: I carried out the best part of this work about 6 months ago but have only just got around to publishing it. Since I originally carried out the exports and noted my findings, things may have changed!


The Why, How, and What of Metrics and Observability

Published 28 Nov 2017 by Sneha Inguva in The DigitalOcean Blog.

If you are reading this post, you are probably aware that DigitalOcean is an infrastructure company. (And if you weren't aware of that: surprise!) As a cloud platform provider, we have a varied tech stack spanning the many services that power the cloud, from hardware to virtualization software and even container orchestration tooling. But with many moving pieces comes a vital need: that of observability. Observability is often defined as consisting of three “pillars”: logging, metrics, and tracing. In this post, however, I will focus on metrics, namely how we at DigitalOcean have leveraged Prometheus and Alertmanager for whitebox monitoring both our services and our container clusters. Whether you are a software engineer writing services or an operations engineer maintaining a container orchestration solution, these monitoring and alerting examples should prove useful.

More to Monitoring Than “Knowing When Things Break”

Before delving into the what and how of monitoring, however, I’ll first focus on the why: specifically, why you—an engineer—should monitor your applications or infrastructure stack. The immediate answer might be “to know when things break”, and naturally, alerting upon downtime or other metrics issues is a vital reason to set up monitoring. In fact, most applications at DigitalOcean have either a whitebox monitoring or blackbox monitoring setup and some basic downtime alerts (sometimes, we have both). But beyond just alerting, proper monitoring allows one to identify long-term trends, analyze performance issues, and set up visualizations. For example, every containerized application deployed on our Kubernetes clusters at DigitalOcean has an automatic dashboard generated with basic stats such as memory and CPU usage as well as traffic. Our clusters themselves also have dashboards. These are essential in visualizing general trends and for debugging during on call rotations:

Fig. 1: doccserver application dashboard

I also mentioned that we currently use Prometheus and Alertmanager for both monitoring and alerting: our hows of monitoring. The journey towards using this tooling is also quite interesting and one that I’ve had the opportunity to bear witness to. DigitalOcean originally leveraged Chef and a hodgepodge of scripts or CI/CD tools for provisioning, updates, and deployments. Nagios was commonly utilized (and still is) to perform blackbox monitoring on hosts. This, however, was not enough. While blackbox monitoring is one piece of the puzzle, it doesn’t provide sufficient introspection into applications or truly help with debugging a variety of issues. As a result, engineers went through a long process of trying out several other solutions, which were always lacking in some element. Some were difficult to set up and operationally maintain while others didn’t provide useful visualizations or UX. Furthermore, actually leveraging the data for analysis proved difficult...but along came Prometheus and Alertmanager.

Four base metric types are at the core of Prometheus—counters, gauges, summaries, and histograms—which can be combined with a powerful query language offering various functions for analysis and debugging. Prometheus proved to be extremely easy to install, either via a simple Go binary or Docker container. Furthermore, the fact that the time-series data was multidimensional proved immensely helpful, as our adoption of Prometheus coincided with our move towards containerization; having the ability to label data made analysis and aggregation all the more simple when using tools such as Kubernetes. As a result, Prometheus swiftly became our go-to tool for whitebox monitoring alongside Alertmanager for alerting.

Metrics: Leveraging The Four Golden Signals

We’ve established the why and how of monitoring; let us now look into the what. Two categories of metrics we leverage at DigitalOcean are the four golden signals (of Google SRE-book fame) for our request-based microservices and USE metrics, which we heavily utilize to monitor our container orchestration clusters such as Kubernetes or Mesos.

The four golden signals consist of latency, saturation, traffic, and error. Latency refers to the duration of requests; one important thing to note is that it is vital to consider the distribution of request duration, especially the longtail or 99th percentile. After all, what affects only a few of your users can often be an indication of impending saturation—another golden signal. Saturation itself is defined as the “fullness” of a system; it indicates how long something is waiting to be serviced. As a result, we often carefully track and alert based upon 95th or 99th percentile request latencies:

Fig. 2: Kube-dns 95th percentile request latency in ms

Note that generating these metrics, and subsequently configuring alerts, is fairly easy using the Prometheus histogram metric type:

histogram_quantile(0.95, sum(rate(kubedns_probe_kubedns_latency_ms_bucket[1m])) BY (le, kubernetes_pod_name)) > 1000  

As histogram metrics are collected as counts in various buckets, we merely need to specify which percentile measurement we would like to calculate and leverage the histogram_quantile function. It is also possible to calculate latency quantiles using the summary metric by specifying the exact quantile we wish to track. While this may reduce quantile estimation error, it is a more expensive client side calculation.

Now onto the final two golden signals! Traffic refers to the amount of demand placed on your system. In a request-based system, this is often measured in HTTP requests per second. In order to measure traffic using Prometheus, we often instrument our applications to expose a request count metric that is monotonically increasing and then calculate the per-second rate. Why? Looking at a constantly increasing count alone would not provide any indication of abnormal traffic increases or decreases:

Fig. 3: Note that request count constantly increases.

However, looking at the rate of requests gives us a meaningful idea of traffic; we can then set up alerts for the per-second rate rising above or dropping below a particular value:

Fig. 4: By applying rate() over a 15 minute window, we get a meaningful idea of traffic.
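
The counter behind these graphs comes from application instrumentation. As a rough illustration of the idea (not DigitalOcean's code; a minimal plain-PHP sketch with no client library, and the metric name is made up), a service only needs to expose an ever-increasing count in the Prometheus text format and let the server do the rate() math:

<?php
// Minimal sketch: expose a monotonically increasing request counter at /metrics
// in the Prometheus text exposition format. The counter lives in a local file
// purely for illustration; a real service would use a client library and labels.
$counterFile = __DIR__ . '/requests_total.counter';
$count = is_readable( $counterFile ) ? (int)file_get_contents( $counterFile ) : 0;

if ( ( $_SERVER['REQUEST_URI'] ?? '' ) === '/metrics' ) {
    header( 'Content-Type: text/plain; version=0.0.4' );
    echo "# HELP app_http_requests_total Total HTTP requests handled.\n";
    echo "# TYPE app_http_requests_total counter\n";
    echo "app_http_requests_total {$count}\n";
    exit;
}

// Every other request increments the counter (it is never reset), so the
// Prometheus server can compute rate(app_http_requests_total[5m]) on its side.
file_put_contents( $counterFile, (string)( $count + 1 ), LOCK_EX );
echo "hello\n";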

Error rates of failing or succeeding requests—the final golden signal—are calculated similarly. Applications are instrumented to expose error count with labels indicating status code or other information; per-second rate is then calculated to provide a meaningful metric:

Fig. 5: Per-second error rate for cassandra

Note that the per-second rate suddenly spiked but decreased. However, as this was lower than the alert configured below (at 50 errors per-second), no alert was triggered:

sum(rate(cassandra_query_latency_count{docc_app=~"timeseries-ingest-index[0-9]+",result="error"}[5m])) > 50  

Using these four golden signals, we can gain a basic idea of both the health of our request-based services as well as an idea of our end user’s experiences. We can both visualize these metrics on a dashboard and also set up alerts for any abnormal metrics.

In addition to monitoring our services, we also monitor our infrastructure. As a former member of the team that maintained our container clusters, I noticed enormous benefits when leveraging the USE method: utilization, saturation, and errors. Coined by Brendan Gregg, the USE method allows one to solve “80% of server issues with 5% of the effort”.

Let us take a look at how we leveraged these metrics to monitor our Kubernetes clusters. Each cluster consists of multiple worker nodes known as kubelets. Monitoring overall CPU, memory utilization, and reservation on these nodes has proven essential:

Fig. 6: Kubernetes worker node CPU utilization for a single worker node

Note that CPU utilization is measured in CPU seconds, a constantly increasing counter. As a result, calculating the per-second rate of CPU seconds gives us the number of CPU cores being utilized at a given time. We can further tweak this calculation to craft an alert to detect greater than a given percentage of CPU utilization on a worker node. (If interested in how exactly to do this, be sure to check out this blog post.)

Another key component of our Kubernetes architecture is our HAProxy ingress load balancers; these are components in our network stack and direct much of outside traffic to appropriate services within the cluster. As you can imagine, load balancer connection utilization and saturation are therefore vital to measure:

Fig. 7: Ingress load balancer connection utilization as a %

If all connections are utilized to an ingress load balancer, no additional connections can be made until some are dropped. As a result, we also elect to alert for greater than 50% load balancer connection utilization:

max(haproxy_frontend_current_sessions / haproxy_frontend_limit_sessions) BY (kubernetes_node_name, frontend) * 100 > 50  

Conclusion

And there you have it— a small slice of our monitoring and alerting setup at DigitalOcean! With these few examples, hopefully you can see how and why we have elected to leverage the four golden signals and the USE method to monitor and alert on our microservices and container clusters. Doing so has allowed both ops teams and service owners to maintain observability of running applications and key infrastructure. We have also leveraged this data to track long-term trends and look into improving performance. Hopefully you can do the same!

Sources:

[1] Logs and Metrics, Cindy Sridharan. https://medium.com/@copyconstruct/logs-and-metrics-6d34d3026e38

[2] Chapter 6: Monitoring Distributed Systems. Site Reliability Engineering at Google. https://landing.google.com/sre/book/index.html

[3] USE Method. Brendan Gregg. http://www.brendangregg.com/usemethod.html

[4] An Appropriate Use of Metrics. Martin Fowler. https://martinfowler.com/articles/useOfMetrics.html

[5] Monitoring and Observability. Cindy Sridharan. https://medium.com/@copyconstruct/monitoring-and-observability-8417d1952e1c


Sneha Inguva is an enthusiastic software engineer working on building developer tooling at DigitalOcean. Previously, Sneha worked at a number of startups. Her experience across an eclectic range of verticals, from education to 3D printing to casinos, has given her a unique perspective on building and deploying software. When she isn't bashing away on a project or reading about the latest emerging technology, Sneha is busy molding the minds of young STEM enthusiasts in local NYC schools.


SSH timeout when running importDump.php on a Bitnami Mediawiki instance on Google Cloud server

Published 28 Nov 2017 by AESH in Newest questions tagged mediawiki - Stack Overflow.

The import seems to start out ok, showing the contents of the mediawiki in the terminal window. At some point (often around the same point in the content), the SSH terminal freezes up. Opera Browser returns an 'out of memory' message.

2 questions -

  1. Can I just start the import and ask the server to run it regardless of the status of the terminal window on my machine (or the internet connection)?

  2. If no to #1, what can I modify to prevent the terminal from timing out?


Server failures in October and November 2017

Published 28 Nov 2017 by Pierrick Le Gall in The Piwigo.com Blog.

The huge downtime at OVH that occurred on November 9th 2017 was quite like an earthquake for the European web. Of course Piwigo.com was impacted. But before that, we experienced a server failure on October 7th and another one on October 14th. Let’s describe and explain what happened.

Photo by Johannes Plenio on Unsplash


A) October 7th, the first server failure

On Saturday evening, October 7th 2017, our “reverse-proxy” server, the one through which all web traffic goes, crashed. OVH, our technical host, identified a problem with the motherboard and replaced it. Web traffic was routed to the spare server during the short downtime. A server failure without real gravity and without loss of data, but one which announced the start of a painful series of technical problems.

B) October 14th, a more serious server failure

A week later, on October 14th, the very same “reverse-proxy” server saw its load climb to such high levels that it was unable to deliver web pages… Web traffic was again switched to the spare server, in read-only mode for accounts hosted on this server. About 10 hours of investigation later, we were still not able to understand the origin of the problem. We had to decide to switch the spare server to write mode. This decision was difficult to take because it meant losing data produced between the last backup (1am) and the switch to the spare server (about 8am). In other words, for the accounts hosted on this server, the photos added during the night simply “disappeared” from their Piwigo.

This was the first time in the history of Piwigo.com that we had switched a spare server to write mode. Unfortunately, another problem happened, related to the first one. To explain this problem, it is necessary to understand how the Piwigo.com server infrastructure works.

On the Piwigo.com infrastructure, servers work in pairs: a main server and its spare server. There are currently 4 pairs in production. The main server takes care of the “live operations”, while the spare server is synchronized with its main server every night and receives the web traffic in read-only during downtimes.

In the usual way, spare servers only allow read operations, ie you can visit the albums or view the photos, but not enter the administration or add photos.

One of the server pairs is what we call the “reverse-proxy”: all the web traffic of *.piwigo.com goes through this server and according to the piwigo concerned, the traffic goes to one or the other pair. Normally the reverse-proxy is configured to point to the main servers, not spare servers.

When a problem occurs on one of the main servers, we switch the traffic to its spare server. If the reverse-proxy server is concerned, we switch the IP address Fail-Over (IPFO): a mechanism that we manage in our OVH administration panel. For other servers, we change the reverse-proxy configuration.

That’s enough for infrastructure details… let’s go back to October 14th: so we switched the IPFO to use the spare reverse-proxy server. Unfortunately, we met 2 problems in cascade:

  1. the spare reverse-proxy server, for one of the server pairs, pointed to the spare server
  2. this very spare server was configured in write mode instead of read-only

Why such an unexpected configuration?

Because we sometimes use the spare infrastructure to do real-life tests. In this case, these were IPV6 tests.

What impact for users?

During the many hours when the web traffic went through the spare reverse-proxy server, accounts hosted on the faulty server returned to the state of the previous night, where the photos added during the night and morning had apparently disappeared, but users were able to keep adding photos. This state did not trigger any specific alert: the situation seemed “normal” for the users concerned and for our monitoring system. When the problem was detected, we changed the reverse-proxy configuration to point back to the main server. Consequence: all the photos added during the downtime apparently disappeared.

What actions have been taken after October 14th?

1) Checks on reverse-proxy configuration

A new script was pushed to production. It checks very frequently that the reverse-proxy is configured to send web traffic to the main servers only.

2) Checks on write Vs read-only mode

Another script was pushed to production. This one checks that the main servers are configured in write mode and the spare servers are in read-only mode.

3) Isolate third-party web applications

The “non-vital” web applications, on which we have less expertise, were switched to a third-party server dedicated to this use: 2 WordPress blogs, the wiki, the forum and Piwik (analytics for visits). Indeed, one of the possible explanations for the server failure is that an application entered the 4th dimension or was under attack. Moving these applications onto an “isolated” server helps to limit the impact of any future issue.

4) New backup system

The decision to switch a spare server to write mode, i.e. to turn it into a main server, is a hard one to take. Indeed, it means giving up any hope of returning to the main server. This decision is difficult because it involves accepting a loss of data.

To make this decision simpler, two measures have been taken. First, we defined a time threshold after which we apply the switch: in our case, if the failure lasts more than 2 hours, we will switch. Second, backups must be more frequent than once a day: if the backups had been only 1 or 2 hours old, the decision would have been much easier!

In addition to the daily backup, we have added a new “rolling backups” system: every 15 minutes, the script analyzes each Piwigo on specific criteria (new/modified/deleted photos/users/albums/groups…). If anything has changed since the last backup, the script backs up the Piwigo (files + database) with a synchronization on the spare server.
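
To give an idea of the approach (this is a simplified sketch rather than our production script; the table names, paths and backup command are illustrative), the check boils down to fingerprinting a few activity indicators and only backing up when the fingerprint changes:

<?php
// Sketch of a 15-minute "rolling backup" check for one Piwigo installation.
$pdo = new PDO( 'mysql:host=localhost;dbname=piwigo', 'piwigo', 'secret' );

$indicators = array(
    'photos' => 'SELECT COUNT(*) FROM piwigo_images',
    'users'  => 'SELECT COUNT(*) FROM piwigo_users',
    'albums' => 'SELECT COUNT(*) FROM piwigo_categories',
);

$fingerprint = '';
foreach ( $indicators as $name => $sql ) {
    $fingerprint .= $name . '=' . $pdo->query( $sql )->fetchColumn() . ';';
}
$fingerprint = sha1( $fingerprint );

$stateFile = '/var/backups/piwigo/last_fingerprint';
$previous  = is_readable( $stateFile ) ? trim( file_get_contents( $stateFile ) ) : '';

if ( $fingerprint !== $previous ) {
    // Something changed since the last backup: dump files + database and
    // synchronise to the spare server (placeholder command).
    exec( '/usr/local/bin/piwigo-backup-and-sync.sh', $output, $status );
    if ( $status === 0 ) {
        file_put_contents( $stateFile, $fingerprint . "\n" );
    }
}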

C) What about the giant downtime on the OVH network, on November 9th and 10th?

Being hosted at OVH, especially in the Strasbourg datacenter (France, Europe), our own infrastructure was greatly impacted by the downtime. First, because our main reverse-proxy server is in Strasbourg: the datacenter failure put Piwigo.com completely out of order during the morning of November 9th (Central Europe time). Then, because we could not switch the IP Fail-Over. Or rather, OVH allowed us to do it, but instead of requiring ~60 seconds, it took ~10 hours! Hours during which the accounts hosted on the reverse-proxy server were in read-only mode.

Unlike the October 14th situation, we could not make the decision to switch the spare server to write mode, because an IPFO switch request was in progress and we had no idea how long it would take OVH to apply the action.

The Piwigo.com infrastructure has returned to its normal state on November 10th at 14:46, Paris time (France).

OVH has just provided compensation for these failures; we were waiting for it before publishing this blog post. The compensation is not much compared to the actual damage, but we will fully pass it on to our customers. After some very high-level calculations, 3 days of time credits were added to each account. It’s a small commercial gesture, but we think we have to pass it on to you as a symbol!

We are sorry for these inconveniences. As you read in this blog post, we’ve improved our methods to mitigate risk in the future and reduce the impact of an irreversible server failure.


VisualEditor for MediaWiki, how to install?

Published 28 Nov 2017 by Salahdin in Newest questions tagged mediawiki - Stack Overflow.

How do I install VisualEditor for MediaWiki? I'm using a WAMP server and I do not want to use a Parsoid server.


What the Stoics did for us

Published 28 Nov 2017 by in New Humanist Articles and Posts.

Could a 2,300-year-old Graeco-Roman philosophy be the key to a happy 21st-century life?

How to Make a MediaWiki Password

Published 28 Nov 2017 by Eliter in Newest questions tagged mediawiki - Stack Overflow.

How do I take a plaintext password and insert an entry in the MySQL database that MediaWiki uses, in a way that complies with MediaWiki's password rules?

I found this article on how they store their passwords, but I can't seem to figure out how I can create my own passwords in PHP to then upload them to the database using their "pbkdf2" hashing stuff.

I have my own PHP user registration script, which authenticates my users into multiple applications and I want to copy everyone's hash and paste it into all their accounts, so all their passwords would be the same on all the applications.
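
I've been experimenting with something along these lines (a sketch only: the field order and parameters are my assumptions about the ":pbkdf2:" layered format, so please check them against your wiki's $wgPasswordConfig / PasswordFactory before writing anything into user_password):

<?php
// Build a pbkdf2 hash in (what I believe is) the layered format newer MediaWiki
// versions store in user_password: algo, iterations, key length, base64 salt,
// base64 derived key. Requires PHP 7+ for random_bytes().
function makePbkdf2Hash( $plaintext ) {
    $algo       = 'sha512';
    $iterations = 30000;
    $length     = 64;                 // bytes of derived key
    $salt       = random_bytes( 16 );

    $dk = hash_pbkdf2( $algo, $plaintext, $salt, $iterations, $length, true );

    return sprintf(
        ':pbkdf2:%s:%d:%d:%s:%s',
        $algo,
        $iterations,
        $length,
        base64_encode( $salt ),
        base64_encode( $dk )
    );
}

echo makePbkdf2Hash( 'my plaintext password' ), "\n";

If that turns out to be too fragile, MediaWiki's maintenance/changePassword.php script is another way to set a known password without hand-crafting the hash.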


Attempting to "importDump.php" in mediawiki on bitnami instance on Google Cloud, getting error "failed to open stream..."

Published 27 Nov 2017 by AESH in Newest questions tagged mediawiki - Stack Overflow.

I followed the two command options at: https://www.mediawiki.org/wiki/Manual:Importing_XML_dumps#Using_importDump.php,_if_you_have_shell_access

In both instances, getting a 'failed to open stream' error.

I have chmod 777 on the folder and file (thinking it may be a permission error).

I have moved the file to import right into the maintenance folder (some postings suggest PHP needs to be able to find the file, so I think putting it there will help)...

on command:

"root@bitnami-mediawiki-___:/opt/bitnami/apps/mediawiki/htdocs/maintenance# php importDump.php --co nf ../LocalSettings.php /FILENAME.xml"

I see error:

"PHP Warning: fopen(/FILENAME.xml): failed to open stream: No such file or directory in /opt/bitnam i/apps/mediawiki/htdocs/maintenance/importDump.php on line 267 PHP Warning: feof() expects parameter 1 to be resource, boolean given in /opt/bitnami/apps/mediawiki/ htdocs/includes/import/ImportStreamSource.php on line 41 PHP Warning: fread() expects parameter 1 to be resource, boolean given in /opt/bitnami/apps/mediawiki /htdocs/includes/import/ImportStreamSource.php on line 48 PHP Warning: feof() expects parameter 1 to be resource, boolean given in /opt/bitnami/apps/mediawiki/ htdocs/includes/import/ImportStreamSource.php on line 41 PHP Warning: XMLReader::read(): uploadsource://9115d0bbe5ae974e1fe2d411e035aeaa:1: parser error : Ext ra content at the end of the document in /opt/bitnami/apps/mediawiki/htdocs/includes/import/WikiImport er.php on line 551 PHP Warning: XMLReader::read(): in /opt/bitnami/apps/mediawiki/htdocs/includes/import/WikiImporter.p hp on line 551 PHP Warning: XMLReader::read(): ^ in /opt/bitnami/apps/mediawiki/htdocs/includes/import/WikiImporter. php on line 551 Set $wgShowExceptionDetails = true; in LocalSettings.php to show detailed debugging information."

I'm not clear what the error means and how to grant access to the file... Newb to BASH and servers. Any help would be greatly appreciated!


PhpFlickr 4.1.0

Published 27 Nov 2017 by Sam Wilson in Sam's notebook.

I’ve just tagged version 4.1.0 of my new fork of the PhpFlickr package. It introduces oauth support, and hopefully improves the documentation of the user authentication process. This release deprecates some old behaviour, but I hope it doesn’t break any. Bug reports are welcome!

There are some parts that are still not converted to the new request flow, but I’ll get to them next.


Mediawiki extension on Docker

Published 27 Nov 2017 by Troyan in Newest questions tagged mediawiki - Stack Overflow.

I'm looking for the best practice for adding an extension to my MediaWiki container. Should I create a Dockerfile that downloads the extension, or add the extension files to the data volume? Thanks for the answer :D


Untitled

Published 26 Nov 2017 by Sam Wilson in Sam's notebook.

A good weekend: double-glazed window installed, and a solar panel screwed to a roof. On different houses!


My best books of 2017

Published 25 Nov 2017 by Tom Wilson in thomas m wilson.

My best books of 2017… Deeply insightful works from Yale University Press on geopolitics today, a history of consumerism in the West, a wave making read on the Anthropocene as a new era, a powerful explanation of the nature/nurture question for human identity by a very funny Californian, and a charming meander through the English […]


CFB Folder 1 done

Published 25 Nov 2017 by Sam Wilson in Sam's notebook.

The first folder of the C.F. Barker Archives’ material is done: finished scanning and initial entry into ArchivesWiki. This is my attempt to use MediaWiki as a digital archive platform for physical records (and digitally-created ones, although they don’t feature as much in the physical folders). It’s reasonably satisfactory so far, although there’s lots that’s a bit frustrating. I’m attempting to document what I’m doing (in a Wikibook), and there’s more to figure out.

There are a few key parts to it; two stand out as a bit weird. Firstly, the structure of access control is that completely separate wikis are created for each group of access required. This can make it tricky linking things together, but makes for much clearer separation of privacy, and almost removes the possibility of things being inadvertently made public when they shouldn’t be. The second is that the File namespace is not used at all for file descriptions. Files are considered more like ‘attachments’ and their metadata is contained on main-namespace pages, where the files are displayed. This means that files are not considered to be archival items (except of course when they are; i.e. digitally-created ones!), but just representations of them, and for example multiple file types or differently cropped photos can all appear on a single item’s record. The basic idea is to have a single page that encapsulates the entire item (it doesn’t matter if the item is just a single photograph, and the system also works when the ‘item’ is an aggregate item of, for example, a whole box of photos being accessioned into ArchivesWiki).


Tabulate updated to not require REST API plugin

Published 24 Nov 2017 by Sam Wilson in Sam's notebook.

I’ve removed Tabulate’s dependency on the REST API plugin, because that’s now been moved in to core WordPress. (Actually, that happened rather a while ago, but I’m slack and haven’t been paying enough attention to Tabulate this year; other things going on!)

I hope to get back to adding file-field support to Tabulate sometime soon. That’d be a useful addition for me. Also, the whole situation with Reports needs sorting out: better documentation, easier to use, support for embedding in posts and as sidebar widgets, that sort of thing.


A prophecy of violence

Published 24 Nov 2017 by in New Humanist Articles and Posts.

The persecution of Myanmar’s Rohingya Muslims has shocked many in the West – because it shows that Buddhists are also capable of sectarian conflict.

Untitled

Published 24 Nov 2017 by Sam Wilson in Sam's notebook.

Time now to knock off for the week.


Untitled

Published 24 Nov 2017 by Sam Wilson in Sam's notebook.

I’ve got oauth working now for PhpFlickr. Just needs some tidying up around the edges and I’ll tag a new release.


How to add something to edit summary when using Pywikibot?

Published 24 Nov 2017 by Fructibus in Newest questions tagged mediawiki - Stack Overflow.

I am using Pywikibot at the moment to add a lot of files to a category, and the edit summary looks like this: "Bot: Adding category Taken with Sony DSC-WX350)"

I would like to add the text "using Pywikibot in automatic mode"

How to do that?


How do I promote a user automatically in Mediawiki and create a log of those promotions?

Published 24 Nov 2017 by sau226 in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I control a Mediawiki site. Here you can see users being automatically updated and added into the extended confirmed user group.

If I have a group called "util", would it be possible to add the relevant code to enable autopromotion with a log entry like that, make an edit to get promoted automatically into the group, and then remove that bit of code? Also, what code would I have to use to gain a level of access like that?
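
For context, the kind of configuration I had in mind looks roughly like this in LocalSettings.php (a sketch only; the group name, right and threshold are just examples, and my understanding is that $wgAutopromoteOnce promotes a user only once, when the condition is first met on an edit, and records the promotion in the user rights log):

// LocalSettings.php (sketch)
$wgGroupPermissions['util']['patrol'] = true;   // any entry here defines the 'util' group and its rights

// Promote once, on the user's next edit, as soon as they have at least one edit.
$wgAutopromoteOnce['onEdit']['util'] = array( APCOND_EDITCOUNT, 1 );

// Show these autopromotions in recent changes as well as the rights log.
$wgAutopromoteOnceLogInRC = true;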


Is it possible to find broken link anchors in MediaWiki?

Published 24 Nov 2017 by Lyubomyr Shaydariv in Newest questions tagged mediawiki - Webmasters Stack Exchange.

Probably a simple question answered a million times, but I can't find an answer. MediaWiki can track missing pages and report them with Special:WantedPages. I'm not sure if it's possible, but can MediaWiki report broken anchors? Say I have the Foo page that refers to the Bar page like this: [[Bar#Name]]. Let's assume the Bar page lacks this section, so the Name anchor does not exist there; Special:WantedPages won't report this link as broken because the Bar page exists. Is there any way to find all broken anchors? Thanks in advance!


Where should the favicon.ico file go in Raspberry pi -> Mediawiki?

Published 23 Nov 2017 by user3573562 in Newest questions tagged mediawiki - Stack Overflow.

I am running MediaWiki on a Raspberry Pi 3, but I just cannot get a favicon.ico to work.

Has anyone successfully done this?

I tried putting my favicon.ico in the same folder with the Mediawiki index.php file, assuming it is the root of the domain.

I tried putting the favicon.ico file in ~/Pictures/ and edit the $wgFavicon setting in LocalSettings.php: $wgFavicon = "/home/pi/Pictures/favicon.ico";

Some people have suggested making sure that root was the file owner, and to make it executable.

Nothing works. Any other ideas would be appreciated.
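
One more thing worth trying, if $wgFavicon expects a URL path served by the web server rather than a filesystem path (a sketch, assuming favicon.ico is copied into the wiki's web root next to index.php and that $wgScriptPath is already set earlier in LocalSettings.php):

// LocalSettings.php (sketch)
$wgFavicon = "$wgScriptPath/favicon.ico";   // a URL path, not /home/pi/Pictures/...

Requesting the .ico URL directly in the browser is a quick way to check that the file is actually being served, and to rule out browser caching.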


Using Wiki API in Angularjs

Published 23 Nov 2017 by Eswar Chukaluri in Newest questions tagged mediawiki - Stack Overflow.

I am completely new to Angular (so a lot of the code below is taken from a different app). I am trying to include a wiki app in Angular. I am not getting errors in the console, but the app doesn't show the results as it should. I suspect that I am doing something wrong in wiki.component.ts, in onSearchResultsComplete, but since I am a complete novice I can't be sure. So, I am posting everything that I think is relevant. I would really appreciate it if you could guide me.
My wiki.service.ts

import { Injectable } from '@angular/core';
import { Http, Response} from '@angular/http';
// Needed for the .map() operator used on the Http observable below.
import 'rxjs/add/operator/map';


@Injectable()
export class WikiService {

  constructor(public http: Http) { }

    getJobs(title: string) {
      let url = 'https://cors-anywhere.herokuapp.com/'+'http://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exintro=&titles='+ title;
      // let url = 'dist/php/CareerService.php?title=' + title;
      return this.http.get(url)
          .map((res: Response) => res.json());
    }

} 

My wiki-result.component.ts

import { Component, OnInit, Input } from '@angular/core';

@Component({
  selector: 'app-wiki-result',
  templateUrl: './wiki-result.component.html'
})
export class WikiResultComponent implements OnInit {


  @Input() title:string;
  @Input() extract:string;


  constructor() { }

  ngOnInit() {
  }

}

My wiki-result.component.html

<h3>{{title}}</h3>
    <h4 style="margin-top: 5px; margin-bottom: 10px; font-style: italic;">{{extract}}</h4>

My wiki.component.html

<div style="background-color: rgba(255, 255, 255, 0.85098); border: 1px solid rgb(219, 219, 219); padding: 25px; margin-bottom: 25px;">
  <h2>Wikipedia</h2>
  <!-- <p>We help you to search for jobs in Vaxjo. Enter a job title, and click on the search button to find your dream job.</p>
   --><hr>
  <input #wikiInput placeholder="keyword" style="padding: 10px 5px; width: 100%;">
  <button (click)="onSearch(wikiInput)">Search</button>
  <div #target style="margin-top: 20px;">


  </div>
</div>

My wiki.component.ts is,

import {Component, OnInit, ViewChild, ViewContainerRef, ComponentFactoryResolver} from '@angular/core';

import { WikiResultComponent } from './wiki-result.component';

import { WikiService } from './wiki.service';

@Component({
  selector: 'app-wiki',
  templateUrl: './wiki.component.html'
})
export class WikiComponent implements OnInit {

  @ViewChild('target', {read: ViewContainerRef}) target:any;

  cmpArray:Array<any> = []
  cmpRefArray:Array<any> = []
  noResult:boolean = false;

  constructor(public wikiService:WikiService, 
              public resolver:ComponentFactoryResolver) { 

  }

  ngOnInit() {
  }

  onSearch(input) {
    let title = input.value || input.placeholder;

    this.wikiService.getJobs(title)
          .subscribe((response)=>{      
             this.onSearchResultsComplete(response);
          },
          error => console.error(error)
        )
    }

    onSearchResultsComplete(response) {

      for (let i of this.cmpRefArray) {
        i.destroy();
      }

      if(response.results == undefined){
        this.noResult = true;
      }
      else { 
        this.noResult = false;

            var t = response.results[0].title;
            var e = response.results[0].extract;


            this.createComponent(t,e);

      }  
    }

    createComponent(t:string, e:string)
    {
        let newComp = this.resolver.resolveComponentFactory(WikiResultComponent);
        let cmpRef = this.target.createComponent(newComp);

        let cmp              = cmpRef.instance;
            cmp.title        = t;
            cmp.extract      = e;

        this.cmpRefArray.push(cmpRef);
        this.cmpArray.push(cmp);
    }
}

How to fix a corrupted MediaWiki page?

Published 22 Nov 2017 by waanders in Newest questions tagged mediawiki - Stack Overflow.

One of our templates seems corrupted. We can't edit the page, or even view it (yes, it exists), and so on. All of this results in a time-out. Now we are trying to import a back-up version, but Special:Import also gives a time-out.

How do I fix this?


How to add a category to a list of files using Pywikibot and category.py?

Published 22 Nov 2017 by Fructibus in Newest questions tagged mediawiki - Stack Overflow.

I have a text file (in.txt) containing a list of Wikimedia Commons files, and I want to add the category [[Category:Fruits]] to all of them. How can I do that?

I can't find an option to specify the name of the category, so the script prompts me for it. That's quite annoying; I need to be able to specify the name of the category to add in the script that runs Pywikibot.

For the moment, my script looks like this:

python pwb.py category add -file:in.txt
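
No answer is recorded in this feed, but one possible workaround, sketched below on the assumption that in.txt holds one file title per line, is to do the edit in a small pywikibot script so the category name can be set directly:

import pywikibot

# Assumes in.txt contains one Commons file title per line.
site = pywikibot.Site('commons', 'commons')
with open('in.txt') as f:
    titles = [line.strip() for line in f if line.strip()]

for title in titles:
    page = pywikibot.Page(site, title)
    if '[[Category:Fruits]]' not in page.text:
        page.text += '\n[[Category:Fruits]]'
        page.save(summary='Adding [[Category:Fruits]]')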

mediawiki content version management

Published 21 Nov 2017 by Bo Wang in Newest questions tagged mediawiki - Stack Overflow.

We use MediaWiki for our wiki site. Is it possible to manage many pages together as a release with a version, e.g. 1.1? And after we update some pages, can I simply select 1.1 to read the 1.1 content only?

Page-level history works for a single page with timestamps, but it is not easy to manage many pages that way.


Hacktoberfest 2017 at a Glance

Published 20 Nov 2017 by Stephanie Morillo in The DigitalOcean Blog.

Hacktoberfest 2017, which ended this past October 31, was epic by any measure. It saw the greatest level of participation of any Hacktoberfest ever; in 2016, 10,227 participants completed the challenge, and this year, 31,901 successfully submitted all 4 pull requests. Companies like SendGrid also ran their own Hacktoberfest-inspired contests, and we saw contributions to 64,166 projects.

Contributors

As in previous years, developers in the open source community shared some of their Hacktoberfest stories:


Not to be outdone, first-time contributors also discussed what it was like opening their first pull requests both via blog posts and on Twitter:

Events

As Hacktoberfest-themed Meetups entered their second year, we saw an uptick in participation from around the world. This year, events were organized in 119 locations around the world, with first-ever Hacktoberfest Meetups held in 27 countries including Australia, Bosnia & Herzegovina, Brazil, Colombia, Indonesia, Israel, Malaysia, Mexico, Nepal, Nigeria, Pakistan, Peru, Philippines, Romania, Russia, Sri Lanka, and Taiwan!

Looking Ahead

Thank you to our friends at GitHub for embarking on yet another Hacktoberfest with us. And thank you to the countless folks on social media and open source communities around the world who encouraged even more people to participate!

If you finished your 4 pull requests, you should’ve received an email about your T-shirt. Still have questions? Reach out to us at hacktoberfest@digitalocean.com.

What Hacktoberfest stories do you want to share? Tell us in the comments below.

See you all in 2018; happy hacking!


Who's afraid of diversity in education?

Published 20 Nov 2017 by in New Humanist Articles and Posts.

There appears to be an old-fashioned backlash against any challenge to a lack of diversity in academia.

How to remove MediaWiki Help Link from page

Published 20 Nov 2017 by Salahdin in Newest questions tagged mediawiki - Stack Overflow.

Help icon in corner of special page

Please, how can I remove this link from my MediaWiki Special Page?


Can I incrementally add text/content to a previous section on a mediawiki page?

Published 16 Nov 2017 by EmKar in Newest questions tagged mediawiki - Stack Overflow.

I'm trying to develop an alternative (icon-based) Table Of Contents feature in a template, as a replacement for the default TOC, for wiki pages that would get updated based on the contents of the page.

The page content will be generated by other templates. The problem is that I have no way for the templates to communicate/add text back at the top of the TOC (similar to how the == headers == do for the built-in TOC).

Is there a way to add content to a previous section of the wiki page similar to what can be done using javascript with HTML span sections (like span.appendChild(document.createTextNode( "some new content" )); for example)?

Alternatively, since much of the page content can be assumed to belong to an enumerated set of pre-defined sections: could my TOC initially print all headings as "hidden", and have later sections toggle them to "show" as and when they are used, without user intervention (i.e. without relying on users to manually click on show/hide)?


SLAM POETRY DAD

Published 16 Nov 2017 by timbaker in Tim Baker.

I recently made my public slam poetry debut at the Men of Letters event in Brisbane in the salubrious surrounds of one of Australia’s most iconic live music venues, the Zoo, in Fortitude Valley. Men of Letters is a spin off of the hugely successful...

Creating a dynamic checklist in a form in MediaWiki

Published 16 Nov 2017 by RtotheP in Newest questions tagged mediawiki - Stack Overflow.

I currently use the following formtable to create a checklist for selecting properties in a form in MediaWiki:

{| class="formtable" 
| <div class="pf-auto-select-childs">{{{field|Can be applied for|input 
type=tree|mandatory|list|height=600|structure=
*A
*B
*C}}}</div>
|}

Now I tried to make the structure dynamic by adding a combination of #arraymap and #ask:

{| class="formtable" 
| <div class="pf-auto-select-childs">{{{field|Can be applied for|input 
type=tree|mandatory|list|height=600|structure=
{{#arraymap:{{#ask:[[Category:Blocks]]}}|,|x|*[[x]]|\n}}
}}}</div>
|}

This does the trick visually, but it does not add the chosen property to the page created with the form.

How can I make this checklist dynamic such that it automatically adds the chosen property from the list to the page?


Capitalism: the winter 2017 New Humanist

Published 16 Nov 2017 by in New Humanist Articles and Posts.

Out now - how the system shapes the way we think.

Your Fun and Informative Guide to Consuming “Oil, Love & Oxygen”

Published 16 Nov 2017 by Dave Robertson in Dave Robertson.

The Paradox of Choice says that too many options can demotivate people, so here’s a short guide to the options for getting your ears on “Oil, Love & Oxygen”.

Gigs
For the personal touch you can always get CDs at our shows. They come with a lush booklet of lyrics and credits, and the enchanting artwork of Frans Bisschops. Discounted digital download codes are also available for Bandcamp…

Bandcamp
Bandcamp is a one-stop online shop for your album consumption needs. You can get a digital download in your choice of format, including high-resolution formats for “audiophiles and nerds”. If you go for one of the “lossless” formats such as ALAC, then you are getting the highest sound quality possible (higher than CD). Downloads also come with a digital version of the aforementioned booklet.

Bandcamp is also where you can place a mail-order for the CD if you want to get physical. Another feature of Bandcamp is fans can pay more than the minimum price if they want to support the artist.

iTunes
The iTunes store is a great simple option for those in the Apple ecosystem, because it goes straight into the library on your device(s). You also get the same digital booklet as Bandcamp, and the audio for this release has been specially “Mastered for iTunes”. This means the sound quality is a bit better than most digital downloads (though not as good as the lossless formats available on Bandcamp).

This album was mastered by Ian Shepherd, who has been a vigorous campaigner against the “loudness wars”. Did you ever notice that much, maybe most, music after the early 90s started to sound flat and bland? Well, one reason was the use of “brick wall limiters” to increase average loudness, but this came at the expense of dynamics. I’m glad my release is not a casualty of this pointless war, but I digress.

Other Digital Download Services
The album is on many other services, so just search for “Oil, Love & Oxygen” on your preferred platform. These services don’t provide the booklet, though, and the sound quality is not quite as high as the two options above.

Streaming (Spotify etc.)
The album is also available across all the major streaming platforms. While streaming is certainly convenient, it is typically low sound quality and pays tiny royalties to artists.

Vinyl and Tape
Interestingly these formats are seeing a bit of a resurgence around the world. I would argue this is not because they are inherently better than digital, but because digital is so often abused (e.g. the aforementioned loudness wars and the use of “lossy” formats like mp3). If you seriously want vinyl or tape though, let me know and I will consider getting old school!

Share the Love
If you like the album, then please consider telling friends, rating or reviewing the album on iTunes etc., liking our page on the book of face…

Short enough?

Share


Parsing Wikipedia .bz2 dump

Published 15 Nov 2017 by Anas Al-Masri in Newest questions tagged mediawiki - Stack Overflow.

I have downloaded the compressed Wikipedia corpus in .bz2 form. Is there a way to search through that bulk of data for a keyword without having to use an API? I need to upload the whole database to a server and search through it for data mining purposes.
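
No answer is recorded in this feed, but as a rough illustration, Python's standard bz2 module can stream the compressed dump without fully decompressing it to disk; the file name and keyword below are placeholders:

import bz2

keyword = 'example'  # placeholder keyword
with bz2.open('enwiki-latest-pages-articles.xml.bz2', 'rt', encoding='utf-8') as dump:
    for line in dump:
        if keyword in line:
            print(line.strip())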


In MediaWiki: Check if user account is blocked within page

Published 14 Nov 2017 by Regis May in Newest questions tagged mediawiki - Stack Overflow.

In MediaWiki I'd like to create a page that links to a user account. As we use blocking in order to deactivate user accounts (there is no other way provided to achieve this), it would be nice to present a note next to a link if the account has been blocked. Therefore I require some way to distinguish between different states of a user account.

Therefore my question: is there a parser function or some other kind of instrument to detect whether a user account is blocked? I can't find one. Or is there some other way to achieve this functionality within a page?

Note: Deleting a user page is not an option. There is an {{#ifexist}} parser function which could check for existence of a page but I do not want to delete the user pages and do not want to confuse admins. The "user-is-blocked" flag is the only thing available that we could perform some kind of branching on. Do you have any ideas how to accomplish this?
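
No answer is recorded in this feed. There may be no suitable parser function in core MediaWiki, but for completeness, block status is exposed through the API (list=users with usprop=blockinfo), so an external script or gadget could look it up. A minimal sketch, with a placeholder wiki URL and username:

import requests

# Placeholder URL and username; blocked users carry a 'blockid' key in the result.
api = 'https://example.org/w/api.php'
params = {
    'action': 'query',
    'list': 'users',
    'ususers': 'SomeUser',
    'usprop': 'blockinfo',
    'format': 'json',
}
user = requests.get(api, params=params).json()['query']['users'][0]
print('blocked' if 'blockid' in user else 'not blocked')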


An attack on free speech in Malta

Published 14 Nov 2017 by in New Humanist Articles and Posts.

The murder of anti-corruption journalist Daphne Caruana Galizia has worrying implications.

Lessons from Organizing Company-wide Hackathons

Published 13 Nov 2017 by Jackie De La Rosa in The DigitalOcean Blog.

When I joined DigitalOcean in October 2016 as Chief of Staff to our CTO, one of my first tasks was to organize a company-wide Hackathon. I was intrigued as to why our leadership was so excited about a Hackathon; in my mind, it was an event for employees to work together and collaborate similar to a DO basketball game or dinner outing. Why were they so interested in the company-wide participation?

I did not get my answer until after the Hackathon. I realized not only were there positive outcomes around collaboration and team-building, but the actual outputs were extremely innovative, intelligent, and productive. As our VP of Engineering Greg Warden put it, “I love seeing the passion and ideas from people I don’t always interact with. The best ideas come from the most curious of places.” Enough said!

Over the past year, we’ve successfully hosted three Hackathons. From our initial Hackathon of 120 participants, we’ve successfully increased our participation to over 40% of our 380+ employee company. In order to achieve this growth, we constantly iterated each event. We’ve added guest technical judges to help evaluate the more complicated projects, extended the pitching time from 120 seconds to 5 minutes, and brought in fun extras like a barista, late night pizza, and happy hour. To foster a healthy sense of friendly competition, we give awards in categories like "Best Business Solution", "Best Technology", "Most Cross-functional Team", and "People’s Choice". All of our Hackathon participants get fun swag, too!

One of the best parts of the Hackathon is seeing projects that end up becoming a part of our internal toolset. Some notable past winners include:

Many projects, including those mentioned above, focus not only on customer-facing tools but also on optimizing the employee experience. It was a unique balance, where some people focused on how they could improve their everyday life while others looked to cloud computing.

Here are five tips to consider for hosting a successful Hackathon.

  1. Leadership support. Our CEO Ben Uretsky constantly sends company-wide emails weeks before the actual Hackathons to express his excitement and support. He encourages all employees to participate, and encourages managers to support team participation by setting aside time specifically for the Hackathon.

  2. Cross-functional participation. The teams that have created truly innovative products and addressed real, practical problems are cross-functional teams with members from all across the company. We put procedures in place to assure that both technical and non-technical people participate. Since Hackathons are technical in nature, it should be of the utmost importance to encourage non-technical colleagues to join teams. Their skills are desperately needed. Because our workforce is 50% remote, this also added another complexity. How can we make it feel like one, centralized Hackathon all over the world? We encourage remote teams to all meet at one central location, even if it’s not our office in NYC. If that was not practical, we encouraged teams to hack via Google Hangout. Pitches are also live streamed so everyone can participate (and even pitch) to our remote employees.

  3. A work embargo. There are actually two parts to this point: 1) Make sure you get approval for a company-wide work embargo and get manager buy-in, and 2) Confirm you communicate that bottoms-up and top-down. When people are busy with work, teams are discouraged, lop-sided and usually s-t-r-e-s-s-e-d. To truly have focused teams, you need to cut out the noise (the noise being your routine daily responsibilities).

  4. No limitations on project ideas. If your Hackathon’s true mission is to support innovation within your company, the best rule is to have no (or very few) rules around projects. Allow your employees to work on whatever they want, whatever problem they have witnessed, or pain point that needs addressing. The only rules we had were to disallow leadership from participating (because they were judges and resources), and for projects to be somehow relevant to DO.

  5. Fun! Some Hackathons can still feel like work, and at some points, can be even more challenging (I only have 48 hours to make a super impressive product? Ugh!). But adding fun events, food vendors, and breaks makes it more enjoyable. At DO, we hired a barista to make delicious artisanal coffees, and we had pizza parties, and donut breaks.

Company-wide Hackathons are a great way to promote company cohesion in a fun and relaxed atmosphere. They give people room to be creative and to work with people across teams that they would normally not interface with in their day-to-day life. By giving people the space to experiment, tinker, and get to know each other, you give them permission to create something of value to themselves and even the company at large.

A scene from our most recent Hackathon this past October.

Jackie De La Rosa is a Senior Program Manager at DigitalOcean and focuses on overall strategy, business operations and executive initiatives. She was one of the first employees at DigitalOcean’s second office in Cambridge, Massachusetts.


Fighting for abortion rights in Northern Ireland

Published 13 Nov 2017 by in New Humanist Articles and Posts.

50 years after the 1967 Abortion Act was passed, women in Northern Ireland still live under some of the most restrictive abortion laws in Europe.

Restrict access to a MediaWiki page to specific users

Published 13 Nov 2017 by 3bdalla in Newest questions tagged mediawiki - Stack Overflow.

I have some pages to which I want to restrict access to specific users, i.e. I want only user A and user B to be able to view the page. How can this be done? Do I need additional extensions, or can it be done through LocalSettings.php, for example?


MediaWiki how to apply CSS to pages

Published 13 Nov 2017 by AspiringDev in Newest questions tagged mediawiki - Stack Overflow.

I've conducted some research, but I'm new to the MediaWiki platform. The formatting is fairly similar to that of web production. How can I apply CSS to a page on the Wiki?

For example. With the MediaWiki formatting I've created:

<div style="border: 2px solid #aaa; height: 50px; border-radius: 5px; width: 20%; padding: 5px 10px; display:inline-block;">

If I want to style this div tag from somewhere other than the same page, how can I achieve this? Is it the same principle as using CSS in regular web development?

I've found this page, but I'm not sure if it is the correct place to do it:

MediaWiki:Common.css

Thanks in advance.


1.5.1

Published 11 Nov 2017 by mblaney in Tags from simplepie.

1.5.1 (#559)

* Revert sanitisation type change for author and category.

* Check if the Sanitize class has been changed and update the registry.
Also preference links in the headers over links in the body to
comply with WebSub specification.

* Improvements to mf2 feed parsing.

* Switch from regex to xpath for microformats discovery.

* 1.5.1 release.

* Remove PHP 5.3 from testing.


Regex to insert content just before mediawiki category links

Published 10 Nov 2017 by Ken Hilton in Newest questions tagged mediawiki - Stack Overflow.

I'm trying to use a regex to insert a template into a page, before all category or interwiki links, but after everything else. So if you have a page that ends like this:

== See Also ==
* [[Link one]]
* [[more link]]
* [//external.link external link]

[[Category:Pages]]
[[de:Spezial Page]]

I want the template {{template}} to be inserted before the [[Category:Pages]] but after everything else.

Note: The last section is not necessarily a list - it could be

== References ==
<references/>

or even something else. The point is to insert it before all category/interwiki links at the end, but after the last section.

What regex can help me do this? I've tried (?P<pre>[\s\S]+)(?P<cats>(?:\[\[[^]]:[^]]\]\])*$) as the matching expression with \g<pre>{{template}}\n\g<cats> as the substituting expression, but that simply inserts it at the very end.

Regex flavor: Python 2
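
No answer is recorded in this feed, but one possible approach, sketched below and ignoring edge cases such as links inside comments or nowiki tags, is to anchor the match on the trailing run of namespaced links instead of on everything before them:

import re

def insert_template(text, template='{{template}}'):
    # Match a trailing run of [[Something:...]] links (categories/interwikis)
    # separated only by whitespace at the very end of the page.
    m = re.search(r'(?:\[\[[^][|]*:[^][]*\]\]\s*)+$', text)
    if m:
        return text[:m.start()].rstrip('\n') + '\n\n' + template + '\n\n' + text[m.start():]
    return text.rstrip('\n') + '\n\n' + template + '\n'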


How to replace mediawiki in url as my wiki name

Published 9 Nov 2017 by Sam Daniel in Newest questions tagged mediawiki - Stack Overflow.

I am using MediaWiki. My URL is

http://localhost:1028/mediawiki/index.php/Main_Page

I need to replace 'mediawiki' in the URL with, say, 'samswiki', so the URL should look like

http://localhost:1028/samswiki/index.php/Main_Page

Any suggestions on how to do this? Thanks

Things that I have tried:

1. Renamed the folder mediawiki to samswiki.
2. Changed $wgScriptPath to samswiki in LocalSettings.php.

Neither worked.


Stuck on importing bitnami mediawiki in Google Cloud environment

Published 9 Nov 2017 by AESH in Newest questions tagged mediawiki - Stack Overflow.

Trying to import a mediawiki.xml file into a Bitnami MediaWiki instance hosted on Google Cloud.

Per Google's instructions, I moved the XML into a Google Cloud Storage bucket using the browser.

In that browser, I can see the file under what looks like a path - Buckets/2017-11-09-mediawiki_import/mediawiki.xml

Within the Google Cloud SSH client, I am in the Bitnami instance at the path:

me@bitnami-mediawiki-dm-8cff:/opt/bitnami/apps/mediawiki/htdocs/maintenance$ running:

php importDump.php < file name here...

I am not clear what the actual path is to the Google Buckets, or how to find my XML file through the SSH client.

Newbie here, so it might be really simple, but I'm not familiar with these tools or the command line; any help would be appreciated.


Spaces Object Storage: Now Available in Amsterdam and New York

Published 8 Nov 2017 by John Gannon in The DigitalOcean Blog.

Today we’re excited to announce the expansion of DigitalOcean Spaces to Amsterdam (AMS3). Spaces is a simple, standalone object storage service that enables developers to store and serve any amount of data with automatic scalability, performance, and reliability. With today’s announcement, Spaces now has locations in New York and Amsterdam, with more regions on the roadmap for early 2018.

Object storage has been one of the most requested products that we’ve been asked to build. When we embarked on developing a scalable storage product that is abstracted from compute resources, we realized we had an opportunity to refactor and improve how developers solve this problem today.

Pricing

We believe in simplifying our products to enable developers to build great software. To do that, we look at every opportunity to remove friction from the development process including spending less time estimating costs associated with storage, transfer, number of requests, pricing tiers, and regional pricing.

Spaces is available for a simple $5 per month price and includes 250GB of storage and 1TB of outbound bandwidth. There are no costs per request and additional storage is priced at the lowest rate available: $0.01 per GB transferred and $0.02 per GB stored. Uploads are free.

Spaces provides cost savings of up to 10x along with predictable pricing and no surprises on your monthly bill.

To make it easy for anyone to try, we are offering a 60-day free trial.

Scales with Your Data

Spaces is designed to scale automatically; as your application data grows, you won't need to worry about scaling any storage infrastructure. Although your Space can be configured to be accessed from anywhere, we realize that some customers prefer to keep their data close to their customers or to their own compute nodes.

To that end, Spaces is available in NYC3 and AMS3. More global regions will follow in early 2018—stay tuned for future updates.

Designed for Developers

Our goal was to simplify the essential components of object storage into a clean design. We tested several designs with developers to ensure Spaces was easy to use and manage with deployed applications. With Spaces, you can:

You can use your favorite storage management tools and libraries with Spaces. A large ecosystem of S3-compatible tools and libraries can be used to manage your Space. (We’ve published articles about some of these tools on our Community site; find the links in the “Getting Started” section below.)
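
As a rough illustration of that S3 compatibility (not an official example; the bucket name, file names, credentials, and the NYC3 endpoint below are placeholders or assumptions), uploading a file with boto3 might look like this:

import boto3

session = boto3.session.Session()
client = session.client(
    's3',
    region_name='nyc3',
    endpoint_url='https://nyc3.digitaloceanspaces.com',  # assumed regional endpoint
    aws_access_key_id='SPACES_ACCESS_KEY',                # placeholder credentials
    aws_secret_access_key='SPACES_SECRET_KEY',
)
client.upload_file('backup.tar.gz', 'my-space-name', 'backups/backup.tar.gz')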

Secure, Reliable, and Performant

Files you store in Spaces are encrypted on physical disks with 256-bit AES-XTS full-disk encryption. In addition, you can encrypt files with your own keys before uploading them to Spaces. You can limit access to Spaces and the files within using your Spaces API key(s) and permissioning.

Files stored in Spaces are distributed using a fault-tolerant placement technique called erasure coding. Spaces can tolerate multiple host failures without blocking any client I/O or experiencing any data loss.

Spaces is designed to provide high availability for storing and serving web assets, media, backups, log files, and application data. At DigitalOcean, we use Spaces for a variety of applications including serving of web assets (html, images, js) for cloud.digitalocean.com, and for backups of data critical to our business. During the early access period, thousands of users stored millions of objects and Spaces performed as expected with high throughput.

Getting Started

Join the hundreds of thousands of customers who have already set up a Space since we launched. Find out more about how your application could use Spaces for cost-effective and scalable object storage by reading these articles and tutorials:

Overview

API Documentation

Migrating

Command-Line Clients

GUI Clients

We’ll be adding new features and regions over the coming months and look forward to hearing your feedback!


Install MediaWiki locally with a large DB: "LocalSettings.php" couldn't be generated

Published 8 Nov 2017 by user3621950 in Newest questions tagged mediawiki - Stack Overflow.

I'm trying to install MediaWiki (1.29.1 or 1.27.3) locally with a large Wiktionary dump (3GB). After converting the XML dump into an SQL file and importing the latter into the DB that I created with this script, I followed the MediaWiki installation instructions in the browser to generate my specific "LocalSettings.php". I get the message

There are MediaWiki tables in this database. To upgrade them to MediaWiki 1.29.1, click Continue."

After clicking the "Continue" button, the browser stays in a loading state forever.

My understanding is that the DB containing the Wiktionary dump has some tables that are not compatible with the version of MediaWiki that I'm using. Therefore, an update of the DB is required. I tried to run install.php from the command line to avoid a timeout in the browser. The command didn't return anything (even after waiting more than 2 hours).

I also tried a workaround:

I then got a blank page with this message:

Exception caught inside exception handler.Set $wgShowExceptionDetails = true; and $wgShowDBErrorBacktrace = true; at the bottom of LocalSettings.php to show detailed debugging information.

All the examples and tutorials that I found online about this matter are assuming/using a small or new created DB.

Any idea what's wrong? Has anyone really tried to use an existing Wikimedia dump and run it locally? Why is there no such advanced example?


Why does psychoanalysis use the couch?

Published 8 Nov 2017 by in New Humanist Articles and Posts.

Q&A with Nathan Kravis, author of "On the Couch: A Repressed History of the Analytic Couch from Plato to Freud"

How to change the image gallery format options in a MediaWiki category

Published 8 Nov 2017 by Nathaniel Johnston in Newest questions tagged mediawiki - Stack Overflow.

There are numerous optional attributes that can be placed on MediaWiki galleries. Also, when images are placed in a category on MediaWiki, they are automatically displayed in a gallery on that category page (example). However, I cannot find any way to combine these two observations. Can I apply any of those optional gallery attributes to the automatically-generated category galleries to, for example, make the thumbnails 180 pixels wide instead of 120 pixels wide?


Security updates 1.3.3, 1.2.7 and 1.1.10 released

Published 7 Nov 2017 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

We just published updates to all stable versions from 1.1.x onwards delivering fixes for a recently discovered file disclosure vulnerability in Roundcube Webmail.

Apparently this zero-day exploit is already being used by hackers to read Roundcube’s configuration files. It requires a valid username/password as the exploit only works with a valid session. More details will be published soon under CVE-2017-16651.

The Roundcube series 1.0.x is not affected by this vulnerability but we nevertheless back-ported the fix in order to protect from yet unknown exploits.

See the full changelog for the corresponding version in the release notes on the GitHub download pages: v1.3.3, v1.2.7, v1.1.10, v1.0.12

We strongly recommend updating all production installations of Roundcube with one of these versions.

Mitigation

In order to check whether your Roundcube installation has been compromised, check the access logs for requests like ?_task=settings&_action=upload-display&_from=timezone. As mentioned above, the file disclosure only works for authenticated users, and by finding such requests in the logs you should also be able to identify the account used for this unauthorized access. For mitigation we recommend changing all credentials to external services such as the database or LDAP address books, and preferably also the des_key option in your config.
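
As an illustration only, a quick scan of an Apache-style access log for such requests might look like the following; the log path is a placeholder and depends on your server setup:

needle = '_task=settings&_action=upload-display&_from='
with open('/var/log/apache2/access.log', errors='replace') as log:  # placeholder path
    for line in log:
        if needle in line:
            print(line.strip())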


DO India’s Q4 Update: Conferences, Webinars, and More

Published 6 Nov 2017 by Prabhakar (PJ) Jayakumar in The DigitalOcean Blog.

A lot has happened since our last update from DO India. India continues to be DigitalOcean’s fastest growing international market, and we’re always thinking about the best ways to serve the needs of the local developer community.

In this post, we’ll share the latest happenings from DO India, including:

The Rising Tide

In early October, we hosted the third edition of our user conference, Tide, in Mumbai. With over 250 in attendance, participants got the opportunity to learn best practices for managing and scaling applications in the cloud via hands-on workshops, attend talks by startup founders, and build new connections with people from the local ecosystem, including fellow developers, entrepreneurs, mentors, and VCs.

Scenes from October's Tide Conference in Mumbai.

The conference saw more than 20 influencers speaking on a diverse set of topics relevant to developer and startup communities, including:

Head over to our YouTube channel to watch video recordings of these and other interesting talks and panel discussions.

Bringing Out the Best in Student Developers

Campus Champ is a contest that aims to identify and recognize the best student developers and entrepreneurs across Indian universities by providing them a platform to showcase their product design and development capabilities. We were enthused by the participation and by the product ideas that came about in our inaugural edition of the contest organized in 2016.

This year’s edition of the contest is currently underway, and students are being tasked with building something useful and relevant to the community around them. The contest will run in two phases, with Phase 1 (ending on November 11th) requiring students to provide a brief write-up of the product idea and its potential utility and impact on the community. Phase 2 (running between November 16th and December 22nd) will involve the shortlisted student teams building on their idea and converting it into a working prototype. Outside of the experience, students get a chance to win attractive cash prizes and swag from DigitalOcean!

We are excited to see the innovative projects that students will build in the current edition of the contest. If you are a student from India who wants to participate, sign up now for Campus Champ (registration closes Friday, November 10th).

Learning More About Containers

If you’re interested in learning more about containers and container orchestration, we’re hosting a free, 6-part webinar series led by cloud expert MSV Janakiram on Deploying and Managing Containerized Workloads in the Cloud. It will cover the essentials of containers including container lifecycle management, deploying multi-container applications, and scaling workloads. The series will also cover Kubernetes and highlight the architecture, deployment, and best practices of running stateful applications.

This webinar series would be beneficial for developers of all skill levels interested in designing, developing, and deploying microservices and containerized applications. You can sign up for the webinars here and feel free to spread the word amongst friends or colleagues who may find these sessions useful.

As always, if you have any feedback or suggestions on how you would like to engage with DigitalOcean, let us know in the comments below. Happy holidays!

Prabhakar Jayakumar (PJ) is Country Director (India) responsible for running DigitalOcean's operations in India. His team is focused on building out the DO community and supporting the localized needs of India’s developer and startup ecosystem.


An Introduction to Design Operations

Published 30 Oct 2017 by Dave Malouf in The DigitalOcean Blog.

An Introduction to Design Operations

If you’re reading this post, chances are you’re aware of the term “DevOps”. The term “DesignOps”, however, is probably not as familiar. You might try to connect DesignOps to DevOps in order to help you decipher its meaning. And you wouldn’t be entirely wrong, but you’d only be looking at the tip of the iceberg.

Design operations (“DesignOps”) is a growing area of concern for design teams seeking to increase the value they produce for their host organizations and those organizations’ customers. As this is a burgeoning area of interest, inconsistencies in the usage of “DesignOps”, both as a term and as a practice, still exist.

In this blog post, I present my own thinking around DesignOps, which has been highly influenced by growing conversations among design peers from organizations like Autodesk, Uber, Airbnb, Pinterest, Expedia, and even non-technology companies like CapitalOne, Royal Bank of Canada (RBC), and GE.

Developer and Designer Tools Aren’t (Always) the Same

If you’re a developer, you code. You might write code in TextEdit or Notepad and use FTP, or you might prefer editors like Emacs and vi. You can then start to add in other tools like an IDE, version control, or data repos. And you can eventually layer in a CI system and add issue tracking and automated testing. Each one of these layers of tools is meant to somehow make you better as a coder and more valuable to your organization. The choice of tools one uses is an operational decision.

Operational systems, which are made up of processes, practices, and tools—like Agile, project management software (JIRA, etc), and quiet spaces and collaborative spaces—exist to make you as a developer more productive, thereby improving your output. All of these things aim to make you better at doing your craft: coding.

In the design world, we contemplate similar things. Some operational systems support the design process itself; others support cross-functional collaboration with engineering to ensure design intentions are executed as closely as possible to what the design team had in mind. The last point in particular is the driving force behind design operations: how we collaborate, communicate, hand off deliverables, and partner with engineering has become a central part of how DesignOps evolved.

While many organizations have operationalized their design teams by adopting the same tools developers use—JIRA, GitHub, or Confluence, for example—these decisions don’t account for the different tools and processes unique to the designer workflow. As design teams scale, using development tools and processes for design has the unintended consequence of diluting design value. Designers don’t use IDEs; they use graphic tools, and not even a singular graphic tool. Designers don’t share or collaborate on code via text files in open-source environments; they use a mix of vector and raster graphic formats that are often created in closed, proprietary systems. Designers don’t iterate the same way developers do (well, not all designers, and they shouldn’t). They'll explore multiple options at almost every stage of the design process. The concept of “forking” won’t scale when looking at 10-20 alternatives of a single microinteraction in the same file where 5-10 alternatives of a layout are managed.

At the same time, the growing trend of creating design systems sometimes forces designers to work in files where they might have to interact with code on some level. These are usually meant to be points of interaction between designers and developers. As design systems are made up of components written with a mix of CSS, HTML, and JavaScript, using standard version control, issue management, and even Agile project management tools starts to make more sense.

This creates a new type of complication for designers, where some operations use one type of system and another part requires something else entirely. The good news is this pushes designers to be more cognizant and intentional about how they use those systems, especially those within the direct lines of sight and communication of developers.

Organizing Design Teams

Using different systems to manage assets is but one aspect of design operations, and operations as a whole is but one aspect of making individuals and teams get better at their craft. Human resource management is another important consideration for teams who want to amplify their value. And similar to the asset management areas of operations, different organizations will find different ways of managing and organizing their design teams.

For example, many design teams centralize their workers in a single design group, assigning members to projects while still directly reporting to a single design leader. Other organizations do the opposite: designers are hired to work directly within project teams, and “guilds” are created to help manage their specific design needs.

In summary, different organizational and team cultures require different solutions, which depend on many factors. And when accounting for other aspects of operations such as running design systems, design processes, and even research practices, DesignOps gets significantly more complex.

Resources

We’ll share more insight into DigitalOcean’s specific DesignOps philosophy and processes in future blog posts. In the meantime, you can read more about DesignOps at Amplify Design.

If you’re in NYC, join us for the first-ever conference on the topic of DesignOps this November 6-8 at the Museum of the Moving Image, called the DesignOps Summit. DigitalOcean will be there and your team might find value in meeting with a broad group of people who are interested in cultivating an operational mindset to amplify the value of their design teams. We hope to see you there!

Dave Malouf is the Director of Product Design at DigitalOcean. Dave’s 25+ year career in design includes enterprise, agency, and consumer spaces. He is a founder of the Interaction Design Association (IxDA) and has founded the Enterprise UX and the Design Ops Summit conferences. Dave writes, speaks, and teaches around the world. You can find Dave talking about design, design leadership, and design operations on Twitter @daveixd, LinkedIn, and on Medium.


Update 1.3.2 released

Published 30 Oct 2017 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

We proudly announce the second service release to update the stable version 1.3. It contains fixes to several bugs reported by our dear community members as well as translation updates synchronized from Transifex.

We also changed the wording for the setting that controls the time after which an opened message is marked as read. This was previously only affecting messages being viewed in the preview panel but now applies to all means of opening a message. That change came with 1.3.0 and apparently confused many users. Some translation work is still needed here.

See the full changelog in the release notes on the Github download page.

This release is considered stable and we recommend updating all production installations of Roundcube with this version. Download it from roundcube.net.

Please do backup your data before updating!


Tales from DigitalOcean’s Inaugural Intern Program

Published 24 Oct 2017 by Danny Arango in The DigitalOcean Blog.

This past June, DigitalOcean welcomed its first-ever group of summer interns (who we endearingly called “minnows”). For 10 weeks, our interns worked in teams across the organization—from data to cloud engineering to marketing—based out of both our New York City HQ and Cambridge, MA office. (If you’re interested in becoming a Minnow, apply for a spot in our 2018 intern class, or send us your questions at internships@digitalocean.com. We’re hiring interns for various teams including Product Management, User Experience, Engineering, Data & Analytics, and Marketing & Communications. And in the meantime, learn more about the hiring experience at DO.)

The program was the culmination of years’ worth of research, planning, and recruitment. We knew companies large and small were reaping enormous quantitative and qualitative benefits by introducing internship programs, and we took a lot of care to craft something that would be an enriching experience for our interns, while providing real value to the teams they would be working with. We feel fortunate to have been able to support a talented cohort of people. (Three of them recount their experiences later in this post as they share what they worked on during the program.)

Determining The “Why” of an Internship Program

To answer the question of when the right time would be to start an intern program, we considered a few things: How much does leadership buy into the concept? Can the program be structured in such a way that interns and their mentors get the guidance they need to be successful? Can we recruit a set of talented and diverse interns to bring added value to the company? For us, it started with a simple premise: how can we pay back our experiences to our community, especially younger technologists who love using DO to learn about tech? We wanted to pay it forward to young people who wanted to work in the cloud and help build the next set of great companies.

At a startup, every team member’s time is critical. We knew we wanted to structure a program that allowed students the opportunity to network with our employees, learn from them, and contribute to the organization. Orienting their growth, with some guidance from the people who love developers and want to build tools for them, helped identify other areas they could explore. We were lucky to have this first set of great interns and we look forward to having more of them join our ranks in the coming years!

The Inaugural 2017 DO Intern Class. From top left to top right: Luke Grgas, Sasha Krutiy, Alisha KC, Evan Mena, Jordan Shea, Devin Morgan, Mariano Salinas, Anand Vyas, Kevin Wei. From bottom left to bottom right: Shweta Agrawal, Moises Eskinazi, Andrew Rouditchenko.

Meet the Minnows

For more on our inaugural class of interns, read some of their stories below. (We’re publishing a second post with even more intern stories later this fall—stay tuned!)

Devin Morgan, Support Tools Team intern

DigitalOcean invests a significant amount of time and resources into monitoring for Droplets being used in DoS attacks, which I’ll call the Flood Monitoring System (FMS). The FMS uses data structures called FloodAlerts and FloodOccurrences to track when a Droplet has been flagged for “flooding” and to record information about the Droplet while it is flooding. This information then gets sent to an internal team that manually combs through the provided data and determines whether or not a flagged Droplet is acting legitimately or maliciously. However, as DigitalOcean’s customer base grew, the need to automate or semi-automate this process was becoming more serious. But, before DO could begin automating the FMS, certain infrastructure changes needed to be made.

I also had to create a dashboard that displays real time, visual representations of information derived from the FloodAlert and FloodOccurrence (FAO) data. Because this job heavily involved distributed systems, some of the technologies that I worked with included Apache Kafka, Apache Hadoop, Apache Hive, and Apache Spark. Other technologies that I used included Golang, Python, gRPC, and PrestoDB.

Both parts of this project were technologically challenging and forced me to learn a substantial amount in a short amount of time. I was very fortunate that the team I worked with knew their codebase and their tech stack inside and out and were incredibly willing to answer questions when I had them.

Shweta Agrawal, Product Management Team intern

I was looking for firsthand experience owning a product and making an impact. I felt that owning a product as a PM intern would be valuable because it was actionable and results-oriented. I wanted to further hone my leadership skills, and learn how to scope down to an MVP feature and release it within a few weeks. My DO summer project was to improve One Click Apps; I was responsible for setting the vision for the product and coordinating all aspects of One Click Apps product development, from creating the business case for adding new features to deciding when and which features should be retired.

It was valuable to compare the product’s historical metrics with the success criteria, to see how it had been performing over time. I took time to research what users were really like and what goals they were trying to accomplish. I analyzed the data, looked at customer support tickets, and interviewed a few customers to figure out the problems users were having.

The next step was to find solutions to these problems. I began by building hypotheses related to customer behavior and emotion. I had to wear several hats throughout the summer: I was responsible for identifying opportunities in One Click Apps, launching two new One Click Apps to market, overseeing them, and analyzing metrics to ensure that they met the adoption goals. From writing specifications, prioritizing features, finalizing go-to-market, writing documentation, and analyzing data, to coordinating with internal teams, I did it all.

Evan Mena, Insights Team intern

As a hobbyist developer I’ve worked with many different programming languages and technologies to create things that I wanted for myself. While I tried to include what I considered to be relevant industry tools, such as git, into my independent workflow I’ve always wondered just how different the real world was from my solo and small team projects. During my internship, I was given the chance to participate in the development environment and cycle on the Insights team. I got the chance to work on a real feature that will provide value to both the company and end users. My code received the same care during review as a full time engineer, and getting feedback on my work as a software developer was wonderful.

My project for the summer was to provide users a way of establishing webhooks for alerts related to their Droplets. A webhook is a way for an application to provide real time information to another application over HTTP. As an end-user you’ll register your webhook, receive a request with a challenge message on your webhook URL from DigitalOcean then verify that you control that URL by responding with the challenge message. Once the URL is verified it will be sent JSON payloads with information related to alerts when they occur. This is a fantastic addition to alert notifications as webhooks allow end-user developers to act programmatically when alerts occur. Rather than receiving an email or slack notification, a developer could write a small program that would receive webhook payloads and do nearly anything with them. They could log alerts and generate graphical output, make use of DigitalOcean’s API and adjust their server’s storage size automatically, or a myriad of other countless options. I developed this particular feature from end to end, working in EmberJS in the UI and writing request handlers in Go on the backend.
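
The verification flow Evan describes is easier to picture with a short sketch. This is purely illustrative; the endpoint, the 'challenge' field name, and the payload shape are assumptions rather than DigitalOcean's actual schema.

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/droplet-alerts', methods=['POST'])
def droplet_alerts():
    payload = request.get_json(force=True, silent=True) or {}
    if 'challenge' in payload:
        # Echo the challenge back to prove we control this URL.
        return jsonify({'challenge': payload['challenge']})
    # Otherwise treat the payload as an alert notification and handle it.
    print('Alert received:', payload)
    return '', 204

if __name__ == '__main__':
    app.run(port=8080)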


In our next post, we’ll hear from interns who worked on our Cloud Engineering, Marketing, Networking, Data, Compute, and Frontend Infrastructure teams.

Danny Arango is a Senior Tech Recruiter at DigitalOcean. He’s passionate about building diverse teams and finding the right fit for the right people at the right time. He’s also a raging Arsenal fan (both in the positive and negative sense) and will debate anyone on the merits of 1994 being the best year in hip hop history. Follow Danny on Twitter @ElPibe627.


Community goal: Modern and Global Text Input For Every User

Published 23 Oct 2017 by eike hein in blogs.kde.org blogs.

KDE Project:

A few months ago, I had the opportunity to give a talk on Input Methods in Plasma 5 at Akademy 2017 in lovely Almería in Spain. If you were interested in my talk but were unable to attend, there's now a video (and complementary slides) available courtesy of the Akademy conference team. Yay!

A big part of my talk was a long laundry list of issues we need to tackle in Plasma, other KDE software and the wider free desktop ecosystem. It's now time to take the next step and get started.

I've submitted the project under Modern and Global Text Input For Every User as part of the KDE community's new community goals initiative, a new thing we're trying exactly for challenges like this - goals that need community-wide, cross-project collaboration over a longer time period to achieve.

If you're interested in this work, make sure to read the proposal and add yourself at the bottom!


High CPU Droplets Now Available in SGP1

Published 22 Oct 2017 by Ben Schaechter in The DigitalOcean Blog.

Today, we’re excited to share that High CPU Droplet plans are now available in Singapore (SGP1). These Droplet plans are designed for CPU-intensive workloads including CI/CD servers, data analytics applications, and any application that requires more powerful underlying computing power.

With this expansion, High CPU Droplets are now available through the Control Panel and the API in NYC1, NYC3, AMS3, SFO2, SGP1, LON1, FRA1, TOR1, and BLR1.

Use Cases

Here are some use cases that can benefit from CPU-optimized compute servers:

Plans

We are offering five new Droplet plans. They start from $40/mo for two dedicated vCPUs, up to $640/mo for 32 dedicated vCPUs.

We've partnered with Intel to back these Droplets with Intel's most powerful processors, delivering a maximum, reliable level of performance. Going forward, we’ll regularly evaluate and use the best CPUs available to ensure they always deliver the best performance for your applications.

The current CPUs powering High CPU Droplets are the Intel Broadwell 2697Av4 with a clock speed of 2.6GHz, and the Intel Skylake 8168 with a clock speed of 2.7GHz. Customers in our early access period have seen up to four times the performance of Standard Droplet CPUs, and on average see about 2.5 times the performance.

Ben Schaechter
Product Manager, Droplet


Understanding WordStar - check out the manuals!

Published 20 Oct 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Last month I was pleased to be able to give a presentation at 'After the Digital Revolution' about some of the work I have been doing on the WordStar 4.0 files in the Marks and Gran digital archive that we hold here at the Borthwick Institute for Archives. This event specifically focused on literary archives.

It was some time ago now that I first wrote about these files that were recovered from 5.25 inch floppy (really floppy) disks deposited with us in 2009.

My original post described the process of re-discovery, data capture and file format identification - basically the steps that were carried out to get some level of control over the material and put it somewhere safe.

I recorded some of my initial observations about the files but offered no conclusions about the reasons for the idiosyncrasies.

I’ve since been able to spend a bit more time looking at the files and investigating the creating application (WordStar) so in my presentation at this event I was able to talk at length (too long as usual) about WordStar and early word processing. A topic guaranteed to bring out my inner geek!

WordStar is not an application I had any experience with in the past. I didn’t start word processing until the early 90’s when my archaeology essays and undergraduate dissertation were typed up into a DOS version of Word Perfect. Prior to that I used a typewriter (now I feel old!).

WordStar by all accounts was ahead of its time. It was the first word processing application to include mail merge functionality. It was hugely influential, introducing a number of keyboard shortcuts that are still used today in modern word processing applications (for example, control-B to make text bold). Users interacted with WordStar using their keyboard, selecting the necessary keystrokes from a set of different menus. The computer mouse (if it was present at all) was entirely redundant.

WordStar was widely used as home computing and word processing increased in popularity through the 1980s and into the early 90s. However, with the introduction of Word for Windows in 1989 and Windows 3.0 in 1990, WordStar gradually fell out of favour (info from Wikipedia).

Despite this it seems that WordStar had a loyal band of followers, particularly among writers. Of course the word processor was the key tool of their trade so if they found an application they were comfortable with it is understandable that they might want to stick with it.

I was therefore not surprised to hear that others presenting at 'After the Digital Revolution' also had WordStar files in their literary archives. Clear opportunities for collaboration here! If we are all thinking about how to provide access to and preserve these files for the future then wouldn't it be useful to talk about it together?

I've already learnt a lot through conversations with the National Library of New Zealand who have been carrying out work in this area (read all about it here: Gattuso J, McKinney P (2014) Converting WordStar to HTML4. iPres.)

However, this blog post is not about defining a preservation strategy for the files; it is about better understanding them. My efforts have been greatly helped by finding a copy of both a WordStar 3 manual and a WordStar 4 manual online.

As noted in my previous post on this subject there were a few things that stand out when first looking at the recovered WordStar files and I've used the manuals and other research avenues to try and understand these better.


Created and last modified dates

The Marks and Gran digital archive consists of 174 files, most of which are WordStar files (and I believe them to be WordStar version 4).

Looking at the details that appear on the title pages of some of the scripts, the material appears to be from the period 1984 to 1987 (though not everything is dated).

However the system dates associated with the files themselves tell a different story. 

The majority of files in the archive have a creation date of 1st January 1980.

This was odd. Not only would that have been a very busy New Year's Day for the screen writing duo, but the timestamps on the files suggest that they were also working in the very early hours of the morning - perhaps unexpected when many people are out celebrating having just seen in the New Year!

This is the point at which I properly lost my faith in technical metadata!

In this period computers weren't quite as clever as they are today. When you switched them on they would ask you what date it was. If you didn't tell them the date, the PC would fall back to a system default... which just so happens to be 1st January 1980.

I was interested to see Abby Adams from the Harry Ransom Center, University of Texas at Austin (also presenting at 'After the Digital Revolution') flag up some similarly suspicious dates on files in a digital archive held at her institution. Her dates differed just slightly from mine, falling on the evening of the 31st December 1979. Again, these dates looked unreliable as they were clearly out of line with the rest of the collection.

This is the same issue as mine, but the differences relate to the timezone. There is further explanation here highlighted by David Clipsham when I threw the question out to Twitter. Thanks!


Fragmentation

Another thing I had noticed about the files was the way that they were broken up into fragments. The script for a single episode was not saved as a single file but typically as 3 or 4 separate files. These files were named in such a way that it was clear that they were related and that the order that the files should be viewed or accessed was apparent - for example GINGER1, GINGER2 or PILOT and PILOTB.

This seemed curious to me - why not just save the document as a single file? The WordStar 4 manual didn't offer any clues but I found this piece of information in the WordStar 3 manual which describes how files should be split up to help manage the storage space on your diskettes:

From the WordStar 3 manual




Perhaps some of the files in the digital archive are from WordStar 3, or perhaps Marks and Gran had been previously using WordStar 3 and had just got into the habit of splitting a document into several files in order to ensure they didn't run out of space on their floppy disks.

I cannot imagine working this way today! Technology really has come a long way. Imagine trying to format, review or spell check a document that exists as several discrete files potentially sitting on different media!


Filenames

One thing that stands out when browsing the disks is that all the filenames are in capital letters. DOES ANYONE KNOW WHY THIS WAS THE CASE?

File names in this digital archive were also quite cryptic. This is the 1980s, so filenames conform to the 8.3 limit: only 8 characters are allowed in a filename, and it *may* also include a 3 character file extension.

Note that the file extension really is optional and WordStar version 4 doesn’t enforce the use of a standard file extension. Users were encouraged to use those last 3 characters of the file name to give additional context to the file content rather than to describe the file format itself.

Guidance on file naming from the WordStar 4 manual
Some of the tools and processes we have in place to analyse and process the files in our digital archives use the file extension information to help understand the format. The file naming methodology described here therefore makes me quite uncomfortable!

Marks and Gran tended not to use the file extension in this way (though there are a few examples of this in the archive). The majority of WordStar files have no extension at all. The only really consistent use of file extensions related to their backup files.


Backup files

Scattered amongst the recovered data were a set of files with the extension BAK. This is clearly a file extension that WordStar creates and uses consistently. These files contained very similar content to other documents within the archive, typically with just a few differences. They were evidently backup files of some sort, but I wondered whether they had been created automatically or by the writers themselves.

Again the manual was helpful in moving forward my understanding on this:

Backup files from the WordStar 4 manual

This backup procedure is also summarised with the help of a diagram in the WordStar 3 manual:


The backup procedure from WordStar 3 manual


This does help explain why there were so many backup files in the archive. I guess the next question is 'should we keep them?'. It does seem that they are an artefact of the application rather than representing a conscious decision by the writers to back their files up at a particular point in time, and that may impact on their value. However, as discussed in a previous post on preserving Google documents, there could be some benefit in preserving revision history (even if only partial).



...and finally

My understanding of these WordStar files has come on in leaps and bounds by doing a bit of research and in particular through finding copies of the manuals.

The manuals even explain why alongside the scripts within the digital archive we also have a disk that contains a copy of the WordStar application itself. 

The very first step in the manual asks users to make a copy of the software:


I do remember having to do this sort of thing in the past! From WordStar 4 manual


Of course the manuals themselves are also incredibly useful in teaching me how to actually use the software. Keystroke based navigation is hardly intuitive to those of us who are now used to using a mouse, but I think that might be the subject of another blog post!



Crime and Punishment

Published 19 Oct 2017 by leonieh in State Library of Western Australia Blog.

Many Western Australians have a convict or pensioner guard in their ancestral family. The State Library has digitised some items from our heritage collections relating to convicts, the police and the early criminal justice system.


Convicts Tom the dealer, Davey Evans and Paddy Paternoster b2462917

Police Gazette of Western Australia, 1876-1900
The Police Gazettes include information under various headings including apprehensions (name of person arrested, arresting constable, charge and sentence), police appointments, tickets of leave, certificates of freedom, and conditional pardons issued to convicts. You may find physical descriptions of prisoners. Deserters from military service and escaped prisoners are sought. Mention is also made of expirees leaving the colony; inquests (where held, date, name and date of death of person, verdict); licences (publican, gallon, eating, boarding and lodging houses, railway refreshment rooms, wine and beer and spirit merchants, etc. giving name of licensee, name of hotel and town or district). There are listings for missing friends; prisoners discharged; people tried at Quarter Sessions (name, offence, district, verdict); and warrants issued. There are many reasons for a name to appear in the gazettes.

We thank the Friends of Battye Library and the Sholl Bequest, for supporting the digitising of the Police Gazettes.


 

A great resource for researching the broader experience of WA convicts is The convict system in Western Australia, 1850-1870 by Cherry Gertzel. This thesis explains the workings of the convict system, and explores the conditions under which the convicts lived and worked, their effect on the colony and, to some extent, the attitudes of colonists to the prisoners.


Another valuable publication is Further correspondence on the subject of convict discipline and transportation. This comprises official documents relating to the transportation of convicts to Australia, covering the period 1810-1865, and is bound in 8 volumes.
This set from our rare book collection gives an excellent background to the subject for anyone researching convicts or convict guards, with individuals (very) occasionally being named.
The easiest way to access this wonderful resource is to type convict system under Title in our catalogue and select State Library Online from the drop-down box. Once you’ve selected a volume, you can browse through the pages by placing your cursor on the edge of a page and clicking. If you have the volume turned on, this makes a very satisfying page-turning noise! If you want to search for names, scroll down and select the Download button. You can then save a searchable PDF version to your PC. The files are fairly large so you may need to be patient.

Return of the number of wives and families of ticket-of-leave holders to be sent out to Western Australia 1859

Return of the number of wives and families of ticket-of-leave holders to be sent out to Western Australia 1859 From: Further correspondence on the subject of convict discipline and transportation, 1859-1865 p.65. [vol.8]

There are several online diaries relating to convict voyages. The diary, including copies of letters home, of convict John Acton Wroth was kept during his transportation to Western Australia on the Mermaid in 1851 and for a while after his arrival. Wroth was only 17 years old at the time of his conviction. Apparently he was enamoured of a young woman and resorted to fraud in order to find the means to impress her. The diary spans 1851-1853 and it reveals one young man’s difficulty in finding himself far from the love and support of his family while accepting of the circumstance he has brought upon himself. Wroth subsequently settled in Toodyay and became a respected resident, raising a large family and running several businesses as well as acting for some time as local school master.

Another interesting read is the transcript of the diary of John Gregg, carpenter on the convict ship York. This 1862 diary gives details of work each day, which was often difficult when the weather was foul and the carpenter sea-sick, and uncommon events such as attempts by convicts to escape –

“…the affair altogether must be admitted to reflect little credit on the military portion of the convict guard, for although the officer of the watch called loud and long for the guard, none were forthcoming until the prisoners were actually in custody.”


Diary of John Gregg, carpenter on the convict ship ‘York’, with definitions of nautical terms, compiled by Juliet Ludbrook.


A letter from a convict in Australia to a brother in England, originally published in the Cornhill Magazine, April 1866, contains insights into the experience of a more educated felon and some sharp observations on convict life as he lived it upon his arrival in Western Australia –

“…you can walk about and talk with your friends as you please. So long as there is no disturbance, there is no interference”

and

“…the bond class stand in the proportion of fully five-sevenths of the entire population, and are fully conscious of their power…”

Other miscellaneous convict-related items include:

Two posters listing convict runaways with details of their convictions and descriptions:
Return of convicts who have escaped from the colony, and whose absconding has been notified to this office between the 1st June, 1850, and the 31st of March, 1859
and
List of convicts who are supposed to have escaped the Colony (a broadsheet giving the name, number and description of 83 escaped convicts).


Parade state of the Enrolled Guard, 30 March 1887, on the occasion of the inspection of the guard by Sir Frederick Napier Broome, prior to disbandment.


Parade state of the Enrolled Guard… b1936163

 

British Army pensioners came out to Western Australia as convict guards. This document gives the following details for those still serving in 1887: rank, name, regiment, age, rate of pension, length of Army service, rank when pensioned, date of joining the Enrolled Guard, medals and clasps.

Scale of remission for English convicts sentenced to penal servitude subsequent to 1 July 1857 is a table showing how much time of good behaviour convicts needed to accrue in order to qualify for privileges.

Certificate of freedom, 1869 [Certificates of freedom of convict William Dore]

This is just a small sample of convict-related material in the State Library collections that you can explore online. You can also visit the Battye Library of West Australian History to research individual convicts, policemen, pensioner guards or others involved in the criminal justice system.

 



What's New With the DigitalOcean Network

Published 17 Oct 2017 by Luca Salvatore in The DigitalOcean Blog.


Early this year the network engineering team at DigitalOcean embarked on a fairly ambitious project. We were thinking about areas of our network that needed improvement, both for our customers and for our internal systems. One of the key things that we strive for at DO is to provide our customers with a stable and high-performing cloud platform. As we continue to grow and release new products, it becomes clear that network infrastructure is a critical component and must keep up with our customers' needs. In order to allow our customers to grow, the network must be able to scale, it must be performant, and above all, it must be reliable.

With those factors in mind, we went to work building out the DigitalOcean global backbone. It’s not finished yet, but we wanted to share what has been done so far, what is in progress, and what the end state will be.

Creating a Backbone Network

DigitalOcean currently operates 12 datacenter regions (DCs) all around the world. Up until recently, these datacenters have functioned as independent “island” networks. This means that if you have Droplets in multiple locations and they need to communicate with each other, that communication goes across the public internet. For the most part, that “just works”, but the internet is susceptible to a multitude of potential problems: ISPs can have technical problems, congestion is common, and there are malicious attacks that can cause widespread issues. If you have an application that requires communication between multiple regions, the factors mentioned above could throw a wrench in even the most well designed system. To mitigate this risk, we are building our own backbone network.

A backbone network allows us to interconnect our DCs using a variety of technologies such as dark fiber and wavelengths. This means that communication between DO locations no longer needs to traverse the public internet. Instead, traffic between locations runs over dedicated links that DigitalOcean manages. This gives our customers predictable and reliable transport between regions. Predictable and reliable are the key words here, and this is immensely important for anyone who is building mission critical applications. It allows developers and engineers to know exactly how their application will perform, and feel safe in the fact that their traffic is running over dedicated and redundant infrastructure.

Our customers have probably noticed a number of “Network Maintenance Notifications” that we’ve sent out. In order to build out our backbone and ensure that it is scalable, reliable, and performant, we’ve had to make a number of changes to our existing network infrastructure. This includes software upgrades, new hardware, and a number of complex configuration changes. The end result will ensure that our current and future customers will benefit from all of this work.

Now, onto the details. This is what we have built so far, and what we'll build in the future.

Networking Through DO-Owned Links

We’ve interconnected our three NYC locations; all Droplet-to-Droplet traffic between NYC1, NYC2, and NYC3 now traverses DO-owned links. Latency is predictable and stable, and packet loss is nonexistent.

We’ve done the same thing around all of our European locations: LON1, AMS2, AMS3, and FRA1 are all now interconnected together. Again, all traffic between Droplets within the EU now stays within the DO network. Here is how it looks:


We’ve also provisioned transatlantic links connecting our NYC regions to our European regions. This means that your communication between NYC and any datacenter in Europe also stays within our own network:


Adding more to the mix, we’ve connected our NYC locations to our two facilities in California, SFO1 and SFO2. All communication around North America as well as communication within and to Europe now stays within the DO backbone:


Next up will be connectivity from the SFO region to SGP1. We also have plans to link Singapore to Europe, slated for Q1 2018, as well as TOR1 to NYC. Once fully completed, the DO global backbone will look like this:


We are very excited about what these upgrades mean for DO and for you, our users. We’re continually striving to create better performing and more reliable infrastructure, and I feel that these upgrades to the network will set the stage for some really awesome things to be built on top of the DO platform.

Luca Salvatore is currently the manager of the Network Engineering Team at DigitalOcean. Over the past decade Luca has held various network engineering roles in both Australia and the USA. He has designed and built large enterprise and datacenter networks and has first-hand experience dealing with massively scalable networks such as DigitalOcean's. He has been working in the cloud networking space for the past 5 years and is committed to peering and an open internet for all.


“Why archivists need a shredder…”

Published 13 Oct 2017 by inthemailbox in In the mailbox.

Struggling to explain what it is that you do and why you do it? President of the Australian Society of Archivists, Julia Mant, gives it a red hot go in an interview for the University of Technology Sydney: https://itunes.apple.com/au/podcast/glamcity/id1276048279?mt=2

https://player.whooshkaa.com/player/playlist/show/1927?visual=true&sharing=true

 



Cthulhu: Organizing Go Code in a Scalable Repo

Published 10 Oct 2017 by Matt Layher in The DigitalOcean Blog.


At DigitalOcean, we’ve used a “mono repo” called cthulhu to organize our Go code for nearly three years. A mono repo is a monolithic code repository which contains many different projects and libraries. Bryan Liles first wrote about cthulhu in early 2015, and I authored a follow-up post in late 2015.

A lot has changed over the past two years. As our organization has scaled, we have faced a variety of challenges while scaling cthulhu, including troubles with vendoring, CI build times, and code ownership. This post will cover the state of cthulhu as it is today, and dive into some of the benefits and challenges of using a mono repo for all of our Go code at DigitalOcean.

Cthulhu Overview

Our journey using Go with a mono repo began in late 2014. Since then, the repository, called "cthulhu", has grown exponentially in many ways. As of October 6th, 2017, cthulhu has:

As the scale of the repository has grown over the past three years, it has introduced some significant tooling and organizational challenges.

Before we dive into some of these challenges, let’s discuss how cthulhu is structured today (some files and directories have been omitted for brevity):

cthulhu  
├── docode
│   └── src
│       └── do
│           ├── doge
│           ├── exp
│           ├── services
│           ├── teams
│           ├── tools
│           └── vendor
└── script

docode/ is the root of our GOPATH. Readers of our previous posts may notice that third_party no longer exists, and do/ is now the prefix for all internal code.

Code Structure

All Go code lives within our GOPATH, which starts at cthulhu/docode. Each directory within the do/ folder has a unique purpose, although we have deprecated the use of services/ and tools/ for the majority of new work.

doge/ stands for “DigitalOcean Go Environment”, our internal “standard library”. A fair amount of code has been added and removed from doge/ over time, but it still remains home to a great deal of code shared across most DigitalOcean services. Some examples include our internal logging, metrics, and gRPC interaction packages.

exp/ is used to store experimental code: projects which are in a work-in-progress state and may never reach production. Use of exp/ has declined over time, but it remains a useful place to check in prototype code which may be useful in the future.

services/ was once used as a root for all long-running services at DO. Over time, it became difficult to keep track of ownership of code within this directory, and it was replaced by the teams/ directory.

teams/ stores code owned by specific teams. As an example, a project called “hypervisor” owned by team “compute” would reside in do/teams/compute/hypervisor. This is currently the preferred method for organizing new projects, but it has its drawbacks as well. More on this later on.

tools/ was once used to store short-lived programs used for various purposes. These days, it is mostly unused except for CI build tooling, internal static analysis tools, etc. The majority of team-specific code that once resided in tools/ has been moved to teams/.

Finally, vendor/ is used to store third-party code which is vendored into cthulhu and shared across all projects. We recently added the prefix do/ to all of our Go code because existing Go vendoring solutions did not work well when vendor/ lived at the root of the GOPATH (as was the case with our old third_party/ approach).

script/ contains shell scripts which assist with our CI build process. These scripts perform tasks such as static analysis, code compilation and testing, and publishing newly built binaries.

CI Build Tooling

One of the biggest advantages of using a mono repo is being able to effectively make large, cross-cutting changes to the entire repository without fear of breaking any “downstream” repositories. However, as the amount of code within cthulhu has grown, our CI build times have grown exponentially.

Even though Go code builds rather quickly, in early 2016, CI builds took an average of 20 minutes to complete. This resulted in extremely slow development cycles. If a poorly written test caused a spurious failure elsewhere in the repo, the entire build could fail, causing a great deal of frustration for our developers.

After experiencing a great deal of pain because of slow and unreliable builds, one of our engineers, Justin Hines, set out to solve the problem once and for all. After a few hours of work, he authored a build tool called gta, which stands for “Go Test Auto”. gta inspects the git history to determine which files changed between master and a feature branch, and uses this information to determine which packages must be tested for a given build (including packages that import the changed package).

As an example, suppose a change is committed which modifies a package, do/teams/example/droplet. Suppose this package is imported by another package, do/teams/example/hypervisor. gta is used to inspect the git history and determine that both of these packages must be tested, although only the first package was changed.
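The core idea can be sketched with standard Go tooling: go list exposes each package's transitive dependencies, so any package whose dependency list contains a changed import path needs to be rebuilt and retested. The sketch below takes the changed import path as an argument; it is only a rough approximation of gta, which works the changed packages out from the git history itself and handles many more edge cases:

// affected.go: print every package under ./... that would need testing if the
// import path given as the first argument changed. A rough sketch of the idea
// behind gta, not the actual tool.
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: affected <changed-import-path>")
		os.Exit(2)
	}
	changed := os.Args[1] // e.g. "do/teams/example/droplet"

	// For every package, print "<import path>: <space-separated transitive deps>".
	out, err := exec.Command("go", "list",
		"-f", `{{.ImportPath}}: {{join .Deps " "}}`, "./...").Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	sc := bufio.NewScanner(bytes.NewReader(out))
	sc.Buffer(make([]byte, 1024*1024), 1024*1024) // dependency lists can be long
	for sc.Scan() {
		pkg, deps, _ := strings.Cut(sc.Text(), ": ")
		if pkg == changed || strings.Contains(" "+deps+" ", " "+changed+" ") {
			fmt.Println(pkg) // changed itself, or (transitively) imports the changed package
		}
	}
}

Deriving the changed import paths from git diff --name-only against master and caching the package graph between runs would get a tool like this most of the rest of the way.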

For very large changes, it can occasionally be useful to test the entire repository, regardless of which files have actually changed. Adding “force-test” anywhere in a branch name disables the use of gta in CI builds, restoring the old default behavior of “build everything for every change”.

The introduction of gta into our CI build process dramatically reduced the amount of time taken by builds. An average build now takes approximately 2-3 minutes—a dramatic improvement over the 20 minute builds of early 2016. This tool is used almost everywhere in our build pipeline, including static analysis checks, code compilation and testing, and artifact builds and deployment.

Static Analysis Tooling

Every change committed to cthulhu is run through a bevy of static analysis checks, including tools such as gofmt, go vet, golint, and others. This ensures a high level of quality and consistency between all of our Go code. Some teams have even introduced additional tools such as staticcheck for code that resides within their teams/ folder.

We have also experimented with the creation of custom linting tools that resolve common problems found in our Go code. One example is a tool called buildlint that checks for a blessed set of build tags, ensuring that tags such as !race (exclude this file from race detector builds) cannot be used.
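A check along those lines can be as simple as walking the tree and flagging build constraint lines that mention a forbidden tag. The sketch below is a guess at the general shape of such a tool, not DigitalOcean's actual buildlint:

// A buildlint-style sketch: flag Go files whose build constraints use a
// disallowed tag (here, "!race"). Illustrative only, not the real tool.
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() || !strings.HasSuffix(path, ".go") {
			return err
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()

		sc := bufio.NewScanner(f)
		for n := 1; sc.Scan(); n++ {
			line := sc.Text()
			// Both old-style "// +build" and newer "//go:build" constraint lines count.
			isConstraint := strings.HasPrefix(line, "// +build") || strings.HasPrefix(line, "//go:build")
			if isConstraint && strings.Contains(line, "!race") {
				fmt.Printf("%s:%d: disallowed build tag in %q\n", path, n, line)
			}
		}
		return sc.Err()
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}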

Static analysis tools are incredibly valuable, but it can be tricky to introduce a new tool into the repository. Before we decided to run golint in CI, there were nearly 1,500 errors generated by the tool for the entirety of cthulhu. It took a concerted effort and several Friday afternoons to fix all of these errors, but it was well worth the effort. Our internal godoc instance now provides a vast amount of high quality documentation for every package that resides within cthulhu.

Challenges

While there are many advantages to the mono repo approach, it can be challenging to maintain as well.

Though many different teams contribute to the repository, it can be difficult to establish overall ownership of the repository, its tooling, and its build pipelines. In the past, we tried several different approaches, but most were unsuccessful due to the fact that customer-facing project work typically takes priority over internal tooling improvements. However, this has recently changed, and we now have engineers working specifically to improve cthulhu and our build pipelines, alongside regular project work. Time will tell if this approach suits our needs.

The issue of code vendoring remains unsolved, though we have made efforts to improve the situation. As of now, we use the tool “govendor” to manage our third-party dependencies. The tool works well on Linux, but many of our engineers who run macOS have reported daunting issues while running the tool locally. In some cases, the tool will run for a very long time before completion. In others, the tool will eventually fail and require deleting and re-importing a dependency to succeed. In the future, we’d also like to try out “dep”, the “official experiment” vendoring tool for the Go project. At this time, GitHub Enterprise does not support Go vanity imports, which we would need to make use of dep.

As with most companies, our organizational structure has also evolved over time. Because we typically work in the teams/ directory in cthulhu, this presents a problem. As of now, our code structure is somewhat reliant on our organizational structure. Because of this, code in teams/ can become out of sync with the organizational structure, causing issues with orphaned code, or stale references to teams that no longer exist. We don’t have a concrete solution to this problem yet, but we are considering creating a discoverable “project directory service” of sorts so that our code structure need not be tied to our organizational structure.

Finally, as mentioned previously, scaling our CI build process has been a challenge over time. One problem in particular is that non-deterministic or “flappy” tests can cause spurious failures in unrelated areas of the repository. A test typically flaps when it relies on some assumption which cannot be guaranteed, such as timing or ordering of concurrent operations. This problem is compounded when interacting with a service such as MySQL in an integration test. For this reason, we encourage our engineers to do everything in their power to make their tests as deterministic as possible.
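As a generic illustration (not code from cthulhu), a test that asserts on the completion order of goroutines will flap, whereas sorting the results before asserting makes it deterministic:

// In a _test.go file: removing an ordering assumption from a test.
package example

import (
	"sort"
	"sync"
	"testing"
)

// collect fans work out to goroutines and gathers results in whatever order
// they happen to finish.
func collect(inputs []string) []string {
	var (
		mu  sync.Mutex
		out []string
		wg  sync.WaitGroup
	)
	for _, in := range inputs {
		wg.Add(1)
		go func(s string) {
			defer wg.Done()
			mu.Lock()
			out = append(out, s)
			mu.Unlock()
		}(in)
	}
	wg.Wait()
	return out
}

func TestCollect(t *testing.T) {
	got := collect([]string{"a", "b", "c"})
	// Asserting on got[0] would depend on goroutine scheduling and flap.
	// Sorting first makes the comparison order-independent.
	sort.Strings(got)
	want := []string{"a", "b", "c"}
	if len(got) != len(want) {
		t.Fatalf("got %v, want %v", got, want)
	}
	for i := range want {
		if got[i] != want[i] {
			t.Fatalf("got %v, want %v", got, want)
		}
	}
}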

Summary

We’ve been using cthulhu for three years at DigitalOcean, and while we’ve faced some significant hurdles along the way, the mono repo approach has been a huge benefit to our organization as a whole. Over time, we’d like to continue sharing our knowledge and tools, so that others can reap the benefits of a mono repo just as we have.

Matt Layher is a software engineer on the Network Services team, and a regular contributor to a wide variety of open source networking applications and libraries written in Go. You can find Matt on Twitter and GitHub.


Why do I write environmental history?

Published 8 Oct 2017 by Tom Wilson in thomas m wilson.

Why bother to tell the history of the plants and animals that make up my home in Western Australia?  Partly it's about reminding us of what was here on the land before, and in some ways, could be here again. In answering this question I’d like to quote the full text of Henry David Thoreau’s […]


Hatch at One Year: Helping More Startups Grow

Published 3 Oct 2017 by Hollie Haggans in The DigitalOcean Blog.


Our global incubator program Hatch turned one year old this past September. Since 2016, we’ve partnered with over 170 incubators, venture capital firms, and accelerators globally to provide infrastructure credit and technical support to startups in over 61 countries, and we’ve moved Hatch out of the beta phase to make the program more widely available across the world.

Hatch startup participants include companies like:

In addition to providing infrastructure support, we’ve hosted forums, called Hatch Founders’ Circles, in New York, Berlin, and Bangalore that facilitate thought partnership between our Hatch startups and other successful tech entrepreneurs, and are launching an invite-only Slack community for our Hatch startup founders.

To celebrate this milestone, DigitalOcean co-founder Mitch Wainer recently interviewed DO CEO Ben Uretsky for an episode of The Deep End podcast. They discussed DO’s humble beginnings and what’s changed for the company over the past six years.

The following excerpts were edited and adapted from the podcast, which you can listen to in full here:

How DigitalOcean Got Its Start

Mitch Wainer: So Ben, why don't you just quickly introduce yourself. You’re the CEO of DigitalOcean, but give a little background on your history.

Ben Uretsky: I was born in Russia, immigrated here when I was five years old with my family, my brother, my mom and my dad, and one of our grandmas as well. I went to school in New York City, graduated from Pace University. I actually managed to start my first company while attending college, so that was great. I built that business over a number of years, and had the pleasure of starting DigitalOcean in the summer of 2011 with four other co-founders, you being one of them. That was definitely a fun journey. We rented a three-bedroom ranch out in Boulder, Colorado.

That was for the Techstars program. What was the most exciting or the most interesting memory that you can share from Techstars? Which memory stands out in your mind?

I'd say demo day. A lot of practice and months of preparation went into getting ready…and there were about a thousand people in the audience. I think it was a high pressure situation because it's investors and people from the community; it's not just a general crowd.

The other event that came to mind the year prior to that, or actually just a few months earlier—the New York Tech Meetup. That was 800 people, but it felt much more supportive because it's the tech community coming out to see a few companies demo and showcase their work, whereas the Techstars demo day, you feel like you're being judged by a thousand people. So that was definitely an intense experience. I remember doing practice sessions with you in the backyard; getting ready for demo day, and you did the Karate Kid on me: “Wax on, wax off.”

Overcoming Challenges

DigitalOcean has grown, not only on the people side, but also on the tech side. We've gone through a lot of different technology challenges and struggles, so I would love for you to talk about some of those struggles and how we overcame those challenges.

Initially, most of the software was actually written by a single person, Jeff Carr, our co-founder. And in those days, the way that we would reference cloud health could be measured in hours. Essentially, how many hours can Jeff stay away from the console before something would break and he would need to get back in there and fix it? The good news is that we applied simplicity to our architecture as well. So we ensured that, no matter what happens, customer Droplets and customer environments wouldn't be affected by most of the outages and most of the service interruptions.

It allowed us to maintain a really high level of uptime and build the performance and reliability that our customers expect, but at the same time, if you're a single person building the system, a lot of difficulties, [and] challenges come up that you may not have foreseen. [And] the product really scaled. Jeff more or less single-handedly got it to nearly 100,000 customers. What you start building day one looks radically different when you have 100,000 users on the service. I'd say that was one challenge.

The second is really as we started to grow: building and engineering team morale into the service and getting people familiar [with] the ins and outs of the systems. And what was really exciting is that first team, one of their main driving objectives was to refactor a lot of the code that Jeff wrote. Turning it from a proof-of-concept into a much more stable and reliable environment, with documentation, with a more modular understanding, and so that kind of speaks to the shift that we're still going through today. Moving away from the monolithic app that was originally built into a more distributed, micro-service enabled architecture. We're making good progress, but with a more scalable service environment comes more complexity. We have to invest a lot more engineers into building new features and capabilities. And so there are trade-offs in each of those scenarios.

It All Comes Down to People

How has the engineering team structure changed throughout the years to support that evolution of our back-end code base and stack?

There are a few interesting mutations along the way: Going from one engineer to 30; bringing in our first set of engineering managers. We really promoted from within our first six. And I think what was really inspiring is a few years ago, we sat down and came up with a mission document, and said, "Okay, if we're gonna scale this to a million customers, and even more revenue, how do we see ourselves getting there?" And everyone contributed towards what their team's mission and objective was.

[For a while] it was more or less a few frontend teams and quite a few backend teams, but nonetheless, that structure held for a couple of years. And prior to that, I feel like we were reworking maybe every other quarter. So that stability allowed us to grow the team, from 30 or 40 people to a little bit over a hundred. Just a few months ago, engineering management along with [the] Product [team] had the opportunity to re-envision a different way to organize the teams, and today, we've moved to a much more vertical structure, building a team around each of the products. We [now] have a team for Droplet, a team for our Storage services, and a team for the Network services. And that's full stack from the frontend, the API, all the way to the backend services. We're in a much more verticalized structure today.

As CEO of the company, what are some of your challenges and what really keeps you up at night?

The interesting thing is that the role has changed year by year, and different challenges come up and are top of mind. I would say the two that I feel are most recurrent [are] related to the people. Whether it's employees or even the senior leadership team, and making sure that you have that right, that everyone's engaged, they're motivated, that you're making the right hiring decisions. That's all pretty complex stuff when we only hired 20 people [at first]. Today, DigitalOcean is roughly 350 people. And as a result, the amount of decisions that you have to make multiplies, and also the effects within the company become that much more complicated. That's always an interesting aspect of the work.
The second challenge that ties very close to that is making sure you paint the right vision for the business, so that people feel like when they come to work, they know what needs to be done. They're in alignment with where the company is headed. And that they're motivated and inspired by what we're trying to build.

So it all comes down to people?

Companies are collections of people first and foremost. They're not the service, they're not the product, it's really people, and once you comprehend that, I think it allows you to take your leadership to the next level.

Hollie Haggans heads up Global Partnerships for DigitalOcean’s Hatch program. She is passionate about startups and cold brew coffee. Get in touch with questions at hatch@digitalocean.com.


Come dine with the KDE e.V. board in Berlin in October!

Published 29 Sep 2017 by eike hein in blogs.kde.org blogs.


As has become tradition in recent years, the KDE e.V. board will have an open dinner alongside its in-person meeting in Berlin, Germany on October 14th, at 7 PM.

We know there will be a lot of cool people in town next month, thanks to a KDE Edu development sprint, Qt World Summit, the GNOME Foundation hackfest and probably other events, and you're all invited to drop by and have a chat with us and amongst yourselves - and enjoy good food.

We're still picking out a location currently, so if you're interested in attending, please drop me a mail to pre-register and I will (space permitting) confirm and send details soon.


Oil, Love & Oxygen – Album Launch

Published 29 Sep 2017 by Dave Robertson in Dave Robertson.

“Oil, Love & Oxygen” is a collection of songs about kissing, climate change, cult 70s novels and more kissing. Recorded across ten houses and almost as many years, the album is a diverse mix of bittersweet indie folk, pop, rock and blues. The Kiss List bring a playful element to Dave Robertson’s songwriting, unique voice and percussive acoustic guitar work. This special launch night also features local music legends Los Porcheros, Dave Johnson, Sian Brown, Rachel Armstrong and Merle Fyshwick.

Tickets $15 through https://www.trybooking.com/SDCA, or on the door if still available.



Block Storage Comes to NYC3 and LON1; One More Datacenter on the Way!

Published 27 Sep 2017 by DigitalOcean in The DigitalOcean Blog.


Today, we're excited to share that Block Storage is available to Droplets in NYC3 and LON1. With Block Storage, you can scale your storage independently of your compute and have more control over how you grow your infrastructure, enabling you to build and scale larger applications more easily. Block Storage has been a key part of our overall focus on strengthening the foundation of our platform to increase performance and enable our customers to scale.

We've seen incredible engagement since our launch last July. Users have created Block Storage volumes in SFO2, NYC1, FRA1, SGP1, TOR1, and BLR1 to scale databases, take backups, store media, and much more; NYC3 and LON1 are our seventh and eighth datacenters with Block Storage respectively.

As we continue to upgrade and augment our other datacenters, we'll be ensuring that Block Storage is added too. In order to help you plan your deployments, we've finalized the timeline for AMS3. Here is the schedule we're targeting for Block Storage rollout:


Inside LON1, our London datacenter region.

Additionally, Kubernetes now offers support for DigitalOcean Block Storage thanks to StackPointCloud. Learn more about it here.

Thanks to everyone who has given us feedback and used Block Storage so far. Please keep it coming. You can create your first Block Storage volume in NYC3 or LON1 today!

Please note: For our NYC3 region, we recommend that you add a volume at the time you create your Droplet to ensure access to Block Storage.

—DigitalOcean Storage Team


v2.4.4

Published 27 Sep 2017 by fabpot in Tags from Twig.


v1.35.0

Published 27 Sep 2017 by fabpot in Tags from Twig.


The first UK AtoM user group meeting

Published 27 Sep 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Yesterday the newly formed UK AtoM user group met for the first time at St John's College Cambridge, and I was really pleased that a colleague and I were able to attend.
Bridge of Sighs in Autumn (photo by Sally-Anne Shearn)

This group has been established to provide the growing UK AtoM community with a much needed forum for exchanging ideas and sharing experiences of using AtoM.

The meeting was attended by about 15 people though we were informed that there are nearly 50 people on the email distribution list. Interest in AtoM is certainly increasing in the UK.

As this was our first meeting, those who had made progress with AtoM were encouraged to give a brief presentation covering the following points:
  1. Where are you with AtoM (investigating, testing, using)?
  2. What do you use it for? (cataloguing, accessions, physical storage locations)
  3. What do you like about it/ what works?
  4. What don’t you like about it/ what doesn’t work?
  5. How do you see AtoM fitting into your wider technical infrastructure? (do you have separate location or accession databases etc?)
  6. What unanswered questions do you have?
It was really interesting to find out how others are using AtoM in the UK. A couple of attendees had already upgraded to the new 2.4 release so that was encouraging to see.

I'm not going to summarise the whole meeting but I made a note of people's likes and dislikes (questions 3 and 4 above). There were some common themes that came up.

Note that most users are still using AtoM 2.2 or 2.3; those who have moved to 2.4 haven't had much chance to explore it yet. It may be that some of these comments are already out of date and fixed in the new release.


What works?


AtoM seems to have lots going for it!

The words 'intuitive', 'user friendly', 'simple', 'clear' and 'flexible' were mentioned several times. One attendee described some user testing she carried out during which she found her users just getting on and using it without any introduction or explanation! Clearly a good sign!

The fact that it was standards compliant was mentioned as well as the fact that consistency was enforced. When moving from unstructured finding aids to AtoM it really does help ensure that the right bits of information are included. The fact that AtoM highlights which mandatory fields are missing at the top of a page is really helpful when checking through your own or others records.

The ability to display digital images was highlighted by others as a key selling point, particularly the browse by digital objects feature.

The way that different bits of the AtoM database interlink was a plus point that was mentioned more than once - this allows you to build up complex interconnecting records using archival descriptions and authority records and these can also be linked to accession records and a physical location.

The locations section of AtoM was thought to be 'a good thing' - for recording information about where in the building each archive is stored. This works well once you get your head around how best to use it.

Integration with Archivematica was mentioned by one user as being a key selling point for them - several people in the room were either using, or thinking of using Archivematica for digital preservation.

The user community itself and the quick and helpful responses to queries posted on the user forum were mentioned by more than one attendee. Also praised was the fact that AtoM is in continuous active development and very much moving in the right direction.


What doesn't work?


Several attendees mentioned the digital object functionality in AtoM. As well as being a clear selling point, it was also highlighted as an area that could be improved. The one-to-one relationship between an archival description and a digital object wasn't thought to be ideal and there was some discussion about linking through to external repositories - it would be nice if items linked in this way could be displayed in the AtoM image carousel even where the url doesn't end in a filename.

The typeahead search suggestions when you enter search terms were not thought to be helpful all of the time. Sometimes the closest matches do not appear in the list of suggested results.

One user mentioned that they would like a publication status that is somewhere in between draft and published. This would be useful for those records that are complete and can be viewed internally by a selected group of users who are logged in but are not available to the wider public.

More than one person mentioned that they would like to see a conservation module in AtoM.

There was some discussion about the lack of an audit trail for descriptions within AtoM. It isn't possible to see who created a record, when it was created and information about updates. This would be really useful for data quality checking, particularly when training new members of staff and volunteers.

Some concerns about scalability were mentioned - particularly for one user with a very large number of records within AtoM - the process of re-indexing AtoM can take three days.

When creating creator or access points, the drop down menu doesn’t display all the options so this causes difficulties when trying to link to the right point or establishing whether the desired record is in the system or not. This can be particularly problematic for common surnames as several different records may exist.

There are some issues with the way authority records are created currently, with no automated way of creating a unique identifier and no ability to keep authority records in draft.

A comment about the lack of auto-save and the issue of the web form timing out and losing all of your work seemed to be a shared concern for many attendees.

Other things that were mentioned included an integration with Active Directory and local workarounds that had to be put in place to make finding aids bi-lingual.


Moving forward


The group agreed that it would be useful to keep a running list of these potential areas of development for AtoM and that perhaps in the future members may be able to collaborate to jointly sponsor work to improve AtoM. This would be a really positive outcome for this new network.

I was also able to present on a recent collaboration to enable OAI-PMH harvesting of EAD from AtoM and use it as an opportunity to try to drum up support for further development of this new feature. I had to try and remember what OAI-PMH stood for and think I got 83% of it right!

Thanks to St John's College Cambridge for hosting. I look forward to our next meeting which we hope to hold here in York in the Spring.

Hacktoberfest 2017: The Countdown Begins!

Published 26 Sep 2017 by Stephanie Morillo in The DigitalOcean Blog.


Contributors of the world, we’re excited to announce that DigitalOcean’s fourth annual Hacktoberfest officially kicks off on Sunday, October 1. If you’ve been meaning to give back to your favorite open source projects—or if you want to make your first-ever contributions—set aside time this October to start hacking. You can earn a limited-edition Hacktoberfest T-shirt and stickers!

This year, we have resources available on local Hacktoberfest Meetups (and how to start one), finding issues to work on, learning how to contribute to open source, and resources for project maintainers who want to attract participants to their projects. You can find all of these resources and register to participate on the official Hacktoberfest website.

The Details

If you’re wondering what Hacktoberfest is, it’s a month-long celebration of all things open source. Here’s what you need to know:

Over the course of the month, you can find new projects to work on from the Hacktoberfest site. Every time you visit the site, you'll see issues labeled "Hacktoberfest". Additionally, we’ll send registered participants digests with resources and projects that you can look at if you need ideas.

The Fine Print

To get a free T-shirt, you must register and make four pull requests between October 1-31. You can open a PR in any public, GitHub-hosted repo—not just on issues that have been labeled “Hacktoberfest”.

(Please note: Review a project’s Code of Conduct before submitting a PR. If a maintainer reports your PR as spam, or if you violate the project’s Code of Conduct, you will be ineligible to participate in Hacktoberfest.)

Mark Your Calendars

With just four days to go until Hacktoberfest 2017 gets underway, take a look at what Hacktoberfest 2016 and Hacktoberfest 2015 looked like.

Have you participated in Hacktoberfest before? If so, share some of your stories or tips for newcomers in the comments below. If you have favorite projects, or if you’re a project maintainer, tell us what projects participants should visit in the comments. And be sure to see what others are saying in the #Hacktoberfest hashtag on your favorite social media platforms!

See you all on October 1!


Announcing DigitalOcean Currents: A Quarterly Report on Developer Cloud Trends

Published 24 Sep 2017 by Ryan Quinn in The DigitalOcean Blog.


The landscape developers work in is ever-changing. Keeping up means following numerous press sources, blogs, and social media sites and joining the communities they are involved in. We decided that the best way to truly understand how developers work and how the tools we build help them was to ask—so we did!

DigitalOcean Currents is our inaugural quarterly report, where we will share insights into how developers work and how they are affected by the latest tech industry trends. To get the data for this report, we reached out to developers through social media and our newsletter, our community, social news aggregators like Reddit, and more. We collected opinions from more than 1,000 developers from around the world and across company sectors.

Among the many insights we gained from the survey, we found that developers rely on online documentation and tutorials more than any other method of learning about new technologies. Will this continuing trend mean developers have a wider or narrower base of knowledge (as bite sized pieces of technical content displace lengthy books)?

Despite the tech industry’s important focus on maintaining a good work-life balance, only 12% of the developers we surveyed reported that they put the keyboard away at home; many opt to use their free time to write code for work or for personal projects. While developers are often passionate about their work, this result indicates that developers may be more likely to face burnout even when working for employers who make work-life balance a focus.

Here are other key findings from the first report:

Helpful Companies and Projects Make Developers Successful

52% of respondents said their preferred way of learning about new technologies is through online tutorials, and 28% said official documentation is their preferred way of learning. This appears to indicate that companies who invest in great documentation see a payoff in developer loyalty.

Linux Marketshare is More Than Meets the Eye

While recent market share numbers show Linux rising to just over 3% of the desktop market, this number may be misleading. Instead of simply asking our respondents which desktop they used, we asked which operating system environment they spent most of their time using. 39% of respondents reported spending more time in a Linux environment than elsewhere, outpacing both macOS (36%) and Windows (23%).

PHP and MySQL Still Reign Supreme

Despite all the buzz about the latest and greatest languages and frameworks, PHP remains the most popular language among our respondents with MySQL as the most popular database. Meanwhile, Nginx far outpaced Apache as the preferred web server.

Vendor Lock-in Does Not Scare Developers

With the rise of SaaS, PaaS, and IaaS over legacy hosting platforms, vendor lock-in could be a concern. But with modern APIs and interoperability either directly or indirectly available through multi-vendor libraries, 77% of our respondents said that they’ve never decided not to use a cloud service for fear of being locked into that vendor’s ecosystem.

Moving to Hybrid and Multi-cloud Isn’t a Given

According to Gartner, 90% of organizations will adopt hybrid infrastructure management by 2020, but the majority of survey respondents said they aren’t planning to use simultaneous cloud services in the next year; only 15% said they would consider their current strategy a hybrid cloud strategy. Just 10% of respondents said they would consider their current strategy multi-cloud, and 70% said they have no plans to implement a multi-cloud strategy in the next year.

The full DigitalOcean Currents (September 2017) report can be found here.

The tech industry moves fast and the cutting edge moves even faster. In order to bring you the most recent information, DigitalOcean Currents will be shared every quarter, highlighting the latest trends among developers.

If you would like to be among the first to receive Currents each quarter, sign up here. You’ll receive the latest report each time it is released and will be among those asked to share your views and experiences.


Moving a proof of concept into production? It's harder than you might think...

Published 20 Sep 2017 by Jenny Mitcham in Digital Archiving at the University of York.

My colleagues and I blogged a lot during the Filling the Digital Preservation Gap project, but I’m aware that I’ve gone a bit quiet on this topic since…

I was going to wait until we had a big success to announce, but follow on work has taken longer than expected. So in the meantime here is an update on where we are and what we are up to.

Background


Just to re-cap, by the end of phase 3 of Filling the Digital Preservation Gap we had created a working proof of concept at the University of York that demonstrated that it is possible to create an automated preservation workflow for research data using PURE, Archivematica, Fedora and Samvera (then called Hydra!).

This is described in our phase 3 project report (and a detailed description of the workflow we were trying to implement was included as an appendix in the phase 2 report).

After the project was over, it was agreed that we should go ahead and move this into production.

Progress has been slower than expected. I hadn’t quite appreciated just how different a proof of concept is to a production-ready environment!

Here are some of the obstacles we have encountered (and in some cases overcome):

Error reporting


One of the key things that we have had to build in to the existing code in order to get it ready for production is error handling.

This was not a priority for the proof of concept. A proof of concept is really designed to demonstrate that something is possible, not to be used in earnest.

If errors happen and things stop working (which they sometimes do) you can just kill it and rebuild.

In a production environment we want to be alerted when something goes wrong so we can work out how to fix it. Alerts and errors are crucial to a system like this.

We are sorting this out by enabling Archivematica's own error handling and error catching within Automation Tools.
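
As a rough illustration of the kind of alerting we are after (this is a generic sketch, not how Automation Tools itself handles errors; the mail relay and addresses are placeholders), a wrapper around a workflow step could email the team whenever that step fails:

import smtplib
import traceback
from email.message import EmailMessage

SMTP_HOST = "localhost"                          # placeholder mail relay
ALERT_TO = "digital-archive@example.ac.uk"       # placeholder address

def alert(subject, body):
    """Send a plain-text alert email."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "archivematica-alerts@example.ac.uk"
    msg["To"] = ALERT_TO
    msg.set_content(body)
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)

def run_with_alerts(step_name, step_func, *args, **kwargs):
    """Run one step of the automated workflow and email us if it fails."""
    try:
        return step_func(*args, **kwargs)
    except Exception:
        alert(f"Workflow step failed: {step_name}", traceback.format_exc())
        raise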


What happens when something goes wrong?


...and of course once things have gone wrong in Archivematica and you've fixed the underlying technical issue, you then need to deal with any remaining problems with your information packages in Archivematica.

For example, if the problems have resulted in failed transfers in Archivematica then you need to work out what you are going to do with those failed transfers. Although it is (very) tempting to just clear out Archivematica and start again, colleagues have advised me that it is far more useful to actually try and solve the problems and establish how we might handle a multitude of problematic scenarios if we were in a production environment!

So we now have scenarios in which an automated transfer has failed, and in order to get things moving again we need to carry out a manual transfer of the dataset into Archivematica. Will the other parts of our workflow still work if we intervene in this way?

One issue we have encountered along the way is that though our automated transfer uses a specific 'datasets' processing configuration that we have set up within Archivematica, when we push things through manually it uses the 'default' processing configuration which is not what we want.

We are now looking at how we can encourage Archivematica to use the specified processing configuration. As described in the Archivematica documentation, you can do this by including an XML file describing your processing configuration within your transfer.
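
For example (a minimal sketch, assuming the 'datasets' configuration has already been exported from the Archivematica dashboard and saved locally as datasets_processingMCP.xml; the transfer path is hypothetical), the exported configuration just needs to be dropped into the root of the transfer as processingMCP.xml before the manual transfer is started:

import shutil
from pathlib import Path

def prepare_manual_transfer(transfer_dir, config_file="datasets_processingMCP.xml"):
    """Copy an exported processing configuration into the transfer root.

    Archivematica picks up a file called processingMCP.xml at the top level
    of a transfer and uses it instead of the default processing configuration.
    """
    target = Path(transfer_dir) / "processingMCP.xml"
    shutil.copyfile(config_file, target)
    return target

# e.g. prepare_manual_transfer("/home/archivist/transfers/dataset-1234")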

It is useful to learn lessons like this outside of a production environment!


File size/upload


Although our project recognised that there would be a limit to the size of dataset that we could accept and process with our application, we didn't really bottom out what size of dataset we intended to support.

It has now been agreed that we should reasonably expect the data deposit form to accept datasets of up to 20 GB in size. Anything larger than this would need to be handled in a different way.

Testing the proof of concept in earnest showed that it was not able to handle datasets of over 1 GB in size. Its primary purpose was to demonstrate the necessary integrations and workflow, not to handle larger files.

Additional (and ongoing) work was required to enable the web deposit form to work with larger datasets.


Space


In testing the application we of course ended up trying to push some quite substantial datasets through it.

This was fine until everything abruptly seemed to stop working!

The problem was actually a fairly simple one but because of our own inexperience with Archivematica it took a while to troubleshoot and get things moving in the right direction again.

It turned out that we hadn’t allocated enough space in one of the bits of filestore that Archivematica uses for failed transfers (/var/archivematica/sharedDirectory/failed). This had filled up and was stopping Archivematica from doing anything else.

Once we knew the cause of the problem, the available space was increased, but then everything ground to a halt again because we had quickly used that up too. Increasing the space had got things moving, but while we were trying to demonstrate that it wasn't working we had deposited several further datasets, which were waiting in the transfer directory and quickly blocked things up again.

On a related issue, one of the test datasets I had been using to see how well Research Data York could handle larger datasets was around 5 GB in size, consisting of about 2,000 JPEG images. Of course one of the default normalisation tasks in Archivematica is to convert all of these JPEGs to TIFF.

Once this collection of JPEGs was converted to TIFF, the size of the dataset increased to around 80 GB. Until I witnessed this it hadn't really occurred to me that this could cause problems.

The solution - allocate Archivematica much more space than you think it will need!

We also now have the filestore set up so that it will inform us when the space in these directories gets to 75% full. Hopefully this will allow us to stop the filestore filling up in the future.
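
The check itself can be very simple. Here is a sketch of the sort of script that could run on a schedule and warn once usage passes 75% (the path and threshold mirror the setup described above; the reporting is just a print statement rather than whatever notification mechanism is actually in use):

import shutil

SHARED_DIR = "/var/archivematica/sharedDirectory"  # directory to keep an eye on
THRESHOLD = 0.75  # warn once 75% of the filestore is used

def check_space(path=SHARED_DIR, threshold=THRESHOLD):
    """Return False and print a warning if the filestore is too full."""
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total
    if used_fraction >= threshold:
        print(f"WARNING: {path} is {used_fraction:.0%} full "
              f"({usage.free // 2**30} GB free)")
        return False
    return True

if __name__ == "__main__":
    check_space()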


Workflow


The proof of concept did not undergo rigorous testing - it was designed for demonstration purposes only.

During the project we thought long and hard about the deposit, request and preservation workflows that we wanted to support, but we were always aware that once we had it in an environment that we could all play with and test, additional requirements would emerge.

As it happens, we have discovered that the workflow implemented is very true to that described in the appendix of our phase 2 report and does meet our needs. However, there are lots of bits of fine tuning required to enhance the functionality and make the interface more user friendly.

The challenge here is to try to carry out the minimum of work required to turn it into an adequate solution to take into production. There are so many enhancements we could make – I have a wish list as long as my arm – but until we better understand whether a local solution or a shared solution (provided by the Jisc Research Data Shared Service) will be adopted in the future it is not worth trying to make this application perfect.

Making it fit for production is the priority. Bells and whistles can be added later as necessary!





My thanks to all those who have worked on creating, developing, troubleshooting and testing this application and workflow. It couldn't have happened without you!


How do you deal with mass spam on MediaWiki?

Published 19 Sep 2017 by sau226 in Newest questions tagged mediawiki - Webmasters Stack Exchange.

What would be the best way to find a user's IP address on MediaWiki if all the connections were proxied through a Squid proxy server and you have access to all user rights?

I am a steward on a centralauth based wiki and we have lots of spam accounts registering and making 1 spam page each.

Can someone please tell me what the best way to mass block them is as I keep on having to block each user individually and lock their accounts?


HAPPY RETIREMENT, MR GAWLER

Published 18 Sep 2017 by timbaker in Tim Baker.

The author (centre) with Ruth and Ian Gawler

Recently a great Australian, a man who has helped thousands of others in their most vulnerable and challenging moments, a Member of the Order of Australia, quietly retired from a long and remarkable career of public service....

Harvesting EAD from AtoM: we need your help!

Published 18 Sep 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Back in February I published a blog post about a project to develop AtoM to allow EAD (Encoded Archival Description) to be harvested via OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting): “Harvesting EAD from AtoM: a collaborative approach”.

Now that AtoM version 2.4 is released (hooray!), containing the functionality we have sponsored, I thought it was high time I updated you on what has been achieved by this project, where more work is needed and how the wider AtoM community can help.


What was our aim?


Our development work had a few key aims:

  • To enable finding aids from AtoM to be exposed as EAD 2002 XML for others to harvest. The partners who sponsored this project were particularly keen to enable the Archives Hub to harvest their EAD.
  • To change the way that EAD was generated by AtoM in order to make it more scalable. Moving EAD generation from the web browser to the job scheduler was considered to be the best approach here.
  • To make changes to the existing DC (Dublin Core) metadata generation feature so that it also works through the job scheduler - making this existing feature more scalable and able to handle larger quantities of data.

A screenshot of the job scheduler in AtoM, showing the EAD and DC creation jobs that have been completed

What have we achieved?

The good

We believe that the EAD harvesting feature as released in AtoM version 2.4 will enable a harvester such as the Archives Hub to harvest our catalogue metadata from AtoM as EAD. As we add new top level archival descriptions to our catalogue, subsequent harvests should pick up and display these additional records. 

This is a considerable achievement and something that has been on our wishlist for some time. This will allow our finding aids to be more widely signposted. Having our data aggregated and exposed by others is key to ensuring that potential users of our archives can find the information that they need.

Changes have also been made to the way metadata (both EAD and Dublin Core) are generated in AtoM. This means that the solution going forward is more scalable for those AtoM instances that have very large numbers of records or large descriptive hierarchies.

The new functionality in AtoM around OAI-PMH harvesting of EAD and settings for moving XML creation to the job scheduler is described in the AtoM documentation.
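
For anyone wanting to experiment, a harvester can talk to the endpoint with plain OAI-PMH requests. The sketch below is illustrative only: the endpoint URL is a placeholder, and the exact metadataPrefix advertised for EAD 2002 should be taken from your own instance's ListMetadataFormats response. It first asks which formats are available and then lists record identifiers for a given prefix:

import requests
import xml.etree.ElementTree as ET

OAI_ENDPOINT = "https://atom.example.org/;oai"   # placeholder URL
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_metadata_formats(endpoint=OAI_ENDPOINT):
    """Ask the repository which metadata formats it can expose."""
    resp = requests.get(endpoint, params={"verb": "ListMetadataFormats"})
    resp.raise_for_status()
    tree = ET.fromstring(resp.content)
    return [el.text for el in
            tree.iter("{http://www.openarchives.org/OAI/2.0/}metadataPrefix")]

def list_record_identifiers(prefix, endpoint=OAI_ENDPOINT):
    """List the identifiers available for harvesting in the given format."""
    resp = requests.get(endpoint,
                        params={"verb": "ListIdentifiers", "metadataPrefix": prefix})
    resp.raise_for_status()
    tree = ET.fromstring(resp.content)
    return [el.text for el in tree.findall(".//oai:identifier", OAI_NS)]

# e.g. print(list_metadata_formats()) and then harvest with the EAD prefix it reports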

The not-so-good

Unfortunately the EAD harvesting functionality within AtoM 2.4 will not do everything we would like it to do. 

It does not at this point include the ability for the harvester to know when metadata records have been updated or deleted. It also does not pick up new child records that are added into an existing descriptive hierarchy. 

We want to be able to edit our records once within AtoM and have any changes reflected in the harvested versions of the data. 

We don’t want our data to become out of sync. 

So clearly this isn't ideal.

The task of enabling full harvesting functionality for EAD was found to be considerably more complex than first anticipated. This has no doubt been compounded by the hierarchical nature of EAD, which differs from the simplicity of the traditional Dublin Core approach.

The problems encountered are certainly not insurmountable, but lack of additional resources and timelines for the release of AtoM 2.4 stopped us from being able to finish off this work in full.

A note on scalability


Although the development work deliberately set out to consider issues of scalability, it turns out that scalability is actually on a sliding scale!

The National Library of Wales had the forethought to include one of their largest archival descriptions as sample data for inclusion in the version of AtoM 2.4 that Artefactual deployed for testing. Their finding aid for St David’s Diocesan Records is a very large descriptive hierarchy consisting of 33,961 individual entries. This pushed the capabilities of EAD creation (even when done via the job scheduler) and also led to discussions with The Archives Hub about exactly how they would process and display such a large description at their end even if EAD generation within AtoM were successful.

Some more thought and more manual workarounds will need to be put in place to manage the harvesting and subsequent display of large descriptions such as these.

So what next?


We are keen to get AtoM 2.4 installed at the Borthwick Institute for Archives over the next couple of months. We are currently on version 2.2 and would like to start benefiting from all the new features that have been introduced... and of course to test in earnest the EAD harvesting feature that we have jointly sponsored.

We already know that this feature will not fully meet our needs in its current form, but would like to set up an initial harvest with the Archives Hub and further test some of our assumptions about how this will work.

We may need to put some workarounds in place to ensure that we have a way of reflecting updates and deletions in the harvested data – either with manual deletes or updates or a full delete and re-harvest periodically.

Harvesting in AtoM 2.4 - some things that need to change


So we have a list of priority things that need to be improved in order to get EAD harvesting working more smoothly in the future:


In line with the OAI-PMH specification

  • AtoM needs to expose updates to the metadata to the harvester
  • AtoM needs to expose new records (at any level of description) to the harvester
  • AtoM needs to expose information about deletions to the harvester
  • AtoM also needs to expose information about deletions of DC metadata to the harvester (it has come to my attention during the course of this project that this isn’t happening at the moment)

Some other areas of potential work


I also wanted to bring together and highlight some other areas of potential work for the future. These are all things that were discussed during the course of the project but were not within the scope of our original development goals.

  • Harvesting of EAC (Encoded Archival Context) - this is the metadata standard for authority records. Is this something people would like to see enabled in the future? Of course this is only useful if you have someone who actually wants to harvest this information!
  • On the subject of authority records, it would be useful to change the current AtoM EAD template to use @authfilenumber and @source - so that an EAD record can link back to the relevant authority record in the local AtoM site. The ability to create rich authority records is such a key strength of AtoM, allowing an institution to weave rich interconnecting stories about their holdings. If harvesting doesn’t preserve this inter-connectivity then I think we are missing a trick!
  • EAD3 - this development work has deliberately not touched on the new EAD standard. Firstly, this would have been a much bigger job and secondly, we are looking to have our EAD harvested by The Archives Hub and they are not currently working with EAD3. This may be a priority area of work for the future.
  • Subject source - the subject source (for example "Library of Congress Subject Headings") doesn't appear in AtoM generated EAD at the moment even though it can be entered into AtoM - this would be a really useful addition to the EAD.
  • Visible elements - AtoM allows you to decide which elements you wish to display/hide in your local AtoM interface. With the exception of information relating to physical storage, the XML generation tasks currently do not take account of visible elements and will carry out an export of all fields. Further investigation of this should be carried out in the future. If an institution is using the visible elements feature to hide certain bits of information that should not be more widely distributed, they would be concerned if this information was being harvested and displayed elsewhere. As certain elements will be required in order to create valid EAD, this may get complicated!
  • ‘Manual’ EAD generation - the project team discussed the possibility of adding a button to the AtoM user interface so that staff users can manually kick-off EAD regeneration for a single descriptive hierarchy. Artefactual suggested this as a method of managing the process of EAD generation for large descriptive hierarchies. You would not want the EAD to regenerate with each minor tweak if a large archival description was undergoing several updates, however, you need to be able to trigger this task when you are ready to do so. It should be possible to switch off the automatic EAD re-generation (which normally triggers when a record is edited and saved) but have a button on the interface that staff can click when they want to initiate the process - for example when all edits are complete. 
  • As part of their work on this project, Artefactual created a simple script to help with the process of generating EAD for large descriptive hierarchies - it basically provides a way of finding out which XML files relate to a specific archival description so that EAD can be manually enhanced and updated if it is too large for AtoM to generate via the job scheduler. It would be useful to turn this script into a command-line task that is maintained as part of the AtoM codebase.

We need your help!


Although we believe we have something we can work with here and now, we are not under any illusions that this feature does all that it needs to in order to meet our requirements in the longer term. 

I would love to find out what other AtoM users (and harvesters) think of the feature. Is it useful to you? Are there other things we should put on the wishlist? 

There is a lot of additional work described in this post which the original group of project partners are unlikely to be able to fund on their own. If EAD harvesting is a priority to you and your organisation and you think you can contribute to further work in this area either on your own or as part of a collaborative project please do get in touch.


Thanks


I’d like to finish with a huge thanks to those organisations who have helped make this project happen, either through sponsorship, development or testing and feedback.



Jason Scott Talks His Way Out of It: A Podcast

Published 14 Sep 2017 by Jason Scott in ASCII by Jason Scott.

Next week I start a podcast.

There’s a Patreon for the podcast with more information here.

Let me unpack a little of the thinking.

Through the last seven years, since I moved back to NY, I’ve had pretty variant experiences of debt or huge costs weighing me down. Previously, I was making some serious income from a unix admin job, and my spending was direct but pretty limited. Since then, even with full-time employment (and I mean, seriously, a dream job), I’ve made some grandiose mistakes with taxes, bills and tracking down old obligations that means I have some notable costs floating in the background.

Compound that with a new home I’ve moved to with real landlords that aren’t family and a general desire to clean up my life, and I realized I needed some way to make extra money that will just drop directly into the bill pit, never to really pass into my hands.

How, then, to do this?

I work very long hours for the Internet Archive, and I am making a huge difference in the world working for them. It wouldn’t be right or useful for me to take on any other job. I also don’t want to be doing something like making “stuff” that I sell or otherwise speculate into some market. Leave aside that I have these documentaries to finish, and time has to be short.

Then take into account that I can no longer afford to drop money going to anything other than a small handful of conferences that aren’t local to me (the NY-CT-NJ Tri-State area), and that people really like the presentations I give.

So, I thought, how about me giving basically a presentation once a week? What if I recorded me giving a sort of fireside chat or conversational presentation about subjects I would normally give on the road, but make them into a downloadable podcast? Then, I hope, everyone would be happy: fans get a presentation. I get away from begging for money to pay off debts. I get to refine my speaking skills. And maybe the world gets something fun out of the whole deal.

Enter a podcast, funded by a Patreon.

The title: Jason Talks His Way Out of It, my attempt to write down my debts and share the stories and thoughts I have.

I announced the Patreon on my 47th birthday. Within 24 hours, about 100 people had signed up, paying some small amount (or not small, in some cases) for each published episode. I had a goal of $250/episode to make it worthwhile, and we passed that handily. So it’s happening.

I recorded a prototype episode, and that’s up there, and the first episode of the series drops Monday. These are story-based presentations roughly 30 minutes long apiece, and I will continue to do them as long as it makes sense to.

Public speaking is something I’ve done for many, many years, and I enjoy it, and I get comments that people enjoy them very much. My presentation on That Awesome Time I Was Sued for Two Billion Dollars has passed 800,000 views on the various copies online.

I spent $40 improving my sound setup, which should work for the time being. (I already had a nice microphone and a SSD-based laptop which won’t add sound to the room.) I’m going to have a growing list of topics I’ll work from, and I’ll stay in communication with the patrons.

Let’s see what this brings.

One other thing: Moving to the new home means that a lot of quality of life issues have been fixed, and my goal is to really shoot forward finishing those two documentaries I owe people. I want them done as much as everyone else! And with less looming bills and debts in my life, it’ll be all I want to do.

So, back the new podcast if you’d like. It’ll help a lot.


Does Mediawiki encrypt logins by default as the browser sends them to the server?

Published 11 Sep 2017 by user1258361 in Newest questions tagged mediawiki - Server Fault.

Several searches only turned up questions about encrypting login info on the server side. Does Mediawiki encrypt logins after you type them in the browser and send them? (to prevent a man-in-the-middle from reading them in transit and taking over an account)


The Bounty of the Ted Nelson Junk Mail

Published 9 Sep 2017 by Jason Scott in ASCII by Jason Scott.

At the end of May, I mentioned the Ted Nelson Junk Mail project, where a group of people were scanning in boxes of mailings and pamphlets collected by Ted Nelson and putting them on the Internet Archive. Besides the uniqueness of the content, it was also unique in that we were trying to set it up to be self-sustaining from volunteer monetary contributions, and to compensate the scanners doing the work.

This entire endeavor has been wildly successful.

We are well past 18,000 pages scanned. We have taken in thousands in donations. And we now have three people scanning and one person entering metadata.

Here is the spreadsheet with transparency and donation information.

I highly encourage donating.

But let’s talk about how this collection continues to be amazing.

Always, there are the pure visuals. As we’re scanning away, we’re starting to see trends in what we have, and everything seems to go from the early 1960s to the early 1990s, a 30-year scope that encompasses a lot of companies and a lot of industries. These companies are trying to thrive in a whirlpool of competing attention, especially in certain technical fields, and they try everything from humor to class to rudimentary fear-and-uncertainty plays in the art.

These are exquisitely designed brochures, in many cases – obviously done by a firm or with an in-house group specifically tasked with making the best possible paper invitations and with little expense spared. After all, this might be the only customer-facing communication a company could have about its products, and might be the best convincing literature after the salesman has left or the envelope is opened.

Scanning at 600dpi has been a smart move – you can really zoom in and see detail, find lots to play with or study or copy. Everything is at this level, like this detail about a magnetic eraser that lets you see the lettering on the side.

Going after these companies for gender roles or other out-of-fashion jokes almost feels like punching down, but yeah, there’s a lot of it. Women draped over machines, assumptions that women will be doing the typing, and clunky humor about fulfilling your responsibilities as a (male) boss abounds. Cultural norms regarding what fears reigned in business or how companies were expected to keep on top of the latest trends are baked in there too.

The biggest obstacle going forward, besides bringing attention to this work, is going to be one of findability. The collection is not based on some specific subject matter other than what attracted Ted’s attention over the decades. He tripped lightly among aerospace, lab science, computers, electronics, publishing… nothing escaped his grasp, especially in technical fields.

If people are looking for pure aesthetic beauty, that is, “here’s a drawing of something done in a very old way” or “here are old fonts”, then this bounty is already, at 1,700 items, a treasure trove that could absorb weeks of your time. Just clicking around to items that on first blush seem to have boring title pages will often expand into breathtaking works of art and design.

I’m not worried about that part, frankly – these kind of sell themselves.

But there’s so much more to find among these pages, and as we’re now up to so many examples, it’s going to be a challenge to get researching folks to find them.

We have the keywording active, so you can search for terms like monitor, circuit, or hypercard and get more specific matches without concentrating on what the title says or what graphics appear on the front. The Archive has a full-text search, and so people looking for phrases will no doubt stumble into this collection.

But how easily will people even know to look for a wristwatch for the Macintosh from 1990, a closed-circuit camera called the Handy Looky... or this little graphic, nestled away inside a bland software catalog:

…I don’t know. I’ll mention that this is actually twitter-fodder among archivists, who are unhappy when someone is described as “discovering” something in the archives, when it was obvious a person cataloged it and put it there.

But that’s not the case here. Even Kyle, who’s doing the metadata, is doing so in a descriptive fashion, and on a rough day of typing in descriptions, he might not particularly highlight unique gems in the pile (he often does, though). So, if you discover them in there, you really did discover them.

So, the project is deep, delightful, and successful. The main consideration of this is funding; we are paying the scanners $10/hr to scan and the metadata is $15/hr. They work fast and efficiently. We track them on the spreadsheet. But that means a single day of this work can cause a notable bill. We’re asking people on twitter to raise funds, but it never hurts to ask here as well. Consider donating to this project, because we may not know for years how much wonderful history is saved here.

Please share the jewels you find.


Update 1.2.6 released

Published 9 Sep 2017 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

This is a service and security update to the stable version 1.2. It contains some important bug fixes and improvements which we picked from the upstream branch.

See the full changelog in the release notes on the Github download page.

This release is considered stable and we recommend updating all production 1.2.x installations of Roundcube to this version. Download it from roundcube.net.

Please do back up your data before updating!


4 Months!

Published 9 Sep 2017 by Jason Scott in ASCII by Jason Scott.

It’s been 4 months since my last post! That’s one busy little Jason summer, to be sure.

Obviously, I’m still around, so no lingering heart attack issues or other problems. My doctor told me that my heart is basically healed, and he wants more exercise out of me. My diet’s continued to be lots of whole foods, leafy greens and occasional shameful treats that don’t turn into a staple.

I spent a good month working with good friends to clear out the famous Information Cube, sorting out and mailing/driving away all the contents to other institutions, including the Internet Archive, the Strong Museum of Play, the Vintage Computer Federation, and parts worldwide.

I’ve moved homes, no longer living with my brother after seven up-and-down years of siblings sharing a house. It was time! We’re probably not permanently scarred! I love him very much. I now live in an apartment with very specific landlords with rules and an important need to pay them on time each and every month.

To that end, I’ve cut back on my expenses and will continue to, so it’s the end of me “just showing up” to pretty much any conferences that I’m not being compensated for, which will of course cut things down in terms of Jason appearances you can find me at.

I’ll still be making appearances as people ask me to go, of course – I love travel. I’m speaking in Amsterdam in October, and I’ll also be an emcee at the Internet Archive that same month. So we’ll see how that goes.

What that means is more media ingestion work, and more work on the remaining two documentaries. I’m going to continue my goal of clearing my commitments before long, so I can choose what I do next.

What follows will be (I hope) lots of entries going deep into some subjects and about what I’m working on, and I thank you for your patience as I was not writing weblog entries while upending my entire life.

To the future!


Godless for God’s Sake: Now available for Kindle for just $5.99

Published 6 Sep 2017 by James Riemermann in NontheistFriends.org.

Godless for God’s Sake: Nontheism in Contemporary Quakerism

In this book edited by British Friend and author David Boulton, 27 Quakers from 4 countries and 13 yearly meetings tell how they combine active and committed membership in the Religious Society of Friends with rejection of traditional belief in the existence of a transcendent, personal and supernatural God.

For some, God is no more (but no less) than a symbol of the wholly human values of “mercy, pity, peace and love”. For others, the very idea of God has become an archaism.

Readers who seek a faith free of supernaturalism, whether they are Friends, members of other religious traditions or drop-outs from old-time religion, will find good company among those whose search for an authentic 21st century understanding of religion and spirituality has led them to declare themselves “Godless – for God’s Sake”.

Contents

Preface: In the Beginning…

1. For God’s Sake? An Introduction

 

David Boulton

2. What’s a Nice Nontheist Like You Doing Here?

 

Robin Alpern

3. Something to Declare

 

Philip Gross

4. It’s All in the Numbers

Joan D Lucas

5. Chanticleer’s Call: Religion as a Naturalist Views It

Os Cresson

6. Mystery: It’s What we Don’t Know

James T Dooley Riemermann

7. Living the Questions

Sandy Parker

8. Listening to the Kingdom

Bowen Alpern

9. The Making of a Quaker Nontheist Tradition

David Boulton and Os Cresson

10. Facts and Figures

David Rush

11. This is my Story, This is my Song…


 

Ordering Info

Links to forms for ordering online will be provided here as soon as they are available. In the meantime, contact the organizations listed below, using the book details at the bottom of this page.

QuakerBooks of Friends General Conference

(formerly FGC Bookstore)

1216 Arch St., Ste 2B

Philadelphia, PA 19107

215-561-1700 fax 215-561-0759

http://www.quakerbooks.org/get/333011

(this is the “Universalism” section of Quakerbooks, where the book is currently located)

or

The Quaker Bookshop

173 Euston Rd London NW1 2BJ

020 7663 1030, fax 020 7663 1008 bookshop@quaker.org.uk

 

Those outside the United Kingdom and United States should be able to order through a local bookshop, quoting the publishing details below – particularly the ISBN number. In case of difficulty, the book can be ordered direct from the publisher’s address below.

Title: “Godless for God’s Sake: Nontheism in Contemporary Quakerism” (ed. David Boulton)

Publisher: Dales Historical Monographs, Hobsons Farm, Dent, Cumbria LA10 5RF, UK. Tel 015396 25321. Email davidboulton1@compuserve.com.

Retail price: £9.50 ($18.50). Prices elsewhere to be calculated on UK price plus postage.

Format: Paperback, full colour cover, 152 pages, A5

ISBN number: 0-9511578-6-8 (to be quoted when ordering from any bookshop in the world)


Konversation 2.x in 2018: New user interface, Matrix support, mobile version

Published 5 Sep 2017 by eike hein in blogs.kde.org blogs.

KDE Project:

It's time to talk about exciting new things in store for the Konversation project!

Konversation is KDE's chat application for communities. No matter whether someone is a newcomer seeking community, a seasoned participant in one, or a community administrator: our mission is to bring groups of people together, allow them to delight in each other's company, and support their pursuit of shared interests and goals.

One of the communities we monitor for changes to your needs is our own: KDE. Few things make a Konversation hacker happier than journeying to an event like Akademy in Almería, Spain and seeing our app run on many screens all around.

The KDE community has recently made progress defining what it wants out of a chat solution in the near future. To us, those initial results align very strongly with Konversation's mission and display a lot of overlap with the things it does well. However, they also highlight trends where the current generation of Konversation falls short, e.g. support for persistence across network jumps, mobile device support and better media/file handling.

This evolution in KDE's needs matches what we're seeing in other communities we cater to. Recently we've started a new development effort to try and answer those needs.

Enter Konversation 2.x

Konversation 2.x R&D mockup screenshot
Obligatory tantalizing sneak preview (click to enlarge)

Konversation 2.x will be deserving of the version bump, revamping the user interface and bringing the application to new platforms. Here's a rundown of our goals:

  • A more modern, cleaner user interface, built using Qt Quick and KDE's Kirigami technology
    • Adopting a responsive window layout, supporting more varied desktop use cases and putting us on a path towards becoming a desktop/mobile convergent application
    • Scaling to more groups with an improved tab switcher featuring better-integrated notifications and mentions
    • Redesigned and thoroughly cleaned-up settings, including often-requested per-tab settings
    • Richer theming, including a night mode and a small selection of popular chat text layouts for different needs
  • Improved media/file handling, including image sharing, a per-tab media gallery, and link previews
  • A reduced resource footprint, using less memory and battery power
  • Support for the Matrix protocol
  • Supporting a KDE-wide Global and Modern Text Input initiative, in particular for emoji input
  • Versions for Plasma Mobile and Android
  • Updating Konversation's web presence

Let's briefly expand on a few of those:

Kirigami

KDE's Kirigami user interface technology helps developers make applications that run well on both desktop and mobile form factors. While still a young project, too, it's already being put to good use in projects such as Peruse, Calligra Gemini, Gwenview, and others. When we tried it out Kirigami quickly proved useful to us as well. We've been enjoying a great working relationship with the Kirigami team, with code flowing both ways. Check it out!

Design process

To craft the new user interface, we're collaborating with KDE's Visual Design Group. Within the KDE community, the VDG itself is a driver of new requirements for chat applications (as their collaboration workflows differ substantially from those of coding contributors). We've been combining our experience listening to many years of user feedback with their design chops, and this has led to an array of design mockups we've been working from so far. This is just the beginning, with many, many details left to hammer out together - we're really grateful for the help! :)

Matrix

Currently we're focused on bringing more of the new UI online, proving it on top of our robust IRC backend. However, Matrix support will come next. While we have no plans to drop support for IRC, we feel the Matrix protocol has emerged as a credible alternative that retains many of IRC's best qualities while better supporting modern needs (and bridging to IRC). We're excited about what it will let us do and want to become your Matrix client of choice next year!

Work done so far

The screenshot shown above is sort of a functional R&D mockup of where we're headed with the new interface. It runs, it chats - more on how to try it out in a moment - but it's quite incomplete, wonky, and in a state of flux. Here's a few more demonstrations and explorations of what it can do:

Responsive window layout
Responsive window layout: Front-and-center vs. small-and-in-a-corner (click for smoother HD/YouTube)

Toggling settings mode
Friction-free switching to and from settings mode (click for smoother HD/YouTube)

Overlay context sidebar
Overlay context sidebar: Tab settings and media gallery will go here (click to enlarge)

See a gallery with an additional screenshot of the settings mode.

Trying it out

The work is being carried out on the wip/qtquick branch of konversation.git. It needs Qt 5.9 and the master branch of kirigami.git to build and run, respectively. We also have a Flatpak nightly package on the way soon, pending sorting out some dependency issues.

Be sure to check out this wiki page with build and testing instructions. You'll learn how to retrieve either the sources or the Flatpak, as well as a number of command line arguments that are key when test-driving.

Sneak preview of great neat-ness: It's possible to toggle between the old and new Konversation UIs at any time using the F10 key. This makes dogfooding at this early stage much more palatable!

Joining the fun

We're just starting out to use this workboard on KDE's Phabricator instance to track and coordinate tasks. Subscribe and participate! Phabricator is also the platform of choice to submit code contributions.

As noted above, Konversation relies on Kirigami and the VDG. Both projects welcome new contributors. Helping them out helps Konversation!

To chat with us, you can stop by the #konversation and #kde-vdg channels on freenode (using IRC or the Matrix bridge). Hop on and introduce yourself!

Side note: The Kirigami team plans to show up in force at the KDE Randa meeting this fall to hack on things the Konversation team is very much interested in, including expanding support for keyboard navigation in Kirigami UI. Check out the Randa fundraising campaign, which among other things enables KDE to bring more devs along - it's really appreciated!


Update 1.3.1 released

Published 3 Sep 2017 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

We just published the first service release to update the stable version 1.3. It is the result of some touching-up on the new features introduced with the 1.3.0 release. For example, it brings back the double-click behavior to open messages, which had been reduced to the list-only view. And because the switch to change the mail view layout was a bit hidden, we also added it to the preferences section.

The update also includes fixes to reported bugs and one potential XSS vulnerability as well as optimizations to smoothly run on the latest version of PHP.

See the full changelog in the release notes on the Github download page.

This release is considered stable and we recommend updating all production installations of Roundcube to this version. Download it from roundcube.net.

Please do back up your data before updating!


MassMessage hits 1,000 commits

Published 28 Aug 2017 by legoktm in The Lego Mirror.

The MassMessage MediaWiki extension hit 1,000 commits today, following an update of the localization messages for the Russian language. MassMessage replaced a Toolserver bot that allowed sending a message to all Wikimedia wikis, by integrating it into MediaWiki and using the job queue. We also added some nice features like input validation and previewing. Through it, I became familiar with different internals of MediaWiki, including submitting a few core patches.

I made my first commit on July 20, 2013. It would get a full rollout to all Wikimedia wikis on November 19, 2013, after a lot of help from MZMcBride, Reedy, Siebrand, Ori, and other MediaWiki developers.

I also mentored User:wctaiwan, who worked on a Google Summer of Code project that added a ContentHandler backend to the extension, to make it easier for people to create and maintain page lists. You can see it used by The Wikipedia Signpost's subscription list.

It's still a bit crazy to think that I've been hacking on MediaWiki for over four years now, and how much it has changed my life in that much time. So here's to the next four years and next 1,000 commits to MassMessage!


Requiring HTTPS for my Toolforge tools

Published 27 Aug 2017 by legoktm in The Lego Mirror.

My Toolforge (formerly "Tool Labs") tools will now start requiring HTTPS, and redirecting any HTTP traffic. It's a little bit of common code for each tool, so I put it in a shared "toolforge" library.

from flask import Flask
import toolforge

app = Flask(__name__)
app.before_request(toolforge.redirect_to_https)

And that's it! Your tool will automatically be HTTPS-only now.
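
For the curious, the helper presumably amounts to little more than a before_request hook along these lines (a sketch only, not the actual library code; it assumes the Toolforge proxy passes the original scheme in the X-Forwarded-Proto header):

from flask import redirect, request

def redirect_to_https():
    """Flask before_request hook: send plain-HTTP requests to the HTTPS URL."""
    if request.headers.get("X-Forwarded-Proto", "http") != "https":
        return redirect(request.url.replace("http://", "https://", 1), code=302)
    return None  # already HTTPS, carry on with the request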

$ curl -I "http://tools.wmflabs.org/mwpackages/"
HTTP/1.1 302 FOUND
Server: nginx/1.11.13
Date: Sat, 26 Aug 2017 07:58:39 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 281
Connection: keep-alive
Location: https://tools.wmflabs.org/mwpackages/
X-Clacks-Overhead: GNU Terry Pratchett

My DebConf 17 presentation - Bringing MediaWiki back into Debian

Published 26 Aug 2017 by legoktm in The Lego Mirror.

Full quality video available on Wikimedia Commons, as well as the slides.

I had a blast attending DebConf '17 in Montreal, and presented about my efforts to bring MediaWiki back into Debian. The talks I went to were all fantastic, and I got to meet some amazing people. But the best parts of the conference were the laid-back atmosphere and the food. I've never been to another conference that had food that comes even close to DebConf.

Feeling very motivated, I have three new packages in the pipeline: LuaSandbox, uprightdiff, and libkiwix.

I hope to be at DebConf again next year!


Benchmarking with the NDSA Levels of Preservation

Published 18 Aug 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Anyone who has heard me talk about digital preservation will know that I am a big fan of the NDSA Levels of Preservation.

This is also pretty obvious if you visit me in my office – a print out of the NDSA Levels is pinned to the notice board above my PC monitor!

When talking to students and peers about how to get started in digital preservation in a logical, pragmatic and iterative way, I always recommend using the NDSA Levels to get started. Start at level 1 and move forward to the more advanced levels as and when you are able. This is a much more accessible and simple way to start addressing digital preservation than digesting some of the bigger and more complex certification standards and benchmarking tools.

Over the last few months I have been doing a lot of documentation work. Both ensuring that our digital archiving procedures are written down somewhere and documenting where we are going in the future.

As part of this documentation it seemed like a good idea to use the NDSA Levels:



Previously I have used the NDSA Levels in quite a superficial way – as a guide and a talking point. It has been quite a different exercise actually mapping where we stand.

It was not always straightforward to establish where we are and to unpick and interpret exactly what each level meant in practice. I guess this is one of the problems of using a relatively simple set of metrics to describe what is really quite a complex set of processes.

Without publishing the whole document that I've written on this, here is a summary of where I think we are currently. I'm also including some questions I've been grappling with as part of the process.

Storage and geographic location

Currently at LEVEL 2: 'know your data' with some elements of LEVEL 3 and 4 in place

See the full NDSA levels here


Four years ago we carried out a ‘rescue mission’ to get all digital data in the archives off portable media and on to the digital archive filestore. This now happens as a matter of course when born digital media is received by the archives.

The data isn’t in what I would call a proper digital archive but it is on a fairly well locked down area of University of York filestore.

There are three copies of the data available at any one time (not including the copy that is on original media within the strongrooms). The University stores two copies of the data on spinning disk: one at a data centre on one campus and the other at a data centre on another campus, with a third copy backed up to tape which is kept for 90 days.

I think I can argue that storage of the data on two different campuses is two different geographic locations but these locations are both in York and only about 1 mile apart. I'm not sure whether they could be described as having different disaster threats so I'm going to hold back from putting us at Level 3 though IT do seem to have systems in place to ensure that filestore is migrated on a regular schedule.

Questions:



File fixity and data integrity

Currently at LEVEL 4: 'repair your data'

See the full NDSA levels here


Having been in this job for five years now I can say with confidence that I have never once received file fixity information alongside data that has been submitted to us. Obviously if I did receive it I would check it on ingest, but I cannot envisage this scenario occurring in the near future! I do however create fixity information for all content as part of the ingest process.

I use a tool called Foldermatch to ensure that the digital data I have copied into the archive is identical to the original. Foldermatch allows you to compare the contents of two folders and one of the comparison methods (the one I use at ingest) uses checksums to do this.

Last year I purchased a write blocker for use when working with digital content delivered to us on portable hard drives and memory sticks. A check for viruses is carried out on all content that is ingested into the digital archive so this fulfills the requirements of level 2 and some of level 3.

Despite putting us at Level 4, I am still very keen to improve our processes and procedures around fixity. Fixity checks are carried out at intervals (several times a month) and these checks are logged but at the moment this is all initiated manually. As the digital archive gets bigger, we will need to re-think our approaches to this important area and find solutions that are scalable.
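
One possible direction (a sketch only; the paths and manifest format are illustrative rather than what we actually run) is to write the ingest checksums out as a manifest and have a scheduled job re-verify it, reporting only the files whose checksums no longer match:

import hashlib
import json
from pathlib import Path

def checksum(path):
    """SHA-256 checksum of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(archive_dir, manifest_file):
    """Record a checksum for every file under archive_dir (run at ingest)."""
    root = Path(archive_dir)
    manifest = {str(p.relative_to(root)): checksum(p)
                for p in sorted(root.rglob("*")) if p.is_file()}
    Path(manifest_file).write_text(json.dumps(manifest, indent=2))

def verify_manifest(archive_dir, manifest_file):
    """Return the files whose current checksum differs from the manifest."""
    root = Path(archive_dir)
    manifest = json.loads(Path(manifest_file).read_text())
    return [name for name, expected in manifest.items()
            if checksum(root / name) != expected]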

Questions:




Information Security

Currently at LEVEL 2: 'know your data' with some elements of LEVEL 3 in place

See the full NDSA levels here


Access to the digital archive filestore is limited to the digital archivist and IT staff who administer the filestore. If staff or others need to see copies of data within the digital archive filestore, copies are made elsewhere after appropriate checks are made regarding access permissions. The master copy is always kept on the digital archive filestore to ensure that the authentic original version of the data is maintained. Access restrictions are documented.

We are also moving towards the higher levels here. A recently reported issue concerning a mysterious change of last modified dates for .eml files has led to discussions with colleagues in IT, and I have been informed that an operating system upgrade for the server should include the ability to provide logs of who has done what to files in the archive.

It is worth pointing out that, as I don't currently have systems in place for recording PREMIS (preservation) metadata, I am currently taking a hands-off approach to preservation planning within the digital archive. Preservation actions such as file migration are few and far between and are recorded in a temporary way until a more robust system is established.


Metadata

Currently at LEVEL 3: 'monitor your data'

See the full NDSA levels here


We do OK with metadata currently (considering a full preservation system is not yet in place). Using DROID at ingest is helpful at fulfilling some of the requirements of levels 1 to 3 (essentially, having a record of what was received and where it is).

Our implementation of AtoM as our archival management system has helped fulfil some of the other metadata requirements. It gives us a place to store administrative metadata (who gave us it and when) as well as providing a platform to surface descriptive metadata about the digital archives that we hold.

Whether we actually have descriptive metadata or not for digital archives will remain an issue. Much metadata for the digital archive can be generated automatically but descriptive metadata isn't quite as straightforward. In some cases a basic listing is created for files within the digital archive (using Dublin Core as a framework) but this will not happen in all cases. Descriptive metadata typically will not be created until an archive is catalogued which may come at a later date.

Our plans to implement Archivematica next year will help us get to Level 4 as this will create full preservation metadata for us as PREMIS.

Questions:




File formats

Currently at LEVEL 2: 'know your data' with some elements of LEVEL 3 in place

See the full NDSA levels here


It took me a while to convince myself that we fulfilled Level 1 here! This is a pretty hard one to crack, especially if you have lots of different archives coming in from different sources, and sometimes with little notice. I think it is useful that the requirement at this level is prefaced with "When you can..."!

Thinking about it, we do do some work in this area - for example:

To get us to Level 2, as part of the ingest process we run DROID to get a list of file formats included within a digital archive. Summary stats are kept within a spreadsheet that covers all content within the digital archive so we can quickly see the range of formats that we hold and find out which archives they are in.
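
As a sketch of how that kind of summary can be pulled together (column names such as TYPE, PUID and FORMAT_NAME are taken from DROID's standard CSV export and may need adjusting for a particular profile):

import csv
from collections import Counter

def summarise_formats(droid_csv):
    """Count file formats in a DROID CSV export, keyed by format name and PUID."""
    counts = Counter()
    with open(droid_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get("TYPE") == "File":  # skip folder/container rows
                label = f"{row.get('FORMAT_NAME') or 'unknown'} ({row.get('PUID') or 'no PUID'})"
                counts[label] += 1
    return counts

# for fmt, n in summarise_formats("droid_export.csv").most_common():
#     print(f"{n:6d}  {fmt}")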

This should allow us to move towards Level 3 but we are not there yet. Some pretty informal and fairly ad hoc thinking goes into  file format obsolescence but I won't go as far as saying that we 'monitor' it. I have an awareness of some specific areas of concern in terms of obsolete files (for example I've still got those WordStar 4.0 files and I really do want to do something with them!) but there are no doubt other formats that need attention that haven't hit my radar yet.

As mentioned earlier, we are not really doing migration right now - not until I have a better system for creating the PREMIS metadata, so Level 4 is still out of reach.

Questions:




Conclusions

This has been a useful exercise and it is good to see where we need to progress. Going from using the Levels in the abstract to actually applying them as a tool has been a bit challenging in some areas. I think additional information and examples would be useful to help clear up some of the questions that I have raised.

I've also found that even where we meet a level there are often other ways we could do things better. File fixity and data integrity looks like a strong area for us but I am all too aware that I would like to find a more sustainable and scalable way to do this. This is something we'll be working on as we get Archivematica in place. Reaching Level 4 shouldn't lead to complacency!

An interesting blog post last year by Shira Peltzman from the UCLA Library talked about Expanding the NDSA Levels of Preservation to include an additional row focused on Access. This seems sensible given that the ability to provide access is the reason why we preserve archives. I would be keen to see this developed further so long as the bar wasn't set too high. At the Borthwick my initial consideration has been preservation - getting the stuff and keeping it safe - but access is something that will be addressed over the next couple of years as we move forward with our plans for Archivematica and AtoM.

Has anyone else assessed themselves against the NDSA Levels?  I would be keen to see how others have interpreted the requirements.







Botanical Wonderland events

Published 18 Aug 2017 by carinamm in State Library of Western Australia Blog.

From pressed seaweed to wildflower painting, embroidery to photography – botanical wonders have inspired and defined Western Australia. Hear from art historian, author, artist and curator Dr Dorothy Erickson in two events at the State Library of Western Australia.

WA wildflowers 17.jpg

Lecture: Professional women Artists in the Wildflower State by Dr Dorothy Erickson
Wednesday 23 August 2017 – 5:00-6:00 pm
Great Southern Room – State Library of Western Australia
Free. No bookings required

The first profession acceptable to be practiced by Middle Class women was as an Artist. They were the ‘Angels in the Studio’ at the time when gold was first being found in Western Australia. While a few Western Australian born were trained artists, many others came in the wake of the gold rushes when Western Australia was the world’s El Dorado. A number were entranced by the unique wildflowers and made this the mainstay of their careers. This talk will focus on the professional women artists in Western Australia from 1890 to WWI, with particular attention to those who painted our unique botanical wonderland.

L W Greaves_CROP

Lilian Wooster Greaves was a prolific Western Australian wildflower artist, “no one else seems to be able to equal her skill in pressing and mounting wildflower specimens, in the form of panels, cards and booklets” – The West Australian, 21 May 1927. Portrait of Lilian Wooster Greaves, Out of Doors in WA, 1927, State Library of Western Australia 821A(W)GRE.


Floor Talk on Botanical Wonderland exhibition with Dr Dorothy Erickson
Friday 1 September 2017  – 1:00-1:30 pm
The Nook – State Library of Western Australia
Free. No bookings required.

Be inspired by the botanical wonders of Western Australia as Australian artist Dr Dorothy Erickson discusses some of the marvels on display in the exhibition.

Nature's Showground 1940_001

Nature’s Showground, 1940. The Western Mail, State Library of Western Australia, 630.5WES.

Botanical Wonderland is a partnership between the Royal Western Australian Historical Society, the Western Australian Museum and the State Library of Western Australia. The exhibition is on display at the State Library until 24 September 2017.

Image: Acc 9131A/4: Lilian Wooster Greaves, pressed wildflower artwork, ‘Westralia’s Wonderful Wildflowers’, c1929




Running applications and unittests without "make install"

Published 15 Aug 2017 by dfaure in blogs.kde.org blogs.

KDE Project:

In our Akademy presentation, Kévin and I showed how important it is for a better developer story to be able to work on a KDE module without having to install it. Running unittests and running applications without installing the module at all is possible, it turns out; it just needs a bit of effort to set things up correctly.

Once you require ECM version 5.38 (using find_package(ECM 5.38)), your libraries, plugins and executables will all go to the builddir's "bin" directory, instead of being built in the builddir where they are defined.
Remember to wipe out your builddir first, to avoid running outdated unit tests!
This change helps locating helper binaries, and plugins (depending on how they are loaded).

After doing that, see if this works:

  • make uninstall
  • ctest . (or run the application)

Oops, usually it doesn't work. Here's what you might have to do to fix things.

  • XMLGUI files: since KDE Frameworks 5.4, they can be embedded into a qrc file so that they can be found without being installed.
    The qrc should put the xmlgui file under ":/kxmlgui5/". You can use the script kde-dev-scripts/kf5/bundle_data_files.pl to automate most of this change.
  • Uninstalled plugins can be found at runtime if they are installed into the same subdir of the "bin" dir as they will be in their final destination. For instance, the cmake line install(TARGETS kio_file DESTINATION ${KDE_INSTALL_PLUGINDIR}/kf5/kio) indicates that you want the uninstalled plugin to be in builddir/bin/kf5/kio, which can be done with the following line:
    set_target_properties(kio_file PROPERTIES LIBRARY_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/bin/kf5/kio")
    Qt uses the executable's current directory as one of the search paths for plugins, so this then works out of the box.
  • If ctest complains that it can't find the unittest executable, the fix is very simple: instead of the old syntax add_test(testname myexec) you want to use the newer syntax add_test(NAME testname COMMAND myexec)
  • Helper binaries for libraries: look for them locally first. Example from KIO:
    QString kioexec = QCoreApplication::applicationDirPath() + "/kioexec";
    if (!QFileInfo::exists(kioexec))
        kioexec = CMAKE_INSTALL_FULL_LIBEXECDIR_KF5 "/kioexec"; // this was the original line of code
    
  • Helper binaries for unittests: an easy solution is to just change the current directory to the bin dir, so that ./myhelper continues to work. This can be done with QDir::setCurrent(QCoreApplication::applicationDirPath());

There are two issues I didn't solve yet: trader queries that should find uninstalled desktop files, and QML components, like in kirigami. It seems that the only solution for the latter is to reorganize the source dir to have the expected layout "org/kde/kirigami.2/*"?

Update: this howto is now a wiki page.


Archival software survey

Published 8 Aug 2017 by inthemailbox in In the mailbox.

A few months ago, I asked my colleagues in the Archives Live Archives and recordkeeping software group to undertake a short survey for me, looking at archival description and management systems in use in Australia. I used the free SurveyMonkey site (ten simple questions) and promoted the survey on the Archives Live site and via my personal twitter account. I got 39 responses from a possible pool of 230 members, in a four week period.

The majority of respondents worked in a combination archive, taking both transfers from inhouse records creators as well as accepting donations or purchasing material for their collections (58.97%).  Small archives, with 2-4 staff (qualifications not specified), were slightly ahead of lone arrangers (48.7% and 30.7%). 11 were school archives and 7 from universities. There was a smattering of religious institutions, local council collections and government institutions, plus a couple of companies who held archives of their business.

Most archivists said they could use Excel and Word (92%), so it is not surprising that 25.6% of them created finding aids and other archival documentation using Word documents and spreadsheets. However, the majority of finding aids are created using online systems and archive management software.

Software identified in responses to the survey included:

Both Tabularium and Archive Manager were created here in Australia and have good compliance with the Australian series system.   Tabularium was created by David Roberts and distributed by State Records NSW; however, it is no longer maintained. Archive Manager was created for use with Windows PCs, and has recently been sold to the UK.

In looking at new software requirements, respondents expressed a remarkable degree of frustration with old, clunky software which was not properly maintained or could not be easily updated, either by themselves or by a provider. Ease of use, the ability to make collection content available online, to integrate digital components and to work with an EDRMS or other records management system were all identified as requirements for a modern archival management system. Concerns were raised about making donor and other personal and confidential information available, so some degree of authority control and viewing permissions was also required.

Whether one system can meet all these requirements is yet to be seen. It may be better to focus on a range of systems that have some degree of interoperability and on standards for transferring data from one to the other. Either way, archivists in Australia are eager and ready to embrace new ways of working and for a new generation of archival software.

 

 



The mysterious case of the changed last modified dates

Published 31 Jul 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Today's blog post is effectively a mystery story.

Like any good story it has a beginning (the problem is discovered, the digital archive is temporarily thrown into chaos), a middle (attempts are made to solve the mystery and make things better, several different avenues are explored) and an end (the digital preservation community come to my aid).

This story has a happy ending (hooray) but also includes some food for thought (all the best stories do) and as always I'd be very pleased to hear what you think.

The beginning

I have probably mentioned before that I don't have a full digital archive in place just yet. While I work towards a bigger and better solution, I have a set of temporary procedures in place to ingest digital archives on to what is effectively a piece of locked down university filestore. The procedures and workflows are both 'better than nothing' and 'good enough' as a temporary measure and actually appear to take us pretty much up to Level 2 of the NDSA Levels of Preservation (and beyond in some places).

One of the ways I ensure that all is well in the little bit of filestore that I call 'The Digital Archive' is to run frequent integrity checks over the data, using a free checksum utility. Checksums (effectively unique digital fingerprints) for each file in the digital archive are created when content is ingested and these are checked periodically to ensure that nothing has changed. IT keep back-ups of the filestore for a period of three months, so as long as this integrity checking happens within this three month period (in reality I actually do this 3 or 4 times a month) then problems can be rectified and digital preservation nirvana can be seamlessly restored.
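
Conceptually the check is very simple. Here is a minimal sketch of the general idea in Python (purely illustrative - this is not the checksum utility I actually use): build a manifest of SHA-256 hashes at ingest, then periodically rebuild it and compare.

import hashlib
from pathlib import Path

def sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(archive_dir):
    """Map each file (relative path) under the archive to its checksum."""
    root = Path(archive_dir)
    return {str(p.relative_to(root)): sha256(p)
            for p in root.rglob('*') if p.is_file()}

def verify(archive_dir, stored_manifest):
    """Report files whose checksum has changed, gone missing or newly appeared."""
    current = build_manifest(archive_dir)
    changed = {f for f in stored_manifest
               if f in current and current[f] != stored_manifest[f]}
    missing = stored_manifest.keys() - current.keys()
    new = current.keys() - stored_manifest.keys()
    return changed, missing, new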

Checksum checking is normally quite dull. Thankfully it is an automated process that runs in the background and I can just get on with my work and cheer when I get a notification that tells me all is well. Generally all is well; it is very rare that any errors are highlighted - when that happens I blog about it!

I have perhaps naively believed for some time that I'm doing everything I need to do to keep those files safe and unchanged - because if the checksum is the same then all is well. However, this month I encountered a problem...

I've been doing some tidying of the digital archive structure and alongside this have been gathering a bit of data about the archives, specifically looking at things like file formats, number of unidentified files and last modified dates.

Whilst doing this I noticed that one of the archives that I had received in 2013 contained 26 files with a last modified date of 18th January 2017 at 09:53. How could this be so if I have been looking after these files carefully and the checksums are the same as they were when the files were deposited?

The 26 files were all EML files - email messages exported from Microsoft Outlook. These were the only EML files within the whole digital archive. The files weren't all in the same directory and other files sitting in those directories retained their original last modified dates.

The middle

So this was all a bit strange...and worrying too. Am I doing my job properly? Is this something I should be bringing to the supportive environment of the DPC's Fail Club?

The last modified dates of files are important to us as digital archivists. This is part of the metadata that comes with a file. It tells us something about the file. If we lose this date are we losing a little piece of the authentic digital object that we are trying to preserve?

Instead of beating myself up about it I wanted to do three things:

  1. Solve the mystery (find out what happened and why)
  2. See if I could fix it
  3. Stop it happening again

So how could it have happened? Has someone tampered with these 26 files? Perhaps unlikely considering they all have the exact same date/time stamp, which to me suggests a more automated process. Also, the digital archive isn't widely accessible. Quite deliberately it is only really me (and the filestore administrators) who have access.

I asked IT whether they could explain it. Had some process been carried out across all filestores that involved EML files specifically? They couldn't think of a reason why this may have occurred. They also confirmed my suspicions that we have no backups of the files with the original last modified dates.

I spoke to a digital forensics expert from the Computer Science department and he said he could analyse the files for me and see if he could work out what had acted on them and also suggest a methodology of restoring the dates.

I have a record of the last modified dates of these 26 files when they arrived - the checksum tool that I use writes the last modified date to the hash file it creates. I wondered whether manually changing the last modified dates back to what they were originally was the right thing to do or whether I should just accept and record the change.

...but I decided to sit on it until I understood the problem better.
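
As an aside, resetting a last modified date is technically trivial - something like the Python sketch below would do it, assuming the original timestamps had already been extracted from the hash file. The harder question is whether doing so is the right thing to do.

import os
from datetime import datetime

def restore_modified_date(path, original_date):
    """Set a file's last modified time back to a recorded value.

    original_date is an ISO 8601 string (e.g. '2013-05-14T10:32:00+00:00')
    previously extracted from the checksum tool's hash file.
    """
    mtime = datetime.fromisoformat(original_date).timestamp()
    atime = os.stat(path).st_atime       # leave the access time as it is
    os.utime(path, (atime, mtime))       # os.utime takes (atime, mtime)

# e.g. restore_modified_date('emails/message01.eml', '2013-05-14T10:32:00+00:00')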

The end

I threw the question out to the digital preservation community on Twitter and as usual I was not disappointed!




In fact, along with a whole load of discussion and debate, Andy Jackson was able to track down what appears to be the cause of the problem.


He very helpfully pointed me to a thread on StackExchange which described the issue I was seeing.

It was a great comfort to discover that the cause of this problem was apparently a bug and not something more sinister. It appears I am not alone!

...but what now?

So now I think I know what caused the problem, but questions remain around how to catch issues like this more quickly (not six months after it has happened) and what to do with the files themselves.

IT have mentioned to me that an OS upgrade may provide us with better auditing support on the filestore. Being able to view reports on changes made to digital objects within the digital archive would be potentially very useful (though perhaps even that wouldn't have picked up this Windows bug?). I'm also exploring whether I can make particular directories read only and whether that would stop issues such as this occurring in the future.

If anyone knows of any other tools that can help, please let me know.

The other decision to make is what to do with the files themselves. Should I try and fix them? More interesting debate on Twitter on this topic and even on the value of these dates in the first place. If we can fudge them then so can others - they may have already been fudged before they got to the digital archive - in which case, how much value do they really have?


So should we try and fix last modified dates, or should we focus our attention on capturing and storing them within the metadata? The latter may be a more sustainable solution in the longer term, given their slightly slippery nature!
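
Capturing them needn't be complicated. A minimal sketch (not part of our current workflow - just an illustration in Python) that records each file's last modified date into a sidecar JSON file at ingest might look like this:

import json
from datetime import datetime, timezone
from pathlib import Path

def record_modified_dates(archive_dir, metadata_file='modified_dates.json'):
    """Write each file's last modified date (ISO 8601, UTC) to a sidecar JSON file."""
    root = Path(archive_dir)
    dates = {
        str(p.relative_to(root)): datetime.fromtimestamp(
            p.stat().st_mtime, tz=timezone.utc).isoformat()
        for p in root.rglob('*') if p.is_file()
    }
    Path(metadata_file).write_text(json.dumps(dates, indent=2, sort_keys=True))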

I know there are lots of people interested in this topic - just see this recent blog post by Sarah Mason and in particular the comments - When was that?: Maintaining or changing ‘created’ and ‘last modified’ dates. It is great that we are talking about real nuts and bolts of digital preservation and that there are so many people willing to share their thoughts with the community.

...and perhaps if you have EML files in your digital archive you should check them too!



Roundup: Welcome, on news, bad tools and great tools

Published 28 Jul 2017 by Carlos Fenollosa in Carlos Fenollosa — Blog.

I'm starting a series of posts with a summary of the most interesting links I found. The concept of "social bookmarks" has always been interesting, but no implementation is perfect. del.icio.us was probably the closest to a good enough service, but in the end, we all just post them to Twitter and Facebook for shares and likes.

Unfortunately, Twitter search sucks, and browser bookmarks rot quickly. That's why I'm trying this new model of social + local, not only for my readers but also for myself. Furthermore, writing a tapas-sized post is much faster than a well-thought one.

Hopefully, forcing myself to post periodically —no promises, though— will encourage me to write regular articles sometimes.

Anyway, these posts will try to organize links I post on my Twitter account and provide a bit more context.

While other friends publish newsletters, I still believe RSS can work well, so subscribe to the RSS if you want to get these updates. Another option is to use some of the services which deliver feeds by email, like Feenbox which, by the way, may never leave alpha, so drop me an email if you want an invitation.

Nostalgia

RTVE, the Spanish public TV, has uploaded a few Bit a bit episodes. It was a rad early-90s show that presented video games and the early Internet.

On news

I quit reading news 3 years ago. A recent article from Tobias Rose-Stockwell digs deep into how your fear and outrage are being sold for profit by the Media.

@xurxof recommended a 2012 article from Rolf Dobelli, Avoid News. Towards a Healthy News Diet

LTE > Fiber

I was having router issues and realized how my cellphone internet is sometimes more reliable than my home fiber.

It seems to be more common than you'd think - read the Twitter replies! XKCD also recently posted a comic on this.

Journaling

There was a discussion on Lobste.rs on tools to journal your workday, which was one of the reasons that led me to try out these roundup posts.

New keyboard

I bought a Matias Clicky mechanical keyboard which sounds like a minigun. For all those interested in mechanical keyboards, you must watch Thomas's Youtube channel

The new board doesn't have a nav cluster, so I configured Ctrl-HJKL to be the arrow keys. It takes a few days to get used to, but since then I've been using that combination even when I'm using a keyboard with arrow keys.

Slack eats CPU cycles

Slack was eating a fair amount of my CPU while my laptop was trying to build a Docker image and sync 3000 files on Dropbox. Matthew O'Riordan also wrote Where’s all my CPU and memory gone? The answer: Slack

Focus, focus, focus!

I'm a brain.fm subscriber and use it regularly, especially when I'm working on the train or in a busy cafe.

musicForProgramming() is a free resource with a variety of music and also provides a podcast feed for updates.

Tags: roundup



My letter to the Boy Scouts of America

Published 25 Jul 2017 by legoktm in The Lego Mirror.

The following is a letter I just mailed to the Boy Scouts of America, following President Donald Trump's speech at the National Jamboree. I implore my fellow scouts to also contact the BSA to express their feelings.

25 July 2017

Boy Scouts of America
PO Box 152079
Irving, TX
75015-2079

Dear Boy Scouts of America,

Like many others I was extremely disappointed and disgusted to hear about the contents of President Donald Trump’s speech to the National Jamboree. Politics aside, I have no qualms with inviting the president, or having him speak to scouts. I was glad that some of the Eagle Scouts currently serving at high levels of our government were recognized for their accomplishments.

However above all, the Boy Scouts of America must adhere to the values of the Scout Law, and it was plainly obvious that the president’s speech did not. Insulting opponents is not “kindness”. Threatening to fire a colleague is not “loyal”. Encouraging boos of a former President is not “courteous”. Talking about fake news and media is not “trustworthy”. At the end of the day, the values of the Scout Law are the most important lesson we must instill in our youth – and President Trump showed the opposite.

The Boy Scouts of America must send a strong message to the public, and most importantly the young scouts that were present, that the president’s speech was not acceptable and does not embody the principles of the Boy Scouts of America.

I will continue to speak well of scouting and the program to all, but incidents like this will only harm future boys who will be dissuaded from joining the organization in the first place.

Sincerely,
Kunal Mehta
Eagle Scout, 2012
Troop 294
San Jose, CA


How do I get my MediaWiki site to use templates? [closed]

Published 21 Jul 2017 by Cyberherbalist in Newest questions tagged mediawiki - Webmasters Stack Exchange.

My MediaWiki site is currently using v1.24.4.

I don't seem to have many templates installed, and some very important ones seem to be missing. For example, I can't use the Reference List template. If I do put references in an article, with {{reflist}} at the bottom, the template comes across as a redlink:

Template:Reflist

Are templates something that have to be installed separately? And if so, how do I go about it.

My site is hosted by DreamHost.


Building the Lego Saturn V rocket 48 years after the moon landing

Published 20 Jul 2017 by legoktm in The Lego Mirror.

Full quality video available on Wikimedia Commons.

On this day 48 years ago, three astronauts landed on the moon after flying there in a Saturn V rocket.

Today I spent four hours building the Lego Saturn V rocket - the largest Lego model I've ever built. Throughout the process I was constantly impressed with the design of the rocket, and how it all came together. The attention paid to the little details is outstanding, and made it such a rewarding experience. If you can find a place that has them in stock, get one. It's entirely worth it.

The rocket is designed to be separated into the individual stages, and the lander actually fits inside the rocket. Vertically, it's 3ft, and comes with three stands so you can show it off horizontally.

As a side project, I also created a timelapse of the entire build, using some pretty cool tools. After searching online how to have my DSLR take photos on a set interval and being frustrated with all the examples that used a TI-84 calculator, I stumbled upon gphoto2, which lets you control digital cameras. I ended up using a command as simple as gphoto2 --capture-image-and-download -I 30 to have it take and save photos every 30 seconds. The only negative part is that it absolutely killed the camera's battery, and within an hour I needed to switch the battery.

To stitch the photos together (after renaming them a bit), ffmpeg came to the rescue: ffmpeg -r 20 -i "%04d.jpg" -s hd1080 -vcodec libx264 time-lapse.mp4. Pretty simple in the end!


Song Club Showcase

Published 14 Jul 2017 by Dave Robertson in Dave Robertson.

While the finishing touches are being put on the album, I’m going solo with other Freo songwriters at the Fib.





Wikidata Map July 2017

Published 11 Jul 2017 by addshore in Addshore.

It’s been 9 months since my last Wikidata map update and once again we have many new noticeable areas appearing, including Norway, South Africa, Peru and New Zealand to name but a few. As with the last map generation post I once again created a diff image so that the areas of change are easily identifiable, comparing the data from July 2017 with that from my last post in October 2016.

The various sizes of the generated maps can be found on Wikimedia Commons:

Reasons for increases

If you want to have a shot at figuring out the cause of the increases in specific areas then take a look at my method described in the last post using the Wikidata Query Service.

People's discoveries so far:

I haven’t included the names of those that discovered reasons for areas of increase above, but if you find your discovery here and want credit just ask!


Preserving Google docs - decisions and a way forward

Published 7 Jul 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Back in April I blogged about some work I had been doing around finding a suitable export (and ultimately preservation) format for Google documents.

This post has generated a lot of interest and I've had some great comments both on the post itself and via Twitter.

I was also able to take advantage of a slot I had been given at last week's Jisc Research Data Network event to introduce the issue to the audience (who had really come to hear me talk about something else but I don't think they minded).

There were lots of questions and discussion at the end of this session, mostly focused on the Google Drive issue rather than the rest of the talk. I was really pleased to see that the topic had made people think. In a lightning talk later that day, William Kilbride, Executive Director of The Digital Preservation Coalition, mused on the subject of "What is data?". Google Drive was one of the examples he used, asking where does the data end and the software application start?

I just wanted to write a quick update on a couple of things - decisions that have been made as a result of this work and attempts to move the issue forward.

Decisions decisions

I took a summary of the Google docs data export work to my colleagues in a Research Data Management meeting last month in order to discuss a practical way forward for the institutional research data we are planning on capturing and preserving.

One element of the Proof of Concept that we had established at the end of phase 3 of Filling the Digital Preservation Gap was a deposit form to allow researchers to deposit data to the Research Data York service.

As well as the ability to enable researchers to browse and select a file or a folder on their computer or network, this deposit form also included a button to allow deposit to be carried out via Google Drive.

As I mentioned in a previous post, Google Drive is widely used at our institution. It is clear that many researchers are using Google Drive to collect, create and analyse their research data so it made sense to provide an easy way for them to deposit direct from Google Drive. I just needed to check out the export options and decide which one we should support as part of this automated export.

However, given the inconclusive findings of my research into export options it didn't seem that there was one clear option that adequately preserved the data.

As a group we decided the best way out of this imperfect situation was to ask researchers to export their own data from Google Drive in whatever format they consider best captures the significant properties of the item. Exporting the data themselves in a manual fashion prior to upload gives them the opportunity to review and check their files and make their own decision on issues such as whether comments are included in the version of their data that they upload to Research Data York.

So for the time being we are disabling the Google Drive upload button from our data deposit interface....which is a shame because a certain amount of effort went into getting that working in the first place.

This is the right decision for the time being though. Two things need to happen before we can make this available again:


  1. Understanding the use case - We need to gain a greater understanding of how researchers use Google Drive and what they consider to be 'significant' about their native Google Drive files.
  2. Improving the technology - We need to make some requests to Google to make the export options better.


Understanding the use case

We've known for a while that some researchers use Google Drive to store their research data. The graphic below was taken from a survey we carried out with researchers in 2013 to find out about current practice across the institution. 

Of the 188 researchers who answered the question "Where is your digital research data stored (excluding back up copies)?" 22 mentioned Google Drive. This is only around 12% of respondents but I would speculate that over the last four years, use of Google Drive will have increased considerably as Google applications have become more embedded within the working practices of staff and students at the University.

Where is your digital research data stored (excluding back up copies)?

To understand the Google Drive use case today I really need to talk to researchers.

We've run a couple of Research Data Management teaching sessions over the last term. These sessions are typically attended by PhD students but occasionally a member of research staff also comes along. When we talk about data storage I've been asking the researchers to give a show of hands as to who is using Google Drive to store at least some of their research data.

About half of the researchers in the room raise their hand.

So this is a real issue. 

Of course what I'd like to do is find out exactly how they are using it. Whether they are creating native Google Drive files or just using Google Drive as a storage location or filing system for data that they create in another application.

I did manage to get a bit more detail from one researcher who said that they used Google Drive as a way of collaborating on their research with colleagues working at another institution but that once a document has been completed they will export the data out of Google Drive for storage elsewhere. 

This fits well with the solution described above.

I also arranged a meeting with a Researcher in our BioArCh department. Professor Matthew Collins is known to be an enthusiastic user of Google Drive.

Talking to Matthew gave me a really interesting perspective on Google Drive. For him it has become an essential research tool. He and his colleagues use many of the features of the Google Suite of tools for their day to day work and as a means to collaborate and share ideas and resources, both internally and with researchers in other institutions. He showed me PaperPile, an extension to Google Drive that I had not been aware of. He uses this to manage his references and share them with colleagues. This clearly adds huge value to the Google Drive suite for researchers.

He talked me through a few scenarios of how they use Google - some (such as the comments facility) I was very much aware of. Others I've not used myself, such as using the Google APIs to visualise, for example, activity on preparing a report in Google Drive - showing a timeline of when different individuals edited the document. Now that looks like fun!

He also talked about the importance of the 'previous versions' information that is stored within a native Google Drive file. When working collaboratively it can be useful to be able to track back and see who edited what and when. 

He described a real scenario in which he had had to go back to a previous version of a Google Sheet to show exactly when a particular piece of data had been entered. I hadn't considered that the previous versions feature could be used to demonstrate that you made a particular discovery first. Potentially quite important in the competitive world of academic research.

For this reason Matthew considered the native Google Drive file itself to be "the ultimate archive" and "a virtual collaborative lab notebook". A flat, static export of the data would not be an adequate replacement.

He did however acknowledge that the data can only exist for as long as Google provides us with the facility and that there are situations where it is a good idea to take a static back up copy.

He mentioned that the precursor to Google Docs was a product called Writely (which he was also an early adopter of). Google bought Writely in 2006 after seeing the huge potential in this online word processing tool. Matthew commented that backwards compatibility became a problem when Google started making some fundamental changes to the way the application worked. This is perhaps the issue that is being described in this blog post: Google Docs and Backwards Compatibility.

So, I'm still convinced that even if we can't preserve a native Google Drive file perfectly in a static form, this shouldn't stop us having a go!

Improving the technology

Alongside trying to understand how researchers use Google Drive and what they consider to be significant and worthy of preservation, I have also been making some requests and suggestions to Google around their export options. There are a few ideas I've noted that would make it easier for us to archive the data.

I contacted the Google Drive forum and was told that as a Google customer I was able to log in and add my suggestions to Google Cloud Connect so this I did...and what I asked for was as follows:

  • Please can we have a PDF/A export option?
  • Please could we choose whether or not to export comments ...and if we are exporting comments, can we choose whether historic/resolved comments are also exported
  • Please can metadata be retained - specifically the created and last modified dates. (Author is a bit trickier - in Google Drive a document has an owner rather than an author. The owner probably is the author (or one of them) but not necessarily if ownership has been transferred).
  • I also mentioned a little bug relating to comment dates that I found when exporting a Google document containing comments out into docx format and then importing it back again.
Since I submitted these feature requests and comments in early May it has all gone very very quiet...

I have a feeling that ideas only get anywhere if they are popular ...and none of my ideas are popular ...because they do not lead to new and shiny functionality.

Only one of my suggestions (re comments) has received a vote by another member of the community.

So, what to do?

Luckily, since having spoken about my problem at the Jisc Research Data Network, two people have mentioned they have Google contacts who might be interested in hearing my ideas.

I'd like to follow up on this, but in the meantime it would be great if people could feed back to me.

  • Are my suggestions sensible? 
  • Are there any other features that would help the digital preservation community preserve Google Drive? I can't imagine I've captured everything...

The UK Archivematica group goes to Scotland

Published 6 Jul 2017 by Jenny Mitcham in Digital Archiving at the University of York.



Yesterday the UK Archivematica group met in Scotland for the first time. The meeting was hosted by the University of Edinburgh and as always it was great to be able to chat informally to other Archivematica users in the UK and find out what everyone is up to.


The first thing to note was that since this group of Archivematica ‘explorers’ first met in 2015 real and tangible progress seems to have been made. This was encouraging to see. This is particularly the case at the University of Edinburgh. Kirsty Lee talked us through their Archivematica implementation (now in production) and the steps they are taking to ingest digital content.


One of the most interesting bits of her presentation was a discussion about appraisal of digital material and how to manage this at scale using the available tools. When using Archivematica (or other digital preservation systems) it is necessary to carry out appraisal at an early stage before an Archival Information Package (AIP) is created and stored. It is very difficult (perhaps impossible) to unpick specific files from an AIP at a later date.


Kirsty described how one of her test collections has been reduced from 5.9GB to 753MB using a combination of traditional and technical appraisal techniques. 

Appraisal is something that is mentioned frequently in digital preservation discussions. There was a group talking about just this a couple of weeks ago at the recent DPC unconference ‘Connecting the Bits’. 

As ever it was really valuable to hear how someone is moving forward with this in a practical way. 

It will be interesting to find out how these techniques can be applied at scale to some of the larger collections Kirsty intends to work with.


Kirsty recommended an article by Victoria Sloyan, Born-digital archives at the Wellcome Library: appraisal and sensitivity review of two hard drives which was helpful to her and her colleagues when formulating their approach to this thorny problem.


She also referenced the work that the Bentley Historical Library at University of Michigan have carried out with Archivematica and we watched a video showing how they have integrated Archivematica with DSpace. This approach has influenced Edinburgh’s internal discussions about workflow.


Kirsty concluded with something that rings very true for me (in fact I think I said it myself the two presentations I gave last week!). Striving for perfection isn’t helpful, the main thing is just to get started and learn as you go along.


Rachel McGregor from the University of Lancaster gave an entertaining presentation about the UK Archivematica Camp that was held in York in April, covering topics as wide ranging as the weather, the food and finally feeling the love for PREMIS!


I gave a talk on work at York to move Archivematica and our Research Data York application towards production. I had given similar talks last week at the Jisc Research Data Network event and a DPC briefing day but I took a slightly different focus this time. I wanted to drill down into a bit more detail on our workflow, the processing configuration within Archivematica and some problems I was grappling with.

It was really helpful to get some feedback and solutions from the group on an error message I’d encountered whilst preparing my slides the previous day and to have a broader discussion on the limitations of web forms for data upload. This is what is so good about presenting within a small group setting like this as it allows for informality and genuinely productive discussion. As a result of this I over ran and made people wait for their lunch (very bad form I know!)


After lunch John Kaye updated the group on the Jisc Research Data Shared Service. This is becoming a regular feature of our meetings! There are many members of the UK Archivematica group who are not involved in the Jisc Shared Service so it is really useful to be able to keep them in the loop. 

It is clear that there will be a substantial amount of development work within Archivematica as a result of its inclusion in the Shared Service and features will be made available to all users (not just those who engage directly with Jisc). One example of this is containerisation which will allow Archivematica to be more quickly and easily installed. This is going to make life easier for everyone!


Sean Rippington from the University of St Andrews gave an interesting perspective on some of the comparison work he has been doing of Preservica and Archivematica. 

Both of these digital preservation systems are on offer through the Jisc Shared Service and as a pilot institution St Andrews has decided to test them side by side. Although he hasn’t yet got his hands on both, he was still able to offer some really useful insights on the solutions based on observations he has made so far. 

First he listed a number of similarities - for example alignment with the OAIS Reference Model, the migration-based approach, the use of microservices and many of the tools and standards that they are built on.


He also listed a lot of differences - some are obvious, for example one system is commercial and the other open source. This leads to slightly different models for support and development. He mentioned some of the additional functionality that Preservica has, for example the ability to handle emails and web archives and the inclusion of an access front end. 

He also touched on reporting. Preservica does this out of the box whereas with Archivematica you will need to use a third-party reporting system. He talked a bit about the communities that have adopted each solution and concluded that Preservica seems to have a broader user base (in terms of the types of institution that use it). The engaged, active and honest user community for Archivematica was highlighted as a specific selling point, as was the work of the Filling the Digital Preservation Gap project (thanks!).


Sean intends to do some more detailed comparison work once he has access to both systems and we hope he will report back to a future meeting.


Next up we had a collaborative session called ‘Room 101’ (even though our meeting had been moved to room 109). Considering we were encouraged to grumble about our pet hates this session came out with some useful nuggets:




After coffee break we were joined (remotely) by several representatives from the OSSArcFlow project from Educopia and the University of North Carolina. This project is very new but it was great that they were able to share with us some information about what they intend to achieve over the course of the two year project. 

They are looking specifically at preservation workflows using open source tools (specifically Archivematica, BitCurator and ArchivesSpace) and they are working with 12 partner institutions who will all be using at least two of these tools. The project will not only provide training and technical support, but will fully document the workflows put in place at each institution. This information will be shared with the wider community. 

This is going to be really helpful for those of us who are adopting open source preservation tools, helping to answer some of those niggling questions such as how to fill the gaps and what happens when there are overlaps in the functionality of two tools.


We registered our interest in continuing to be kept in the loop about this project and we hope to hear more at a future meeting.

The day finished with a brief update from Sara Allain from Artifactual Systems. She talked about some of the new things that are coming in version 1.6.1 and 1.7 of Archivematica.

Before leaving Edinburgh it was a pleasure to be able to join the University at an event celebrating their progress in digital preservation. Celebrations such as this are pretty few and far between - perhaps because digital preservation is a task that doesn’t have an obvious end point. It was really refreshing to see an institution publicly celebrating the considerable achievements made so far. Congratulations to the University of Edinburgh!

Hot off the press…

Published 4 Jul 2017 by Tom Wilson in thomas m wilson.

   


Can't connect to MediaWiki on Nginx server [duplicate]

Published 4 Jul 2017 by Marshall S. Lee in Newest questions tagged mediawiki - Server Fault.

This question is an exact duplicate of:

I downloaded and configured MediaWiki on the Ubuntu server. I'm running it on Nginx, so I opened the nginx.conf file and modified the server part as follows.

    server {
        listen 80;
        server_name wiki.mypage.com;

        access_log /var/log/nginx/access-wiki.log;
        error_log /var/log/nginx/error-wiki.log;

        charset utf-8;
        passenger_enabled on;
        client_max_body_size 50m;

        location / {
            root /var/www/html/mediawiki;
            index index.php;
        }

        # pass the PHP scripts to FastCGI server
        location ~ \.php$ {
            root           html;
            fastcgi_pass   unix:/var/run/php/php7.0-fpm.sock;
            fastcgi_index  index.php;
            fastcgi_param  SCRIPT_FILENAME  /scripts$fastcgi_script_name;
            include        fastcgi_params;
        }

        # deny access to .htaccess files, if Apache's document root
        # concurs with nginx's one

        location ~ /\.ht {
            deny  all;
        }
    }

After editing, I restarted Nginx and now I'm facing another problem. Every time I try to access the webpage with the domain above, instead of the MediaWiki main page I receive a file which contains the following.

<?php
/**
 * This is the main web entry point for MediaWiki.
 *
 * If you are reading this in your web browser, your server is probably
 * not configured correctly to run PHP applications!
 *
 * See the README, INSTALL, and UPGRADE files for basic setup instructions
 * and pointers to the online documentation.
 *
 * https://www.mediawiki.org/wiki/Special:MyLanguage/MediaWiki
 *
 * ----------
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 * http://www.gnu.org/copyleft/gpl.html
 *
 * @file
 */

// Bail on old versions of PHP, or if composer has not been run yet to install
// dependencies. Using dirname( __FILE__ ) here because __DIR__ is PHP5.3+.
// @codingStandardsIgnoreStart MediaWiki.Usage.DirUsage.FunctionFound
require_once dirname( __FILE__ ) . '/includes/PHPVersionCheck.php';
// @codingStandardsIgnoreEnd
wfEntryPointCheck( 'index.php' );

require __DIR__ . '/includes/WebStart.php';

$mediaWiki = new MediaWiki();
$mediaWiki->run();

Now, in the middle of the setup, I'm almost lost and have no idea how to work it out. I created a file hello.html in the root directory and accessed the page via wiki.mypage.com/hello.html. This is working. I do believe that the PHP configuration part is causing the errors, but I don't know how to fix it.


v1.34.4

Published 4 Jul 2017 by fabpot in Tags from Twig.


Roundcube Webmail 1.3.0 released

Published 25 Jun 2017 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

We proudly announce the stable version 1.3.0 of Roundcube Webmail which is now available for download. With this milestone we introduce new features since the 1.2 version:

Plus security and deployment improvements:

And finally some code-cleanup:

IMPORTANT: The code-cleanup part brings major changes and possibly incompatibilities to your existing Roundcube installations. So please read the Changelog carefully and thoroughly test your upgrade scenario.

Please note that Roundcube 1.3

  1. no longer runs on PHP 5.3
  2. no longer supports IE < 10 and old versions of Firefox, Chrome and Safari
  3. requires an SMTP server connection to send mails
  4. uses jQuery 3.2 and will not work with current jQuery mobile plugin

With the release of Roundcube 1.3.0, the previous stable release branches 1.2.x and 1.1.x will switch into LTS low-maintenance mode, which means they will only receive important security updates and no longer any regular improvement updates.


Wikimedia Hackathon at home project

Published 24 Jun 2017 by legoktm in The Lego Mirror.

This is the second year I haven't been able to attend the Wikimedia Hackathon due to conflicts with my school schedule (I finish at the end of June). So instead I decided I would try and accomplish a large-ish project that same weekend, but at home. I'm probably more likely to get stuff done while at home because I'm not chatting with everyone in person!

Last year I converted OOjs-UI to use PHP 5.5's traits instead of a custom mixin system. That was a fun project for me since I got to learn about traits and do some non-MediaWiki coding, while still reducing our technical debt.

This year we had some momentum on MediaWiki-Codesniffer changes, so I picked up one of our largest tasks which had been waiting - to upgrade to the 3.0 upstream PHP_CodeSniffer release. Being a new major release, there were breaking changes, including a huge change to the naming and namespacing of classes. My current diffstat on the open patch is +301, -229, so it is roughly the same size as last year. The conversion of our custom sniffs wasn't too hard; the biggest issue was actually updating our test suite.

We run PHPCS against test PHP files and verify the output matches the sniffs that we expect. Then we run PHPCBF, the auto-fixer, and check that the resulting "fixed" file is what we expect. The first wasn't too bad, it just calls the relevant internal functions to run PHPCS, but the latter would have PHPCBF output to a virtual filesystem, shell out to create a diff, and then try to put it back together. Now, we just get the output from the relevant PHPCS class, and compare it to the expected test output.

This change was included in the 0.9.0 release of MediaWiki-Codesniffer and is in use by many MediaWiki extensions.


Emulation for preservation - is it for me?

Published 23 Jun 2017 by Jenny Mitcham in Digital Archiving at the University of York.

I’ve previously been of the opinion that emulation isn’t really for me.

I’ve seen presentations about emulation at conferences such as iPRES and it is fair to say that much of it normally goes over my head.

This hasn’t been helped by the fact that I’ve not really had a concrete use case for it in my own work - I find it so much easier to relate and engage to a topic or technology if I can see how it might be directly useful to me.

However, for a while now I’ve been aware that emulation is what all the ‘cool kids’ in the digital preservation world seem to be talking about. From the very migration-heavy thinking of the 2000s, it appears that things are now moving in a different direction.

This fact first hit my radar at the 2014 Digital Preservation Awards where the University of Freiburg won The OPF Award for Research and Innovation for their work on Emulation as a Service with bwFLA Functional Long Term Archiving and Access.

So I was keen to attend the DPC event Halcyon, On and On: Emulating to Preserve to keep up to speed... not only because it was hosted on the doorstep in the centre of my home town of York!

It was an interesting and enlightening day. As usual the Digital Preservation Coalition did a great job of getting all the right experts in the room (sometimes virtually) at the same time, and a range of topics and perspectives were covered.

After an introduction from Paul Wheatley we heard from the British Library about their experiences of doing emulation as part of their Flashback project. No day on emulation would be complete without a contribution from the University of Freiburg. We had a thought provoking talk via WebEx from Euan Cochrane of Yale University Library and an excellent short film created by Jason Scott from the Internet Archive. One of the highlights for me was Jim Boulton talking about Digital Archaeology - and that wasn’t just because it had ‘Archaeology’ in the title (honest!). His talk didn’t really cover emulation, it related more to that other preservation strategy that we don’t talk about much anymore - hardware preservation. However, many of the points he raised were entirely relevant to emulation - for example, how to maintain an authentic experience, how you define what the significant properties of an item actually are and what decisions you have to make as a curator of the digital past. It was great to see how engaged the public were with his exhibitions and how people interacted with it.

Some of the themes of the day and take away thoughts for me:


Thinking about how this all relates to me and my work, I am immediately struck by two use cases.

Firstly research data - we are taking great steps forward in enabling this data to be preserved and maintained for the long term but will it be re-usable? For many types of research data there is no clear migration strategy. Emulation as a strategy for accessing this data ten or twenty years from now needs to be seriously considered. In the meantime we need to ensure we can identify the files themselves and collect adequate documentation - it is these things that will help us to enable reuse through emulators in the future.

Secondly, there are some digital archives that we hold at the Borthwick Institute from the 1980's. For example I have been working on a batch of WordStar files in my spare moments over the last few years. I'd love to get a contemporary emulator fired up and see if I could install WordStar and work with these files in their native setting. I've already gone a little way down the technology preservation route, getting WordStar installed on an old Windows 98 PC and viewing the files, but this isn't exactly contemporary. These approaches will help to establish the significant properties of the files and assess how successful subsequent migration strategies are....but this is a future blog post.

It was a fun event and it was clear that everybody loves a bit of nostalgia. Jim Boulton ended his presentation saying "There is something quite romantic about letting people play with old hardware".

We have come a long way and this is most apparent when seeing artefacts (hardware, software, operating systems, data) from early computing. Only this week whilst taking the kids to school we got into a conversation about floppy disks (yes, I know...). I asked the kids if they knew what they looked like and they answered "Yes, it is the save icon on the computer" (see Why is the save icon still a floppy disk?)...but of course they've never seen a real one. Clearly some obsolete elements of our computer history will remain in our collective consciousness for many years and perhaps it is our job to continue to keep them alive in some form.



Quick Method to wget my local wiki... need advice (without dumping mysql)

Published 23 Jun 2017 by WubiUbuntu980 Unity7 Refugee in Newest questions tagged mediawiki - Ask Ubuntu.

I need advice.

I have a webserver vm (LAN, not on the internet), it has 2 wikis:

http://lanwiki/GameWiki

http://lanwiki/HomeworkWiki

I want to wget only the homework wiki pages, without crawling into the GameWiki.

My goal is to just get the .html files (ignoring all other files, images, etc.) with wget. (I don't want to do a mysqldump or MediaWiki export, but rather a wget for my (non-IT) boss who just wants to double-click the html.)

How can I run wget to only crawl the HomeworkWiki, and not the GameWiki, on this VM?

Thanks


Using MediaWiki and external data, how can I show an image in a page, returned as a blob from a database?

Published 20 Jun 2017 by Masutatsu in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I'm creating a wiki (using MediaWiki) which pulls data from a mySQL instance, and uses this alongside a template to generate the page dynamically.

My mySQL instance contains images, stored in a field of type BLOB.

Is it possible for MediaWiki to interpret this BLOB data into the actual image desired to be shown on the page?


A typical week as a digital archivist?

Published 16 Jun 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Sometimes (admittedly not very often) I'm asked what I actually do all day. So at the end of a busy week being a digital archivist I've decided to blog about what I've been up to.

Monday

Today I had a couple of meetings, one specifically to talk about digital preservation of electronic theses submissions. I've also had a work experience student in this week, so have set up a metadata creation task which he has been busy working on.

When I had a spare moment I did a little more testing work on the EAD harvesting feature the University of York is jointly sponsoring Artefactual Systems to develop in AtoM. Testing this feature from my perspective involves logging into the test site that Artefactual has created for us and tweaking some of the archival descriptions. Once those descriptions are saved, I can take a peek at the job scheduler and make sure that new EAD files are being created behind the scenes for the Archives Hub to attempt to harvest at a later date.

This piece of development work has been going on for a few months now and communications have been technically quite complex so I'm also trying to ensure all the organisations involved are happy with what has been achieved and will be arranging a virtual meeting so we can all get together and talk through any remaining issues.

I was slightly surprised today to have a couple of requests to talk to the media. This has sprung from the news that the Queen's Speech will be delayed. One of the reasons for the delay relates to the fact that the speech has to be written on goat's skin parchment, which takes a few days to dry. I had previously been interviewed for an article entitled Why is the UK still printing its laws on vellum? and am now mistaken for someone who knows about vellum. I explained to potential interviewers that this is not my specialist subject!

Tuesday

In the morning I went to visit a researcher at the University of York. I wanted to talk to him about how he uses Google Drive in relation to his research. This is a really interesting topic to me right now as I consider how best we might be able to preserve current research datasets. Seeing how exactly Google Drive is used and what features the researcher considers to be significant (and necessary for reuse) is really helpful when thinking about a suitable approach to this problem. I sometimes think I work a little bit too much in my own echo chamber, so getting out and hearing different perspectives is incredibly valuable.

Later that afternoon I had an unexpected meeting with one of our depositors (well, there were two of them actually). I'd not met them before but have been working with their data for a little while. In our brief meeting it was really interesting to chat and see the data from a fresh perspective. I was able to reunite them with some digital files that they had created in the mid-1980s, had saved on to floppy disk and had not been able to access for a long time.

Digital preservation can be quite a behind the scenes sort of job - we always give a nod to the reason why we do what we do (ie: we preserve for future reuse), but actually seeing the results of that work unfold in front of your eyes is genuinely rewarding. I had rescued something from the jaws of digital obsolescence so it could now be reused and revitalised!

At the end of the day I presented a joint webinar for the Open Preservation Foundation called 'PRONOM in practice'. Alongside David Clipsham (The National Archives) and Justin Simpson (Artefactual Systems), I talked about my own experiences with PRONOM, particularly relating to file signature creation, and ending with a call to arms "Do try this at home!". It would be great if more of the community could get involved!

I was really pleased that the webinar platform worked OK for me this time round (always a bit stressful when it doesn't) and that I got to use the yellow highlighter pen on my slides.

In my spare moments (which were few and far between), I put together a powerpoint presentation for the following day...

Wednesday

I spent the day at the British Library in Boston Spa. I'd been invited to speak at a training event they regularly hold for members of staff who want to find out a bit more about digital preservation and the work of the team.

I was asked specifically to talk through some of the challenges and issues that I face in my work. I found this pretty easy - there are lots of challenges - and I eventually realised I had too many slides so had to cut it short! I suppose that is better than not having enough to say!

Visiting Boston Spa meant that I could also chat to the team over lunch and visit their lab. They had a very impressive range of old computers and were able to give me a demonstration of Kryoflux (which I've never seen in action before) and talk a little about emulation. This was a good warm up for the DPC event about emulation I'm attending next week: Halcyon On and On: Emulating to Preserve.

Still left on my to-do list from my trip is to download Teracopy. I currently use Foldermatch for checking that files I have copied have remained unchanged. From the quick demo I saw at the British Library I think that Teracopy would be a simpler, one-step solution. I need to have a play with this and then think about incorporating it into the digital ingest workflow.

Sharing information and collaborating with others working in the digital preservation field really is directly beneficial to the day to day work that we do!

Thursday

Back in the office today and a much quieter day.

I extracted some reports from our AtoM catalogue for a colleague and did a bit of work with our test version of Research Data York. I also met with another colleague to talk about storing and providing access to digitised images.

In the afternoon I wrote another powerpoint presentation, this time for a forthcoming DPC event: From Planning to Deployment: Digital Preservation and Organizational Change.

I'm going to be talking about our experiences of moving our Research Data York application from proof of concept to production. We are not yet in production and some of the reasons why will be explored in the presentation! Again I was asked to talk about barriers and challenges and again, this brief is fairly easy to fit! The event itself is over a week away so this is unprecedentedly well organised. Long may it continue!


Friday

On Fridays I try to catch up on the week just gone and plan for the week ahead as well as reading the relevant blogs that have appeared over the week. It is also a good chance to catch up with some admin tasks and emails.

Lunch time reading today was provided by William Kilbride's latest blog post. Some of it went over my head but the final messages around value and reuse and the need to "do more with less" rang very true.

Sometimes I even blog myself - as I am today!




Was this a typical week - perhaps not, but in this job there is probably no such thing! Every week brings new ideas, challenges and surprises!

I would say the only real constant is that I've always got lots of things to keep me busy.

Five minutes with Kylie Howarth

Published 7 Jun 2017 by carinamm in State Library of Western Australia Blog.

Kylie Howarth is an award-winning Western Australian author, illustrator and graphic designer. Original illustrations and draft materials from her most recent picture book 1, 2, Pirate Stew (Five Mile Press) are currently showing in The Story Place Gallery.

We spent some time hearing from Kylie Howarth about the ideas and inspiration behind her work. Here’s what she had to say…

PirateStew.jpg

1, 2, Pirate Stew is all about the power of imagination and the joys of playing in a cardboard box. How do your real life experiences influence your picture book ideas? What role does imagination play?

The kids and I turned the box from our new BBQ into a pirate ship. We painted it together and made anchors, pirate hats and oars. They loved it so much they played in it every day for months… and so the idea for 1, 2, Pirate Stew was born. It eventually fell apart and so did our hot water system, so we used that box to build a rocket. Boxes live long lives around our place. I also cut them up and take them to school visits to do texture rubbings with the students.

Your illustrations for 1, 2, Pirate Stew are unique in that they incorporate painted textures created during backyard art sessions with your children. What encouraged you to do this? How do your children’s artworks inspire you?

I just love children’s paintings. They have an energy I find impossible to replicate. Including them in my book illustrations encourages kids to feel their art is important and that they can make books too. Kids sometimes find highly realistic illustrations intimidating and feel they could never do it themselves. During school and library visits, they love seeing the original finger paintings and potato stamp prints that were used in my books.
PirateStew_Sketch.JPG

Through digital illustration you have blended hand drawings with painted textures. How has your background and training as a graphic designer influenced your illustrative style?

Being a graphic designer has certainly influenced the colour and composition of my illustrations - in 1, 2, Pirate Stew, particularly the use of white space. Many illustrators and designers are afraid of white space but it can be such an effective tool; it allows the book to breathe. The main advantage though is that I have been able to design all my own book covers, select fonts and arrange the text layout.

Sometimes ideas for picture books evolve and change a lot when working with the publisher. Sometimes the ideas don’t change much at all. What was your experience when creating 1, 2, Pirate Stew? Was it similar or different to your previous books Fish Jam and Chip?

I worked with a fabulous editor, Karen Tayleur, on all three books. We tweaked the text for Fish Jam and Chip a little to make them sing as best we could. With 1, 2, Pirate Stew, however, the text was based on the old nursery rhyme 1, 2, Buckle My Shoe. So there was little room to move as I was constrained to a limited number of syllables and each line had to rhyme. I think we only added one word. I did however further develop the illustrations from my original submission. Initially the characters’ faces were a little more stylised so I refined them to be more universal. Creating the mini 3D character model helped me get them looking consistent from different angles throughout the book. I also took many photographs of my boys to sketch from.

1, 2, Pirate Stew – an exhibition is on display at the State Library of Western Australia until 22 June 2017. The exhibition is part of a series showcasing the diverse range of illustrative styles in picture books published by Western Australian authors and illustrators. For more information go to http://www.slwa.wa.gov.au


Filed under: Children's Literature, Exhibitions, Illustration, SLWA displays, SLWA Exhibitions, SLWA news Tagged: 1 2 Pirate Stew, Five Mile Press, Kylie Howarth, State Library of Western Australia, State Library WA, Story Place Gallery, WA authors, WA illustrators

v2.4.3

Published 7 Jun 2017 by fabpot in Tags from Twig.


v1.34.3

Published 7 Jun 2017 by fabpot in Tags from Twig.


MediaWiki fails to show Ambox

Published 7 Jun 2017 by lucamauri in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I am writing you about the use of Template:Ambox in MediaWiki.

I have a hosted MediaWiki 1.28 installation that apparently works well at everything, but I can't get the boxes explained here https://www.mediawiki.org/wiki/Template:Ambox to work properly.

As a test I implemented in this page http://www.lucamauri.net/wikitrek/index.php?title=Pasticci the following code:

{{ambox
| type       = notice
| text       = Text for a big box, for the top of articles.
| smalltext  = Text for the top of article sections.
}}

and I expected a nice box to show up. Instead I simply see the text Template:Ambox shown at the top of the page.
It seems like this template is not defined in my MediaWiki, but, as far as I understood, it is built in, and in all the examples I saw it seems it should work out of the box.

I guess I miss something basic here, but it really escapes me: any help you might provide will be appreciated.

Thanks

Luca


v2.4.2

Published 5 Jun 2017 by fabpot in Tags from Twig.


v1.34.2

Published 5 Jun 2017 by fabpot in Tags from Twig.


v2.4.1

Published 5 Jun 2017 by fabpot in Tags from Twig.


v1.34.1

Published 5 Jun 2017 by fabpot in Tags from Twig.


v2.4.0

Published 5 Jun 2017 by fabpot in Tags from Twig.


Ted Nelson’s Junk Mail (and the Archive Corps Pilot)

Published 31 May 2017 by Jason Scott in ASCII by Jason Scott.

I’ve been very lucky over the past few months to dedicate a few days here and there to helping legend Ted Nelson sort through his archives. We’ve known each other for a bunch of years now, but it’s always a privilege to get a chance to hang with Ted and especially to help him with auditing and maintaining his collection of papers, notes, binders, and items. It also helps that it’s in pretty fantastic shape to begin with.

Along with sorting comes some discarding – mostly old magazines and books; they’re being donated wherever it makes sense to. Along with these items were junk mail that Ted got over the decades.

About that junk mail….

After glancing through it, I requested to keep it and take it home. There was a lot of it, and even going through it with a cursory view showed me it was priceless.

There’s two kinds of people in the world – those who look at ephemera and consider it trash, and those who consider it gold.

I’m in the gold camp.

I’d already been doing something like this for years, myself – when I was a teenager, I circled so many reader service cards and pulled in piles and piles of flyers and mailings from companies so fleeting or so weird, and I kept them. These became digitize.textfiles.com and later the reader service collection, which encapsulates digitize.textfiles.com completely. There’s well over a thousand pages in that collection, which I’ve scanned myself.

Ted, basically, did what I was doing, but with more breadth, more variety, and with a few decades more time.

And because he was always keeping an eye out on many possibilities for future fields of study, he kept his mind (and mailbox) open to a lot of industries. Manufacturing, engineering, film-making, printing, and of course “computers” as expressed in a thousand different ways. The mail dates from the 1960s through to the mid 2000s, and it’s friggin’ beautiful.

Here’s where it gets interesting, and where you come in.

There’s now a collection of scanned mail from this collection up at the Internet Archive. It’s called Ted Nelson’s Junk Mail and you can see the hundreds of scanned pages that will soon become thousands and maybe tens of thousands of scanned pages.

They’re separated by mailing, and over time the metadata and the contents will get better, increase in size, and hopefully provide decades of enjoyment for people.

The project is being coordinated by Kevin Savetz, who has hired a temp worker to scan in the pages across each weekday, going through the boxes and doing the “easy” stuff (8.5×11 sheets) which, trust me, is definitely worth going through first. As they’re scanned, they’re uploaded, and (for now) I am running scripts to add them as items to the Junk Mail collection.
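
(For anyone curious about the mechanics: this is a sketch of what adding one scanned mailing as an item can look like with the Internet Archive's "ia" command-line client. The identifier, filename and metadata values are made up, and this is not necessarily the exact script in use.)

ia upload ted-nelson-junk-mail-example-1979 scan-0001.pdf \
   --metadata="title:Example mailing (1979)" \
   --metadata="mediatype:texts" \
   --metadata="collection:test_collection"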

The cost of doing this is roughly $80 a day, during which hundreds of pages can be scanned. We’re refining the process as we go, and expect it to get even more productive over time.

So, here’s where Archive Corps comes in; this is a pilot program for the idea behind the new idea of Archive Corps, which is providing a funnel for all the amazing stuff out there to get scanned. If you want to see more stuff come from the operation that Kevin is running, he has a paypal address up at k@savetz.com – the more you donate the more days we are able to have the temp come in to scan.

I’m very excited to watch this collection grow, and see the massive variety of history that it will reveal. A huge thank-you to Ted Nelson for letting me take these items, and a thank-you to Kevin Savetz for coordinating.

Let’s enjoy some history!


Local illustration showcase

Published 30 May 2017 by carinamm in State Library of Western Australia Blog.

From digital illustration to watercolor painting and screen-printing, three very different styles of illustration highlight the diversity and originality of picture books published this year. 

In a series of exhibitions, The Story Place Gallery will showcase original artwork by Western Australian illustrators from the picture books 1, 2, Pirate Stew (Five Mile Press 2017), One Thousand Trees and Colour Me (Fremantle Press 2017).

PirateStew_PromotionalImage.JPG

7, 8, he took the bait © Kylie Howarth 2017

In 1, 2, Pirate Stew, Kylie Howarth has used a digital illustration process to merge her drawings, created using water-soluble pencils, with background textures painted by her two adventurous children Beau and Jack. Kylie Howarth’s playful illustrations in gentle colours, together with her entertaining rhyming verse, take readers on an imaginative adventure all about the joys of playing in a cardboard box. Illustrations from 1, 2, Pirate Stew are on display from 26 May – 22 June.

Beneath_KyleHughesOdgers_OneThousandTrees.JPG

Among © Kyle Hughes-Odgers 2017

Kyle Hughes-Odgers’ distinctive illustrations blend geometric shapes, patterns and forms. In his watercolour illustrations for One Thousand Trees, he uses translucent colours and a restricted colour palette to explore the relationship between humankind and the environment. Shades of green-browns and grey-blues emphasise contrasts between urban and natural scenes. Kyle Hughes-Odgers places the words of the story within his illustrations to accentuate meaning. One Thousand Trees is on display from 24 June to 23 July.

ColourMe_MoiraCourt_PromoImage.JPG

If I was red © Moira Court

Moira Court’s bold illustrations for the book Colour Me (written by Ezekiel Kwaymullina) were created using a woodcut and screen-printing technique. Each final illustration is made from layers of silk screen prints created using hand-cut paper stencils and transparent ink. Each screen print was then layered with a patchy, textural woodcut or linoleum print. Colours were printed one at a time to achieve a transparent effect. The story celebrates the power of each individual colour, as well as the power of their combination. Colour Me is on display from 26 July – 16 August.

Each exhibition in this series is curated especially for children and is accompanied by a story sharing area, a self-directed activity, and discussion prompters for families.


Filed under: Children's Literature, community events, Exhibitions, Illustration, SLWA displays, SLWA Exhibitions, SLWA news

A Lot of Doing

Published 28 May 2017 by Jason Scott in ASCII by Jason Scott.

If you follow this weblog, you saw there was a pause of a couple months. I’ve been busy! Better to do than to talk about doing.

A flood of posts are coming – they reflect accomplishments and thoughts of the last period of time, so don’t be freaked out as they pop up in your life very quickly.

Thanks.


TV Interview on Stepping Off

Published 26 May 2017 by Tom Wilson in thomas m wilson.


Simon 0.4.90 beta released

Published 20 May 2017 by fux in blogs.kde.org blogs.

KDE Project:

The second version (0.4.90) towards Simon 0.5.0 is out in the wild. Please download the source code, test it and send us feedback.

What we changed since the alpha release:

  • Bugfix: The download of Simon Base Models works again flawlessly (bug: 377968)
  • Fix detection of utterid APIs in Pocketsphinx

You can get it here:
https://download.kde.org/unstable/simon/0.4.90/simon-0.4.90.tar.xz.mirrorlist

Also in the works is an AppImage version of Simon for easy testing. We hope to deliver one for the Beta release coming soon.

Known issues with Simon 0.4.90 are:

  • Some Scenarios available for download don't work anymore (BUG: 375819)
  • Simon can't add Arabic or Hebrew words (BUG: 356452)

We hope to fix these bugs and look forward to your feedback and bug reports and maybe to see you at the next Simon IRC meeting: Tuesday, 23rd of May, at 10pm (UTC+2) in #kde-accessibility on freenode.net.

About Simon
Simon is an open source speech recognition program that can replace your mouse and keyboard. The system is designed to be as flexible as possible and will work with any language or dialect. For more information take a look at the Simon homepage.


All Piwigo.com accounts updated to version 2.9

Published 20 May 2017 by Pierrick Le Gall in The Piwigo.com Blog.

17 days after Piwigo 2.9.0 was released and 4 days after we started to update Piwigo.com, all accounts are now up-to-date.

Piwigo 2.9 and new design on administration pages

Piwigo 2.9 and new design on administration pages

As you will learn from the release notes, your history will now be automatically purged to keep “only” the last 1 million lines. Yes, some of you, 176 to be exact, have more than 1 million lines, with a record set at 27 million lines!


Join us at Akademy 2017 in Almería!

Published 19 May 2017 by eike hein in blogs.kde.org blogs.

KDE Project:

This July KDE's user and developer community is once again going to come together at Akademy, our largest annual gathering.

I'm going there this year as well, and you'll even be able to catch me on stage giving a talk on Input Methods in Plasma 5. Here's the talk abstract to hopefully whet your appetite:


An overview over the How and Why of Input Methods support (including examples of international writing systems, emoji and word completion) in Plasma on both X11 and Wayland, its current status and challenges, and the work ahead of us.

Text input is the foundational means of human-computer interaction: We configure our systems, program them, and express ourselves through them by writing. Input Methods help us along by converting hardware events into text - complex conversion being a requirement for many international writing systems, new writing systems such as emoji, and at the heart of assistive text technologies such as word completion and spell-checking.

This talk will illustrate the application areas for Input Methods by example, presenting short introductions to several international writing systems as well as emoji input. It will explain why solid Input Methods support is vital to KDE's goal of inclusivity and how Input Methods can make the act of writing easier for all of us.

It will consolidate input from the Input Methods development and user community to provide a detailed overview over the current Input Methods technical architecture and user experience in Plasma, as well as free systems in general. It will dive into existing pain points and present both ongoing work and plans to address them.


This will actually be the first time I'm giving a presentation at Akademy! It's a topic close to my heart, and I hope I can do a decent job conveying a snapshot of all the great and important work people are doing in this area to your eyes and ears.

See you there!


Wikimedia Commons Android App Pre-Hackathon

Published 19 May 2017 by addshore in Addshore.

Wikimedia Commons Logo

The Wikimedia Commons Android App allows users to upload photos to Commons directly from their phone.

The website for the app details some of the features and the code can be found on GitHub.

A hackathon was organized in Prague to work on the app in the run up to the yearly Wikimedia Hackathon which is in Vienna this year.

A group of 7 developers worked on the app over a few days and, as well as meeting and learning from each other, they also managed to work on various improvements, which I have summarised below.

2 factor authentication (nearly)

Work has been done towards allowing 2fa logins to the app.

Lots of the login & authentication code has been refactored and the app now uses the clientlogin API module provided by MediaWiki instead of the older login module.

When building in debug mode, the 2fa input box will appear if you have 2fa login enabled; however, the current production build will not show this box and simply displays a message saying that 2fa is not currently supported. This is due to a small amount of session-handling work that the app still needs.
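
For the curious, here is a rough sketch of what a clientlogin exchange with 2fa looks like at the API level. The URL, credentials and token values are placeholders, and the exact prompts depend on the wiki's configured authentication requests:

# 1. Fetch a login token and start a session
curl -c cookies.txt "https://commons.wikimedia.org/w/api.php?action=query&meta=tokens&type=login&format=json"

# 2. Start the login; with 2fa enabled the API answers with status "UI"
#    and asks for the one-time code
curl -b cookies.txt -c cookies.txt "https://commons.wikimedia.org/w/api.php" \
     --data-urlencode "action=clientlogin" \
     --data-urlencode "username=ExampleUser" \
     --data-urlencode "password=ExamplePassword" \
     --data-urlencode "logintoken=TOKEN_FROM_STEP_1" \
     --data-urlencode "loginreturnurl=https://example.org/" \
     --data-urlencode "format=json"

# 3. Continue the login by supplying the TOTP code
curl -b cookies.txt -c cookies.txt "https://commons.wikimedia.org/w/api.php" \
     --data-urlencode "action=clientlogin" \
     --data-urlencode "logincontinue=1" \
     --data-urlencode "logintoken=TOKEN_FROM_STEP_1" \
     --data-urlencode "OATHToken=123456" \
     --data-urlencode "format=json"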

Better menu & Logout

As development on the app was fairly non-existent between mid-2013 and 2016, the UI generally fell behind. This is visible in forms and buttons, as well as the overall app layout.

One significant push was made to drop the old-style ‘burger’ menu from the top right of the app and replace it with a new slide-out menu drawer, including a feature image and icons for menu items.

Uploaded images display limit

Some users have run into issues with the number of upload contributions that the app loads by default in the contributions activity. The default has always been 500, and this can cause memory exhaustion (OOM) and a crash on some memory-limited phones.

In an attempt to fix this and generally speed up the app, a recent uploads limit has been added to the settings which limits the number of images and image details that are displayed; however, the app will still fetch and store more than this on the device.

Nearby places enhancements

The nearby places enhancements probably account for the largest portion of development time at the pre-hackathon. The app has always had a list of nearby places that don't have images on Commons, but now the app also has a map!

The map is powered by the Mapbox SDK and the current beta uses the Mapbox tiles; however, part of the plan for the Vienna hackathon is to switch this to the Wikimedia-hosted map tiles at https://maps.wikimedia.org.

The map also contains clickable pins that provide a small pop up pulling information from Wikidata including the label and description of the item as well as providing two buttons to get directions to the place or read the Wikipedia article.

Image info coordinates & image date

Extra information has also been added to the image details view, and the date and coordinates of the image can now be seen in the app.

Summary of hackathon activity

The contributions and authors that worked on the app during the pre-hackathon can be found on GitHub at the following link.

Roughly 66 commits were made between the 11th and 19th of May 2017 by 9 contributors.

Screenshot Gallery


AtoM Camp take aways

Published 12 May 2017 by Jenny Mitcham in Digital Archiving at the University of York.

The view from the window at AtoM Camp ...not that there was any time to gaze out of the window of course...
I’ve spent the last three days in Cambridge at AtoM Camp. This was the second ever AtoM Camp, and the first in Europe. A big thanks to St John’s College for hosting it and to Artefactual Systems for putting it on.

It really has been an interesting few days, with a packed programme and an engaged group of attendees from across Europe and beyond bringing different levels of experience with AtoM.

As a ‘camp counsellor’ I was able to take to the floor at regular intervals to share some of our experiences of implementing AtoM at the Borthwick, covering topics such as system selection, querying the MySQL database, building the community and overcoming implementation challenges.

However, I was also there to learn!

Here are some bits and pieces that I’ve taken away.

My first real take away is that I now have a working copy of the soon to be released AtoM 2.4 on my Macbook - this is really quite cool. I'll never again be bored on a train - I can just fire up Ubuntu and have a play!

Walk to Camp takes you over Cambridge's Bridge of Sighs
During the camp it was great to be able to hear about some of the new features that will be available in this latest release.

At the Borthwick Institute our catalogue is still running on AtoM 2.2 so we are pretty excited about moving to 2.4 and being able to take advantage of all of this new functionality.

Just some of the new features I learnt about that I can see an immediate use case for are:



On day two of camp I enjoyed the implementation tours, seeing how other institutions have implemented AtoM and the tweaks and modifications they have made. For example it was interesting to see the shopping cart feature developed for the Mennonite Archival Image Database and the most popular image carousel feature on the front page of the Chinese Canadian Artifacts Project. I was also interested in some of the modifications the National Library of Wales have made to meet their own needs.

It was also nice to hear the Borthwick Catalogue described by Dan as “elegant”!


There was a great session on community and governance at the end of day two which was one of the highlights of the camp for me. It gave attendees the chance to really understand the business model of Artefactual (as well as alternatives to the bounty model in use by other open source projects). We also got a full history of the evolution of AtoM and saw the very first project logo and vision.

The AtoM vision hasn't changed too much but the name and logo have!

Dan Gillean from Artefactual articulated the problem of trying to get funding for essential and ongoing tasks, such as code modernisation. Two examples he used were updating AtoM to work with the latest version of Symfony and Elasticsearch - both of these tasks need to happen in order to keep AtoM moving in the right direction but both require a substantial amount of work and are not likely to be picked up and funded by the community.

I was interested to see Artefactual’s vision for a new AtoM 3.0 which would see some fundamental changes to the way AtoM works and a more up-to-date, modular and scalable architecture designed to meet the future use cases of the growing AtoM community.

Artefactual's proposed modular architecture for AtoM 3.0

There is no time line for AtoM 3.0, and whether it goes ahead or not is entirely dependent on a substantial source of funding being available. It was great to see Artefactual sharing their vision and encouraging feedback from the community at this early stage though.

Another highlight of Camp:
a tour of the archives of St John's College from Tracy Deakin
A session on data migrations on day three included a demo of OpenRefine from Sara Allain from Artefactual. I’d heard of this tool before but wasn’t entirely sure what it did and whether it would be of use to me. Sara demonstrated how it could be used to bash data into shape before import into AtoM. It seemed to be capable of doing all the things that I’ve previously done in Excel (and more) ...but without so much pain. I’ll definitely be looking to try this out when I next have some data to clean up.

Dan Gillean and Pete Vox from IMAGIZ talked through the process of importing data into AtoM. Pete focused on an example from Croydon Museum Service, whose data needed to be migrated from CALM. He talked through some of the challenges of the task and how he would approach this differently in future. It is clear that the complexities of data migration may be one of the biggest barriers to institutions moving to AtoM from an alternative system, but it was encouraging to hear that none of these challenges are insurmountable.

My final take away from AtoM Camp is a long list of actions - new things I have learnt that I want to read up on or try out for myself ...I best crack on!





Permission denied for files in www-data

Published 11 May 2017 by petergus in Newest questions tagged mediawiki - Ask Ubuntu.

I have image files being uploaded with MediaWiki, and they are getting www-data set as their owner. Viewing the files results in 403 Forbidden. (All other site files are owned by SITE_USER.)

SITE_USER and www-data are both in each other's (secondary) groups.

What am I missing here?

EDIT: My Apache directives

DocumentRoot "/home/SITE_USER/public_html/en.domain.org"
ServerName en.domain.org
# Alias for Wiki so images work
Alias /images "/home/SITE_USER/public_html/mediawiki/sites/images"    
<Directory "/home/SITE_USER/public_html/en.domain.org">
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^(.*)$ %{DOCUMENT_ROOT}//index.php [L]
## http://www.mediawiki.org/wiki/Manual:Short_URL/Apache
# Enable the rewrite engine
RewriteEngine On
# Short url for wiki pages
RewriteRule ^/?wiki(/.*)?$ %{DOCUMENT_ROOT}/index.php [L]
# Redirect / to Main Page
RewriteRule ^/*$ %{DOCUMENT_ROOT}/index.php [L]
#
Options -Indexes +SymLinksIfOwnerMatch
allow from all
AllowOverride All Options=ExecCGI,Includes,IncludesNOEXEC,Indexes,MultiViews,SymLinksIfOwnerMatch
Require all granted
</Directory>
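
A hedged guess at the likely culprit (not a confirmed diagnosis): the Alias target /home/SITE_USER/public_html/mediawiki/sites/images sits outside the <Directory> block above, so Apache never grants access to it. A sketch of an extra section that would cover it:

<Directory "/home/SITE_USER/public_html/mediawiki/sites/images">
    # Grant access to the aliased upload directory (Apache 2.4 syntax)
    Options -Indexes
    AllowOverride None
    Require all granted
</Directory>

If a 403 persists after that, it is also worth checking that every directory along the path is at least group-executable for www-data and that the uploaded files are group-readable (for example directories 2775 and files 0664), since the uploads are owned by www-data while the rest of the tree belongs to SITE_USER.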

Maintenance report of April 28th 2017

Published 11 May 2017 by Pierrick Le Gall in The Piwigo.com Blog.

Piwigo.com clients have already received this message. Many users told us they were happy to receive such details about our technical operations, so let’s make it more “public” with a blog post!

A. The short version

On April 27th 2017, we replaced one of Piwigo.com's main servers. The replacement itself was successful. No downtime. The read-only mode lasted only 7 minutes, from 6:00 to 6:07 UTC.

While sending the notification email to our clients, we encountered difficulties with Gmail users. Solving this Gmail issue made the website unavailable for a few users for maybe an hour. Everything was back to normal in a few hours. Of course, no data was lost during this operation.

The new server and Piwigo are now good friends. They both look forward to receiving version 2.9 in the next few days 😉

B. Additional technical details

The notification message had already been sent to the first 390 users when we realized emails sent to Gmail addresses were returned in error. Indeed, Gmail now asks for a “reverse DNS” record on IPv6. Sorry for this very technical detail. We already had one on the old server, so we added it on the new server. And then the problems started… Unfortunately the new server does not manage IPv6 the same way. A few users, on IPv6, told us they only saw the “Apache2 Debian Default Page” instead of their Piwigo. Here is the timeline:

Unfortunately adding or removing an IPv6 record is not an immediate action. It relies on “DNS propagation”, which may take a few hours, depending on each user's resolver.
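
(For readers who want to check this sort of thing themselves, a small illustrative sketch using a documentation-only address; real hostnames and addresses will differ:)

# Look up the reverse DNS (PTR) record of an IPv6 address
dig -x 2001:db8::1 +short

# Ask a specific public resolver for the current AAAA record, to gauge propagation
dig AAAA piwigo.com @8.8.8.8 +short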

We took the rest of the day to figure out how to make Gmail accept our emails and let web visitors see your Piwigo. Instead of “piwigo.com”, we now use a sub-domain of “pigolabs.com” (Pigolabs is the company running the Piwigo.com service) with an IPv6 address: no impact on web traffic.

We also have a technical solution to handle IPv6 for web traffic. We have decided not to use it because IPv6 lacks an important feature, the FailOver. This feature, only available on IPv4, lets us redirect web traffic from one server to another in a few seconds without worrying about DNS propagation. We use it when a server fails and web traffic goes to a spare server.

In the end, the move did not go so well and we sweated quite a bit that Friday, but everything came back to normal and the “Apache2 Debian Default Page” issue eventually affected only a few people!


At the J Shed

Published 7 May 2017 by Dave Robertson in Dave Robertson.

We can’t wait to play here again soon… in June… stay tuned! Photo by Alex Chapman


Semantic MediaWiki require onoi/callback-container, but it can't be installed

Published 5 May 2017 by Сергей Румянцев in Newest questions tagged mediawiki - Server Fault.

I'm trying to install the latest release of Semantic MediaWiki. When I run composer update, it returns the following:

> ComposerHookHandler::onPreUpdate
Loading composer repositories with package information
Updating dependencies (including require-dev)
Your requirements could not be resolved to an installable set of packages.

  Problem 1
    - mediawiki/semantic-media-wiki 2.4.x-dev requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.6 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.5 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.4 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.3 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.2 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - mediawiki/semantic-media-wiki 2.4.1 requires onoi/callback-container ~1.0 -> satisfiable by onoi/callback-container[1.0.0, 1.1.0] but these conflict with your requirements or minimum-stability.
    - Installation request for mediawiki/semantic-media-wiki ~2.4.1 -> satisfiable by mediawiki/semantic-media-wiki[2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.x-dev].

I have even set minimum-stability to dev and even prefer-stable to false. Nothing resolves.

This is not the first problem with Composer. It previously returned an error because no version was set in the mediawiki/core package, which this SMW still required. But that is not the issue this time, surprisingly.

And Composer doesn't see the package in composer show onoi/callback-container, even though there is a stable version 2.0.
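
(A couple of stock Composer commands that can help diagnose this kind of conflict - a sketch only, to be run from the MediaWiki root:)

# Show which installed packages depend on the library, and with what constraints
composer depends onoi/callback-container

# Show what is blocking the requested Semantic MediaWiki constraint
composer why-not mediawiki/semantic-media-wiki "~2.4.1"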


After upgrade to 14.04 I get "You don't have permission to access /wiki/ on this server."

Published 3 May 2017 by Finn Årup Nielsen in Newest questions tagged mediawiki - Ask Ubuntu.

After a dist-upgrade to 14.04 I get "You don't have permission to access /wiki/ on this server." for a MediaWiki installation that uses an alias. /w/index.php is also failing.

So far I have seen a difference in configuration between 12.04 and 14.04 and I did

cd /etc/apache2/sites-available
sudo ln -s ../sites-available/000-default.conf .

This fixed other problems, but not the MediaWiki problem.
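
Likely relevant background, offered as a hedged sketch rather than a confirmed fix: the upgrade to 14.04 also moves Apache from 2.2 to 2.4, where the old "Order allow,deny" / "Allow from all" rules no longer grant access, and the directory behind the /wiki/ alias needs the new-style directive (the path below is a placeholder for wherever the alias actually points):

<Directory /var/www/mediawiki>
    # Apache 2.4 replacement for "Order allow,deny" + "Allow from all"
    Require all granted
</Directory>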


How can we preserve Google Documents?

Published 28 Apr 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Last month I asked (and tried to answer) the question How can we preserve our wiki pages?

This month I am investigating the slightly more challenging issue of how to preserve native Google Drive files, specifically documents*.

Why?

At the University of York we work a lot with Google Drive. We have G Suite for Education (formerly known as Google Apps for Education) and as part of this we have embraced Google Drive and it is now widely used across the University. For many (me included) it has become the tool of choice for creating documents, spreadsheets and presentations. The ability to share documents and collaborate directly is key.

So of course it is inevitable that at some point we will need to think about how to preserve them.

How hard can it be?

Quite hard actually.

The basic problem is that documents created in Google Drive are not really "files" at all.

The majority of the techniques and models that we use in digital preservation are based around the fact that you have a digital object that you can see in your file system, copy from place to place and package up into an Archival Information Package (AIP).

In the digital preservation community we're all pretty comfortable with that way of working.

The key challenge with stuff created in Google Drive is that it doesn't really exist as a file.

Always living in hope that someone has already solved the problem, I asked the question on Twitter and that really helped with my research.

Isn't the digital preservation community great?

Exporting Documents from Google Drive

I started off testing the different download options available within Google docs. For my tests I used 2 native Google documents. One was the working version of our Phase 1 Filling the Digital Preservation Gap report. This report was originally authored as a Google doc, was 56 pages long and consisted of text, tables, images, footnotes, links, formatted text, page numbers, colours etc (ie: lots of significant properties I could assess). I also used another more simple document for testing - this one was just basic text and tables but also included comments by several contributors.

I exported both of these documents into all of the different export formats that Google supports and assessed the results, looking at each characteristic of the document in turn and establishing whether or not I felt it was adequately retained.

Here is a summary of my findings, looking specifically at the Filling the Digital Preservation Gap phase 1 report document:


...but what about the comments?

My second test document was chosen so I could look specifically at the comments feature and how these were retained (or not) in the exported version.

  • docx - Comments are exported. On first inspection they appear to be anonymised, however this seems to be just how they are rendered in Microsoft Word. Having unzipped and dug into the actual docx file and looked at the XML file that holds the comments, it is clear that a more detailed level of information is retained - see images below. The placement of the comments is not always accurate. In one instance the reply to a comment is assigned to text within a subsequent row of the table rather than to the same row as the original comment.
  • odt - Comments are included, are attributed to individuals and have a date and time. Again, matching up of comments with the right section of text is not always accurate - in one instance a comment and its reply are linked to the table cell underneath the one that they referenced in the original document.
  • rtf - Comments are included but appear to be anonymised when displayed in MS Word...I haven't dug around enough to establish whether or not this is just a rendering issue.
  • txt - Comments are retained but appear at the end of the document with a [a], [b] etc prefix - these letters appear in the main body text to show where the comments appeared. No information about who made the comment is preserved.
  • pdf - Comments not exported
  • epub - Comments not exported
  • html - Comments are present but appear at the end of the document with a code which also acts as a placeholder in the text where the comment appeared. References to the comments in the text are hyperlinks which take you to the right comment at the bottom of the document. There is no indication of who made the comment (not even hidden within the html tags).

A comment in original Google doc

The same comment in docx as rendered by MS Word

...but in the XML buried deep within the docx file structure - we do have attribution and date/time
(though clearly in a different time zone)

What about bulk export options?

Ed Pinsent pointed me to the Google Takeout Service which allows you to:
"Create an archive with your data from Google products"
[Google's words not mine - and perhaps this is a good time to point you to Ed's blog post on the meaning of the term 'Archive']

This is really useful. It allows you to download Google Drive files in bulk and to select which formats you want to export them as.

I tested this a couple of times and was surprised to discover that if you select pdf or docx (and perhaps other formats that I didn't test) as your export format of choice, the takeout service creates the file in the format requested and an html file which includes all comments within the document (even those that have been resolved). The content of the comments/responses including dates and times is all included within the html file, as are names of individuals.

The downside of the Google Takeout Service is that it only allows you to select folders and not individual files. There is another incentive for us to organise our files better! The other issue is that it will only export documents that you are the owner of - and you may not own everything that you want to archive!

What's missing?

Quite a lot actually.

The owner, creation and last modified dates of a document in Google Drive are visible when you click on Document details... within the File menu. Obviously this is really useful information for the archive but is lost as soon as you download it into one of the available export formats.

Creation and last modified dates as visible in Document details

Update: I was pleased to see that if using the Google Takeout Service to bulk export files from Drive, the last modified dates are retained; however, on single file export/download these dates are lost and the last modified date of the file becomes the date that you carried out the export.

Part of the revision history of my Google doc
But of course in a Google document there is more metadata. Similar to the 'Page History' that I mentioned when talking about preserving wiki pages, a Google document has a 'Revision history'

Again, this *could* be useful to the archive. Perhaps not so much so for my document which I worked on by myself in March, but I could see more of a use case for mapping and recording the creative process of writing a novel for example. 

Having this revision history would also allow you to do some pretty cool stuff such as that described in this blog post: How I reverse engineered Google Docs to play back any documents Keystrokes (thanks to Nick Krabbenhoft for the link).

It would seem that the only obvious way to retain this information would be to keep the documents in their original native Google format within Google Drive but how much confidence do we have that it will be safe there for the long term?

Conclusions

If you want to preserve a Google Drive document there are several options but no one-size-fits-all solution.

As always it boils down to what the significant properties of the document are. What is it we are actually trying to preserve?

  • If we want a fairly accurate but non-interactive digital 'print' of the document, pdf might be the most accurate representation, though even the pdf export can't be relied on to retain the exact pagination. Note that I didn't try and validate the pdf files that I exported and sadly there is no pdf/a export option.
  • If comments are seen to be a key feature of the document then docx or odt will be a good option but again this is not perfect. With the test document I used, comments were not always linked to the correct point within the document.
  • If it is possible to get the owner of the files to export them, the Google Takeout Service could be used. Perhaps creating a pdf version of the static document along with a separate html file to capture the comments.

A key point to note is that all export options are imperfect so it would be important to check the exported document against the original to ensure it accurately retains the important features.
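
For anyone scripting this, here is a hedged sketch of how the Drive REST API can be used to grab both an export and the dates that a single-file download throws away. FILE_ID and the OAuth access token are placeholders:

# Capture the metadata that single-file export loses (owner, created, modified)
curl -H "Authorization: Bearer $ACCESS_TOKEN" \
  "https://www.googleapis.com/drive/v3/files/FILE_ID?fields=name,owners,createdTime,modifiedTime"

# Export the document itself, here as PDF (other MIME types work the same way)
curl -H "Authorization: Bearer $ACCESS_TOKEN" \
  "https://www.googleapis.com/drive/v3/files/FILE_ID/export?mimeType=application/pdf" \
  -o report.pdf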

Another option would be simply keeping them in their native format but trying to get some level of control over them - taking ownership and managing sharing and edit permissions so that they can't be changed. I've been speaking to one of our Google Drive experts in IT about the logistics of this. A Google Team Drive belonging to the Archives could be used to temporarily store and lock down Google documents of archival value whilst we wait and see what happens next. 

...I live in hope that export options will improve in the future.

This is a work in progress and I'd love to find out what others think.




* note, I've also been looking at Google Sheets and that may be the subject of another blog post


Security updates 1.2.5, 1.1.9 and 1.0.11 released

Published 27 Apr 2017 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

We just published updates to all stable versions 1.x delivering important bug fixes and improvements which we picked from the upstream branch.

The updates primarily fix a recently discovered vulnerability in the virtualmin and sasl drivers of the password plugin (CVE-2017-8114). Security-wise the update is therefore only relevant for those installations of Roundcube using the password plugin with either one of these drivers.

See the full changelog for the according version in the release notes on the Github download pages: v1.2.5, v1.1.9, v1.0.11

All versions are considered stable and we recommend updating all productive installations of Roundcube to one of these versions.

As usual, don’t forget to backup your data before updating!


Legal considerations regarding hosting a MediaWiki site

Published 27 Apr 2017 by Oliver K in Newest questions tagged mediawiki - Webmasters Stack Exchange.

What legal considerations are there when creating a wiki using MediaWiki for people to use worldwide?

For example, I noticed there are privacy policies & terms and conditions; are these required to safeguard me from any legal battles?


Things I hope you learn in GLAM School

Published 27 Apr 2017 by inthemailbox in In the mailbox.

I’ve just realised that I haven’t blogged for a very long time, so lest you think me moribund, it’s time to start typing. I have a few things I want to say about collections software and the GLAMPeak project, as well as pulling some thoughts together on the Open Government initiative, so there will be some slightly more professional blogposts after this, I promise.

But today, to get the writing process back underway, I’m going to munge together two #GLAMBlogClub topics – hope, and what I wish they’d taught me in GLAM School. It’s been a few years since I was in GLAM school, but not that long since I left teaching. Reading through the blogs, though, reminded me very much of that long distant self, who wrote a letter to her lecturer, the lovely Peter Orlovich, bemoaning the gap between practice and theory. I also wrote one to the WA Museums Australia co-ordinator, Stephen Anstey, when I could not get a job for love or money.  And they basically said this:

It’s just not possible to learn all the things, all the technical details or peculiar ways that people reinvent the wheel, in just three or four, or one or two years. What you can learn, and what we hope you learn, is how to learn. GLAM school should provide you with a fundamental structure for understanding and implementing theory in practical ways.  The basic theoretical foundations for archival or library description, museum collection management or art history will remain, even as new theoretical concepts are added that build on what we know from the past. The way we implement those concepts will depend on our collections, our resources, our own strengths and weaknesses, but if you can learn, you can change, grow and adapt.

Be bold in your choices. GLAM school, like any good school, will have taught you how to read, research and analyse content. It will teach you how to express yourself in a range of communication styles and platforms. The tests and stresses that you experience at GLAM school will help you temper the way you respond to those stresses in the work place.  We can, and do, try to provide experiences and examples in an environment where you are supported to fail, and to try again.

Do not put artificial limits on yourselves.

And, give yourselves hope. You have the skills, they just need sharpening and developing. Try, and try again.

Finally – “Keep interested in your own career, however humble;
it is a real possession in the changing fortunes of time.”

(Max Ehrmann, The Desiderata)



Release Candidate for version 1.3

Published 25 Apr 2017 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

We just published the feature-complete version for the next major version 1.3 of Roundcube webmail for final testing. After dropping support for older browsers and PHP versions and adding some new features like the widescreen layout, the release candidate finalizes that work and also fixes two security issues plus adds improvements to the Managesieve and Enigma plugins.

We also slightly polished the Larry theme to make it look a little less 2010. Still, the default theme doesn’t work on mobile devices but a fully responsive skin is currently being worked on.

As a reminder: if you’re installing the dependent package or running Roundcube directly from source, you now need to install the removed 3rd party javascript modules by executing the following install script:

$ bin/install-jsdeps.sh

With the upcoming stable release of 1.3.0 the old 1.x series will only receive important security fixes.

See the complete Changelog and download the new packages from roundcube.net/download.

Please note that this is a release candidate and we recommend to test it on a separate environment. And don’t forget to backup your data before installing it.


Current state of Babe

Published 23 Apr 2017 by camiloh in blogs.kde.org blogs.

KDE Project:

A better view of this post can be found here:
https://medium.com/@temisclopeolimac/current-state-of-babe-9fb56ce16ac6

To continue my last post about Babe [1], where I wrote a little about its history, in this new entry (I’ve now switched from the KDE blogs to Medium) I will tell you about the current state of Babe and the features implemented so far.
[1] https://blogs.kde.org/2017/04/14/introducing-babe-history

So welcome to this walk through Babe:

To start: in the last post I wrote about wanting, in the first place, to make a tiny music player, and that’s why I first started the GTK3 version of Babe and tried to make it look like a plain playlist highlighting the cover artwork.

The present version of Babe still sticks to that idea, but to make it a more powerful music player it also has an integrated music collection manager, with different views and features.

One of the core points of Babe is the Babe-it action. Babe-it basically means making a track a favorite and putting it in the main playlist. A Babed track is a new hot song you really dig right now, like a new song you found on YouTube and added to your collection via the Babe Chromium extension ;).

The Babe-it action can be found in the contextual menu, to mark any track in your collection as a Babe, and also in the playback toolbar controls, to mark the currently playing track. Via the native KDE notification system you can also Babe or un-Babe the currently playing song without having to interact with the Babe interface.
Contextual actions on a track. Babe-it comes first ;)


View Modes:

When you first launch Babe it will make use of the so-called “Playlist Mode”, but in the case where there is no music to play it will change to the “Collection Mode” to let you add new music sources or drag and drop music tracks, albums or artists.

Babe also has a “Mini Mode” that just displays the artwork and the basic playback functions.

So Babe has 3 view modes:

Collection Mode where all the collection manager views are visible: all tracks, artists, albums, playlists, info, online and settings view.

At the left you find the views toolbar and at the bottom the so-called “Utilsbar”, where all the useful actions for each specific view are placed.

Playlist Mode where only the current tracks on the playlist and the artwork on top are visible.

In this mode you also get a mini “Utilsbar”. This bar has 4 main utilities:
1- the calibrator, which allows you to: clear the list and only put your Babes on play, clear the whole list to put together a new playlist, save the current list as a playlist, use a different playlist from the ones you already created and, finally, remove the repeated songs from the list.
2- a go-back button which takes you back to the view you were at before in the Collection Mode.
3- a filter option that will let you quickly put your search queries on play without having to go back to the Collection Mode. Search queries can go like this: “artist:aminé,artist:lana del rey,genre:indie” or simply “caroline”.
4- and finally an open dialog for you to select a file on your system (you can also drag and drop those files instead).
Mini Mode that only displays the artwork and the basic playback functions.

The Collection Mode Views:
You can browse your collection by all tracks, albums, artists or playlists.
Those views have a permanent set of actions in the bottom Utilsbar:
  • Add the tracks in the current table to the main playlist
  • Save the tracks in the current table to a playlist

There is also a playAll button that shows when hovering over the artwork; when pressed it will clear the main playlist and start playing all the tracks from the artist or from the album, depending on which type of artwork was clicked.

Menu Actions:

From the contextual menu you can perform these actions on a track:
  • Babe-it or Unbabe-it
  • Put it on queue
  • Send it to a phone by using the KDE Connect system
  • Get info from the track, such as music lyrics, artist bio, album info, related artists and tags
  • Edit the file metadata
  • Save the track to a location
  • Remove the track from the list
  • Add the track to an existing playlist
  • Rate it from 1 to 5 stars
  • And finally, set a color mood for the track. Tracks with a color mood can color the interface.

Babe Chromium Extension:

By making use of the Babe Chromium extension you can fetch your favorite YouTube music videos and add them to your local music collection. Babe will find relevant information to catalog them and then add them to the main playlist for you to listen to later on.
Babe makes use of youtube-dl, so the sites it supports, such as Vimeo etc., are also supported by Babe.

The Info View:

The info view is going to be the main focus point of future development. The main idea for Babe is to become a contextual music collection manager, meaning that Babe will let you discover new music based on the tracks you collect and listen to, and also show you contextual information about those.
Right now this is still a work in progress, but for now you can enjoy the following feature:
While showing info about a track you can click on the generated tags to look for those in your local collection.

The Search Results
This is another focal point for Babe, but I will cover it in a future blog post.
I hope you like it, and I’m looking forward to reading what you think and your ideas.


mosh, the disconnection-resistant ssh

Published 22 Apr 2017 by Carlos Fenollosa in Carlos Fenollosa — Blog.

The second post on this blog was devoted to screen and how to use it to make persistent SSH sessions.

Recently I've started using mosh, the mobile shell. It's targeted at mobile users, for example laptop users who might get short disconnections while working on a train, and it also buffers and locally echoes keystrokes to hide network lag.

It really has few drawbacks, and if you ever ssh to remote hosts and get annoyed because your vim sessions or tail -F windows get disconnected, give mosh a try. I strongly recommend it.

Tags: software, unix



In conversation with the J.S. Battye Creative Fellows

Published 19 Apr 2017 by carinamm in State Library of Western Australia Blog.

How can contemporary art lead to new discoveries about collections and ways of engaging with history?  Nicola Kaye and Stephen Terry will discuss this idea drawing from the experience of creating Tableau Vivant and the Unobserved.

In conversation with the J.S. Battye Creative Fellows
Thursday 27 April, 6pm
State Library Theatre.


Tableau Vivant and the Unobserved is the culmination of the State Library’s inaugural J.S. Battye Creative Fellowship.  The Creative Fellowship aims to enhance engagement with the Library’s heritage collections and provide new experiences for the public.

Tableau Vivant and the Unobserved
visually questions how history is made, commemorated and forgotten. Through digital art installation, Nicola Kaye and Stephen Terry expose the unobserved and manipulate our perception of the past.  Their work juxtaposes archival and contemporary imagery to create an interactive experience for the visitor where unobserved lives from the archive collide with the contemporary world. The installation is showing at the State Library until 12 May 2017.

For more information visit: http://www.slwa.wa.gov.au



1.5

Published 17 Apr 2017 by mblaney in Tags from simplepie.

Merge pull request #510 from mblaney/master

Version bump to 1.5 due to changes to Category class.


Introducing Babe - History

Published 14 Apr 2017 by camiloh in blogs.kde.org blogs.

KDE Project:

https://babe.kde.org/

This is my very first post for KDE blogs, and Babe is also my very first application. So when I sat down to think about what to write, I thought I would tell you all about how and why I started coding, and why I decided to create a (yet another, I know) music player, made specially for KDE/Plasma.

So here comes the story:

I've been using GNU/Linux for almost ten years now, since I was still in high school. I always thought Linux-based distros looked so cool and kind of mysterious, so I decided to wipe off my Windows XP installation and move to Ubuntu. Since then I have not looked back, and I'm glad, because I found not only a great OS but also a great group of communities behind it. I first got involved with the community by making GTK/CSS themes and small icon sets.

Let's say I always found the visual part the most interesting, so I tried all the available desktop environments, visually appealing applications and themes. Among those apps, I always liked to check out the default music players of each distro and their set of multimedia applications.
I can say I've tested pretty much all of the Linux music players that have appeared in the wild. Some looked cool, others boring, and they worked... some others were buggy as hell... and many others were a very nice and complete tool to manage your local music collection but didn't look that great or well integrated.

Anyway, I finished high school and then went to university for the Arts program. Two years ago I also started the Computer Science bachelor program and began developing small console apps. By then I was using elementary OS, because it looked nice and polished, and that was when I first wanted to create my very own music player, to satisfy my own needs and also to learn a new graphics toolkit (GTK3).


I wanted to have a simple, tiny music player that resembled a playlist where I could keep my favorite music (my Babes) of the moment. I didn't care much about managing the whole music collection, as I didn't have many local music files anyway.

I did what I wanted and then stopped developing it. Around that time I tried Plasma once again and liked it very much: the new Breeze theme looked awesome and the tools were much more advanced than the ones from elementary OS, so I decided to stay. :)

I kept on using my small and kind of broken music player, but then I found myself using youtube-dl a lot to get my music, given that most of the music I listen to is music I've found/discovered while watching other (YouTube) music videos. That's when I decided to once again go back to Babe and make it fetch my favorite YouTube music videos. But by then I was using KDE/Plasma instead of a GTK-based desktop environment, so I started learning the Qt framework, because I wanted Babe to look good and well integrated in the Plasma desktop.


My plans for Babe-Qt were simple: fetch my favorite music and then play it in a tiny interface. But, oh well, my music collection started to grow, and I decided I could make use of an integrated collection manager when needed, to be able to create other playlists besides my favorites (Babes)... and so I implemented a full collection view: artists, albums, playlists and an info view.

The "info" view then became really important: I wanted to be able to get as much information of a track besides the basic metadata information, I wanted to know about the lyrics, the artwork, the artist and the album. And even eventually I wanted to be able to find similar songs... and that is what Babe is now trying to aim at, but that's something I will tell you in a next blog post... I want to introduce a contextual music collection manager.

That's it for now, but I will be writing to you all again soon, letting you know about:
-Current state of Babe
-Planned features
-The future
-Conceptual ideas

;)


Interview on Stepping Off: Rewilding and Belonging in the South West

Published 14 Apr 2017 by Tom Wilson in thomas m wilson.

You can listen to a recent radio interview I did about my new book with Adrian Glamorgan here.


Wikimania submission: apt install mediawiki

Published 9 Apr 2017 by legoktm in The Lego Mirror.

I've submitted a talk to Wikimania titled apt install mediawiki. It's about getting the MediaWiki package back into Debian, and efforts to improve the overall process. If you're interested, sign up on the submissions page :)


Archivematica Camp York: Some thoughts from the lake

Published 7 Apr 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Well, that was a busy week!

Yesterday was the last day of Archivematica Camp York - an event organised by Artefactual Systems and hosted here at the University of York. The camp's intention was to provide a space for anyone interested in or currently using Archivematica to come together, learn about the platform from other users, and share their experiences. I think it succeeded in this, bringing together 30+ 'campers' from across the UK, Europe and as far afield as Brazil for three days of sessions covering different aspects of Archivematica.

Our pod on the lake (definitely a lake - not a pond!)
My main goal at camp was to ensure everyone found their way to the rooms (including the lakeside pod) and that we were suitably fuelled with coffee, popcorn and cake. Alongside these vital tasks I also managed to partake in the sessions, have a play with the new version of Archivematica (1.6) and learn a lot in the process.

I can't possibly capture everything in this brief blog post so if you want to know more, have a look back at all the #AMCampYork tweets.

What I've focused on below are some of the recurring themes that came up over the three days.

Workflows

Archivematica is just one part of a bigger picture for institutions that are carrying out digital preservation, so it is always very helpful to see how others are implementing it and what systems they will be integrating with. A session on workflows in which participants were invited to talk about their own implementations was really interesting. 

Other sessions  also helped highlight the variety of different configurations and workflows that are possible using Archivematica. I hadn't quite realised there were so many different ways you could carry out a transfer! 

In a session on specialised workflows, Sara Allain talked us through the different options. One workflow I hadn't been aware of before was the ability to include checksums as part of your transfer. This sounds like something I need to take advantage of when I get Archivematica into production for the Borthwick. 

Justin talking about Automation Tools
A session on Automation Tools with Justin Simpson highlighted other possibilities - using Archivematica in a more automated fashion. 

We already have some experience of using Automation Tools at York as part of the work we carried out during phase 3 of Filling the Digital Preservation Gap, however I was struck by how many different ways these can be applied. Hearing examples from other institutions and for a variety of different use cases was really helpful.


Appraisal

The camp included a chance to play with Archivematica version 1.6 (which was only released a couple of weeks ago) as well as an introduction to the new Appraisal and Arrangement tab.

A session in progress at Archivematica Camp York
I'd been following this project with interest so it was great to be able to finally test out the new features (including the rather pleasing pie charts showing what file formats you have in your transfer). It was clear that there were a few improvements that could be made to the tab to make it more intuitive to use and to deal with things such as the ability to edit or delete tags, but it is certainly an interesting feature and one that I would like to explore more using some real data from our digital archive.

Throughout camp there was a fair bit of discussion around digital appraisal and at what point in your workflow this would be carried out. This was of particular interest to me being a topic I had recently raised with colleagues back at base.

The Bentley Historical Library who funded the work to create the new tab within Archivematica are clearly keen to get their digital archives into Archivematica as soon as possible and then carry out the work there after transfer. The addition of this new tab now makes this workflow possible.

Kirsty Lee from the University of Edinburgh described her own pre-ingest methodology and the tools she uses to help her appraise material before transfer to Archivematica. She talked about some tools (such as TreeSize Pro) that I'm really keen to follow up on.

At the moment I'm undecided about exactly where and how this appraisal work will be carried out at York, and in particular how this will work for hybrid collections so as always it is interesting to hear from others about what works for them.


Metadata and reporting

Evelyn admitting she loves PREMIS and METS
Evelyn McLellan from Artefactual led a 'Metadata Deep Dive' on day 2 and despite the title, this was actually a pretty interesting session!

We got into the details of METS and PREMIS and how they are implemented within Archivematica. Although I generally try not to look too closely at METS and PREMIS it was good to have them demystified. On the first day through a series of exercises we had been encouraged to look at a METS file created by Archivematica ourselves and try and pick out some information from it so these sessions in combination were really useful.

Across various sessions of the camp there was also a running discussion around reporting. Given that Archivematica stores such a detailed range of metadata in the METS file, how do we actually make use of this? Being able to report on how many AIPs have been created, how many files they contain and what size they are is useful. These are statistics that I currently collect (manually) on a quarterly basis and share with colleagues. Once Archivematica is in place at York, digging further into those rich METS files to find out which file formats are in the digital archive would be really helpful for preservation planning (among other things). There was discussion about whether reporting should be a feature of Archivematica or a job that should be done outside Archivematica.

In relation to the latter option - I described in one session how some of our phase 2 work on Filling the Digital Preservation Gap was designed to help expose metadata from Archivematica to a third party reporting system. The Jisc Research Data Shared Service was also mentioned in this context, as reporting outside of Archivematica will need to be addressed as part of that project.
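
As a sketch of what reporting outside Archivematica could look like (my own illustration, not an Archivematica or Jisc tool), the short Python below counts the PREMIS formatName values recorded in a METS file; it matches elements by local name so it does not depend on a particular PREMIS namespace version:

    import sys
    from collections import Counter
    import xml.etree.ElementTree as ET

    def format_counts(mets_path):
        """Tally PREMIS <formatName> values found anywhere in a METS file.

        Illustrative sketch only; assumes the METS file is small enough to
        stream through iterparse on a desktop machine.
        """
        counts = Counter()
        for _, elem in ET.iterparse(mets_path):
            # Strip any XML namespace and keep just the local element name.
            if elem.tag.rsplit("}", 1)[-1] == "formatName" and elem.text:
                counts[elem.text.strip()] += 1
        return counts

    if __name__ == "__main__":
        for name, n in format_counts(sys.argv[1]).most_common():
            print(f"{n:6d}  {name}")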

Community

As with most open source software, community is important. This was touched on throughout the camp and was the focus of the last session on the last day.

There was a discussion about the role of Artefactual Systems and the role of Archivematica users. Obviously we are all encouraged to engage and help sustain the project in whatever way we are able. This could be by sharing successes and failures (I was pleased that my blog got a mention here!), submitting code and bug reports, sponsoring new features (perhaps something listed on the development roadmap) or helping others by responding to queries on the mailing list. It doesn't matter - just get involved!

I was also able to highlight the UK Archivematica group and talk about what we do and what we get out of it. As well as encouraging new members to the group, there was also discussion about the potential for forming other regional groups like this in other countries.

Some of the Archivematica community - class of Archivematica Camp York 2017

...and finally

Another real success for us at York was having the opportunity to get technical staff at York working with Artefactual to resolve some problems we had with getting our first Archivematica implementation into production. Real progress was made and I'm hoping we can finally start using Archivematica for real at the end of next month.

So, that was Archivematica Camp!

A big thanks to all who came to York and to Artefactual for organising the programme. As promised, the sun shined and there were ducks on the lake - what more could you ask for?



Thanks to Paul Shields for the photos

Failover in local accounts

Published 7 Apr 2017 by MUY Belgium in Newest questions tagged mediawiki - Server Fault.

I would like to use MediaWiki as documentation with access privileges. I use the LdapAuthentication extension (here: https://www.mediawiki.org/wiki/Extension:LDAP_Authentication/Configuration_Options) in order to get users authenticated against an LDAP directory.

For various reasons, authentication should continue working even if the LDAP server fails.

How can I set up a fail-over (for example using the passwords in the local SQL database) that would keep the wiki accessible even if the LDAP infrastructure fails?


Shiny New History in China: Jianshui and Tuanshan

Published 6 Apr 2017 by Tom Wilson in thomas m wilson.

  The stones in this bridge are not all in a perfect state of repair.  That’s part of its charm.  I’m just back from a couple of days down at Jianshui, a historic town a few hours south of Kunming with a large city wall and a towering city gate.  The trip has made me reflect on […]


Complex text input in Plasma

Published 6 Apr 2017 by eike hein in blogs.kde.org blogs.

KDE Project:

Binary keyboard
Surprisingly not enough

A brief note: If you're a developer or user of input methods in the free desktop space, or just interested in learning about "How does typing Chinese work anyway?", you might be interested in a discussion we're now having on the plasma-devel mailing list. In my opening mail I've tried to provide a general overview about what input methods are used for, how they work, who they benefit, and what we must do to improve support for them in KDE Plasma.

Bringing high-quality text input to as many language users as possible, as well as surfacing functionality such as Emoji input and word completion in a better way, is something we increasingly care about. With the situation around complex text input on Wayland and specifically KWin still in a state of flux and needing-to-crystallize, we're looking to form closer ties with developers and users in this space. Feel free to chime in on the list or hang out with us in #plasma on freenode.


Update 1.0.10 released

Published 5 Apr 2017 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

We just published a security update to the LTS version 1.0. It contains some important bug fixes and security improvements backported from the master version.

It’s considered stable and we recommend updating all production installations of Roundcube to this version. Download it from roundcube.net/download.

Please do backup before updating!


Simon 0.4.80 alpha released

Published 3 Apr 2017 by fux in blogs.kde.org blogs.

KDE Project:

The first version (0.4.80) towards Simon 0.5.0 is out in the wilds. Please download the source code, test it and send us feedback.

Some new features are:

  • MPRIS Support (Media Player control)
  • macOS port (thanks to René there is a first MacPorts script)
  • A series of bug fixes.

You can get it here:
https://download.kde.org/unstable/simon/0.4.80/src/simon-0.4.80.tar.xz.mirrorlist

In the works is also an AppImage version of Simon for easy testing. We hope to deliver one for the Beta release coming soon.

Known issues with Simon 0.4.80 are:

  • Base model download doesn't work as expected. You can search for some base models though (BUG: 377968)
  • Some Scenarios available for download don't work anymore (BUG: 375819)
  • Simon can't add Arabic or Hebrew words (BUG: 356452)

We hope to fix these bugs and look forward to your feedback and bug reports and maybe to see you at the next Simon IRC meeting: Tuesday, 4th of April, at 10pm (UTC+2) in #kde-accessibility on freenode.net.

About Simon
Simon is an open source speech recognition program that can replace your mouse and keyboard. The system is designed to be as flexible as possible and will work with any language or dialect. For more information take a look at the Simon homepage.


Tableau Vivant and the Unobserved

Published 30 Mar 2017 by carinamm in State Library of Western Australia Blog.


Still scene: Tableau Vivant and the Unobserved, 2016, Nicola Kaye, Stephen Terry.

Tableau Vivant and the Unobserved visually questions how history is made, commemorated and forgotten. Through digital art installation, Nicola Kaye and Stephen Terry expose the unobserved and manipulate our perception of the past.  Their work juxtaposes archival and contemporary imagery to create an experience for the visitor where unobserved lives from the archive collide with the contemporary world.

Tableau Vivant and the Unobserved is the culmination of the State Library’s inaugural J.S. Battye Creative Fellowship.  The Creative Fellowship aims to enhance engagement with the Library’s heritage collections and provide new experiences for the public.

Artists floor talk
Thursday 6 April, 6pm
Ground Floor Gallery, State Library of Western Australia.

Nicola Kaye and Stephen Terry walk you through Tableau Vivant and the Unobserved

In conversation with the J.S. Battye Creative Fellows
Thursday 27 April, 6pm
State Library Theatre.

How can contemporary art lead to new discoveries about collections and ways of engaging with history?  Nicola Kaye and Stephen Terry will discuss this idea drawing from the experience of creating Tableau Vivant and the Unobserved.

Tableau Vivant and the Unobserved is showing at the State Library from 4 April – 12 May 2017.
For more information visit: www.slwa.wa.gov.au



Remembering Another China in Kunming

Published 29 Mar 2017 by Tom Wilson in thomas m wilson.

Last weekend I headed out for a rock climbing session with some locals and expats.  First I had to cross town, and while doing so I came across an old man doing water calligraphy by Green Lake.  I love the transience of this art: the beginning of the poem is starting to fade by the time he reaches […]


Week #11: Raided yet again

Published 27 Mar 2017 by legoktm in The Lego Mirror.

If you missed the news, the Raiders are moving to Las Vegas. The Black Hole is leaving Oakland (again) for a newer, nicer, stadium in the desert. But let's talk about how we got here, and how different this is from the moving of the San Diego Chargers to Los Angeles.

The current Raiders stadium is outdated and old. It needs renovating to keep up with other modern stadiums in the NFL. Owner Mark Davis isn't a multi-billionaire who could finance such a stadium. And the City of Oakland is definitely not paying for it. So the options left were to find outside financing for Oakland, or to find said financing somewhere else. And unfortunately it was the latter option that won out in the end.

I think it's unsurprising that more and more cities are refusing to put public money into stadiums that they will see no profit from - it makes no sense whatsoever.

Overall I think the Raider Nation will adapt and survive just as it did when they moved to Los Angeles. The Raiders still have an awkward two-to-three years left in Oakland, and with Derek Carr at the helm, it looks like they will be good ones.


Week #10: March Sadness

Published 23 Mar 2017 by legoktm in The Lego Mirror.

In California March Madness is really...March Sadness. The only Californian team that is still in is UCLA. UC Davis made it in but was quickly eliminated. USC and Saint Mary's both fell in the second round. Cal and Stanford didn't even make it in. At best we can root for Gonzaga, but that's barely it.

Some of us root for schools we went to, but for those of us who grew up here and support local teams, we're left hanging. And it's not bias in the selection committee; those schools just aren't good enough.

On top of that we have a top notch professional team in the Warriors, but our amateur players just aren't up to snuff.

So good luck to UCLA, represent California hella well. We somewhat believe in you.


Week #9: The jersey returns

Published 23 Mar 2017 by legoktm in The Lego Mirror.

And so it has been found. Tom Brady's jersey was in Mexico the whole time, stolen by a member of the press. And while it's great news for Brady, sports memorabilia fans, and the FBI, it doesn't look good for journalists. Journalists are given a lot of access to players, allowing them to obtain better content and get better interviews. It would not be surprising if the NFL responds to this incident by locking down the access that journalists are given. And that would be a real bummer.

I'm hoping this is seen as an isolated incident and all journalists are not punished for the offenses by one.


Piwigo.com Enterprise plans, now official!

Published 23 Mar 2017 by Pierrick Le Gall in The Piwigo.com Blog.

After several years in the shadow of the standard plan, and yet already adopted by more than 50 organizations, it is time to officially introduce the Piwigo.com Enterprise plans. They were designed for organizations, private or public, looking for a simple, affordable and yet complete tool to manage their collection of photos.

The main idea behind Piwigo.com Enterprise is to democratize photo library management for organizations of all kinds and sizes. We are not targeting the Fortune 500, although some of them are already clients, but the fortune 5,000,000 companies!

Piwigo.com Enterprise plans can replace, at a reasonable cost, inadequate solutions relying on intranet shared folders, where photos are sometimes duplicated or deleted by mistake, without an appropriate permission system.

Introduction to Piwigo.com Enterprise plans


Why announce these plans officially today? Because the current trend clearly shows that our Enterprise plans have found their market. Although semi-official, Enterprise plans represented nearly 40% of our revenue in February 2017! It is time to put these plans under the spotlight.

In practice, here is what changes with the Piwigo.com Enterprise plans:

  1. they can be used by organizations, as opposed to the standard plan
  2. additional features, such as support for non-photo files (PDF, videos …)
  3. higher level of service (priority support, customization, presentation session)

Discover Piwigo.com Enterprise


Please Help Us Track Down Apple II Collections

Published 20 Mar 2017 by Jason Scott in ASCII by Jason Scott.

Please spread this as far as possible – I want to reach folks who are far outside the usual channels.

The Summary: Conditions are very, very good right now for easy, top-quality, final ingestion of original commercial Apple II Software and if you know people sitting on a pile of it or even if you have a small handful of boxes, please get in touch with me to arrange the disks to be imaged. apple@textfiles.com. 

The rest of this entry says this in much longer, hopefully compelling fashion.

We are in a golden age for Apple II history capture.

For now, and it won’t last (because nothing lasts), an incredible amount of interest and effort and tools are all focused on acquiring Apple II software, especially educational and engineering software, and ensuring it lasts another generation and beyond.

I’d like to take advantage of that, and I’d like your help.

Here’s the secret about Apple II software: Copy Protection Works.

Copy protection, that method of messing up easy copying from floppy disks, turns out to have been very effective at doing what it is meant to do – slow down the duplication of materials so a few sales can eke by. For anything but the most compelling, most universally interesting software, copy protection did a very good job of ensuring that only the approved disks that went out the door are the remaining extant copies for a vast majority of titles.

As programmers and publishers laid logic bombs and coding traps and took the brilliance of watchmakers and used it to design alternative operating systems, they did so to ensure people wouldn’t take the time to actually make the effort to capture every single bit off the drive and do the intense and exacting work to make it easy to spread in a reproducible fashion.

They were right.

So, obviously it wasn’t 100% effective at stopping people from making copies of programs, or so many people who used the Apple II wouldn’t remember the games they played at school or at user-groups or downloaded from AE Lines and BBSes, with pirate group greetings and modified graphics.

What happened is that pirates and crackers did what was needed to break enough of the protection on high-demand programs (games, productivity) to make them work. They used special hardware modifications to “snapshot” memory and pull out a program. They traced the booting of the program by stepping through its code and then snipped out the clever tripwires that freaked out if something wasn’t right. They tied it up into a bow so that instead of a horrendous 140 kilobyte floppy, you could have a small 15 or 20 kilobyte program instead. They even put multiple cracked programs together on one disk so you could get a bunch of cool programs at once.

I have an entire section of TEXTFILES.COM dedicated to this art and craft.

And one could definitely argue that the programs (at least the popular ones) were “saved”. They persisted, they spread, they still exist in various forms.

And oh, the crack screens!

I love the crack screens, and put up a massive pile of them here. Let’s be clear about that – they’re a wonderful, special thing and the amount of love and effort that went into them (especially on the Commodore 64 platform) drove an art form (demoscene) that I really love and which still thrives to this day.

But these aren’t the original programs and disks, and in some cases, not the originals by a long shot. What people remember booting in the 1980s were often distant cousins to the floppies that were distributed inside the boxes, with the custom labels and the nice manuals.


On the left is the title screen for Sabotage. It’s a little clunky and weird, but it’s also something almost nobody who played Sabotage back in the day ever saw; they only saw the instructions screen on the right. The reason for this is that there were two files on the disk, one for starting the title screen and then the game, and the other was the game. Whoever cracked it long ago only did the game file, leaving the rest as one might leave the shell of a nut.

I don’t think it’s terrible these exist! They’re art and history in their own right.

However… the mistake, which I completely understand making, is to see programs and versions of old Apple II software up on the Archive and say “It’s handled, we’re done here.” You might be someone with a small stack of Apple II software, newly acquired or decades old, and think you don’t have anything to contribute.

That’d be a huge error.

It’s a bad assumption because there’s a chance the original versions of these programs, unseen since they were sold, is sitting in your hands. It’s a version different than the one everyone thinks is “the” version. It’s precious, it’s rare, and it’s facing the darkness.

There is incredibly good news, however.

I’ve mentioned some of these folks before, but there is now a powerful allegiance of very talented developers and enthusiasts who have been pouring an enormous amount of skills into the preservation of Apple II software. You can debate if this is the best use of their (considerable) skills, but here we are.

They have been acquiring original commercial Apple II software from a variety of sources, including auctions, private collectors, and luck. They’ve been duplicating the originals on a bits level, then going in and “silent cracking” the software so that it can be played on an emulator or via the web emulation system I’ve been so hot on, and not have any change in operation, except for not failing due to copy protection.

With a “silent crack”, you don’t take the credit, you don’t make it about yourself – you just make it work, and work entirely like it did, without yanking out pieces of the code and program to make it smaller for transfer or to get rid of a section you don’t understand.

Most prominent of these is 4AM, who I have written about before. But there are others, and they’re all working together at the moment.

These folks, these modern engineering-minded crackers, are really good. Really, really good.

They’ve been developing tools from the ground up that are focused on silent cracks, of optimizing the process, of allowing dozens, sometimes hundreds of floppies to be evaluated automatically and reducing the workload. And they’re fast about it, especially when dealing with a particularly tough problem.

Take, for example, the efforts required to crack Pinball Construction Set, and marvel not just that it was done, but that a generous and open-minded article was written explaining exactly what was being done to achieve this.

This group can be handed a stack of floppies, image them, evaluate them, and find which have not yet been preserved in this fashion.

But there’s only one problem: They are starting to run out of floppies.

I should be clear that there’s plenty left in the current stack – hundreds of floppies are being processed. But I also have seen the effort chug along and we’ve been going through direct piles, then piles of friends, and then piles of friends of friends. We’ve had a few folks from outside the community bring stuff in, but those are way more scarce than they should be.

I’m working with a theory, you see.

My theory is that there are large collections of Apple II software out there. Maybe someone’s dad had a store long ago. Maybe someone took in boxes of programs over the years and they’re in the basement or attic. I think these folks are living outside the realm of the “Apple II Community” that currently exists (and which is a wonderful set of people, be clear). I’m talking about the difference between a fan club for surfboards and someone who has a massive set of surfboards because his dad used to run a shop and they’re all out in the barn.

A lot of what I do is put groups of people together and then step back to let the magic happen. This is a case where this amazingly talented group of people are currently a well-oiled machine – they help each other out, they are innovating along this line, and Apple II software is being captured in a world-class fashion, with no filtering being done because it’s some hot ware that everyone wants to play.

For example, piles and piles of educational software has returned from potential oblivion, because it’s about the preservation, not the title. Wonderfully done works are being brought back to life and are playable on the Internet Archive.

So like I said above, the message is this:

Conditions are very, very good right now for easy, top-quality, final ingestion of original commercial Apple II Software and if you know people sitting on a pile of it or even if you have a small handful of boxes, please get in touch with me to arrange the disks to be imaged. apple@textfiles.com.

I’ll go on podcasts or do interviews, or chat with folks on the phone, or trade lots of e-mails discussing details. This is a very special time, and I feel the moment to act is now. Alliances and communities like these do not last forever, and we’re in a peak moment of talent and technical landscape to really make a dent in what are likely acres of unpreserved titles.

It’s 4am and nearly morning for Apple II software.

It’d be nice to get it all before we wake up.

 


Managing images on an open wiki platform

Published 19 Mar 2017 by Oliver K in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I'm developing a wiki using MediaWiki, and there are a few ways of getting images into wiki pages, such as uploading them to the website itself, hosting them on external websites (which may potentially ban them), or requesting that others place an image.

Surely images may be difficult to manage as one day someone may upload a vulgar image and many people will then see it. How can I ensure vulgar images do not get through and that administrators aren't scarred for life after monitoring them?


Does the composer software have a command like python -m compileall ./

Published 18 Mar 2017 by jehovahsays in Newest questions tagged mediawiki - Server Fault.

I want to use Composer for a MediaWiki root folder with multiple directories that need Composer to install their dependencies, with a single command like composer -m installall ./
For example, if the root folder was all written in Python, I could use the command python -m compileall ./
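
(A rough sketch, not an official Composer feature: one way to approximate such a command is a small script that walks the tree and runs composer install wherever it finds a composer.json, assuming the composer binary is on the PATH.)

    import os
    import subprocess
    import sys

    def install_all(root="."):
        """Run `composer install` in every subdirectory containing a composer.json.

        Rough stand-in for a hypothetical `composer installall ./`; assumes the
        `composer` binary is available on the PATH.
        """
        for dirpath, dirnames, filenames in os.walk(root):
            # Don't descend into already-installed dependency trees.
            if "vendor" in dirnames:
                dirnames.remove("vendor")
            if "composer.json" in filenames:
                print(f"==> composer install in {dirpath}")
                subprocess.run(["composer", "install", "--no-interaction"],
                               cwd=dirpath, check=True)

    if __name__ == "__main__":
        install_all(sys.argv[1] if len(sys.argv) > 1 else ".")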


Hilton Harvest Earth Hour Picnic and Concert

Published 18 Mar 2017 by Dave Robertson in Dave Robertson.



Sandpapering Screenshots

Published 15 Mar 2017 by Jason Scott in ASCII by Jason Scott.

The collection I talked about yesterday was subjected to the Screen Shotgun, which does a really good job of playing the items, capturing screenshots, and uploading them into the item to allow people to easily see, visually, what they’re in for if they boot them up.

In general, the screen shotgun does the job well, but not perfectly. It doesn’t understand what it’s looking at, at all, and the method I use to decide the “canonical” screenshot is inherently shallow – I choose the largest filesize, because that tends to be the most “interesting”.
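
(Purely to illustrate that heuristic, and not the Screen Shotgun’s actual code: in Python, “largest file wins” is a couple of lines.)

    from pathlib import Path

    def pick_canonical(screenshot_dir):
        """Pick a 'canonical' screenshot the shallow way: the largest file wins.

        Illustrative sketch only; assumes one directory of PNG screenshots.
        """
        shots = list(Path(screenshot_dir).glob("*.png"))
        return max(shots, key=lambda p: p.stat().st_size) if shots else None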

The bug in this is that if you have, say, these three screenshots:

…it’s going to choose the first one, because those middle-of-loading graphics for an animated title screen have tons of little artifacts, and the filesize is bigger. Additionally, the second is fine, but it’s not the “title”, the recognized “welcome to this program” image. So the best choice turns out to be the third.

I don’t know why I’d not done this sooner, but while waiting for 500 disks to screenshot, I finally wrote a program to show me all the screenshots taken for an item, and declare a replacement canonical title screenshot. The results have been way too much fun.

It turns out, doing this for Apple II programs in particular, where it’s removed the duplicates and is just showing you a gallery, is beautiful:

Again, the all-text “loading screen” in the middle, which is caused by blowing program data into screen memory, wins the “largest file” contest, but literally any other of the screens would be more appropriate.

This is happening all over the place: crack screens win over the actual main screen, the mid-loading noise of Apple II programs win over the final clean image, and so on.

Working with tens of thousands of software programs, primarily alone, means that I’m trying to find automation wherever I can. I can’t personally boot up each program and do the work needed to screenshot/describe it – if a machine can do anything, I’ll make the machine do it. People will come to me with fixes or changes if the results are particularly ugly, but it does leave a small amount that no amount of automation is likely to catch.

If you watch a show or documentary on factory setups and assembly lines, you’ll notice they can’t quite get rid of people along the entire line, especially the sign-off. Someone has to keep an eye to make sure it’s not going all wrong, or, even more interestingly, a table will come off the line and you see one person giving it a quick run-over with sandpaper, just to pare down the imperfections or missed spots of the machine. You still did an enormous amount of work with no human effort, but if you think that’s ready for the world with no final sign-off, you’re kidding yourself.

So while it does mean another hour or two looking at a few hundred screenshots, it’s nice to know I haven’t completely automated away the pleasure of seeing some vintage computer art, for my work, and for the joy of it.


Thoughts on a Collection: Apple II Floppies in the Realm of the Now

Published 15 Mar 2017 by Jason Scott in ASCII by Jason Scott.

I was connected with The 3D0G Knight, a long-retired Apple II pirate/collector who had built up a set of hundreds of floppy disks acquired from many different locations and friends decades ago. He generously sent me his entire collection to ingest into a more modern digital format, as well as the Internet Archive’s software archive.

The floppies came in a box without any sort of sleeves for them, with what turned out to be roughly 350 of them removed from “ammo boxes” by 3D0G from his parents’ house. The disks all had labels of some sort, and a printed index came along with it all, mapped to the unique disk ID/Numbers that had been carefully put on all of them years ago. I expect this was months of work at the time.

Each floppy is 140k of data on each side, and in this case, all the floppies had been single-sided and clipped with an additional notch with a hole punch to allow the second side to be used as well.

Even though they’re packed a little strangely, there was no damage anywhere, nothing bent or broken or ripped, and all the items were intact. It looked to be quite the bonanza of potentially new vintage software.

So, this activity is at the crux of the work going on with both the older software on the Internet Archive, as well as what I’m doing with web browser emulation and increasingly easy access to the works of old. The most important thing, over everything else, is to close the air gap – get the data off these disappearing floppy disks and into something online where people or scripts can benefit from them and research them. Almost everything else – scanning of cover art, ingestion of metadata, pulling together the history of a company or cross-checking what titles had which collaborators… that has nowhere near the expiration date of the magnetized coated plastic disks going under. This needs us and it needs us now.

The way that things currently work with Apple II floppies is to separate them into two classes: Disks that Just Copy, and Disks That Need A Little Love. The Little Love disks, when found, are packed up and sent off to one of my collaborators, 4AM, who has the tools and the skills to get data off particularly tenacious floppies, as well as doing “silent cracks” of commercial floppies to preserve what’s on them as best as possible.

Doing the “Disks that Just Copy” is a mite easier. I currently have an Apple II system on my desk that connects via USB-to-serial connection to my PC. There, I run a program called Apple Disk Transfer that basically turns the Apple into a Floppy Reading Machine, with pretty interface and everything.

Apple Disk Transfer (ADT) has been around a very long time and knows what it’s doing – a floppy disk with no trickery on the encoding side can be ripped out and transferred to a “.DSK” file on the PC in about 20 seconds. If there’s something wrong with the disk in terms of being an easy read, ADT is very loud about it. I can do other things while reading floppies, and I end up with a whole pile of filenames when it’s done. The workflow, in other words, isn’t so bad as long as the floppies aren’t in really bad shape. In this particular set, the floppies were in excellent shape, except when they weren’t, and the vast majority fell into the “excellent” camp.

The floppy drive that sits at the middle of this looks like some sort of nightmare, but it helps to understand that with Apple II floppy drives, you really have to have the cover removed at all times, because you will be constantly checking the read head for dust, smudges, and so on. Unscrewing the whole mess and putting it back together for looks just doesn’t scale. It’s ugly, but it works.

It took me about three days (while doing lots of other stuff) but in the end I had 714 .dsk images pulled from both sides of the floppies, which works out to 357 floppy disks successfully imaged. Another 20 or so are going to get a once over but probably are going to go into 4am’s hands to get final evaluation. (Some of them may in fact be blank, but were labelled in preparation, and so on.) 714 is a lot to get from one person!

As mentioned, an Apple II 5.25″ floppy disk image is pretty much always 140k. The names of the floppy are mine, taken off the label, or added based on glancing inside the disk image after it’s done. For a quick glance, I use either an Apple II emulator called Applewin, or the fantastically useful Apple II disk image investigator Ciderpress, which is frankly the gold standard for what should be out there for every vintage disk/cartridge/cassette image. As might be expected, labels don’t always match contents. C’est la vie.

As for the contents of the disks themselves; this comes down to what the “standard collection” was for an Apple II user in the 1980s who wasn’t afraid to let their software library grow utilizing less than legitimate circumstances. Instead of an elegant case of shiny, professionally labelled floppy diskettes, we get a scribbled, messy, organic collection of all range of “warez” with no real theme. There’s games, of course, but there’s also productivity, utilities, artwork, and one-off collections of textfiles and documentation. Games that were “cracked” down into single-file payloads find themselves with 4-5 other unexpected housemates and sitting behind a menu. A person spending the equivalent of $50-$70 per title might be expected to have a relatively small and distinct library, but someone who is meeting up with friends or associates and duplicating floppies over a few hours will just grab bushels of strange.

The result of the first run is already up on the Archive: A 37 Megabyte .ZIP file containing all the images I pulled off the floppies. 

In terms of what will be of relevance to later historians, researchers, or collectors, that zip file is probably the best way to go – it’s not munged up with the needs of the Archive’s structure, and is just the disk images and nothing else.

This single .zip archive might be sufficient for a lot of sites (go git ‘er!) but as mentioned infinite times before, there is a very strong ethic across the Internet Archive’s software collection to make things as accessible as possible, and hence there are nearly 500 items in the “3D0G Knight Collection” besides the “download it all” item.

The rest of this entry talks about why it’s 500 and not 714, and how it is put together, and the rest of my thoughts on this whole endeavor. If you just want to play some games online or pull a 37mb file and run, cackling happily, into the night, so be it.

The relatively small number of people who have exceedingly hard opinions on how things “should be done” in the vintage computing space will also want to join the folks who are pulling the 37mb file. Everything else done by me after the generation of the .zip file is in service of the present and near future. The items that number in the hundreds on the Archive that contain one floppy disk image and interaction with it are meant for people to find now. I want someone to have a vague memory of a game or program once interacted with, and if possible, to find it on the Archive. I also like people browsing around randomly until something catches their eye and to be able to leap into the program immediately.

To those ends, and as an exercise, I’ve acquired or collaborated on scripts to do the lion’s share of analysis on software images to prep them for this living museum. These scripts get it “mostly” right, and the rough edges they bring in from running are easily smoothed over by a microscopic amount of post-processing manual attention, like running a piece of sandpaper over a machine-made joint.

Again, we started out with 714 disk images. The first thing done was to run them against a script that has hash checksums for every exposed Apple II disk image on the Archive, which now number over 10,000. Doing this dropped the “uniquely new” disk images from 714 to 667.

Next, I concatenated disk images that are part of the same product into one item: if a paint program has two floppy disk images for each of the sides of its disk, those become a single item. In one or two cases, the program spans multiple floppies, so 4-8 (and in one case, 14!) floppy images become a single item. Doing this dropped the total from 667 to 495 unique items. That’s why the number is significantly smaller than the original total.

Let’s talk for a moment about this.

Using hashes and comparing them is the roughest of rough approaches to de-duplicating software items. I do it with Apple II images because they tend to be self contained (a single .dsk file) and because Apple II software has a lot of people involved in it. I’m not alone by any means in acquiring these materials and I’m certainly not alone in terms of work being done to track down all the unique variations and most obscure and nearly lost packages written for this platform. If I was the only person in the world (or one of a tiny sliver) working on this I might be super careful with each and every item to catalog it – but I’m absolutely not; I count at least a half-dozen operations involved in Apple II floppy image ingestion.

And as a bonus, it’s a really nice platform. When someone puts their heart into an Apple II program, it rewards them and the end user as well – the graphics can be charming, the program flow intuitive, and the whole package just gleams on the screen. It’s rewarding to work with this corpus, so I’m using it as a test bed for all these methods, including using hashes.

But hash checksums are seriously not the be-all for this work. Anything can make a hash different – an added file, a modified bit, or a compilation of already-on-the-archive-in-a-hundred-places files that just happen to be grouped up slightly different than others. That said, it’s not overwhelming – you can read about what’s on a floppy and decide what you want pretty quickly; gigabytes will not be lost and the work to track down every single unique file has potential but isn’t necessary yet.

(For the people who care, the Internet Archive generates three different hashes (md5, crc32, sha1) and lists the size of the file – looking across all of those for comparison is pretty good for ensuring you probably have something new and unique.)
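
(And for the people who want to try it at home, here is a rough sketch – not the Archive's actual tooling – of computing those same three hashes plus the file size for a pile of .dsk images; that tuple is enough for this kind of coarse de-duplication.)

    import hashlib
    import zlib
    from pathlib import Path

    def disk_fingerprint(path):
        """Return (md5, crc32, sha1, size) for one disk image.

        Mirrors the fields mentioned above; illustrative sketch only.
        """
        data = Path(path).read_bytes()
        return (
            hashlib.md5(data).hexdigest(),
            format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
            hashlib.sha1(data).hexdigest(),
            len(data),
        )

    def find_new_images(dsk_dir, known_fingerprints):
        """Yield .dsk paths whose fingerprint is not already in the known set."""
        for dsk in sorted(Path(dsk_dir).glob("*.dsk")):
            if disk_fingerprint(dsk) not in known_fingerprints:
                yield dsk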

Once the items are up there, the Screen Shotgun whips into action. It plays the programs in the emulator, takes screenshots, leafs off the unique ones, and then assembles it all into a nice package. Again, not perfect but left alone, it does the work with no human intervention and gets things generally right. If you see a screenshot in this collection, a robot did it and I had nothing to do with it.

This leads, of course, to sorting out which programs are a tad not-bootable, and by that I mean that they boot up in the emulator and the emulator sees them and all, but the result is not that satisfying:

On a pure accuracy level, this is doing exactly what it’s supposed to – the disk wasn’t ever a properly packaged, self-contained item, and it needs a boot disk to go in the machine first before you swap the floppy. I intend to work with volunteers to help with this problem, but here is where it stands.

The solution in the meantime is a java program modified by Kevin Savetz, which analyzes the floppy disk image and prints all the disk information it can find, including the contents of BASIC programs and textfiles. Here’s a non-booting disk where this worked out. The result is that this all gets ingested into the search engine of the Archive, and so if you’re looking for a file within the disk images, there’s a chance you’ll be able to find it.

Once the robots have their way with all the items, I can go in and fix a few things, like screenshots that went south, or descriptions and titles that don’t reflect what actually boots up. The amount of work I, a single person, have to do is therefore reduced to something manageable.

I think this all works well enough for the contemporary vintage software researcher and end user. Perhaps that opinion is not universal.

What I can say, however, is that the core action here – of taking data away from a transient and at-risk storage medium and putting it into a slightly less transient, less at-risk storage medium – is 99% of the battle. To have the will to do it, to connect with the people who have these items around and to show them it’ll be painless for them, and to just take the time to shove floppies into a drive and read them, hundreds of times… that’s the huge mountain to climb right now. I no longer have particularly deep concerns about technology failing to work with these digital images, once they’re absorbed into the Internet. It’s this current time, out in the cold, unknown and unloved, that they’re the most at risk.

The rest, I’m going to say, is gravy.

I’ll talk more about exactly how tasty and real that gravy is in the future, but for now, please take a pleasant walk in the 3D0G Knight’s Domain.


The Followup

Published 14 Mar 2017 by Jason Scott in ASCII by Jason Scott.

Writing about my heart attack garnered some attention. I figured it was only right to fill in later details and describe what my current future plans are.

After the previous entry, I went back into the emergency room of the hospital I was treated at, twice.

The first time was because I “felt funny”; I just had no grip on “is this the new normal” and so just to understand that, I went back in and got some tests. They did an EKG, a blood test, and let me know all my stats were fine and I was healing according to schedule. That took a lot of stress away.

Two days later, I went in because I was having a marked shortness of breath, where I could not get enough oxygen in and it felt a little like I was drowning. Another round of tests, and one of the cardiologists mentioned a side effect of one of the drugs I was taking was this sort of shortness/drowning. He said it usually went away and the company claimed 5-7% of people got this side effect, but that they observed more like 10-15%. They said I could wait it out or swap drugs. I chose swap. After that, I’ve had no other episodes.

The hospital thought I should stay in Australia for 2 weeks before flying. Thanks to generosity from both MuseumNext and the ACMI, my hosts, that extra AirBnB time was basically paid for. MuseumNext also worked to help move my international flight ahead the weeks needed; a very kind gesture.

Kind gestures abounded, to be clear. My friend Rochelle extended her stay from New Zealand to stay an extra week; Rachel extended hers to match my new departure date. Folks rounded up funds and sent them along, which helped cover some additional costs. Visitors stopped by the AirBnB when I wasn’t really taking any walks outside, to provide additional social contact.

Here is what the blockage looked like, before and after. As I said, roughly a quarter of my heart wasn’t getting any significant blood and somehow I pushed through it for nearly a week. The insertion of a balloon and then a metal stent opened the artery enough for the blood flow to return. Multiple times, people made it very clear that this could have finished me off handily, and mostly luck involving how my body reacted was what kept me going and got me in under the wire.

From the responses to the first entry, it appears that a lot of people didn’t know heart attacks could be a lingering, growing issue and not just a bolt of lightning that strikes in the middle of a show or while walking down the street. If nothing else, I’m glad that it’s caused a number of people to be aware of how symptoms present themselves, as well as getting people to check their cholesterol, which I didn’t see as a huge danger compared to other factors, and which turned out to be significant indeed.

As for drugs, I’ve got a once a day waterfall of pills for blood pressure, cholesterol, heart healing, anti-clotting, and my long-standing annoyance of gout (which I’ve not had for years thanks to the pills). I’m on some of them for the next few months, some for a year, and some forever. I’ve also been informed I’m officially at risk for another heart attack, but the first heart attack was my hint in that regard.

As I healed, and understood better what was happening to me, I got better remarkably quick. There is a single tiny dot on my wrist from the operation, another tiny dot where the IV was in my arm at other times. Rachel gifted a more complicated Fitbit to replace the one I had, with the new one tracking sleep schedule and heart rate, just to keep an eye on it.

A day after landing back in the US, I saw a cardiologist at Mt. Sinai, one of the top doctors, who gave me some initial reactions to my charts and information: I’m very likely going to be fine, maybe even better than before. I need to take care of myself, and I was. If I was smoking or drinking, I’d have to stop, but since I’ve never had alcohol and I’ve never smoked, I’m already ahead of that game. I enjoy walking, a lot. I stay active. And as of getting out of the hospital, I am vegan for at least a year. Caffeine’s gone. Raw vegetables are in.

One might hesitate to put this all online, because the Internet is spectacularly talented at generating hatred and health advice. People want to help – it comes from a good place. But I’ve got a handle on it and I’m progressing well; someone hitting me up with a nanny-finger-wagging paragraph and 45 links to change-your-life-buy-my-book.com isn’t going to help much. But go ahead if you must.

I failed to mention it before, but when this was all going down, my crazy family of the Internet Archive jumped in, everyone from Dad Brewster through to all my brothers and sisters scrambling to find me my insurance info and what they had on their cards, as I couldn’t find mine. It was something really late when I first pinged everyone with “something is not good” and everyone has been rather spectacular over there. Then again, they tend to be spectacular, so I sort of let that slip by. Let me rectify that here.

And now, a little bit on health insurance.

I had travel insurance as part of my health insurance with the Archive. That is still being sorted out, but a large deposit had to be put on the Archive’s corporate card as a down-payment during the sorting out, another fantastic generosity, even if it’s technically a loan. I welcome the coming paperwork and nailing down of financial brass tacks for a specific reason:

I am someone who once walked into an emergency room with no insurance (back in 2010), got a blood medication IV, stayed around a few hours, and went home, generating a $20,000 medical bill in the process. It got knocked down to $9k over time, and I ended up being thrown into a low-income program they had that allowed them to write it off (I think). That bill could have destroyed me, financially. Therefore, I’m super sensitive to the costs of medical care.

In Australia, it is looking like the heart operation and the 3 day hospital stay, along with all the tests and staff and medications, are going to round out around $10,000 before the insurance comes in and knocks that down further (I hope). In the US, I can’t imagine that whole thing being less than $100,000.

The biggest culture shock for me was how little any of the medical staff, be they doctors or nurses or administrators, cared about the money. They didn’t have any real info on what things cost, because pretty much everything is free there. I’ve equated it to asking a restaurant where the best toilets are to use a few hours after your meal – they might have some random ideas, but nobody’s really thinking that way. It was a huge factor in my returning to the emergency room so willingly; each visit, all-inclusive, was $250 AUD, which is even less in US dollars. $250 is something I’ll gladly pay for peace of mind, and I did, twice. The difference in the experience is remarkable. I realize this is a hot button issue now, but chalk me up as another person for whom a life-changing experience could come within a remarkably close distance of being an influence on where I might live in the future.

Dr. Sonny Palmer, who did the insertion of my stent in the operating room.

I had a pile of plans and things to get done (documentaries, software, cutting down on my possessions, and so on), and I’ll be getting back to them. I don’t really have an urge to maintain some sort of health narrative on here, and I certainly am not in the mood to urge any lifestyle changes or preach a way of life to folks. I’ll answer questions if people have them from here on out, but I’d rather be known for something other than powering through a heart attack, and maybe, with some effort, I can do that.

Thanks again to everyone who has been there for me, online and off, in person and far away, over the past few weeks. I’ll try my best to live up to your hopes about what opportunities my second chance at life will give me.

 


Want to learn about Archivematica whilst watching the ducks?

Published 13 Mar 2017 by Jenny Mitcham in Digital Archiving at the University of York.

We are really excited to be hosting the first European Archivematica Camp here at the University of York next month - on the 4-6th April.

Don't worry - there will be no tents or campfires...but there may be some wildlife on the lake.


The Ron Cooke Hub on a frosty morning - hoping for some warmer weather for Camp!

The event is taking place at the Ron Cooke Hub over on our Heslington East campus. If you want to visit the beautiful City of York (OK, I'm biased!) and meet other European Archivematica users (or Archivematica explorers) this event is for you. Artefactual Systems will be leading the event and the agenda is looking very full and interesting.

I'm most looking forward to learning more about the workflows that other Archivematica users have in place or are planning to implement.


One of these lakeside 'pods' will be our breakout room


There are still places left and you can register for Camp here or contact the organisers at info@artefactual.com.

...and if you are not able to attend in person, do watch this blog in early April as you can guarantee I'll be blogging after the event!



Through the mirror-glass: Capture of artwork framed in glass.

Published 13 Mar 2017 by slwacns in State Library of Western Australia Blog.

 

State Library’s collection material that is selected for digitisation comes to the Digitisation team in a variety of forms. This blog describes capture of artwork that is framed and encased within glass.

So let’s see how the item is digitised.


Two large framed original artworks from the picture book Teacup written by Rebecca Young and illustrated by Matt Ottley posed some significant digitisation challenges.

When artwork from the Heritage collection is framed in glass, the glass acts like a mirror and without great care during the capture process, the glass can reflect whatever is in front of it, meaning that the photographer’s reflection (and the reflection of capture equipment) can obscure the artwork.

This post shows how we avoided this issue during the digitisation of two large framed paintings, Cover illustration for Teacup and also page 4-5 [PWC/255/01] and The way the whales called out to each other [PWC/255/09].

Though it is sometimes possible to remove the artwork from its housing, there are occasions when this is not suitable. In this example, the decision was made to not remove the artworks from behind glass as the Conservation staff assessed that it would be best if the works were not disturbed from their original housing.

PWC/255/01 and PWC/255/09

The most critical issue was to be in control of the light. Rearranging equipment in the workroom allowed for the artwork to face a black wall, a method used by photographers to eliminate reflections.

 

We used black plastic across the entrance of the workroom to eliminate all unwanted light.


The next challenge was to set up the camera. For this shoot we used our Hasselblad H3D11 (a 39 megapixel camera with excellent colour fidelity).

 

Prior to capture, we gave the glass a good clean with an anti-static cloth. In the images below, you can clearly see the reflection caused by the mirror effect of the glass.

 

Since we don’t have a dedicated photographic studio we needed to be creative when introducing extra light to allow for the capture. Bouncing the light off a large white card prevented direct light from falling on the artwork and reduced a significant number of reflections. We also used a polarizing filter on the camera lens to reduce reflections even further.


Once every reflection was eliminated and the camera set square to the artwork, we could test colour balance and exposure.

In the image below, you can see that we made the camera look like ‘Ned Kelly’ to ensure any shiny metal from the camera body didn’t reflect in the glass. We used the camera’s computer controlled remote shutter function to further minimise any reflections in front of the glass.


 

The preservation file includes technically accurate colour and greyscale patches to allow for colour fidelity and a ruler for accurate scaling in future reproductions.


The preservation file and a cropped version for access were then ingested into the State Library’s digital repository. The repository allows for current access and future reproductions to be made.

From this post you can see the care and attention that goes into preservation digitisation, ‘Do it right, do it once’ is our motto.



Week #8: Warriors are on the right path

Published 12 Mar 2017 by legoktm in The Lego Mirror.

As you might have guessed due to the lack of previous coverage of the Warriors, I'm not really a basketball fan. But the Warriors are in an interesting place right now. After setting an NBA record for being the fastest team to clinch a playoff spot, Coach Kerr has started resting his starters and the Warriors have a three game losing streak. This puts the Warriors in danger of losing their first seed spot with the San Antonio Spurs only half a game behind them.

But I think the Warriors are doing the right thing. Last year the Warriors set the record for having the best regular season record in NBA history, but also became the first team in NBA history to have a 3-1 advantage in the finals and then lose.

No doubt there was immense pressure on the Warriors last year. It was just expected of them to win the championship, there really wasn't anything else.

So this year they can easily avoid a lot of that pressure by not being the best team in the NBA on paper. They shouldn't worry about being the top seed; just finish in the top four and play their best in the playoffs. Get some rest: they have a huge advantage over every other team simply by already being in the playoffs with so many games left to play.


How can we preserve our wiki pages

Published 10 Mar 2017 by Jenny Mitcham in Digital Archiving at the University of York.

I was recently prompted by a colleague to investigate options for preserving institutional wiki pages. At the University of York we use the Confluence wiki and this is available for all staff to use for a variety of purposes. In the Archives we have our own wiki space on Confluence which we use primarily for our meeting agendas and minutes. The question asked of me was how can we best capture content on the wiki that needs to be preserved for the long term? 

Good question and just the sort of thing I like to investigate. Here are my findings...

Space export

The most sensible way to approach the transfer of a set of wiki pages to the digital archive would be to export them using the export options available within the Space Tools.

The main problem with this approach is that a user will need to have the necessary permissions on the wiki space in order to be able to use these tools ...I found that I only had the necessary permissions on those wiki spaces that I administer myself.

There are three export options as illustrated below:


Space export options - available if you have the right permissions!


HTML

Once you select HTML, there are two options - a standard export (which exports the whole space) or a custom export (which allows you to select the pages you would like included within the export).

I went for a custom export and selected just one section of meeting papers. Each wiki page is saved as an HTML file. DROID identifies these as HTML version 5. All relevant attachments are included in the download in their original format.

There are some really good things about this export option:
  • The inclusion of attachments in the export - these are often going to be as valuable to us as the wiki page content itself. Note that they were all renamed with a number that tied them to the page that they were associated with. It seemed that the original file name was however preserved in the linking wiki page text 
  • The metadata at the top of a wiki page is present in the HTML pages: ie Created by Jenny Mitcham, last modified by Jenny Mitcham on 31, Oct, 2016 - this is really important to us from an archival point of view
  • The links work - including links to the downloaded attachments, other wiki pages and external websites or Google Docs
  • The export includes an index page which can act as a table of contents for the exported files - this also includes some basic metadata about the wiki space

XML

Again, there are two options here - either a standard export (of the whole space) or a custom export, which allows you to select whether or not you want comments to be exported and choose exactly which pages you want to export.

I tried the custom export. It seemed to work and also did export all the relevant attachments. The attachments were all renamed as '1' (with no file extension), and the wiki page content is all bundled up into one huge XML file.

On the plus side, this export option may contain more metadata than the other options (for example the page history) but it is difficult to tell as the XML file is so big and unwieldy and hard to interpret. Really it isn't designed to be usable. The main function of this export option is to move wiki pages into another instance of Confluence.

PDF

Again you have the option to export whole space or choose your pages. There are also other configurations you can make to the output but these are mostly cosmetic.

I chose the same batch of meeting papers to export as PDF and this produced a 111-page PDF document. The first page is a contents page which lists all the other pages alphabetically with hyperlinks to the right section of the document. It is hard to use the document as the wiki pages seem to run into each other without adequate spacing, and because of the linear nature of a PDF document you feel drawn to read it in the order it is presented (which in this case is not a logical order for the content). Attachments are not included in the download, though links to the attachments are maintained in the PDF file and they do continue to resolve to the right place on the wiki. Creation and last modified metadata is also not included in the export.

Single page export

As well as the Space Export options in Confluence there are also single page export options. These are available to anyone who can access the wiki page so may be useful if people do not have necessary permissions for a space export.

I exported a range of test pages using the 'Export to PDF' and 'Export to Word' options.

Export to PDF

The PDF files created in this manner are version 1.4. Sadly no option to export as PDF/A, but at least version 1.4 is closer to the PDF/A standard than some, so perhaps a subsequent migration to PDF/A would be successful.

Export to Word

Surprisingly the 'Word' files produced by Confluence appear not to be Word files at all!

Double click on the files in Windows Explorer and they open in Microsoft Word no problem, but DROID identifies the files as HTML (with no version number) and reports a file extension mismatch (because the files have a .doc extension).

If you view the files in a text application you can clearly see the Content-Type marked as text/html and <html> tags within the document. Quick View Plus, however, views them as an Internet Mail Message with the following text displayed at the top of each page:


Subject: Exported From Confluence
1024x640 72 Print 90

All very confusing and certainly not giving me a lot of faith in this particular export format!


Comparison

Both of these single page export formats do a reasonable job of retaining the basic content of the wiki pages - both versions include many of the key features I was looking for - text, images, tables, bullet points, colours. 

Where advanced formatting has been used to lay out a page using coloured boxes, the PDF version does a better job at replicating this than the 'Word' version. Whilst the PDF attempts to retain the original formatting, the 'Word' version displays the information in a much more linear fashion.

Links were also more usefully replicated in the PDF version. The absolute URL of all links, whether internal, external or to attachments was included within the PDF file so that it is possible to follow them to their original location (if you have the necessary permissions to view the pages). On the 'Word' versions, only external links worked in this way. Internal wiki links and links to attachments were exported as a relative link which become 'broken' once that page is taken out of its original context. 

The naming of the files that were produced is also worthy of comment. The 'Word' versions are given a name which mirrors the name of the page within the wiki space, but the naming of the PDF versions is much more useful, including the name of the wiki space itself, the page name and a date and timestamp showing when the page was exported.


Neither of these single page export formats retained the creation and last modified metadata for each page and this is something that it would be very helpful to retain.

Conclusions

So, if we want to preserve pages from our institutional wiki, what is the best approach?

The Space Export in HTML format is a clear winner. It reproduces the wiki pages in a reusable form that replicates the page content well. As HTML is essentially just plain text it is also a good format for long term preservation.

What impressed me about the HTML export was the fact that it retained the content, included basic creation and last modified metadata for each page and downloaded all relevant attachments, updating the links to point to these local copies.

What if someone does not have the necessary permissions to do a space export? My first suggestion would be that they ask for their permissions to be upgraded. If not, perhaps someone who does have necessary permissions could carry out the export?

If all else fails, the export of a single page using the 'Export as PDF' option could be used to provide ad hoc content for the digital archive. PDF is not the best preservation format but it may be possible to convert it to PDF/A. Note that any attachments would have to be exported separately and manually if this option were selected.

Final thoughts

A wiki space is a dynamic thing which can involve several different types of content - blog posts, labels/tags and comments can all be added to wiki spaces and pages. If these elements are thought to be significant then more work is required to see how they can be captured. It was apparent that comments could be captured using the HTML and XML exports and I believe blog posts can be captured individually as PDF files.

What is also available within the wiki platform itself is a very detailed Page History. Within each wiki page it is possible to view the Page History and see how a page has evolved over time - who has edited it and when those edits occurred. As far as I could see, none of the export formats included this level of information. The only exception may be the XML export but this was so difficult to view that I could not be sure either way.

So, there are limitations to all these approaches and as ever this goes back to the age old discussion about Significant Properties. What is significant about the wiki pages? What is it that we are trying to preserve? None of the export options preserve everything. All are compromises, but perhaps some are compromises we could live with.

China – Arrival in the Middle Kingdom

Published 9 Mar 2017 by Tom Wilson in thomas m wilson.

I’ve arrived in Kunming, the little red dot you can see on the map above.  I’m here to teach research skills to undergraduate students at Yunnan Normal University.  As you can see, I’ve come to a point where the foothills of the Himalayas fold up into a bunch of deep creases.  Yunnan province is the area of […]


Updates 1.2.4 and 1.1.8 released

Published 9 Mar 2017 by Roundcube Webmail Dev Team in Roundcube Webmail Project News.

We just published another update to both stable versions 1.2 and 1.1, delivering important bug fixes and improvements which we picked from the upstream branch.

Included is a fix for a recently reported security XSS issue with CSS styles inside an SVG tag (CVE-2017-6820).

See the full changelog for 1.2.4 in the wiki. And for version 1.1.8 in the release notes.

Both versions are considered stable and we recommend updating all production installations of Roundcube to either of these versions. Download them from GitHub via roundcube.net/download.

As usual, don’t forget to backup your data before updating!


Introducing Similarity Search at Flickr

Published 7 Mar 2017 by Clayton Mellina in code.flickr.com.

At Flickr, we understand that the value in our image corpus is only unlocked when our members can find photos and photographers that inspire them, so we strive to enable the discovery and appreciation of new photos.

To further that effort, today we are introducing similarity search on Flickr. If you hover over a photo on a search result page, you will reveal a “…” button that exposes a menu that gives you the option to search for photos similar to the photo you are currently viewing.

In many ways, photo search is very different from traditional web or text search. First, the goal of web search is usually to satisfy a particular information need, while with photo search the goal is often one of discovery; as such, it should be delightful as well as functional. We have taken this to heart throughout Flickr. For instance, our color search feature, which allows filtering by color scheme, and our style filters, which allow filtering by styles such as “minimalist” or “patterns,” encourage exploration. Second, in traditional web search, the goal is usually to match documents to a set of keywords in the query. That is, the query is in the same modality—text—as the documents being searched. Photo search usually matches across modalities: text to image. Text querying is a necessary feature of a photo search engine, but, as the saying goes, a picture is worth a thousand words. And beyond saving people the effort of so much typing, many visual concepts genuinely defy accurate description. Now, we’re giving our community a way to easily explore those visual concepts with the “…” button, a feature we call the similarity pivot.

The similarity pivot is a significant addition to the Flickr experience because it offers our community an entirely new way to explore and discover the billions of incredible photos and millions of incredible photographers on Flickr. It allows people to look for images of a particular style, it gives people a view into universal behaviors, and even when it “messes up,” it can force people to look at the unexpected commonalities and oddities of our visual world with a fresh perspective.

What is “similarity”?

To understand how an experience like this is powered, we first need to understand what we mean by “similarity.” There are many ways photos can be similar to one another. Consider some examples.

It is apparent that all of these groups of photos illustrate some notion of “similarity,” but each is different. Roughly, they are: similarity of color, similarity of texture, and similarity of semantic category. And there are many others that you might imagine as well.

What notion of similarity is best suited for a site like Flickr? Ideally, we’d like to be able to capture multiple types of similarity, but we decided early on that semantic similarity—similarity based on the semantic content of the photos—was vital to facilitate discovery on Flickr. This requires a deep understanding of image content for which we employ deep neural networks.

We have been using deep neural networks at Flickr for a while for various tasks such as object recognition, NSFW prediction, and even prediction of aesthetic quality. For these tasks, we train a neural network to map the raw pixels of a photo into a set of relevant tags, as illustrated below.

Internally, the neural network accomplishes this mapping incrementally by applying a series of transformations to the image, which can be thought of as a vector of numbers corresponding to the pixel intensities. Each transformation in the series produces another vector, which is in turn the input to the next transformation, until finally we have a vector that we specifically constrain to be a list of probabilities for each class we are trying to recognize in the image. To be able to go from raw pixels to a semantic label like “hot air balloon,” the network discards lots of information about the image, including information about  appearance, such as the color of the balloon, its relative position in the sky, etc. Instead, we can extract an internal vector in the network before the final output.

For common neural network architectures, this vector—which we call a “feature vector”—has many hundreds or thousands of dimensions. We can’t necessarily say with certainty that any one of these dimensions means something in particular as we could at the final network output, whose dimensions correspond to tag probabilities. But these vectors have an important property: when you compute the Euclidean distance between these vectors, images containing similar content will tend to have feature vectors closer together than images containing dissimilar content. You can think of this as a way that the network has learned to organize information present in the image so that it can output the required class prediction. This is exactly what we are looking for: Euclidean distance in this high-dimensional feature space is a measure of semantic similarity. The graphic below illustrates this idea: points in the neighborhood around the query image are semantically similar to the query image, whereas points in neighborhoods further away are not.

This measure of similarity is not perfect and cannot capture all possible notions of similarity—it will be constrained by the particular task the network was trained to perform, i.e., scene recognition. However, it is effective for our purposes, and, importantly, it contains information beyond merely the semantic content of the image, such as appearance, composition, and texture. Most importantly, it gives us a simple algorithm for finding visually similar photos: compute the distance in the feature space of a query image to each index image and return the images with lowest distance. Of course, there is much more work to do to make this idea work for billions of images.
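To make that concrete, here is a toy sketch of the brute-force version of this search, written with NumPy purely for illustration; it is not the production system described here, and the index size and dimensionality are made-up values.

    import numpy as np

    # Illustrative only: a tiny "index" of feature vectors, one row per photo.
    # Real feature vectors come from the network described above, and the real index has billions of rows.
    index_vectors = np.random.rand(10000, 256).astype(np.float32)  # assumed 256-dimensional features
    query_vector = np.random.rand(256).astype(np.float32)

    # Euclidean distance from the query to every indexed vector.
    distances = np.linalg.norm(index_vectors - query_vector, axis=1)

    # The smallest distances correspond to the most semantically similar photos.
    top_k = np.argsort(distances)[:10]
    print(top_k, distances[top_k])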

Large-scale approximate nearest neighbor search

With an index as large as Flickr’s, computing distances exhaustively for each query is intractable. Additionally, storing a high-dimensional floating point feature vector for each of billions of images takes a large amount of disk space and poses even more difficulty if these features need to be in memory for fast ranking. To solve these two issues, we adopt a state-of-the-art approximate nearest neighbor algorithm called Locally Optimized Product Quantization (LOPQ).

To understand LOPQ, it is useful to first look at a simple strategy. Rather than ranking all vectors in the index, we can first filter a set of good candidates and only do expensive distance computations on them. For example, we can use an algorithm like k-means to cluster our index vectors, find the cluster to which each vector is assigned, and index the corresponding cluster id for each vector. At query time, we find the cluster that the query vector is assigned to and fetch the items that belong to the same cluster from the index. We can even expand this set if we like by fetching items from the next nearest cluster.

This idea will take us far, but not far enough for a billions-scale index. For example, with 1 billion photos, we need 1 million clusters so that each cluster contains an average of 1000 photos. At query time, we will have to compute the distance from the query to each of these 1 million cluster centroids in order to find the nearest clusters. This is quite a lot. We can do better, however, if we instead split our vectors in half by dimension and cluster each half separately. In this scheme, each vector will be assigned to a pair of cluster ids, one for each half of the vector. If we choose k = 1000 to cluster both halves, we have k² = 1000 * 1000 = 1e6 possible pairs. In other words, by clustering each half separately and assigning each item a pair of cluster ids, we can get the same granularity of partitioning (1 million clusters total) with only 2 * 1000 distance computations with half the number of dimensions, for a total computational savings of 1000x. Conversely, for the same computational cost, we gain a factor of k more partitions of the data space, providing a much finer-grained index.
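A rough sketch of this splitting idea follows; it is purely illustrative, with scikit-learn's k-means standing in for the real clustering, and the toy values of k and the dimensionality are assumptions rather than anything used in production.

    import numpy as np
    from sklearn.cluster import KMeans

    vectors = np.random.rand(5000, 128).astype(np.float32)  # assumed 128-dimensional features
    half = vectors.shape[1] // 2
    k = 32  # toy value; the text above discusses k = 1000 per half

    # Cluster each half of the dimensions independently.
    km_left = KMeans(n_clusters=k, n_init=10, random_state=0).fit(vectors[:, :half])
    km_right = KMeans(n_clusters=k, n_init=10, random_state=0).fit(vectors[:, half:])

    # Each vector is indexed by a pair of cluster ids, giving k * k coarse cells in total.
    cell_ids = list(zip(km_left.labels_, km_right.labels_))

    # At query time, assign the query to a cell pair and fetch the items stored in that cell as candidates.
    query = np.random.rand(128).astype(np.float32)
    query_cell = (int(km_left.predict(query[None, :half])[0]),
                  int(km_right.predict(query[None, half:])[0]))
    candidates = [i for i, cell in enumerate(cell_ids) if cell == query_cell]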

This idea of splitting vectors into subvectors and clustering each split separately is called product quantization. When we use this idea to index a dataset it is called the inverted multi-index, and it forms the basis for fast candidate retrieval in our similarity index. Typically the distribution of points over the clusters in a multi-index will be unbalanced as compared to a standard k-means index, but this unbalance is a fair trade for the much higher resolution partitioning that it buys us. In fact, a multi-index will only be balanced across clusters if the two halves of the vectors are perfectly statistically independent. This is not the case in most real world data, but some heuristic preprocessing—like PCA-ing and permuting the dimensions so that the cumulative per-dimension variance is approximately balanced between the halves—helps in many cases. And just like the simple k-means index, there is a fast algorithm for finding a ranked list of clusters to a query if we need to expand the candidate set.

After we have a set of candidates, we must rank them. We could store the full vector in the index and use it to compute the distance for each candidate item, but this would incur a large memory overhead (for example, 256 dimensional vectors of 4 byte floats would require 1Tb for 1 billion photos) as well as a computational overhead. LOPQ solves these issues by performing another product quantization, this time on the residuals of the data. The residual of a point is the difference vector between the point and its closest cluster centroid. Given a residual vector and the cluster indexes along with the corresponding centroids, we have enough information to reproduce the original vector exactly. Instead of storing the residuals, LOPQ product quantizes the residuals, usually with a higher number of splits, and stores only the cluster indexes in the index. For example, if we split the vector into 8 splits and each split is clustered with 256 centroids, we can store the compressed vector with only 8 bytes regardless of the number of dimensions to start (though certainly a higher number of dimensions will result in higher approximation error). With this lossy representation we can produce a reconstruction of a vector from the 8 byte codes: we simply take each quantization code, look up the corresponding centroid, and concatenate these 8 centroids together to produce a reconstruction. Likewise, we can approximate the distance from the query to an index vector by computing the distance between the query and the reconstruction. We can do this computation quickly for many candidate points by computing the squared difference of each split of the query to all of the centroids for that split. After computing this table, we can compute the squared difference for an index point by looking up the precomputed squared difference for each of the 8 indexes and summing them together to get the total squared difference. This caching trick allows us to quickly rank many candidates without resorting to distance computations in the original vector space.
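A simplified stand-in for the residual coding and table-based ranking could look like the following sketch. It uses the 8 splits and 256 centroids mentioned above but omits the coarse multi-index and the local rotations discussed next, and the residual vectors themselves are random placeholders.

    import numpy as np
    from sklearn.cluster import KMeans

    residuals = np.random.rand(5000, 64).astype(np.float32)  # assumed residual vectors
    n_splits, n_centroids = 8, 256
    split_size = residuals.shape[1] // n_splits

    # Train one small codebook per split; each vector is then stored as 8 one-byte centroid indexes.
    codebooks, codes = [], []
    for s in range(n_splits):
        part = residuals[:, s * split_size:(s + 1) * split_size]
        km = KMeans(n_clusters=n_centroids, n_init=4, random_state=0).fit(part)
        codebooks.append(km.cluster_centers_)
        codes.append(km.labels_.astype(np.uint8))
    codes = np.stack(codes, axis=1)  # shape (n_vectors, n_splits)

    # At query time, precompute the squared distance from each query split to every centroid...
    query = np.random.rand(64).astype(np.float32)
    tables = [((codebooks[s] - query[s * split_size:(s + 1) * split_size]) ** 2).sum(axis=1)
              for s in range(n_splits)]

    # ...then rank candidates by summing table lookups instead of computing full distances.
    approx_sq_dist = sum(tables[s][codes[:, s]] for s in range(n_splits))
    ranked = np.argsort(approx_sq_dist)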

LOPQ adds one final detail: for each cluster in the multi-index, LOPQ fits a local rotation to the residuals of the points that fall in that cluster. This rotation is simply a PCA that aligns the major directions of variation in the data to the axes, followed by a permutation to heuristically balance the variance across the splits of the product quantization. Note that this is the exact preprocessing step that is usually performed at the top-level multi-index. It tends to make the approximate distance computations more accurate by mitigating errors introduced by assuming that each split of the vector in the product quantization is statistically independent from other splits. Additionally, since a rotation is fit for each cluster, they serve to fit the local data distribution better.

Below is a diagram from the LOPQ paper that illustrates the core ideas of LOPQ. K-means (a) is very effective at allocating cluster centroids, illustrated as red points, that target the distribution of the data, but it has other drawbacks at scale as discussed earlier. In the 2d example shown, we can imagine product quantizing the space with 2 splits, each with 1 dimension. Product Quantization (b) clusters each dimension independently and cluster centroids are specified by pairs of cluster indexes, one for each split. This is effectively a grid over the space. Since the splits are treated as if they were statistically independent, we will, unfortunately, get many clusters that are “wasted” by not targeting the data distribution. We can improve on this situation by rotating the data such that the main dimensions of variation are axis-aligned. This version, called Optimized Product Quantization (c), does a better job of making sure each centroid is useful. LOPQ (d) extends this idea by first coarsely clustering the data and then doing a separate instance of OPQ for each cluster, allowing highly targeted centroids while still reaping the benefits of product quantization in terms of scalability.

LOPQ is state-of-the-art for quantization methods, and you can find more information about the algorithm, as well as benchmarks, here. Additionally, we provide an open-source implementation in Python and Spark which you can apply to your own datasets. The algorithm produces a set of cluster indexes that can be queried efficiently in an inverted index, as described. We have also explored use cases that use these indexes as a hash for fast deduplication of images and large-scale clustering. These extended use cases are studied here.

Conclusion

We have described our system for large-scale visual similarity search at Flickr. Techniques for producing high-quality vector representations for images with deep learning are constantly improving, enabling new ways to search and explore large multimedia collections. These techniques are being applied in other domains as well to, for example, produce vector representations for text, video, and even molecules. Large-scale approximate nearest neighbor search has importance and potential application in these domains as well as many others. Though these techniques are in their infancy, we hope similarity search provides a useful new way to appreciate the amazing collection of images at Flickr and surface photos of interest that may have previously gone undiscovered. We are excited about the future of this technology at Flickr and beyond.

Acknowledgements

Yannis Kalantidis, Huy Nguyen, Stacey Svetlichnaya, Arel Cordero. Special thanks to the rest of the Computer Vision and Machine Learning team and the Vespa search team who manages Yahoo’s internal search engine.



Thumbs.db – what are they for and why should I care?

Published 7 Mar 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Recent work I’ve been doing on the digital archive has made me think a bit more about those seemingly innocuous files that Windows (XP, Vista, 7 and 8) puts into any directory that has images in – Thumbs.db.

Getting your folder options right helps!
Windows uses a file called Thumbs.db to create little thumbnail images of any images within a directory. It stores one of these files in each directory that contains images and it is amazing how quickly they proliferate. Until recently I wasn’t aware I had any in my digital archive at all. This is because although my preferences in Windows Explorer were set to display hidden files, the "Hide protected operating system files" option also needs to be disabled in order to see files such as these.

The reason I knew I had all these Thumbs.db files was through a piece of DROID analysis work published last month. Thumbs.db ranked at number 12 in my list of the most frequently occurring file formats in the digital archive. I had 210 of these files in total. I mentioned at the time that I could write a whole blog post about this, so here it is!

Do I really want these in the digital archive? In my mind, what is in the ‘original’ folders within the digital archive should be what OAIS would call the Submission Information Package (SIP). Just those files that were given to us by a donor or depositor. Not files that were created subsequently by my own operating system.

Though they are harmless enough, they can be a bit irritating. Firstly, when I’m trying to run reports on the contents of the archive, the number of files for each archive is skewed by the Thumbs.db files that are not really a part of the archive. Secondly, and perhaps more importantly, I was trying to create a profile of the dates of files within the digital archive (admittedly not an exact science when using last modified dates) and the span of dates for each individual archive that we hold. The presence of Thumbs.db files in each archive that contained images gave the false impression that all of the archives had had content added relatively recently, when in fact all that had happened was that a Thumbs.db file had automatically been added when I had transferred the data to the digital archive filestore. It took me a while to realise this - gah!

So, what to do? First I needed to work out how to stop them being created.

After a bit of googling I quickly established the fact that I didn’t have the necessary permissions to be able to disable this default behaviour within Windows so I called in the help of IT Services.

IT clearly thought this was a slightly unusual request, but made a change to my account which now stops these thumbnail images being created by me. Since I am the only person who has direct access to the born-digital material within the archive, this should solve that problem.

Now I can systematically remove the files. This means that they won’t skew any future reports I run on numbers of files and last modified dates.
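For anyone wanting to do something similar, a minimal sketch of that clean-up might look like this (the archive path is a made-up example, the dry-run switch is there on purpose, and I certainly wouldn't run it without testing on a copy first):

    import os

    archive_root = r"X:\digital-archive"  # hypothetical path to the archive filestore
    dry_run = True                        # set to False to actually delete the files

    # Walk the archive and report (or remove) every Thumbs.db that Windows has left behind.
    for dirpath, dirnames, filenames in os.walk(archive_root):
        for name in filenames:
            if name.lower() == "thumbs.db":
                target = os.path.join(dirpath, name)
                if dry_run:
                    print("Would remove:", target)
                else:
                    os.remove(target)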

Perhaps once we get a proper digital archiving system in place here at the Borthwick we won’t need to worry about these issues as we won’t directly interact with the archive filestore? Archivematica will package up the data into an AIP and put it on the filestore for me.

However, I will say that now IT have stopped the use of Thumbs.db from my account I am starting to miss them. This setting applies to my own working filestore as well as the digital archive. It turns out that it is actually incredibly useful to be able to see thumbnails of your image files before double clicking on them! Perhaps I need to get better at practicing what I preach and make some improvements to how I name my own image files – without a preview thumbnail, an image file *really* does benefit from a descriptive filename!

As always, I'm interested to hear how other people tackle Thumbs.db and any other system files within their digital archives.


This Month’s Writer’s Block

Published 7 Mar 2017 by Dave Robertson in Dave Robertson.



WHAT’S IN THE WAY IS THE WAY

Published 6 Mar 2017 by timbaker in Tim Baker.

The image on the left was taken a year ago when I had to renew my driver’s license, so I am stuck with it for the next 10 years. I don’t mind so much as it reminds me how far I’ve come. The photo on...

Week #7: 999 assists and no more kneeling

Published 4 Mar 2017 by legoktm in The Lego Mirror.

Joe Thornton is one assist away from reaching 1,000 in his career. He's a team player - the recognition of scoring a goal doesn't matter to him, he just wants his teammates to score. And his teammates want him to achieve this milestone too, as shown by the Sharks passing to Thornton and him passing back instead of going directly for the easy empty netter.

Oh, and now that the trade deadline has passed with no movement on the goalie front, it's time for In Jones We Trust:

via /u/MisterrAlex on reddit

In other news, Colin Kaepernick announced that he's going to be a free agent and opted out of the final year of his contract. But in even bigger news, he said he will stop kneeling for the national anthem. I don't know if he is doing that to make himself more marketable, but I wish he would have stood (pun intended) with his beliefs.


Songs for the Beeliar Wetlands

Published 2 Mar 2017 by Dave Robertson in Dave Robertson.

The title track of the forthcoming Kiss List album has just been included on an awesome fundraising compilation of 17 songs by local songwriters for the Beeliar wetlands. All proceeds go to #rethinkthelink. Get it while it's hot! You can purchase the whole album or just the songs you like.

Songs for the Beeliar Wetlands: Original Songs by Local Musicians (Volume 1) by Dave Robertson and The Kiss List



Stepping Off Meets the Public

Published 1 Mar 2017 by Tom Wilson in thomas m wilson.

At the start of February I launched my new book, Stepping Off: Rewilding and Belonging in the South-West, at an event at Clancy’s in Fremantle.  On Tuesday evening this week I was talking about the book down at Albany Library.     As I was in the area I decided to camp for a couple of […]


Digital Deli, reading history in the present tense

Published 1 Mar 2017 by Carlos Fenollosa in Carlos Fenollosa — Blog.

Digital Deli: The Comprehensive, User Lovable Menu Of Computer Lore, Culture, Lifestyles, And Fancy is an obscure book published in 1984. I found out about it after learning that the popular Steve Wozniak article titled "Homebrew and How the Apple Came to Be" belonged to a compilation of short articles.

The book

I'm amazed that this book isn't more cherished by the retrocomputing community, as it provides an incredible insight into the state of computers in 1984. We've all read books about their history, but Digital Deli provides a unique approach: it's written in present tense.

Articles are written with a candid and inspiring narrative. Micro computers were new back then, and the authors could only speculate about how they might change the world in the future.

The book is adequately structured in sections which cover topics ranging from the origins of computing and Silicon Valley startups to reviews of specific systems. But the most interesting parts for me are not the tech articles, but rather the sociological essays.

There are texts on how families welcome computers into the home, the applications of artificial intelligence, micros on Wall Street and computers in the classroom.

How the Source works

Fortunately, a copy of the book has been preserved online, and I highly encourage you to check it out; you can also still find physical copies for sale online.

Besides Woz explaining how Apple was founded, don't miss out on Paul Lutus describing how he programmed AppleWriter in a cabin in the woods, Les Solomon envisioning the "magic box" of computing, Ted Nelson on information exchange and his Project Xanadu, Nolan Bushnell on video games, Bill Gates on software usability, the origins of the Internet... the list goes on and on.

Les Solomon

If you love vintage computing you will find a fresh perspective, and if you were alive during the late 70s and early 80s you will feel a big nostalgia hit. In any case, do yourself a favor, grab a copy of this book, and keep it as a manifesto of the greatest revolution in computer history.

Tags: retro, books



Week #6: Barracuda win streak is great news for the Sharks

Published 24 Feb 2017 by legoktm in The Lego Mirror.

The San Jose Barracuda, the Sharks AHL affiliate team, is currently riding a 13 game winning streak, and is on top of the AHL — and that's great news for the Sharks.

Ever since the Barracuda moved here from Worcester, Mass., it's only been great news for the Sharks. Because they play in the same stadium, sending players up or down becomes as simple as a little paperwork and asking them to switch locker rooms, not cross-country flights.

This allows the Sharks to have a significantly deeper roster, since they can call up new players at a moment's notice. So the Barracuda's win streak is great news for Sharks fans, since it demonstrates how even the minor league players are ready to play in the pros.

And if you're watching hockey, be on the watch for Joe Thornton to score his 1,000th assist! (More on that next week).


How can I keep mediawiki not-yet-created pages from cluttering my google webmaster console with 404s?

Published 24 Feb 2017 by Sean in Newest questions tagged mediawiki - Webmasters Stack Exchange.

We have a MediaWiki install as part of our site. As on all wikis, people will add links to not-yet-created pages (red links). When followed, these links return a 404 status (as there is no content) along with an invitation to add content.

I'm now getting buried in 404 notices in the Google webmaster console for this site. Is there a best way to handle this?

Thanks for any help.


The Other Half

Published 24 Feb 2017 by Jason Scott in ASCII by Jason Scott.

On January 19th of this year, I set off to California to participate in a hastily-arranged appearance in a UCLA building to talk about saving climate data in the face of possible administrative switchover. I wore a fun hat, stayed in a nice hotel, and saw an old friend from my MUD days for dinner. The appearance was a lot of smart people doing good work and wanting to continue with it.

While there, I was told my father’s heart surgery, which had some complications, was going to require an extended stay and we were running out of relatives and companions to accompany him. I booked a flight for seven hours after I’d arrive back in New York to go to North Carolina and stay with him. My father has means, so I stayed in a good nearby hotel room. I stayed with him for two and a half weeks, booking ten to sixteen hour days to accompany him through a maze of annoyances, indignities, smart doctors, variant nurses ranging from saints to morons, and generally ensure his continuance.

In the middle of this, I had a non-movable requirement to move the manuals out of Maryland and send them to California. Looking through several possibilities, I settled with: Drive five hours to Maryland from North Carolina, do the work across three days, and drive back to North Carolina. The work in Maryland had a number of people helping me, and involved pallet jacks, forklifts, trucks, and crazy amounts of energy drinks. We got almost all of it, with a third batch ready to go. I drove back the five hours to North Carolina and caught up on all my podcasts.

I stayed with my father another week and change, during which I dented my rental car, and hit another hard limit: I was going to fly to Australia. I also, to my utter horror, realized I was coming down with some sort of cold/flu. I did what I could – stabilized my father’s arrangements, went into the hotel room, put on my favorite comedians in a playlist, turned out the lights, drank 4,000mg of Vitamin C, banged down some orange juice, drank Mucinex, and covered myself in 5 blankets. I woke up 15 hours later in a pool of sweat and feeling like I’d crossed the boundary with that disease. I went back to the hospital to assure my dad was OK (he was), and then prepped for getting back to NY, where I discovered almost every flight for the day was booked due to so many cancelled flights the previous day.

After lots of hand-wringing, I was able to book a very late flight from North Carolina to New York, and stayed there for 5 hours before taking a 25 hour two-segment flight through Dubai to Melbourne.

I landed in Melbourne on Monday the 13th of February, happy that my father was stable back in the US, and prepping for my speech and my other commitments in the area.

On Tuesday I had a heart attack.

We know it happened then, or began to happen, because of the symptoms I started to show – shortness of breath, a feeling of fatigue and an edge of pain that covered my upper body like a jacket. I was fucking annoyed – I felt like I was just super tired and needed some energy, and energy drinks and caffeine weren’t doing the trick.

I met with my hosts for the event I’d do that Saturday, and continued working on my speech.

I attended the conference for that week, did a couple interviews, saw some friends, took some nice tours of preservation departments and discussed copyright with very smart lawyers from the US and Australia.

My heart attack continued, blocking off what turned out to be a quarter of my bloodflow to my heart.

This was annoying me but I didn’t know what it was, so according to my fitbit I walked 25 miles, walked up 100 flights of stairs, and maintained hours of exercise to snap out of it, across the week.

I did a keynote for the conference. The next day I hosted a wonderful event for seven hours. I asked for a stool because I said I was having trouble standing comfortably. They gave me one. I took rests during it, just so the DJ could get some good time with the crowds. I was praised for keeping the crowd jumping and giving it great energy. I’d now been having a heart attack for four days.

That Sunday, I walked around Geelong, a lovely city near Melbourne, and ate an exquisite meal at Igni, a restaurant whose menu basically has one line to tell you you’ll be eating what they think you should have. Their choices were excellent. Multiple times during the meal, I dozed a little, as I was fatigued. When we got to the tram station, I walked back to the apartment to get some rest. Along the way, I fell to the sidewalk and got up after resting.

I slept off more of the growing fatigue and pain.

The next day I had a second exquisite meal of the trip at Vue Le Monde, a meal that lasted from about 8pm to midnight. My partner Rachel loves good meals and this is one of the finest you can have in the city, and I enjoyed it immensely. It would have been a fine last meal. I’d now been experiencing a heart attack for about a week.

That night, I had a lot of trouble sleeping. The pain was now a complete jacket of annoyance on my body, and there was no way to rest that didn’t feel awful. I decided medical attention was needed.

The next morning, Rachel and I walked 5 blocks to a clinic, found it was closed, and walked further to the RealCare Health Clinic. I was finding it very hard to walk at this point. Dr. Edward Petrov saw me, gave me some therapy for reflux, found it wasn’t reflux, and got concerned, especially as having my heart checked might cost me something significant. He said he had a cardiologist friend who might help, and he called him, and it was agreed we could come right over.

We took a taxi over to Dr. Georg Leitl’s office. He saw me almost immediately.

He was one of those doctors that only needed to take my blood pressure and check my heart with a stethoscope for 30 seconds before looking at me sadly. We went to his office, and he told me I could not possibly get on the plane I was leaving on in 48 hours. He also said I needed to go to Hospital very quickly, and that I had some things wrong with me that needed attention.

He had his assistants measure my heart and take an ultrasound, wrote something on a notepad, put all the papers in an envelope with the words “SONNY PALMER” on them, and drove me personally over in his car to St. Vincent’s Hospital.

Taking me up to the cardiology department, he put me in the waiting room of the surgery, talked to the front desk, and left. I waited 5 anxious minutes, and then was bought into a room with two doctors, one of whom turned out to be Dr. Sonny Palmer.

Sonny said Georg thought I needed some help, and I’d be checked within a day. I asked if he’d seen the letter with his name on it. He hadn’t. He went and got it.

He came back and said I was going to be operated on in an hour.

He also explained I had a rather blocked artery in need of surgery. Survival rate was very high. Nerve damage from the operation was very unlikely. I did not enjoy phrases like survival and nerve damage, and I realized what might happen very shortly, and what might have happened for the last week.

I went back to the waiting room, where I tweeted what might have been my possible last tweets, left a message for my boss Alexis on the slack channel, hugged Rachel tearfully, and then went into surgery, or potential oblivion.

Obviously, I did not die. The surgery was done with me awake, and involved making a small hole in my right wrist, where Sonny (while blasting Bon Jovi) went in with a catheter, found the blocked artery, installed a 30mm stent, and gave back the blood to the quarter of my heart that was choked off. I listened to instructions on when to talk or when to hold myself still, and I got to watch my beating heart on a very large monitor as it got back its function.

I felt (and feel) legions better, of course – surgery like this rapidly improves life. Fatigue is gone, pain is gone. It was also explained to me what to call this whole event: a major heart attack. I damaged the heart muscle a little, although that bastard was already strong from years of high blood pressure and I’m very young comparatively, so the chances of recovery to the point of maybe even being healthier than before are pretty good. The hospital, St. Vincent’s, was wonderful – staff, environment, and even the food (including curry and afternoon tea) were a delight. My questions were answered, my needs met, and everyone felt like they wanted to be there.

It’s now been 4 days. I was checked out of the hospital yesterday. My stay in Melbourne was extended two weeks, and my hosts (MuseumNext and ACMI) paid for basically all of the additional AirBNB that I’m staying at. I am not cleared to fly until the two weeks is up, and I am now taking six medications. They make my blood thin, lower my blood pressure, cure my kidney stones/gout, and stabilize my heart. I am primarily resting.

I had lost a lot of weight and I was exercising, but my cholesterol was a lot worse than anyone really figured out. The drugs and lifestyle changes will probably help knock that back, and I’m likely to adhere to them, unlike a lot of people, because I’d already been on a whole “life reboot” kick. The path that follows is, in other words, both pretty clear and going to be taken.

Had I died this week, at the age of 46, I would have left behind a very bright, very distinct and rather varied life story. I’ve been a bunch of things, some positive and negative, and projects I’d started would have lived quite neatly beyond my own timeline. I’d have also left some unfinished business here and there, not to mention a lot of sad folks and some extremely quality-variant eulogies. Thanks to a quirk of the Internet Archive, there’s a little statue of me – maybe it would have gotten some floppy disks piled at its feet.

Regardless, I personally would have been fine on the accomplishment/legacy scale, if not on the first-person/relationships/plans scale. That my Wikipedia entry is going to have a different date on it than February 2017 is both a welcome thing and a moment to reflect.

I now face the Other Half, whatever events and accomplishments and conversations I get to engage in from this moment forward, and that could be anything from a day to 100 years.

Whatever and whenever that will be, the tweet I furiously typed out on my cellphone as a desperate last-moment possible-goodbye after nearly a half-century of existence will likely still apply:

“I have had a very fun time. It was enormously enjoyable, I loved it all, and was glad I got to see it.”

 


Three takeaways to understand Cloudflare's apocalyptic-proportions mess

Published 24 Feb 2017 by Carlos Fenollosa in Carlos Fenollosa — Blog.

It turns out that Cloudflare's proxies have been dumping uninitialized memory that contains plain HTTPS content for an indeterminate amount of time. If you're not familiar with the topic, let me summarize it: this is the worst crypto news in the last 10 years.

As usual, I suggest you read the HN comments to understand the scandalous magnitude of the bug.

If you don't see this as a news-opening piece on TV, it only confirms that journalists know nothing about tech.

How bad is it, really? Let's see

I'm finding private messages from major dating sites, full messages from a well-known chat service, online password manager data, frames from adult video sites, hotel bookings. We're talking full HTTPS requests, client IP addresses, full responses, cookies, passwords, keys, data, everything

If the bad guys didn't find the bug before Tavis, you may be in the clear. However, as usual in crypto, you must assume that any data you submitted through a Cloudflare HTTPS proxy has been compromised.

Three takeaways

A first takeaway: crypto may be mathematically perfect, but humans err and the implementations are not. Just because something is using strong crypto doesn't mean it's immune to bugs.

A second takeaway: MITMing the entire Internet doesn't sound so compelling when you put it that way. Sorry to be that guy, but this only confirms that the centralization of the Internet by big companies is a bad idea.

A third takeaway: change all your passwords. Yep. It's really that bad. Your passwords and private requests may be stored somewhere, on a proxy or on a malicious actor's servers.

Well, at least change your banking ones, important services like email, and master passwords on password managers -- you're using one, right? RIGHT?

You can't get back any personal info that got leaked but at least you can try to minimize the aftershock.

Update: here is a provisional list of affected services. Download the full list, export your password manager data into a csv file, and compare both files by using grep -f sorted_unique_cf.txt your_passwords.csv.
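
If grep isn't to hand, the same check can be scripted. Here's a rough Python equivalent of that command, using the file names mentioned above; like grep it does simple substring matching, so expect some false positives:

    import csv

    # Load the list of potentially affected domains, one per line.
    with open("sorted_unique_cf.txt") as f:
        affected = {line.strip().lower() for line in f if line.strip()}

    # Flag any row of the password manager export that mentions an affected domain.
    with open("your_passwords.csv", newline="") as f:
        for row in csv.reader(f):
            text = ",".join(row).lower()
            if any(domain in text for domain in affected):
                print("Possibly affected:", ",".join(row))

For the full multi-million-line domain list this brute-force scan will be slow; restricting the check to the URL column of your export speeds it up considerably.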

Afterwards, check the list of potentially affected iOS apps

Let me conclude by saying that unless you were the victim of a targeted attack it's improbable that this bug is going to affect you at all. However, that small probability is still there. Your private information may be cached somewhere or stored on a hacker's server, waiting to be organized and leaked with a flashy slogan.

I'm really sorry about the overly dramatic post, but this time it's for real.

Tags: security, internet, news



The localhost page isn’t working on MediaWiki

Published 23 Feb 2017 by hasanghaforian in Newest questions tagged mediawiki - Webmasters Stack Exchange.

I want to use Widget PDF to embed PDF files on my MediaWiki pages. So I first installed Extension:Widgets on MediaWiki, and it seems to be installed (I can see it in the installed extensions list on the wiki's Special:Version page). Then I copied and pasted the entire source of the PDF widget code page into a page called Widget:PDF on my wiki:

<noinclude>__NOTOC__
<big>This widget allows you to '''embed PDF files''' on your wiki page.</big>

Created by [https://wiki.karlsregion.net/z/User:Wilhelm_Bühler Wilhelm Bühler] and adapted by [https://www.wikihoster.net Karsten Hoffmeyer].

== Using this widget ==
For information on how to use this widget, see [https://www.mediawikiwidgets.org/PDF widget description page on MediaWikiWidgets.org].

== Copy to your site ==
To use this widget on your site, just install [https://www.mediawiki.org/wiki/Extension:Widgets MediaWiki Widgets extension] and copy the [{{fullurl:{{FULLPAGENAME}}|action=edit}} full source code] of this page to your wiki as page '''{{FULLPAGENAME}}'''.
</noinclude><includeonly><object class="pdf-widget" data="<!--{$url|validate:url}-->" type="application/pdf" wmode="transparent" style="z-index: 999; height: 100%; min-height: <!--{$height|escape:'html'|default:680}-->px; width: 100%; max-width: <!--{$width|escape:'html'|default:960}-->px;"><param name="wmode" value="transparent">
<p>Currently your browser does not use a PDF plugin. You may however <a href="<!--{$url|validate:url}-->">download the PDF file</a> instead.</p></object></includeonly>

My PDF file is under this URL:

http://localhost/<wiki-name>/index.php/File:GraphicsandAnimations-Devoxx2010.pdf

And its name is File:GraphicsandAnimations-Devoxx2010.pdf. So, as described here, I added this code to my wiki page:

{{#widget:PDF
 |url=http://localhost/<wiki-name>/index.php/File:GraphicsandAnimations-Devoxx2010.pdf
 |width=750
 |height=1050
}}

But this error occurred:

The localhost page isn’t working
localhost is currently unable to handle this request. 
HTTP ERROR 500

What I did:

  1. I also tried this (the original example from Widget PDF):

    {{#widget:PDF
     |url=https://www.semantic-mediawiki.org/w/images/e/e9/SMW_quick_reference.pdf
     |width=750
     |height=1050
    }}
    

    But the result was the same.

  2. I read Extension talk:Widgets but did not find anything.

  3. I opened Chrome DevTools (Ctrl+Shift+I), but there was no error.

How can I solve the problem?

Edit:

After some time, I tried uninstalling Widget PDF and Extension:Widgets and reinstalling them. So I removed the Extension:Widgets files/folder from $IP/extensions/ and also deleted the Widget:PDF page from the wiki. Then I installed Extension:Widgets again, but now I cannot open the wiki pages at all (I see the above error again) unless I delete require_once "$IP/extensions/Widgets/Widgets.php"; from LocalSettings.php. So I cannot even try to load Extension:Widgets.

Now I see this error in DevTools:

Failed to load resource: the server responded with a status of 500 (Internal Server Error)

Also, after uninstalling Extension:Widgets, I tried Extension:PDFEmbed and unfortunately saw the above error again.


Mediawiki doesn't send any email

Published 19 Feb 2017 by fpiette in Newest questions tagged mediawiki - Ask Ubuntu.

My MediaWiki installation (1.28.0, PHP 7.0.13) doesn't send any email, yet no error is emitted. I checked using the Special:EmailUser page.

What I have tried:

  1. A simple PHP script that sends a mail using PHP's mail() function. It works.
  2. I turned on the PHP mail log. There is a normal line for each MediaWiki email "sent".

PHP is configured (correctly, since it works) to send email using Linux sendmail. MediaWiki is not configured to use direct SMTP.

Any suggestion appreciated. Thanks.


Week #5: Politics and the Super Bowl – chewing a pill too big to swallow

Published 17 Feb 2017 by legoktm in The Lego Mirror.

For a little change, I'd like to talk about the impact of sports upon us this week. The following opinion piece was first written for La Voz, and can also be read on their website.

Super Bowl commercials have become the latest victim of extreme politicization. Two commercials stood out from the rest by featuring pro-immigrant advertisements in the midst of a political climate deeply divided over immigration law. Specifically, Budweiser aired a mostly fictional story of their founder traveling to America to brew, while 84 Lumber’s ad followed a mother and daughter’s odyssey to America in search of a better life.

The widespread disdain toward non-white outsiders, which in turn has created massive backlash toward these advertisements, is no doubt repulsive, but caution should also be exercised when critiquing the placement of such politicization. Understanding the complexities of political institutions and society is no doubt essential, yet it is alarming that every facet of society has become so politicized; ironically, this desire to achieve an elevated political consciousness actually turns many off from the importance of politics.

Football — what was once simply a calming means of unwinding from the harsh winds of an oppressive world — has now become another headline news center for political drama.

President George H. W. Bush and his wife practically wheeled themselves out of a hospital to prepare for hosting the game. New England Patriots owner, Robert Kraft, and quarterback, Tom Brady, received sharp criticism for their support of Donald Trump, even to the point of losing thousands of dedicated fans.

Meanwhile, the NFL Players Association publicly opposed President Trump’s immigration ban three days before the game, with the NFLPA’s president saying “Our Muslim brothers in this league, we got their backs.”

Let’s not forget the veterans and active service members that are frequently honored before NFL games, except that’s an advertisement too – the Department of Defense paid NFL teams over $5 million over four years for those promotions.

Even though it's America's pastime, football, like other similarly mindless outlets, plays the role of letting us escape whenever we need a break from reality, and for nearly three hours on Sunday, America got its break, except for those commercials. If we keep getting nagged about an issue, even one we're generally supportive of, it will eventually become incessant to the point of promoting nihilism.

When Meryl Streep spoke out at the Golden Globes, she turned a relaxing event of celebratory fawning into a political shitstorm which redirected all attention back toward Trump controversies. Even though she was mostly correct, the efficacy becomes questionable after such repetition, as many will become desensitized.

Politics are undoubtedly more important than ever now, but for our sanity’s sake, let’s keep it to a minimum in football. That means commercials too.


What have we got in our digital archive?

Published 13 Feb 2017 by Jenny Mitcham in Digital Archiving at the University of York.

Do other digital archivists find that the work of a digital archivist rarely involves doing hands-on stuff with digital archives? When you have to think about establishing your infrastructure, writing policies and plans and attending meetings, it leaves little time for activities at the coal face. This makes it all the more satisfying when we do actually get the opportunity to work with our digital holdings.

In the past I've called for more open sharing of profiles of digital archive collections, but I am aware that I had not yet done this for the contents of our born-digital collections here at the Borthwick Institute for Archives. So here I try to fill that gap.

I ran DROID (v 6.1.5, signature file v 88, container signature 20160927) over the deposited files in our digital archive and have spent a couple of days crunching the results. Note that this just covers the original files as they have been given to us. It does not include administrative files that I have added, or dissemination or preservation versions of files that have subsequently been created.
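
If you fancy doing similar number crunching yourself, it can be scripted against DROID's CSV export. The sketch below is just an illustration of the approach, not the exact method used here; the file name is a placeholder and the column names (PUID, FORMAT_NAME, EXT) are assumptions, so check them against the headers in your own export:

    import csv
    from collections import Counter

    identified = Counter()      # count of files per identified format name
    unidentified = Counter()    # count of unidentified files per file extension

    with open("droid_export.csv", newline="") as f:   # placeholder file name
        for row in csv.DictReader(f):
            if row.get("PUID"):
                identified[row.get("FORMAT_NAME") or row["PUID"]] += 1
            else:
                unidentified[row.get("EXT") or "(no extension)"] += 1

    print("Top 10 identified formats:")
    for name, count in identified.most_common(10):
        print(count, name)

    print("Top 10 extensions among unidentified files:")
    for ext, count in unidentified.most_common(10):
        print(count, ext)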

I was keen to see:
...and also use these results to:
  • Inform future preservation planning and priorities
  • Feed further information to the PRONOM team at The National Archives
  • Get us to Level 2 of the NDSA Levels of Digital Preservation which asks for "an inventory of file formats in use" and which until now I haven't been collating!

Digital data has been deposited with us since before I started at the Borthwick in 2012 and continues to be deposited with us today. We do not have huge quantities of digital archives here as yet (about 100GB) and digital deposits are still the exception rather than the norm. We will be looking to chase digital archives more proactively once we have Archivematica in place and appropriate workflows established.

Last modified dates (as recorded by DROID) appear to range from 1984 to 2017, with a peak at 2008. This distribution is illustrated below. Note, however, that this data is not always to be trusted (that could be another whole blog post in itself...). One thing that is fair to say, though, is that the archive stretches right back to the early days of personal computers and up to the present day.

Last modified dates on files in the Borthwick digital archive

Here are some of the findings of this profiling exercise:

Summary statistics

  • DROID reported that 10005 individual files were present
  • 9431 (94%) of the files were given a file format identification by DROID. This is a really good result ...or at least it seems so in comparison to my previous data profiling efforts, which have focused on research data. This result is also comparable with those found within other digital archives, for example 90% at Bentley Historical Library, 96% at Norfolk Record Office and 98% at Hull University Archives
  • 9326 (99%) of those files that were identified were given just one possible identification. One file (an xlsx file) was given two different identifications, and 104 files (with a .DOC extension) were given eight identifications. In all these cases of multiple identifications, identification was done by file extension rather than signature - which perhaps explains the uncertainty

Files that were identified



So perhaps these are things I'll look into in a bit more detail if I have time in the future.

  • 90 different file formats were identified within this collection of data

  • Of the identified files 1764 (19%) were identified as Microsoft Word Document 97-2003. This was followed very closely by JPEG File Interchange Format version 1.01 with 1675 (18%) occurrences. The top 10 identified files are illustrated below:

  • This top 10 is in many ways comparable to other similar profiles that have been published recently from Bentley Historical Library, Hull University Archive and Norfolk Record Office, with high occurrences of Microsoft Word, PDF and JPEG images. In contrast, what is not so common in this profile are HTML files and GIF image files - these only just make it into the top 50.

  • Also notable in our top ten are the Sibelius files which haven't appeared in other recently published profiles. Sibelius is musical notation software and these files appear frequently in one of our archives.


Files that weren't identified

  • Of the 574 files that weren't identified by DROID, 125 different file extensions were represented. For most of these there was just a single example of each.

  • 160 (28%) of the unidentified files had no file extension at all. Perhaps not surprisingly, it is the earlier files in our born-digital collection (files from the mid 80s) that are most likely to fall into this category. These were created at a time when operating systems seemed to be a little less rigorous about enforcing the use of file extensions! Approximately 80 of these files are believed to be WordStar 4.0 (PUID: x-fmt/260), which DROID would only be able to recognise by file extension. Of course, if no extension is included, DROID has little chance of being able to identify them!

  • The most common file extensions of those files that weren't identified are visible in the graph below. I need to do some more investigation into these, but most come from two of our archives that relate to electronic music composition:


I'm really pleased to see that the vast majority of the files that we hold can be identified using current tools. This is a much better result than for our research data. Obviously there is still room for improvement so I hope to find some time to do further investigations and provide information to help extend PRONOM.

Other follow-on work involves looking at system files that have been highlighted in this exercise. See, for example, the AppleDouble Resource Fork files that appear in the top ten identified formats. Also appearing quite high up (at number 12) were Thumbs.db files, but perhaps that is the topic of another blog post. In the meantime I'd be really interested to hear from anyone who thinks that system files such as these should be retained.



Harvesting EAD from AtoM: a collaborative approach

Published 10 Feb 2017 by Jenny Mitcham in Digital Archiving at the University of York.

In a previous blog post AtoM harvesting (part 1) - it works! I described how archival descriptions within AtoM are being harvested as Dublin Core for inclusion within our University Library Catalogue.* I also hinted that this wouldn’t be the last you would hear from me on AtoM harvesting and that plans were afoot to enable much richer metadata in EAD 2002 XML (Encoded Archival Description) format to be harvested via OAI-PMH.

I’m pleased to be able to report that this work is now underway.

The University of York, along with five other organisations in the UK, has clubbed together to sponsor Artefactual Systems to carry out the necessary development work to make EAD harvesting possible. This work is scheduled for release in AtoM version 2.4 (due out in the spring).

The work is being jointly sponsored by:



We are also receiving much needed support in this project from The Archives Hub who are providing advice on the AtoM EAD and will be helping us test the EAD harvesting when it is ready. While the sponsoring institutions are all producers of AtoM EAD, The Archives Hub is a consumer of that EAD. We are keen to ensure that the archival descriptions that we enter into AtoM can move smoothly to The Archives Hub (and potentially to other data aggregators in the future), allowing the richness of our collections to be signposted as widely as possible.

Adding this harvesting functionality to AtoM will enable The Archives Hub to gather data direct from us on a regular schedule or as and when updates occur, ensuring that:




So, what are we doing at the moment?




What we are doing at the moment is good and a huge step in the right direction, but perhaps not perfect. As we work together on this project we are coming across areas where future work would be beneficial in order to improve the quality of the EAD that AtoM produces or to expand the scope of what can be harvested from AtoM. I hope to report on this in more detail at the end of the project, but in the meantime, do get in touch if you are interested in finding out more.
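
For anyone curious what harvesting will look like from the consumer's side, here is a minimal Python sketch of an OAI-PMH request. The verb and parameters are standard OAI-PMH; the endpoint URL and the EAD metadataPrefix below are placeholders, since the exact values will depend on how the AtoM 2.4 work is released:

    from urllib.parse import urlencode
    from urllib.request import urlopen

    BASE_URL = "https://archives.example.ac.uk/;oai"   # placeholder AtoM OAI endpoint
    params = {"verb": "ListRecords", "metadataPrefix": "oai_ead"}   # prefix name is an assumption

    # The response is OAI-PMH XML; each <record> element wraps an EAD 2002 document.
    with urlopen(BASE_URL + "?" + urlencode(params)) as response:
        print(response.read().decode("utf-8")[:1000])

A harvester such as The Archives Hub would then page through the full result set using the standard resumptionToken element and re-harvest on a schedule, or whenever it is notified of updates.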







* It is great to see that this is working well and our Library Catalogue is now appearing in the referrer reports for the Borthwick Catalogue on Google Analytics. People are clearly following these new signposts to our archives!

Week #4: 500 for Mr. San Jose Shark

Published 9 Feb 2017 by legoktm in The Lego Mirror.

He did it: Patrick Marleau scored his 500th career goal. He truly is Mr. San Jose Shark.

I had the pleasure of attending the next home game on Saturday, right after he reached the milestone in Vancouver, and nearly lost my voice cheering for Marleau. They mentioned his accomplishment once before the game and again during a break, and each time Marleau would only stand up and acknowledge the crowd cheering for him when he realized they would not stop until he did.

He's had his ups and downs, but he's truly a team player.

“I think when you hit a mark like this, you start thinking about everyone that’s helped you along the way,” Marleau said.

And on Saturday at home, Marleau assisted on both Sharks goals, helping out the teammates who had helped him score his goals over the past two weeks.

Congrats Marleau, and thanks for the 20 years of hockey. Can't wait to see you raise the Cup.


Simpson and his Donkey – an exhibition

Published 9 Feb 2017 by carinamm in State Library of Western Australia Blog.

Illustrations by Frané Lessac and words by Mark Greenwood share the heroic story of John Simpson Kirkpatrick in the picture book Simpson and his Donkey.  The exhibition is on display at the State Library until  27 April. 

simpson
Unpublished spread 14 for pages 32 – 33
Collection of draft materials for Simpson and his Donkey, PWC/254/18 

The original illustrations, preliminary sketches and draft materials displayed in this exhibition form part of the State Library’s Peter Williams’ collection: a collection of original Australian picture book art.

Known as ‘the man with the donkey’, Simpson was a medic who rescued wounded soldiers at Gallipoli during World War I.

The bravery and sacrifice attributed to Simpson is now considered part of the ‘Anzac legend’. It is the myth and legend of John Simpson that Frané Lessac and Mark Greenwood tell in their book.

Frané Lessac and Mark Greenwood also travelled to Anzac Cove to explore where Simpson and Duffy had worked.  This experience and their research enabled them to layer creative interpretation over historical information and Anzac legend.

simpson2

On a moonless April morning, PWC254/6 

Frané Lessac is a Western Australian author-illustrator who has published over forty books for children. Frané speaks at festivals in Australia and overseas, sharing the process of writing and illustrating books. She often illustrates books by Mark Greenwood, of which Simpson and his Donkey is just one example.

Simpson and his Donkey is published by Walker Books, 2008. The original illustrations are on display in the Story Place Gallery until 27 April 2017.



Filed under: Children's Literature, community events, Exhibitions, Illustration, Picture Books, SLWA collections, SLWA displays, WA books and writers, WA history, Western Australia Tagged: children's literature, exhibitions, Frane Lessac, Mark Greenwood, Peter Williams collection, Simpson and his Donkey, State Library of Western Australia, The Story Place

LISTEN TO THE WHISPER

Published 6 Feb 2017 by timbaker in Tim Baker.

So I’ve got this speaking gig coming up at the Pursue Your Passion conference in Byron Bay on Saturday week, February 18. And I’ve been thinking a lot about what I want to say. One of my main qualifications for this gig is my 2011 round...

Week #3: All-Stars

Published 2 Feb 2017 by legoktm in The Lego Mirror.

via /u/PAGinger on reddit

Last weekend was the NHL All-Star game and skills competition, with Brent Burns, Martin Jones, and Joe Pavelski representing the San Jose Sharks in Los Angeles. And to no one's surprise, they were all booed!

Pavelski scored a goal during the tournament for the Pacific Division, and Burns scored during the skills competition's "Four Line Challenge". But since they represented the Pacific, we have to talk about the impossible shot Mike Smith made.

And across the country, the 2017 NFL Pro Bowl (their all-star game) was happening at the same time. The Oakland Raiders had seven Pro Bowlers (tied for most from any team), and the San Francisco 49ers had...none.

In the meantime the 49ers managed to hire a former safety with no General Manager-related experience as their new GM. It's really not clear what Jed York, the 49ers owner, is trying out here, and why he would sign John Lynch to a six year contract.

But really, how much worse could it get for the 49ers?


“The end is nigh”: RiC(h) Description – part 2

Published 31 Jan 2017 by inthemailbox in In the mailbox.

The period for comment on the EGAD RiC – CM draft standard or model is coming to an end.  Since I last posted, there has been a flurry of activity, with comments from at least two Society of American Archivists technical subcommittees (TS-DACS and TS-EAS being the ones I know of), Artefactual (the developers of Accesstomemory software), the Australian Society of Archivists, Chris Hurley and Ross Spencer.

Each has something of value to add; whether concerned with specifics or in thinking about the broader implications for archival description in an online and connected world.



Time for pastures old, anew

Published 30 Jan 2017 by inthemailbox in In the mailbox.

(There’s this thing called #Glamblogclub, and it has a theme each month. I tried to resist…)

This time last year, I was getting ready for a year of academic freedom – time to think, to read, to nurture new professionals.  I’d taken secondment from the State Records Office of WA, after four years of having a split personality, to take up a lecturing contract at Curtin.

I went to ResBazPerth and learnt a little about github and python, and realised I needed a research project or similar to make that learning stick. June saw me doing #blogJune, and act as a general data mentor for #GovHack, an experience that proved useful when it came time to work on the Curtin #Makathon, using cultural heritage data (it also got me thinking about the coding I’d learnt in February, again).

I thought about archives and digital scholarship, and access. I learnt that I like being a mentor and teaching face to face, but worry about the loneliness and neediness of the distance/online student.  I like working with archives and answering queries.  During the ASA conference in Parramatta, I learnt about community and connected archives, and did some connecting of my own, with old and new friends.

And Curtin has given me some great connections too, who supported me through some pretty tough times  – with humour and cake and some fantastic projects. But it’s time to move on or back, and learn some new things. I’m not sure what 2017 has in store for me yet, but I’m guessing there will be archives and access and queries and cake, not to mention planning for the 2018 ASA conference in Perth.



Updates to legoktm.com

Published 29 Jan 2017 by legoktm in The Lego Mirror.

Over the weekend I migrated legoktm.com and associated services over to a new server. It's powered by Debian Jessie instead of the slowly aging Ubuntu Trusty. Most services were migrated with no downtime by rsync'ing content over and then updating DNS. Only git.legoktm.com had some downtime, due to needing to stop the service before copying over the database.

I did not migrate my IRC bouncer history or configuration, so I'm starting fresh. So if I'm no longer in a channel, feel free to PM me and I'll rejoin!

At the same time I moved the main https://legoktm.com/ homepage to MediaWiki. Hopefully that will encourage me to update the content on it more often.

Finally, the Tor relay node I'm running was moved to a separate server entirely. I plan on increasing the resources allocated to it.


Week #2: NATTY HATTY FOR PATTY

Published 26 Jan 2017 by legoktm in The Lego Mirror.

The only person who would dare upstage Patrick Marleau's four goal night is Randy Hahn, with his hilarious call after Marleau's third goal to finish a natural hat-trick: "NATTY HATTY FOR PATTY". And after scoring another, Marleau became the first player to score four goals in a single period since the great Mario Lemieux did in 1997. He's also the third Shark to score four goals in a game, joining Owen Nolan (no video available, but his hat-trick from the 1997 All-Star game is fabulous) and Tomáš Hertl.

Marleau is also ready to hit his next milestone of 500 career goals - he's at 498 right now. Every impressive stat he puts up just further solidifies him as one of the greatest hockey players of his generation. But he's still missing the one achievement that all the greats need - a Stanley Cup. The Sharks made their first trip to the Stanley Cup Finals last year, but realistically had very little chance of winning; they simply were not the better team.

The main question these days is how long Marleau and Joe Thornton will keep playing for, and if they can stay healthy until they eventually win that Stanley Cup.

Discuss this post on Reddit.


Creating an annual accessions report using AtoM

Published 24 Jan 2017 by Jenny Mitcham in Digital Archiving at the University of York.

So, it is that time of year where we need to complete our annual report on accessions for The National Archives. Along with lots of other archives across the UK, we send The National Archives summary information about all the accessions we have received over the course of the previous year. This information is collated and provided online on the Accessions to Repositories website for all to see.

The creation of this report has always been a bit time consuming for our archivists, involving a lot of manual steps and some re-typing, but since we have started using AtoM as our Archival Management System, the process has become much more straightforward.

As I've reported in a previous blog post, AtoM does not do all that we want to do in the way of reporting via its front end.

However, AtoM has an underlying MySQL database and there is nothing to stop you bypassing the interface, looking at the data behind the scenes and pulling out all the information you need.

One of the things we got set up fairly early in our AtoM implementation project was a free MySQL client called Squirrel. Using Squirrel or another similar tool, you can view the database that stores all your AtoM data, browse the data and run queries to pull out the information you need. It is also possible to update the data using these SQL clients (very handy if you need to make any global changes to your data). All you need initially is a basic knowledge of SQL and you can start pulling some interesting reports from AtoM.

The downside of playing with the AtoM database is of course that it isn't nearly as user friendly as the front end.

It is always a bit of an adventure navigating the database structure and trying to work out how the tables are linked. Even with the help of an Entity Relationship Diagram from Artefactual, creating more complex queries is ...well... complex!

AtoM's database tables - there are a lot of them!


However, on a positive note, the AtoM user forum is always a good place to ask stupid questions and Artefactual staff are happy to dive in and offer advice on how to formulate queries. I'm also lucky to have help from more technical colleagues here in Information Services (who were able to help me get Squirrel set up and talking to the right database and can troubleshoot my queries) so what follows is very much a joint effort.

So for those AtoM users in the UK who are wrestling with their annual accessions report, here is a query that will pull out the information you need:

SELECT
    accession.identifier,
    accession.date,
    accession_i18n.title,
    accession_i18n.scope_and_content,
    accession_i18n.received_extent_units,
    accession_i18n.location_information,
    -- Year-only dates are stored as 'YYYY-00-00'; show just the year in that case,
    -- otherwise show the full date
    CASE WHEN CAST(event.start_date AS CHAR) LIKE '%-00-00'
         THEN LEFT(CAST(event.start_date AS CHAR), 4)
         ELSE CAST(event.start_date AS CHAR)
    END AS start_date,
    CASE WHEN CAST(event.end_date AS CHAR) LIKE '%-00-00'
         THEN LEFT(CAST(event.end_date AS CHAR), 4)
         ELSE CAST(event.end_date AS CHAR)
    END AS end_date,
    event_i18n.date
FROM accession
LEFT JOIN event ON event.object_id = accession.id
LEFT JOIN event_i18n ON event.id = event_i18n.id
JOIN accession_i18n ON accession.id = accession_i18n.id
WHERE accession.date LIKE '2016%'   -- accessions received during 2016
ORDER BY identifier

A couple of points to make here:

  • In a previous version of the query, we included some other tables so we could also capture information about the creator of the archive. The addition of the relation, actor and actor_i18n tables made the query much more complicated and for some reason it didn't work this year. I have not attempted to troubleshoot this in any great depth for the time being as it turns out we are no longer recording creator information in our accessions records. Adding a creator record to an accessions entry creates an authority record for the creator that is automatically made public within the AtoM interface and this ends up looking a bit messy (as we rarely have time at this point in the process to work this into a full authority record that is worthy of publication). Thus as we leave this field blank in our accession record there is no benefit in trying to extract this bit of the database.
  • In an earlier version of this query there was something strange going on with the dates that were being pulled out of the event table. This seemed to be a quirk that was specific to Squirrel. A clever colleague solved this by casting the date to char format and including a case statement that will list the year when there's only a year and the full date when fuller information has been entered. This is useful because in our accession records we enter dates to different levels. 
So, once I've exported the results of this query, put them in an Excel spreadsheet and sent them to one of our archivists, all that remains for her to do is to check through the data, do a bit of tidying up, ensure the column headings match what is required by The National Archives and the spreadsheet is ready to go!
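
If you want to skip the manual export step altogether, the query results can be written straight to a CSV file that opens in Excel. Here is a minimal Python sketch using the PyMySQL library; the connection details and file names are placeholders, and the query above is assumed to have been saved to a file:

    import csv
    import pymysql

    # Placeholder connection details for the AtoM database.
    conn = pymysql.connect(host="localhost", user="atom", password="secret", database="atom")

    with open("accessions_report.sql") as f:   # the query above, saved to a file
        query = f.read()

    with conn.cursor() as cur:
        cur.execute(query)
        with open("accessions_2016.csv", "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(col[0] for col in cur.description)   # column headings
            writer.writerows(cur.fetchall())

    conn.close()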

Bromptons in Museums and Art Galleries

Published 23 Jan 2017 by Andy Mabbett in Andy Mabbett, aka pigsonthewing.

Every time I visit London, with my Brompton bicycle of course, I try to find time to take in a museum or art gallery. Some are very accommodating and will cheerfully look after a folded Brompton in a cloakroom (e.g. Tate Modern, Science Museum) or, more informally, in an office or behind the security desk (Bank of England Museum, Petrie Museum, Geffrye Museum; thanks folks).


Brompton bicycle folded

When folded, Brompton bikes take up very little space

Others, without a cloakroom, have lockers for bags and coats, but these are too small for a Brompton (e.g. Imperial War Museum, Museum of London) or they simply refuse to accept one (V&A, British Museum).

A Brompton bike is not something you want to chain up in the street, and carrying a hefty bike-lock would defeat the purpose of the bike’s portability.


Jack Wills, New Street (geograph 4944811)

This Brompton bike hire unit, in Birmingham, can store ten folded bikes each side. The design could be repurposed for use at venues like museums or galleries.

I have an idea. Brompton could work with museums — in London, where Brompton bikes are ubiquitous, and elsewhere, though my Brompton and I have never been turned away from a museum outside London — to install lockers which can take a folded Brompton. These could be inside with the bag lockers (preferred) or outside, using the same units as their bike hire scheme (pictured above).

Where has your Brompton had a good, or bad, reception?

Update

Less than two hours after I posted this, Will Butler-Adams, MD of Brompton, replied to me on Twitter:

so now I’m reaching out to museums, in London to start with, to see who’s interested.

The post Bromptons in Museums and Art Galleries appeared first on Andy Mabbett, aka pigsonthewing.


Running with the Masai

Published 23 Jan 2017 by Tom Wilson in thomas m wilson.

What are you going to do if you like tribal living and you’re in the cold winter of the Levant?  Head south to the Southern Hemisphere, and to the wilds of Africa. After leaving Israel and Jordan that is exactly what I did. I arrived in Nairobi and the first thing which struck me was […]


Week #1: Who to root for this weekend

Published 22 Jan 2017 by legoktm in The Lego Mirror.

For the next 10 weeks I'll be posting sports content related to Bay Area teams. I'm currently taking an intro to features writing class, and we're required to keep a blog that focuses on a specific topic. I enjoy sports a lot, so I'll be covering Bay Area sports teams (Sharks, Earthquakes, Raiders, 49ers, Warriors, etc.). I'll also be trialing using Reddit for comments. If it works well, I'll continue using it for the rest of my blog as well. And with that, here goes:

This week the Green Bay Packers will be facing the Atlanta Falcons in the very last NFL game at the Georgia Dome for the NFC Championship. A few hours later, the Pittsburgh Steelers will meet the New England Patriots in Foxboro competing for the AFC Championship - and this will be only the third playoff game in NFL history featuring two quarterbacks with multiple Super Bowl victories.

Neither Bay Area football team has a direct stake in this game, but Raiders and 49ers fans have a lot to root for this weekend.

49ers: If you're a 49ers fan, you want to root for the Falcons to lose. This might sound a little weird, but currently the 49ers are looking to hire Falcons offensive coordinator Kyle Shanahan as their new head coach. However, until the Falcons' season ends, they cannot officially hire him. And since the 49ers' general manager search depends upon having a head coach in place, they can get a two-week head start if the Falcons lose this weekend.

Raiders: Do you remember the Tuck Rule Game? If so, you'll still probably be rooting for anyone but Tom Brady, quarterback for the Patriots. If not, well, you'll probably want to root for the Steelers, who eliminated the Raiders' division rival Kansas City Chiefs last weekend in one of the most bizarre playoff games. Even though the Steelers could not score a single touchdown, they topped the Chiefs' two touchdowns with a record six field goals. Raiders fans who had to endure two losses to the Chiefs this season surely appreciated how the Steelers embarrassed the Chiefs on prime-time television.

Discuss this post on Reddit.


Four Stars of Open Standards

Published 21 Jan 2017 by Andy Mabbett in Andy Mabbett, aka pigsonthewing.

I’m writing this at UKGovCamp, a wonderful unconference. This post constitutes notes, which I will flesh out and polish later.

I’m in a session on open standards in government, convened by my good friend Terence Eden, who is the Open Standards Lead at Government Digital Service, part of the United Kingdom government’s Cabinet Office.

Inspired by Tim Berners-Lee’s “Five Stars of Open Data“, I’ve drafted “Four Stars of Open Standards”.

These are:

  1. Publish your content consistently
  2. Publish your content using a shared standard
  3. Publish your content using an open standard
  4. Publish your content using the best open standard

Bonus points for:

Point one, if you like, is about having your own local standard — if you publish three related data sets, for instance, be consistent between them.

Point two could simply mean agreeing a common standard with others in your organisation, neighbouring local authorities, or suchlike.

In points three and four, I’ve taken “open” to be the term used in the “Open Definition“:

Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness).

Further reading:

The post Four Stars of Open Standards appeared first on Andy Mabbett, aka pigsonthewing.


Community collections and digital access

Published 17 Jan 2017 by inthemailbox in In the mailbox.

Those interested in the Community collections discussion paper for WA groups may also find the Federation of Australian Historical Societies’ annual survey of interest.

There’s also a teaser page for keeping up to date with the GLAM Peak Bodies digital access project.

Both these links are from the Federation of Australian Historical Societies’ newsletter.



Supporting Software Freedom Conservancy

Published 17 Jan 2017 by legoktm in The Lego Mirror.

Software Freedom Conservancy is a pretty awesome non-profit that does some great stuff. They currently have a fundraising match going on, which was recently extended for another week. If you're able to, I think it's worthwhile to support their organization and mission. I just renewed my membership.

Become a Conservancy Supporter!


A Doodle in the Park

Published 16 Jan 2017 by Dave Robertson in Dave Robertson.

The awesome Carolyn