You may or may not know this, but there is a small island nation near the rim of the Arctic Circle called Iceland. The island was settled by Scandinavian Vikings in the 9th century CE (though there may have been Irish monks there before), and through the centuries the nation was pretty much isolated from the rest of the world while being subjugated by Norway and Denmark at different points in time.
The climate, being subarctic, is characterized by long and usually very cold winters followed by short, mild summers. During the winter months sunlight is at a premium; at the winter solstice the country receives only around four hours of it. At that time of year it is not uncommon for the country to be completely wrapped up in storms and clouds, with little direct sunlight for days or weeks on end.
It is thus not difficult to imagine that throughout the ages, most Icelanders spent the harsh, cold and deadly winters indoors, doing what few things they could with what little they had. Storytelling and poetry, passed on in writing and by word of mouth, became a big thing in the nation. Once Christianity arrived on the nation's shores, it brought with it monasteries that boosted the production of written documents, and in the 16th century the nation got its first printing press.
So to keep a long story short: a combination of geographical and geopolitical isolation, long harsh winters, a fairly good supply of written documents (for such a small nation) and a higher than average standard of literacy among the populace throughout the centuries has meant that modern-day Icelandic has not deviated far from its roots. Most native speakers are able to read the old sagas from the 11th-12th centuries without major problems (though obviously the language has evolved over the last millennium).
It then goes without saying that a nation with its national identity tied this closely to its language and written heritage will mirror that in its behavior, both inwards and outwards. Iceland frequently shows up on lists regarding literacy and the quantity of domestic book publication, where it usually sits somewhere in the top 3 places. At international conferences or exhibitions it is not uncommon to see the Icelandic state have booths where the language and history of the country are proudly on display. And even online we can't escape the insidious active and passive marketing of the language.
Now at this point you may be asking yourself: why the hell are you writing about this?
And it is simple.
The Icelandic language is dying, and there is fuck all being done to save it.
Some may say that this is hyperbole. That the death of the language has been prophesied ever since strong Danish cultural effects were first felt in the 18th and 19th centuries, but that today the Icelandic nation still speaks Icelandic, not Danish. To that I'll just ask you: what is the word commonly used for car?
The death of a language is not the result of a war with decisive battles and victories, but one of a war of attrition. There isn't a time constraint on this process. The Icelandic language as we know it today is going to die, and it may take a couple of hundred years for that to happen. Or it may take a few decades. Possibly less. It all depends on what action is taken to push back.
And frankly, the means by which that battle is being fought today bring up images of Polish cavalry charging Nazi German tanks in World War II.
Fighting on digital beaches
By the start of the 90's it was obvious that the new frontier would be digital, and midway into the decade it had become painfully clear that not enough effort had been put into creating digital resources and support for the implementation of Icelandic in software projects, both domestic and foreign. So by 1998 the government of Iceland had made a deal with Microsoft to start a translation project for some of its product offerings, culminating in the translation of its operating systems as well. It should be mentioned that both Apple's and IBM's software had been translated in the late 80's, so Microsoft was far from the first to include the Icelandic language. But its operating system was the most used at that time, so this project definitely had more of an impact than previous efforts.
In 1999 a document (whose creation was sponsored by the Icelandic government) was published in which the current and future state of the Icelandic language in this new domain was analyzed, and suggestions were laid out for how to get things on track again. One of the pain points of digitizing a whole language is obviously the cost involved, and how it stays the same no matter how many native speakers there are. Digitizing Icelandic would cost roughly as much as doing so for any other language. The cost would simply be much higher per capita than in other cases.
And so the amount said to be needed to shore up the digital defenses was 250 million ISK yearly over a four year period, or 1,000 million ISK in total. The author of that document later wrote in an updated one that in the four years after the first document was published, only 133 million ISK were put toward this task in total. That is only about 1/8th of the funds initially recommended.
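For anyone who wants to check that fraction, here is a minimal sketch in Python. It uses only the figures quoted above (250 million ISK a year, four years, 133 million ISK actually spent); the variable names are my own and purely illustrative.

```python
from fractions import Fraction

# Figures quoted in the text, in millions of ISK.
RECOMMENDED_PER_YEAR_MISK = 250
YEARS = 4
ACTUALLY_SPENT_MISK = 133

# Total recommended over the whole four year period.
recommended_total = RECOMMENDED_PER_YEAR_MISK * YEARS

# Share of the recommended total that was actually funded.
share = Fraction(ACTUALLY_SPENT_MISK, recommended_total)

# How much of the recommendation never materialized.
shortfall = recommended_total - ACTUALLY_SPENT_MISK

print(f"Recommended total: {recommended_total} million ISK")
print(f"Share actually funded: {float(share):.1%}")
print(f"Shortfall: {shortfall} million ISK")
```

The funded share comes out a little above one eighth (12.5%), so "1/8th" holds as a rounded figure.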
Yet, according to the author, surprising strides were made. Various specialized software programs came out of research projects that got part of that funding, some of them web based and others desktop programs. These all greatly improved access and functionality for normal users, in ways that hadn't been available before.
So it all worked out then? The Icelandic language was finally fully digitized, and these massive, expensive efforts paid for by the Icelandic taxpayer guaranteed access to these valuable data sets forevermore. Right?
Not quite. Today most of these projects are outdated when it comes to their underlying installation requirements, and the hosted applications are either no longer around or have the accessibility features of a student's terminal software project from the 80's. Furthermore, as much as it angers my inner software engineer to say, the data sets that these systems rely on are not being freely distributed online for any other free or commercial project to use as it sees fit (or at least the collective community of developers and language enthusiasts in Iceland has yet to discover where and how to gain access).
Here be dragons
As Danish rule over Iceland started waning in the 20th century, the nation started petitioning Denmark to return the cultural heritage that had been transported out of the country over the preceding centuries. Mainly, the written documents contained within the Arnamagnæan manuscript collection. Over a few decades, more and more documents were returned and an institution was created for their safekeeping and research. This institution was given the name Árnastofnun in honor of the academic Árni Magnússon, who created and curated the manuscript collection in Denmark in the 17th and early 18th centuries.
By the year 2006 this institute had merged with various others to form one of the largest independent research institutes for Icelandic studies in Iceland (and probably the world). It is still governed by the Icelandic state, with its CEO appointed by the Minister of Education, and receives its funding from Icelandic taxpayers.
On their recently updated webpage are numerous links to various hosted applications related to the Icelandic language. But nowhere is direct access to be found to the underlying data sets that power these applications.
Now if the idea of simply scraping this data has popped into your mind as a way of solving this little problem, then you are not alone, my friend. Various people have attempted to scrape the data or to query Árnastofnun's APIs directly, either for re-publication or for use within their own projects. This has always ended badly for the individuals, and (if memory serves) in some cases apparent threats of legal action have followed.
And here is the crux of the issue for me (and I believe I am not alone): Árnastofnun (and by extension the Icelandic government) has effectively become Fáfnir. The prized possession they now have, they guard with such illogical ferocity that its value is diminished by their very own actions.
Árnastofnun has a mandate to keep, preserve and research the Icelandic language and to facilitate distribution of the accumulated knowledge it has. As with most laws, there is enough ambiguity about what form that distribution should take that it has led to an ever-growing pile of hosted applications that range from badly designed (in the best cases) to seemingly abandoned (in the worst cases) and that can be infuriating to use.
The need for the data sets spurred the injection of funds to digitize the language in the late 90's and early 00's. And now that the data partially exists, it can only be accessed through hosted applications that will never receive enough time and attention, which in the end dooms them to become obsolete as soon as they are deployed. These then act as the fulfillment of the institute's mandate, but also as a limiting factor on the impact that the underlying data could have if only it were freely available without any restrictions.
Árnastofnun is filled to the brim with experts in the Icelandic language. They should focus on what they do best: curating, cataloging and researching the data that they have, while opening up access to it for the rest of us to do what we do best: creating beautiful, scalable, user-centered software that consumes that data.
Old man yells at cloud
It may seem like I am just picking on Árnastofnun, and it's true that they aren't the only player in town. But they are the biggest one and do have a government mandate to fulfill. I personally also just get spurred on when presented with the apparently duplicitous nature of our government, where the commitment to preservation of the language is often touted as being of the utmost importance while the actions taken usually spell out the opposite. And the blame for deteriorating Icelandic language proficiency is often placed squarely on the shoulders of software and its lack of Icelandic options. That, given my profession, is something I feel I need to stand up to.
That, and I believe all digital goods created by government agencies or paid for with taxpayers' money should be freely available online in their source format, without restrictions.
You may finally be asking yourself why I wrote this in English given the subject matter, and it's simply that this point has been conveyed in Icelandic time and time again. And it has always failed to make an impression. So I felt it was time for a new strategy.