Category Archives: news

current activities, upcoming changes and events

PIMMS workshop, Bristol: Web engineering for research data management

I’m attending the PIMMS (Portable Infrastructure for Metafor Metadata System) workshop today, in an anonymous office off Bristol’s Berkeley Square where, unusually, the sun is shining and the room is warm. The Metafor project referenced in the PIMMS acronym is all about metadata for climate data, so the weather is on everybody’s minds today. It’s a slightly cosier meeting than expected, because it turns out that there’s a huge conference on climate science simulation currently taking place somewhere in India and relatively few people have been able to resist its siren call.

PIMMS is a software stack designed to support the development and collection of metadata. Initially, it was developed to “document the provenance of computer simulations of real world earth system processes”[*], but as is traditional for almost any infrastructure designed to support different types of experiment, the thought has occurred that the approach may be more broadly generalisable. It’s designed to allow for direct user (=scientist) involvement, and the platform consists, essentially, of a sequence of form-filling and design stages, each of which fulfils a purpose:

  • ‘experiment’-level description documents (at a high level, describing an overall scientific investigation in an area) – these include information such as description, author, rationale and so on and are encoded into CIM documents
  • ensemble-level descriptions (datasets, versus individual data files, although as is so often true with collection-level metadata, opinions may vary on how this works in any given context).
  • run-level descriptions: detailed component-based descriptions of individual experimental runs or even sessions.

Unusually and rather charmingly, PIMMS uses mindmapping as a design tool, viewing it as accessible to users. Whilst PIMMS clearly contains elements of the thinking that underlies UML-based design and uses UML vocabulary and tools in places, UML is ‘useful within a development team’, says Charlotte Pascoe, the PIMMS project manager, but it is not meant for end-users.

PIMMS counts among its potential benefits an increase in the sheer quantity, quality and consistency of the metadata provided. The underlying methods and processes can, in theory at least, also be generalised: a mindmap could be built for any domain, parsed into a formal data structure, automagically compiled into a web form and applied to your metadata. The process for building a PIMMS form goes more or less as follows.

  1. Get someone to install a copy of the PIMMS infrastructure (software stack) for you, or indeed do it yourself.
  2. Work out what you are trying to achieve. Write an introductory explanation, and export it to XML.
  3. Identify and invite along stakeholders.
  4. Invite them to build a visual representation (a mindmap) of a domain with you.
  5. Press ‘Go’ and generate a web form suitable for input of metadata.
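As a minimal sketch of what the last two steps amount to, assume a hypothetical mindmap export in which each node becomes one form field. The element names, attributes and widget mapping below are invented for illustration – they are not PIMMS’s actual schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical mindmap export: each <node> becomes one form field.
MINDMAP_XML = """
<map>
  <node text="Experiment name" type="string"/>
  <node text="Rationale" type="text"/>
  <node text="Start date" type="date"/>
</map>
"""

# A constrained vocabulary of information types mapped onto HTML widgets.
WIDGETS = {
    "string": '<input type="text" name="{name}">',
    "text":   '<textarea name="{name}"></textarea>',
    "date":   '<input type="date" name="{name}">',
}

def build_form(xml_text):
    """Compile a parsed mindmap into a metadata-input web form."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall("node"):
        label = node.get("text")
        name = label.lower().replace(" ", "_")
        widget = WIDGETS[node.get("type")].format(name=name)
        rows.append(f"<label>{label} {widget}</label>")
    return "<form>\n" + "\n".join(rows) + "\n</form>"

print(build_form(MINDMAP_XML))
```

The essential point is that the constrained vocabulary of field types, not the end user, decides which widget appears on the generated form.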

If this sounds somewhat familiar, folks, it is because the concepts underlying PIMMS have a long and honourable background in software engineering. Check out the Web Semantics Design Method (De Troyer et al, 2008), which specifies the following process for engineering a web application – my own comments in parentheses to the right:

  1. Mission statement (what, when you get down to it, are you trying to achieve?)
  2. Audience modeling (for whom?)
  3. Conceptual design (with what? about what? doing what?)
  4. Implementation design (using what?)
  5. Implementation (Well, what are you waiting for? Go on, do it.)

WSDM, as described here, owes much to the waterfall model of software engineering (although, one would assume, there is nothing stopping subsequent iteration through phases) – see for example Ochoa et al (2006). To my eyes, the PIMMS metadata development process would appear to implement about half of WSDM in a less analytical and more user-centric model, encouraging direct input from the scientists likely to use the data.

The distinction, primarily, is in the implementation design and implementation phase; the PIMMS infrastructure compiles your conceptual design/structure, as represented in the mind map you have developed, into an XML structure from which PIMMS can build user-facing forms. After that compilation phase, further implementation work is essentially cosmetic, presentational work such as skinning the form. PIMMS removes the majority of implementation decisions from the user by making them in advance. Much as SurveyMonkey consciously limits the user’s design vocabulary to elements that may be useful for your average survey, PIMMS essentially offers a constrained vocabulary of information types and widgets.

I don’t make the comparison between PIMMS and SurveyMonkey lightly. The PIMMS project itself uses the terminology of ‘questionnaires’. PIMMS-based forms have a lot in common with SurveyMonkey, too; incrementally developing the form whilst still retaining your previously collected data is not a straightforward operation. That may be a good thing – that way, you know which version of the input questionnaire your data came from – but on the other hand, incremental tinkering can sometimes be a useful design approach too…

The day continues. The sun subsides and the room is cooling fast. The geographers in the room, climate modellers of anything from the Jurassic to the Quaternary, go through a worked example of developing a PIMMS questionnaire. They discover a minor problem: the dates in the PIMMS forms don’t reflect the usage of dates in palaeoclimate research, which are measured in ‘ka’ – thousands of years before present. This is a problem inherited from the UM, the Met Office Unified Model [numerical modelling system].
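The mismatch is easy to see: a calendar-date widget simply cannot express an age such as ‘145 ka’. Here is a minimal sketch of the kind of representation palaeoclimate dates need instead – the class and field names are my own invention, not part of PIMMS or the UM:

```python
from dataclasses import dataclass

@dataclass
class GeologicAge:
    """An age in ka (thousands of years before present), as used in
    palaeoclimate research, rather than a calendar date."""
    ka: float

    @property
    def years_bp(self):
        # 1 ka = 1000 years before present
        return self.ka * 1000

# A Jurassic-scale age: 150,000 ka = 150 million years before present.
jurassic = GeologicAge(ka=150_000)
print(jurassic.years_bp)
```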

Faster than you could say ‘Precambrian’, we are out of time. There has not been a chance to view the generated metadata collection form in use, which I regret slightly as it is the most common scenario in which the attendees will work with the system. Still, it was a worthwhile day. Workshop attendees have voiced an interest in working with the software in future. As for me, after this glimpse into the future of palaeoclimate data management, I find myself thinking back to my past life in web engineering. I wonder whether, like palaeoclimatologists, research data managers could develop their expectations of the future by exploring the literature of times past…

De Troyer, O., Casteleyn, S. and Plessers, P. (2008). WSDM: Web Semantics Design Method. In: Rossi et al. (eds.), Web Engineering: Modeling and Implementing Web Applications.

Ochoa, S. F., Pino, J. A., Guerrero, L. A. and Collazos, C. A. (2006). SSP: A Simple Software Process for Small-Size Software Development Projects. IFIP Workshop on Advanced Software Engineering 2006: 94–107.


Further Stanford* classes to be run in January

According to a series of posts on the ml-class forum, a number of apparently free, Stanford-inspired* ten-week distance-learning classes are currently expected to start in January/February 2012. Note that, as before, the courses are not credit-bearing – take them for what they are.

Looks like neither AI-class nor DB-class will be rerun, at least not yet; if you fancied the AI-class, take Machine Learning instead. Ng is a great teacher. Because the course involves extensive amounts of practical coding, it is actually both fun and fulfilling, which I’m afraid I cannot say of the AI-class – which has been informative, and at times challenging, but for me there is less satisfaction in a set of quiz grades than there is in building something and watching it work.

Technical courses

Machine Learning (Andrew Ng) – will be rerun in more or less its current form
Probabilistic Graphical Models (Daphne Koller) – a logical successor for ml-class survivors
Natural Language Processing (Chris Manning and Dan Jurafsky)
Cryptography (Dan Boneh)
Game Theory (Matthew O. Jackson and Yoav Shoham)
Human-Computer Interaction (Scott Klemmer)
Design and Analysis of Algorithms I (Tim Roughgarden)
Computer Science 101 (Nick Parlante) – The beginner’s guide to these strange things they call ‘computers’ and ‘code’
Software Engineering for Software as a Service (Armando Fox and David Patterson)
Computer Security (Dan Boneh, John Mitchell and Dawn Song) – How to ‘design secure systems and write secure code’

Electrical Engineering
Information Theory (Tsachy Weissman) – ‘the science of operations on data such as compression, storage and communication’. Begins in March 2012.

Complex Systems
Model Thinking (Scott E. Page) – building models of complex systems.

Technology Entrepreneurship (Chuck Easley) – ‘understand the formation and growth of high-impact start-ups in areas such as information, green/clean, medical and consumer technologies.’
The Lean Launchpad (Steve Blank) – Business models, customer development, and starting up your startup.

Civil Engineering
Making Green Buildings (Martin Fischer) – how to manage sustainable building projects.

Anatomy (Sakti Srivastava) – knee bone connected to the hip bone, etc.

Caveat emptor
As these courses are free online, I suppose that really ought to read caveat lector or caveat auditor or something, but you know what I mean. Here’s the warning: each of these courses is supposed to take over ten hours a week. Follow the Stanford AI-Class Decision Diagram with care and attention when deciding whether to enrol.

If you’re not a computer science or mathematics graduate, you will probably need to work on your maths for many of these courses. The Khan Academy has very useful course material for areas like basic probability, Bayes’ theorem and linear algebra/matrices.

If anybody wonders what an unspecified number of thousands of dedicated students attempting to finish a midterm exam before the deadline can do to a server, wonder no more:

Having seen it repeatedly whilst trying to fill in the midterm forms, today I see this message every time I close my eyes…

* As it happens, not all these classes are run by Stanford. Software Engineering for Software as a Service is a Berkeley course (although one of the instructors, Armando Fox, was previously employed at Stanford), Computer Security is a joint effort, and Model Thinking is taught by Scott Page of the University of Michigan.

A Brief Introduction to eBooks and eReading

eBooks have been around for almost four decades now. The earliest eBooks were those in Project Gutenberg, the oldest digital library, founded in 1971. eBooks and eReaders have gained a great deal of popularity in the last decade or so, and a large number of people are increasingly moving towards eBooks to satisfy their literary needs.


For those who have heard of eBooks but don’t really know what they are: ‘eBook’ stands for ‘electronic book’, a book that can be read on a computer or other electronic device. For starters, suppose I asked how you would carry your favourite books – each around 1000 pages, and which you may or may not want to read again – on an international flight. Yes, you do have to carry your clothes and other essentials too. In such a case, chances are, you’ll leave the books at home. eBooks were created to solve this exact problem: they are mainly about portability and providing convenient, fast access to books, with newer devices also supporting news, magazines and internet surfing. eBook readers come with various memory options, ranging from limited internal memory allowing you to store around 200 books to expandable memory of up to 64GB allowing you to store as many as 50,000 eBooks. A huge number of eBooks are currently sold by publishers all over the world.
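A quick back-of-the-envelope check of those storage figures (my own arithmetic, taking the marketing numbers above at face value):

```python
# 64 GB of expandable storage holding ~50,000 eBooks implies an
# average file size of a little over 1 MB per book.
capacity_mb = 64 * 1024       # 64 GB expressed in MB
books = 50_000
avg_size_mb = capacity_mb / books
print(round(avg_size_mb, 2))  # ~1.31 MB per book
```

That is entirely plausible for text-only titles, though heavily illustrated books run far larger.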

Some of the oldest written scripts (Cuneiform script) date back to the 30th century B.C. Our ancestors drew paintings and symbols on clay and stone for very many years. We then moved from writing on clay to writing on papyrus (made from the pith of the papyrus plant) and other materials, eventually arriving at the invention of paper, which in turn has been superseded by digital information. We have come a long way: from reading pictures and writings off stone and clay that can last for ages, to reading books, magazines and novels on small hand-held devices whose contents can be erased at the touch of a button – but which, if not erased, can last just as long. These handheld devices are capable of automatically updating news and other information within a few seconds of it becoming available.

eBook Readers

There are a large number of eBook readers currently available on the market. These readers differ from each other in a number of ways, such as the underlying operating system, hardware capabilities, available screen real estate and display technology. They are capable of rendering different eBook formats, with ePub, PDF, mobi, txt and azw being some of the most widely used. Given the number of devices available – some of them costing around £500 – deciding which one to go for is not an easy task. Reviewing each and every device can take hours on end and seldom gives a clear answer, making the choice largely a matter of personal preference. All the devices have their pros and cons: winning features and, well, some not-so-good features.

ePub Format

The ePub format has gained popularity among eBook makers as it is designed for re-flowable (content presentation adapts to the output device) and re-sizable content. A large number of readers are capable of rendering the ePub format in a variety of form factors. These include traditional PCs and laptops, tablet PCs, Android and iPad devices, eInk devices in several configurations ranging from ‘paperback-size’ to A4-equivalent, mobile telephones and MP3 players, and so forth.
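Under the hood, an ePub file is essentially a ZIP archive: a ‘mimetype’ entry, a ‘META-INF/container.xml’ that points to the package (OPF) file, and the book content itself as re-flowable XHTML. A minimal sketch of that structure using only the Python standard library – the container.xml here is simplified and omits the real namespaces:

```python
import io
import zipfile

# Build a toy ePub-like archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    # The mimetype entry identifies the archive as an ePub.
    z.writestr("mimetype", "application/epub+zip")
    # container.xml tells the reader where the package file lives.
    z.writestr("META-INF/container.xml",
               '<container><rootfiles>'
               '<rootfile full-path="OEBPS/content.opf"/>'
               '</rootfiles></container>')
    # The actual (re-flowable) book content is plain XHTML.
    z.writestr("OEBPS/chapter1.xhtml",
               "<html><body><p>Reflowable text.</p></body></html>")

# Reading it back: any ZIP tool can inspect an .epub file this way.
with zipfile.ZipFile(buf) as z:
    print(z.read("mimetype").decode())
    print(z.namelist())
```

This re-flowable XHTML core is precisely why the same ePub renders comfortably on everything from a phone screen to an A4-sized eInk panel.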

CC Image by Rodrigo Galindez

Metadata creation tool (IEMSR) features demo published on YouTube

A demonstration of a new version of the feature-rich metadata creation tool has now been published.
The video has a narrator explaining major features: creating a Dublin Core application profile, saving it to an RDF file, and publishing it to the repository. Also described are features such as defining metadata vocabularies and user interface internationalisation (French, German, Dutch and other languages are supported).
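For readers unfamiliar with what such a tool produces, here is a minimal sketch of a simple Dublin Core description serialised as RDF/XML, using only the Python standard library. The resource URL and field values are placeholders of my own, not output from IEMSR itself:

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and the Dublin Core element set.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)

# One rdf:Description carrying two Dublin Core properties.
root = ET.Element(f"{{{RDF}}}RDF")
desc = ET.SubElement(root, f"{{{RDF}}}Description",
                     {f"{{{RDF}}}about": "http://example.org/item/1"})
ET.SubElement(desc, f"{{{DC}}}title").text = "Example record"
ET.SubElement(desc, f"{{{DC}}}creator").text = "A. N. Example"

print(ET.tostring(root, encoding="unicode"))
```

An application profile, roughly speaking, constrains which such properties a record may use and how – which is what the IEMSR tool helps you define.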

IEMSR will be a truly international DC tool

The client tool for the Metadata Schema Registry (IEMSR) is “learning” new languages. Apart from English, it is going to have interfaces in German, French, Portuguese and other languages – one step closer to enabling DCAPs to be built in an internationally accessible environment…