Principles

This project was established to advance methods and technologies that give government new capacities for using data about government structure and functions. The core idea is that descriptions (datasets) of government structure and functions (also a type of structure) can be presented in ways that allow for interoperability (associations, cross-walking), and that the set of datasets established here can act as a spine: a set of government structure/function reference datasets for other data to associate with, see below.

§ Spine

There is a series of datasets in use by the Federal government to describe current and past government structure/function. These datasets are related - they are all about related real-world concepts - and share many similar structures and specific objects, but they are not identical, not deliberately mapped to one another, and not managed by a single authority. For example, the Australian Government Organisations Register (AGOR) database, managed by the Department of Finance, and the Commonwealth Record Series (CRS) database, managed by the National Archives of Australia (NAA), both describe government structure but are not made to work together. This is mostly due to different foci - AGOR is about government now, CRS about government in the past - and partly due to their having different host agencies. Nevertheless, they can, and optimally should, be linked up for cross-querying to support government research into structural change over time, among other things.

A data spine, as it is conceived of in this project and the Location Index, is a collection of datasets and methods for data presentation that act as an anchor point for other data. Spines are created around a theme, and LongSpine's spine centres on government structure and functions.

LongSpine then (re)presents existing datasets about government structure/function in interoperable, machine-readable formats which allow for the best technical access to them. It also presents, for the first time, mapping datasets that allow for cross-walking of these structure/function datasets. These Linksets (specialised types of datasets that link other datasets together, see below) are managed independently from the original datasets allowing for independent governance of mappings.


Figure P1: The core spine created by LongSpine (in grey). Items within the spine are linked to one another and further links between spine items may be assumed by joining multiple links together. Here, Govt Structure Database X may be crosswalked with Govt Functions Database Z by following links made via Govt Structure Database Y. Things outside the spine, here Other Data, may be crosswalked to Yet Other Data, also outside the spine, if both external datasets are associated with parts of the spine.
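
The crosswalking by joined links described for Figure P1 can be illustrated with a minimal Python sketch. It assumes each set of links is simply a set of (source, target) pairs; the function name compose_links and all identifiers are hypothetical and are not taken from LongSpine.

```python
from collections import defaultdict


def compose_links(first, second):
    """Join two sets of (source, target) links on their shared middle items."""
    targets_by_middle = defaultdict(set)
    for middle, target in second:
        targets_by_middle[middle].add(target)
    # Follow each first-hop link through every matching second-hop link.
    return {
        (source, target)
        for source, middle in first
        for target in targets_by_middle.get(middle, ())
    }


# Hypothetical identifiers, for illustration only.
x_to_y = {("X:org-1", "Y:org-a"), ("X:org-2", "Y:org-b")}
y_to_z = {("Y:org-a", "Z:function-7"), ("Y:org-b", "Z:function-9")}

# Structure Database X crosswalked to Functions Database Z via Database Y.
x_to_z = compose_links(x_to_y, y_to_z)
```

Links derived this way are only as reliable as each hop they pass through, which is one reason to keep the original links and their metadata available alongside any derived ones.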

§ Datasets and Linksets

LongSpine contains a series of Datasets - databases and collections of information - about government structure, such as the Commonwealth Record Series (CRS) database. It also contains a series of vocabularies of government functions, such as the Administrative Arrangement Orders. While these vocabularies differ in size and structure from the government structural datasets, we will refer to them as Datasets too, because Datasets and vocabularies are all managed in a similar way: as a single, homogeneous dataset with few connections to other Datasets. For example, the National Archives of Australia manages the CRS database as a single, stand-alone Dataset that does not depend on anything that NAA doesn't also manage.

LongSpine also contains a series of datasets that join other datasets and we call these Linksets. Linksets are an interesting sort of dataset since they potentially cross jurisdictions - from one agency's Dataset to another. It would be possible to create multiple, different, joins between Datasets with different methods or with different levels of authority. For this reason, Linksets are published independently from individual Datasets. An example of a cross-jurisdictional Linkset is the AGIFT/COFOG-A Linkset which joins the AGIFT and COFOG-A vocabularies (remember, these are Datasets!) published by the NAA and the Australian Bureau of Statistics respectively. Figure P2 below shows a conceptual view of a Linkset.

The formal definitions of Dataset and Linkset, as used by LongSpine, are taken directly from the Location Index (LocI) project's definitions given in its top-level ontology:


Figure P2: An example Linkset ("A/B") between Datasets A & B. The Linkset contains both dataset-level metadata (who published it, when, what methods were used to generate it) and a series of links that actually join the target Datasets. Here Dataset A element A2 links to Dataset B element B1, and A4 to B3.

Since Linksets are presented separately from the Datasets they join, multiple Linksets can be published that join the same Datasets. If two methods, X & Y, were used to join Datasets A & B in different ways, you could publish two Linksets, as per Figure P3.


Figure P3: Two Linksets, A/B 1 & A/B 2 joining Datasets A & B created using different methods.

Due to the way graph database systems such as the LongSpine cache work, users can choose which elements are used to answer the queries they pose. This means that a user could query a system containing the elements in Figure P3 and decide which of the two Linksets is to be used to make joins. This allows any or all joining methods to be implemented, but each to be used only where appropriate.
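
As a minimal sketch of this selection, the Python below models two Linksets over the same Datasets and restricts a join to the Linksets a user names. It is an illustration under assumptions, not the LongSpine cache's query mechanism: the Linkset class, the join function and the links in "A/B 2" are hypothetical, while the links in "A/B 1" follow Figure P2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Linkset:
    name: str          # e.g. "A/B 1"
    method: str        # dataset-level metadata: how the links were made
    links: frozenset   # (element in Dataset A, element in Dataset B) pairs


def join(element, linksets, use):
    """Return the Dataset B elements linked to `element`,
    consulting only the Linksets whose names are in `use`."""
    return {
        b
        for linkset in linksets
        if linkset.name in use
        for a, b in linkset.links
        if a == element
    }


ab1 = Linkset("A/B 1", "method X", frozenset({("A2", "B1"), ("A4", "B3")}))
ab2 = Linkset("A/B 2", "method Y", frozenset({("A2", "B2")}))  # hypothetical links
cache = [ab1, ab2]

only_first = join("A2", cache, use={"A/B 1"})
both = join("A2", cache, use={"A/B 1", "A/B 2"})
```

Because each Linkset carries its own name and method, the choice of which joins to trust is made at query time rather than being fixed when the data is loaded.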

§ Identifiers

With multiple Datasets and Linksets constituting LongSpine's spine, it's important to be able to unambiguously identify them and the smaller elements within them, such as the Australian Government Organisations Register (AGOR)'s representation of the Department of Prime Minister & Cabinet. It's also a very useful thing to be able to discover more information about items identified. We are using Uniform Resource Identifiers (URIs) - essentially web addresses - as universally unique and resolvable (clickable) identifiers for all items in LongSpine, large (whole dataset) and small (individual data items). This is in contrast to, say, just using local database Primary Keys or data codes or even UUIDs for item identifiers. Use of URIs for identifiers is part of established Linked Data practice, see below.

Some examples of LongSpine identifiers are:

Most of the URI identifiers used in LongSpine are based on the linked.data.gov.au web domain that was established to provide long-term stable web identifiers for Australian Government data. See the Australian Government Linked Data Working Group's governance web page for more information. Some other domains in use are agency-specific, such as data.naa.gov.au - the National Archives of Australia's data subdomain.

§ Semantic Relations

Semantic Web data models use typed relationships between objects. So, while we can say that a Thing may be partOf another (larger) Thing, we can specialise this relationship and perhaps say that a particular CRS CommonwealthOrganisation is a subOrganizationOf a CommonwealthAgency. Not only do the Semantic Web's methods offer a very rich set of mechanics for making specialised relationships, but this project also utilises many specialised relations already defined within the area of organisations and functions, such as the Organization Ontology's subOrganizationOf.


Figure P4: Specialisation of part/whole relations for firstly an Organisations context and secondly the more specialised CRS context

This sort of specialisation of relationships, as well as the Semantic Web's ability to specialise objects (an Organisation is a specialised type of Thing), allows projects like LongSpine to implement specialised models of things and yet also generalised models that allow for interoperability across specialised elements. For example, where we have Government Entity in AGOR and Commonwealth Agency in the CRS, we can deal with each as is - with all the properties expected of them in their original datasets - but also deal with either as a specialisation of an Organisation and thus, at a certain level of abstraction, interoperate across the datasets on that basis.
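
The interoperation through a shared, more general class can be sketched in Python as follows. This is a simplified illustration, assuming a single-parent class hierarchy held in a dictionary; the agor: and crs: names and the item identifiers are hypothetical placeholders, not real LongSpine identifiers.

```python
# Hypothetical class hierarchy: each class maps to its more general class.
SUBCLASS_OF = {
    "agor:GovernmentEntity": "org:Organization",
    "crs:CommonwealthAgency": "org:Organization",
    "org:Organization": "Thing",
}


def is_a(cls, ancestor):
    """True if `cls` is `ancestor` or a (transitive) specialisation of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)  # climb one level; None at the top
    return False


# Hypothetical items, each typed with its dataset-specific class.
items = {
    "agor:item-1": "agor:GovernmentEntity",
    "crs:item-2": "crs:CommonwealthAgency",
    "other:item-3": "Thing",
}

# Treat items from both datasets uniformly as Organisations.
organisations = sorted(
    item for item, cls in items.items() if is_a(cls, "org:Organization")
)
```

Each item keeps its dataset-specific type, so nothing is lost by also querying it at the more general level.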

§ Linked Data

Where the Semantic Web is a conceptual way of relating data, Linked Data is a set of mechanics for actually implementing Semantic Web data over the Internet. Linked Data relies on data being modelled according to models such as the Semantic Web models LongSpine uses and for elements to be identified with URIs, again as LongSpine does.

LongSpine uses Linked Data mechanics to allow the data and models that make it up to be presented as distributed elements, accessed over the Internet. This allows different agencies to publish point-of-truth data and models and have them work together. For example, the Portfolio Budget Statement (PBS) data can be delivered by the Department of Finance using the system of their choice and yet still be technically, and instantly, interoperable with the CRS data from the National Archives of Australia delivered through a different channel.

Systems such as this LongSpine DB can cache information from disparate sources for ease of use, but caches such as this are not to be regarded as points of truth for the data - they are not the spine, only a temporary utilisation of it. The spine exists as the collective whole of the distributed datasets and models.

§ Time

Time (temporality) is one of the conceptual pillars of the LongSpine project. The organisations and functions that make up the core data of the spine are related structurally - Function X is performed by Organisation Y - and also temporally - Function X was performed by Organisation Y at time z or Organisation A was the precursor of Organisation B. A sophisticated handling of time by the spine will allow users of it to integrate datasets with time-based queries.

To facilitate a powerful handling of time, elements within Datasets and Linksets within LongSpine are associated with temporal objects (instants and ranges) to indicate their real-world temporal natures. This is in accordance with the way Time Ontology in OWL models time. Some examples:

The first point above just indicates how an information object - the CRS record about Paul Keating - represents a relationship that was in effect for a certain time. The second point indicates how particular time intervals can be named - here the time period in which Administrative Arrangement Order #80 was in effect - and other objects associated with them.

By naming important time intervals and instants, we can calculate things using them without having to actually check specific days. We can, for instance, ask "Which were all of the government agencies responsible for agriculture during the Hawke Prime Ministership?". Figure P5 below shows some of the relations between objects we can have as a result of using the Time Ontology in OWL.


Figure P5: Possible relations using the Time Ontology in OWL. A. An object can be associated with a time interval to show it has real-world temporality. B. A relationship between objects can do likewise. C. Objects can have temporal relations between them such as before and after. D. & E. Further temporal relations are intervalMeets (Y starts where X ends) and intervalDuring.
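
The interval relations in Figure P5 can be sketched with plain dates, as in the Python below. It is a minimal illustration that assumes each interval has a known start and end; the function names mirror the ontology's relation names but are written here for illustration, and all dates are hypothetical placeholders rather than real periods of government.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Interval:
    start: date
    end: date


def interval_before(x, y):
    """X ends strictly before Y starts."""
    return x.end < y.start


def interval_meets(x, y):
    """Y starts exactly where X ends."""
    return x.end == y.start


def interval_during(x, y):
    """X starts after Y starts and ends before Y ends."""
    return y.start < x.start and x.end < y.end


# Hypothetical named interval and two hypothetical periods of responsibility.
named_term = Interval(date(2000, 1, 1), date(2004, 1, 1))
earlier = Interval(date(1998, 1, 1), date(2000, 1, 1))
within = Interval(date(2001, 1, 1), date(2002, 1, 1))

meets = interval_meets(earlier, named_term)
during = interval_during(within, named_term)
before = interval_before(earlier, within)
```

Once an interval such as named_term has an identifier of its own, queries can refer to it by name and leave the date comparison to functions like these.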