LIST Platform Tools and Techniques

Introduction and Motivation

The platform tools and techniques developed by LIST were designed to be fully usable for both the digital humanities and the biological data use cases. They provide a reusable web-based architecture, available to collaborators as described in the project architecture documentation, a homogeneous interface to the data sets, and access to server-based processing of the data. The system also provides user access control for data sets, use-case-specific views and functionality, and an infrastructure for storing in-progress analyses.

The BLIZAAR tool, also developed by LIST as part of this work, provides novel and, above all, useful visualizations and analytic techniques. It offers multiple visualizations of the data (customized to specific use cases) on top of an underlying data structure developed specifically for the multilayer network use case. It embraces the concept of multilayer networks, allowing users to explore layers categorized by different aspects of the data. The top-level tool bar allows users to specify which aspect of the data is active (primarily for the digital humanities data) and which layer is currently selected (for views that focus on a single layer).

Users and Data

Both data sets are stored in the Neo4j back end. To prevent users of one use case from seeing the other's data, each data set is defined as viewable only by a specific user data group. Views for visualizing the data are also restricted by group, ensuring that users get the variant of a view that is most applicable to their data set. Data sets and views can be assigned to multiple user data groups.

Figure: Administration view.

Each user of the application requires a username and password to log in. The LIST BLIZAAR application is served over HTTPS, which protects data in transit. User-provided passwords are salted and hashed, so no plain-text passwords are kept in the system. The application administrator can create a user and assign that user to a specific data group through one of the admin views.
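The following is a minimal sketch of the two mechanisms described above: salted password hashing and group-restricted visibility of data sets and views. The function names are hypothetical, and the choice of PBKDF2-HMAC-SHA256 with 200,000 iterations is an assumption made for illustration; this document does not state which hashing scheme the application actually uses.

```python
import hashlib
import hmac
import os
from typing import Optional


def hash_password(password: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    """Return (salt, digest). Only these two values are stored, never the password."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for every new password
    # Assumption: PBKDF2-HMAC-SHA256; the real system may use another scheme.
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


def visible_items(items: dict[str, set[str]], user_groups: set[str]) -> list[str]:
    """Names of data sets (or views) that share at least one group with the user."""
    return sorted(name for name, groups in items.items() if groups & user_groups)
```

For example, with data sets `{"histograph": {"humanities"}, "proteins": {"biology"}}`, a user in the `biology` group would see only `proteins`. The same filter applies to views, since both can be assigned to several groups.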

Views and Visualizations

In this section we describe the views and visualizations offered by the LIST implementation for BLIZAAR. As we provide functionality for both use cases, some views are tailored to a specific use case, and there may be one variant of a view per use case. When users log in, they are presented with the set of views that applies to their data set.

List View Approach

Figure: Click sort example. Figure: Global sort example.

When users log in, they need to query the back-end system for their data. A list view has been created for each use case; it allows the user to query the back-end system and then quickly inspect the results, which are automatically divided into a set of layers depending on the use case. The differences between the two variants mainly concern the loading and layering of data.

However, the variants share much common functionality. When a user hovers over a node, all of its neighbors in the current layer and in the other layers are highlighted, as are any other copies of the node itself if it exists in more than one layer. Due to the large scale of our data sets, it is not possible to show all neighbor nodes on screen at once. Therefore, when a node is clicked, the nodes in each layer are reordered so that its neighbors appear at the top of the list.

Using the side menus, each layer in the list view can be re-sorted by any attribute or type. The sort can be local, applied within the layer itself, or global, where nodes common to several layers are placed at the same position in each layer (see image above).
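The click reordering can be sketched as follows. This is an illustration only, not the BLIZAAR front-end code: the function name and the data shapes (a dictionary of layer lists and a list of undirected edges) are assumptions.

```python
def reorder_for_focus(layers, edges, focus):
    """Move the clicked node and its neighbors to the top of every layer.

    layers: dict mapping layer name -> list of node ids in display order
    edges:  iterable of (a, b) pairs, treated as undirected
    focus:  id of the clicked node
    """
    neighbors = set()
    for a, b in edges:
        if a == focus:
            neighbors.add(b)
        elif b == focus:
            neighbors.add(a)

    def rank(node):
        if node == focus:      # copies of the clicked node come first
            return 0
        if node in neighbors:  # then its neighbors
            return 1
        return 2               # then everything else

    # sorted() is stable, so the previous order is kept within each rank.
    return {name: sorted(nodes, key=rank) for name, nodes in layers.items()}
```

Because the sort is stable, any ordering the user had already applied to a layer is preserved among the neighbors and among the remaining nodes.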
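The difference between the two sort modes can be sketched as below. The function names and the use of `None` to mark a gap are assumptions made for this example; node ids are assumed to be strings.

```python
def local_sort(layer, key):
    """Sort one layer on its own; other layers are unaffected."""
    return sorted(layer, key=key)


def global_sort(layers, key):
    """Align all layers on one shared ordering.

    Row i holds the same node in every layer that contains it,
    and None (a gap) in every layer that does not.
    """
    all_nodes = {node for nodes in layers.values() for node in nodes}
    # Ties on the sort key are broken by node id to keep the order deterministic.
    ordering = sorted(all_nodes, key=lambda node: (key(node), node))
    aligned = {}
    for name, nodes in layers.items():
        members = set(nodes)
        aligned[name] = [node if node in members else None for node in ordering]
    return aligned
```

With layers `{"1914-1918": ["berlin", "paris"], "1919-1939": ["paris", "rome"]}` and an alphabetical key, `paris` ends up in the second row of both layers, with a gap above it in the second layer and below it in the first.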

Digital Humanities Variant

Figure: Digital humanities variant of the list view. Figure: Defining layers for the temporal aspect.

For the digital humanities variant, users can select a specific entity type (person, organization, location, resource, etc.) and query the back end for that entity and all related items. As shown in the figure, drop-down menus guide users through the different entities in the data set. When an entity is selected, such as a person, the system retrieves all people, organizations, documents, and locations related to the selected entity and divides them into layers based on those four categories. Right-clicking on an entity brings up a data panel for that entity, containing information such as links to the original text or images, and reference links (if available).

Once the data has been retrieved, the entities are displayed as lists of items color-coded by layer, with a single-column list for each layer (hence the name list view). The layering on display is currently by type, but for the digital humanities data it is also possible to layer by another aspect. Every document is related to a specific date of an event or publication, allowing the data to be layered temporally. We therefore provide temporal layering functionality, allowing the user to inspect the dates associated with the data and then define the date ranges for each of the temporal layers.

Biological Data Variant

Figure: Biological variant of the list view.

For this variant, users need to be able to specify constraints on the data. As can be seen in the image, two relationship constraints are specified. The layering requirements also differ for the biological case: no temporal layering is explicitly required, and in some cases the biologists want to focus explicitly on the edge relations between proteins and metabolites (e.g. “the catalysis layer”). The entity information displayed on a right-click also differs significantly. Some biological data sets contain sets of attribute data, which appear as a bar chart in the data panel for the entity (as can be seen in the above image).

Data Overview

Multiple layers and Aspects

This section covers the definition of layers and the aspects of the data used to characterize them. As part of this project, the LIST team explored possibilities for a generic solution for layer definition, either as part of the data query at the front end or as part of the back-end data querying. Ultimately, however, we decided to define a set of layers for each use case, based on the explicit requirements of each.

Digital Humanities Use Case Layer definition

Over the course of the project, the digital humanities use case evolved to contain multiple aspects. The primary aspect is a division by node type: nodes are divided by type and each appears in only one layer. The types are people, organisations, locations and documents, where documents groups several types of resource together. Because several different types of resource were merged, some resource nodes are duplicated, with two nodes of different underlying types representing the same resource. Cleaning the data was beyond the scope of the BLIZAAR project, due to the considerable effort involved and the fact that it would raise research questions of its own, beyond the domain of multilayer networks.
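A minimal sketch of the type aspect is given below. The resource type names (`letter`, `picture`, `treaty`) and the function name are hypothetical; only the four layer categories come from the description above.

```python
from collections import defaultdict

# Hypothetical mapping: several underlying resource types share the documents layer.
TYPE_TO_LAYER = {
    "person": "people",
    "organisation": "organisations",
    "location": "locations",
    "letter": "documents",
    "picture": "documents",
    "treaty": "documents",
}


def layer_by_type(nodes):
    """Place each (node_id, node_type) pair in exactly one layer.

    An unknown node type raises KeyError rather than being silently dropped.
    """
    layers = defaultdict(list)
    for node_id, node_type in nodes:
        layers[TYPE_TO_LAYER[node_type]].append(node_id)
    return dict(layers)
```

Since each node has a single type, every node lands in exactly one layer. This is what distinguishes the type aspect from the temporal aspect, where a node may appear in several layers.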

Time is also of critical importance for the analysis of historical networks, and hence there is an aspect in which layers are characterized by time. For the histograph data set, nodes are added to layers based on the time of their associated document or resource (or their own time, if they are themselves a resource). By default, the time layers are defined by date ranges chosen to produce approximately two to five layers, based on the time range of the data requested from the back end. This only provides an initial layering within the aspect: as described in the section on the digital humanities variant, the user can inspect a bar chart of node dates and manually define time ranges for the layers. The stacked bar chart shows the distribution of entities by year.
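The default temporal layering can be sketched as follows. How the system actually picks its default ranges is not specified here, so the equal-width split and the `target_layers=4` default are assumptions, as are the function names.

```python
import math


def default_year_ranges(years, target_layers=4):
    """Split the span of the data into at most target_layers inclusive year ranges."""
    first, last = min(years), max(years)
    span = last - first + 1
    count = max(1, min(target_layers, span))
    width = math.ceil(span / count)
    ranges = []
    start = first
    while start <= last:
        end = min(start + width - 1, last)
        ranges.append((start, end))
        start = end + 1
    return ranges


def temporal_layers(entity_years, ranges):
    """Assign entities to every range that contains one of their document years.

    entity_years: dict mapping entity id -> set of years of associated documents
    ranges:       list of inclusive (start_year, end_year) tuples
    """
    layers = {r: [] for r in ranges}
    for entity, years in sorted(entity_years.items()):
        for start, end in ranges:
            if any(start <= year <= end for year in years):
                layers[(start, end)].append(entity)
    return layers
```

User-defined layers fit the same shape: the ranges chosen in the interface would be passed to `temporal_layers` in place of the defaults.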

Figure: The temporal aspect of a data set, using a global sort by organisation type.

As each entity can be mentioned in more than one document, each entity can appear in more than one layer within the temporal aspect, as seen in the above image. The lists are sorted in a global order, where a node appears at the same position in each layer that contains it, leaving gaps where a node does not appear in a layer.

The final aspect is based on data source. It applies when the user has made multiple queries to the back end, and allows users to compare the ego networks of entities in the histograph data set.

As with all of the list views, connections between layers are highlighted on mouse-over, and the lists can be re-sorted based on connectivity.

The biological use case currently has only a single aspect defined. The functionality exists to open up the biological data use case to multiple aspects; at the moment, however, the biologists' needs do not require them.


As an alternative to the entity-specific query approach described above, we also developed a visual interface for creating queries that return a working data set for the user to visualize as a multilayer network in the front end.

A list of available entities allows the user to draw a meta-network illustrating the entities to be retrieved. Clicking on an entity in this meta-network displays the list of all entities directly related to the clicked one in the back-end system. The size of each arc is proportional to the relative number of entities of each type (see image on right above).
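As an illustration of the arc sizing, the sketch below assumes that each arc spans a share of the full circle equal to the share of related entities of that type. The function name and the choice of radians starting at angle zero are assumptions, not details of the actual interface.

```python
import math


def arc_angles(counts):
    """Map each entity type to a (start, end) angle in radians.

    counts: dict mapping entity type -> number of related entities of that type
    Each arc's sweep is proportional to its count; together the arcs fill the circle.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    arcs = {}
    start = 0.0
    for entity_type, count in counts.items():
        sweep = 2 * math.pi * count / total
        arcs[entity_type] = (start, start + sweep)
        start += sweep
    return arcs
```

For counts of 2 people, 1 location and 1 document, the people arc covers half of the circle and the other two a quarter each.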

It is also possible to combine aspects to form layers from their intersection, as can be seen in the image below.

In this case the entity type aspect and the temporal aspect have been combined to visualise layers showing edges between people in different time periods. Hovering over an entity highlights its neighbors in the current layer and in all other layers being displayed.
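Combining two aspects can be sketched as building one layer per (type, period) pair. The function name and the input shapes are assumptions made for this example.

```python
def intersect_aspects(type_of, periods_of):
    """Build layers keyed by (type layer, time period).

    type_of:    dict mapping node id -> its single type layer
    periods_of: dict mapping node id -> set of time periods it appears in
    A node appears once for every period in which it occurs.
    """
    layers = {}
    for node, node_type in type_of.items():
        for period in periods_of.get(node, ()):
            layers.setdefault((node_type, period), []).append(node)
    return {key: sorted(nodes) for key, nodes in layers.items()}
```

Selecting only the keys whose type is `people` then gives one layer of people per time period, which matches the combined view described above.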