[retired] Feedback on demos #1

Demos by LaBRI and EISTI (Pau) February 2017

This document contains feedback on the first two demos by LaBRI (Bruno, Antoine) and EISTI Pau (Sébastien, Joachim) as of early February 2017 together with some additional thoughts. It is great to see our discussions and ideas materialise. For me this makes it much easier to think things through. I have added notes and suggestions for priorities below.

Both demos made it clear to me that we also need to start thinking about a query-building system which bridges the gap between user-defined research interests and the need to get users to set parameters for algorithms (for now DOI). For the query building I wonder how we can get an effective interplay between user input (as in known-item searches) and user selection from graph-based recommendations. More on this in the last segment.

This is particularly relevant for tasks 1 and 2 in the [[project_vision|project vision]].

  1. Pau and Bordeaux(?): Revised layout for nodes which is easier to read and less complex. Could parallel coordinates be part of the solution?
  2. Pau: The relationship between node-link diagrams and matrices could be made clearer. How can we make it easier to understand?
  3. Bordeaux: Allow more complex queries and allow for quick assessment of which nodes and which node types have been selected. Either list view or csv export (list view preferred).
  4. Marten: Are co-occurrences in our case best computed on sentence, paragraph or n-word-window level? These could be transformed into multilayer data as well (increasing likelihood of a meaningful relationship).
  5. Bordeaux and Pau: Create a bridge between data visualization and content: Can we add links to the documents to be displayed e.g. in a browser for further assessment?
  6. Marten: Find out about best practices in evaluation of such applications and algorithms.
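On the co-occurrence question in point 4, a minimal sketch of how the three levels could be turned into layers, assuming plain-text documents with blank lines between paragraphs and entity mentions that can be found by simple string matching. The function name `cooccurrence_layers`, the naive sentence splitter and the example text are hypothetical and only illustrate the idea; they are not taken from either demo.

```python
import re
from itertools import combinations


def cooccurrence_layers(text, entities, window=10):
    """Map each unordered entity pair to the set of levels at which it co-occurs.

    Levels: "paragraph" (blank-line separated), "sentence" (naive split on
    ., ! or ?) and "window" (any run of `window` consecutive tokens).
    """
    tokens = text.split()
    units = {
        "paragraph": [p for p in text.split("\n\n") if p.strip()],
        "sentence": [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()],
        "window": [
            " ".join(tokens[i:i + window])
            for i in range(max(1, len(tokens) - window + 1))
        ],
    }
    layers = {}
    for level, chunks in units.items():
        for chunk in chunks:
            # Assumption: an entity is "present" if its label appears verbatim.
            present = sorted(e for e in entities if e in chunk)
            for pair in combinations(present, 2):
                layers.setdefault(pair, set()).add(level)
    return layers


# Toy text, invented for illustration only.
text = (
    "Pierre Werner met Konrad Adenauer. "
    "The talks, which many observers had expected to be short, were long. "
    "Luxembourg hosted them."
)
entities = ["Pierre Werner", "Konrad Adenauer", "Luxembourg"]
layers = cooccurrence_layers(text, entities, window=5)
```

A pair that shows up on all three layers is more likely to be a meaningful relationship than one that only shares a paragraph, which is the intuition behind treating the levels as multilayer data.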
  1. Time step visualisations: works well and is easy to follow
  2. Can we use viz. of dynamic graph change to compare similar graphs? Just a thought: Right now, dynamic graphs focus on change over time only. Does it make sense to apply the techniques for the display of temporal change also as a means to compare graphs, e.g. the graphs of two persons?
  3. Layout: Currently the demo does not re-run the layout between time steps, which helps the user create a mental map of nodes. But the distribution of nodes is arbitrary and different node types are mingled, which makes the graph hard to read. See the segment below for thoughts on how a stable map of different types of nodes could be combined with dynamically generated graphs/matrices.
  4. Matrices: I found it hard to understand at first which data is represented by the matrices and how they play out their strengths compared to node-link-diagrams.
  5. Afterlife of persons: We discussed a bit the distinction between actions which can be attributed to a person (writing an article) and references to persons after their death. No action points.
  6. Self-loops: are irrelevant for us at this stage.
  7. Parallel coordinates: I suggested parallel coordinates as a means to keep different types of nodes in groups and to allow fast scanning of label names. This is inspired by Jigsaw’s list view; could we build upon it? A bit more on why I like it in the segments below.
  8. Detect change in nodes: We discussed signals of change with regard to Task 1 diversity and continuous coverage: How do we know that we should draw a user’s attention to a node? One suggestion was to look at the centrality scores of a node: if they rise or fall abruptly relative to the node's previous centrality, it might be worth flagging the node. Alternatively, a node may leave a cluster of which it was part for several time periods and move into another.
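The centrality suggestion in point 8 could be sketched as follows. This is only an illustration under assumptions: each time step is given as a plain list of undirected edges, degree centrality stands in for whichever centrality measure we end up using, and the 50% threshold is arbitrary. The function names are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree centrality per node: degree divided by (number of nodes - 1)."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    if n < 2:
        return {}
    return {node: d / (n - 1) for node, d in degree.items()}


def flag_abrupt_changes(snapshots, threshold=0.5):
    """Return (step, node, before, after) for nodes whose centrality changed
    by more than `threshold` relative to the previous time step.

    Nodes that were absent in the previous step are skipped, because a
    relative change is undefined for them.
    """
    flagged = []
    for step in range(1, len(snapshots)):
        prev = degree_centrality(snapshots[step - 1])
        curr = degree_centrality(snapshots[step])
        for node in sorted(prev):
            before, after = prev[node], curr.get(node, 0.0)
            if abs(after - before) / before > threshold:
                flagged.append((step, node, before, after))
    return flagged


# Two toy time steps: "a" is the hub first, then "b" takes over.
snapshots = [
    [("a", "b"), ("a", "c"), ("a", "d")],
    [("a", "b"), ("b", "c"), ("b", "d")],
]
flagged = flag_abrupt_changes(snapshots)
```

In this toy case both "a" (falling) and "b" (rising) are flagged, while "c" and "d" keep the same score and are not.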
  1. On DOI: DOI is best suited for the creation of ego networks from scratch and driven by user interest. For these first demos and with regard to the project vision, Task 1 Diversity and continuous coverage is best suited for the DOI approach: We strive to get a general overview of the presence of a node (person, institution etc.) in the corpus.
  2. DOI and Project vision Task 2: The particular strength of the DOI approach, free-flowing definitions of user interest, may also become relevant for task 2, “Search by tag”, at a later stage. User interest can be user-driven (“I already know which nodes I would like to include”) and/or assisted by recommendations (“I select nodes from a list of recommendations”).
  3. Demo and documentation: No problem to run the Python scripts in Tulip, the instructions by Antoine are clear. The only bit I struggle with is the last two slides on the current setup of the algorithm.
  4. More precise queries needed: While testing, I realized that for an assessment of “meaningfulness” of the DOI, I need to be able to run more specific queries. I need to apply additional conditions to further narrow down time periods and have “co-occurs with” AND and OR conditions to create more specific queries. If this is hard to implement in Tulip at this stage, we can discuss more complex demo queries and ask you to extract subgraphs directly.
  5. Hard to spot (multimodal) nodes: Same as in Pau-Demo: issues with distinguishing node types. I need to get a quick sense of which nodes and node-types are present in a subgraph. Reading labels of documents is hard because they are too long but abbreviating them does not make sense.
  6. Enrich with captions and links to CVCE docs: Easy access to the captions and ideally a link to the original document. As a temporary solution, could we either create a link to histograph (preferred) or CVCE.eu (already based on example of a CVCE URL using CVCE-DOIs)?
  7. Spreadsheet: To remedy the problem of spotting nodes and node types, I tried the spreadsheet view to get an overview of which nodes are present. But it only shows a limited number of characters, so I can't read captions and longer titles.
  8. Spreadsheet: How can I download csv files of the spreadsheet view to show it to my colleagues? If I just copy them from the spreadsheet view I seem to get different data.
  9. IDs: I used the slug (“konrad-adenauer”) as unique identifier to find specific nodes, but this does not work for ePubs or documents of course.
  10. Need to filter node types: ePublication nodes are dominant (center node in the screenshot below with node “konrad-adenauer” in blue).
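To make the request in point 4 (more precise queries) concrete, here is a minimal sketch of the kind of query I have in mind, independent of Tulip. It assumes each document is available as a record with a year and a set of entity slugs; the `Doc` class, the `query` function and the toy records are hypothetical and not part of the Bordeaux demo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    year: int
    entities: frozenset


def query(docs, seed, all_of=(), any_of=(), years=None):
    """Return ids of documents that mention `seed` and satisfy the conditions.

    all_of: the seed must co-occur with every one of these (AND).
    any_of: the seed must co-occur with at least one of these (OR).
    years:  optional inclusive (first, last) range of years.
    """
    hits = []
    for doc in docs:
        if seed not in doc.entities:
            continue
        if years is not None and not (years[0] <= doc.year <= years[1]):
            continue
        if not set(all_of) <= doc.entities:
            continue
        if any_of and not (set(any_of) & doc.entities):
            continue
        hits.append(doc.doc_id)
    return hits


# Toy records, invented for illustration only.
docs = [
    Doc("d1", 1957, frozenset({"konrad-adenauer", "pierre-werner"})),
    Doc("d2", 1962, frozenset({"konrad-adenauer", "paris"})),
    Doc("d3", 1975, frozenset({"konrad-adenauer", "pierre-werner", "paris"})),
]
hits = query(docs, "konrad-adenauer", any_of=["pierre-werner", "paris"], years=(1950, 1965))
```

The matching documents could then define the subgraph to extract, which is what I would ask for directly if this is hard to implement in Tulip at this stage.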

Tulip bugs?

This happens when I add a panel to Tulip on Mac. Is this a bug? Is there a better way to arrange a side-by-side view of graph and spreadsheet?

This happened when I tried to re-add a graph panel.

Some thoughts on querying and graph representation

All of the following illustrations are meant to help me express my thinking about the problems I experienced with graph visualisations so far. They are not intended to be starting points for future discussions.

Problems:

Classical node-link diagrams, especially of multimodal networks, pose several problems for users:

  • They need to read node labels to get an overview of what is there
  • They need to look at node symbols/colours to understand what types of nodes are there
  • It is often hard to find out what different types of edges mean
  • It is often hard to find out what clusters and central nodes mean
  • Updating layout algorithms move nodes around, which makes them hard to locate and prevents mental maps

Solutions for this particular context?

  • Sortable lists make it easy to understand which nodes are on display (Are all the nodes I expect to see there? Which nodes did I not expect to see there?)
  • Keeping them in visually distinct categories helps orientation
  • Edges are relevant the moment one focuses on a node, i.e. already has an interest. Node-link diagrams show a lot of edges we might not care about. There might be better ways to display centrality and clustering of nodes? Node-links should be used for small subgraphs we understand and which are easy to read – no hairballs.

The concepts below should have a dual role assigned to nodes and edges:

  • preview nodes and relationships and let the user move on and explore others which are more interesting
  • select nodes and relationships as part of a query

Is it a good idea to implement both functions in parallel?

We have to deal with various entry points into the corpus. In histograph we don't distinguish between different types; everything is captured by one search bar. I wonder whether it is better to be clear from the start about which node type is which? For the following concepts, I did this.

Overall I think we deal with two main strategies of our users:

  1. A user knows exactly what he/she wants and searches
  2. A user learns along the way about ways to improve, enrich, specify queries and maybe to reconsider original assumptions

For the second in particular, the underlying graph will be powerful.

What follows is inspired by the List View in Jigsaw which I like a lot for these reasons:

  • Very quick overview of (in our case:) which nodes are in the graph, sort alphabetically, by degree etc.
  • Visibility of links between nodes follows user attention: if I am interested in a node, a click/hovering makes the links visible. I am not overwhelmed by a lot of links whose meaning I don't know
  • No complex spring-embedder layouts which require background knowledge

In the development of histograph we came to the conclusion that we should work with lists or enhanced list views for as long as possible. Only then do we want histograph to show spring-embedder layouts.

Start searching for a person, institution etc. and select it, e.g. Pierre Werner

Once an entity is selected (e.g. a person), related institutions, places and ePubs are displayed together with e.g. frequently co-occurring other persons. Highlight strength of relation (based e.g. on appearance in the same sentence, relative frequency, number of documents etc.) as well as relevant time periods (below). Option to sort alphabetically, expand to see all related nodes to balance between recommendation and free-flowing exploration.

As a layman, to me it makes sense to combine search, filtering and recommendations in general. What are your views on this? Should they be kept more apart?

User interest is here expressed by selecting additional nodes and specifying a time range. If, for example, we also want to let users pick A Priori interest, we need to come up with a good metaphor for the consequences of different algorithms.

Users select nodes based on their existing knowledge as shown in the slide before. Example: “I want to know about Pierre Werner”

For each node type there are recommendations which users can choose to add to their selection or ignore (“EU Parliament” as an institution, “Paris” as a place etc.). These recommendations improve as more and more nodes are selected.

Elastic lists are a nice example of how to combine similar user actions with more and more focused suggestions. I understand that Elastic Search follows good practice inasmuch as it goes from the general and gets more and more specific along the way. I think that this would be a very powerful feature in itself.

In our case however, we could try to go one step further and give users prompts to revise their original choices where this will yield more promising results.

These updates point users to other promising choices which they could make and let them reconsider or expand their previous assumptions. Example: If you are interested in Pierre Werner, EU Parliament, Paris in the 1970s, you should definitely consider adding “Francois Mitterrand” as well. Or: If you are interested in “EU Parliament, Paris in the 1970s” you should consider removing “Pierre Werner” and adding “Robert Schuman” instead.
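The idea of recommendations that sharpen as the selection grows could be sketched like this. It is only an illustration under assumptions: documents are reduced to sets of node labels, relevance is simply the number of documents that contain the whole current selection plus the candidate, and the function name `recommend` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def recommend(docs, node_types, selection, top_n=3):
    """Suggest up to `top_n` nodes per type that co-occur with the selection.

    docs:       list of sets of node labels, one set per document
    node_types: mapping from node label to its type
    selection:  nodes the user has already picked
    """
    selected = set(selection)
    counts = Counter()
    for entities in docs:
        # Only documents that contain every selected node contribute.
        if selected <= entities:
            counts.update(entities - selected)
    by_type = defaultdict(list)
    # Most frequent first; ties are broken alphabetically.
    for node, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        bucket = by_type[node_types[node]]
        if len(bucket) < top_n:
            bucket.append((node, n))
    return dict(by_type)


# Toy corpus, invented for illustration only.
node_types = {
    "Pierre Werner": "person",
    "Francois Mitterrand": "person",
    "Robert Schuman": "person",
    "EU Parliament": "institution",
    "Paris": "place",
    "Luxembourg": "place",
}
docs = [
    {"Pierre Werner", "EU Parliament", "Paris"},
    {"Pierre Werner", "EU Parliament", "Paris", "Francois Mitterrand"},
    {"Pierre Werner", "Luxembourg"},
    {"Robert Schuman", "EU Parliament", "Paris"},
]
broad = recommend(docs, node_types, ["Pierre Werner"])
narrow = recommend(docs, node_types, ["Pierre Werner", "EU Parliament", "Paris"])
```

With only “Pierre Werner” selected the suggestions are broad; once “EU Parliament” and “Paris” are added, only “Francois Mitterrand” remains. Prompts to remove or swap a node would need a further step, for example re-running the same ranking with one selected node left out.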

A nice way to work with the problem of allowing users to revise their original premises is the way in which some flight search engines allow you to see alternative, yet still related options for your flight dates: Fly two days later and save a lot of money.

As I was working with the demos, I could not help but imagine what a future interface would look like. As stated above, one of the big problems with node-link diagrams for non-network people is the ever-changing layout which makes it hard to create mental maps. Something I experienced as pleasant in Pau's demo was the absence of a constantly updating layout, which allowed me to develop such a mental map.

So I wondered whether we could work with a semi-stable layout:

Left: An easy-to-read overview of the current query, so that users know what they see, which can be tweaked (add/remove nodes, time periods etc.)

Center: Persons, Institutions, Places, ePubs will always be placed in a fixed segment of the canvas; as a user I can rely on this. What changes is of course which nodes are shown. Within the respective segments we could also make use of different visualisation techniques: for small and sparse graphs node-links, for dense networks matrices etc. Hovering or clicks would reveal links between nodes, other means could highlight clusters or centrality scores. I would like to propose that links are only visible following user actions. This would be a means to avoid overwhelming graphs and to create a deeper understanding of users' actions and their consequences. What is your take on this?
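A minimal sketch of what "fixed segments" could mean in practice, assuming one vertical column per node type and alphabetical ordering within a column. The function name `stable_positions`, the canvas width and the row height are hypothetical values chosen for illustration.

```python
# Fixed left-to-right order of segments; a node type never changes column.
SEGMENTS = ("person", "institution", "place", "epub")


def stable_positions(nodes, width=800.0, row_height=20.0):
    """Return label -> (x, y) with one fixed column per node type.

    nodes: iterable of (label, node_type) pairs
    """
    nodes = list(nodes)
    column_width = width / len(SEGMENTS)
    positions = {}
    for index, node_type in enumerate(SEGMENTS):
        labels = sorted(label for label, t in nodes if t == node_type)
        x = (index + 0.5) * column_width  # centre of the segment
        for row, label in enumerate(labels):
            positions[label] = (x, row * row_height)
    return positions


positions = stable_positions([
    ("Pierre Werner", "person"),
    ("Konrad Adenauer", "person"),
    ("Paris", "place"),
])
```

In this sketch the horizontal position depends only on the node type, so it stays the same across queries. The vertical position can still shift when nodes are added or removed, which is a trade-off of sorting alphabetically.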

Every time I see a graph, I find myself scanning the labels to achieve just this and it is never enjoyable. Where there are many nodes, a list view could therefore facilitate a quick overview of which nodes are there.

## update 14.1.2017: possibility to toggle different graph viz like node-links, matrices, lists?

Right: A document viewer which bridges the gap between data visualisation and original content.