March 2017

General remarks

When I filter by time I get rather few edges. Here I would expect to be able to scan all node labels to see what is interesting, since structural patterns are less relevant here. This is a different use case from looking for central nodes in a denser graph and checking what's behind them by hovering and seeing the label. But as I wrote elsewhere already, a simple overview of nodes would be helpful for a first assessment of what we see in the graphs. Could there be a smart label-visibility algorithm?

Minimap

If the node-link diagrams become much larger than those in the example, I imagine this would be useful.

Advanced layer transforms

My biggest concern is controlling the interface. Of course I am not used to it yet but it feels as if I had to untangle a long cable using only a stick.

  • The controls are very powerful and versatile and mouse gestures seem inadequate - if I could, I'd use my hands for this.
  • When I revisited http://sb.perso.eisti.fr/blizaar/POC/multilayer/ to further assess the added value of the MiniMap, I struggled to get back to a bird's eye view of the graph. The first thing I tried was to drag the grey layer area into position, but this did not work. Would this make sense to you as well?
  • Mouse-over highlights would help to identify paths between layers.
  • Right now all edges are black and I struggle to see any patterns. Do you plan on introducing edge bundling to make it easier to spot patterns? If I try to move the graph into position, I don't seem to be able to get a clearer view of what goes where.
  • Nodes are too close to each other. If I zoom in, I can distinguish them. If I zoom out to see where the edges lead, I can't anymore.
  • Related to the point above and my recurrent problem: 2.5D helps me see that there are connections between layers, but only when I zoom out. But then I can't see which nodes are concerned.
  • “If these transforms turn out to be too difficult to manipulate by non expert users, we can develop a number of shortcuts which position the layers to predefined spots in the scene.” – I agree, less control is more, I think.

(Main contribution) Dynamically adding and deleting layers

  • When I look at the examples, it again seems that 2.5D works best when it comes to observing high-level patterns such as one node being linked to a lot of nodes in other layers. My sense is that for the history use case we need to ensure easy inspection of individual nodes and, following that, of documents etc. Do you see a way to bring the two approaches closer together?
  • Query building will be crucial for the users in any case, so we need to find a solution for this.
  • I am very curious to see the layers you propose, there are many overlaps with DEIS Project Vision.
  • But is there a way to prearrange the layers, or to work with arrangement schemas between which users can toggle to escape the hassle of manual arrangement?

Feedback Demo 14.6.2017

Demo by Stefan via Google Hangout with Marten

Notes

  • discussion about the histograph data structure, the CVCE Backend data structure and the histograph entity detection procedure.
  • presentation of new demo hampered by performance issues, only “small” queries can be run within reasonable duration - needs to be addressed in consultation with Daniele
  • Issues with very dominant nodes such as “Europe”, “European Union” etc. which have little explanatory value (we know that the corpus deals with this topic). Marten/Stefan: generate a list of most frequent nodes, eliminate nodes which are omnipresent.
  • resource list on the left side with links to documents on CVCE website is basic but gets the job done. Discussion whether the snippet preview in histograph and/or other histograph elements could be reused with reasonable effort, no conclusion at this stage
  • Node size dependent on time filter works well and is effective. Degree is the preferred centrality measure due to its simplicity in communication with end-users.
  • Marten: request for an interactive node list to accompany every graph. Sortable alphabetically and by degree centrality. Allows users to quickly get an overview of which entities are present. Clicks on items in the list highlight the corresponding node in the graph including their neighbours
  • Creation of new layers by means of a context menu is straightforward. A simple yet very effective means to keep an overview of the exploratory process are the auto-generated captions for each layer, e.g. “Ego network of ‘Robert Schuman’”. Other filters such as time or specifics of the DOI function should be documented as well. It is important for users to understand what they see.
  • Discussion about 2.5D elements and the difficulty of controlling layouts and following inter-layer links. Organisation of multiple layers by default in 2D on a canvas may be easier to control and navigate. An option to activate the 2.5D view to explore inter-layer links is preferable.
  • graphs of units now display links between units in different ePublications but also relationships between units and higher-level units in the same ePublication. The latter is less relevant and may be dropped to avoid confusion.
  • We raised the question whether it makes sense to merge units which have the same contents but different DOIs. On further reflection: No, this reflects the data in the Backend, which remains the point of reference. Such units point to reuse of existing work in new contexts and should not be removed.
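
The interactive node list requested in the notes above could be backed by something like the following minimal sketch. It assumes the graph is available as a plain list of undirected label pairs without self-loops; the function names `build_node_list` and `highlight` are hypothetical and not part of the demo.

```python
from collections import defaultdict


def build_node_list(edges, sort_by="degree"):
    """Return (rows, neighbours) for an undirected edge list.

    rows is a list of (label, degree) tuples, sorted either by
    descending degree or alphabetically by label.
    Assumes the edge list contains no self-loops.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    rows = [(label, len(adjacent)) for label, adjacent in neighbours.items()]
    if sort_by == "degree":
        # Highest degree first, ties broken alphabetically.
        rows.sort(key=lambda row: (-row[1], row[0].lower()))
    elif sort_by == "label":
        rows.sort(key=lambda row: row[0].lower())
    else:
        raise ValueError("sort_by must be 'degree' or 'label'")
    return rows, neighbours


def highlight(neighbours, label):
    """Nodes to highlight when a list item is clicked: the node and its neighbours."""
    return {label} | set(neighbours.get(label, ()))
```

Using the set of distinct neighbours means repeated edges between the same two nodes count once, which keeps degree easy to explain to end-users.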

Action points

  • clarify which entities can be ignored (Marten, Daniele): The
  • clarify how performance can be improved (Marten, Daniele)
  • generate top node list for elimination (Marten, Stefan, Daniele)
  • specify contents of layer captions (Marten)
  • implement interactive node list (Stefan)
  • toggle 2.5D / 2D (Stefan)
  • focus ePub graphs on inter-ePub links only (Stefan)
  • merge organisation and institution and social group nodes? (Marten, Daniele)
  • does it make sense to merge units which have the same contents but different DOIs?

Proposed fix for Performance issues

From Daniele:

Each resource node should have the stdf property (the sum of appears_in.frequency over the connected entities). That is, once an entity is linked via the appears_in relationship, the relationship must carry the frequency property as an integer.

Moreover, each entity has a df property which holds the number of documents it appears in.
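
To make the two properties concrete, here is a small sketch that derives stdf and df from a list of appears_in rows. The tuple layout `(entity, resource, frequency)` and the function name `refresh_counts` are assumptions for illustration; in histograph these values are maintained by the Cypher queries linked below.

```python
from collections import defaultdict


def refresh_counts(appears_in):
    """Compute stdf per resource and df per entity.

    appears_in: iterable of (entity, resource, frequency) tuples, one per
    appears_in relationship (hypothetical in-memory representation).
    """
    stdf = defaultdict(int)          # resource -> sum of frequencies
    resources_of = defaultdict(set)  # entity -> resources it appears in

    for entity, resource, frequency in appears_in:
        if not isinstance(frequency, int):
            raise TypeError("appears_in.frequency must be an integer")
        stdf[resource] += frequency
        resources_of[entity].add(resource)

    df = {entity: len(resources) for entity, resources in resources_of.items()}
    return dict(stdf), df
```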

do: Refresh document counts, (v:variables {scope:'tfidf'}).num_of_docs, then refresh the stdf property: https://github.com/CVCEeu-dh/histograph/blob/master/queries/similarity.cyp#L16

then: Refresh entity counts https://github.com/CVCEeu-dh/histograph/blob/dbe4984c7a0949483112bb46c8d2c6a531858e09/queries/similarity.cyp#L29

then: do the TF-IDF computation on appears_in relationships: https://github.com/CVCEeu-dh/histograph/blob/master/queries/similarity.cyp#L35
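
For orientation, a textbook TF-IDF weighting on top of these properties might look like the sketch below. It assumes tf = frequency / stdf and idf = ln(num_of_docs / df); the linked query may use a different variant, so treat this as an illustration of the idea rather than a port of similarity.cyp. The function name and argument layout are hypothetical.

```python
import math


def tfidf(appears_in, stdf, df, num_of_docs):
    """Return {(entity, resource): tf-idf score} for each appears_in row.

    Assumed weighting (not taken from the histograph source):
      tf  = frequency / stdf[resource]
      idf = ln(num_of_docs / df[entity])
    """
    scores = {}
    for entity, resource, frequency in appears_in:
        tf = frequency / stdf[resource]
        idf = math.log(num_of_docs / df[entity])
        scores[(entity, resource)] = tf * idf
    return scores
```

Under this weighting an entity that appears in every document gets an idf of zero, which matches the observation that omnipresent nodes carry little explanatory value.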

precomputation of Jaccard distances between entities: https://github.com/CVCEeu-dh/histograph/blob/dbe4984c7a0949483112bb46c8d2c6a531858e09/queries/similarity.cyp#L100
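
As a reminder of what is being precomputed: the Jaccard distance between two entities is 1 minus the share of resources they have in common among all resources either appears in. The sketch below assumes each entity is represented by the set of resources it appears in; the names are hypothetical and the linked query may differ in detail.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets; 0.0 if both are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def precompute_distances(resources_by_entity):
    """Return {(entity_x, entity_y): distance} for every unordered pair."""
    return {
        (x, y): jaccard_distance(resources_by_entity[x], resources_by_entity[y])
        for x, y in combinations(sorted(resources_by_entity), 2)
    }
```

The number of pairs grows quadratically with the number of entities, which is one reason to compute these values ahead of time rather than per query.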

Proposed fix for duplicate entities issues

To merge organisations, institutions and social groups (which are essentially the same thing but with different labels), please try:

From Daniele:

Normally you rename a label with MATCH (n:organization) REMOVE n:organization SET n:institution; this is how I’d proceed: first add the institution label to all organizations with MATCH (n:organization) SET n:institution; then remove the old label with MATCH (n:organization:institution) REMOVE n:organization

Data cleaning

for entity, person / theme / location:

all is_part_of relationships between entities and (sub)units can safely be deleted

for cvce, table / diagram / audio / map / text / photos:

These need to be added to histograph at a later stage. They need to be run through the discovery process and we can't put Daniele on this currently. We will prepare a new dump in time which contains these nodes as well. Once we do advanced user testing these will be a bit more important but at this stage, we can ignore them.

Some nodes are over-prominent without promising to deliver deeper insights. These include generic country names and cities with strong European ties (Brussels, Strasbourg). EU institutions such as the European Council are also very dominant but should be kept for now given that they promise deeper insights. Selection is based on the top 100 most frequent nodes in the histograph dataset.
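
A candidate list of this kind could be generated along the following lines. This is a sketch under stated assumptions: frequency is taken to be the df value per entity, and the `keep` argument stands in for the manual decision to retain nodes such as the European Council. The function name and the counts in the usage example are invented for illustration.

```python
def elimination_candidates(df, top_n=100, keep=()):
    """Return the top_n most frequent entities, minus those explicitly kept.

    df:   {entity: number of documents it appears in}
    keep: entities to retain even though they are very frequent
    """
    keep = set(keep)
    # Most frequent first, ties broken alphabetically for a stable result.
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n] if name not in keep]


# Usage with made-up counts:
example_df = {"Europe": 900, "European Council": 700, "Brussels": 650, "Robert Schuman": 120}
print(elimination_candidates(example_df, top_n=3, keep={"European Council"}))
```

The result is only a shortlist; the final selection still needs a manual review before anything is deleted.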

Download csv file with entities to delete

Feedback Demo 2.8.2017

Demo by Stefan via Skype with Marten

  • toggle 2D / 3D view
  • toggle showinteredges
  • new design
  • filter: time-slider
  • filter: resource type
  • clone view - duplicates a view
  • freeze view - preserves information independently from filters - allows comparisons of two views
  • labels appear dependent on zoom
  • currently we don't display nodes with a degree higher than 500. This will likely frustrate users, for whom such nodes may be particularly interesting.
  • performance issues slow down the process of creating subgraphs. MD to schedule a Skype call with BLIZAAR and Daniele to discuss solutions.
  • due to performance but also to avoid clutter, currently only the top 20 nodes (which ranking btw?) are displayed. It is not possible to display all nodes in a subgraph. It is important to make any ranking or selection as transparent as possible: 1) we give users control over the pre-selection of nodes, or 2) we use a simple ranking, in which case users need to understand why they see some nodes and get a sense of which nodes they don't see. 3) Either way, it should be possible to review hidden nodes to get a sense of what is missing.
  • highlighting of node neighbors becomes problematic in dense graphs
  • bug: context menu disappears
  • depending on the use of filters, it becomes possible that the ego node in a network is no longer visible but only its neighbors. This is consistent with the filtering of the subgraph but will likely confuse users. One solution is to highlight ego visually to make the absence immediately noticeable.
  • adjustment of node sizes to represent the number of associated resources could be improved. In case of duplicate nodes, node size helps us to identify which node is the more promising one to explore
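
One way to keep the top-20 selection discussed above transparent is to return the hidden nodes alongside the visible ones, so the interface can list what is missing. The sketch below assumes ranking by degree, which is only a guess since the notes leave the ranking open, and it pins chosen nodes such as the ego so they always stay visible. All names are hypothetical.

```python
def split_visible_hidden(degrees, k=20, always_show=()):
    """Split nodes into (visible, hidden), both ordered by descending degree.

    degrees:     {node: degree}
    k:           maximum number of visible nodes
    always_show: nodes that stay visible regardless of rank (e.g. the ego)
    """
    pinned = {node for node in always_show if node in degrees}
    ranked = sorted(degrees, key=lambda node: (-degrees[node], node))

    # Fill the remaining slots with the highest-ranked unpinned nodes.
    slots = max(k - len(pinned), 0)
    top = [node for node in ranked if node not in pinned][:slots]

    chosen = pinned | set(top)
    visible = [node for node in ranked if node in chosen]
    hidden = [node for node in ranked if node not in chosen]
    return visible, hidden
```

Showing the size of `hidden` next to the graph, with the list available on demand, would cover option 3 without adding clutter to the main view.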