Archive for the ‘649week09’ Category

649week09 Divided Discussion on Evaluation

Since we divided the InfoVis class into two discussion groups on evaluation, I wanted to share a little bit about my group’s discussion and invite others to comment on what they took away.

One thing that stood out for me in evaluation was that not every situation calls for InfoVis. A prominent news story this week concerned the effectiveness of Jon Stewart’s presentation on CNBC’s financial predictions, linked below. The stories I saw emphasized that Stewart had no special access to CEOs or experts, just widely available video clips and brief, widely reported statements, shown simply in white type on a black background. This was certainly a case where information overload made a careful study of quantitative information difficult, but not a case where visualization of results was needed. I can easily imagine creating a visualization from the given evidence, but I can’t imagine how it could be more effective than the simple presentation linked below.

(picture links to video on comedy central)

One student brought up visual cues and the challenge of describing or evaluating them. The same student raised contrasting views of InfoVis: something to interact with casually, one time (say, in an online news source, providing a moment of surprise) versus a long-term system that needs to support continual surprise over years.

Domain knowledge is a big issue, and one that looks bigger when making InfoVis for analysts whose domain knowledge is elaborate and acquired over years of developing expertise. Herbert Simon, by the way, claimed that anyone could be a world-class expert in any subject given 15 years of devotion.

When should we not use InfoVis? Could machine learning be more appropriate sometimes? (It was this issue, raised by a student, that made me think about the Daily Show incident mentioned above.)

We discussed the issues of choosing alternative techniques and supporting exploration and the subject of surprisingness.

An acute observation by one student was that social computing technologies allow us to find out about what people are thinking without having to ask them.

649week09 Farnsworth Treemap Re-revisited

As a follow-up exercise, I’d like to revisit the Farnsworth treemap you designed last time. This time I would like you to work on the computer, using an application of your choice, and design a color scheme for your treemap. Please produce a one-page summary of your activity in PDF format and put it in the shared folder with your names on it. This page should have four or five images on it, as follows: two or three images of the treemap in different states, a legend showing what the colors mean, and an image of a color scheme you copy and paste from an external source, to be explained next.

(picture: ColorBrewer)

Begin by looking at ColorBrewer, illustrated above. This tool helps you choose related colors, offering five-color palettes whose colors are related as sequential, diverging, or qualitative. Pick one or more and copy them into the document you will share with us. If you don’t like any of these, visit kuler.adobe.com and pick a palette from there. Put that into the PDF you will share with us.

(picture: bathymetric legend)

Next, extend the palette. Add at least two colors to it. Assume that you need at least seven colors and you’ve been given five colors to work with. You may notice a problem, depending on the kind of scale you’ve chosen and the relationships in the data you intend to explore. Consider the legend above, reprinted from the given source by Tufte. You probably perceive a linear relationship between the colors in this legend. You may be surprised if you apply Digital Color Meter or Art Director’s Toolkit to analyze the color components. What you will see is that, to achieve the perception of linearity, non-linear amounts of components have been added (or subtracted) at each level.
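
To make the last point concrete, here is a minimal sketch (not part of the assignment) that stretches a five-color palette to seven colors by straight-line interpolation of the RGB components, then reports the CIE L* lightness of each result. The hex values are placeholders for whatever palette you copied, and the function names are my own. The lightness formula assumes standard sRGB; your display may differ.

```python
def hex_to_rgb(h):
    """'#rrggbb' -> (r, g, b) with components in 0..255."""
    h = h.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def extend_palette(colors, n):
    """Resample a list of hex colors to n colors (n >= 2) by
    piecewise-linear interpolation of the raw RGB components."""
    rgbs = [hex_to_rgb(c) for c in colors]
    out = []
    for k in range(n):
        pos = k * (len(rgbs) - 1) / (n - 1)   # position along the original ramp
        i = min(int(pos), len(rgbs) - 2)      # index of the segment's first color
        t = pos - i                           # fraction of the way through the segment
        a, b = rgbs[i], rgbs[i + 1]
        out.append(tuple(round(a[j] + t * (b[j] - a[j])) for j in range(3)))
    return [rgb_to_hex(c) for c in out]


def lightness(hex_color):
    """Approximate CIE L* (0 = black, 100 = white), assuming sRGB."""
    def linearize(c):
        c = c / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in hex_to_rgb(hex_color))
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b  # relative luminance
    d = 6 / 29
    f = y ** (1 / 3) if y > d ** 3 else y / (3 * d * d) + 4 / 29
    return 116 * f - 16


if __name__ == "__main__":
    # Placeholder five-step blue ramp; substitute the palette you chose.
    five = ["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"]
    for color in extend_palette(five, 7):
        print(color, round(lightness(color), 1))
```

If you run this on a plain black-to-white ramp, the halfway color in RGB terms (#808080) comes out at an L* of roughly 54 rather than 50, which is the kind of mismatch between component values and perceived steps that the legend above works around.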

Finally, create two or three images of the treemap in different states to illustrate how the color scheme you’ve chosen supports the goals you’ve chosen. Put these on the same display as the legend and the original palette and save as pdf and put into our shared folder. Feel free to modify aspects of this exercise as long as you keep to the spirit of (1) making choices about color and (2) extending widely available canned color choices.

649week09 Readings in InfoVis Evaluation

InfoVis has been around long enough for the community’s attention to fasten on evaluation as a topic in its own right, rather than simply subscribing to the evaluation methods inherited from constituent disciplines. The pair of papers by Plaisant (2004) and Shneiderman (2006) illustrates this questioning and the technique synthesized from it, MILC (multi-dimensional in-depth long-term case studies). The 2004 paper describes several existing techniques with different strengths and weaknesses and promotes field studies. The 2006 paper provides some detail for integrating various techniques into a new, but potentially much more expensive, technique: MILC. What do you think of the predicted trajectory of modest MILCs followed by more ambitious ones? Is it reasonable to assume that popularity of the technique will lead to developments that reduce its cost? What are the ideal conditions for MILC to succeed?

(picture: VAST evaluation model)

Plaisant (2008), in what may be the most useful paper you will see in this class, describes an InfoVis contest and, in Section 4.6.1, describes evaluation challenges. This section includes many provocative statements worthy of discussion. For example, Plaisant admits that it’s hard to keep track of different, mostly visual, artifacts when judging. How would you address this problem? What do you think of Plaisant’s proposed solution (shared environment)? Another issue pervading not only this section but the entire contest has to do with the magnitude of what is being evaluated. The effects being studied may be overwhelmed by other environmental features. Plaisant, by the way, refers to the VAST reading we discussed previously and whose evaluation model is illustrated above.

Another approach to evaluation is described by Tory (2005). Heuristic evaluation should be painfully familiar to most of you, and this should be an interesting opportunity to see a different community adopting it. Given your experience, do you see anything missing from this paper? (Hint: How does Nielsen justify the particular set of heuristics he describes?) On a related note, how might you answer (or integrate) the criticisms in Thimbleby (2007)? This paper, like Tory’s, may suffer a little from a sketchy understanding of user-centered design and from beliefs that are enhanced by a lack of frequent contact with it. For example, what is your view of the iterative design cycle shown in Figure 9?

Buring (2006) shows how you could evaluate an interaction design for a small device without the small device. It is for you to consider whether the simulation described, using a device for which we have a good proxy in DL1, overcomes the problem of not actually using the device. This paper provides a good introduction to methodology (but see Amar (2005) for an example of how to push beyond the technique). It also shows a very common set of priorities in its focus on task completion time and preference.

(picture: analytic gaps)

How can we enrich evaluations? Kobsa (2001) evaluated three commercial InfoVis systems using an experiment and a method that’s a good model for understanding how InfoVis features lead to outcomes. It’s worthwhile to look at this study to see how you can overcome some of the methodological problems your own intuition may suggest. Nevertheless, there are limitations to evaluating InfoVis artifacts in this way, and Amar and Stasko (2005) provide some insight into them. They criticize InfoVis evaluation in general as focused on representational primacy: how well do you get the information via the information representation? They introduce two kinds of gaps left unaddressed by evaluations respecting representational primacy: a worldview gap (what is the right data? what is the right presentation design?) and a rationale gap (how strong are the relationships shown? how confident are we in the usefulness of the relationships shown?). They show how the Kobsa evaluation could benefit from considering these two gaps.

649week09 Evaluating Information Visualization

How can we evaluate a given InfoVis artifact? How do we know what a given InfoVis is good for, if anything? Can we use intuition? Without any study at all, we might just start thinking about what can be measured. We can ask someone to look at an InfoVis and observe them. We can ask them afterward if they liked it. We can ask them questions that we think the InfoVis might help answer and see if they can answer the questions better than if they did not use any InfoVis. We can look at the advertising for a given InfoVis, then look around and see if we see something that advertises the same thing, but is not InfoVis. Then we could compare them, using the above list.
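
As a rough illustration of the "ask questions and compare" idea, the sketch below tallies answer accuracy and average answering time per condition. Everything in it is an assumption made for illustration: the condition labels, the record layout, and the numbers in the example, which are invented placeholders and not results from any study.

```python
from statistics import mean


def summarize(trials):
    """Summarize trials per condition.

    trials: iterable of (condition, correct, seconds) records, where
    condition is a label such as "with_vis", correct is a bool, and
    seconds is how long the participant took to answer.
    """
    by_condition = {}
    for condition, correct, seconds in trials:
        by_condition.setdefault(condition, []).append((correct, seconds))

    summary = {}
    for condition, rows in by_condition.items():
        summary[condition] = {
            "n": len(rows),
            "accuracy": mean(1.0 if ok else 0.0 for ok, _ in rows),
            "mean_seconds": mean(sec for _, sec in rows),
        }
    return summary


if __name__ == "__main__":
    # Invented placeholder records, for illustration only.
    example = [
        ("with_vis", True, 12.0),
        ("with_vis", False, 20.0),
        ("without_vis", True, 31.0),
        ("without_vis", True, 27.0),
    ]
    for condition, stats in summarize(example).items():
        print(condition, stats)
```

Even a tally this small forces decisions (which questions, which comparison condition, what counts as correct), and each of those decisions is a place where the evaluation can go wrong.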

If we try to do any of the things listed above, we will soon find many pitfalls plaguing every one. You can probably envision better and worse ways to do every single thing listed above. If we discussed them, you would soon find that there are pitfalls you didn’t think of, but that a classmate brought up. You might think gathering a group of people would improve the ideas about how to evaluate InfoVis. What if all the people you gathered were cognitive psychologists and software engineers? Do you think they would systematically catch some problems and overlook others? Do you think they would be better equipped to handle some problems than others?

(picture: top papers)

InfoVis people are, by training, a much more diverse group today than they were ten years ago. Consider the above picture, from the InfoVis 2004 Contest to visualize the history of InfoVis (Fekete, J.-D., Grinstein, G., and Plaisant, C., IEEE InfoVis 2004 Contest, the history of InfoVis, www.cs.umd.edu/hcil/iv04contest, 2004). There were several winning entries, including the one from which this picture and the following one are drawn. The picture above answers the question of who wrote the most frequently cited papers. The subsequent one combines several features to arrive at some idea of influence. Although we see that George Furnas and George Robertson wrote the two most cited papers in the picture above, we see from the subsequent picture that Shneiderman has written the most papers and has the most coauthors. Independently, we may find that he has graduated far and away the most Ph.D. students, increasing both his number of papers and his number of coauthors. We can also see the strong tie between Card, Mackinlay, and Robertson in the subsequent picture. And so on. Pictures like this strive to show a community. You could probably imagine taking this another step and checking the fields in which the central players obtained their training. That might give us a very good clue as to which evaluation tools are valued in the community.

(picture: coauthor history)
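
The two counts behind a picture like the one above, papers per author and distinct coauthors per author, can be sketched in a few lines. This is only an illustration under my own assumptions: the input is a list of author lists, one per paper, and the names in the example are placeholders, not the contest data set or the winning entry's method.

```python
from collections import defaultdict


def author_stats(papers):
    """papers: list of author-name lists, one list per paper.

    Returns {author: (number_of_papers, number_of_distinct_coauthors)}.
    """
    paper_count = defaultdict(int)
    coauthors = defaultdict(set)
    for authors in papers:
        for author in set(authors):           # count each author once per paper
            paper_count[author] += 1
            coauthors[author].update(a for a in authors if a != author)
    return {a: (paper_count[a], len(coauthors[a])) for a in paper_count}


if __name__ == "__main__":
    # Placeholder author lists, for illustration only.
    papers = [
        ["Author A", "Author B"],
        ["Author A", "Author C"],
        ["Author B"],
    ]
    for author, (n_papers, n_coauthors) in sorted(author_stats(papers).items()):
        print(author, n_papers, n_coauthors)
```

Adding a field-of-training attribute to each author would be the "another step" suggested above, though that information would have to be gathered by hand.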

In my next entry, I’ll discuss some of the readings. None of them explicitly addresses this issue, but it is something you may want to keep in mind as you look at examples of evaluation and commentary on evaluation itself.