2023-10-15
Week NINE
The new soup is the soup of human culture. We need a name for the new replicator, a noun that conveys the idea of a unit of cultural transmission, or a unit of imitation. ‘Mimeme’ comes from a suitable Greek root, but I want a monosyllable that sounds a bit like ‘gene’. I hope my classicist friends will forgive me if I abbreviate mimeme to meme. If it is any consolation, it could alternatively be thought of as being related to ‘memory’, or to the French word même. It should be pronounced to rhyme with ‘cream’. (Richard Dawkins, The Selfish Gene, 1976)
Designers use an almost folkloric understanding of how people organize information to design information artifacts to work with rather than against people. Some of the borrowings from other disciplines studying the organization of information include the following.
We can group information together under labels or without labels. The latter activity is usually called clustering while the former is often called categorization. If we have labels, the question arises as to where the labels come from and who gets to identify them. Famously, Melvil Dewey reserved many labels in his library classification system for items familiar to him and European men like him, but few labels for items that were familiar to the vast majority of humans.
Card sorting is a common way to elicit labels. You can give a person a set of cards with terms written on them and ask the person to sort the cards into piles of similar terms. Then ask them to name the piles. This is typically called an open card sort, described by Spencer (2009). An alternative might be to provide a set of category cards in addition to the content cards and ask a person to place the content cards adjacent to the appropriate category card. This exemplifies a closed card sort, described by Spencer (2009). There are many variations of card sorts and an extensive literature on using them to label concepts.
After you conduct a card sort, how do you evaluate your results? If you’ve recorded several people sorting the same cards, you can measure interrater reliability using Cohen’s 𝜅 (the Greek letter kappa).
\[\kappa \equiv \frac{p_o-p_e}{1-p_e} = 1 - \frac{1-p_o}{1-p_e}\]
where \(p_o\) is the proportion of observed agreement between raters (same as accuracy, defined as the number of agreed items divided by the total number of items), and
\[p_e= \frac{1}{N^2}\sum_k n_{k1}n_{k2}\]
where \(N\) is the number of items, \(n_{ki}\) is the number of times rater \(i\) assigned category \(k\), and the sum runs over all categories \(k\).
Cohen’s original 1960 article also defines 𝜅 in terms of frequencies of observed agreement, \(f_o\), and agreement expected by chance, \(f_c\):
\[\kappa = \frac{f_o-f_c}{N-f_c}\]
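To see that this frequency form matches the proportion form above, substitute \(f_o = N p_o\) and \(f_c = N p_e\) and divide numerator and denominator by \(N\):

\[\kappa = \frac{f_o - f_c}{N - f_c} = \frac{N p_o - N p_e}{N - N p_e} = \frac{p_o - p_e}{1 - p_e}\]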
Why should you use this measure of interrater reliability? The problem is that people will agree to some extent by chance. You should try to account for chance agreement in a measure of agreement.
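As a worked example (with hypothetical counts, not data from these notes): suppose two raters label 20 items, agree on 14 of them, and rater 1 uses category 0 ten times while rater 2 uses it twelve times.

```r
# Hypothetical counts, for illustration only.
N <- 20
p_o <- 14 / N                      # observed agreement = 0.7
# n_ki: number of times rater i used category k
n_01 <- 10; n_02 <- 12             # category 0
n_11 <- 10; n_12 <- 8              # category 1
p_e <- (n_01 * n_02 + n_11 * n_12) / N^2   # chance agreement = 0.5
(p_o - p_e) / (1 - p_e)            # kappa = 0.4
```

Note how 70% raw agreement shrinks to 𝜅 = 0.4 once the agreement expected by chance is removed.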
Suppose you have two raters and twenty items. Each item can be rated as 0 or 1. You can simulate this easily with random binomial draws as follows.
```r
# Simulate two raters independently assigning 0 or 1 to each of twenty items.
theta <- 0.5                # probability that a rater assigns a 1
N <- 20                     # number of items
rater1 <- rbinom(n = N, size = 1, prob = theta)
rater2 <- rbinom(n = N, size = 1, prob = theta)
twentyitems <- cbind(rater1, rater2)   # one row per item, one column per rater
twentyitems
```
Printing `twentyitems` shows a 20 × 2 matrix of 0/1 ratings, one row per item. Each run will differ because of the random number generation.
Passing the matrix to `agree()` reports the simple percentage agreement between the two raters. The `tolerance = 0` argument says that you don’t allow similar scores to be interpreted as the same. For example, suppose that instead of 0 or 1 the raters could choose any integer from 0 to 100. You might want the difference between 50 and 52 to be interpreted differently than the difference between 10 and 90; you might even say that the raters agree if their scores are 50 and 52. The `tolerance` argument allows you to tune for this.
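A minimal sketch of the call, assuming `agree()` comes from the irr package (which matches the output format shown in these notes; the original call itself is not shown):

```r
library(irr)   # assumed source of agree(); install.packages("irr") if needed
theta <- 0.5
N <- 20
twentyitems <- cbind(rater1 = rbinom(N, 1, theta),
                     rater2 = rbinom(N, 1, theta))
agree(twentyitems, tolerance = 0)   # percentage of items rated identically
```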
Now calculate Cohen’s 𝜅 to adjust for the possibility of chance agreement.
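Presumably the call is `kappa2()` from the irr package (these notes refer to `kappa2()` later; the call itself is not shown here):

```r
library(irr)          # assumed; kappa2() computes Cohen's kappa for two raters
kappa2(twentyitems)   # twentyitems as simulated above
```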
```
Cohen's Kappa for 2 Raters (Weights: unweighted)
Subjects = 20
Raters = 2
Kappa = 0.0625
z = 0.294
p-value = 0.769
```
Now suppose you have five piles of cards in a closed card sort. You can simulate this with random integers uniformly distributed from 1 to 5.
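The simulation code is not shown in these notes; a sketch consistent with the outputs below (5 subjects, 4 raters, piles numbered 1 to 5) might look like this, where the matrix shape is an assumption:

```r
# Sketch (assumption): 4 raters each assign 5 cards to piles numbered 1-5.
fiveitems <- matrix(sample(1:5, size = 5 * 4, replace = TRUE),
                    nrow = 5, ncol = 4)   # rows = cards, columns = raters
fiveitems
```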
Printing `fiveitems` shows the resulting matrix of pile assignments. Note that your results will differ with the same code because of random number generation.
You can’t use the exact same function as with two raters, but there are a couple of other 𝜅 functions for multiple raters.
The output for `agree(fiveitems)` is
```
Percentage agreement (Tolerance=0)
Subjects = 5
Raters = 4
%-agree = 0
```
The output for `kappam.fleiss(fiveitems)` is
```
Fleiss' Kappa for m Raters
Subjects = 5
Raters = 4
Kappa = -0.0965
z = -0.975
p-value = 0.329
```
The output for `kappam.light(fiveitems)` is
```
Light's Kappa for m Raters
Subjects = 5
Raters = 4
Kappa = -0.0286
z = NaN
p-value = NaN
Warning message:
In sqrt(varkappa) : NaNs produced
```
The following concepts come together to give us tools to build information containers: card sort, monitoring navigation, monitoring social networks, and flexibility of information representation. Let us briefly review them.
We discussed card sort as a means of understanding the labels people use to describe things of interest. We considered the issue of cognitive dissonance in the labeling of information containers and card sorting as a means to overcome it. We can connect the concepts of information hiding and labeling to see how labeling helps to limit information overload.
All commercial interests have recognized the significance of social networks and have devised ways to exploit social networks to influence navigation. Many navigational features in common use today are the result of specialists in a new field called network science drawing together research in many fields to understand human behavior and influence in networks. They use terms like betweenness centrality and closeness centrality. Major figures in the field include M.E.J. Newman, Stanley Wasserman, Albert-László Barabási, Duncan Watts, and Lada Adamic.
We have extensively discussed how and whether information is structured, using as a principle the degree to which human intervention is required to process information. We have discussed hierarchical and relational ways of organizing and storing information.
We have touched on the notion that information structures are more or less amenable to change. Brittle structures may be symptomatic of technical shortcomings or may be symptoms of authoritarian governance. We discussed whether the speed with which we can modify an information artifact matters in a given context. We saw that we may put together an information artifact with little planning if we expect to take advantage of user behavior to improve it. But if we cannot or will not change an information artifact after publishing it, we cannot realize the value of understanding navigational behavior.
We discussed several elements that information architecture authors have referred to as information design patterns. To determine whether these elements deserve the label of design pattern, we must examine the coinage and past use of the term.
The term design pattern is popularly used in many ways. Popular usage tends to abbreviate the original usage and may lose some of its essence. Following are a few borderline popular uses of the term that barely work.
Design patterns originated as an architectural concept with Alexander (1977). Alexander examined architecture from the standpoint of its value to a community of people in daily life. Alexander’s ideas were largely ignored or rejected by architects but soon gained a cult following among computer scientists. Eventually his books became so popular outside architecture that they began to influence architecture.
The term was coined by Christopher Alexander and popularized by his book A Pattern Language. Alexander describes a pattern language as a structured method of describing good design practices within a field of expertise. A Pattern Language was followed by another book intended to explain the first book, and Alexander has continued to try to explain the concept to this day.
The Syntax describes where the solution fits into the larger design. The Grammar describes how the solution solves the problem. For example, “Balconies and porches which are less than 6 feet deep are hardly ever used.”
The problem is that the process of waiting has inherent conflicts in it. The solution: In places where people end up waiting (for a bus, for an appointment, for a plane), create a situation which makes the waiting positive.
The problem is that cooking is uncomfortable if the kitchen counter is too short and also if it is too long. Solution: To strike the balance between the kitchen which is too small, and the kitchen which is too spread out, place the stove, sink, and food storage and counter in such a way that:
The Gang of Four (commonly abbreviated GoF) were among computer scientists seeking a basis to make code less arcane, more scientific and, above all, reusable.
One aspect of Alexander’s description was so general that it seemed applicable to any field in which design plays a role. This key aspect was the notion of a quality that could not be named but that could be understood through experience—the quality shared by successful designs. Specific and non-obvious combinations of characteristics could support this quality.
Gamma et al. (1994) exploded on the software scene and propelled Alexander to greater fame at the same time as solidifying Object Orientation’s place in mainstream software development.
The GoF argue that great writers use patterns; for example, all of Shakespeare’s plays were based on earlier, less successful plays or stories. The GoF refer to the tragically flawed hero, or boy-meets-girl, boy-loses-girl, as patterns with infinite variety. The GoF book serves two purposes: to tell what patterns are and to catalog 23 well-known patterns.
A design pattern is a description of communicating objects and classes customized to solve a general design problem in a particular context. (from the introduction to Design Patterns, 1994)
The pattern name must be good enough to become part of the design vocabulary. The pattern must be useful in conversation, documentation, and thinking. The GoF spent a lot of their time on the names of the 23 patterns in the catalog.
The problem element can be of several kinds. The first kind includes basic design problems such as algorithm design. Another kind includes commonly occurring classes or object structures known to be problematic. A third kind includes lists of conditions that, if they occur together, create a generic problem.
It is not a solution in a packaged sense. A solution is abstract, not implementation specific. A solution is a description of the elements of the solution (objects and classes). The description must identify those elements, their relationships, their responsibilities, and their collaborations.
The application of a pattern may resolve conflicts of various kinds, most often conflicts of space and time. To contemplate the use of a design pattern is to evaluate the design decision with awareness of the consequences. Unlike the solution, the consequences may address implementation issues. If you feel tempted to talk about implementation, do so under the consequences banner rather than under the solution banner. Keep the solution a description, not an evaluation.
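To make the four elements concrete, here is a small illustrative sketch of the Strategy pattern (my example, not one from the GoF catalog, and in R rather than the GoF’s C++ and Smalltalk). The name is Strategy; the problem is varying an algorithm independently of its client; the solution describes a context holding an interchangeable strategy; a consequence is an extra level of indirection.

```r
# Strategy sketch: the pattern's "solution" is the description of the roles,
# not this particular implementation.
make_summarizer <- function(strategy) {   # context: holds a strategy
  function(x) strategy(x)                 # delegates the work to it
}
by_mean   <- make_summarizer(mean)        # concrete strategies share one interface:
by_median <- make_summarizer(median)      # numeric vector -> number
by_mean(c(1, 2, 3, 10))     # 4
by_median(c(1, 2, 3, 10))   # 2.5
```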
A pioneer of HCI, Gary Olson, once told me that his main interdisciplinary frustration was that practitioners in other fields often wanted him to sprinkle magic fairy dust on their products. As HCI practitioners, you must avoid the complementary trap: don’t mistake the work of Alexander and the GoF as magic fairy dust that can be sprinkled on your information architecture.
END
This slideshow was produced using quarto
Fonts are League Gothic and Lato
Comments on output
When I run these, they give slightly different results. I’m not sure how much that matters, but my guess is not much. You should report which function you used. One nice thing about Fleiss’s 𝜅 is that it allows missing values: if only some raters have a score for an item, you can leave the rest of that row blank.
Light’s 𝜅 is briefly described in Hallgren (2012). That tutorial gives R code for several versions of 𝜅, including Cohen’s weighted 𝜅 (the `kappa2()` function described above), and both Fleiss’s and Light’s 𝜅.