2024-02-08
Week FOUR
From my perspective, Fitts’s Law brings attention to the profound impact that the size and placement of interactive elements can have on our efficiency and comfort in digital interactions. The law’s simplicity belies its practical importance – larger and closer targets make tasks quicker and more accurate. Embracing Fitts’s Law is, in my opinion, a key aspect of creating user interfaces that feel intuitive and responsive.
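For reference, a common (Shannon) formulation of Fitts’s Law relates movement time $MT$ to the distance $D$ to a target and the target’s width $W$:

$$
MT = a + b \,\log_2\!\left(\frac{D}{W} + 1\right)
$$

where $a$ and $b$ are empirically fitted constants; closer (smaller $D$) and larger (bigger $W$) targets lower the predicted movement time, which is the intuition in the reflection above.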
As someone who will be presenting regularly in the near future, I found it important and insightful to learn that it takes 45 seconds for people to formulate an answer to a question that has been posed to them. I will keep this in mind the next time I ask questions, so as not to rush the thought process.
I think Norman’s point about how ineffective the GDP statistic is, and how it lacks relevance to individual lives, is important. While the economy is one of the foundations of the world, a single number cannot provide a consistent picture of reality, or even ensure its truthfulness.
I like the different ideas they propose for measuring the state of social life and human happiness, which are more humane. And I think it’s hard for people to recognize where they fit in, or how they will be affected, either way.
I strongly agree that social media applications like Instagram play a big role in influencing people’s cultural or ethical beliefs. For unopinionated or gullible people, this is particularly dangerous as these applications are designed to display posts/reels that reinforce an idea based on the user’s activity. This is a very concerning issue in my opinion as their exposure to the outside world is completely based on one perspective or thought.
During our conversation about measurement, we delved into how risk management evaluates individuals differently depending on their status in the societal hierarchy. Since I work in risk management, this conversation was really interesting, because how a given risk is rated depends heavily on revenue. I have seen countless times where enough revenue can “bend” or “lower” the risk rating, but I have also seen times where no amount of revenue would even out the potential risk a company might endure. I wish we had discussed measurement more openly; I would like to hear my peers’ thoughts on it!
I learned about Hick’s Law and the ways that people have tried to quantify more nebulous aspects of cognition, such as decision making, through formulaic approaches.
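For reference, the usual statement of Hick’s Law (the Hick–Hyman Law) models choice reaction time $T$ for $n$ equally likely alternatives as:

$$
T = a + b \,\log_2(n + 1)
$$

where $a$ and $b$ are empirically fitted constants; decision time grows logarithmically, not linearly, with the number of choices.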
the importance of simplifying complex information, like climate change, for public understanding without compromising scientific integrity.
inadequacy of GDP as the sole measure of societal progress and well-being, advocating for alternative metrics that encompass a broader range of factors, including well-being, equity, environmental sustainability, and more.
From my notes: “Measurements are facts, but are not complete without the story. Stories provide meaning, and are easy to remember. It does not matter how complex something is; if it is meaningful and easy to understand, people will call it simple. Complexity is a fact of the world, and simplicity is a state of mind.”
This is especially true when science demands that we make sacrifices for the greater good. R22, for example, long used as a refrigerant, is both an ozone-depleting gas and a greenhouse gas. The industry switched to R134 because it is not an ozone-depleting gas and has less of an impact on global warming. The switch was actually motivated by profits, as R134 was cheaper.
When it comes to global warming in this age, going green has a cost, whether monetary or time. I agree with Norman: telling people that we are crossing the 1.5-degree or 2.0-degree threshold has no meaning to the common person. If we want to make progress on global warming, we need to make action easy to execute and easy to understand, and we need to make sure people understand why it is important. An example was the woman in New York who painted lines showing where the sea level would rise.
The definition of affordance was something I paid extra attention to, as I tend to mix these terms up myself. I learned how much it differs from a signifier and that affordance has a lot to do with the capabilities and actions of an object, which, to be honest, isn’t as complicated as I had initially thought.
Inspired by Fitts’s Law: how can one pull a user’s attention to specific important features in a design with varying sizes and colors, without overshadowing the others, to optimize the experience?
Regarding the ethical questions about an AI-built portfolio website: before AI tools appeared, people would look for good website examples from others and imitate the layout or structure to some degree. To my understanding, AI now does this for us, replacing the brain work and making the process easier with better results. So for me the answer is vague, because the way we worked before AI was not technically that ‘ethical’ either.
I missed some steps in the Figma prototyping session. Will try again at home referring to YouTube videos
What are some examples where high-level cognition can trigger low-level emotion, and vice versa?
I was curious about how to better communicate big ideas in easy-to-understand terms. How do we bridge the gap between daunting concepts and making change around us that connects with people?
I actually prefer skeuomorphism to flat design; I find the old designs easier to read and recognize when I am quickly glancing at my applications. Comparing the old Google logos to the new ones, for example, the new logos require me to pay attention to what I am clicking. With this in mind, why does the industry keep moving toward minimalist design?
How are you using gen AI in user research? How could you? (Both senses!)
If not, how are you using gen AI right now?
Where do you stand on the audio recording debate?
What is the best way to do competitive analysis? What are the pitfalls?
What shorthand symbols do you use in note-taking? What else could you use?
Is there any downside to developing empathy with the user during initial design?
Is our data collection ever free from bias? Is it ever really complete?
⟨ pause for Steven’s design critique ⟩
Article by Joyojeet Pal is in Canvas > resources > Pal2017.pdf
⟨ pause for Joyojeet Pal video ⟩
For example, see Olson and Kellogg (2014) for a different list of ways HCI professionals and scholars study the interactions of people and computers. The above list is based on Lazar, Feng, and Hochheiser (2017).
Wikipedia tells us that an experiment is a procedure used to support or refute a hypothesis, model, or theory.
For example, suppose we hypothesize that programmers can complete a specific task faster using one IDE (integrated development environment) than in another IDE.
You can imagine an experiment to support or refute this hypothesis.
It is a hallmark of experiments that the experimenter tries to control as many variables as possible. For the IDE experiment, what could you control? Task lighting, machine and peripherals, chair, desk, and ambient noise all come to mind as possible confounds.
Experiments focus on dependent and independent variables, also referred to as \(y\) and \(x\), response and treatment, regressand and regressor, predicted and predictor, unknown and known, output and input.
Experiments randomly allocate subjects to treatments.
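As a concrete illustration, here is a minimal sketch of random allocation for the hypothetical IDE experiment above; the participant IDs, group size, and IDE names are made up. The independent variable is the IDE; the dependent variable would be the measured task completion time.

```python
# Minimal sketch: randomly allocate subjects to two treatments (hypothetical data).
import random

participants = [f"P{i:02d}" for i in range(1, 21)]   # 20 hypothetical programmers
random.shuffle(participants)                          # random allocation

half = len(participants) // 2
allocation = {"IDE A": participants[:half], "IDE B": participants[half:]}

for ide, group in allocation.items():
    print(ide, group)
```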
Cook and Campbell wrote an interesting book called Quasi-Experimentation about possibilities for observing nature that resemble experimentation, except that the researcher can’t control the variables and instead tries to collect as much information as possible so that all factors are accounted for.
Surveys allow you to gather information from many more people than would be practical with other methods. Their main drawback is that the researcher cannot react to the survey respondent. This raises a lot of problems.
You have to pilot surveys with a small group to gain confidence that the questions have construct validity. In other words, you want to gain confidence that the questions are asking what you think they are asking.
Many validated survey instruments exist, including some in HCI. Researchers try to validate instruments by asking many similar questions and phrasing some positively and others negatively to see how people answer them.
Some researchers ask people to keep a diary while using an application over a period of weeks or months. The key problem here is getting people to maintain a habit of diary entries.
Case studies are in-depth studies of specific instances within a specific real-life context, according to Lazar, Feng, and Hochheiser (2017). They may concern a single instance or a small set of instances.
Case studies typically have small sample sizes so sample selection is a key challenge.
Observing Sara is a well-known case study conducted by RIT faculty member Kristen Shinohara. She conducted 6 two-hour sessions in which Sara demonstrated her use of assistive technologies for the blind. The generated material included notes, audio recordings, interviewer reactions, and photographs. Analysis included task tables and detailed descriptions. The result of the study was a set of design guidelines.
According to Lazar, Feng, and Hochheiser (2017), the following goals apply to HCI case studies.
Interviewing is an open-ended, exploratory technique. It affords the flexibility to react to what the interviewee says as well as to the interviewee’s unspoken signals.
Interviewing requires practice to develop skill. You are unlikely to be a good interviewer in your first few, perhaps many, interviews.
Analysis of interviews is likewise challenging and time-consuming. One hour of interviewing may lead to ten hours of analysis, according to Lazar, Feng, and Hochheiser (2017).
Both interviews and surveys require the participant to remember something from the past, not an easy task.
Contextual inquiry is an interview-oriented technique that requires the interview to occur in the workplace during work. An important tenet of contextual inquiry is that the interviewer not examine her notes until an interpretation session, which is conducted under strict rules.
In the interpretation session, transcribe interview notes to affinity notes (post-it notes or a digital equivalent) and arrange them on an affinity diagram constructed in a bottom-up manner. Write affinity notes in the first person as the interviewee. After clustering, add labels at different hierarchical levels. The process is described in detail in Holtzblatt, Wendell, and Wood (2005).
Ethnography is a challenging combination of observation, interviews, and participation in a community. It originated with anthropologists and has been adopted in other fields, including sociology and HCI.
The ethnographer immerses herself in the environment of the people being studied. One famous ethnographer in information systems earned a medical doctor degree while studying radiologists intensively over a period of seven years. (HCI researchers rarely have the opportunity to conduct ethnographic research because of time and money limitations but, when they are able to, a remarkably deep understanding of user needs results.)
Online traffic is a rich source of data, for better or worse. Google Flu Trends exemplifies the former. Facebook’s Emotions study of 2014 may exemplify the latter. (Facebook altered the news feeds of half a million people to see if they could alter their moods.)
A/B testing, where visitors are served one of two versions of a web artifact, is the most common method for HCI professionals.
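A minimal sketch of how an A/B test result might be analyzed once conversions have been counted for each version; the counts below are invented, and the analysis shown is an ordinary two-proportion z-test, one common choice among several.

```python
# Minimal sketch: compare conversion rates of versions A and B (hypothetical counts).
from math import sqrt
from scipy.stats import norm

conversions_a, visitors_a = 120, 2400   # hypothetical version A results
conversions_b, visitors_b = 156, 2380   # hypothetical version B results

p_a = conversions_a / visitors_a
p_b = conversions_b / visitors_b
p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)

# Two-proportion z-test: is the difference larger than chance would explain?
se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
z = (p_b - p_a) / se
p_value = 2 * norm.sf(abs(z))           # two-sided p-value

print(f"A: {p_a:.3f}  B: {p_b:.3f}  z = {z:.2f}  p = {p_value:.3f}")
```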
Eye tracking was the most well-known form of biometric study until the advent of smart watches. Other types of physiological data used in HCI research include electrodermal activity, cardiovascular data, respiration, muscular and skeletal positioning, muscle tension, and brain activity.
I usually call this affinity diagramming but every online source I’ve found uses the terms interchangeably. Jacek calls it WAAD (work activity affinity diagramming).
Example of the relationship between a top level label and those below it
Affinity diagramming or mapping is similar to many other techniques. It’s a summarizing activity, originally developed to support contextual design or personas or scenarios, but eventually adapted to many other uses. The keys, in my view, are
Surveys are the easiest method.
Hence, surveys are the most abused method.
Lazar, Feng, and Hochheiser (2017) says they may be the most appropriate method for measuring attitudes, awareness, intent, feedback on user experiences, characteristics of users, and over-time comparisons.
Surveys may be less useful for precise measurements or standalone measures of usability.
Surveys can be construed as a method, including questions, sampling, reminders, and incentives.
Surveys are cheap and can have large sample sizes.
Surveys are easy for the IRB to approve.
The tools are readily available: paper, email, or survey websites.
Surveys are good for getting shallow data.
Surveys are hard to modify once sent out.
Surveys are subject to all kinds of biases: recall bias, bias about self-image.
Some cultures are predominantly oral rather than written.
Surveys are targeted at specific groups. How do you find the group?
A census tries to select everyone to participate but random sampling is more frequent because of the extraordinary expense of a census.
A sample frame refers to the target population from which a sample is drawn. The sample may be of institutions such as libraries rather than individuals.
Stratification occurs when you divide the population into subpopulations known as strata.
An example of strata would be classes (freshman, sophomore, etc.) among students. Suppose you wanted equal representation from each class (see the sketch below).
Another example given in Lazar, Feng, and Hochheiser (2017) is long-distance and local moves.
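A minimal sketch of the class-stratification example above, using made-up student records and an arbitrary per-stratum sample size.

```python
# Minimal sketch: stratified sampling with equal representation per class (hypothetical data).
import random
from collections import defaultdict

classes = ["freshman", "sophomore", "junior", "senior"]
students = [{"id": f"S{i:03d}", "class": classes[i % 4]} for i in range(1, 401)]

strata = defaultdict(list)
for s in students:
    strata[s["class"]].append(s)      # divide the population into strata

per_stratum = 25                      # equal representation from each class
sample = []
for class_name, members in strata.items():
    sample.extend(random.sample(members, per_stratum))

print(len(sample), "students sampled,", per_stratum, "per class")
```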
How large should the sample be? Lazar, Feng, and Hochheiser (2017) dodges this question, then says that 384 gives a ninety-five percent confidence level with a plus-or-minus five percent margin of error, and then says you should read another book.
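That 384 figure comes from the usual sample-size calculation for estimating a proportion, assuming maximum variability ($p = 0.5$):

$$
n = \frac{z^2\,p(1-p)}{e^2} = \frac{1.96^2 \times 0.5 \times 0.5}{0.05^2} \approx 384
$$

where $z = 1.96$ is the critical value for 95 percent confidence and $e = 0.05$ is the margin of error.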
Sampling error occurs because only a fraction of the sampling frame is surveyed; larger samples reduce it.
Coverage error occurs when members of the target group have an unequal chance of being selected for survey.
Measurement error occurs when questions are poorly worded or biased.
Nonresponse error occurs when the respondents differ systematically from the sampling frame.
HCI usually does not engage in probabilistic sampling except at the smallest level.
HCI usually uses techniques like self-selected surveys or snowballing (respondents recruit other respondents).
Nonprobabilistic sampling is unacceptable to some disciplines but common in HCI.
HCI researchers often use surveys in conjunction with other research methods.
The more you ask for, the more respondents you lose. I like to place demographic data near the end of the survey.
In the USA, the zip code is the most valuable piece of demographic data because household income averages are available via census data.
Lazar, Feng, and Hochheiser (2017) suggests that you get age, gender, education, job responsibility, and computer usage.
This occurs when the response is large in proportion to the target population size.
Also known as intercept sampling: you can survey, for instance, every hundredth page load.
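A minimal sketch of the every-hundredth-page-load idea; the function name and counter are hypothetical placeholders for whatever the site actually tracks.

```python
# Minimal sketch: intercept sampling by inviting every nth page load.
def should_show_survey(page_load_count: int, every_nth: int = 100) -> bool:
    return page_load_count % every_nth == 0

# e.g. the invitation appears on page loads 100, 200, 300, ...
print([n for n in range(1, 501) if should_show_survey(n)])
```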
This occurs when a link on a page invites every visitor to complete a survey. This can be considered invited or self-selected.
A self-selected survey may make the best starting point for investigating new user populations or new phenomena of usage. Social media may provide the best contact point for a self-selected survey.
Lazar, Feng, and Hochheiser (2017) asserts that many groups, such as sufferers of certain cognitive impairments, are completely unstudied in an HCI context. The authors have studied people with Down Syndrome and seem to think there are opportunities for similar studies.
Snowballing may work well for uninvestigated populations if they have community ties.
A set of questions is often called an instrument and one that has been tested using factor analysis is often called a validated instrument. This means that the survey has been tested to see if the questions are asking what you think they are asking.
One way to think of individual questions is whether they are open-ended or closed-ended.
Closed-ended questions may have ordinal or nominal responses.
Open-ended questions should ask about specific constructs: instead of asking whether people like the software, ask about the usability and usefulness of the software. Ask about the barriers faced.
Likert scales are the most typical ordinal closed-ended questions. One nice thing about computers is that users can’t mark the space between two numbers. Bear in mind that the middle option may be interpreted in two different ways. It is usual to phrase some Likert-type questions in reverse to see if people are just randomly using one column of answers. You may want to discard surveys showing strong evidence of random answering.
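A minimal sketch of reverse-coding negatively phrased Likert items and flagging straight-lined responses (the same answer column for every item); the item names, responses, and 1–5 scale are hypothetical.

```python
# Minimal sketch: reverse-code negatively phrased Likert items, flag straight-lining.
SCALE_MAX = 5
REVERSED_ITEMS = {"q2", "q4"}        # items deliberately phrased in reverse

response = {"q1": 4, "q2": 2, "q3": 5, "q4": 1, "q5": 4}   # one hypothetical respondent

coded = {
    item: (SCALE_MAX + 1 - value) if item in REVERSED_ITEMS else value
    for item, value in response.items()
}

straight_lined = len(set(response.values())) == 1   # identical answer to every item
print(coded, "straight-lined:", straight_lined)
```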
Nominal questions may elicit radio-button answers or choose as many as apply.
Beware of a question that asks two questions at once.
Negative wording often causes confusion.
Biased wording leads to biased responses.
Politically charged words lead to biased responses.
Abbreviations cause confusion and distraction.
Group similar questions together to take advantage of spreading activation.
Place demographic and sensitive questions near the end.
I personally like to end surveys with “Please share any additional thoughts at this time.”
Contingent questions need to be marked, either by indentation or arrows on a paper survey, or by a link on a web-based survey.
Some validated instruments are mentioned in Lazar, Feng, and Hochheiser (2017): CSUQ, ICTQ, PUTQ, QUIS, SUMI, WAMMI.
The website garyperlman.com/quest collects a bunch of these and others, along with a Perl script to administer them. Personally, I would manually type the questions into Qualtrics and use that. There is another survey site popular with HCI students (name escapes me).
Lazar, Feng, and Hochheiser (2017) urge you to consider paper surveys for underserved populations. I am skeptical. The only reason I can think of is to reach people at certain postal addresses or people you can meet in certain locations, such as homeless people congregating at a shelter.
An informed consent form may be needed as part of a paper-based survey but informed consent for online studies is alleged by Lazar, Feng, and Hochheiser (2017) to be controversial.
Lazar, Feng, and Hochheiser (2017) adopts a three-stage model from Dillman (2000) for pretesting or piloting.
A pilot study may include tests of internal reliability, such as asking the same question multiple times in different ways. A popular statistic for this kind of check is Cronbach’s alpha. A better statistic, according to the R documentation for alpha, is omega, also found in the psych package for R.
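For reference, the usual formula for Cronbach’s alpha over an instrument of $k$ items is

$$
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_{Y_i}}{\sigma^2_X}\right)
$$

where $\sigma^2_{Y_i}$ is the variance of item $i$ and $\sigma^2_X$ is the variance of the summed scores; values near 1 indicate that the items covary strongly.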
A pilot study may also include a factor analysis to try to reduce the number of questions by keeping only those in a group of similar questions with high factor loadings.
Prizes are often used as incentives to increase response because people typically overestimate the likelihood they will win the prize.
The instrument itself can include statements attesting to the value to the researchers and perhaps society at large of the respondent’s participation.
Dillman (2000) suggests five contacts: Precontact, the actual instrument, a thank you reminder, a replacement survey to non-respondents, and a final contact using a different mode.
Descriptive statistics can be generated for all closed-ended questions as well as for some open-ended questions.
Most open-ended questions are amenable to content analysis. Content analysis is typically used more by academic researchers than in industry, but it is becoming more popular, so I predict that methods like those of Braun and Clarke will gain popularity in industrial research in coming years, especially for the development of sophisticated software, such as medical software. Large language models like ChatGPT can’t easily replace content analysis by humans (yet) because of the inconsistent language humans use in discussing their work and play.
Readings last week include Johnson (2020): Ch 7–9, Norman (2013): Ch 2, 4
Readings this week include Hartson and Pyla (2019): Ch 7, 8
END
This slideshow was produced using Quarto
Fonts are League Gothic and Lato