AI and Leadership
PAS influences your Trust in Technology
Disturbing stories of humans over-relying on technology with fatal consequences are, as it turns out, not isolated cases. In Death Valley National Park, there is even a term for it: death by GPS. Park rangers witness death by GPS rather frequently. The GPS gives weird directions (often technically correct, e.g., the shortest path goes over a mountaintop or through a river), people follow the directions unquestioningly, get lost, and die in the hostile conditions of the national park.
Over-reliance on technology is a faulty human-automation interaction. It has also appeared in other areas where automation is supposed to increase safety and performance, e.g., aviation, health care, or supervising nuclear power plants (Merritt et al., 2015). On the other side of the faulty human-automation interaction spectrum is under-reliance: the decision not to rely on, e.g., an automation aid in favor of manual human control. Both expressions of reliance are the result of a miscalibrated level of trust, and both have led to fatal incidents. Therefore, the concept of trust in automation has been the focus of substantial research over the past decades (Hoff & Bashir, 2015, p. 3).
As explained in the article AI & Trust – Stop asking how to increase trust in AI, trust is a complex and dynamic variable, and it is likely to mediate reliance on automation (including other technologies). The level of trust influences how operators rely on and use technology. What factors influence the individual level of trust? Several studies investigated individual and cultural differences in trust behavior and attitudes. Individual differences can be personality traits, such as extraversion or openness to new experiences.
Furthermore, propensity to trust machines, best described as a general tendency to trust a machine, has received much attention in research (Chien et al., 2014; Merritt & Ilgen, 2008). One emerging construct, which is neither a personality trait nor a cultural factor and which may shed more light on how to avoid trust calibration errors, is the PAS, the Perfect Automation Schema (Merritt et al., 2015; Lyons & Guznov, 2018). The PAS describes cognitive beliefs about the performance of a system. It has gained attention more recently, and its theoretical foundations and relationship to trust are the focus of the present article.
To contextualize: The majority of the current state of knowledge of the human-machine relationship of trust draws on the research context of automation. It is likely that these hypotheses also apply to trust in artificial intelligence, but empirical evidence is scarce. For this reason, this paper will only talk about automation. In many research papers, the term automation has been used to describe a technology that can search and analyze information, make recommendations or make decisions, although the degree of human control varies (based on Parasuraman et al., 2000 in Lyons & Guznov, 2018). How and to what extent this coincides with the unclearly defined term and fields of application of artificial intelligence is the subject of future research.
Various notions of PAS as an antecedent of trust have been mentioned and explored within different domains, especially in Human-Robot Interaction (Hancock et al., 2011) and Human-Automation Interaction (Hoff & Bashir, 2015). Recently, Merritt et al. (2015) developed and tested a self-report measure assessing its two main factors. Before defining PAS, a short overview of interpersonal trust (human to human) and how it translates to trust in automation will be given (for more in-depth discussions, see previous reports on trust/trust in AI in our archive).
A vast amount of literature from different disciplines has shaped the conversation around trust. Trust has been framed, for example, as a stable property (like a characteristic, e.g., that one generally trusts people easily) or as a dynamic process, differentiating between a trust formation phase and the repair of trust after a breach has occurred. The brief explanation below touches only on those factors that best help to understand how interpersonal trust can, in part, be translated into trust in automation.
Many have described human-to-human trust as a sort of adhesive (Hoff & Bashir, 2015) or social glue that holds humanity together. Through trust, humans can collaborate with other humans to increase productivity and efficiency, or to reach other personal goals, like building families, working on a school project together, or creating companies. Trust in these scenarios is seen as the attitude of one human willing to be vulnerable towards another human with regard to a specific topic.
What all these scenarios have in common is that there is an uncertainty factor and a target X. In varying degrees, something is unknown about the trustee: Will my partner be loyal? Will my colleague deliver on time, so that the school project or company succeeds? In these scenarios, trust resembles a coping mechanism for dealing with the unknown. Trust helps us to move on shaky ground and not to go crazy with fear. The most important terms that characterize interpersonal trust as an attitude are:
Trust is an attitude that an agent will help achieve an individual’s goal in a situation characterized by uncertainty (Lee & See, 2004, p. 54)
How does a trustor decide to be vulnerable, risk failure, and go ahead? According to Mayer et al. (1995), this decision is based on three cues, mostly observational, interpreted by the trustor (with many intricacies to be aware of):
In short, to trust another person, humans make judgments of the trustworthiness (a set of properties) of the respective person, based on various information and experiences from the past and present moment.
Humans also judge the trustworthiness of a machine/automation, on properties that are quite similar to the properties of interpersonal trust mentioned above. It is likely that these hypotheses also hold for trust in artificial intelligence, yet empirical evidence is scarce. An ample body of research has used the term automation for technology capable of seeking and analyzing information, recommending or making decisions, with varying degrees of human oversight. Human-automation trust depends on (Lee & See, 2004):
Studies suggest that human-automation trust shares some important features with human-to-human trust but differs in various details (see Lee & See, 2004, and Hoff & Bashir, 2015, for a comprehensive overview of the differences between interpersonal trust and human-automation trust). Among many differences, one relevant to the present article lies in the initial trust formation process, where not much knowledge or experience is available. According to Hoff & Bashir (2015), people tend to be somewhat cautious in their openness to strangers. In contrast, researchers have found a kind of positivity bias towards machines, which is mainly based on performance. Some people tend to believe that machines work 100% correctly (Lyons & Guznov, 2018).
In our ongoing A-IQ research project, we have also observed that some people tend to overestimate the capabilities of digital assistants, such as Siri or Alexa. We hypothesize that these exaggerated expectations may help explain the trust-usage relationship, which we aim to explore. High expectations combined with the all-or-none belief are the two pillars of PAS, which will be explained below based on the works of Merritt et al. (2015), Lyons & Guznov (2018), and Dzindolet et al. (2003).
Understanding […] PAS and associated beliefs is particularly important because it is thought to be one of the key constructs differentiating trust in automation from trust among humans (Madhavan & Wiegmann, 2007, in Merritt et al., 2015, p. 740).
The Perfect Automation Schema can best be understood as a cognitive schema focusing on the performance (one of the trustworthiness properties mentioned above) of an automated system. Schemas are cognitive structures that help organize and interpret information, classify various stimuli, and anticipate how something might behave in the future. Trying to predict future actions is essential when humans decide on their willingness to be vulnerable to others. In other words, these expectations may affect how one relies on another human or machine (Merritt et al., 2015; Lyons & Guznov, 2018; Dzindolet et al., 2003).
The PAS encompasses two factors: High performance expectations and all-or-none beliefs.
The high performance expectations dimension is a mindset that automated systems are nearly perfect. People with high performance expectations of a system are less likely to question it, which may lead to overtrust and, thus, over-reliance, which can end catastrophically. Lyons & Guznov (2018) further argue that this near-perfect belief induces automation bias, a tendency to favor automated decision systems.
Because machines cannot get tired or stressed, attributions for their errors are rather stable, in contrast to human relationships: humans tend to be more forgiving towards other humans. This reluctance to forgive a machine is assumed to be closely related to the second pillar of the PAS: all-or-none thinking with regard to automation performance. It means that people tend to think that automation either works perfectly or not at all. This all-or-none thinking could be one of the critical elements distinguishing trust in automation from trust in humans (Dzindolet et al., 2003).
The PAS is theorized as a model that includes both high expectations and all-or-none thinking. However, the empirical evidence is not consistent, and the two dimensions may be distinct. More research is necessary in this space, especially given how frequently notions of PAS have been mentioned in the trust-in-automation literature (Merritt et al., 2015). The evidence concerning trust is also not entirely clear. PAS seems to play a significant role when examining the effects on trust before and after experiencing an error, and it may vary with the type of automation (for types of automation, see Parasuraman & Riley, 1997, as cited in Lyons & Guznov, 2018).
As the PAS is to be understood as a concept for interpreting information, it is of specific relevance when machine performance violates these expectations. Several psychological processes then arise to overcome this cognitive dissonance and find an explanation for why the machine did not do what it was supposed to do (here, the notion of transparency comes into play). Faulty human-automation interactions have (at least) two consequences: disuse and misuse. Let us look at an example: If I have a strong PAS, I expect the aid to perform perfectly and rely on it without questioning it (misuse); once it makes its first error, my all-or-none thinking tells me the aid is useless, and I abandon it altogether (disuse).
Studies suggest that (1) all-or-none thinking is related to trust after experiencing an error and (2) high expectations are related to trust before experiencing an error (Lyons & Guznov, 2018; Merritt et al., 2015). Understanding how these two concepts, or the PAS as a whole, work should shed more light on the calibration of trust and, thus, on how users rely on automation. Next to broadening theoretical knowledge, the goal is also to understand how correct usage/reliance can be fostered, for example, through user design or adapted training interventions.
Merritt et al. (2015) were the first to operationalize PAS and developed a self-report measure assessing the two proposed factors. To measure PAS, researchers have come up with various creative experimental settings. How can researchers and practitioners design such an experiment?
A research setting is often the combination of a task, an automated aid, and a participant. Various questionnaires have to be filled out before, during, or after the task. In the study by Merritt et al. (2015), participants completed a 30-trial X-ray screening task, where the X-ray slides resembled the screens at airport security (used to screen hand luggage). The participants’ task was to decide whether a slide contained a weapon or not; an aid would give recommendations as well. There are many different settings. For example, experimenters manipulate reliability, so that the aid functions correctly in 100, 80, or 40 percent of trials. Another variation is whether participants receive training or performance feedback. The performance of the automation aid can also be ambiguous. Errors can be subtle, but also very obvious: for example, the automation aid classifies an image as safe even though it shows a massive (highly visible) machine gun.
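Such a reliability manipulation can be sketched in a few lines of code. The following Python snippet is our own hypothetical illustration, not material from the cited studies: it generates trials in which an aid's recommendation matches the ground truth with a fixed probability.

```python
import random

def run_screening_trials(n_trials=30, aid_reliability=0.8, seed=1):
    """Simulate a screening task in which an automated aid's
    recommendation is correct with a fixed probability (the
    experimentally manipulated reliability level)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        truth = rng.choice(["weapon", "safe"])  # what the slide actually contains
        if rng.random() < aid_reliability:      # aid agrees with the ground truth
            aid = truth
        else:                                   # aid errs on this trial
            aid = "safe" if truth == "weapon" else "weapon"
        trials.append({"truth": truth, "aid": aid})
    return trials

trials = run_screening_trials(aid_reliability=0.8)
observed_accuracy = sum(t["truth"] == t["aid"] for t in trials) / len(trials)
```

Setting aid_reliability to 1.0 yields a perfectly reliable aid, while 0.4 reproduces the unreliable-aid condition described above.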
In a study by Lyons & Guznov (2018), participants engaged in a video-game-like task. They watched a video of an unmanned ground vehicle displaying its route, along which the viewer could see houses, people, and more. An automated aid screened the people in the video for their potential to be insurgents, and participants had to agree or disagree with the aid’s decision as the vehicle moved closer to the dangerous-looking person. The setting can also be manipulated, e.g., by varying performance reliability, the number of insurgents, or training.
Many studies use a higher-level type of automation, one that gives a recommendation or performs an action (with information acquisition and analysis as lower-level automation, a taxonomy proposed by Parasuraman et al., 2000, as cited in Lyons & Guznov, 2018).
The final Perfect Automation Schema scale by Merritt et al. (2015) includes four items for the high expectations dimension and three items for the all-or-none belief, rated on a 5-point Likert scale from strongly disagree to strongly agree. All in all, participants of a study have to answer seven questions.
Participants with a strong PAS would give high ratings (strongly agree) to items like “Automated systems have 100% perfect performance” (high expectations dimension) or “If an automated system makes a mistake, then it is completely useless” (all-or-none belief).
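Scoring such a scale typically amounts to averaging the items of each subscale. The sketch below is our own illustration (the function name and the exact item ordering are assumptions; only the item counts match those reported above):

```python
def score_pas(responses):
    """Score a 7-item PAS questionnaire on a 5-point Likert scale
    (1 = strongly disagree ... 5 = strongly agree). Here the first
    four items form the high expectations subscale, the last three
    the all-or-none beliefs subscale; each score is the item mean."""
    if len(responses) != 7 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expected seven Likert responses between 1 and 5")
    return {
        "high_expectations": sum(responses[:4]) / 4,
        "all_or_none": sum(responses[4:]) / 3,
    }

# A participant who strongly agrees with every item holds a strong PAS:
score_pas([5, 5, 5, 5, 5, 5, 5])  # both subscale means equal 5.0
```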
The results of the questionnaire are then correlated with a measurement of trust and with reliance behavior (with regard to the outcome of the task participants had to deliver). For example, Merritt et al. found that all-or-none thinking was negatively related to trust after encountering an error (or many errors), which was in line with the theoretical hypothesis: trust decreases after experiencing an error, and so does reliance. Contrary to their expectations, the level of trust did not correlate with the belief that machines work perfectly before the occurrence of an error. This hypothesis, however, was tested by Lyons & Guznov (2018), who found a positive correlation between high expectations and trust. In other words, participants with high expectations had a significantly higher level of trust before experiencing an error. The inconsistency of the results calls for further research using the PAS scale while controlling for other related constructs.
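The analysis behind such findings is a standard correlation between subscale scores and trust ratings. A minimal sketch, using fabricated illustrative numbers rather than data from the cited studies:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Illustrative values only: stronger all-or-none beliefs paired with
# lower trust ratings after an error produce a negative correlation.
all_or_none_scores = [1.3, 2.0, 2.7, 3.3, 4.0, 4.7]
post_error_trust = [4.5, 4.0, 3.6, 2.9, 2.4, 1.8]
r = pearson_r(all_or_none_scores, post_error_trust)  # close to -1
```

In practice, researchers would of course use a statistics package and report significance alongside the coefficient.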
It is essential to understand that PAS is conceptually different from automation bias and propensity to trust, and at the same time, shares similar empirical patterns. The extent to which the PAS is responsible for the variance of trust in contrast to other constructs still needs to be investigated. A recommended approach is to control for critical related constructs and attitudes. It may also be worthwhile to investigate the role of anthropomorphism, which ultimately questions the legitimacy of the concept of trust in human-machine interaction (see the work of Prof. Joanna Bryson).
In the context of Human-Automation-Trust research, the following variables are important:
Further, it would be interesting to see whether PAS and trust/reliance behaviors change in group settings. How do the trust/reliance behaviors of others, from peers to supervisors, experts, or novices, influence one person? Is there such a thing as trust groupthink, for example? Studies often focus on a one-human-to-one-automation/robot setting. Group settings are quite rare but presumably more realistic (Lyons & Guznov, 2018).
Apart from the fact that this list is not comprehensive, one further thought is very relevant in the experimental setting: What kind of reward do participants receive for the task? The reward or consequence may affect reliance behavior as well. For example, in Dzindolet et al.’s study (2003), students had to decide whether to rely on their own performance or on that of an automated aid (a detection task) in order to win coupons for the cafeteria. Lyons & Guznov (2018) chose quite a different scenario in their study: F-16 Air Force pilots had to rely on the Automatic Ground Collision Avoidance System, an actual automated safety system. Their reliance decision had implications for the life and death of the pilot.
Practical implications so far concern design and training. For example, next to transparency design features (why did an automated aid fail?), it should be stated whether an error is stable or transient when it comes to mitigating all-or-none beliefs. Merritt et al. (2019) propose that individuals with low all-or-none beliefs should receive different training than those with high ones, in order to train for an adequate level of trust; they even suggest that recruiters could use the PAS scale for job selection and placement.
The PAS has been tested and validated in the context of automation and automated aids, in real, realistic, and fake settings. Do the findings translate to artificial intelligence? This question is worthwhile to explore. As we believe in the potential of the PAS, we want to use it in our A-IQ research to better understand the role of trust when using or adopting conversational AI, such as Siri or Alexa. The A-IQ project (Artificial Intelligence Quotient) started in 2017 as an indicator of performance from a behavioral perspective (end to end). Performance is said to be a significant influence on trust-building from a user perspective. The hypothesis that trust is crucial for adoption and usage found support in our Titanium Trust Report, where participants (n=111) who used conversational AI had a significantly higher level of trust (unpublished report).

We want to understand the relationship between trust, expected performance versus measured performance (A-IQ), and usage. We observed that people tend to overestimate the skills of digital assistants, even people with a strong IT background (the high expectation dimension of PAS). Furthermore, many reported that they stopped using voice assistants after experiencing a stupid answer (the all-or-none belief of PAS). A calibrated use of conversational AI may not be a life-saving decision, but it could still yield essential findings when it comes to the appropriate evaluation of artificial intelligence, AI-powered systems, or a mix of various technologies, such as the Google Assistant and Google Maps.
The Perfect Automation Schema is a promising model of an influence on trust. Trust mediates reliance, and a faulty trust-automation interaction can have consequences of varying severity. Consider this: when participants in the studies above made an error, it led to not winning a 50-cent voucher for the cafeteria. When human operators in real life made an error, like gaming while driving a semi-automated Tesla or driving up a mountaintop because the GPS said so, it led to people dying. There is this idea that automation (including AI) can free us from dull, tedious, or even dangerous tasks; it is built to enhance productivity, safety, and efficiency by reducing human error. However, by introducing and implementing automation and AI to factor out human errors, we have introduced new human errors.