Stanford Seminar - The State of Design Knowledge in Human-AI Interaction

Stanford Online

I lead the Intelligent Interactive Systems group at Harvard. I guess if I were starting today I might call it the Human-AI Interaction group, but this was 2009. I was very interested in what happens when human intelligence and machine intelligence meet, and that was the best phrase that came to my mind at the time. Before I dive into the content, I really want to acknowledge that I do this work with a number of fantastic collaborators. The newest one, Jana, is actually a graduate of Stanford, so you guys are influencing me.

To set the stage: I think human-AI interaction as a field (and I will use "intelligent interactive systems" and "human-AI interaction" interchangeably) is a very interesting extension of classical human-computer interaction. We strive to create systems that are useful and predictable and that give people a meaningful sense of control, just like classical HCI folks. But we face very interesting challenges, because the material that we work with, machine intelligence, is occasionally wrong. So how do you build systems from which people derive real value? These systems are complex. So how do you give people a sense of predictability, of knowing what the machine is going to do? And often we really want the intelligent systems to act proactively, and yet we want to give people a meaningful sense of control.
So these are fun challenges that go beyond prior knowledge in human-computer interaction. What's important is a bit of history. This is taken from Interactions magazine, from 1997. It is a debate that occurred in 1997 between Ben Shneiderman, whom I hope you guys know about, and Pattie Maes, who was one of the leading researchers on intelligent agents, that is, proactive machine learning systems that do things on behalf of users. And they battled. Notice that the title of the conversation is "direct manipulation versus interface agents": the presumption was that the two could not coexist. Ben argued that people need a strong sense of control and that intelligence will inherently and inevitably confuse people. Pattie was arguing, "But look at the benefits that we can deliver."

The two communities spoke somewhat different languages and had different objectives. If you went to the IUI conference, the Intelligent User Interfaces conference, you would find a lot of people with AI backgrounds building interactive systems and showing off the capabilities of those systems, but not necessarily worrying about usability or measuring people's satisfaction or performance. And then you had conversations happening at the HCI conferences, where people presumed, again without empirically testing it, that machine intelligence was inevitably too complex and unpredictable to result in useful systems. So for a very long time there was too little productive conversation; it really started around 2000.

So the first point that I want to make is that we are facing novel, very interesting, and difficult challenges. The second is that the field is relatively young. Despite our youth and the challenges that we have faced as a community, we have produced a bunch of useful knowledge, and I'll give you one example of it.

But something very interesting has also happened. Several years ago came the newest resurgence of AI, and suddenly AI is being sprinkled everywhere, in lots and lots of interactive systems. We are moving incredibly fast, putting AI in places where it never belonged and using it in ways that we've never used it before. We have to build stuff, and along the way, because the knowledge that exists in the field is fragmentary and incomplete, we are often turning very reasonable-sounding assumptions into the fundamental tenets of our field. We build on them, and we never have time to pause and verify whether these assumptions are actually sound. So initially I was going to say we need to slow down.
But I realized that one never says that in the heart of Silicon Valley, so instead I'm saying we need to accelerate the production of design knowledge to catch up with reality. That is the high-level vision for my talk.

Let me start with one area of human-AI interaction where we did produce some useful knowledge that actually got incorporated into real systems, and that's the area of adaptive user interfaces. Back when HCI was young and computing was young, the systems that we produced were simple, and we figured out how to build direct manipulation systems that people could comprehend very quickly, explore, and know what to do with. Then those systems started getting more and more complex, and suddenly things started breaking: it was becoming a little bit harder to use them well. Professor Joanna McGrenere characterized it pretty well and demonstrated that most people used only a tiny fraction, maybe up to 20%, of the features available in any piece of software. The challenge was that different people used different small fractions of the available features. So we couldn't just simplify the software, because all of the features got used by someone, but any one person would have benefited from a simpler design than whatever was shipped.

An interesting design got tried commercially by Microsoft: the smart menus. Notice what happens: the smart menus can expand. The default version of the smart menus shows you only a fraction of the features available in the system. The fraction that is shown to you includes things that everybody needs, like copy, paste, and edit, and also things that you've recently used, so if you've used some features they become immediately visible. But things that are deemed rarely used, and things that you've never touched, get hidden, and you don't get to see them until you ask the menu to expand. The menu expands if you press
this button at the bottom, or if you spend a lot of time moving across the menus, then it expands. This feature was accepted by roughly half of the users (I think John may have really good information on this), but roughly half of the users found it challenging in various ways, and eventually this design got abandoned.

Even though the field was fairly young, a number of people, including myself, attempted to systematically explore the design space of adaptive user interfaces, trying to understand the strategies through which we can build adaptive user interfaces that make people productive, that make people happy, and that give people a meaningful sense of control. There are a number of things that we have learned as a community.

One was that, unlike in traditional HCI, where creating simplifications really reduces the workload on people, when you use machine intelligence to make some aspects of interaction simpler you often add some new cost. You often require people to pay just a little bit more attention than they did before; you require people to think just a little bit more. So we've learned that what we are doing is not pure savings but a bit of a trade-off, and we really need to pay attention not only to the manual work, like the amount of clicking, but also to how much attention, cognition, and visual search people need to perform to take advantage of it.

We also learned another thing, and that took us a while: we discovered the loss aversion cognitive bias. We've realized that when we build interactive intelligent systems, people perceive the cost of the AI's mistakes as huge, but they give the AI relatively little credit for the things that it does well.
So this means that we really have to design these systems not just such that there is a net benefit, but such that the benefit of the AI is huge compared to the possible costs that people are going to incur. And we have to assume that there are going to be costs, because AIs are still less than 100% correct.

Very nicely, what came out of all of this activity, both commercial
and research, is the design pattern of split user interfaces. The idea is that if AI is being used to make some existing interaction more efficient, a robust design is not to replace the existing way of doing things but to provide an alternative. People who want to turn off their brains and just go on autopilot can do things the way they always did, or they can use the alternative design. Here are some designs in commercial software. Frequently or recently used fonts get copied to the top of the font menu, so you can access them there if you're paying attention, or you can scroll all the way down to Times New Roman. Predictive text gives you an option to select a word completion or a word prediction, but you can just keep typing without paying attention to it. On your iPad, the operating system suggests software that you are likely to use and that is not part of your taskbar, based on context and your prior usage, but you can still access it via the regular channels.

So this is an example of an area in human-AI interaction where we have produced useful knowledge. That knowledge has survived the test of time, and it is being used commercially and in research. It's great.

The rest of the talk will be about places where we thought we knew, but it turned out that we didn't. The first example is predictive text. When predictive text got developed, we assumed that we were building a smart system that would make people more efficient at inputting text without having any impact on what people write. After all, people are in control. It turns out that this assumption was not quite correct. Here is one experiment that we conducted. We asked people to write captions for images, and we created our own predictive text system that we controlled completely. Before I show you the results, let me define the word "predictable."
We define a word to be predictable if the language model that we used in this study predicted it to be the next word the person would input. In this system we did not predict word completions; we predicted the next word to enter, so just entire-word predictions. What we found, unsurprisingly, is that when people used the predictive keyboard, the word choices in the text that they produced were more predictable: they were more likely to use words aligned with the predictions of the underlying language model than if they did not see the predictions. The mechanism that we saw frequently was that people would substitute the word that they meant to write with one that was easily available. For example, a person might be thinking "a train is approaching the station," but because the system proposes the word "on," they choose the word "on" instead.

More surprisingly, it turns out that people who were given the predictive text keyboard wrote shorter captions than people who were not. This was unexpected, because predictive text is supposed to make it easier and faster to enter text, so we thought people would write more. Instead, we realized that the predictive text very rarely suggested adjectives, adverbs, or other embellishments. For example, if a person wanted to write "an old train is approaching a quiet outdoor station," the word "outdoor" would not be predicted, but the subsequent word "station" would be. So people often skipped adjectives, adverbs, and other embellishments, resulting in simpler and shorter captions.
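
To make the definition of a "predictable" word concrete, here is a minimal sketch. It is not the system used in the study: the toy bigram counter stands in for the real language model, the three-sentence corpus is invented for illustration, and the choice of `k=3` suggestions is an assumption.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count, for each word, which words follow it in the training sentences."""
    following = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def predicted_next_words(model, prev_word, k=3):
    """Return up to k most frequent next words after prev_word (the suggestions)."""
    return [word for word, _ in model[prev_word].most_common(k)]

def predictable_fraction(model, caption, k=3):
    """Fraction of words (after the first) that the model would have suggested."""
    words = caption.lower().split()
    if len(words) < 2:
        return 0.0
    hits = sum(
        nxt in predicted_next_words(model, prev, k)
        for prev, nxt in zip(words, words[1:])
    )
    return hits / (len(words) - 1)

# Hypothetical toy corpus, invented for this sketch.
toy_corpus = [
    "a train is approaching the station",
    "a train is on the tracks",
    "the train is on time",
]
model = train_bigram_model(toy_corpus)

print(predictable_fraction(model, "a train is on the station"))  # 1.0
print(predictable_fraction(model, "an old train is approaching a quiet outdoor station"))  # 0.25
```

In this toy setting, the caption with adjectives scores much lower, because words such as "old," "quiet," and "outdoor" are never among the suggestions.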

So this study demonstrated that this design has an impact on the content of what people write. This is already possibly concerning, because it makes all text written with this technology more alike. But we also wondered what would happen if the underlying model had a substantial bias. So we looked both at how a biased corpus impacts the algorithm and at what the impact is on what people produce with it.

The study that we designed was one in which we asked people to come in and list four recent restaurant experiences. This was before COVID; people were still going to restaurants. We asked them specifically to try to think of two positive experiences and two negative experiences, to write down those restaurants, and to assign a star rating to each of these experiences. Only after they had done that did we randomize these experiences to two conditions. We trained two language models, both on the Yelp dataset: one predominantly on positive reviews and one predominantly on negative reviews. Then we randomized people's restaurant experiences such that two of them, one positive and one negative, would be written with the help of predictive text trained on positive reviews, and the other two with predictive text trained on negative reviews. People then wrote those reviews, but remember that they were already committed to the star rating.

Then we asked external raters to read all of these reviews and, without seeing the star ratings, say just from the words how positive or how negative these reviews were. We found a substantial effect of the language model on the content of the reviews that people wrote. People were already committed to what they were going to write, they wrote it, and the suggestions given by the language model substantially affected how the text that they wrote was perceived.

Audience question: Just a quick question, what's the y-axis
measuring? The number of positive words?

I can't remember exactly which measure we used, but we asked raters to rate the reviews, so this is a subjective human rating of the reviews.

So we, as a community, built predictive text systems to make people faster without affecting what they write. It turns out that we were incorrect: predictive systems do change what people write, and if the system is biased, that bias gets reflected in what people write. We had another paper, which I didn't mention, showing that the impact of the suggestions is greater if we suggest phrases rather than words. And at that time, you know...

Wow. OK, 8 years ago this was still science fiction, but now it's actually reality: we suggest very large chunks of text to people.

OK, next area, a more contemporary one: AI-assisted decision making. Roughly in 2019 we really started talking about explainable AI. We thought, all right, we are going to assist people in the decisions that they make by giving them decision recommendations and explaining why these decisions were recommended to them, and we assumed that this was going to lead to good decisions. Just to have a concrete example in mind, I'm specifically thinking about what might happen in a clinician's office when a clinician has to come up with a treatment for a patient. The clinician enters the decision situation, and the system predicts that these are the best treatments for this particular patient, for this particular reason.

When the work on AI-assisted decision making started, we all believed that people and AI would have different accuracies, and that when you combined the two you would inevitably get a better outcome than from either of the components alone. Anyone who has learned about ensemble classifiers or ensemble techniques in machine learning knows that if you combine two systems that make independent errors and do something sensible with their outputs, you get an ensemble that performs better than any of the components. It was inevitable; it had to happen. Except that it didn't. People assisted by AIs actually did poorer than the AIs: often better than they would have done on their own, but poorer than the machine learning systems on their own. It took us a while to figure out that this was actually happening, because for a long time we weren't testing people's decision making with the AI assistance. Instead, we were testing people's predictions of how the models would behave.
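
The ensemble intuition mentioned above can be checked with a small calculation. This is only an illustrative sketch: it assumes three equally accurate voters with fully independent errors and a simple majority vote, whereas the talk concerns one person and one AI, whose errors need not be independent. The function name and the 80% accuracy are made up for the example.

```python
from itertools import product

def majority_vote_accuracy(accuracies):
    """Exact probability that a strict majority of independent voters is correct.

    `accuracies` holds each voter's probability of being right; use an odd
    number of voters so that ties cannot occur.
    """
    n = len(accuracies)
    total = 0.0
    # Enumerate every pattern of right/wrong answers across the voters.
    for outcome in product([True, False], repeat=n):
        prob = 1.0
        for is_correct, p in zip(outcome, accuracies):
            prob *= p if is_correct else (1.0 - p)
        if sum(outcome) > n / 2:
            total += prob
    return total

# Three independent voters, each right 80% of the time (hypothetical numbers).
print(round(majority_vote_accuracy([0.8, 0.8, 0.8]), 3))  # 0.896
```

Under these idealized assumptions the majority is right about 89.6% of the time, better than any single 80% voter. The empirical finding in the talk is that human-AI teams did not behave like this idealized ensemble.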
It turns out that this wasn't a particularly good measure of progress. But once we started measuring how people actually make decisions with these systems, we noticed the problem, and we realized that people appear to over-rely on the AI. Notice that when the AI is incorrect, people often make poorer decisions than they would have on their own, because they were swayed by the AI in the wrong direction. So the issue was not to increase trust in AI, as we believed for a while, but to calibrate that trust. In one of our own studies we looked at how clinicians make treatment decisions, and we indeed found evidence of over-reliance, and we found that explanations did not improve the situation.

The idea that explanations were not actually improving the situation was spotted by a number of people. One of the first explorations of this idea came from Gagan Bansal. In his paper he noticed that when explanations were added, people often did not engage with the content of the explanations but instead took the mere presence of the explanations as an indication that the system was competent. So they would over-rely on the system without engaging with what the system was actually saying.

There is another very recent paper that also suggests that when people try to decide whether information they are receiving from someone is credible, and they do not know the field particularly well, what they pay attention to is the number of concrete facts, or things that look like concrete facts, being communicated. If you see a lot of concrete facts, you believe that the person is competent. And notice that the way we design AI explanations is often by producing a very large number of concrete facts. So the way we were designing the explanations really encouraged people to perceive AIs as competent, even without people having to engage with what the system was saying. Just the superficial features led to that impression. So at
that point, our mental model was that the explanations may indeed have some impact on how people make their decisions, but that there is an element of human cognitive engagement that moderates this effect, and that in the current design people are not actually engaging with the content of what the AI is saying. So we dug into the literature on medical decision making, and one of the ideas that we found is cognitive forcing: intervening at the moment of decision making to interrupt heuristic and superficial decision making and encourage people to engage more deeply with what's going on. We already believed that people engage superficially with AI-generated content, so we thought, all right, if we interrupted this superficial engagement and got people to engage more deeply, would we see a decrease in human over-reliance on the AI?

We actually tried several different ways of implementing cognitive forcing, but one that is particularly effective, and that many other people have used, is one where people are asked to make their decision on their own first. They record that decision, and only then are they shown the prediction and explanation from the system. Then they are asked whether they want to revise their decision. So they've already thought about it and made the decision, they're confronted with a potentially different recommendation from the system, and then they have a chance to combine the two pieces of knowledge.

Notice that the results I'll show here are only for situations where the AI's suggestions are incorrect. We found that people who were given the traditional recommendations and explanations over-relied on the AI a lot: they rarely made correct decisions if the AI suggested an incorrect option.
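
The two-step cognitive forcing protocol and the over-reliance measure described above can be sketched as follows. This is a hypothetical illustration, not the study's code: the `Trial` record, the function names, and the operationalization of over-reliance (the share of incorrect-AI trials in which the final decision matches the AI's suggestion) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Trial:
    correct_answer: str
    ai_suggestion: str
    final_decision: str
    initial_decision: Optional[str] = None  # only recorded under cognitive forcing

def cognitive_forcing_trial(
    correct_answer: str,
    ai_suggestion: str,
    decide_alone: Callable[[], str],
    revise: Callable[[str, str], str],
) -> Trial:
    """Run one trial: decide first, then see the AI output, then optionally revise."""
    initial = decide_alone()                # step 1: commit without seeing the AI
    final = revise(initial, ai_suggestion)  # steps 2-3: see the AI output, then revise or keep
    return Trial(correct_answer, ai_suggestion, final, initial)

def overreliance_rate(trials: List[Trial]) -> float:
    """Among trials where the AI was wrong, how often did people follow it anyway?"""
    wrong_ai = [t for t in trials if t.ai_suggestion != t.correct_answer]
    if not wrong_ai:
        return 0.0
    followed = sum(t.final_decision == t.ai_suggestion for t in wrong_ai)
    return followed / len(wrong_ai)

# A participant who keeps their own (correct) answer when the AI is wrong.
trial = cognitive_forcing_trial(
    "A", "B", decide_alone=lambda: "A", revise=lambda own, ai: own
)
print(overreliance_rate([trial]))  # 0.0
```

Comparing this rate between a standard recommendation-plus-explanation condition and the cognitive forcing condition is one way to express the comparison the talk describes.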
But if we used this cognitive forcing strategy, people over-relied on the AI less. They still over-relied on it, but less. So we felt pretty positive about this result, and our updated mental model at that point was that cognitive forcing was indeed leading to greater human cognitive engagement with the content and that we were heading in the right direction.

Then we decided to probe human cognitive engagement in a different way, and specifically we used the concept of incidental learning. The idea is this: suppose I need advice on a decision-making task and I ask somebody for help. For example, I ask Dan how I should search for something super complicated. He will tell me, and I will do a better job on the task in the moment. But presumably I will also learn from this interaction, so that the next time I am performing a complicated search I'll do a better job on that as well. I will retain something from this interaction. This is called incidental learning, and by some accounts roughly half of the learning that occurs in modern organizations happens through some form of informal and incidental learning. So this is big; this is very important to our intellectual growth.

Very importantly, learning only occurs if there is cognitive engagement. So we thought that we would use the measure of incidental learning as an indicator of whether cognitive engagement occurs in interactions with the AI or
not. We assumed that for simple explainable AI, where we just give people decision recommendations and explanations, there would be no cognitive engagement, given our prior results, and therefore no learning. But we also expected that with cognitive forcing there would be cognitive engagement, and therefore we would see evidence of incidental learning.

The study that we designed was one where people were asked to make very simple decisions about nutrition: which of two meals had more protein, or fat, or carbohydrates, or fiber. In the baseline condition they were just asked to answer this question, with no feedback. In the simple explainable AI condition people were given a decision recommendation and an explanation. And in the cognitive forcing condition people would first make their decision and indicate it, then they would get the recommendation, and then they would have a chance to revise their choice.

Before I show you the results, let me explain the two baselines. The low baseline was one where people received absolutely no assistance and no feedback. From prior work we knew that this provides people with very minimal learning; any learning that occurs is just from repeated exposure to the same type of question. Our high baseline was one where people performed the task without assistance, but after each of their responses they got expert feedback: you were correct, or you were incorrect, for the following reason. From prior work we know that people learn from this and that it's hard to beat. So these were our low baseline and high baseline.

Just as expected, the simple explainable AI did roughly as poorly as our low baseline: there was no evidence of learning with simple explainable AI. Shockingly to us, there was also no evidence of learning in the cognitive forcing setting. It caused us to revise our explanation of what is happening with cognitive forcing, at
least with the updated design. It's possible that what's happening is not that people engage more cognitively and that's why they avoid over-reliance, but that instead of over-relying on the AI they over-rely a little bit more on their initial idea. So we reduced over-reliance, but we did not really improve cognitive engagement or overall outcomes.

So far this is all negative results. It seems that nothing works, that people make really poor decisions and do not think about anything that we give them. Is there anything that we can do better? We found one thing that seems to work a little bit better: instead of giving people decision recommendations and explanations, we gave them only informative explanations that were contrastive and that included some knowledge they could build on. Notice that the explanation here says "beans are a significant source of carbohydrates." With this explanation people can make a fairly small cognitive leap to come up with the right answer, and it turns out that making just this small cognitive leap is enough engagement for people to learn substantially. So we found that if we give people only informative explanations, people do engage and people do learn. This is a very positive result for us. But in this study we only had a restricted set of situations.

We designed it such that the AI was 100% correct. We still do not have particularly informative results on what happens if you apply this strategy when the AI can also be incorrect.

Audience question: Do you think this is because it's its own kind of forcing function? It's not telling you the answer; it's saying, here's some information, and you need to integrate it.

It's not disrupting things, so I'm not sure I would call it cognitive forcing, but it definitely creates a situation in which there is no shortcut. Rather than distracting you from the shortcut, it takes the shortcut away completely. It gives you enough information that you can do better than you would have otherwise. People in this condition made decisions as good as those of people who were given decision recommendations, and they learned. So they performed well on the task and they learned, which means they cognitively engaged. At this point our hope is that in more complex situations this kind of design may lead to more thoughtful engagement with AI-provided information and a real reduction in over-reliance, but this work is still ongoing.

So, in conclusion for this subsection,
I want to show that more and more people are recognizing that the simple explainable AI design, where you give people a decision recommendation and an explanation, is insufficient in many situations. We have suggested the approach where we take away the decision recommendation and just give people a useful synthesis of relevant information, so they can make the cognitive leap on their own. Many of you might have heard of Tim Miller's recent paper on evaluative AI, where instead you provide people with an informative synthesis of reasons for and against different options, so that they can make choices on their own. And there is another recent paper, from Pattie Maes's group, that instead just gives people questions that focus their attention on the right aspects of the problem, so they can avoid spending time on irrelevant aspects and arrive at the answer themselves. They actually found that people prompted with such questions performed better than either people or AIs alone. So they achieved this holy grail of complementarity, but they achieved it on relatively simple tasks. This design space is just being formulated and has not yet been evaluated, so we are still in the process of making design knowledge in this space.

Now, a little bit in the spirit of James's talk (some of you have seen James's talk on human-centered, community-centered, and society-centered AI): I have focused so far mostly on the very low-level details of our design knowledge, but let me also ask a slightly higher-level question, which is, are we even solving the right problems? We conducted a co-design study with clinicians on what role AI should play in treatment selection. We first designed one system, and then we met with 10 clinicians. We conducted a formative interview, trying to understand their needs for AI in clinical work, and also
got their reactions to this first design. Then we redesigned it based on their feedback, and then we had another round of conversations with clinicians, who reacted to the revised design and provided new suggestions. I will not go into the details because this is a fast overview talk, but one of the key findings from the study was that clinicians, particularly in mental health, really value shared decision making. They really want to make sure that they build good rapport with the patient and that the patient is on board, activated, and committed to their health. So they want to structure the health encounter in such a way that they arrive at the decision collaboratively, build a shared understanding of what's going on, and build a shared plan for what to do going forward.

Instead, what are we doing? We are bringing this new thing into the clinical encounter that takes the clinician's eyes away from the patient and to the computer. They sit there, spend three minutes on the computer, and then return to the patient and say, "Treatment A." What they want instead is support for conversations between clinicians and patients. Could we build systems that allow them to continue the conversation together, build this common ground, and make a shared decision? For that, they say, the system really needs to support an informed choice by both, which means that it needs to allow the patient and provider to evaluate multiple options and contrast them. Everybody understands that most AI algorithms have access to only a subset of the knowledge that is relevant to a decision. They argue that this interactive system should reflect all of the knowledge that is important, for example the patient's preferences, side effects, and so on. And it's very important that the system allows fast and interactive exploration of what-if scenarios.
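
A rough sketch of what such what-if exploration could look like in code is shown below. Everything here is hypothetical: the `Option` fields, the scoring rule, and all numbers are placeholders invented for illustration, not clinical data and not the system from the study.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Option:
    name: str
    doses_per_day: int
    insomnia_risk: float      # placeholder score between 0 and 1
    predicted_benefit: float  # placeholder score between 0 and 1

def what_if(
    options: List[Option],
    max_doses_per_day: Optional[int] = None,
    insomnia_weight: float = 0.0,
) -> List[Option]:
    """Filter options by a dosing constraint, then rank them by benefit minus a
    patient-chosen penalty on insomnia risk."""
    candidates = [
        o for o in options
        if max_doses_per_day is None or o.doses_per_day <= max_doses_per_day
    ]
    return sorted(
        candidates,
        key=lambda o: o.predicted_benefit - insomnia_weight * o.insomnia_risk,
        reverse=True,
    )

# Placeholder options; the names and numbers are invented.
options = [
    Option("treatment A", doses_per_day=3, insomnia_risk=0.1, predicted_benefit=0.8),
    Option("treatment B", doses_per_day=1, insomnia_risk=0.6, predicted_benefit=0.7),
    Option("treatment C", doses_per_day=1, insomnia_risk=0.2, predicted_benefit=0.6),
]

# "What if I only want to take drugs once a day and I really care about insomnia?"
for option in what_if(options, max_doses_per_day=1, insomnia_weight=1.0):
    print(option.name)
```

The point of the sketch is that the patient's constraints and weights are explicit inputs that can be changed during the conversation, so the ranking updates immediately.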

OK, if I really care about insomnia, how do my options change? If I only want to take drugs once a day rather than three times a day, what are my options? How do I want to balance the various trade-offs related to my life?

Next, the providers with whom we interacted also said that they do not want to be responsible for evaluating every single decision recommendation, trying to decide, "Is the AI leading me astray, or is this a reasonable thing?" They really want us to evaluate the systems ahead of time: do a randomized clinical trial and give them results that they can read once to know how good the system is. So they do not want to see an explanation for every decision, and they do not want to be responsible for validating each decision recommendation. But if the system suggests something that deviates from the clinical guidelines, they would like to see a contrastive explanation that says why this particular patient should be treated as an exception. If this patient has three comorbidities and an unusual preference that leads us to prefer an unusual treatment, they would like to see that explanation in that case, but not otherwise.

They also point out that treatment selection is not the only decision that they make. Another decision is whether the patient should also see a therapist or not, or whether the patient should have a follow-up in three months or in two weeks. So they want us to build systems that have an understanding of the range of different actions that can be taken in the sociotechnical healthcare system.

The last application of AI that I would like to touch on is algorithmic recourse. These are the situations where algorithms make consequential decisions about our lives, for example about our loan applications or unemployment benefits. There is now legislation happening in many countries that demands that when systems like this are built, people are given
appropriate grievance redressal mechanisms: if the AI says, you know, you cannot have the loan, people should be able to understand what's going on and be able to act on it. A key technical approach that has been taken by the AI community is to generate counterfactual explanations. So if Michael asks for some benefit and is denied, he can be told: if you had only done something differently, you would have gotten this benefit. For example, if your income were only $10,000 higher, you would have gotten the loan. This is attractive, and many people see it as an obviously desirable thing to do because it's so easily actionable, right? You tell the person exactly what they need to do in order to get the loan next time. But built into this is an assumption that the algorithm that produces this explanation knows everything about what is possible and easy in a person's life. So, for example, the algorithm may say, you know, all you have to do is increase your income by $10,000.
You will get the loan that way, even though there are other paths through which you could also get the loan. This person may not be able to take on extra work shifts because of their child care obligations, but they could reduce their debt by, you know, paying cash instead of using credit cards, and they would also have gotten the loan. But the system did not communicate that because it didn't understand the relative costs of the different actions that the person could take.

So because the current community is so devoted to the idea of counterfactuals, we decided to explore what happens when the underlying algorithm has an accurate versus inaccurate idea of what the person can do. We conducted an experiment in which the participants were put in the role of career counselors who helped students who had just been denied an internship reapply for that internship. As part of the reapplication process, the student would have to take additional courses to build up their competence. What the participants were given was a letter that included reasons. In the counterfactual condition, the letter would say: if only the candidate had this many courses in these different areas, they would have gotten the internship. The reason codes explanation would say: the candidate needs more expertise in the following areas, with pharmacology being the most important, followed by anatomy, followed by pathology. Unbeknownst to the algorithm, there is an extra component here, which is that courses in different fields are offered in different semesters. So, for example, in order to take two extra pharmacology courses, the student would have to take courses for an extra three semesters, which takes a long time, so they would have to work without an internship for a very long time. The objective function that the student has is to become qualified for an internship as quickly as possible. So we considered two conditions: one in which the
reality of the student was aligned with the model of the algorithm that produced the explanation, and a misaligned one where the student's costs, as reflected by the schedules of available courses, were misaligned with what the underlying algorithm thought would be the quickest way to make the person eligible for an internship. Counterfactuals are really attractive because they are so easy to interpret: they tell you exactly what you need to do to succeed. Reason codes perhaps give you a larger range of options, but they perhaps also make it easier to make a mistake. Interestingly, we found that people in both conditions were equally likely to correctly reapply for the internship. But, very importantly, people in the reason codes condition selected actions that would require fewer semesters of course-taking to become eligible than people who were presented with counterfactuals. So people who were presented with counterfactuals had one path that they could follow well.
But if that path was not optimal, they didn't have enough information to figure out what else they could do. Reason codes gave them enough information to make correct decisions, and to identify several correct decisions, giving them the option to choose the one that was optimal for them. We also conducted a follow-up study where we compared reason codes to feature attribution, where we give people more detailed information about the impacts of different features, and the difference between those was relatively small. And we conducted a third experiment in which, instead of providing a single counterfactual, we provided multiple diverse counterfactuals, in hopes that this would give people some broader idea of what's possible. For the fully misaligned condition, that again resulted in worse outcomes than giving people the reason codes.

So this is perhaps a small nitty-gritty thing, so let us also look again at the larger picture of how we might deploy AI to support people who are applying for public benefits or other resources. We conducted a study of two settings in which algorithms are currently not being used. We just wanted to see how people currently apply for things and what the application process looks like, and in particular we wanted to see how people react to negative decisions and how they process them. We interviewed the beneficiaries who are applying for things, we interviewed the bureaucrats, and we also interviewed people from organizations that support the beneficiaries, people who provide advice on how to apply and how to navigate the system. We conducted the study in two settings: one was the Boston metropolitan area, where we looked at people applying for housing assistance, and the second was people applying for land to build their homes near the Chennai metropolitan area in India. These were comparable settings, and again, algorithms are currently not used in them. We wanted to understand
how the decision making is currently happening so that we could make thoughtful decisions about how to deploy AI in the future.

Some findings were the following. Interesting things already happen before people even decide to apply. It turns out that many people, particularly people from marginalized groups, often do not know or do not believe that they are eligible for a particular benefit. So an intervention is needed to help them decide that they are eligible, and they then also need to figure out what the prerequisites for applying are. For example, in order to apply for land, you first need to have an identity card, and you need to have something else, so there are often multiple other bureaucratic steps through which you have to go to succeed. Then, during the application process, one implicit assumption that we are making as a computing community is that the application is kind of effortless: you sit down, fill it out, and it goes. But it turns out that collecting the information is often effortful and costly. The more detail you provide, the more successful your application can be, but people rarely provide all possible detail; people satisfice. Next, in the Boston metropolitan area, we also found that there is often a step of pre-denial: before people get officially denied something, they get notice that they are about to be denied. It differs from denial because at the moment of denial, consequences kick in. At pre-denial, you are just told that denial is about to happen, but the consequences have not yet been triggered. So this is a brief period in which people can say, oops, there is information missing, or there has been a misunderstanding. Then, when the denial occurs and people get the denial, it turns out that they typically process the information emotionally rather than intellectually, so giving them informative explanations is potentially not the most impactful thing. Then there is the process of appeal in many situations, and during
the appeal, what people often do is argue that they are a legitimate exception to the rule. And Michael knows everything about discretion in public service, right? A very important part of bureaucracy is the correct application of discretion. Another very important finding was that, particularly for marginalized populations, a lot of applicants were supported by accompaniers, people from various nonprofits and other institutions who would guide them throughout the process and provide both moral and practical support throughout the whole thing.

So now, with these findings in mind, we can realize that there are many places where human-centered AI could be deployed that are much more impactful than just creating counterfactual explanations. We could be proactively identifying people who are eligible but who are not applying. We could be providing people with personalized plans for what they need to do in order to be able to apply, and what other things they need to do. During the application process, we can provide them with personalized recommendations for which types of information they should prioritize and how they should present it to be most informative to the decision maker, so they do not get denied simply because the relevant information was missing. When people receive the denial letter, currently, through the counterfactuals, we support them in deciding what they can do better next time they apply. But another decision that they actually have to make is whether they got denied for legitimate reasons or whether they have a good case for arguing that they are an exception and should be treated a little bit differently. Currently, the way we communicate denials does not provide people sufficient information to make that decision well. And when people do get denied and decide to appeal, we can provide them with personalized guidance on how to structure that information. And throughout the
process, we really shouldn't just be supporting the beneficiaries; we should also be building tools for the accompaniers. So how many people have read Kentaro Toyama's Geek Heresy, the law of amplification? A number of you have seen it. He often argues that the best way to intervene is not directly but indirectly: you have to find the person who is actually capable of making good use of your intervention, and often the direct beneficiary does not necessarily have the time, resources, or perseverance to be the right recipient of your intervention. So again, under realistic assumptions, the counterfactuals that we are producing do not even meet the stated goals for which they are being designed. And the goals are actually larger. Even at the denial stage, they should not just be telling people what they should do next; they should be supporting the decision of whether people should reapply or argue that they are an exception. And even more broadly, again kind of stepping into James's territory, we should be thinking about other places in the application process where AI could be used to intervene for the benefit of people.

So, to summarize the specific takeaways: I told you about adaptive user interfaces and the design knowledge that we have produced there, and in particular this really cool design pattern of split interfaces, where the AI-powered solution is an alternative to, rather than a replacement for, what people currently do. In the context of decision making, I'm arguing that the current simple explainable AI is probably insufficient in most cases, that giving people decision recommendations and explanations does not lead to more thoughtful, more engaged, better decisions. We really need to be thinking about very different paradigms for this interaction. And when we think about algorithmic decision making in situations like provision of public benefits, focusing on communicating denial explanations
through counterfactuals doesn't even meet the stated goals, and it's the wrong place to intervene. At the larger level, I want us to celebrate the fact that, as a human-AI interaction community, we have already produced a whole bunch of very useful knowledge. But things are moving so incredibly fast right now that we often just come across ideas that sound pretty good and we say: yep, we are going to build on this. We are going to do explainable AI. Yep, we're going to do counterfactuals, they're just so easy to follow. And we just turn them, without verifying, into the foundational aspects of our body of knowledge. Adopting these ideas without verification can have pretty negative consequences downstream. In the language of Silicon Valley, we need to accelerate the production and systematization of the design knowledge related to human-AI interaction. This is really badly needed: not just to create new products and applications, but to really try to make sure that we have solid basic knowledge to build on. And this is it.
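
As an illustrative aside on the course-planning experiment described in the talk, the following minimal sketch shows how a counterfactual computed under the algorithm's assumed costs can differ from the plan that is actually quickest for the student, while reason codes only rank the areas by importance. All numbers (weights, the score deficit, semesters per course) and the names `cheapest_plan`, `plan_cost`, and `reason_codes` are hypothetical, chosen for illustration and not taken from the study.

```python
from itertools import product

FIELDS = ("pharmacology", "anatomy", "pathology")

# Hypothetical numbers, for illustration only (not from the study).
WEIGHTS = {"pharmacology": 3, "anatomy": 2, "pathology": 1}  # score gained per course
DEFICIT = 6  # points the student is missing for eligibility
ASSUMED_COST = {"pharmacology": 1.0, "anatomy": 1.0, "pathology": 1.0}  # algorithm's belief
TRUE_COST = {"pharmacology": 1.5, "anatomy": 0.5, "pathology": 1.0}  # semesters per course


def plan_cost(plan, cost_per_course):
    """Total cost of a plan (courses per field) under a given cost model."""
    return sum(cost_per_course[field] * n for field, n in plan.items())


def cheapest_plan(cost_per_course, max_courses=4):
    """Brute-force the lowest-cost set of extra courses that closes the deficit."""
    best_plan, best_cost = None, float("inf")
    for counts in product(range(max_courses + 1), repeat=len(FIELDS)):
        plan = dict(zip(FIELDS, counts))
        gain = sum(WEIGHTS[field] * n for field, n in plan.items())
        if gain < DEFICIT:
            continue  # this plan does not make the student eligible
        cost = plan_cost(plan, cost_per_course)
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan


def reason_codes():
    """Rank areas by importance without prescribing a single path."""
    return sorted(FIELDS, key=lambda field: WEIGHTS[field], reverse=True)


counterfactual = cheapest_plan(ASSUMED_COST)  # what the letter would prescribe
quickest = cheapest_plan(TRUE_COST)           # what is actually fastest
print("counterfactual:", counterfactual, "semesters:", plan_cost(counterfactual, TRUE_COST))
print("quickest:", quickest, "semesters:", plan_cost(quickest, TRUE_COST))
print("reason codes:", reason_codes())
```

In this toy setup the counterfactual prescribes two pharmacology courses, which takes three semesters under the true schedule, whereas three anatomy courses would reach eligibility in one and a half. The reason codes leave that cheaper path open to someone who knows the real costs.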

All right. We've got some time for questions, feedback, comments. Okay, and if you could say your name.

Yeah, um, thanks so much for your talk. I was wondering, in the context of decision making, if knowing that the AI model was, like, 50% correct or 70% correct could potentially change how the humans might take the advice of the AI.

Yes, it does make a difference, and there is another very interesting study in psychology that shows that knowing this only works if you are reminded every time you make a decision. So it's not enough to just tell people once; you actually have to remind them at every decision instance.

I'd love to dive in. You made an almost offhand comment early on that, you know, obviously if you have two inputs that are making uncorrelated errors, then ensembling them is sort of the right solution mathematically. Yeah. So, just reapplying that, given everything you've said, should we be taking people and AI out of a loop together and instead just be having independent judgments and ensembling them ourselves?

So I think several people have tried that. And, you know, intellectually it's a very appealing thing to try to do, except that the reason why we have human oversight in decision making is for legal and ethical reasons. We want the final decision to be made by the human. So even though this independent ensembling can produce very good decisions, it does not solve our societal need for having the human ultimately take responsibility for the final decision, right? You would then need to have someone look at the ensembled decision.

Can turtles go up or only down? You know, I'm thinking that if we do that, then the final decision maker might be over-relying, but they would be over-relying on a slightly better input.

That's right. Yeah.

I mean, it's like you have a closed-loop system there, where maybe as long as you're amplifying a better signal.

Yeah.
But we still want to make sure, and you're next, that people honestly understand why they made the decision and that they can be honest-to-goodness accountable for the decision that they've made, and that requires understanding. It does even seem possible.
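
The point about ensembling two inputs with uncorrelated errors, raised in this exchange, can be illustrated with the standard variance formula for the average of two estimates. This is a minimal sketch; the function name `ensemble_variance` and the example numbers are assumptions for illustration, not something presented in the talk.

```python
import math


def ensemble_variance(var_a, var_b, rho):
    """Error variance of the simple average (a + b) / 2 of two unbiased estimates.

    Uses Var((A + B) / 2) = (Var A + Var B + 2 * rho * sd_A * sd_B) / 4,
    where rho is the correlation between the two error terms.
    """
    covariance = rho * math.sqrt(var_a) * math.sqrt(var_b)
    return (var_a + var_b + 2 * covariance) / 4


# Hypothetical case: a human and an AI whose errors each have variance 1.0.
uncorrelated = ensemble_variance(1.0, 1.0, 0.0)      # errors are independent
fully_correlated = ensemble_variance(1.0, 1.0, 1.0)  # errors always coincide
print(uncorrelated, fully_correlated)
```

With equal error variances, uncorrelated errors halve the variance of the averaged judgment, while perfectly correlated errors give no reduction at all. This says nothing about who is accountable for the final decision, which is the concern raised in the answer.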

And then I'll be quiet: maybe you could gather the human input in such a way that, to the oversight, you can't tell whether each piece of evidence came from a person or an AI. That might help debias.

Certain people do occasionally treat machine and human input differently; it depends on the domain. Yeah.

Um, yeah. It's intriguing to me that a lot of the support for decision making comes from an analytic approach, and in psychology and behavioral science there's a lot of data, summarized in Thinking, Fast and Slow, that a lot of these decisions are propelled by an emotional feel, a gut instinct if you will. I'm wondering how that works its way into some of these experiments, particularly in explaining some of yours.

So all of the experiments that we conducted were in a kind of emotionally neutral setting, so there were good preconditions for analytical thinking. In other experiments that I didn't talk about, we explored the effect of time pressure, and we saw that it can substantially increase over-reliance. We have not worked with emotion, but I can totally see how that would impact things. Dr. Sanders is actually a physician and has thought about clinical decision-making, so I'm inclined to listen rather than talk.

No, I just think, if anybody hasn't read Thinking, Fast and Slow, I really recommend it. A lot of data suggest that these decisions that we think are analytically based, our own decisions, actually are not.

So I removed references to Kahneman from the talk because people have a very specific view of that work, like analytical good, heuristic bad. But the dual processing theories actually posit that they are just different ways of making decisions.
So System 1, the heuristic one, is a pattern completion engine, so it can actually make very good complex decisions that we are not necessarily able to make equally well analytically. These are complementary systems, but if you apply them incorrectly, things get bad. Also, the problem with heuristic use of explanations is not that people make the decisions heuristically, but that they apply a heuristic that the presence of explanations indicates competence, therefore I should rely. So it's heuristic processing not of the content but of the presence. One more question, and then people will start running away.

Okay, I want to extend that same thought a little bit. There are lots of situations where you need to do decision making very quickly, and putting a human in the loop in a lot of those situations kind of doesn't work. So do you have any recommendations for how to improve the quality of the joint AI-human decision-making process when you've got to react?

We now have evidence that time pressure is bad, so we've got an introduction for the next paper.

Okay. Well, I fully agree with you that we need to accelerate the production of design knowledge. But I'm still worried that the people who are hastily turning to productizing AI may not even be paying attention. So do you have any thoughts about how to reach those people? I would welcome your suggestions, given that you actually have seen companies from the inside. That's what makes me worried.


🤖 Human-AI interaction is a fascinating extension of human-computer interaction, facing challenges in creating predictable, valuable, and proactive intelligent systems, as debated by Ben Shneiderman and Pattie Maes in 1997.

🤖 We need to accelerate the production of design knowledge in human-AI interaction to catch up with reality, especially in building adaptive user interfaces that balance simplification with added cognitive costs and minimize the perceived costs of AI's mistakes.

🤖 AI interaction design can remove shortcuts and foster cognitive engagement to allow for more thoughtful decision-making, as shown in studies on the design of AI systems for clinical work.

🤖 Human-AI interaction is influenced by emotion and time pressure, impacting decision-making processes, but dual processing theories suggest that both analytical and heuristic decisions are valuable in different contexts.
