An AI for UX: How I used generative AI to speed up user research

If there were a Hollywood Boulevard for buzzwords of the year, ‘generative AI’ (‘GenAI’ for short) would get not just one star but three: for 2022, for 2023, and for 2024. But unlike the previous two years of hype and excitement, this year is different: it’s about results. Can this AI thing prove its worth? They say it can do all these things for us — but can it, really?
My official job title is UX designer at Tolley, the tax division of LexisNexis UK, but I also frequently ‘unicorn’ as the primary researcher for that portfolio.
At the start of the year, LexisNexis’s internal Large Language Model (LLM) console was released for use by all employees. It is a secure, ungrounded (in other words, not specifically trained on any specialist data), text-based GenAI model that uses Anthropic’s Claude under the hood.
One of the things I revel in while working at LN is the opportunity to try new things. Here was my chance to shake up my UX toolkit by trying GenAI for myself.

What if I use AI for research?
What if I used our internal LLM console to assist me with research for a new product I was working on? Let’s call it Product X (*). The project research hypotheses centred around Product X’s value to the users, its usability, and the users’ trust in it. The dataset consisted of eight 60-minute user interviews in which I showed the users an early prototype and quizzed them on their expectations, likely use cases and concerns about the product. Ordinarily, it would have taken me around a week to process this amount of rich qualitative data: watching the video recordings several times, tagging and pulling out quotes, synthesising the findings, and producing a report for the stakeholders. I was curious to see whether the LLM could automate some of this process for me and help me save time.

There have been sensationalist media headlines about some “AI for research” projects. But many such headlines tend to miss the details — where the proverbial devil lives. Did the researcher just bash out some questions for the AI and hope for the best, or did they carefully consider how to get the best value out of it by applying prompt-engineering best practices? Did they acknowledge that they, the human in the loop, and not the technology, are ultimately responsible for the results? What was their process for checking the AI output for accuracy?
(*) For data protection reasons, the specific product name, use cases and features have been anonymised, including in the images where the relevant information has been obscured and/or overwritten with dummy data.
Due diligence
I certainly feel I did my due diligence before embarking on this quest. I started planning early, signing up to some online prompt engineering courses and studying several academic papers and other articles on the subject (see the Resources section at the bottom). I researched methods of checking the AI output for errors. To get a sense of what I was up against, I tested an example interview transcript against my AI learning notes, to see whether the prompt patterns from the various courses and papers could be tailored to my use case. It seemed quite doable.
The rest of the process kind of fell into place. Specifically, into the following steps:
- Prepare transcripts
- Prepare Prompt Deck
- Feed & train the LLM
- Harvest the LLM
- Re-rail and catch errors
- Condense the LLM answers
- Rinse and repeat
- Export and use as needed
Step 1 — Prepare transcripts

To ensure the quality of the AI output, accurate interview transcripts were essential: the speakers had to be correctly labelled and the key terminology transcribed correctly.
I also pretended to be the LLM’s eyes. In the interviews, the users were shown a prototype on the screen, but the text-based LLM had no access to this visual stimulus. In the future, we might be able to upload videos directly to AI, but for now, I manually added some tags to the transcript that briefly described the visuals.
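For illustration, here is a minimal Python sketch of how such visual-context tags could be added to a plain-text transcript. In practice I added the tags by hand; the file name, line numbers and tag wording below are all made up.

```python
# Sketch only: in practice the tags were added manually. The file name, line
# numbers and tag wording are illustrative, not the real study data.
from pathlib import Path

VISUAL_TAGS = {
    # transcript line number -> what was on screen at that point
    12: "[VISUAL: Product X landing page with the search bar visible]",
    48: "[VISUAL: prototype of Use Case 2, results list populated with dummy data]",
}

def tag_transcript(path: str, tags: dict[int, str]) -> str:
    """Return the transcript text with visual-context tags inserted."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    tagged = []
    for number, line in enumerate(lines, start=1):
        if number in tags:
            tagged.append(tags[number])  # describe the on-screen stimulus first
        tagged.append(line)
    return "\n".join(tagged)

print(tag_transcript("interview_01.txt", VISUAL_TAGS))
```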
Step 2 — Prepare Prompt Deck
Staring at the LLM console — a wall of text — was getting to me. I needed to create something a bit easier on the eye that I could refer to while working with the console. So, I created my own Prompt Deck — a deck of ‘cards’ to represent specific prompt patterns. In parallel, I set up a Miro board containing all my in-progress prompts — I found it much easier to navigate than, say, a Word document.

I grouped the prompt patterns(^) into three categories: Umbrella, Prepping, and Correcting. This image gives an overview of them all. Each prompt pattern and its role are explained below.
(^) Some of these prompt patterns are borrowed directly from the articles and courses on prompt engineering, whereas others were created by me. The borrowed ones are marked with an orange or blue star, with the sources cited in the Resources section.
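If you prefer code to cards, the same deck can also live in a simple data structure. The sketch below is just one possible way to organise it; the card texts are abbreviated placeholders rather than my full prompts.

```python
# Illustrative only: one possible way to keep the Prompt Deck to hand in code.
# Card names follow the article; the texts here are abbreviated placeholders.
PROMPT_DECK = {
    "Umbrella": {
        "Time": "From now on and forever...",
        "Feed": "I'm going to provide you with a transcript from a user interview.",
        "Set Scene": "In the interview, the user was shown a prototype of Product X...",
        "Persona": "You are the world's most thorough user researcher and summariser.",
    },
    "Prepping": {
        "Few-Shot": "Here are some examples of the kind of answer I am after...",
        "Template": "I will now give you some TEMPLATES to use for a question and an answer...",
        "Chain of Thought": "Include the relevant user quotes from the transcript per each bullet point.",
        "Ask for Input": "Please await my instructions before responding.",
    },
    "Correcting": {
        "Re-rail": "From now on, in all your answers, use TEMPLATE 1...",
        "Catch Errors": "How does the user quote ... support your statement that ...?",
    },
}

for category, cards in PROMPT_DECK.items():
    print(f"{category}: {', '.join(cards)}")
```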
Step 3 — Feed & train the LLM
In this step, I used a combination of Umbrella and Prepping Prompt patterns to “feed and train” the model prior to getting it to do things for me.

Prompt Deck: Umbrella Prompts
Umbrella Prompts ensure that the model knows how it should behave and for how long, understands the context for its task, and is fed any datasets that will be used to inform the task.
Time

This prompt pattern ensures that the LLM understands when to apply a specific prompt and when to stop applying it.
Examples:
“From now on and forever…”
“From now on and until I say STOP TEMPLATE”
Feed

This pattern feeds the console an external dataset (if one is being used). Make it clear to the model what is being fed to it — in this case, an interview transcript.
Example:
“I’m going to provide you with a transcript from a user interview. (Transcript follows).”
Set Scene

This pattern describes the scene to the LLM. What has been happening that should matter to the model when it’s performing this task for you? For example, who has been interviewed and why? What have they been shown during the session? What has the researcher been trying to get them to do?
Example:
“From now on, please note the following information before you respond to my questions: In the interview, the user was shown a mockup or prototype of Product X on the computer screen and asked for their reactions and opinions. The main use cases illustrated in the prototype are: (list the use cases).”
Persona

This is about getting the LLM to roleplay. The console will give better responses if it knows which particular hat it’s wearing today. Is it a UX researcher’s hat, a usability specialist’s hat, or some other specialist’s hat?
Example:
“You are the world’s most thorough user researcher and summariser”
Combining the above Umbrella Prompts gives you…
‘Root’

Offered to the LLM at the beginning, this prompt will govern my entire interaction with the model. It is not a real root prompt (those are pre-engineered to tailor the model to a particular use case and then hidden), but a bespoke, pseudo-root prompt that suits my specific requirements.
Example:
“From now on and forever, you are the world’s most thorough user researcher and summariser who is working on user research for Product X, a new tool for X professionals. I will be providing you with a transcript from a user interview about Product X. The transcript will be long, so you will be interpreting the ENTIRETY of the transcript, not just beginning and end. (^^) (Transcript follows).
From now on, please note the following information before you respond to my questions: In the interview, the user was shown a mockup or prototype of Product X on the computer screen and asked for their reactions and opinions. The main use cases illustrated in the prototype are: (list the use cases and any other relevant information here).”
(^^) Borrowed prompt pattern (blue star reference) — see Resources section
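For anyone scripting this rather than typing into a console, here is a rough sketch of how the pseudo-root prompt could be assembled from the individual Umbrella cards. The wording mirrors the example above; the product details are dummy values.

```python
# A rough sketch of assembling the pseudo-root prompt from the Umbrella cards.
# The wording mirrors the example above; the product details are dummy values.
TIME = "From now on and forever"
PERSONA = "you are the world's most thorough user researcher and summariser"
CONTEXT = "who is working on user research for Product X, a new tool for X professionals"
FEED = (
    "I will be providing you with a transcript from a user interview about Product X. "
    "The transcript will be long, so you will be interpreting the ENTIRETY of the "
    "transcript, not just the beginning and end."
)
SET_SCENE = (
    "From now on, please note the following information before you respond to my "
    "questions: In the interview, the user was shown a mockup or prototype of Product X "
    "on the computer screen and asked for their reactions and opinions."
)

def build_root_prompt(transcript: str) -> str:
    """Combine Time + Persona + Feed + Set Scene into a single pseudo-root prompt."""
    return f"{TIME}, {PERSONA} {CONTEXT}. {FEED}\n\n{transcript}\n\n{SET_SCENE}"

root_prompt = build_root_prompt("(transcript text goes here)")
print(root_prompt[:120])
```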
Prompt Deck: Prepping Prompts

Prepping Prompts ensure that, while performing the required task, the LLM behaves in the way I want. Without guardrails, LLMs tend to ad-lib and revert to their basic behaviour of interpreting the text given to them in an unstructured way. While ad-lib behaviour can be useful, here I need much more structure. I also need the model to try its best to come up with accurate responses, and an easy way for me to check their accuracy.
Few-Shot

A highly useful prompt pattern where instead of explaining what I want the model to produce, I simply give it several examples of the types of outputs I’m after. LLMs excel at spotting patterns, so just a couple of few-shot examples should be sufficient.
This prompt goes hand-in-hand with…
Template

If I require the LLM to come up with a very specific answer format, I can use the template prompt pattern. It’s handy to give this prompt a recognisable name, so that the LLM can remember it later.
…and it also goes with…
Chain of Thought

A type of pattern that has been shown to be very effective in increasing the LLM’s accuracy. Simply, if I ask the LLM to explain why it’s giving me the answer it’s giving (for example, by including real user quotes from the transcript), it’s less likely to make stuff up. And if it does, I can spot the mistake much more easily.
Example of a Few-Shot, Template, Chain of Thought prompt:
“I will now give you some TEMPLATES to use for a question and an answer. Acknowledge that you have understood the format of the TEMPLATES, but do not yet analyse the transcript. The templates must include the relevant user quotes from the transcript per each bullet point.
TEMPLATE 1
My question: Does the user want to incorporate Product X into their workstream? If so, why?
Your answer:
REASONING —
- The user says that they would give it a go. “Relevant user quote from transcript”.
- The user is also unsure about Use Case 4. “Relevant user quote from transcript”.
- The user has concerns about the specificity of product feature 2. “Relevant user quote from transcript”.
ANSWER — MAYBE. Short explanation of answer.”
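A prompt like this can also be generated rather than typed out each time, which makes the structure easier to reuse on the next project. The sketch below is only an illustration, built around the dummy example above.

```python
# Sketch: generating the Few-Shot + Template + Chain of Thought prompt from a
# list of worked examples, so the structure can be reused on the next project.
EXAMPLES = [
    {
        "question": "Does the user want to incorporate Product X into their workstream? If so, why?",
        "reasoning": [
            'The user says that they would give it a go. "Relevant user quote from transcript".',
            'The user is also unsure about Use Case 4. "Relevant user quote from transcript".',
        ],
        "answer": "MAYBE. Short explanation of answer.",
    },
]

def build_template_prompt(examples: list[dict]) -> str:
    """Build the instruction plus numbered TEMPLATE blocks from few-shot examples."""
    parts = [
        "I will now give you some TEMPLATES to use for a question and an answer. "
        "Acknowledge that you have understood the format of the TEMPLATES, but do not "
        "yet analyse the transcript. The templates must include the relevant user "
        "quotes from the transcript per each bullet point."
    ]
    for i, example in enumerate(examples, start=1):
        reasoning = "\n".join(f"- {point}" for point in example["reasoning"])
        parts.append(
            f"TEMPLATE {i}\n"
            f"My question: {example['question']}\n"
            f"Your answer:\n"
            f"REASONING —\n{reasoning}\n"
            f"ANSWER — {example['answer']}"
        )
    return "\n\n".join(parts)

print(build_template_prompt(EXAMPLES))
```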
Here’s one more Prepping Prompt pattern, for good measure…
Ask for Input

An LLM is a bit like an enthusiastic but inexperienced personal assistant. Left unchecked, it will jump in with what it believes to be a helpful answer, even when you don’t want it to. Ask for Input is a handy prompt type that prevents the model from jumping ahead.
Examples:
“Please await my instructions before responding.”
“You have to wait until all parts (of the dataset) have been provided and I say, ‘That’s all’.” (^^)
(^^) Borrowed prompt pattern (blue star reference) — see Resources section
Step 4 — Harvest the LLM

At last, the LLM had everything it needed to proceed. The next step was for me to harvest it — that is, to get it started with the actual task of answering questions about the interview transcript.
To create the questions, I revisited the original research hypotheses about value, usability and trust, broke these down into more granular, bite-sized questions, and asked the model the questions one by one. The model proceeded to generate its answers.
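My work happened in a chat UI, so there was no code involved. But if you were running the same question-by-question loop against Anthropic’s public Messages API, a simplified sketch might look like this; the model name, the questions and the single-turn structure are my own assumptions, not details of the internal console.

```python
# Purely illustrative: a question-by-question harvesting loop against
# Anthropic's public Messages API. Model name, questions and single-turn
# structure are assumptions, not the internal console's internals.
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

root_prompt = "(the pseudo-root prompt from Step 3, including the transcript)"

QUESTIONS = [
    "Does the user perceive the value of Use Case 1? If so, why?",
    "Does the user want to incorporate Product X into their workstream? If so, why?",
    "What concerns, if any, does the user raise about trusting Product X?",
]

answers = {}
for question in QUESTIONS:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # assumed model for the sketch
        max_tokens=1024,
        system=root_prompt,  # Umbrella and Prepping prompts supplied as the system prompt
        messages=[{"role": "user", "content": question}],
    )
    answers[question] = response.content[0].text
```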

Step 5(ish) — Re-rail and catch errors

LLMs hallucinate. Ungrounded LLMs hallucinate much more than grounded ones. How could I best spot and correct these? Enter the Correcting Prompts deck.
Prompt Deck: Correcting Prompts
For when the LLM goes off the rails or makes stuff up.
Re-rail

Sometimes the model forgets its instructions and reverts to ad-libbing. Thankfully, a simple reminder should get it back on track. If the model has forgotten to use a template I’d provided, I can simply mention the template by name — the model should then remember its instructions. Other re-rails include reminding the model to include real (and not made up) user quotes from the transcript.
Example:
“From now on, in all your answers, include the relevant user quotes from the transcript per each bullet point. With this in mind, please answer the above question again.”
Catch Errors

This prompt pattern is best applied immediately after getting an incorrect LLM response. When challenged, the model usually realises that its previous answer is wrong and rewrites it accordingly. Occasionally, it can get stuck in a hallucination loop, unable to get out — then, it’s best to ask it to disregard the entire conversation branch and try again.
It is much easier for me to catch errors because of the previously applied Chain of Thought prompt which forces the model to declare how it has arrived at a particular answer.
Example:
“How does the user quote “So I see it being a bit. A more reliable version of Product Y, basically.” support your statement that “The user believes Use Case 3 is where Product X can provide significant time savings”?”
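Alongside the Catch Errors prompt, a simple automated check can flag quoted snippets that never appear in the transcript, so they can be challenged. This is my own illustrative helper, not part of the console.

```python
# Illustrative helper: flag any quoted snippet in an answer that does not
# appear verbatim in the transcript, so it can be challenged with a
# Catch Errors prompt.
import re

def _normalise(text: str) -> str:
    return " ".join(text.split()).lower()

def find_suspect_quotes(llm_answer: str, transcript: str) -> list[str]:
    """Return quoted snippets from the answer that are missing from the transcript."""
    quotes = re.findall(r'"([^"]{15,})"', llm_answer)  # ignore very short fragments
    clean_transcript = _normalise(transcript)
    return [q for q in quotes if _normalise(q) not in clean_transcript]

answer = ('The user believes Use Case 3 saves time. '
          '"So I see it being a bit. A more reliable version of Product Y, basically."')
print(find_suspect_quotes(answer, transcript="(full transcript text goes here)"))
```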
Step 6 — Condense the LLM answers

This step was especially satisfying. After gathering all the model’s answers, I fed them back to the model, gave it a Few-Shot, Template-style prompt, and got it to spit out an A4 summary page in a standardised format.
Example:
“I will now provide you with EXAMPLES of how to shorten a new document. When you receive the EXAMPLES, await my further instructions before proceeding.
EXAMPLE 1
My input:
Does the user perceive the value of Use case 2? If so, why?
ANSWER: (LLM’s previous full answer here)
Your output:
Does the user perceive the value of Use Case 2? If so, why?
SHORTENED ANSWER: YES, but they believe significant tailoring would be required.”
Step 7 — Rinse and repeat…

…and rinse and repeat, one interview transcript at a time. The more transcripts I processed, the quicker it got. With interview 1, I spent several hours refining the prompts. By interview 4, it was mostly just a copy-and-paste type job. The processing time shrank to just over an hour per interview.
Step 8 — Export and use as needed

This was my bingo moment. The condensed LLM output was directly importable into a CSV file. From there, I could do whatever dataviz I needed to illustrate the findings to the stakeholders, or copy the relevant user quotes without having to trawl through thirty pages of transcript.
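As a final illustration, here is roughly how the condensed question-and-answer pairs could be written out with Python’s csv module. The column names and sample rows are invented for the sketch.

```python
# Sketch: writing the condensed question-and-answer pairs out with Python's
# csv module. The column names and sample rows are invented for illustration.
import csv

condensed = [
    {"interview": 1,
     "question": "Does the user perceive the value of Use Case 2? If so, why?",
     "answer": "YES, but they believe significant tailoring would be required."},
    {"interview": 2,
     "question": "Does the user perceive the value of Use Case 2? If so, why?",
     "answer": "MAYBE. Short explanation of answer."},
]

with open("product_x_findings.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["interview", "question", "answer"])
    writer.writeheader()
    writer.writerows(condensed)
```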
And the verdict is…
When I began this process, I had no idea where it would land. But I can report that overall, I did save around 20% of my time by using the LLM instead of doing research analysis the old-school way. With further streamlining, I’m sure I could ramp up the speed even further.
However, I feel that this method may not work so well for all UX research studies. My user interviews had a clear structure, with clear hypothesised outcomes; applying the same method to exploratory, unstructured user interviews, for example, may be less effective. Furthermore, prompt customisation will be required for any new project — and that takes time.
As does checking the output for accuracy. I can vouch that researching with GenAI remains highly reliant on the skills of the researcher, who should be adept at catching errors and ensuring that the process remains robust. For this reason, NN/g recommends that AI is best used by experienced UX professionals, not beginners.
A delightful side-effect of working with an LLM was its role as an antidote to the inevitable researcher bias. I sometimes found that the model would come up with an answer I didn’t expect. Upon checking everything, I would conclude that there was in fact a clear reason for the answer and would agree with the model’s findings.
And finally, give a thought to how GenAI is affecting the world’s energy demand and carbon footprint. GenAI is amazing, but we have not yet figured out how to make it sustainable. When this happens, I will use it for everything from shopping to brushing my teeth, but until then, I may think twice before pressing Send on that console.

Resources
Nielsen Norman Group (n.d.). AI for UX: Getting Started. [online] Available at: https://www.nngroup.com/articles/ai-ux-getting-started/.
White, J. (n.d.). Prompt Engineering for ChatGPT. [online] Available at: https://www.coursera.org/learn/prompt-engineering/ [Accessed 16 Apr. 2024]. (Orange star reference)
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J. and Schmidt, D.C. (2023). A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. arXiv:2302.11382 [cs]. [online] Available at: https://arxiv.org/abs/2302.11382. (Orange star reference)
Raabe, P. (2023). How to Use AI and ChatGPT for Efficient UX Research. [online] Udemy. Available at: https://www.udemy.com/course/ux-research-with-ai-and-chatgpt/ [Accessed 20 Jan. 2024]. (Blue star reference)
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q. and Zhou, D. (2022). Chain of Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903 [cs]. [online] Available at: https://arxiv.org/abs/2201.11903.