5 steps to prototype AI-Driven Products

17 min read · May 10, 2021

Pen and paper, pixels and code, chalks and blackboards — these are all artifacts we tend to associate with design materials since they are all common elements used by designers when trying to find solutions to identified problems. These artifacts are both analog and digital, and their form and role are obvious. Less obvious is Artificial Intelligence (AI), which is becoming more and more prevalent in the machines of today and therefore has to be considered as a design material as well. AI in general, and machine learning in particular, offers several features and opportunities for designers to innovate and improve their designs. In traditional digital designs, users can only interact within the boundaries of what the product can do. In contrast to this, the product in AI-driven experiences adapts to the user’s behaviour (Kliman-Silver et al., 2020, p. 2). In this article, we will discuss and document the highlights of the design process of a User-Centered AI Infused Product: Moviepicker.

1. Define user needs

The first step of the process was to define the problem and then obtain a deeper understanding of it. After a brainstorming session filled with many ideas, we were drawn to one in particular: a solution that helps users decide what movie to watch. The recommendation would be based on what the user likes and delivered in a fast and easy way.

Our concept originated during an exercise where we were told to create a design involving AI under the following constraints, grouped into three dimensions and one technology:

Three dimensions

Discretionary
Personal
High independence

Technology

Natural Language Processing (NLP)

The idea of an app that helps you decide on a movie to watch comes from our personal experiences with streaming services and the need for recommendations for new movies or tv-shows that match our preferences and mood.

The goals we would like to achieve are:

Quickly finding a movie that matches the user’s preferences
Not spending hours browsing through the streaming services

After defining user needs from our own personal experiences, we conducted desk research to get a broader understanding of the context and existing technologies that might fulfil similar goals.

We decided to investigate how Netflix’s recommender system works to gain insights into their implementation of AI and machine learning. Netflix offers a variety of recommendations for its users as well as different categories to choose from. Generally, there are two ways in which AI systems generate recommendations. The first is a content-based approach that bases its recommendations on the user’s previously watched films and series and the items they have shown interest in. The second is a collaborative approach in which all the users of the service are analysed to recommend relevant items to users with similar tastes (Christopher, 2020).

Both the content-based and the collaborative approach could be used in our product for the same purposes as Netflix’s recommendation system. We would isolate the two approaches by making it clear to the user when the recommendations are based on their own preferences and when they are based on other users’ preferences.
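To make the difference between the two approaches concrete, here is a minimal sketch in Python. It is not how Netflix or Moviepicker is implemented: the movie titles, genre tags, user names and scoring rules are all made-up assumptions, chosen only to show that the two approaches can produce different suggestions and that each list can be labelled with its source.

```python
from collections import Counter

# Hypothetical toy data: every genre tag and user name is invented for illustration.
MOVIE_GENRES = {
    "Alien": {"sci-fi", "horror"},
    "Arrival": {"sci-fi", "drama"},
    "Notting Hill": {"romance", "comedy"},
    "Paddington": {"family", "comedy"},
    "The Thing": {"sci-fi", "horror"},
}

LIKES = {
    "bob": {"Alien", "Arrival"},
    "ann": {"Alien", "The Thing", "Paddington"},
    "ida": {"Notting Hill", "Paddington"},
}


def content_based(user):
    """Rank unseen movies by how strongly their genres match the user's own likes."""
    liked = LIKES[user]
    taste = Counter(g for title in liked for g in MOVIE_GENRES[title])
    scores = {
        title: sum(taste[g] for g in genres)
        for title, genres in MOVIE_GENRES.items()
        if title not in liked
    }
    matches = [t for t, s in scores.items() if s > 0]
    return sorted(matches, key=lambda t: (-scores[t], t))


def collaborative(user):
    """Rank unseen movies by likes from other users who share at least one like."""
    liked = LIKES[user]
    votes = Counter()
    for other, other_likes in LIKES.items():
        overlap = len(liked & other_likes)
        if other != user and overlap:
            for title in other_likes - liked:
                votes[title] += overlap
    return sorted(votes, key=lambda t: (-votes[t], t))


def labelled_recommendations(user):
    """Keep the two sources apart so the interface can tell the user which is which."""
    return {
        "Based on your own likes": content_based(user),
        "Liked by people with similar taste": collaborative(user),
    }
```

With this toy data, `content_based("bob")` returns `["The Thing"]`, while `collaborative("bob")` returns `["Paddington", "The Thing"]`, because Ann shares one like with Bob and also likes Paddington.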

Our desk research also turned up mobile applications with similar qualities, such as “Taste — Movie & TV Recommendations” and “Dinggo!”, both of which you can find in the App Store and on Google Play. These two applications were great examples of how to implement the content-based approach for recommendations, and Dinggo! also had a social aspect, where it could suggest movies to different social groups of people.

We argue that the social part of sharing your movie recommendations with your friends also plays a big part in what you decide to watch and therefore we have added another goal for our design:

Seek inspiration from friends/connections to choose a movie

We came up with our concept, a design that helps you choose a movie, before conducting any desk research, so many of its qualities are similar to existing products. On the other hand, the social aspect of our design comes from a social media perspective rather than the collaborative angle seen in Dinggo!, where you have to choose a movie together. By ‘social media perspective’ we mean that you can get inspiration from your friends by seeing their favourite movies and perhaps how many of your friends have that same movie as one of their favourites. The social aspect in our design is not directly related to a social gathering but rather to seeking inspiration from your fellow movie enthusiasts. Therefore our concept can primarily be categorised as personal, but with a slightly social add-on.

2. Mapping & use scenario

We followed the design sprint methodology to map out our learnings. We decided to create a User Journey map where we captured:

the steps users take while deciding on what to watch,
more detailed needs connected with each step,
pains,
and “How Might We” questions that helped us reframe the issues as opportunities.

Then, we voted on our “How Might We” questions using star stamps to choose the best opportunities for further iteration.

Once we had built an understanding of the problem and chosen the focus area, we carried out further desk research to find inspiration and other products with similar functions.

With this first data about our users and what they want, need, and expect, our next step was to build a use scenario.

Proposed framework for classifying AI-driven experience

To help us map our product and come up with one use scenario, we used the “Proposed framework for classifying AI-driven experience” from Kliman-Silver et al. (2020). The framework proposes three dimensions for classifying AI-driven experiences. It highlights the importance of picking a method that can capture feedback based on user-provided data.

Moviepicker is a more dependent AI system as it requires the users’ interactions to trigger the capability of the system. Although it contains social aspects, it is primarily personal because each user’s recommendations are based on their individual inputs.

The nature of the experience with Moviepicker is discretionary, as it requires the user to consciously opt in. Even though the app then creates one or more suggestions, it is still up to the user to actually put on the suggested movie; in other words, it is not an always-on experience. It is a more traditional and passive tool. It will not interfere in the users’ tasks but simply offer them a helping hand. At the end of the day, the task (watching a movie) will only be completed if the user wants it to be, regardless of how many recommendations Moviepicker brings forth.

It is important to consider the potential unintended consequences of AI-driven systems that may have a different agenda (i.e. set by the creator) than the users do (Kliman-Silver et al., 2020, p. 3).

Ethical concerns may also arise. With Moviepicker this could be the case if, for example, movie companies bought their way in so that their movies were suggested more often than others.

If the system explains how it makes its decisions, the user is more likely to trust it. Moviepicker will gain the users’ trust by making it transparent how the suggestion algorithms work: picking movies you like or don’t like, and liking your friends’ movies, is what drives the AI in Moviepicker. This will be much clearer to the users, as they are in a way making the choices themselves and seeing where their choices take them.
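One simple way to think about this kind of transparency is to attach a plain-language reason to every suggestion. The sketch below is only an illustration under our own assumptions: the function name, its parameters and the wording of the explanation are hypothetical and were not part of the prototype.

```python
def explain_suggestion(title, movie_genres, liked_genres, friends_who_like):
    """Build a short explanation of why a movie is being suggested.

    movie_genres and liked_genres are sets of genre tags; friends_who_like is
    a list of friend names. All of these inputs are hypothetical.
    """
    reasons = []
    shared = sorted(movie_genres & liked_genres)
    if shared:
        reasons.append("you liked other " + ", ".join(shared) + " movies")
    if friends_who_like:
        count = len(friends_who_like)
        reasons.append(f"{count} of your friends marked it as a favourite")
    if not reasons:
        # Nothing in the user's history explains the pick, so say so openly.
        return f"{title}: a random pick for today"
    return f"{title}: suggested because " + " and ".join(reasons)
```

For example, `explain_suggestion("The Thing", {"sci-fi", "horror"}, {"sci-fi", "drama"}, ["ann"])` gives "The Thing: suggested because you liked other sci-fi movies and 1 of your friends marked it as a favourite".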

We decided to use the dimensions high independence, discretionary and personal. In this context we made the following scenario:

It’s Sunday evening and you’re sitting on your couch not knowing what to watch on your television. You feel overwhelmed by the number of movies to choose from; you have looked through all the IMDb ratings and still can’t decide on a movie to watch. All of a sudden you remember this app someone told you about, where you communicate through speech and which uses NLP to help you decide which movie to watch depending on your mood, interests, recently watched movies, and so on.

3. Designing our product and prototype

Time to sketch

We started out with some lightning demos where each one of us shared results from the desk research. We were looking at different industries, as well as potential competitors to get inspiration while sketching.

To push our creativity, we followed the 4-step-sketching framework:

Step 1: Reflect. We silently looked at everything we had done so far and gathered notes.

Step 2: Ideas. We privately jotted down some rough ideas and circled the most promising ones.

Step 3: Crazy 8s. We sketched variations of our best ideas in 8 frames. We had one minute per sketch.

Step 4: Solution sketch. We sketched our ideas for the solution.

Our goal was to push ourselves further from our first ideas and generate a wide variety of solutions. Afterward, we voted on the solutions we liked the most and decided which ones to take forward into prototyping and testing.

The app should contain the following functionalities: show today’s recommendations, show more personalised recommendations, show what the user’s friends are watching, and give the user the option of “today’s random”, which picks a random movie for the user. Moreover, we decided not to use the NLP technology as we originally envisioned in our scenario.

Storyboarding

To get even more clarity on our final user scenario, we decided to create a storyboard. For this we made a persona, “Bob”, to represent the users who might use our product and in that way create a realistic representation of our key segments. Below is our completed storyboard:

It’s Sunday evening and Bob is sitting on his couch not knowing what to watch on his television. Bob feels overwhelmed by the number of movies to choose from; he has looked through all the IMDb ratings and still can’t decide on a movie to watch.
All of a sudden he remembers this app someone told him about, which uses ML to help him decide which movie to watch depending on his mood, interests, recently watched movies, and so on.
Bob enters the app and is met by today’s recommended movie and, underneath, a menu with two other options: More personalised recommendations & What my friends are watching.
Bob browses through what his friends are watching right now, but he still can’t decide what movie to watch.
He jumps to the personalised recommendations, but he is still not sure what he is in the mood for. Therefore, Bob decides to see what the app might suggest if he chooses the “Give me a random!” category. Bob is given 3 suggestions to choose from. He tries again and decides to choose the 3rd recommendation.
Bob opens his streaming service and has a lot of fun watching the recommended movie. The end.

Now the prototyping part — the first visual impression of our product.

We stuck to a low-fidelity, interactive prototype in order to:

Save time on preparing a static prototype and spend it on exploring designs instead
Make design changes more easily during our test

So, which tool should you use?

MOVIEPICKER low fidelity wireframes in Figma

We built ours in Figma to get a more detailed prototype, but you can use other rapid prototyping tools or go the old-school way: pen and paper.

At this point, you can bring the prototype to your user group for testing, see how they react, fix what does not work, and create a new prototype.

4. User testing

In the following, we will briefly explain the Wizard of Oz test method, followed by the constraints we met in the process of defining the test procedure and the Wizard’s actions. Finally, we will describe the final test procedure as well as the test results.

Wizard of Oz

The Wizard of Oz is a method where users interact with a system controlled by an unseen human. The “wizard” (the UX designer) controls the user’s screen from another room. When the user clicks, the “wizard” decides the next step and shows that page to the user. It is a good way to explore how users experience a responsive system before spending a lot of resources and time on building that system (Maulsby et al., 1993).

Constraints

As previously stated, we had difficulties defining the test procedure and the Wizard’s actions when designing the user testing.

Firstly, we struggled with how to incorporate the AI functionality in the prototype. We came to the conclusion that simulating the AI would require us to gather knowledge about the user’s taste in movies. This could be done either beforehand or during the test. Before the test, we could use a questionnaire or have an interview. During the test, we could add an initial step to the test procedure, where users would be given a couple of movies to like or dislike. For the latter option, we could have utilised the Tinder-inspired concept on the main screen of the prototype, which would also showcase one of the ways the AI model is trained.

Secondly, we felt that the prototype limited the Wizard’s actions. For example, it was quite difficult to create a new screen on the spot. We suspect this would have been easier with a static prototype because the human acts as the “computer” and has complete control.
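Difficulties like these suggest scripting the Wizard’s responses before a session. The sketch below shows one possible way to do that with a small lookup table in Python. It is not what we used during our tests; the screen names and actions are hypothetical, loosely modelled on the menu in our storyboard.

```python
# Hypothetical screen and action names, loosely modelled on the storyboard menu.
WIZARD_RULES = {
    ("home", "tap_personalised"): "personalised",
    ("home", "tap_friends"): "friends",
    ("home", "tap_random"): "random",
    ("random", "tap_again"): "random",
    ("personalised", "tap_back"): "home",
    ("friends", "tap_back"): "home",
    ("random", "tap_back"): "home",
}


def next_screen(current, action, unscripted_log):
    """Look up the screen the Wizard should show next.

    If no rule covers the situation, stay on the current screen and record
    the gap, so it can be turned into a new rule before the next session.
    """
    target = WIZARD_RULES.get((current, action))
    if target is None:
        unscripted_log.append((current, action))
        return current
    return target
```

A table like this does not remove the need to improvise, but the log of uncovered situations shows where the Wizard had no prepared answer.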

Test procedure

We introduced the concept of the product to the users with a short explanation followed by a simple test procedure based on 3 simple tasks.

To examine the users’ willingness to adopt the concept, we asked them 4–7 questions about Perceived Usefulness and Sense of Comfort. According to Kliman-Silver et al. (2020), this gives us the opportunity to learn about the extent to which the product meets the users’ needs, as well as how likely they are to adopt it. The product might be perceived as useful, but the users might not feel comfortable with it.

Intro: “Imagine an app that automatically suggests movies for you, based on what you have previously watched.”

Tasks:

How would you go about finding a movie suggestion?
Find a movie based on/inspired by your friends’ recent watchlist and recommendations.
Do you see another way of finding a movie suggestion?
Choose the random solution to get today’s movie
If you would like more random solutions for today’s movie where will you go next?

“Now that you have tried to use the app what are you thinking and feeling about the following”:

Perceived usefulness:

Could you imagine using the app in the future?
If you want, please elaborate on your previous answer.
How willing are you to adopt this app in your everyday life?

Sense of comfort:

Do you feel comfortable with the features?
If ‘no’ what features seemed uncomfortable?
Do you have any concerns about using the app?
If ‘yes’ please elaborate.

MOVIEPICKER — Wizard of Oz test recording

Test results

User 1: Søren, 39 years old.

Søren uses streaming services on a weekly basis. He is often looking for recommendations and inspiration for movies and tv-shows to watch. Tv-shows are a common topic of conversation when he is with his friends. They regularly share what they have watched recently and recommend to one another what to see next. Still, he often finds himself browsing to find the movie he is in the mood for.

Søren could easily see himself using the Moviepicker in the future. He points out that it is a common problem that there is so much to choose from, that it becomes unmanageable to decide on a movie to watch. He prefers a recommendation from a friend but likes the idea of a clever random show-picker, especially when you have been browsing for too long.

Søren feels generally comfortable with the features but points out that if the app knows what he is watching, he assumes that he has to sign up in order to let the app track his preferences. In other words, Søren is fully aware that he is giving a robot access to data about him. The social aspect of the app seems a bit uncomfortable to Søren: he is not always interested in letting his friends know about his “guilty pleasures” when it comes to movie preferences. His biggest concern about the app is therefore that his friends will find out about his chick-flick cravings.

User 2: Ulla, 61 years old.

Ulla rarely uses streaming services and primarily watches linear TV (“flow TV”), i.e. channels from the Danish Broadcasting Corporation (DR) or TV2.

Still, she saw great potential in the Moviepicker, because for her it would mean that she could get better inspiration for what to watch on a Saturday evening, instead of just turning on the television as the default choice. However, when testing the prototype she experienced a couple of hiccups.

Generally, Ulla seemed a bit confused about the navigation flow of the prototype. She was not quite sure which buttons were interactive.

Another drawback for her was the placeholder movie titles we had inserted in order to make the prototype more realistic. As these movies were so far from her actual taste, she struggled to grasp the concept that the prototype was supposed to illustrate.

Though she struggled a bit with the navigation and fully understanding the idea, she still liked the concept and the look of the prototype, as it reminded her of the familiar interface of Netflix.

Ulla’s biggest complaints were about the ‘Friends’ screen, which she believed needed a little more context and refinement in order to be useful. For example, she suggested adding how many friends the recommendations were based on to contextualise the number inside the heart icon.

5. Analysis and Reflection

MOVIEPICKER high fidelity wireframes in Figma

To analyse the test results, we again take Kliman-Silver et al. (2020) as our starting point, specifically their proposed framework for getting users on board with new AI experiences. The framework presents two important factors to uncover: perceived usefulness and sense of comfort.

In terms of perceived usefulness, both test users liked the idea and concept of a Moviepicker app that can help you decide on a movie to watch instead of having to either browse through all the streaming services for interesting content or resort to just watching linear TV.

In terms of sense of comfort, neither of the users expressed a lack of comfort when testing the prototype. This might be due to the fact that streaming services and recommender systems such as the one found in Netflix have become increasingly widespread and thereby have normalised this kind of artificial intelligence.

Therefore, based on our two tests our app satisfies the two important factors of Kliman-Silver et al. However, the tests also uncovered some important concerns that would have to be addressed in a future iteration.

For example, Søren expressed concerns about privacy: he would feel uncomfortable sharing his entire viewing history with his friends. This lack of privacy could result in him abandoning the app because he would be too concerned about his friends’ judgment of his ‘guilty pleasures’. One way to strike a better balance between privacy and voluntary sharing of movie suggestions is to take inspiration from Spotify. The Swedish music streaming service has a similar feature where you can see what your friends are listening to. When this feature was first launched it was a mandatory part of the experience, but now it is possible to disable it for yourself. Similarly, we can let users opt out of the sharing feature entirely if they only want personalised movie suggestions. Another option is to let the AI ask the user, for each movie, whether they want to share it or not. Over time the AI will learn what kinds of movies the user does not want to share and proactively disable sharing for those movies. The user can later manually enable it again. Furthermore, as we mentioned earlier, we were inspired by social media such as Instagram, where the user can choose what to share but can also keep posts, or in this case movies, saved on their own private list. This way the user is in total control of what other users see and can still maintain their privacy.
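To illustrate how the opt-out and the per-movie question could work together, here is a minimal Python sketch. It replaces the "learning" with a plain counting rule per genre, which is our own simplifying assumption; the class name, the `min_answers` threshold and the genre labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SharingPreferences:
    """Track whether a user wants their movies shown to friends."""

    sharing_enabled: bool = True  # global opt-out switch
    min_answers: int = 3          # hypothetical: answers needed before the app stops asking
    answers: dict = field(default_factory=dict)  # genre -> list of past yes/no answers

    def record_answer(self, genre, share):
        """Store the user's answer to 'do you want to share this movie?'."""
        self.answers.setdefault(genre, []).append(share)

    def reset(self, genre):
        """Let the user manually re-enable the question for a genre."""
        self.answers.pop(genre, None)

    def decision(self, genre):
        """Return 'share', 'private', or 'ask' for a movie of the given genre."""
        if not self.sharing_enabled:
            return "private"
        history = self.answers.get(genre, [])
        if len(history) < self.min_answers:
            return "ask"
        if not any(history):
            return "private"  # the user has always said no for this genre
        if all(history):
            return "share"    # the user has always said yes for this genre
        return "ask"          # mixed answers: keep asking
```

In this sketch, a user who declines to share three romance movies in a row would no longer be asked about romance movies, and calling `reset("romance")` brings the question back.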

YouTube video showing the concept of MOVIEPICKER in action

The biggest takeaways from the process of this project are the importance of doing desk research early and how hard it is to test AI-infused products.

Firstly, in future projects, we will focus more on exploring what existing apps with a similar concept are already out there. If the timeframe for this project had been longer, we would have preferred to research existing apps earlier in the process. The structure of our research meant that our final app was not as innovative as we could have wished, as we later discovered several existing apps with the same motivation and a similar concept.

Secondly, we discovered that testing AI-infused products with the Wizard of Oz method requires more planning and preparation than anticipated. The project showed how important it is to have a good understanding of the behaviour of the AI and, based on this, to come up with a defined set of rules for the Wizard to follow in order to simulate the AI properly. We also experienced the hardships of trying to do Wizard of Oz testing with an interactive prototype. As previously stated, we believe that a static prototype would provide the Wizard with the necessary control and freedom to change the user experience during tests to best simulate the AI.

Generally, we would have liked to have done more user tests to get more perspectives on our app. We would also have liked to explore the audience of the app further. In the development of the app, we have taken much inspiration from the social media apps of today, and thereby the app is naturally more geared towards younger people and specifically people of our own age. However, the test with Ulla, who is outside of this age group, showed that she struggled quite a bit with the layout and navigation flow. Therefore, it could be interesting to investigate the usability of the app across different age groups, to see whether Ulla’s confusion is common among people her age or just a one-off.

In conclusion, it is fair to say that the process of this project had some bumps along the way, but through this, we have gained further knowledge of how to design and test AI-infused products.

Bibliography

Christopher, Albert. (2020) How Netflix Uses AI For Better Content Recommendation. Retrieved on the 25th of April 2021 on: https://albertchristopherr.medium.com/how-netflix-uses-ai-for-better-content-recommendation-e1423784ef4
Kliman-Silver, C., Siy, O., Awadalla, K., Lentz, A., Convertino, G., & Churchill, E. (2020). Adapting user experience research methods for AI-driven experiences. Conference on Human Factors in Computing Systems — Proceedings, 1–8.
Maulsby, D., Greenberg, S., & Mander, R. (1993). Prototyping an intelligent agent through Wizard of Oz. Conference on Human Factors in Computing Systems — Proceedings, 277–284
nngroup.com. (2016). UX Prototypes: Low Fidelity vs. High Fidelity. https://www.nngroup.com/articles/ux-prototype-hi-lo-fidelity/

Authors

Cecillie Anastassia Vogtmann Pagh
Ann-Kristin Løye Hejl
Alberte Emilie Christensen
Mai An Krause Bernt
Ida Mørck Jørgensen
Sandra Wdowiak