EPISODE 1705 [INTRODUCTION] [00:00:00] ANNOUNCER: Inworld is a company that provides tools for game studios to add AI-driven gameplay. They are at the leading edge of using generative AI in game development, and have worked with companies such as Xbox, Ubisoft and NVIDIA. Igor Poletaev is the VP of AI, and Nathan Yu is the Director of Product and GM of Labs at Inworld. They join the show to talk about using AI in game development. Gregor Vand is a security-focused technologist and is the founder and CTO of Mailpass. Previously, Gregor was a CTO across cyber security, cyber insurance and general software engineering companies. He has been based in Asia Pacific for almost a decade and can be found via his profile at vand.hk. [INTERVIEW] [00:00:56] GV: Hi, Nathan and Igor. Welcome to Software Engineering Daily. [00:00:59] NY: Hey, Gregor. Thanks for having us on. [00:01:02] IP: Hey. [00:01:03] GV: It's always a fun change to have two guests on the show today. And we've got yourselves Nathan and Igor from Inworld. Very exciting company to hear about. You are the GM of Labs and VP of AI respectively. I'd love to hear from both of you, how did you get into this sort of area at all to begin with? This is the intersection of gaming and AI. I'd love to hear from each of you. How did you find yourselves at Inworld? [00:01:30] NY: Yeah. It's been a fun adventure. I joined Inworld in February 2022 now. A little over two years ago. And at the time, I was working at Microsoft, leading product for mixed reality teams. And I stumbled across Inworld actually on a LinkedIn job post and thought, "Hey, this is interesting. They're at the intersection of AI and metaverse applications," which are two very strong areas of interest for myself. And so, I learned more about the product. Got to know the team. This is pre-ChatGPT. And so, it was my first time looking at generative AI GPT-3 technology. And super mind-blowing.
And I thought, "This is absolutely the next technology that's going to be revolutionary." Right? Applications populating the worlds of metaverse applications, applications in education, healthcare, everything. And so, that convinced me to join the team. And shortly thereafter, we found great product-market fit in games, which we can kind of dive into later. [00:02:24] IP: For me, the journey started the year after Nathan joined. I think it was also February, 2023. Before that, I'd been at Google for almost six years, going through the full cycle of a software engineering career, starting as just a regular engineer and ending up doing some technical lead manager work at Google's Dialogflow. It's a Google Cloud product that offers contact center automation, where I'd actually been working together with some of the Inworld founders. And that's how I got into Inworld. I was extremely excited about all the infinite avenues that generative AI opens up for gaming in particular. I've been a gamer myself since childhood. And I saw it as an opportunity I had no option but to take, so I joined Inworld. Yeah. And since then, I've been doing a lot of different things. And, yeah, super excited to talk about it today. [00:03:19] GV: Yeah. Let's hear a bit more about Inworld. Appreciate - you're not the founders of the company. However, it's still a fairly sort of early-stage company. What was the genesis of Inworld at all? And where did Inworld kind of start from? [00:03:36] NY: From my perspective, early days, Ilya, Mike, Kylan, they were working on Dialogflow, which is a large enterprise conversational AI technology. And, obviously, with the rise of these generative AI capabilities, they found, "Hey, this is likely the future." As opposed to rules-based systems. I know Igor can talk a lot more to that. And started playing a bit more with the idea of how to populate virtual worlds. Applications in gaming as well. And decided to spin off and create a startup.
And so, I think it was just a ripe time with the trend of a rising technology and also a clear demand to disrupt the gaming entertainment industry with gameplay mechanics that have been stagnant for years at this point. And so, I think it was just like a really great time for the inception of Inworld. [00:04:28] GV: Yeah. I think that makes a lot of sense. Or, I say it makes a lot of sense, but would I have been the one to start Inworld? Of course not. But with 20/20 hindsight, it makes a lot of sense. Let's dive into a bit more, what is Inworld. Inworld can be confused with being a game studio. And that's not the case. Let's sort of clear that up right at the start and sort of hear more about what Inworld is. What does Inworld do at a high level? [00:04:56] NY: From a high level, Inworld, we're looking to be the leading AI provider for games as a whole. Now what exactly does that mean? I'll clear it up: first and foremost, we're not a game studio. You might have seen some technical demos produced by Inworld just to demonstrate to the world what reference use cases and ideas might be possible. But we're absolutely not a game studio. Nor do we have intentions to be one. But what we do provide are tools to help game studios make more immersive games with this AI technology. And so, there are three core components of our product: the Inworld Studio, the Inworld Engine and then Inworld Core. And you could think of the Studio as a set of design-time tools that help creators and developers iterate through narratives, iterate through character definition, gameplay loops and progression in a faster, more productive manner. And then the Inworld Engine is the runtime. How do we power unique novel AI mechanics during gameplay itself? Depending on how a player interacts with the game, the world and the characters around them are dynamic.
They're reactive based off of the player behaviors and can provide more personalized contextual actions and conversation to those gamers themselves. And so, you could think of it as an orchestration platform at the end of the day, where we combine a hybrid of large language models, small models, a whole suite of over 30-plus, maybe 40-plus at this point, different models that go into perception, cognition, reasoning and then action generation, a full set of tools that developers leverage to create the next generation of games. [00:06:28] IP: Yeah. As Nathan said, I think, in traditional terms, it's interesting to think about Inworld as basically a business-to-business platform. We are offering software for the game studios to, first of all, use the Studio and its APIs to design the narrative, design the characters. Basically, you can think about it as designing a software package, which you then, using the Inworld Engine, deploy in the environment of your liking. Whether it's Unreal, Unity, Web SDK, anything. And the third thing that Nathan mentioned, Inworld Core, is what we've recently announced as our offering to customize the different things that we provide to the customers, in a way that if you need it to be deployed on-premise, or if you'd like it to be deployed on our hardware as a dedicated deployment, we can do that for you. Not only that. But, also, as part of Inworld Core, we offer customizing the actual machine learning pipeline. Namely, if, for example, your game experience needs new voices, which do not exist in the set of voices that Inworld offers, or you need a customized large language model for a specific language, and/or even if you need to run it on-device, we can do that for you. And that's what Inworld Core is about. Yeah, Engine is runtime. Studio is design time. And Core is the way to customize both things and run them such that you can make next-generation experiences. [00:08:03] GV: Okay. We got three main parts to the product.
What was the kind of order of events there in terms of which came first and why? [00:08:11] NY: I would say the Studio and Engine kind of play hand-in-hand, right? If you think about just the basic use case of powering dynamic conversation with an NPC. Obviously, you have the Engine running at gameplay time with the players. But how do you set up and configure the attributes of that character? Which we can dive into a lot. There's a lot of nuance to that as a new discipline. But that has to be done using the Inworld Studio. Those two kind of go hand-in-hand. But since then, they've evolved independently in many different directions. You could imagine the Inworld Engine now allows opportunities to take in perception. Changes in game state to evolve character behavior based off of the client-side state. And Studio, we recently announced a whole set of design-time tools in partnership with Microsoft where they can be used in their own way that's separate from Engine and runtime behaviors. If you just want to create characters and narrative arcs in the design phase as you're scripting out the game, you can use those tools for that. And one interesting anecdote I actually recall, my first month joining, we had a writer just use Inworld Studio to help her write the next chapter of her book. Just because you could bring your character to life, and she found, "Hey, if I put this character in this context, in this environment, what would this character do?" And sometimes it's right. Sometimes it's not. But it would still give that feedback to the writer that she was able to use and continue ideation for the future. The Studio and the Engine kind of came hand-in-hand. Now they're evolving on their own. And Core is most recent. As we're looking at actually deploying the technology to games, we realized, "Large game studios, sure. They might have access to their own GPU clusters, compute. But other studios may not."
And there's a lot of technical complexity there on how to manage those. Fine-tune the models, customize them, drive down latency, cost, optimizations. And so, that's Igor's whole wheelhouse: how we can kind of do that sustainably and viably. [00:10:10] GV: Yeah. Nice. I like the - at the end of that, accessibility. Effectively, accessibility to the platform. As you call out, larger studios may have the infrastructure. But the indie studios and everything in between actually don't. And it's also the - as you call out, it's the know-how and understanding how to deploy this in the right way. And, again, I would assume that Inworld is the expert in their own technology in that way. Makes a ton of sense. Something you just touched on with that anecdote, Nathan, about a writer using Inworld actually for a book. That's super interesting. And kind of segues into - if we just contextualize Inworld before we dive into sort of all the details and other anecdotes, et cetera, if we just contextualize Inworld with other LLMs. I assume most listeners out there are fairly familiar with at least one at this point. Let's call it ChatGPT. What are the similarities and differences at this point? How have you approached also some of the classic problems? There's maybe, again, a listener thinking, "Yeah. Well, I've used an LLM. I like bits of it. I also have a ton of kind of concerns about other aspects of it. And now that's just coming into my games?" What would you sort of say to that? [00:11:21] IP: Developers really want the characters to deliver nuanced performances. What that entails is that not only do you have to provide the dialogue, and it has to be coherent with the narrative, but, also, the characters have to be capable of expressing emotions. They have to remember things, right? They have to have some memory. They have to understand what happens around them.
They have to make sure that whatever they say is, as I said, not only coherent with the narrative, but they also don't break immersion by saying something wrong and absolutely inappropriate. That's actually where the name is coming from, Inworld. We believe that keeping the characters in world is what makes the experiences immersive. And under the hood, you can think about these pieces that, as I said, like memory, cognition, emotions, dialogue, all of the different pieces. And at the moment, the technology is such that you can't really model everything using just a large language model. The way we think about large language models is as just a piece of the pipeline. It's not the only part of it. To expand more here, in more traditional terms, you can think about the runtime pipeline Inworld has like this: you have the perception system that recognizes what the player is talking about. It also includes additional safety checks. It also includes additional checks of what the player has been talking about before. Whether it makes sense to actually use the LLM at that moment or not. Because, you can imagine, with LLMs being very heavy and greedy for GPU resources, it might be pretty expensive to use a large language model for everyone and for every single query. That's kind of the front-end. And then the back-end is, first, you need to gather all of the missing information for the character to be able to respond. That's where retrieval-augmented generation comes into play. RAG, right? And there's a whole separate system that Inworld has created in-house that does knowledge retrieval, and memory retrieval and memory updates. And only then, once that information has been populated in a certain format, does it come to the large language model that makes the decision of how exactly to respond back. And that's where the main differentiation actually is. Just having an LLM is absolutely not enough. You might be able to prototype certain things.
But in order to make the experiences, first of all, real-time, and second of all, immersive, meaning that you do not break the - as we call it - fourth wall, which is, you know, something that people usually refer to as hallucinations. All that has to be built and orchestrated on a pretty large scale. And that's what Inworld does. And I think, as I said, the easiest way to think about it is as a set of different microservices where each and every one of them is responsible for its own certain part of the behavior that the character has to exhibit. [00:14:18] GV: And I would encourage anyone to go check out, on the Inworld website, there's a nice demo of the safety features, where you can really watch end-to-end what happens when you've got these on or off. Could you maybe just speak a bit to that? Because, again, I think that it's an area that's probably classic. I say classic LLM. Again, let's just say ChatGPT or something from Anthropic. They're still figuring this one out a little bit. Inworld is very context-specific in the sense it is designed for gaming. At least there's maybe some kind of guardrails there for how you've approached safety. But I'd love to hear a bit more of how that was approached. And sort of, is it finished effectively? Or is there more to go with that? [00:15:03] NY: There's a lot to unpack here. I'd say, overall, we have our own moderation layer that is similar to gaming standards. You have ESRB ratings and whatnot. At the end of the day, we give these controls to the developers. They're the ones that are responsible for ensuring that the product and the implementation of it is safe. Depending on whether you're creating a kid's education game versus GTA, your needs might be different there. And the results are different as well. Obviously, there are some topics and engagements that we will always forbid. We don't need to state the obvious there. But beyond that, the level of escalation and how that's handled is very dependent on the client.
Let's say we could go up to the point where the developer recognizes something is unsafe. And maybe we say, "Hey, that's not cool. Don't talk about this." And then if the player repeatedly abuses this, the NPC might just walk away. And that implementation is something that needs to be handled by the client. But Inworld will provide the tools to do so. And so, overall, we want it to be similar to a really good deflector. If you go to Disney and you have someone who's trying to incite a representative there to say something unsafe, they're going to handle it gracefully, right? And not just say, "Sorry. I cannot talk about this." And so, those are the tools that we're still constantly evolving. People are very clever in terms of how they try to jailbreak experiences with LLMs and whatnot. It's kind of a constant red team, blue team style iteration there. But we believe that we're ahead of the curve and we'll continue to evolve the tools around this. [00:16:36] GV: Yeah. That makes a lot of sense. It's sort of almost a shared-responsibility model, if you like, yeah, between yourselves and the developers. Exactly. We're going to dive into a little bit more detail on some exact areas, especially around the NPC side. I'd love to just hear, before that, as we've been talking about, this is pretty new territory. You must have hit a ton of challenges to this point. And I think I'd just love to hear a little bit more about what some of those are. And how have they been overcome? [00:17:06] NY: Yeah. I can take a first stab at this. Some of the challenges that come to mind are, obviously, first and foremost, just education and adoption. As with any new technology or any new innovation as a whole, there's always going to be resistance to change and adoption. Large game studios will want to see a clear proof point. Indie developers will want to see large game studios demonstrate these adoptions as well before they go on their own.
There's kind of this chicken-and-egg problem. Everyone can agree today that, "Hey, there's tremendous demand and excitement for what AI could bring to games." Like we see with even Baldur's Gate, the new type of experience where, really, players feel like their actions are taken into consideration and impact the gameplay loops. It's incredible. And they're saying you can never really complete Baldur's Gate, because you could go back and replay it. Infinite replayability is a term there, right? And every time you do something different, it's going to react differently. And so, that's like a basic example of something that generative AI could help accelerate or even push the boundaries of even further. There are times where, even despite the immense amount of optionality and scripts that were provided in that game, sometimes a player would say, "Hey, if only I could say this line of dialogue in this environment set, I would love to see what would happen instead." But, anyway, back to the root question of just reference use cases, right? A big challenge is, well, how do we actually convince studios to prototype with the technology? And so, we ended up having to just do some of it ourselves. With Origins, a technical demo, you'll see, "Hey, we ended up doing that just to educate the gamers and game developers on what might be possible," just as a first standing point there. But that is to say, another big learning is you can't actually just retrofit games with AI either. A first direction we saw is, let's just make mods with Inworld. So we had pretty successful mods with Bannerlord, with Skyrim, and with GTA before that one got taken down. But the truth is, those experiences are cool. Like, hey, you can go up to the Whiterun guard, ask them about the time they took an arrow to the knee. But does it really make a better game? And my opinion is no, it doesn't. It brings some novelty to it. But it doesn't make the game that much better really, right?
And the true magic can only happen when a game is designed from the get-go with the AI in mind. Talking NPCs, that's cool. But it's not sufficient. We've got to evolve the actual core gameplay loops. It's about what could be possible. Such as, "Hey, a shopkeeper is going to help you get an item because you were kind in another village." And they give you a discount on this item that will give you a cheat code for the next mission. Or because you were rude to them when you walked in, they're not going to sell you anything at all. And there's this whole social relationship dynamic. That's just one high-level example. But you've really got to design the game from the get-go with AI in mind to find real power. And that's, again, the big challenge we have right now, is like, "Well, who's willing to adopt this early in the design phase versus waiting to see what other companies might take that leap of faith and provide a reference use case first?" There's that. There's a whole bunch of other stuff we can dive into. But I'll kind of leave it there for now. [00:20:25] IP: In addition to that, I wanted to add that, actually, on the technical side, there are so many different things happening that I can dive deep into. And I can give a few interesting examples here that I hope listeners will be interested in hearing. One of the, in my opinion, most important aspects is how you make the experience immersive and real-time. If we think about what real-time is about, it's actually something that people usually refer to as blink latency. The response of the software that you're interacting with, whether it's a game, whether it's any other thing or tool that you're using, is supposed to come back to you faster than you blink, which is 200 or 300 milliseconds. Right? And the way you make that possible with a ton of different models orchestrated in the cloud is actually a pretty complicated problem.
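To make the blink-latency constraint Igor describes concrete, here is a minimal sketch of a per-stage latency budget for a character pipeline. The stage names and millisecond figures are hypothetical, chosen for illustration; they are not Inworld's actual pipeline or numbers.

```python
# Hypothetical per-stage latency budget for a real-time character pipeline.
# Stage names and millisecond figures are illustrative only.
BLINK_BUDGET_MS = 300

stage_budget_ms = {
    "speech_recognition": 80,   # transcribe the player's utterance
    "safety_and_routing": 20,   # moderation checks, decide if the LLM is needed
    "retrieval": 40,            # fetch memories / world knowledge (RAG)
    "llm_first_token": 120,     # time until the LLM emits its first token
    "tts_first_audio": 40,      # time until the first synthesized audio chunk
}

total = sum(stage_budget_ms.values())
assert total <= BLINK_BUDGET_MS, f"over budget: {total} ms"
print(f"total pipeline latency: {total} ms (budget {BLINK_BUDGET_MS} ms)")
```

The point of budgeting this way is that no single stage can consume the whole window; every model in the orchestration has to fit inside its slice.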
First, to unfold it, you have to make sure that communication between your client, which is usually a game engine with an Inworld SDK built into it, and the cloud is stable and reliable. That's a whole separate set of traditional software engineering challenges of making a reliable service and all that. But behind the scenes, there are also a lot of interesting problems happening on the ML side itself. To be very specific, large language models usually require a lot of GPUs to run them efficiently. Some of the listeners might be familiar with the open-source large language models from, say, Meta. Llama, 70 billion parameters. That one in particular, for example, requires at least four A100-class GPUs to be run effectively. By effectively, I usually refer to the first-token latency of the response. That latency indeed has to happen within that 200-to-300-millisecond time window. People who are talking to characters or seeing them react will see that they are doing that almost instantaneously. To make that possible, there are a lot of interesting infrastructure problems. Basically, ML serving problems that you need to solve first before actually even going to building real experiences. Getting back to this Llama example, if you take four GPUs, for example, they can only handle at most one, or, with worse latency, two concurrent sessions. But what if you've got 10 or 100? You have to properly distribute the load. Your conversation with the characters can have a lot of redundancy, as you can imagine. For example, the conversation history: if you are processing that over and over again, there is a lot of unnecessary computation happening. And that is something that Inworld has been working on recently, optimizing the serving infrastructure to take advantage of the way the client is interacting with our models such that we can provide as fast a response as possible.
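The conversation-history redundancy Igor mentions is commonly attacked with prefix caching: the model's internal state for an unchanged prompt prefix is reused, so only the new turn has to be processed. This is a toy sketch of the idea, not Inworld's implementation; the whitespace "token" count is an invented cost model for illustration.

```python
# Toy illustration of prompt-prefix caching: if a new prompt starts with a
# previously processed prefix, only the new suffix needs to be "encoded".
# The whitespace-token cost model here is invented for illustration.

class PrefixCache:
    def __init__(self):
        self.cached_prefix = ""   # longest prompt we already have state for
        self.tokens_encoded = 0   # running count of encoding work performed

    def encode(self, prompt: str) -> None:
        if prompt.startswith(self.cached_prefix):
            suffix = prompt[len(self.cached_prefix):]  # cache hit: new text only
        else:
            suffix = prompt                            # cache miss: redo everything
        self.tokens_encoded += len(suffix.split())
        self.cached_prefix = prompt

cache = PrefixCache()
history = "Player: hi Guard: halt"
cache.encode(history)                                # full history processed once
cache.encode(history + " Player: who goes there?")   # only the new turn
print(cache.tokens_encoded)
```

Without the cache, the second call would reprocess the whole history; with it, each turn is encoded once, which is why serving systems keep per-session state keyed by the conversation so far.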
That's just LLMs. There is also speech recognition for experiences where the players are inputting information using their microphones. There is also speech synthesis in cases where the characters are talking back, which is a major part of most of our experiences, basically using text-to-speech at the moment. Actually, if you want to, for example, make high-quality speech synthesis, that's yet another giant realm of interesting problems, which we can dive deeper into. To kind of summarize, the real-timeness is a giant set of classical and non-classical problems that the engineers have to solve. On the design-time part, you don't necessarily need to have low latency. For design time, what is critical is quality. And for quality in terms of traditional machine learning, it's again a whole separate story of how you would make sure that you have the best quality possible. And you can already guess that Inworld might have different tiers of models for different use cases. For example, for design-time use cases, you might want to use the largest, the best, the latest and greatest. But for real-time experiences, you need to optimize the models to be smaller. More cost-effective ones, faster ones. And that implies a whole bunch of different challenges in terms of how you manage this many models. How you constantly update them given all the traffic patterns that, by the way, are themselves constantly changing. And so on and so forth. [00:25:10] GV: Very interesting. Super interesting. To contextualize, again, I guess I'm just trying to be the sort of voice of a listener here and trying to help sort of analogize some of this maybe with LLMs that they are already familiar with. If we look at something like the latency side and sort of what could play into that and something that's, I say, similar to an LLM that they're used to. Let's take like token size.
Do you have that kind of same concept where you talked about different tiers, different model sizes? Is there also that concept of, well, there's a sort of max amount of data you can put in for each response coming back? Or how do you look at that? [00:25:51] IP: Yes. The thing that you've been referring to is what is usually called the prompt. Prompt length, or context length. And there exist different types of models supporting different context lengths. But the context length by itself is not that critical. What is important is that your model has to follow the instructions that the prompt contains and utilize the information that is provided within that prompt as precisely as possible. And that's a separate ML challenge, basically. What you have to do for that is not only perhaps create your own data format, input and output data format, but, also, collect the data in order to fine-tune your, say, open-source model to understand the exact information that you're trying to comprehend. If we're talking about the prompt itself, it really varies from experience to experience. If your character has a lot of knowledge about the world that it's placed in, usually, that means that you almost certainly have to put a lot of information into the prompt so it will be able to actually answer questions right. Especially in cases where the players are asking wide and broad questions, where it's barely possible to grasp what exact piece of information might be useful. What you usually end up doing is kind of putting everything into the context, into the prompt, and hoping that it's going to be properly picked up. On the other hand, if you have a very specific question from the user, your RAG pipeline might identify, "Okay, that is exactly what is missing." And, for example, it might be as specific a question as, what is your hometown, or something like that.
And the retrieval pipeline will return you just a string saying, "Okay, the hometown is blah-blah-blah." And that usually is just one English sentence. That means that the prompt length varies. And you can't really hope that it's going to be fixed or something. That means that your system is supposed to be able to work with different types of prompts, different lengths. To summarize that, if you want a better experience, you have to deal with longer prompts. That by itself implies higher latencies. That means that you have to figure out, "Okay, how do I optimize for that?" And that's where these different tiers are actually coming from. There are different models. Different models can process different context lengths. And they have their own, as we call them, throughput-latency characteristics, which are suitable for different types of experiences. And kind of to circle back a little bit to the Inworld Core offering, that's exactly where Inworld helps the customers identify what is the most suitable thing for the experience that they've built. For example, somebody might not even need speech recognition and speech synthesis at all. That means that we can put more emphasis onto the LLM itself. On the other hand, somebody might really need only a speech-to-speech experience. That means that the speech recognition and LLM together have to be as fast as just the LLM in a text-only experience. And that's where this customization comes into play. And that's what we provide as an offering to the customers who don't really know how to do that exactly. We help them figure it out. [00:29:10] GV: Yeah. Wow. That's very cool. Yeah, the quality-versus-latency piece, I find that fascinating. I really value the quality piece. Let's just take a game that I love, like Red Dead 2. And let's just say, to get the quality of those NPCs, I'd be super curious what we'd then be looking at from a latency perspective. I mean, I know this is slightly on the spot.
But if we take an example like that, where it got very high critical acclaim for how immersive it felt with the NPCs and the script, et cetera, is that something that you think is achievable today through Inworld? Or where are we kind of on that journey? Yeah. [00:29:51] NY: Without giving you specific millisecond latencies here, because I think it depends again on the implementation. Certainly, it is possible. You see use cases using Inworld's full stack, highest level of intelligence, still responding in a decent amount of time. There's a lot of clever ways to kind of go about it. Even if you think about the environment we're in, we're recording this podcast, there's actually that natural delay before I started speaking. Because I wasn't sure if Igor was going to speak. I'm still waiting to see if you fully completed your thoughts or whatnot. And there's that same type of cognition that's present in the Inworld stack. Just detecting, "Okay, is it my turn to speak? Did the other person who's speaking finish what they were saying?" And so, obviously, if you go text-to-text, that's the fastest in terms of latency. We don't have this, I guess, processing decision of, "Okay, is it ready for me to speak now?" As soon as I get text back, I can fire the text back out. And so, there's a lot of pre-processing that can be done there. Again, maybe going too far into the technical weeds. But the short answer is, certainly, that latency is possible. There's going to be some work on the client's side to manage the logic behind this. When exactly do we ping the Inworld APIs? What do we pre-process based off of the context that's already given to get the models a headstart and whatnot? Yeah. I'll leave it at that. [00:31:06] GV: Let's dive maybe more into the quality side. And we're still talking quite a lot here about NPCs. Non-playable characters. We have a pretty diverse listener base on the tech side.
I just want to make sure we haven't skipped over that acronym for people. These are characters that you don't play as the player; you're interacting with these characters in whichever world you're in, in whichever game you're in. In your view, what sets AI-generated NPCs apart from traditionally-created ones? On the traditional side, you have a lot of human creativity and talent coming into these characters. And I noticed some stats on the Inworld site saying how AI-generated can be more immersive, more fun. How do you measure that? And how do you sort of look at that? [00:31:58] NY: Great question. Fundamentally, when people think of NPCs, they instinctively, intuitively think primitive. Non-intelligent. Right? You see the TikTok trends where real humans pretend to be NPCs. And they're very robotic and dumb. And it's because, traditionally, in games, they are rather primitive. There's maybe like a set line that they always say regardless of how you interact with them. They're just going to T-pose and stand there. And that's not realistic. When you go into, let's say, a cafe in real life, you're not expecting there to be just unintelligent T-pose robots as baristas there. They need to be aware of you, and your actions impact them. And there are consequence mechanisms there. As well as perhaps benefits that you can derive from that. And so, one area I'd like to draw inspiration from is LARP, live-action role-play. This exists today. And even in games like D&D-style roleplay games, the level of intelligence and outcomes you can get as a player are way more immersive than you get from like a scripted NPC sequence in a traditional RPG. And so, what we're looking to do is basically reimagine the world. And we can touch on some of the potential use cases here. But if every NPC were intelligent, what are the experiences that might be possible now? And I think that's kind of like the vision that we're still crafting.
One thing that you said earlier when you were defining what an NPC is, a non-playable character, actually sparked this thought I had. I was actually going to say, "Okay. Well, technically, what if all characters in a game were playable by real people? What are the experiences that would be possible?" But the reason that I thought that wasn't a good idea is because nobody wants to play the role of an NPC in a game, because it's not fun. Imagine you as a player, okay, you're just a shopkeeper and you're going to sell items to whoever walks in the door. That's not fun. You want to be the hero. You want to be the protagonist. You want to be the gamer. But if you did have real-life gamers in every role, what would be possible? That's the experiences that we started talking about a bit earlier. I don't know why I'm hung up on this shopkeeper idea. But staying within that theme, now you get this nuance. You come in the door, the shopkeeper greets you. Whether you're a level-one peasant, let's say, or a level-99 mage, they can react differently based off of that. Maybe they know you're rich. And maybe they're going to hike up their prices because they have their own agenda. The shopkeeper now wants to feed their own family. And so, they're going to try to bargain and negotiate with you. Do a better sales pitch. And it's really all this like real intelligence that you could bring into an experience to up-level it. And however a developer wants to leverage these tools to create these novel gameplay mechanics is up to them. We have some hypotheses. But, truthfully, we're still ideating on what really is possible now. And a good way to think about it is drawing inspiration from LARP. Just imagine if all characters were human-intelligent, what could you do like from a gaming perspective? If you had tens of thousands of human-intelligent AIs that can help run a game, what is the crazy experience that you would generate? And that's what we're looking to pursue. 
[00:35:07] GV: Yeah. I think that's very good context. I hadn't really maybe thought about it that way either, in the sense of the volume of NPCs you ideally may want to put in a game. But at the end of the day, you can't realistically produce enough content that makes each of them feel individual or interesting enough. Do you have 100 NPCs that kind of all say the same thing, even if the quality of that thing is quite high? Or do you actually say, "No. We're just going to focus on two or three. But then does the player feel that there aren't many people around?" Sidebar, I'd love to be a shopkeeper, by the way. I don't know, any game developers out there that want to create a game where I could be the shopkeeper or the bartender? Yeah, I think I see that as a bit of a cozy gaming thing maybe. [00:35:49] NY: I love that. And just to clarify, it's not just about the scale of the number of NPCs. We see compelling use cases defined from even like a core companion character. Let's say, a classic Cortana-type use case. If you have like an AI-driven companion who can be your assistant and guide you through the game, maybe as you're fighting a novel boss, they could remind you, "Hey, you tried this mechanic or this sequence of attacks last time. Are you sure you want to do this again? Because it didn't turn out so well last time." There's a lot of deep interactions that are now possible even within just like a single character or two. One thought that comes to my mind is we've even heard of like an AI-driven player character where, instead of you being the player directly, you control an NPC. They are now able to generate dialogue that's AI-assisted. But you act as like the director of this player that can commit those actions. And so, we've had a lot of these discussions as we're moving more into like console-type discussions. Because sometimes players don't want to talk. A big realization working in VR in gaming was, "Yeah, hyper-realistic fidelity is great. 
Running around is great. But sometimes people just want to twiddle their thumbs on a controller and sit on the couch." They don't want the full-scale immersion of running around. And the same is true for AI, too. This full-scale version is certainly amazing and incredible. And a lot of the high-end use cases will revolve around that. But even for more casual environments, this technology has a lot of great potential. And so, the notion of being a director and controlling an AI character as like a player character is an interesting concept to toy with. The opportunities are really endless. But, yeah, those are very different angles that can be approached by the developers. [00:37:28] IP: And I just wanted to add a few angles from which you can look at this problem. One practical angle is how to actually make it possible for you to create a character that is capable of talking like a real shopkeeper. Let's stick to this example, because everyone seems to be in love with it. What you need to do at the current moment as a developer is to work together with our studio and provide dialogue examples which you envision. I can just outline the workflow, so you can get a better sense of what's going on there. Once you have an idea of what the character is supposed to do, you go to our studio and provide details about what the character's personality actually is. Character traits. How chatty they are. How arrogant they could be. Et cetera, et cetera. The next step is that, once that initial phase of developing a kind of skeleton personality is there, you start talking to it right there in the studio. And you can feel what the dialogue style of that character you've just drafted is. Then the next step for you is actually to start narrowing it down. You can think about it as like an actor's casting for a movie, right? You might have different people playing that same role. 
And you are trying to choose the one who is the best in the role. And that's what our chat functionality currently provides, basically. What you're doing is that you are talking to the character. You're providing the feedback. Basically, live feedback. Which responses did you like? Which ones didn't you? And if you didn't, what was the reason you didn't like it? You can, at the moment, provide back to Inworld information about what has happened wrong. For example, you can say, "Oh, it was a hallucination." Or, "Oh, that was just too long of a response." Or, "That was too short." Or, "That was absolutely inappropriate in terms of context." For example, if you are, I don't know, a medieval-times shopkeeper and you're asked about some kind of, I don't know, laser guns, of course, the response is supposed to be something like, "Oh, what are you talking about?" And that's where the feedback comes into play. The idea is that once you provide enough quantity of this - let's put it this way, thumbs up and thumbs down - what Inworld does is it optimizes the models to make sure that the feedback you've provided is followed exactly. And then the third stage for you is basically to place your character in a real experience and let it be played by players, such that it can gather even more feedback on how exactly the players talk to this NPC. What are the missing, uncovered cases? And then, as a developer, you get back to the studio, analyze that feedback, and optimize the character to do exactly what it's supposed to. It's basically three stages. And I can dive more into how exactly everything works. But, overall, you should think about it as staged development. It's not really that you kind of jump in and everything is going to be ready. We're trying to do our best to make sure that you have everything in place in terms of tooling to assist you to get there. But you should know that it's still a process. And it takes a lot of iterations to make your NPC do exactly what it's needed to. 
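[Editor's note: as a rough illustration of the thumbs-up/thumbs-down loop Igor describes, here is a minimal sketch of turning labeled feedback into preference pairs, the usual input for preference-based fine-tuning. The record structure, issue labels, and function names are hypothetical, not Inworld's actual API.]

```python
from dataclasses import dataclass, field
from enum import Enum

class Issue(Enum):
    # illustrative labels matching the feedback reasons mentioned above
    HALLUCINATION = "hallucination"
    TOO_LONG = "too_long"
    TOO_SHORT = "too_short"
    OUT_OF_CONTEXT = "out_of_context"

@dataclass
class FeedbackRecord:
    prompt: str                     # what the tester or player said
    response: str                   # what the character answered
    thumbs_up: bool
    issues: list = field(default_factory=list)  # populated on thumbs-down

def to_preference_pairs(records):
    """Group thumbs-up and thumbs-down responses to the same prompt into
    (chosen, rejected) pairs, the standard format for preference-based
    model optimization (e.g. DPO/RLHF-style training)."""
    by_prompt = {}
    for r in records:
        bucket = by_prompt.setdefault(r.prompt, {"up": [], "down": []})
        bucket["up" if r.thumbs_up else "down"].append(r.response)
    pairs = []
    for prompt, bucket in by_prompt.items():
        for chosen in bucket["up"]:
            for rejected in bucket["down"]:
                pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs
```

For the medieval shopkeeper, a thumbs-up on "What sorcery is a 'laser gun'?" and a thumbs-down (tagged `OUT_OF_CONTEXT`) on a response that knows about laser guns would yield one training pair.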
[00:40:43] GV: Yeah. Something you touched on there, the medieval shopkeeper and laser guns example. I guess that put it into context for me that, again, if we just look at a classic LLM for a second, we're quite used to sort of conversing with this LLM very much in the present tense, the present day, "Hi. I need to do this thing." And it speaks in the present day. But in these contexts, we're actually talking about any time, anywhere, but it has to come out as feeling present. Does that sort of add an extra layer of challenge to this? [00:41:15] IP: It does. It actually does create a lot of challenges. First of all, we need to step many, many, many steps back and think about what LLMs are actually about. In basically 99% of the cases when people are talking about large language models these days, what they refer to is a transformer. It's a neural net that is trained to basically predict the next word in a sentence. It is given as input a sequence of words, usually referred to as tokens, which are actually usually more granular than words. And then what it's basically trained to do, in a so-called unsupervised way, is to predict the next token that should follow that sequence. And the way it's being trained these days, when you hear about open-source models being pre-trained by, say, Meta or, I don't know, Mistral, what usually happens is that you download the internet, to put it simply. You clean that text data of whatever content you don't want in it. And then you literally just go through this internet dump a few times and train your neural network to predict the next token for every single window that you can slice your dataset into. Billions and billions of tokens. Right? And you might think that the internet contains all of the necessary information about being able to talk as if you were a medieval shopkeeper. But the truth is that it's not true. The internet isn't that old. Right? 
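[Editor's note: the sliding-window, next-token training setup Igor describes can be sketched in a few lines. This only shows how the (context, next token) examples are carved out of a token stream; the neural network itself is omitted.]

```python
def sliding_windows(tokens, context_len):
    """Turn a token stream into (context, next_token) training examples,
    one for every window position - the 'unsupervised' next-token
    prediction setup used to pre-train large language models."""
    examples = []
    for i in range(len(tokens) - context_len):
        context = tokens[i : i + context_len]   # the window the model sees
        target = tokens[i + context_len]        # the token it must predict
        examples.append((context, target))
    return examples

# toy corpus, split on whitespace as a stand-in for real tokenization
corpus = "the merchant greets the traveler".split()
examples = sliding_windows(corpus, context_len=3)
# each example pairs a 3-token context with the token that follows it
```

In real pre-training the same loop runs over billions of tokens, and the tokens are sub-word units rather than whole words, exactly as noted above.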
And what that means in practice is that if you just take an open-source LLM - getting back to your question about just using an LLM for your use cases - what you really have to do is find data whose distribution actually covers the behaviors of that medieval shopkeeper that you need to follow. And that's actually a giant challenge. At the moment, the way you would usually overcome it is not just getting that data. If you want to bootstrap a certain experience, you usually start not from pre-training a giant separate LLM for a single character, but rather from trying to steer your existing large language model with the feedback loop process that I've just described, in which you are talking to it, providing the feedback, explaining what was working and what wasn't, and hoping that it's going to work more or less fine. And only the next step for you would be, okay, if you understand that it's on par with your quality bar, great, you can just go ahead and use that. But if you find that it's not really working well, what Inworld can do for you is fine-tune a custom model for your specific use case - for the medieval shopkeeper use case. That entails Inworld finding the corresponding data, whether it's going to be books or something else, to fine-tune the model. To speak the language, right? To understand the context. To not know what laser guns are, et cetera, et cetera. It's basically a giant challenge indeed. And I think the technology is just starting to evolve in this direction of how you would contextualize large language models to certain times. And that's, I think, pretty specific to gaming actually. I can't really imagine easily a use case like that in other places. Because, usually, when you're talking to LLMs, on the contrary, you want the freshest and greatest. 
If you're talking to, say, ChatGPT, what you want is for that assistant to be aware of yesterday's news. But with Inworld, it's actually not only that - it could be any time at all. And just to close the loop here, in addition to getting back in time, what is interesting is that if you want to create an experience about an imaginary world, one which doesn't exist or doesn't have the physics that we're all used to, that is a separate challenge as well. For example, what if time flies not as 24 hours in a day, but rather 1 hour in a day? And there are 10 days in a year, or something like that? That is very complicated. For that, you currently have to really have a separate large language model that was specifically trained to kind of live in that new world. Because if you just were to take an open-source model, it's going to be just not able to understand you. It's going to keep hallucinating and doing some weird things. Yeah, it's an incredibly interesting challenge. And I hope that, with time, the models will be more capable and it's going to become possible to tailor them more easily than now. But at the moment, yes, you have to take a lot of different steps to make them suitable. [00:45:55] GV: Yeah. That's fascinating. I'm just now imagining, yeah, if you just turn the whole model of physics on its head for your imaginary world, and then, "What is gravity?" "Throw me that ball." "What do you mean, throw me that ball? That ball was going to go up." Yeah. Whole other rabbit hole there. Yeah, we could probably just go on with this for a long time. I'm going to have to move it along a little bit. But just a slightly different side of this, which is player profiles. There's also this concept that, with this technology, the NPC can actually know more about the player. And as that player evolves, whether as a real person or in what they're doing in the game, for example. 
Can you maybe speak a bit about how that sort of player profile aspect is being weaved into how Inworld operates? [00:46:42] NY: For the listeners, player profiles is a specific feature within the Inworld platform where we pass information about the player entity, so characters can personalize their actions or dialogue. And it actually folds within a larger subset of features, which is contextual awareness as a whole. And so, we can all agree that in order to take immersive, intelligent actions, characters need to be aware of the context and the environment that they're in. Whether that's information about the specific player they're engaging with, or about the room they're in, the environment they're in. How they're currently feeling. What type of day it is. These things all go into kind of like a perception-cognition pre-processing layer before we generate the dialogue and the non-dialogue actions, just to make sure they're appropriate. Specifically for the player profile, a simple use case I gave earlier was, let's say, you are just like a standard villager NPC and you are now approached by a player who's dressed very well. They look like a very powerful mage-type being. Maybe you're like, "Oh, I don't want to offend this person. I want to make sure that I'm demonstrating respect, so they don't torch me alive." Versus if someone's like a beggar-type player, they walk up and, "Okay, maybe now you don't even want to talk to them, because you think you're better than the player or whatnot." That's just like a basic example of how the dialogue could change based off of the player entity. And it goes the same way with other characters as well. As we get into conversations of character-to-character interactions, the awareness of who they're speaking with obviously will ground how they respond. And, as humans, we do this in real life as well. But beyond just the player information, it's also the context. 
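[Editor's note: conceptually, the player-profile conditioning Nathan describes amounts to assembling a context the character's model is grounded on before generating dialogue. A minimal sketch follows; the field names and prompt shape are illustrative assumptions, not Inworld's actual schema.]

```python
def build_npc_context(character, player, scene):
    """Assemble the contextual preamble an NPC's dialogue model might be
    conditioned on: who the character is, who the player appears to be,
    and where the interaction takes place."""
    return (
        f"You are {character['name']}, a {character['role']}. "
        f"Traits: {', '.join(character['traits'])}. "
        f"You are speaking to a {player['appearance']} "
        f"(level {player['level']} {player['player_class']}). "
        f"Scene: {scene}."
    )

# a richly dressed mage approaching a timid villager
ctx = build_npc_context(
    {"name": "Mira", "role": "villager", "traits": ["timid", "polite"]},
    {"appearance": "richly dressed mage", "level": 99, "player_class": "mage"},
    "a village square at dusk",
)
```

Swapping in a beggar-type player profile would change only this preamble, which is what lets the same character respond deferentially to one player and dismissively to another.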
And I want to take a moment to discuss how this impacts not just the dialogue variants, but the actions. Let's say you are - again, sticking with the shopkeeper example - a shopkeeper with a whole slew of items you want to sell. And if a warrior walks in your door, versus a mage, versus, I don't know, some other type of player character that you can really tell apart based off of the clothing, the accessories, whatnot, you would be incentivized to sell different items to them, right? Which might give you a better chance of landing the right items. And so, this is all built on the notion that we're assuming these characters are agentic, right? They serve a purpose in their own world. They have their own objectives. Obviously, at the end of the day, the goal is to help push the player progression forward. But they need to have their own motivations and objectives in order to have this sense of realism. They're not just waiting for you as a player to interact with them and otherwise just T-posing, not doing anything. They're living their own lives. They're going about their own actions as well. And so, all that is to say, the available actions for an NPC character that's powered by Inworld might also depend on the context of not just who the player is, but also where they are. Maybe they're out in battle. Now the NPC has a subset of available actions that make sense in that context. But if they're back at home, the available actions would be different. It's kind of just how we orchestrate character dialogue and character actions through contextual awareness and state. And that can be done in a number of different ways through our goals and actions system, mutations to the character, contextual knowledge, contextual actions. It's very similar again to how you would switch between, let's say, work mode and home mode. But what are the cues and triggers that change that? Well, part of it is the environment you're in. 
Part of it is who you're interacting with at that given moment. [00:50:22] IP: Yeah. Just to dive even deeper into this example that Nathan gave, I think it's really fascinating to see what kind of new mechanics AI actually can unlock. To be very, very specific, right now, when we are talking about contextuality, what we are talking about is that there is the client, which is the game engine, right? And there is a server, which is either in the cloud, or on device, or hybrid, which is Inworld. And what this client and server do is communicate by passing back and forth a snapshot of the game state. The game state is a concept that we usually use internally to describe this encompassing larger thing: what's in the scene. What the player has been doing in that scene so far. What sequence of events has been happening so far. What the NPC has been doing in that exact same sequence of events in the past. What the surrounding world is. Where they are located physically in the scene. What they see. What they don't. What they have. What they wear. What they wield. What they carry. And so on and so forth. And what happens is that Inworld, given that snapshot of the game state, actually can predict a sequence of actions that the character is supposed to take, right? And just to linger on that now-beloved example of a shopkeeper, you can imagine that, depending on the player's profile and character traits like being arrogant, the shopkeeper can actually ignore the player when they first approach them, right? It's going to be a pretty fun experience. You can imagine, right now, when you play games, usually what happens is that if you go ahead and replay that same level, usually, it will pretty much feel the same. But what if the context takes into account what you have been doing so far in the scene? Maybe you've been doing some weird things and the shopkeeper starts to argue with you. 
But what I was trying to say is that if you really want to implement contextual, context-aware, and player-profile-aware action planning on the NPC side, you really have to take everything into account. And, again, as I said, with this example of a player approaching the shopkeeper, what will happen next is that the NPC can predict the sequence of actions to perform, on the Inworld server, which then sends this sequence of actions to the client side to be, for example, played. That could be animations. It could be sound effects. It could be dialogue lines to be played back through the speakers, et cetera. But what the NPC can return is nothing, which will signal to the client that the NPC just ignores the player. On the other hand, the Inworld ML can decide, "Okay, send just one single action," which is to look at the player. And the animation of the merchant will just be played such that they look at the player and ignore them, right? Another option could be, in addition to just looking at them, saying something rude, right? "What do you need from me?" And there are so many different mechanics that get unlocked by letting the AI decide what exactly has to happen. That's where this intelligence is actually making the gameplay more immersive, much more interesting. It's actually this randomness that brings the value, because you never know what's going to happen, like in life. [00:53:42] GV: Yeah. I think, again, that's helped contextualize it a lot. As you call it, the randomness, the unpredictability. That is, I think, what a lot of people crave in games, whether they realize it or not. They perhaps like to replay games. But do they want to see exactly the same thing again? No. They hope there's going to be some maybe nuanced change. We've only got a limited time today, unfortunately, so we're going to have to move on. We're going to move a little bit away from the mechanics now. 
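[Editor's note: the game-state-in, action-sequence-out protocol Igor describes can be sketched as a toy decision function. A real system would run an ML model server-side; the state shape, action types, and decision rules below are purely illustrative, not Inworld's actual protocol.]

```python
def plan_actions(game_state):
    """Toy stand-in for the server-side planner: given a game-state
    snapshot from the client, return the action sequence for the client
    to play back. An empty list means the NPC ignores the player."""
    npc = game_state["npc"]
    player = game_state["player"]
    actions = []
    # an arrogant shopkeeper ignores a lowly player outright
    if "arrogant" in npc["traits"] and player["reputation"] < 10:
        return actions
    actions.append({"type": "look_at", "target": player["id"]})
    if player["reputation"] < 30:
        actions.append({"type": "speak", "text": "What do you need from me?"})
    else:
        actions.append({"type": "speak", "text": "Welcome back! Finest wares in town."})
    return actions

state = {
    "npc": {"id": "shopkeeper_01", "traits": ["arrogant"]},
    "player": {"id": "player_1", "reputation": 5},
}
plan_actions(state)  # returns [] - the shopkeeper ignores the player
```

The client would map each returned action to an animation, a sound effect, or a spoken line, and an empty response to simply carrying on with the NPC's idle behavior.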
But speaking of unpredictability and communities - if you work in gaming, community goes hand-in-hand with the players and the studios. Yeah, it's always an interesting relationship. And I would say, of late, there have sometimes been very vocal communities that almost are able to steer a studio in certain directions. On these big franchises, for example. Look, communities in gaming, they can be quite vocal, quite passionate. Although you're not a studio yourselves, you must be having to get feedback from gamers themselves. How are you doing that? And can you share any examples of how community input has influenced how you're approaching things? [00:54:52] NY: At the end of the day, we're all looking to generate value for players, for the gamers. Even though our primary customer would be game studios, as we're like a B2B tool, game studios want to make better games. And it's a better game if it better serves the player. It's all for the player experience at the end of the day. I'd say, from our side, there's a whole number of ways we've been able to receive feedback. Both positive and critical, right? We have a pretty active Discord community where game developers and, oftentimes, gamers themselves will contribute ideas, feedback on where it'd be crazy cool if X, Y, Z were possible. Or, "Hey, this feature is currently lacking X, Y, Z aspect." And so, we lean very heavily into that. We've also seen a lot of feedback just come in because, truthfully, this is the hype these days. People and gamers are all very excited about what's possible. There's also a lot of skepticism. We'll see reviews on existing technical demos that are out there through the different platforms, whether that's Steam reviews, YouTube comments, whatnot. And those are very helpful. One specific example is feedback saying, "Hey, is this really making better games?" 
Back to what I was saying earlier, when we're just adding a mod into a game where you can talk to a character and the dialogue is unscripted, it's generative, the overwhelming sentiment is, "Cool. But who cares?" Right? And so, that's like a huge lightbulb moment. It's like, "Okay, fair point." Of course, I don't think that alone makes a better game. It's novel, right? But that's force-fitting a technology where there's no need for it. And so, we now understand the importance of, "Okay, let's focus on what could be really novel beyond dialogue." The true novel AI actions that are generated. And so, a lot of our new features revolve around those. Our Goals 2.0 release was like a first step. And this does tie back into the previous topic a little bit, which I think is important to emphasize. It's not really the case that we're looking for games to be fully procedural, where you never get the same outcome each time you play through. Sure, there's one kind of game design where you can have that experience. But in really great games, there's these key beats that have to be hit. There needs to be a "Luke, I am your father" moment, like in Star Wars. And that's like the magic of the Inworld platform: we allow the developers to specify the level of control that they get over the characters. And this was some feedback that came in from the community: "Hey, we want to have more precise control over the character." But we still obviously want to lean into the strengths of generative AI to get the natural variance where it is appropriate. And so, through our Goals 2.0 release, we supported a lot more dynamic use cases through character mutations. You can change a character. But that's still based off of how the designer scripted it, right? But between maybe these two key beats in the game, there are many different arcs and variants by which the players can progress from point A to point B. 
But the characters are going to push for point B to happen no matter what, because that needs to happen for the gameplay to be a good game experience, as the writers and designers specified. Back to the original question, we're always on the lookout for connecting with gamers and gamer feedback. At the end of the day, if we're not making better games, we're not doing our job right. And so, obviously, there's a lot of creative ideation that still needs to happen in terms of, "Okay, what are these gameplay mechanics? What are these possibilities?" But, yeah, we're very open and receptive to the feedback. And the critical feedback is very helpful, too, just because now we know where we need to focus more. And, obviously, the overwhelming positive feedback we've seen on some of even the early technical demos has been very inspiring and confirms that we're on the right path here. [00:58:38] GV: And looking industry-wide as well, collaboration-wise, I think I read or saw you do have some partnerships that have been announced recently. Could you speak a bit to those? Because, equally, gaming is quite a specific space. There's quite a lot of interesting names there. It sounds like you've been working with a couple of them. Could you maybe just speak a bit to that? [00:59:00] NY: Yeah. First and foremost, if you as a listener work for a large gaming studio, we'd love to get in touch. There is this chicken-and-egg problem we mentioned earlier of readiness for adoption and investment into early-stage technology. We've seen a lot of the excitement there. But we're looking to push the boundaries. As for the ones that we can publicly disclose, we actually just made several large announcements at GDC. Namely, with Ubisoft on the Neo NPCs launch. With Microsoft on the Narrative Graph [inaudible 00:59:28] Aurora demo. And then with NVIDIA, which is not really a game studio. I guess Microsoft isn't either. 
Although they're the parent entity of many. With NVIDIA, it was a technical demo called Covert Protocol. And so, at the end of the day, we're obviously working primarily with large game studios to create these novel next-generation experiences. But there's a lot of technology providers, such as Microsoft or NVIDIA, that we partner with too on just ensuring the technical stack is designed in the most effective way. With Microsoft, evolving our design-time tools in the studio. With NVIDIA, partnering with their ACE microservices, such as Audio2Face, which are technologies that are needed to bring these full 3D immersive experiences to life. Yeah. Happy to dive deeper into any of those three in particular. But, yeah, those are the big ones that we've publicly announced so far. [01:00:18] GV: Yeah. Very exciting. We're probably not going to have time, unfortunately, to dive into those more. But I can maybe already predict we're probably going to have another episode with you guys in a year's time, and we can cover more of that. Speaking of the future, anything you can talk about in terms of the next 6 to 12 months? Anything exciting Inworld's looking ahead to? [01:00:40] IP: Sure. I can actually give a few interesting insights about what we anticipate to happen, which is that the industry is moving really fast, especially in ML. Basically, the amount of people working in the area grows exponentially, thereby making the research move exponentially faster. And what that entails is that the models are going to get cheaper. The hardware is going to get better. The capabilities are going to get better. And that means that we, as a software provider, are going to provide better services for our developers. 
Most of the new things which are going to come out in the next 6 to 12 months are going to be around that realm of better capabilities and better usability of these tools that Nathan just mentioned. We really, really want to focus on the usability of our tools to make sure that whoever is working with them, whether it's an indie team or a large enterprise AAA game studio team, we really want everyone to be able to succeed. And to give even more news, I think there are also going to be some announcements in the realm of having separate APIs for those who want to build something separately. For example, for voices. If somebody doesn't really need the fully-fledged experience at the moment, but they just want to start with [inaudible 01:02:08], they can start from voices. And, yeah, stay tuned for announcements. [01:02:13] GV: Yeah. Exciting. Exciting stuff. And, yes, we are recording in April. You might be listening to this maybe in May. But there's, yeah, clearly some exciting things coming your way. Yeah, if you're a developer or a studio, where's the best place to start your journey with Inworld? [01:02:29] NY: Yeah. First and foremost, check out the tools yourselves. You can sign up for an account right at studio.inworld.ai and get started right away with creating characters and gameplay flows, and iterating on it. And if you're interested in deeper partnership and collaboration, I'd love to hear from you. My team specifically actively engages with studios to make sure that we can bring together the best of both worlds. And in the next 6 to 12 months, really push to launch some of these experiences out to gamers live. Reach out to partnerships@inworld.ai. And we'd love to hear from you. Get in touch. [01:03:01] GV: Awesome. Well, this has been a super fascinating conversation. I hope we do get to speak again in the future. And I'd love to revisit our shopkeeper example and see how our shopkeeper is evolving. Yeah, thanks so much for making the time. 
And can't wait to follow along and see what happens. [01:03:18] NY: And we're super excited, too. Thanks for having us, Gregor. [01:03:21] IP: Thanks for having us. [END]