EPISODE 1737

[INTRO]

[0:00:00] ANNOUNCER: Feature flagging tools have grown in popularity as a way to decouple releases from deployments, but they can introduce their own long-term problems and tech debt. Lekko is a startup democratizing the practice of dynamic configuration. Their motivating idea is to empower engineers to focus on software releases while business teams and other stakeholders shape deployment. Konrad Niemiec is the founder and CEO at Lekko. He previously worked at Uber, where an internal tool called Flipper enabled dynamic configuration management, and which today serves as a key design inspiration for Lekko. Konrad joins the show with Sean Falconer to talk about his company and the technology they're developing.

This episode is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him.

[EPISODE]

[0:00:56] SF: Konrad, welcome to the show.

[0:00:57] KN: Thank you. Really excited to be here.

[0:01:00] SF: Yes. Thanks so much for being here. I'm excited to get into this. So, one of the places I wanted to start was around essentially feature flags and feature flagging tools at scale. It doesn't take long, I think, for feature flags to kind of get out of control and for teams to not even know what certain flags do anymore. Based on some of your experience, I know you worked previously at Uber and now you're building a company sort of related to this space. What are some of the problems that you've seen teams run into as these tools start to be used at scale and people are trying to leverage feature flags?

[0:01:35] KN: Yes, definitely. As you're saying, initially, feature flags come up as a really great way to release things to a subset of users and for engineering safety. But very quickly you run into some of the problems that you described around bookkeeping, around making sure that you know which feature flags do what. The biggest culprit I think in general is the tech debt and stale feature flags that come from it being really easy to add something and really difficult to clean up. Often, it's much easier to leave something in the code and think, "Oh, I might need this later. This might be a kill switch." But as a result, you end up with these combinatorial factors. You have 10 flags in one area of the code base, and that turns into potentially over a thousand variations of the code that it could run through, and that will lead to unexpected bugs if you put something in the incorrect place.

[0:02:25] SF: In some ways, we've become a victim of our own success with making something useful that's simple. It's just like if it's simple and useful to use, then it gets overused, and basically undoing that process is more difficult. And then it turns into this kind of like bookkeeping process that you're talking about.

[0:02:41] KN: Yes, exactly. Another area where I think teams find a lot of issues is when it comes to risk. Initially, feature flagging is used as an engineering safety aspect. But then what you quickly find, especially with these stale flags around, is that if you end up turning one on that wasn't expected to be, you could refer to it as a zombie flag; it kind of comes back to life and ends up hurting you. But you also might end up just flipping flags with unintended consequences, whether it comes to a stale code path or just something that wasn't necessarily meant to be turned on or turned on all the way.
Especially when it comes to control, and releasing control to other teams, that's one thing we saw a lot of at Uber. A big reason we used these kinds of feature flagging and configuration systems is because we wanted to delegate control to other teams in the business. That's how our ops team was able to localize Uber in, at this point, over 10,000 cities. With that kind of potential power comes a lot of risk. Some of the biggest issues and outages at Uber happened because of a feature flag change. Every minute Uber's down, millions of dollars of revenue are potentially lost. So, it just goes to show how potentially dangerous things can get.

[0:03:56] SF: Yes, absolutely. I think it's a difficult thing to even understand, like, is something actually stale or not? Because if you need to meet certain criteria for that flag to trigger, then even something like a dynamic analysis tool, if you don't have the right configuration or that configuration is not realistic, might not actually be able to tell whether that code path ever gets executed or not, other than maybe mining through some sort of logs or something like that. It becomes a real nightmare, and there's always the fear that if you remove it, then you're going to take down the system or take down parts of the system for some individual.

[0:04:31] KN: Yes, exactly as you're saying. Stale analysis is actually non-trivial. This was actually a big business need for Uber at one point. There's an open-source library called Piranha that came out of Uber that was trying to solve this exact problem, mainly for bundle size. This is very specific to Uber, but especially on Android, because bundle size is so important for install rates, it was a huge imperative to actually remove stale flags from the code, not for tech debt, but purely to get the bundle size, the size of the code downloaded in every single app, down. So, there was a big effort to actually remove stale code branches. But it was really, really difficult to do. What they ended up doing was literally just opening pull requests for specific things, assigning them to the engineer, and being like, "Hey, you need to remove this." And it would get most of it right. But as you're saying, often you're not exactly sure what you should remove and what you can't, and that's why we think being tightly coupled to the code matters, which we can get to with the story of Lekko in a moment. Being tightly coupled to the code lets you take advantage of what you're mentioning, static analysis tools, because dynamic analysis is often pretty difficult. If you can take advantage of static analysis tools, then you can do a better job of understanding what the results of different code paths can be once you're actually embedded in the code.

[0:05:48] SF: Yes. I mean, the thing that you're talking about with the Android package size, I think it makes a lot of sense, especially for, like, a global company like Uber, where you're going to have to cater to all kinds of different device sizes and also different connections to the internet and stuff like that. So, having, like, a massive package that you have to download in a place where maybe your connectivity is not great and the size of your phone is small becomes a huge problem.
That's going to be the case for any sort of large company that has, like, a global audience, where you need to do these types of optimizations that maybe you don't need to do in other circumstances. Now, back to some of the stuff with Uber or even other companies that are operating at scale. What are these companies doing differently to manage some of these problems when it comes to feature flags at scale?

[0:06:33] KN: Yes. So, often what we've seen is that at a large scale, these companies, like, as you mentioned, your Google, Uber, Facebook, all build very similar systems that we like to refer to as dynamic configuration systems. They have a bunch of markers that make them slightly different from your average feature flagging tool. They are all source control-based. They all run CI tests. They all make sure that things are canaried or have some sort of progressive delivery built into them that isn't specifically manual. And they all support a variety of types and complex permissioning. So, all these things kind of build up to address some of those issues that you talked about, as well as automated removal and things like that. But the biggest thing is that they're source controlled and run through CI. Especially when it comes to Facebook, if you look at the Configerator paper, it's literally code, like, it's Python code in the case of Facebook's system. Because it's embedded into the code, you now gain all those advantages. Essentially, by treating configuration as code, these systems try to avoid a lot of those potential pitfalls of feature flagging.

[0:07:42] SF: I want to get into that. What do you mean by configuration as code? Maybe we can take a specific example. I know that Uber developed Flipper, their internal system, and I think all of the, like, hyperscalers end up building something to do more than just feature flagging. As a developer, how am I actually using that system to sort of turn things on and off and also control dynamically the end-user experience?

[0:08:05] KN: Yes. So, by using configuration as code, an engineer is already familiar with a lot of the different tools, and it's integrated into the same principled software engineering workflows. You're still running through CI, you're still making sure you get code approval, and because it's intimately connected with the code you're running, you're much less likely to have the separate source of truth that is more likely to cause issues. As an engineer, it becomes a lot more familiar to use. It becomes integrated as a part of your workflow. That in general will mean higher usability of the system, which, as we talked about, has some downsides. So now, you need to make it significantly easier and safer to remove things. But it's fully integrated into the software development lifecycle, so you don't have untested code slipping out to customers. You avoid these untested code paths and you're able to more cleanly treat configuration as code. Because those two things are together, you don't really have as many of the risks and issues that we talked about before.

[0:09:04] SF: What's the typical journey for most companies when it comes to feature flagging and maybe even growing into more of this sort of, like, dynamic configuration? Do most companies start building something simple in-house, and then perhaps, like, get to a place where things are unscalable with whatever they built?
They go to a vendor, but then that becomes unscalable, and then they go back to basically, like, building the same sort of architecture independently that Uber and Google and Meta all sort of stumbled onto?

[0:09:35] KN: So, there's a large publicly traded company that went through the exact lifecycle that you mentioned. They started off with something simple in-house, but they decided, "Oh, we don't want to build this ourselves. Let's go adopt one of the leading tools on the market." So, they adopt a tool like a LaunchDarkly or a Split, and they use one of those tools for many years. But over time, it causes a lot of issues. The main issue for this company specifically was the fact that product managers were causing SEVs. They were causing large outages because they were flipping the wrong flag, or they weren't understanding some combination of interactions between flags and causing real issues. So, they ended up actually ripping out this leading feature flagging tool and replacing it, initially with, like, a hackathon project, an S3-based dynamic configuration system, sort of like Git, but then they needed all the fully featured aspects in order to satisfy all the use cases internally. Now, they staff a few staff engineers, platform engineers, on the system. They end up reinventing the wheel. This is kind of the worst-case, but not uncommon, scenario of essentially going through the lifecycle of trying feature flagging, running into all the issues that happen as it scales, having to replace it with an internal solution, and then rebuilding it.

[0:10:48] SF: Yes. I feel like there's a lot of software that kind of follows that similar path. You start out with something custom because maybe you feel like your requirements are not that complicated, we'll just throw something together. Then that becomes unscalable at some point. You go to some vendor or managed service that you adopt. But then if you're successful anyway, wildly successful like an Uber or Google or something, you reach a point where those are no longer scalable and then you have to come up with some sort of bespoke solution. It sounds like, to me, there's a fundamental difference between sort of feature flagging and what you're talking about, which is this dynamic configuration. Maybe you can - I know you've kind of alluded to some of those differences in terms of - it sounds like with dynamic configuration, you're really putting control back fully into the software development lifecycle versus being able to have other people within the organization potentially be able to control some of the flipping and flopping of the flags themselves.

[0:11:44] KN: So, I think it's a great question. I'll actually maybe correct something in your question, which is a good thing to talk about. The way we think about dynamic configuration is that a lot of people misuse feature flagging tools. Feature flagging itself, if we want to define it exactly, is the process of progressive delivery. You are rolling out a feature. You want it to first go to some subset of users, maybe it's beta testers, maybe it's internal users only, then maybe a few customers to make sure that we're building the right thing. Maybe we have to go back, roll it back, maybe tweak some stuff, roll it fully out, and then we intend to have it fully deployed, and then we remove that feature flag from the code. That's the definition of feature flagging, which we can refer to as, like, a temporary flag or a progressive delivery flag.
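To make that definition concrete, here is a minimal, hypothetical sketch of a temporary, progressive-delivery flag check. The names and the percentage rollout logic are illustrative assumptions, not any particular tool's API; the point is that this check is meant to be deleted from the code once the rollout completes.

```typescript
// Hypothetical temporary feature flag: internal users first, then a small
// percentage of everyone else. Intended to be removed once fully rolled out.
interface FlagContext {
  userId: string;
  isInternalUser: boolean;
}

// Deterministically map a user ID into a bucket in [0, 100).
function hashToPercent(id: string): number {
  let h = 0;
  for (const ch of id) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return h % 100;
}

export function shouldShowNewCheckout(ctx: FlagContext): boolean {
  if (ctx.isInternalUser) {
    return true; // beta testers / internal users always get the new flow
  }
  return hashToPercent(ctx.userId) < 10; // currently ~10% of external users
}
```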
If you look at anything else that's not temporary, maybe you, for example, roll out a feature and you now want it to only be on for enterprise users, or you only want it to be on for a certain group of people who pay for it. Or you leave some conditional in and you're saying, "Hey, this is actually a kill switch that we want to use to roll over from one data center to another on a weekly basis, but we might need to do it in some other way," right? All those things I just mentioned, those are no longer feature flags. Those aren't temporary things to be rolled out progressively and then removed from the code. What are they? A lot of common discussions online nowadays call these things permanent feature flags, which we think is kind of a misnomer because it's antithetical in the name. We're labeling the superset of these use cases dynamic configuration, because that's essentially what they are. You modify code behavior independent of code changes in a dynamic way, so you don't need to ship a new file. Essentially, you have a system that lets that change happen dynamically. You don't have to restart a service to pick up a new YAML file, for example, right?

That's kind of the definition side. In terms of where the tooling really differs, what you mentioned is true. We really think that a hallmark of these systems, in order to have trust and simplicity with engineering teams, is being tightly integrated into the software development lifecycle. So, that means running through CI, that means being stored in Git. That means potentially having code reviews and things like that. But where I kind of want to express some nuance is we think this is actually extremely helpful at the boundary between engineering teams and other engineering teams, or between engineering teams and other teams in the business. Where this was most useful for Uber was with the operations team. A big reason why we adopted Flipper in the first place is that at the beginning, when it was adopted, I don't know how many cities we were in, maybe a thousand or something like that, right? There's no way that a thousand of these teams on the ground were going to be able to file tickets to then have engineers tweak some very specific thing for that city, right? That would not have scaled at all, even in those really early days. We needed to have this system where operations could, and we tried to make it as safe as possible, be able to quickly make changes for all these different markets. That's really where we see the power of Lekko shining: you can safely delegate control to other teams, but have that be in a way that still goes through the software development lifecycle, still makes sure that stale things aren't left around, and, we can talk about some AI techniques later, right? But even with basic static analysis, you can understand what kinds of changes would cause issues or not, right? So, that's a long answer, but I think it answered a lot of things.

[0:15:21] SF: No, that's great. I think that clearly sort of separates those two things. I mean, it sounds like over time, basically we've kind of overloaded the term feature flag away from what the original intention is. In reality, there's really, like, a new category of sort of product or process, which is dynamic configuration, that can potentially encompass feature flags, but it also encompasses much more, which is around this, like, more permanent dynamic configuration that you can have.
If you have a process that manages all that, then you can safely loop in other people within your organization to manage some of that configuration.

[0:15:55] KN: Exactly.

[0:15:56] SF: I want to get into Lekko. So, like, how did you actually start developing it? Tell me about some of those early days. How did you figure out what was reasonable to create for a valuable MVP? How do you sort of balance, like, giving something that's valuable enough, but doesn't take you 10 years to essentially create?

[0:16:16] KN: Yes, that's a really good question. So, in terms of the early days of Lekko, I think in terms of, like, idea and problem formulation, it really came from my experience of leaving Uber. I went to a startup called Sisu. When I was there, I used one of the leading tools on the market. I was thinking back to all the tools that I had at Uber, and the one that really jumped out to me as missing was Flipper and that dynamic configuration system. We had a mix of essentially a Retool, LaunchDarkly, and some manually maintained config maps in Kubernetes, and I was looking at this swath of systems and saying, "Man, I really miss the kind of thing we had at Uber. This would have enabled me to move much faster, hand over control to my product and solutions architects for some of the features I was building, and feel really safe putting something critical into some sort of configuration system instead of, like, an external source of truth." That's kind of how I got started. In terms of figuring out the MVP, you're right. It's really tough, especially for this kind of technical system. It's really difficult to not just build everything, and that's essentially what our first version was. What we released in the last month has been our second iteration of the product. For our first iteration, we essentially just took the best pieces of what we saw at Uber and Facebook. We have one of the tech leads of the Configerator system on our team as a founding engineer, Sergey. We essentially just took a bunch of these things, threw them together, and said, "Hey, you guys need to use this." But everyone was like, "Man, this is really exciting. But how is this going to help me right now?" So, I think some of that problem formulation is really important, understanding what kinds of teams you are helping right now, instead of just thinking purely product, thinking, "Hey, we know this needs to be in the world," and I can talk more to kind of the differences between those two. But that's a bit of how we got started. Scoping down to an MVP, especially in such a technical space, is extremely difficult.

[0:18:11] SF: Yes. How long did that first initial version take?

[0:18:15] KN: So, that initial version, I think after we hired the initial team, was probably like six months of development. At the time, it was, like, two founding engineers and myself. We spent the first six months building an initial product and feeling pretty good about this fully featured dynamic configuration system, and then kind of recognizing that we didn't really focus on usability. We really just tried to be kind of feature-complete. So, we took another probably six to nine months to rework that and really figure out the problems that we're solving on the customer side.

[0:18:47] SF: You're taking inspiration from these things like Configerator and Flipper and maybe other projects, but, like, did you immediately sort of deviate the architecture from that?
Or are you sort of basing some of the design choices that you're making on those existing systems?

[0:19:03] KN: So, we initially actually followed Uber's and Facebook's systems pretty closely. Facebook's was written in Python. We scoped that down, and there's a configuration language called Starlark that we based our initial prototype on. Then Uber had essentially a rules language, like a rules engine built into it. So, we also came with a rules engine and kind of had that be first class, as well as being Git-based and in a bunch of different languages with SDKs. So, that was pretty common. But where we kind of deviated initially was we literally took the configuration-as-code thing a step further. Now, you can essentially decorate functions in your source code, and Lekko will wrap those functions and actually fetch the latest version from Lekko at runtime. So now, this interface has gone a step further from where Facebook and Uber have been, and we essentially have an in-code, like, in-your-source-code interface, as well as a way to get the updated values.

[0:20:10] SF: Yes, so how does that work? So, I'm decorating a function, and then are the different versions of that function stored somewhere else and Lekko is essentially injecting those into the source code? I guess, walk me through that sort of workflow.

[0:20:21] KN: Yes, definitely. So, when you annotate a function, we don't support arbitrary, like, Python or JavaScript or Go or whatever, it's a scoped-down version, right? But that gets translated into an AST that Lekko understands for this function. So, if staging, return true; otherwise, return false, right? That gets translated into an AST. Then Lekko stores that AST in a Git repo. Essentially, the rest of the Lekko system is either modifying that AST or shipping that AST to your running code. The annotation, if you will, is essentially letting Lekko say, "Hey, if there's a newer version of this AST, I'm actually going to execute that instead of what's built into my code."

[0:21:04] SF: And what are some of the, like, core components of this, of the whole end-to-end Lekko infrastructure? How does, like, deployment work? Is this a managed service or am I running this on my own servers?

[0:21:17] KN: So, we run a managed service, mainly to have the Lekko UI and be able to have that be as updated as possible. But there are ways that you can potentially self-host, right? At the end of the day, Lekko is storing an AST in a Git repo. So, if you want to cache that in a way that either runs in your infrastructure or fetches directly from Git, we have modes to support that as well. But we think the standard mode, that's really easy to implement and straightforward for most users, is our managed service. That's running in AWS. But we have connections with, for now, GitHub, but in the future other Git source providers, in order to essentially mirror that definition or edit that definition directly.

[0:22:03] SF: What about performance? Is there any concern essentially around, like, scalability and performance when you're working with clients like this? Is that something that comes up because of the fact that they're going to have to talk to this external system in order to run this code?

[0:22:18] KN: Yes. So, I think performance and reliability are two big ones. Performance especially. If you think about it, a naive implementation of this could, for example, call out to a system every single time.
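As an illustration of the workflow Konrad describes above, here is a minimal, hypothetical sketch in TypeScript. The function and wrapper names are assumptions for illustration, not Lekko's actual API: a deliberately restricted config function of the "if staging, return true; otherwise, return false" form that a build-time tool could translate into an AST and store in Git, plus a wrapper showing the idea that a newer, dynamically fetched definition runs if one exists, and otherwise the version built into the code runs.

```typescript
// Hypothetical sketch of a "config function" and its runtime substitution.
// Names are illustrative assumptions, not Lekko's actual API.

interface DeployContext {
  env: "staging" | "production";
}

// A deliberately restricted function body (no arbitrary code), simple enough
// to be translated into a portable AST and stored in a Git repo.
export function enableVerboseLogging(ctx: DeployContext): boolean {
  if (ctx.env === "staging") {
    return true;
  }
  return false;
}

type ConfigFn<C, R> = (ctx: C) => R;

// Wrapper standing in for the annotation: if a newer definition has been
// fetched (e.g. an interpreted AST), evaluate that locally; otherwise fall
// back to the version compiled into the code at build time.
export function withDynamicConfig<C, R>(
  builtIn: ConfigFn<C, R>,
  fetchLatest: () => ConfigFn<C, R> | undefined
): ConfigFn<C, R> {
  return (ctx: C) => {
    const latest = fetchLatest();
    return latest ? latest(ctx) : builtIn(ctx);
  };
}
```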
What's fairly standard nowadays, and most feature flagging tools on the market do this, is to actually perform local evaluation. So, you fetch the updated version, especially from, like, a backend perspective. You essentially fetch the definitions, and when the call comes in and you want to decide what this flag, in our case, what this function, is going to return, we do that evaluation locally. And we have some benchmarks to show that this is a pretty straightforward operation and not too far from running native source code to perform this logic. So, that's something that's fairly straightforward. When you look at a front end or mobile, it starts getting a little more complicated, but that's a tradeoff essentially between visibility and performance and stability. Then one thing we're thinking about, especially when it comes to stability and reliability, is the fact that because we're in the code, we essentially have a build-time fallback built into the system. So, as of the last time you built your code, you have the latest version of your Lekkos sitting there in a file. Now, if something ever happens, you're only operating as of your last build, rather than being completely down and having a hard dependency on this external system. We've found that our early customers are really excited about that, and they don't have to worry about being completely down if there's a network partition or anything else that happens.

[0:23:47] SF: What kind of performance metrics are you typically monitoring in order to make sure that you're hitting the scalability and reliability that you want to hit?

[0:23:56] KN: Yes, that's a really good point. I think in terms of scalability, we're trying to figure out, like, how well does this fan out and how many concurrent operations can we handle. Thankfully, EKS is pretty straightforward in being able to scale us up and things like that. When it comes to performance, we're really trying to measure, "Okay, what is the CPU utilization? How many operations a second can you feasibly do on some commodity hardware when you're interacting with Lekko systems?" That's the kind of thing we look at when it comes to performance there. We're also monitoring end-to-end latency: from when you make a change, how quickly can that make it into your system? I think that is tunable, and that's another aspect that we're continually monitoring, how quickly you can make a change. Sometimes you might not want to actually make a change too fast. One thing Facebook does is they batch changes every five to ten minutes and then ship those out. That's not something that we've had a request for yet, but something we could consider.

[0:24:53] SF: What about language constraints? What languages do you work with today?

[0:24:55] KN: Right now, we have TypeScript on the front end and the back end, like in a React/Node environment, as well as specifically supporting Next.js, so doing some server-side rendering and things like that. Then we have Go as a language on the back end. Coming very soon, we have Python and Rust in the works as well. Some of those SDKs are actually available right now, but we're still working on polishing them to have a full 1.0 release on both of those SDKs.

[0:25:19] SF: Are there unique challenges between doing this for, like, a dynamic language like JavaScript or TypeScript versus a compiled language like Go?

[0:25:28] KN: So, there's a few challenges that we encounter.
Things like decoration and kind of build-time plugging in, in order to make Lekko's interfaces really seamless, are a lot easier in languages that give you access to the build-time environment, the compile-time environment. So, it's actually fairly straightforward in TypeScript, Python, and even Rust, because even though it's a compiled language, it gives you access to those internals. In Go, it's actually a little trickier. We have to do some code generation. So, there are some differences between those languages, but it's mainly along the lines of how much of the compilation process is exposed to you as a library developer, rather than compiled versus dynamic language.

[0:26:10] SF: How are you making your choices about which languages to invest in? I mean, I think, like, JavaScript, TypeScript is kind of a no-brainer. You mentioned Rust, for example. I know Rust comes across as very popular on the Internet if you follow such things, but it's not a language that's as old, and it's probably less prevalent than other languages, for example.

[0:26:31] KN: Yes, that's a good point. I think for us it's always a tradeoff, trying to figure out where we have a lot of developers and engineering teams and companies that build software, where this can be the most useful, right? So, that's kind of the main tradeoff for us. Initially, for example, with, like, Next and Go and Rust, we're thinking that a lot of folks who like to try new tools are writing things in those languages, right? So, in terms of getting some early adopters and getting some early usage, that's definitely something we looked at. Some of these mid-stage startups that we're working with right now, they're primarily using those kinds of languages. That's a bit of the tradeoff there. But you're right that a lot of people use Java and C++. That's a bit further down our priority list. But yeah, things can change pretty quickly. So, we can spin things up relatively fast.

[0:27:16] SF: We talked a little bit about removal of stale code. So, how do you actually - how does this help with detection of stale code and being able to remove it?

[0:27:23] KN: Definitely. So, one thing I mentioned at the top of the call was that because we're literally in the code, it becomes much more straightforward to understand the interactions between flags, given it's pretty straightforward to trace the call graph and figure out what's been executed, as well as using pretty straightforward static analysis tools to understand that, "Hey, if this thing's on, this thing also is on." You can create some causal relationships. In terms of removal, that becomes significantly easier because you now have one call site in the code, and you can understand the code significantly better in order to have it be removed. Then stale detection has similar benefits as well, right? You can better understand the interactions between different Lekkos in your code. Stale detection is still, as we talked about earlier, a difficult problem, something we're trying to continually iterate on with our users.

[0:28:12] SF: What is the big technical challenge in the space right now? Is there, like, a noticeable gap or problem that you hear about, even from people at the Ubers of the world that have built these kinds of dynamic configuration systems? What is their pain point that needs to be addressed?

[0:28:28] KN: Yes.
So, I think if I were to sum it all up, one big thing you hear from especially larger organizations is, how can I keep my teams moving quickly while reducing risk? That is the really tough thing. Think of the largest outages; you could even argue that the CrowdStrike outage was a configuration change, although there's a lot more going on there than just a configuration change, right? But you can see how there's this tradeoff essentially between risk and moving fast. And I think dynamic configuration systems are often at the center of a lot of that. So, that's a big thing that some of these organizations are dealing with. Another thing that you hear from a lot of large organizations who have this built internally is, how can we make this developer experience really good? This is something that we're super interested in: how can we make this super seamless, make this as easy as writing a function in your code and annotating it, which is how we're looking at it right now. The developer experience is something that, if you're an internal team trying to figure out how to get this usage up because it'll make the code safer and better, is super important. Then, in terms of teams looking at this kind of tooling, I think a big thing is still thinking about this as tech debt. How do you think about permanent and stale flags? There are a lot of unanswered questions there as mid-stage companies are really looking at this kind of tooling.

[0:29:51] SF: Is there, like, a role for AI to play at all in this world of dynamic configuration?

[0:29:57] KN: I think so. And I think in a similar way to where you see AI really influencing dev tools. As we're thinking about configuration as code, I think AI plays a big role, right? If you are giving an interface to the rest of your organization to interact with the code, AI can help summarize, like, "Hey, this is what this is actually going to do," right? It can help assess risk, and this could even be heuristic-based, but you can talk about AI in this situation as well, right? Based on the amount of traffic that I've seen go through this Lekko, this dynamic function, which is kind of our label for a dynamic function in the code, if I have a pattern of the traffic, I can understand, "Hey, if you make this change, you're actually not just going to direct traffic for just one customer. All the customers are actually going to go through this. You need to be careful about what kind of change you're making." Automated issue detection, whether with AI or even just starting off with heuristics, I think plays a big role. Then, I think if you look even further into the future, as you're thinking about AI writing more code and being more of an author of code, a cool idea I heard recently was that software engineers are going to turn into tech leads, because they're going to be managing this team of agents writing code, right? Even in that world, there are still going to be business decisions that need to be made around the code: what customers get which features, when do we roll them out, how is this going to look, what is the business logic associated with this code? That is still going to need to be there. So, I think Lekko is still going to play an important role in that future world where software engineering is more like tech leading, if that's how this ends up panning out. Even if we look further out, this is still a really imperative part.

[0:31:42] SF: Yes, absolutely.
I mean, I've always said that, at least my impression of it is, as the sort of copilots and other tooling get better, maybe you're doing sort of less hands-on-keyboard work, but the level of abstraction you work at as an engineer starts earlier in your career, where it's going to become a lot more about putting Lego blocks together and thinking through sort of the business requirements. Of course, something like dynamic configuration can potentially play a big role there, because that really becomes a tool that helps facilitate the implementation of business requirements, so that you can sort of create this more dynamic experience for people and test things.

[0:32:21] KN: Yes. Exactly.

[0:32:21] SF: In terms of your sort of go-to-market, I would think that in many ways this is like a new category of product. Unless you worked at one of the hyperscalers, like, you might not even know this is something that is available to you or that you should be interested in. So, do you see that? Is that one thing that you're sort of butting up against, that there's not necessarily people who are actively going and Googling "dynamic configuration tool"? You have to essentially create the knowledge that this is something that solves a real problem for people?

[0:32:54] KN: That's something we're working on and trying to figure out, both the category definition in terms of, "Hey, what problems are we actually grouping together and how are we building an understanding of what kinds of problems dynamic configuration can solve?" But also, the balance of how much do we lean on the existing category of feature flagging? Is this better than feature flagging? Is this feature flagging supercharged? Is this a superset of it? How do you really get in and understand the nitty-gritty? Or without getting in and understanding the nitty-gritty, how can you actually present this? So, from a go-to-market standpoint, as you're mentioning, it's something we're really trying to figure out, and it's something we're actively working on. I think one of the things that we'd like to do is, as we have these early proof points from some of our customers, we can tell those stories and say, "Hey, here's how we helped this customer and here's how we helped this customer. And these are kind of the ways that we've seen dynamic configuration help with this, this, and this problem." So, you can start grouping those things together and saying, "Hey, Lekko giving a safe interface to other teams or other people in their code is actually something potentially incredibly beneficial."

[0:34:05] SF: It's definitely a challenge, because I think if you identify too much as yet another feature flagging tool that does it, like, better, then you limit the vision of what you're trying to create to some extent, and you get a little bit pigeonholed into this category of feature flagging when it's so much more than that.

[0:34:23] KN: Yes, exactly. So, it's something that we're continually trying to figure out, and I think one thing we have seen, as you're mentioning, is some people have seen this kind of thing before. So, starting off some conversations with senior staff engineers formerly from, like, Twitter and Uber and Google, it's just an interesting way to get the conversation started. But from there, we still need to solve those business problems. It's only a way to get started. We still butt up against similar issues to what you're describing.

[0:34:49] SF: Do you think that the Ubers and Facebooks of the world will eventually use something like Lekko?
[0:34:56] KN: So, I have an interesting anecdote for this. I think one of the Stripe founders was saying how at YC, they were saying, "Oh, like, the Amazons of the world will never use us, but that's okay." That was kind of an objection from some early investors, like, "Oh, these big companies will never use you." It turns out that, like, Amazon uses Stripe now, right? So, I think it's an interesting thing where, right now, if I were to say no, I think it would be a fair answer. But I'd like to think that in the future, we've demonstrated and worked on this tooling so much that we're actually in a place where an Uber or an Amazon or an Apple is using us, maybe to start on one team, right? But it becomes kind of a way that their team can just build software better. I'd like to think that in the future, so I'll leave it open. I won't say no.

[0:35:47] SF: I think if you went back 10 years, the idea of, like, a bank running on the public cloud would have seemed absurd. But now almost all US-based banks, they're not fully moved over, but most of them are starting to adopt public cloud and run on AWS and Google Cloud and stuff like that. So, it takes a while sometimes, but I think there's just real value there. I would think that essentially, if you're successful, you're going to have built up way more expertise in this area, about a problem that you, like, 100% focus on and care about, and that's going to eventually surpass whatever internal systems some other companies have built, because they don't necessarily have their best engineers working on that product and 100% committed, thinking about this thing 24/7. It's just completely different.

[0:36:36] KN: Yes, exactly. So, I think that's a really exciting part about being in the startup space, especially in developer tooling. People are really excited about solving this one problem, and they can potentially create a ton of value by, exactly as you're saying, just being hyper-focused on solving these few subsets of problems that most companies have to deal with.

[0:36:54] SF: What are your thoughts on - I've seen this with feature flagging, and I think it actually makes more sense with some of the stuff that you're doing, but I'm still not convinced that it's a good idea - essentially skipping staging environments and releasing everything on prod, but behind essentially a dynamic configuration. So, you essentially test on prod. You help address that problem of, "Oh, it works in this environment, but doesn't work in that environment," by always releasing to prod.

[0:37:21] KN: Yes. So, you're right that this is slightly controversial. Uber did a ton of this, right? Like, it was very difficult for a while. We actually - I forget where the test city was, but we had some city, in some not very inhabited place, that had a ton of fake Uber traffic. We would essentially run fake traffic through it, right? So, a lot of initial Flippers would turn things on in this test city. We would essentially be testing in production. I think there's a lot of benefit to that. If you think about it from a CD perspective, right, you're continuously deploying, but you're continuously deploying things with dynamic configuration in them. That means it's dynamically configured to not really do much. You can still have that canary system for the code deploy. But then with this dynamic configuration system, you can more effectively manage that code once it's already out and deployed.
So, I think it does lean towards kind of a CD, test-in-production mindset, which I think a lot of companies are moving towards, just because a lot of people have recognized that pure staging environments with actual customer data that's very realistic are difficult to reproduce. Some companies have tried, like, "Hey, we'll anonymize your customer data and make it look very similar and make all these environments very similar." But at the end of the day, that's a very difficult problem. So, this is one way to make that easier and potentially safer. I think there was, like, a test-in-prod meetup or test-in-prod group at one point. It has kind of a bad rap. But I think it's a place that actually most companies are moving towards.

[0:38:55] SF: Yes. I mean, we did something similar. You mentioned the test city for Uber. We had something similar when I was at Google, I forget the exact city, for Google Maps essentially. Because if you're creating, say, a business that you're putting on using the Google tooling to create, like, a small business or something, it's almost impossible to simulate that entire testing environment because of all the things that have to go into it, because it touches so many different systems to validate that this is, like, a real business. I think it was some obscure city in Alaska or something like that that we used to drop all those things into. So, I'm sure if people dig around enough, they can probably find it somewhere.

[0:39:29] KN: Yes. I need to remember the name of the city, but there are some fun quirks about these kinds of systems.

[0:39:35] SF: What's next for you guys? What do you think is sort of the future of the space?

[0:39:38] KN: Yes. So, I think right now we're just really excited to chat with a few more teams and have our early customers really come out as strong case studies. That's what we're really excited to come out with, and to really narrow our focus in terms of how we're positioning the product and how we're positioning the category. I think that's kind of our focus right now. Coming soon, I think we're very excited to announce some AI features, as we talked about: how can we really make this thing really safe and really understandable for the rest of the business? I think some of that's in our pipeline, and we're really excited to potentially announce it. I really think that we're on this journey of trying to introduce this very different way of building software, of having it be dynamically configured and having the rest of the business kind of interact with it. So, we're trying to figure out how to get this out into the world as efficiently as possible, but our long-term vision is there. Right now, it's a really exciting time; there's no one right way of getting this out there. So, we're excited to be working with some early design partners, early customers, and trying to get those stories out there.

[0:40:41] SF: Awesome. Yes. Well, Konrad, thanks so much for being here. I thought this was really fascinating. I'm sure there'll be many versions of what you're building. We'll have you back down the road.

[0:40:51] KN: Definitely. Thanks so much, Sean. It's been great.

[0:40:53] SF: Yes. Cheers.

[END]