EPISODE 1715

[INTRODUCTION]

[0:00:00] ANNOUNCER: Static analysis is the examination of code without executing the program. It's used to identify potential errors, code quality issues, security vulnerabilities, and adherence to coding best practices. Abbas Sabra is a Principal Engineer at Sonar, which creates tools to help developers produce clean code. Abbas specializes in C++ static analysis and began his career in the financial industry, where he identified inefficiencies within the C++ tooling ecosystem. He joins the show to talk about static analysis and static analysis tool development. This episode of Software Engineering Daily is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him.

[INTERVIEW]

[0:00:54] SF: Abbas, welcome to the show.

[0:00:55] AS: Thank you for having me.

[0:00:56] SF: Yeah, I'm excited for this. You're a Principal Engineer at Sonar, working in C++ static analysis. Can we start off with you giving some insight into your role and your responsibilities? What does a typical day look like?

[0:01:10] AS: Yeah. As you said, I am a Principal Engineer at Sonar. I work on static analysis. I started my career in finance, where I found myself writing my own static analyzer on the side, so I decided to join Sonar to work on static analyzers. I work on a team that focuses on different languages, C++, Python, Java, C#, and my specialty is C++. We integrate those analyzers into three products: SonarCloud, SonarQube, and SonarLint. As a C++ developer, my typical day is divided into three areas. We have a group focusing on the language and its best practices. In this group, we think about what is new in the language and the new standards, and what the best practices are: how can we help you prevent typical bugs and code smells in your code? We reflect that in our checks, and try to write the descriptions of why we are suggesting what we are suggesting in an educational way. We call these groups bubbles. In this bubble, we also work on automatic fixes for C++, so how to generate code that would replace the previous code in a way that makes it better. The second part is about bug detection. Here, it's more about static analysis techniques to find typical bugs, like buffer overflows, memory leaks, and null pointer dereferences, in arbitrary code that you cannot assume anything about. This is the challenging part, because there are many heuristics that you need to apply. The third bubble is product integration: how to interact with different compilers, build systems, and products in a way that makes static analysis accessible to the user.

[0:02:53] SF: Once you've reached the level of a Principal Engineer, how much of your time are you actually hands-on-keyboard coding, versus serving as a leader and helping other people within the company who are maybe more junior implement the right things?

[0:03:09] AS: The percentage has definitely decreased since I joined. When I joined, it was 90% coding. Now, it's around 30% to 40%, depending on the subject, of course, because sometimes there is a subject that requires expertise, and you need to be on the keyboard trying to implement a new feature. Sometimes it's just leadership, making sure that we have the right objectives and the right organization to reach our goal. I would say, it's 30% to 40% coding these days.
[0:03:38] SF: I want to talk about static analysis to start. There are lots of practices we've built as an industry to help with code quality, catch errors, catch bugs, and so on. For example, code reviews. Now, it's standard practice that you're going to commit code through a PR, or whatever mechanism you're using. Someone on the engineering team, or multiple people, are going to review that code and hopefully catch some issues, and also help make sure that it's following whatever the best practices are. What are some of the things that static analysis is going to catch that are very hard to catch during a code review?

[0:04:13] AS: Our vision of static analysis — because there are many different tools that apply static analysis — is that static analysis should happen on the pull request. Every pull request should be statically analyzed, and it's usually done before the human code review. We believe at Sonar that static analysis can detect things that are hard to detect in a human review. Let's say that you remove a call to a function in your pull request, and then the function becomes unused. Usually on GitHub or GitLab, the pull request review focuses on the changed code, so it's hard for the reviewers to detect that the function became unused. Or if you have a null pointer dereference where the pointer is initialized 20 steps up the call stack, the code reviewer will find those kinds of things hard to detect. In general, coding guidelines are easier to check by tools than by a human reviewer. A typical example that I get a lot on my pull requests from static analysis is that this function should be const. I modified the code a little bit to remove a side effect from it, and then it can become const. But as a code reviewer, you don't focus on these things. By doing static analysis, code coverage, even dynamic analysis on the pull request, the end result is that the code review can focus mostly on the functional review, rather than the technical review. Because no tool will be able to detect that you have a functional inaccuracy in your code, but many tools can do better than you at detecting violations of your company guidelines.

[0:05:52] SF: Right. Yeah. Especially something like a function no longer being used might be hard to tell if you focus essentially on the PR for code review, because it's going to be such a micro snippet of the larger overall code base. You may not know that, okay, well, it's no longer used by these hundreds of different locations, or something like that. That's a hard thing to catch as a person. In terms of understanding the context of program execution — especially if I think about a web application, or something like that, where I have different backend services that are maybe deployed in different ways and call each other — how does that get taken into account during static analysis, in terms of the context in which you're actually going to execute the code?

[0:06:38] AS: The entire context is available to the static analyzer, because the compiler is compiling your code and building this context, and a static analyzer can redo what the compiler does. For most static analyzers, the first step is to redo the steps that the compiler does. There are many techniques. One of them is symbolic execution; another is data flow analysis. Symbolic execution is what we usually use at Sonar, which basically means that we are building a control flow graph for the program and we are executing it statically, the same way that you expect the program to be executed dynamically. Of course, with a lot of constraints, because you don't know the inputs of the program statically. You are building heuristics on what the inputs can be and what constraints you have on them. At the end, in terms of control flow and interaction with different parts of the code base, the static analyzer usually can do the same thing that the program does when it runs.
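To make that concrete, here is a minimal, hypothetical sketch of the kind of path-dependent bug a symbolic execution engine reports; the names are invented for illustration:

```cpp
struct Config {
    int port;
};

// A symbolic execution engine explores both branches. On the path where
// use_default is false, it tracks the constraint "cfg == nullptr" and
// reports the dereference below, without ever running the program.
int read_port(bool use_default) {
    Config* cfg = nullptr;
    if (use_default) {
        static Config defaults{8080};
        cfg = &defaults;
    }
    return cfg->port; // null pointer dereference when use_default is false
}
```

In real code, the assignment and the dereference can be 20 calls apart, as Abbas notes, which is exactly why a tool that tracks path constraints beats a reviewer scanning a diff.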
[0:07:38] SF: In terms of the control flow building, is it like a call graph, essentially, that you're building up statically to represent what the flow of the program is?

[0:07:46] AS: Yes. Usually, a static analyzer starts by parsing the code, and you build an abstract syntax tree. From the AST, you build some sort of control flow graph that represents how the program is going to run. Then you follow the control flow graph and you build up information at each step of the program.

[0:08:04] SF: Yeah. In some ways, it's similar to building, essentially, a compiler, at least the first part. Then in terms of when you're walking that graph, that's how you catch things like a null pointer, or basically, an unused function, because from an unused function perspective, you'd have, essentially, a node in the graph that is not connected to anything. Is that right?

[0:08:23] AS: What you said is exactly right, because actually, a compiler is a static analyzer. Usually, when it compiles, it tries to detect dead code and remove it, optimize it out of your binary. It tries to detect dead stores, and usually, it warns about them. Following the call graph example, if you reach a point where, for example, you know that X cannot be bigger than five, and you have an if condition that says "if X is bigger than five", you can easily detect that this condition is always false, by following the possible values of the data at each point of the program.

[0:08:59] SF: Is building a static analysis tool for C++ more difficult, you think, than other languages?

[0:09:05] AS: It has its own difficulties. It's definitely more difficult than Python, for example, because you can fit the grammar of Python in one page, but the grammar of C++ is multiple pages of complex grammar. Each brace or parenthesis in a different location means a different thing. One of the difficulties is the parser. Parsing C++ is complicated and costly, because the parser is only as fast as a compiler front end, usually, and C++ compilers are known to be slower than other languages' compilers. Another challenge is that we have many compilers, and each compiler has its own extensions to the language. If you are building a static analyzer, you need to think about each extension. Other difficulties: we don't have a common build system for C++, or a dependency manager, so you cannot rely on a standard structure of a repo, compared to Java, where you always have Maven or Gradle in your repository, or C#, where you rely on NuGet. But these are the top three difficulties that usually come to my mind when I talk about C++.
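As a hedged illustration of why C++ is so hard to parse — the classic "most vexing parse", where the same tokens read as either an object definition or a function declaration (type names invented):

```cpp
struct Gadget {};

struct Widget {
    explicit Widget(Gadget) {}
    void spin() {}
};

int main() {
    // Intended: a Widget named w, built from a default-constructed Gadget.
    // Actual: the grammar forces this to be a *function declaration* named w,
    // returning Widget and taking a function that returns Gadget.
    Widget w(Gadget());

    // w.spin(); // error: w is a function, not an object

    Widget ok{Gadget{}}; // C++11 brace initialization removes the ambiguity
    ok.spin();
}
```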
[0:10:08] SF: Yeah, in terms of the compilation time, how do you deal with the slow compilation, especially for large programs?

[0:10:15] AS: One technique is to use something called PCH, which is pre-compiled headers. If you are analyzing pull requests, you can pre-compile the code. Once you have a pull request, you only have to update the compilation units that changed, and you don't have to recompile everything.

[0:10:32] SF: Then, what about navigating the different types of compilers? It's tied to CPUs, tied to the operating system. Do you end up having to compile, basically run, under different constraints?

[0:10:45] AS: The CPU part is not really relevant to static analysis, depending on the technique, because if you run the static analyzer on a control flow graph, not on the binary level, you don't have to care about what platform it is targeting. On a higher-level representation, like the LLVM intermediate language, or on an abstract syntax tree, the targeted architecture is less relevant. Of course, it matters sometimes, so you need to understand it. For example, on different architectures you have different sizes of types, and if you store something that cannot fit in this type on this architecture, the static analyzer should warn. But you don't have to think about things on a binary level when you are doing static analysis.

[0:11:28] SF: What are some of the heuristics that you commonly use in order to figure out, for example, if there's a memory leak within the program?

[0:11:38] AS: Memory leak is a hard problem.

[0:11:41] SF: Yes.

[0:11:43] AS: Especially in C++. We use symbolic execution. Basically, with this control flow graph, we try to execute it statically, building a symbolic value for each variable. For memory leaks, basically, you need to detect the paths where the memory is allocated and never released. If you find such a path, it means that you have a memory leak. The common pattern is an early return, or a thrown exception, that skips the delete or the release of the memory at the end of the function. Symbolic execution cannot detect all these issues, because if you want to run the program statically, it might take as long as running the program dynamically, or even longer. So, you need to put some bounds on the symbolic execution to be able to finish in an acceptable time, because remember, we said we need to analyze the code on the pull request. Heuristics, in this sense: you usually try to limit the steps of symbolic execution, and try to recognize specific patterns in the code. In C++, there are contracts with assert. You can assert that this cannot be null, and you can detect those constructs and feed them to your symbolic execution to help it better understand the code. At the end, you are trying to build heuristics, pruning unlikely paths, to be able to simulate the program up to, say, 80%. We try to rely more on preventive measures. Yes, we can detect maybe 60%, 70% of the memory leaks in a random C or C++ program, but it is much more efficient to help people use newer constructs that avoid memory leaks than to detect a memory leak. For example, using smart pointers; or before smart pointers, in C++98, there was the rule of three, where you define the copy constructor, the copy assignment operator, and the destructor in a standard way that makes it hard to end up with a memory leak. This general advice would avoid memory leaks and would have basically no false positives, because you are not simulating paths in the code, nor false negatives, and it would have more efficacy than symbolic execution. We try to combine both of them to get high coverage. But at the end, detecting 80% of the bugs is already good. You don't aim for 100% in static analysis; you combine it with dynamic analysis and code coverage to reach 99% confidence in your code.
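A minimal sketch of the early-return leak pattern and the preventive, smart-pointer rewrite described above (function names hypothetical):

```cpp
#include <cstdio>
#include <memory>

// Leaky version: the early return skips the delete[] at the end,
// so the path where fopen fails leaks the buffer.
bool parse_file_leaky(const char* path) {
    char* buffer = new char[4096];
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) {
        return false; // bug: buffer is never released on this path
    }
    std::fread(buffer, 1, 4096, f);
    std::fclose(f);
    delete[] buffer;
    return true;
}

// Preventive version: RAII releases the buffer on every path, so there is
// no leak for the analyzer to find in the first place.
bool parse_file(const char* path) {
    auto buffer = std::make_unique<char[]>(4096);
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) {
        return false; // buffer freed automatically by unique_ptr
    }
    std::fread(buffer.get(), 1, 4096, f);
    std::fclose(f);
    return true;
}
```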
[0:14:15] SF: You mentioned false positives, false negatives. What are some of the situations where you might end up with a false positive, or a false negative?

[0:14:21] AS: Just to make it clear: a false positive is when the static analyzer reports an issue that is not accurate. If we say there's a memory leak while there's no memory leak, that's a false positive. A false negative is when the static analyzer doesn't say anything while there is a memory leak. For false positives, it depends on the rule. For example, if you build a check that says "use a smart pointer", it's highly likely that you will not face any false positives or false negatives, because it's not really dependent on the execution path of the program. You are just trying to find instances where the user is using a raw pointer, like a new, and telling them to replace it. Here, you rely on standard methods for making sure that your static analyzer is robust, like unit testing and regression testing. We do this to ensure that we don't have false positives and false negatives. For symbolic execution, where the analysis is path-sensitive, it's more of a challenge, because you cannot guarantee that, and you are basically running on any sort of code. You handle it by making the control flow graph very abstract, in a way that it's not much affected by how the code is written. If you run the analysis on an abstract representation, where there is not a lot of room for getting it wrong, you reduce the possibility of false positives. In C++, for example, you can initialize a variable in 10 ways. If a newer standard adds another way, your static analyzer would be broken, because it doesn't know about this way, and that will lead to false positives. If you abstract all these ways of initialization into a standard form that the static analyzer runs on, it's much less likely that you are going to hit false positives.

[0:16:13] SF: What's the typical acceptable runtime when you're doing static analysis? I'm doing a pull request, the static analysis is going to run, what runtime am I looking at in order to get a report back?

[0:16:27] AS: If it makes your pipeline slower, you will likely not use it. If you run your static analyzer only on your main branch, that's not the right place to run it. You need to run it in a way that is as fast, or as slow, as your build. Usually, if your project builds in five minutes, the static analyzer should finish in five minutes. If you have a bigger project and your build takes 10 minutes, it's also acceptable for the analysis to run in 10 minutes, because you need to analyze a much larger code base. As long as you keep this ratio between build time and analysis time, it is generally acceptable. You have to wait for your build anyway, so you can run the static analyzer in parallel and it will not slow you down.
[0:17:10] SF: One other thing in terms of the types of issues you might catch: am I able to use a static analyzer to also catch things like concurrency issues, race conditions, deadlocks, stuff like that?

[0:17:23] AS: These issues are similar to the memory leak problem, because there are preventive measures. In C++, we have the C++ Core Guidelines that give you some preventive measures for deadlocks, race conditions, and things like that. Like, don't use a global variable. Don't share variables between threads. Use a thread-safe data structure. The same way these things would avoid the memory leak, they would avoid the race condition. Then you have trying to simulate the execution of a thread to detect race conditions and deadlocks, the same way as for memory leaks. For example, if while you are simulating a thread, you find that the thread is locking the same mutex multiple times, that's already a deadlock, because you cannot lock it while you have already locked it. This combination — best practices that prevent the race condition, then statically executing your program to detect the race condition — is the general pattern that we use for almost all bugs in the code, like memory leaks, buffer overflows, race conditions, and deadlocks.

[0:18:31] SF: What about things like, if there's an infinite loop in the program? What ends up happening to the control flow graph that you're building?

[0:18:38] AS: Loops are one of the hardest problems in symbolic execution, because sometimes it's not known statically how many times a loop can execute. Most symbolic execution engines have patterns to unroll the loop, or to execute it on specific interesting values. Detecting an infinite loop is one of the checks that symbolic execution can do. For example, if you are calling the same method without a condition, the engine can detect this infinite loop, report it to the user, and stop the simulation, because it knows that on this path there is a possible infinite loop.
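Two of the patterns just described, in a hedged, minimal sketch — a double lock on a non-recursive mutex, and a recursion with no exit condition:

```cpp
#include <mutex>

std::mutex m;

void update_state() {
    std::lock_guard<std::mutex> outer(m);
    // ... some work ...
    std::lock_guard<std::mutex> inner(m); // deadlock: m is not recursive,
                                          // so locking it again blocks forever
}

int countdown(int n) {
    // No base case: simulating this path shows the call can never terminate.
    return countdown(n - 1);
}
```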
[0:19:19] SF: Are there new advancements in this area, particularly related to C++, that you're excited about, that you're looking at as something to implement into the work that you're doing at Sonar?

[0:19:30] AS: In static analysis in general, there are always new techniques in symbolic execution, and even new technologies for running a static analyzer. Recently, including machine learning in the paths that a symbolic execution can take has been beneficial. There is another area where some languages with static analysis try to prove 100% that a method is doing what it's supposed to do. There are languages built around this, like Dafny by Microsoft. Usually, you define the precondition and postcondition, and the static analyzer builds a mathematical expression that asks, "Can I make sure that, if the input is verified, the output of the function will be as expected?" This technology cannot really be applied to normal C++ code. In C++, I'm excited about a new feature that we recently worked on at Sonar called AutoConfig, which is trying to make C++ analysis accessible to any type of project, independent of the compiler and toolchain the project is using. This is mostly useful because a lot of companies are stuck with older compilers that don't have much tooling. The focus here is mainly new parsing technologies that can tell when the static analyzer doesn't fully get the entire context of the code, and that build heuristics to understand it, even though it's not the real compiler that the user is using. For this, we had to build a new part of the parser that can understand and reason about code that is not typically understood by the top three compilers: Clang, GCC, and MSVC. Also, we had to detect dependencies without relying on a dependency manager. We had to build our own dependency detection that, instead of looking at a dependency manager, looks at the user's code and deduces the dependencies. We flipped this around to know what dependencies a code base is using. The end result is that on SonarQube, the user can now just click a button and analyze their code without providing a lot of context — actually, without providing any context about the code base. We detect everything automatically.

[0:21:55] SF: Where does a lot of the innovation in this space come from? Is it driven by researchers in the space, so you've got to stay up to date on the most recent stuff happening in the research world? Or is a lot of it driven by industry, or maybe it's both?

[0:22:08] AS: It's mostly research. Static analysis is a very research-y subject. At university, I did a project on formal verification, which is a form of static analysis to verify programs. There are always new papers about new techniques in symbolic execution, taint analysis, data flow, abstract interpretation. It's mostly driven by the academic world. Also, there are sometimes non-standard contributions from machine learning and AI that can be integrated into static analysis to make it more powerful in terms of heuristics.

[0:22:45] SF: Do you spend some of your time looking at the latest research in order to figure out maybe what's next, or some opportunities that you can use at Sonar?

[0:22:54] AS: I personally don't have the time to do so. We have an R&D department, mostly compiler engineers, or static analysis engineers with PhDs in static analysis, that focuses on this: what is the newest technology, reading new papers to improve our product.

[0:23:12] SF: Okay, great. I wanted to also talk a little bit about C++. You have a long history with C++ going back to the 90s. I think it was around the late 90s when I first started dabbling with C++, but I've never spent the bulk of my career working in it. I've gone in and out of C++ at different points. It's been around since the mid-80s. There's been a lot of change in that time. I remember being introduced to the STL, the standard template library, which I thought was a huge leap forward at the time, in terms of having built-in data structures and algorithms at your fingertips. What are some of the major milestones in the history of C++ and its modernization?

[0:23:52] AS: Yeah. Before going into the details, how was your experience with C++?

[0:23:56] SF: I mean, I feel like I have a love-hate relationship with it. I did competitive programming in university, the ACM ICPC programming competitions, and the only languages available at that time were C, C++, and Java. I did all of my coding in C++. The only real industry experience I had was during my time at Google. I think it's actually still being used now, but I wrote a program there to help add additional security, essentially, to the way that text messages get sent. For a period of time, I think every text message at Google was going through that program. That's my most successful in-prod thing that I've ever done in C++. That was the most recent experience I had, where I actually started touching some of the more modern parts of it, like smart pointers and stuff like that.
[0:24:41] AS: Makes sense. I mean, most C++ developers might have a love-hate relationship with C++, and that's for a reason. C++ started with the C++98 standard, which was trying to do C with classes, to give you some level of abstraction on top of C. The main motivation was, well, you can migrate easily from C. You get classes and you get inheritance and all of the object-oriented features, without paying an extra cost in performance. There's always this promise of zero-cost abstraction in C++. C++98 was, at the time, definitely good progress from C. It's hard to imagine yourself writing a large program in C — even though Linux is written in C, I personally find it hard to reason about a large program in C. With time, there was the introduction of new languages, like Java and C#, and even more modern languages, like Kotlin and Go, and C++ needed to keep up. For a long time, there were only minor improvements in C++03; it didn't provide anything new until C++11, which was basically a totally new language. We got move semantics, we got smart pointers, we finally got multi-threading in the language, which was a big thing. We got implicit type deduction: you can write auto, similar to var in Java, to not spell the type explicitly. It was almost a new language — even, it is a new language. If you write in C++11 now and you use only C++11 features, you cannot recognize that it is C++. One of the most impactful features that is still heavily used right now is the lambda expression, which is also available in many other languages. C++11 made the language much easier to use, especially with smart pointers and all the memory issues. Then after that, the language started to pick up speed. They started to release a new version every three years. We got C++14, which was a small standard compared to C++11. That was a good decision: it's not good to put a lot of features in C++11 and then a lot of new features again in C++14. It was mostly perfecting the features that were introduced in C++11. Then we got C++17, which was, I would say, a utility version, because we got things like std::optional, std::variant, std::any, std::string_view. We got a filesystem library. We got a lot of the good stuff that you would use in a normal program, that you could do before, but now you can express it in a much easier and safer way. For example, with std::variant: previously you had to use a union and keep track of which element of the union is active. Now you can use std::variant, and it's much easier to use. C++17 focused on this utility improvement. Then you get C++20, which is almost like C++11 in terms of the number of features. We got coroutines, which are a language feature that's now available in many languages. We got modules. Modules are a new way of structuring your program without relying on the old header file and preprocessor approach. This should make compilation faster — that's its promise — and it should make static analyzers faster. Also, in C++20, we got ranges, so patterns like map and filter with views are now very easy to write. C++20 is a very large standard. Now we have C++23, where we go back to a standard that just goes over the previous standard and fixes normal, small things, just to give it time before introducing the next big thing. C++23 is similar to C++14. It's just a small standard with a lot of small utilities.
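For instance, the std::variant improvement mentioned above — a hedged, minimal comparison with the C++98-era tagged union (names invented):

```cpp
#include <iostream>
#include <string>
#include <variant>

// C++98 style: a tagged union, where the programmer must keep
// 'kind' in sync with whichever member is actually active.
struct LegacyValue {
    enum Kind { kNumber, kText } kind;
    union {
        int number;
        const char* text; // a std::string could not live in a C++98 union
    };
};

// C++17 style: std::variant tracks the active alternative itself.
using Value = std::variant<int, std::string>;

int main() {
    Value v = std::string("hello");
    // std::visit dispatches on whichever alternative is currently held.
    std::visit([](const auto& x) { std::cout << x << '\n'; }, v);
    v = 42;
    std::visit([](const auto& x) { std::cout << x << '\n'; }, v);
}
```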
[0:28:43] SF: When was C++11 introduced? What was the motivation for modernizing it? Was it essentially to try to keep C++ relevant?

[0:28:51] AS: Trying to keep it relevant, yes. I mean, every language is evolving, and looking at C++ not having evolved for 10 years, it was difficult to still advocate for the language, even though the use cases of C++ weren't challenged. At the time of C++11, there was no language competing with C++. There was no language that had the abstraction without the performance cost. It wasn't out of competition, but — I mean, the good thing about C++ is that it's run by an ISO committee and many people can contribute, and they decided to improve the language, and that was a big favor for the C++ community.

[0:29:34] SF: I mean, what about in terms of competition now with newer languages, like Rust and Zig? Are those starting to take people away from C++, or are they not directly competitive?

[0:29:45] AS: Zig is not directly competitive. Zig is more competing with C. It doesn't have a lot of abstraction. Rust is challenging C++, but I don't believe it is there yet. If you look at Rust objectively — and in my subjective view — Rust is a better language than C++ in terms of getting things done as fast as possible, because you get a dependency manager, which is really hard to do in C++. You get a lot of utilities, a better standard library, and all of these things. But when you make a decision about a language, usually you take many considerations into account. Take how many C++ developers there are in the industry versus how many Rust developers. You also do some risk management and you ask: is it guaranteed that Rust will be there in five years? Because C++ is definitely going to be there. Rust is a real challenger. It's not like Zig, or other languages that are still early. But only time can tell if it's going to be real competition. Usually, for languages that try to compete, the killer feature is being interoperable with the previous language. We saw Kotlin succeed because it was easy to combine it with Java; TypeScript with JavaScript, same pattern. Unless Rust makes some heavy investment to make it really easy for a code base that is written in C++ to migrate to Rust, or to at least start writing new code in Rust without a big investment — if they can do that, it will definitely be successful. That's basically the challenge, because it's not only about how beautiful the language is; there are a lot of things that you have to take into consideration. Recently, many languages have been introduced. We have Carbon, which was introduced by Google to compete with C++. Many people who work on it are actually part of the C++ committee. We don't know yet if it's going to succeed or not. We have cppfront, which is trying to build on top of C++ by keeping the good subset of it. The idea is, for example, that you can write in this new, good subset of C++ and use it with your old C++, without needing to learn about all the uncommon, or let's say less relevant, features that are still in the language but don't have many use cases. But Carbon and cppfront are still in their early days. Rust is the biggest competitor of C++.

[0:32:12] SF: When the modernization started to happen, what was the reaction from the base that had spent the bulk of their career programming in C++? Was it met with positivity, that they were happy these things were being introduced? Or was it more resistance, like, "Hey, I've been doing it this way for 20 years. Why are you now introducing these new constructs that take away from what C++ is supposed to be?"
[0:32:39] AS: I think it was mostly positive, at least in my community, with the people that I interacted with. It boils down to one thing: if the language adds smart pointers, for example, you don't have to use them; they are just an extra tool in your toolbox. There are some features that get a bit of pushback, but most of them are positively received. I think people complain more about the things that are not yet in C++ than about the things that were introduced to C++. If you talk to a C++ developer, they will complain about why we don't have reflection yet, or pattern matching, or even networking libraries as part of the standard library. The usual reaction of a C++ developer is, "Can I have more?", rather than, "Why are you introducing all of these?"

[0:33:25] SF: I mean, you also mentioned some features of the language that are mostly obsolete, but they're still there, probably for backwards compatibility reasons. There are a lot of things that were introduced to C++, intricacies, that don't necessarily exist in other languages. Things like operator overloading and multiple inheritance aren't super common in other languages. I understand why they're there and I've used them, but they lead to some challenges as well. What about virtual inheritance? Can you explain virtual inheritance? Have you ever actually used it?

[0:33:58] AS: Yeah. Let's introduce it by saying, there are a lot of features that are there for backward compatibility. C++11 introduced type aliases, but C++ kept the C way of introducing a type alias, which is a typedef. We don't remove those because of backward compatibility. Things that are broken in the language and were introduced by C++ itself usually get removed. auto_ptr was a smart pointer that was part of C++98 and got removed, and C++ removed dynamic exception specifications from the language. They try to remove things that were introduced by C++. They don't try to remove things that would break backward compatibility with C, even if they're no longer relevant. Now, there are a lot of features that exist in C++ due to its original design. Originally, C++ had multiple inheritance. It's a tool. You should use it carefully, but it's useful in some places. When you have multiple inheritance, you can get what's called a diamond shape. Let's say you have a base class A, you have B and C that inherit from A, and you have D that inherits from B and C. You have a diamond-shaped inheritance. With this pattern, D is basically inheriting A twice. You don't have this problem in other languages, because there's no multiple inheritance. Once you have A twice, you have all sorts of problems, like ambiguity: which A are you referring to? Is it the one that comes from B, or the one that comes from C? To solve this problem, we have something called virtual inheritance, to say: even though I'm getting A twice, I only want it once. Why is it useful? I mean, if I am starting to write code from scratch, I never use virtual inheritance. It's one of the features where, in the case of a diamond shape, you could just raise a compiler error and not accept the pattern at all. But it was there from the beginning, so removing it would break users that rely on this feature. There are use cases for this feature that you can argue are not good use cases, but they exist. One of them is with exceptions, where there's code that you cannot modify. If you want your new exception to be catchable by two existing handlers, you have to add a new derived class that inherits from the two exception classes, and virtual inheritance keeps their common base from being duplicated. I'm sure this example wasn't very clear, but there are cases while coding where you have a lot of constraints — you cannot change the code, the code comes from a library — and virtual inheritance can be a workaround in your toolbox that you should probably not use, but you can use.
[0:36:42] SF: What about friend functions? I don't think that's something that you typically see in other languages either.

[0:36:48] AS: Yeah, a friend function is still something that you don't typically see, but friend functions are more useful than virtual inheritance. Friend functions are also part of the design of the language, because if you look at Java and C#, they have this hierarchy where you have a base Object that every class inherits from, which defines things like how to print the object, how to hash the object, and so on. In C++, we don't have this, which is good for performance. That's the reason why you don't have a common base class. Other languages also have something called package-private, or package visibility: you have private, public, protected, and package-private, as I call it. C++ doesn't have this. Friend would usually solve this problem. Let's say you have a factory pattern, for example. In Java, I would make the constructor package-private and create a factory class that calls the constructor, and you can only call the constructor from this package, or from the unit tests of this package, for example. In C++, you can either make the constructor private, and then you cannot instantiate the class from outside, or you make it public, and then the user might misuse your interface and bypass the factory pattern. You solve this problem by declaring the factory class as a friend, and then only the friend can access this constructor. Another case is operator overloading. Usually, operators are declared as friends and then defined as free functions that access the internal fields. For example, printing a class in C++ is commonly implemented as operator<<, which is a friend function of the class with access to its internal fields, while in Java, you would implement it as a toString method inherited from the base Object class. Friend is an unusual design, but if used carefully, it's not really something bad per se.
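A hedged sketch of both friend use cases from the answer above — a factory given access to a private constructor, and operator<< given access to private fields (names invented):

```cpp
#include <iostream>

class Widget {
    // The factory may call the private constructor; the printer may read id_.
    friend class WidgetFactory;
    friend std::ostream& operator<<(std::ostream&, const Widget&);

    explicit Widget(int id) : id_(id) {}
    int id_;
};

class WidgetFactory {
public:
    static Widget create(int id) { return Widget(id); }
};

std::ostream& operator<<(std::ostream& os, const Widget& w) {
    return os << "Widget(" << w.id_ << ")";
}

int main() {
    Widget w = WidgetFactory::create(7); // Widget w(7) would not compile here
    std::cout << w << '\n';
}
```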
[0:38:58] SF: Yeah, absolutely. Then, are there other features, things like C casts, or virtual inheritance, that are basically obsolete in some fashion for people writing new code in C++, but are still technically available?

[0:39:13] AS: Yes. If we go back to the love-hate relationship: even if you learn modern C++, you still have to understand all the C constructs. You still have to understand how function pointers work. You still have to understand C casts, which I would argue have no use case if you are using the C++ explicit casts: static_cast, dynamic_cast, reinterpret_cast, const_cast. All of these replaced the C cast, but it's still there because of backward compatibility. As a C++ developer, I don't love the fact that I still have to remember how all of these things work, even though I never use them, because some colleague might use them, or you have another code base and you still need to understand them. Objectively, C++ is a bloated language. There are many things that you need to understand but should never use. The C cast is one of them. We talked about virtual inheritance as another one. There are 10 ways of initializing a variable in C++, but you don't have to remember how each one works if you always use the same one. Between type aliases and typedefs, you should use the type alias and never the typedef, or almost never.

[0:40:29] SF: What about defines versus constants? The C way of defining a constant with the preprocessor is supported in C++, but you can also declare something as a constant.

[0:40:38] AS: Yeah, C++ has the C preprocessor, so you can use define. It's almost never a good idea to use define. The use case for define is that, depending on the compiler, the compiler might define different variables depending on the architecture, or the target architecture that you are building for. That's about it in terms of use cases for define. But as a software engineer writing typical C++ code, you don't need to use the preprocessor. Usually, the recommendation is: if you can avoid the preprocessor, you should avoid the preprocessor. Even in C++98, in most cases, you should write a const variable instead of a define. The exception is a platform-dependent macro; that's not really a variable, it's something defined differently by the compiler depending on your platform. We still have a lot of old code bases that use define, but that's something that should be refactored. And since C++11, we have constexpr. It's a compile-time constant, and it is now used for global const variables.

[0:41:50] SF: We've been talking a lot about the modernization that's happening in C++. What's next when you look ahead at the landscape of C++?

[0:41:59] AS: There are two parts. One is what's next in terms of compiler support, because there are features that were standardized that not every compiler implements yet. Modules are part of C++20, but they're not yet fully supported by all the compilers. They're supported by MSVC; Clang is on the way. People are looking forward to using this feature. That's a reminder for the committee that when they standardize things, it doesn't mean it's going to be available directly after; compilers need time to implement these language features. As for language features, most people are looking at things like reflection. Reflection is something that exists in almost every language that we don't have. And contracts. Contracts are a feature that we hoped to get in C++20, but we didn't get it yet. A contract is a way of stating the obligations of a function. For example, if you have a parameter, you can say that I expect the input of this function to always be positive. If it's not positive, the contract is broken and, for example, abort is called, or something similar to this. Contracts are very useful, because they even help static analyzers. If a static analyzer can see the contract of every function, it can optimize the way it simulates the program. Reflection, contracts, pattern matching — Python now has pattern matching, but we still don't have it in C++. In terms of the library, networking is a common thing for which you need a separate library in C++, but it would be great if we had it in the standard library.
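Until contracts land in the standard, the assert-based stand-in alluded to earlier looks roughly like this (a hedged sketch; the function name is hypothetical):

```cpp
#include <cassert>
#include <cmath>

// The assertions document the contract and, as discussed earlier, give a
// static analyzer explicit constraints to feed into its symbolic execution.
double checked_sqrt(double x) {
    assert(x >= 0.0);          // precondition: input must be non-negative
    double result = std::sqrt(x);
    assert(result >= 0.0);     // postcondition: result is non-negative
    return result;
}
```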
[0:43:37] SF: If networking were built into the standard library, do you think that more people would be building the backends of web applications in C++, similar to how Go has become a very popular language for building web backends?

[0:43:49] AS: Not necessarily. I don't think that if you add networking, C++ will become popular in networking. It's mostly that, if you have a small networking task that you want to do and you are already using C++, you don't have to introduce another language, or another library, to do it. Also, if you are in university and you are learning about networking and you are already familiar with C++, you don't have to learn another language to get introduced to networking. For C++ to compete with Go and other languages in networking, it would need much more than a new library.

[0:44:24] SF: Yeah. Well, Abbas, thank you so much for being here. This was really interesting. I think there was a lot of stuff that we covered on the static analysis front, as well as a lot of the history of C++ and all the fun intricacies of the language.

[0:44:35] AS: Thank you for having me. It was a pleasure.

[0:44:37] SF: All right, cheers.

[END]