Fedora Podcast Episode 03

Langdon White talks about Fedora Modularity

Eduard Lucena:

[00:07] Welcome back to the Fedora podcast. This is episode three. I'm here with Langdon White, leader of the Modularity objective in Fedora. He is going to talk about what Modularity is. Welcome, Langdon.

Langdon White:

[00:20] Thanks for having me.


[00:22] We have this thing called Modularity that we are trying to push really hard. We want you to explain: what is Modularity?


[00:31] If you've ever used a Linux distribution before, you know that there are many users who run into what we often refer to as the too fast and the too slow problem.

[00:42] What that means is for, let's say, a developer, they want the bleeding-edge version or the cutting-edge version of some toolchain, and it may not actually be in the distribution yet because it's not considered stable.

[00:58] That is a problem for the developer, because that means they have to go hand build those components and deliver them themselves, and then maintain them, etc., and then hope that the version that they decided to settle on is the one the distribution picks up the next time the distribution changes, usually at a major version.

[01:19] That's the too slow problem. The too fast problem is when you have your application, and it's running on some kind of current version of some toolchain, and the distribution leapfrogs you. In other words, the distribution releases a major version. Now they have the newest version of that toolchain, and now your stuff doesn't work. What do you do?

[01:41] Right now, there's lots of different ways to try and solve that problem. One way is to pin to an old version. One is to go manually hand build stuff. Things like Copr start to relieve this problem a little bit, depending on what you're trying to accomplish, maybe in both directions, usually more forward than backward.

[02:00] This is the crux of the problem that we're trying to solve. We're also using the solution to that problem to try to simplify what we understand about the overall distribution, and components of it, and how it comes together, and to make that more generic. Let's put it as less tightly coupled to, say, straight RPMs.

[02:27] We have a very defined delivery process for the RPMs distribution. We need to be able to deliver other things more simply. Things like containers, or things like the different editions of Fedora. We get locked into how we deliver those things based on the way we traditionally build Fedora over time.

[02:47] Modularity allows us to have more flexibility in how we build that structure as well. Those are the two big aspects of it. There's a lot of side benefits as well. For example, it simplifies container construction for individual users, and things like that. There's a lot of niceties that happen as a result.

[03:05] That's roughly what Modularity is.


[03:09] The first thing that comes to mind when you see Modularity is modules. Is it related in some way? Do we have modules that work?


[03:19] Modules, yeah. The reason it's called Modularity is that we're trying to move away from looking at the things you're interested in at the RPM level. As a user, when you look at it at the RPM level, it's very, very small. It's very hard to understand what you need at an RPM level.

[03:40] We have various mechanisms through RPM to try to bring things up to those other levels; that is to say, we do things like metapackages, or yum groups. That's how we try to solve some problems about discoverability and ease of use.

[03:57] If what you actually want is an application, you don't want an individual library, you want Apache, you want Firefox, you don't care whether or not it has library X, Y, or Z. Most users are in that condition.

[04:10] Maybe there are some, when you have a developer use case, maybe there are some edge cases to that, but generally speaking, even a developer, when they're trying to do some sort of web application, they want their Apache, and they want MariaDB, and their PHP.

[04:27] They don't generally care about the details within that, except to maybe add a few libraries. What we do is we try to have a description that is kind of like a yum group and kind of like a meta package, but tries to fall in between, where you just grab this thing called a module.

[04:44] A module allows you to say this is how you get this application, or how you get this larger component. It is made up of RPMs, but it describes the whole thing, as well as the versions of those RPMs that go together so that they continue to work.

[05:01] When I say I want Maria 10, then I get this set of libraries, but if I want Maria 5, I get a different set of libraries, so that we know they can work together.
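
As a rough illustration of the idea described here, the sketch below models a module as a name plus a set of streams, each pinning the RPM versions that are known to go together. The class name `Module`, the stream labels, and the version strings are assumptions made up for the example; they are not taken from Fedora's actual module metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """A named application made of RPMs, offered in one or more streams."""
    name: str
    # stream name -> {rpm name -> pinned version}; all values are illustrative
    streams: dict = field(default_factory=dict)

    def packages_for(self, stream: str) -> dict:
        """Return a copy of the pinned RPM set for the requested stream."""
        if stream not in self.streams:
            raise KeyError(f"{self.name} has no stream {stream!r}")
        return dict(self.streams[stream])


# Hypothetical data: two streams of the same module, each with its own RPM set.
mariadb = Module(
    name="mariadb",
    streams={
        "10": {"mariadb-server": "10.2", "mariadb-libs": "10.2"},
        "5": {"mariadb-server": "5.5", "mariadb-libs": "5.5"},
    },
)

maria10 = mariadb.packages_for("10")
maria5 = mariadb.packages_for("5")
```

Asking for one stream or the other yields a different, internally consistent set of package versions, which is the property the module is meant to guarantee.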

[05:09] Then we can couple that with another Fedora objective, the Fedora CI infrastructure, so that I can say for this version of Maria, the module, we know that all these versions of the RPMs work together, because we've actually tested them all together.

[05:26] We know that as a unit, they work together, and we don't necessarily need to test them as a unit against everything else. We only need to say Maria is going to provide, say, a service, like database service on some port maybe, and then that's going to be publicly exposed, but that the internals of Maria can be left to the Maria module.

[05:48] Those are two aspects which make things a little simpler. The other thing that we added into modules is the ability to specify what we call install profiles. For example, there's an RPM package out there that will let you install Apache so that HTML pages are served out of your home directory, rather than out of /var/www.

[06:11] I can't remember the name of the package. I can never remember the name of that package, so I have to go find it. What I want to go find is Apache or HTTPD, and I want to just say I want to install that for development to actually contribute to the HTTPD codebase, or I want to develop HTML pages, or I want to run production.

[06:31] Instead, what we do is actually have these what we call install profiles on individual modules so that the end user can just say, "I want Apache, and I want it for this use case." Then they will only get the packages installed in a way that that use case supports.
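
A minimal sketch of the install-profile idea, assuming a profile is simply a named list of packages within one module. The profile names and every package name other than `httpd` and `httpd-devel` are hypothetical placeholders, not the contents of any real Fedora module.

```python
# Hypothetical profile table for an "httpd" module.
# "userdir-config-placeholder" stands in for the package whose name
# the speaker could not recall; it is not a real package name.
PROFILES = {
    "production": ["httpd"],
    "html-dev": ["httpd", "userdir-config-placeholder"],
    "httpd-dev": ["httpd", "httpd-devel"],
}
DEFAULT_PROFILE = "production"


def packages_to_install(profile=None):
    """Return the package list for a use case, or the default if none is given."""
    chosen = profile or DEFAULT_PROFILE
    if chosen not in PROFILES:
        raise ValueError(f"unknown profile: {chosen!r}")
    return list(PROFILES[chosen])


default_set = packages_to_install()
page_author_set = packages_to_install("html-dev")
```

The user names the use case, and the lookup decides which packages that implies, so nobody has to remember the name of the extra package.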


[06:48] If I'm understanding well, the idea is to have a minimal installation for each use case, to try not to bloat all of this with a bunch of libraries that don't serve me, because I don't develop the HTTPD codebase.


[07:03] Right. That is one aspect. The other aspect is to actually get the library. If you go and get the MariaDB RPM right now today, it's probably a meta package. I don't know off the top of my head, but probably it's a meta package that will give you a minimal production install.

[07:19] What if you don't want a production install? You want to have a development install. Or you want to have it for...maybe it's just going to serve data on your local machine. It shouldn't have any public accessibility at all.

[07:31] Those are very difficult to do with RPM without having to create a whole bunch of RPMs that are different meta packages for these different use cases. You can do that, but the only way that you can indicate the information of what use case it is, is if you embed it in the name.

[07:45] Then you have to have RPMs called MariaDB for production, MariaDB for my local desktop, MariaDB for creating the new Microsoft Access, whatever. You have to encode all that information in the name. You have to individually manage those RPMs as RPMs.

[08:04] What we're trying to do with modules is simplify that both on the user experience side so it's discoverable, so you don't have to know what all the different names are, but then on top of that, from a maintenance burden perspective, the profiles are often very, very similar to each other.

[08:20] Maybe there's one different package that has to be installed between the different profiles. The maintenance of that is much simpler.


[08:27] The idea is to simplify both management and usage.


[08:31] Right.


[08:32] I get it.


[08:33] Let me add to that too with the caveat that we want to make it as transparent as we possibly can. In other words, with the way we're designing and developing Fedora Modularity, you will never know that Modularity is there unless you want to go after one of those special use cases.

[08:55] Once the Modularity bits are released, if you type dnf install mariadb, you will get just what you expect as if you had installed it in the prior version of Fedora, because it's transparent. Then if you add a little more to your syntax, you can start to walk into those special use cases.
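
One way to picture "transparent by default, more syntax for special cases" is a resolver that fills in defaults when only a name is given. This is a sketch under stated assumptions: the `name:stream/profile` spelling, the `DEFAULTS` table, and its values are hypothetical and chosen only for illustration; the conversation does not specify the real syntax.

```python
# Hypothetical defaults: what a plain "dnf install mariadb" would map to.
DEFAULTS = {"mariadb": ("10", "server")}


def resolve(spec):
    """Split an illustrative 'name[:stream][/profile]' spec, filling in defaults."""
    base, _, profile = spec.partition("/")
    name, _, stream = base.partition(":")
    default_stream, default_profile = DEFAULTS[name]
    return name, stream or default_stream, profile or default_profile


plain = resolve("mariadb")             # nothing extra typed: defaults apply
older = resolve("mariadb:5")           # a specific stream is requested
special = resolve("mariadb:5/client")  # a stream and a use case are requested
```

A user who types only the name never sees the extra machinery; the additional parts of the spec matter only when someone goes after one of the special use cases.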


[09:15] That's neat. Is there an onboarding process to work with Modularity?


[09:23] To work with, like to contribute to actual Modularity, like the team? Yeah. It's pretty lightweight. There's multiple aspects. If you want to actually contribute to the Modularity project itself, what we recommend is you go hang out in #fedora-modularity on IRC, or come to our weekly meetings.

[09:44] We have a Modularity working group meeting every Tuesday at...I know it's 10:00 AM Eastern, but I can't remember what that is in UTC. It started today half an hour ago, so 15:00 UTC.

[09:58] On alternate Tuesdays, we have what we call office hours, where we hang out. We promise there'll be a bunch of people floating around if you have questions about Modularity. At the same time, it's alternating Tuesdays. That's if you want to contribute.

[10:15] If you want to go and create your own module, there is a process identified on the Wiki for how to go and request a module, and then create it just like the package creation process. Hopefully, it's simpler. If you have any questions about it, come to the office hours, come to the IRC channel, and there should be people around to help. That usually works pretty efficiently.


[10:36] We had a previous attempt to ship a Modularity version. Why did it fail that time? What failed?


[10:47] Let me explain the problem, and then what was interesting about it. For F27, we wanted to ship Fedora Server as a fully modular server.

[10:58] What we discovered is that to identify and maintain the low-level bits of the server -- what we refer to as the platform module, and the host module, and then a bunch of related modules, the guts of the operating system itself -- we found those modules were very difficult to maintain.

[11:19] They required any package owner who owned them to opt in, to participate whether they liked it or not. By extension, because that had a lot of problems and was difficult to maintain, putting modules on top of that kept getting delayed and blocked because the platform components were changing.

[11:38] There were a bunch of problems there. We decided that was too much risk and required a lot of commitment on behalf of the whole community -- Fedora package community, specifically -- on them contributing into this baseline set of the last modules.

[11:58] What we backed off and decided to do instead was leave the existing repository as it is, and then base the application part of the modules on top of the existing repository. Then anyone who wants to come along and say, "I want to make Postgres available as a module," can keep the Postgres RPMs in base if they want to.

[12:24] If you are in the know on how Fedora is distributed, there is a big repository called everything. Then you have updates and stuff like that. You can keep Postgres, the RPMs, in the everything repo if you want, and then maybe have one or two other versions of Postgres available, for different use cases or whatever, as modules in a separate repo.

[12:47] You can even remove Postgres altogether from the everything repo and include it via modules if you so desire. That's what we're planning to ship for F28. If you think about it, we're going to have two sets of repos. One will have modules, and one will have traditional RPMs.

[13:05] Our hope as the Modularity team is that the traditional RPMs repository will get smaller over time, while more and more stuff moves to modules. In this way, we don't have to force that to happen all at once.

[13:18] What I think is particularly interesting about this problem is that that was the goal we had always had from the get-go. But we never figured out how to do what I have referred to in multiple talks and publications as the "everything else" module, where people could just have their content there and not have to participate in Modularity per se until they were ready.

[13:43] We could never figure out a good way to do it, because the everything else module had very similar problems to the OS modules, because it's very big and hard to maintain. By giving up a few nice-to-haves on how the Modularity infrastructure works, we're able to layer our modules on top of the everything repo.

[14:04] That's a good compromise for everybody involved, because it still satisfies the goal of giving end users multiple versions of things being available at once, but also satisfies the goal of keeping Fedora running strong as it does without having to disrupt that.


[14:24] I found a problem like that, similar to the thing you're trying to solve. I was trying to install a package called qutim. It's an IRC client. It requires me to have a previous version of the hunspell library.

[12:40] Maybe with Modularity we can try to solve these kinds of problems: you have qutim with its version of libhunspell, and the core version of libhunspell for other programs.


[15:22] That's exactly what we're trying to solve. The problem obviously is now we have two versions of, what did you say, hunspell?


[15:31] Hunspell.


[15:32] Two different versions of that now floating around and having to be maintained in Fedora. We do need policies around how often or for how long we're going to allow that to happen. That does introduce some difficulties, but at least the technology won't be the reason why you can't do that.

[15:22] We can make individual policy decisions around individual pieces of software, or as the Fedora community, or whatever. Right now, whether we want to change that policy or not, we don't have that option. Technically, we cannot do it. Well, that's not technically true. We can do it in a bunch of really ugly ways. We generally try not to.

[15:45] This way, we'll have a clean way to do this, an understood way that builds through our infrastructure properly, and then we can decide on an individual basis whether it's a good idea or not.


[15:59] I think this eases the way for users. For example, because I'm inside the project, I know a lot of stuff. I have a bunch of IRC clients, so maybe not being able to install qutim is nothing: I have HexChat, I have others.

[16:15] If we have another user who googles "IRC client", and the fifth result is qutim, and he tries to install it and can't install it on his machine, he is just going to be pissed off. We try to make life easier for the users. That's really, really good for the project.

[16:37] I'm not a developer myself, though I used to be. This seems related to a concept that Python developers use, called virtual environments. It's kind of like that.


[16:56] Before we decided to do the podcast, we talked a little bit about this. The symptom of too fast, too slow is a well-known symptom in the distribution world, in the Linux distribution world. There are numerous attempts to try to solve some of those symptoms.

[17:16] If you are a Python developer, you might use one of the environment tools -- there's actually a new one that's popular right now -- to try to solve that. Basically, you can say for this project, I want to have these libraries. For that project, I want to have those libraries.

[17:31] Ruby has an equivalent construct as well, as do a bunch of other languages. A number of years ago, software collections, available today in CentOS and RHEL, also tried to solve some of those problems by using namespacing to move the libraries away to do something else.

[17:55] Off the top of my head, I'm trying to think of some others. When I have my notes in front of me, there are probably 30 to 50 attempts to solve this problem from various different perspectives. They're making tradeoffs depending on who is most important to that use case.

[18:15] A Python developer is fine with -- I'm trying, but I can't remember the name of it -- that toolchain, but if you were a system administrator with a similar problem, you want your muscle memory to work. You want it to be DNF something or another to make this work.

[18:33] A software collection might actually solve that better for a system administrator because you're doing tradeoffs based on what you already know. What we're trying to do with Modularity is take your muscle memory as any kind of Fedora user, so that your muscle memory works for how you might go about solving that problem without having to use specialized solutions for different use cases.

[18:58] It's particularly problematic, and this is the reason why I can't think of the Python one, for what we refer to in the developer space as polyglot developers, who jump languages all the time. Having to remember the syntax for each of these individual tools is very difficult.

[19:13] Every time I need to go and use one of them, I have to go look up how to use it again. When you switch like that, and you can't just rely on your day-to-day operations, it's much harder.

[19:24] Like I said, there's many, many different solutions to this problem that have occurred over the years. Compat libraries are also a similar solution to this problem. There's even been proposals of doing what we've heard of as name mangling. You have RPMs that have complicated names. There's tons of them.


[19:43] I read somewhere about something called the Boltron Server. Is this related to Modularity in some way, or what is it?


[19:51] We've been doing this for a few years now. The F27 server, they had a lot of publicity around getting it into mainline Fedora. The prior version, we wanted to do a prototype. That was the Boltron Server. We decided to call it Boltron. It was a joke.

[20:08] It's a portmanteau of this old cartoon in the US called Voltron that has been redone again on Netflix, which is about these five cats that are robots that are driven by people -- they come together to build Voltron, which is this big robot with a big, huge sword -- and bolt on, which is a term in English for when you attach something to the side. You bolt it on.

[20:39] It's a portmanteau of Voltron and bolt on. You bolt on a bunch of things, and it's better than it was before, and it's powerful. We have smooge to thank for the name. It came up in a meeting, and it was a joke. That's what the Boltron Server was, a prototype version of what we wanted to do with Fedora Server.


[20:57] Do you have something that you want to share with the public, with the people who are interested in it as users, or from the perspective of development, as a final thought?


[21:09] Definitely check us out. If you want to follow along with what we're doing, we're trying to publish to the community blog under the Modularity subject. We also try to publish to the council periodically on what we're doing.

[21:23] Check out the council meetings when we're going to be presenting or look at some of the prior presentations. I've also given a number of talks in various places. I'm actually giving one at SCaLE this weekend. Definitely engage.

[21:36] What we need is feedback. What we need is help. We really believe in our solution. We really think it's going to make a lot of people's lives easier. Obviously, there's some bumps to get there. We really hope you are interested. We hope you follow along.

[21:54] Reach out to us and give us your feedback. That's the biggest thing we want.


[21:59] Thank you, Langdon. That's all the time we have for today. We expect to have another episode released in two more weeks. Thank you everyone for listening.