Rohit (00:00)
So hey, everyone. Welcome to another episode of the AI in DevOps podcast. I'm Rohit, your host and one of the co-founders of Faces.cloud. We have had quite a few episodes now where we have talked about how the advent of LLMs has changed our lives with respect to cloud operations, security, and day-to-day development in general.
Today we have a guest with us who can shed light on all of that and some more, including a domain we have not really discussed on this podcast, which is MLOps, and I'm sure LLMOps as well. So I have with me Görkem Ercan, who is the CTO at Jozu. He comes with over two decades of experience in the software industry, and with Jozu he is building a DevOps platform for AI agents, applications, and models as well. So thank you for joining us, Görkem. Welcome to the podcast, and maybe a few words about yourself for the audience.
Gorkem Ercan (01:00)
Yeah, thank you. Thank you for having me. So I'm the CTO of Jozu. Jozu is a company that builds tools for AI/ML DevOps and platform engineering. We also have a platform, as you said, for AI applications, or agents as the cool kids like to call them nowadays, for the governance and management of those applications, models, and all the other AI artifacts you can think of. And as you said, I'm old. I have been in the industry for more than 20 years, and most of that time I was doing open source.
Rohit (01:38)
Yeah, that's great, Görkem. Going through your profile, one thing that stands out is all of your contributions to the Eclipse Foundation, and also having written some language servers there. So, are you vibe coding on Eclipse?
Gorkem Ercan (01:55)
No, actually, I am not. I haven't used Eclipse actively for a while now. One of the things we did there was the Java language server; I think we were the second-ever language server brought to the market. I was the initial implementer for it. Since I did that work, I have pretty much never worked on Eclipse actively. Of course, I do have an Eclipse installation which I use, but I've gotten used to VS Code after that. Yeah.
Rohit (02:33)
Got it. So from that answer, I take it that you do vibe code.
Gorkem Ercan (02:39)
I do, yeah. Well, it depends on what you call vibe coding, right? I'm not sure I'm vibe coding by the definition people use, because I'm not good enough at it to get good results by just explaining things to the AI for pages and pages. There is this whole idea that you can write definition documents, DRs, and requirements, hand them to the AI, and voila, you have working software. I haven't been able to do that. But I am really good at dividing my own work into smaller pieces, and those smaller pieces the AI handles very well. You need to be very specific with your prompts. You also need to say, hey, do this, but don't do anything else, because all of a sudden you have these additional features and you're like, "Why is it there? I didn't ask for that, and I don't think anyone will ever use it." So I actually have a prompt that tells the AI: your job is not to predict the feature, so don't add features.
Rohit (04:02)
Yeah, well said. I think planning is definitely one of those things that is important. AI itself has sort of pointed that out to us, because most of these coding assistants have a to-do list or a problem-breakdown step prompt-engineered into them. So probably we also need to do that. Not probably, we definitely need to plan out what we need implemented and then take the AI's assistance.
Gorkem Ercan (04:31)
Yeah. One of the things that changed a lot for me with AI coding agents is that now you can get a quick prototype; you're not afraid of getting the quick prototype. I get involved with, or have to look at, a lot of open source software, whether as part of my day job, as part of the Jozu product, or just because I'm curious about open source. So you come up with an idea and you're like, yeah, maybe if I can do this, this open source project would be a good fit to work with, and you want to try whether that is possible. At that point, you don't really care how the implementation is done, whether it's maintainable and up to the standards. You just want to see if it works. For those cases, I use AI agents a lot, and I usually either get something that works the way I hoped, or I don't get anything that works and I realize the end result is not what I thought, it's not giving me what I need. Then you just abandon the idea. Or you see that there is light at the end of the tunnel, and then you go back and do the proper implementation for it. You may still use AI, but now you need more prompts and more rigorous testing.
Rohit (06:00)
Yeah, makes sense. It definitely is a cool thing to do prototyping with AI. One thing I want to pick your brain on: I'll give an instance from my own experience, where we had to build a CLI tool. It does a bunch of small, very predictable stuff.
And it had to be all self-contained, so we decided to write a Go CLI. One experiment I did was to ask the AI to first write some BDD test cases, behavior-driven test cases, so that the test cases are all in natural language and I or any of the product engineers can contribute to them, review them, and make sense of them, and only then start on the implementation.
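(A minimal sketch of that workflow, shown here in Python with pytest-bdd rather than the Go tooling the actual project would use; godog would be the Go equivalent. The feature text, the `mycli` binary, and the step wording are hypothetical.)

```python
# features/validate.feature -- the natural-language spec the product team can review:
#   Feature: Config validation
#     Scenario: Reject an unknown region
#       Given a config file with region "mars-east-1"
#       When I run the CLI with "validate"
#       Then the exit code is 1
#       And the output contains "unknown region"

import subprocess
from pytest_bdd import scenarios, given, when, then, parsers

scenarios("features/validate.feature")  # bind every scenario in the feature file to tests


@given(parsers.parse('a config file with region "{region}"'), target_fixture="config_path")
def config_file(tmp_path, region):
    path = tmp_path / "config.yaml"
    path.write_text(f"region: {region}\n")
    return path


@when(parsers.parse('I run the CLI with "{command}"'), target_fixture="result")
def run_cli(config_path, command):
    # "mycli" is a placeholder for the real binary under test
    return subprocess.run(
        ["mycli", command, "--config", str(config_path)],
        capture_output=True, text=True,
    )


@then(parsers.parse("the exit code is {code:d}"))
def check_exit_code(result, code):
    assert result.returncode == code


@then(parsers.parse('the output contains "{text}"'))
def check_output(result, text):
    assert text in result.stdout + result.stderr
```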
Gorkem Ercan (06:33)
Yeah.
Rohit (06:44)
Funnily enough, after doing that, I sort of stopped caring about how the code was written, the Go code itself, or its maintainability. As long as these BDD test cases pass and they make sense to the product owners, do I need to care? So this really posed a question to me in this day and age, especially since you also run a startup: do you think
Gorkem Ercan (06:52)
Hmm.
Rohit (07:10)
the appeal of designing or working with, let's say, an idiomatic way of writing a CLI or something like that is less important in today's world?
Gorkem Ercan (07:24)
It depends on the context, right? Our open source project, which is part of the CNCF, is called KitOps, and KitOps has a CLI in it, the Kit CLI, which is a Go CLI that we mostly write ourselves. We do get help from AI here and there, but at the end of the day the committer team needs to own the code. AI cannot own the code when you are doing open source, because there are other people, there are contributors. You cannot expect AI to do the contribution; there will be real people contributing to the code. The other problem is that if you are implementing something for the very first time, the AI actually doesn't know how to implement that thing. There needs to be some knowledge coming in from somewhere else. We ran into that very early with the Kit CLI, because there weren't many examples of what we were doing, so
Rohit (08:16)
Yeah.
Gorkem Ercan (08:32)
it found it hard to find any existing context, to be honest. So in that context, I think you do care, because it's an open source project, an openly maintained project. If you're contributing to an open source project, and we get this a lot, there are a lot of people who just tell an AI agent to implement something and then send us a PR. The first thing we ask is: did you actually try this? Because we have to try it; we can't just put it into the project. We try it and we're like, this is not working. So you actually run into this situation where people are
Rohit (08:59)
Yeah.
Gorkem Ercan (09:16)
generating large amounts of code with AI and submitting it to the open source project. And that's not helpful to the project, because the contributors and committers have limited time. You can still generate your code, but you need to make sure it is doing what it says it is doing. And AI is very chatty, very verbose. Make sure it is only implementing what it is supposed to implement, and then submit the code. Then, when the code comes in, the maintainers don't have to go through 2,000 lines of code just to figure out that it's actually not working. We have had a couple of these, and we were discussing it because Hacktoberfest is coming. KitOps has participated in Hacktoberfest since we started the project, and we are planning to participate this year as well. The concern on our side was: let's see how it looks this year, because every year it changes, right? If we start getting this very large, AI-generated, not really tested, not really weathered code, then it will be really, really hard for us to do Hacktoberfest again next year. But we're doing it. Last year we did it, and we met really good people who have made repeated contributions to KitOps. So hopefully we will get to meet new faces this year as well.
Rohit (11:06)
Awesome, awesome. I had never really thought about the headaches that AI-generated code is posing to open source maintainers. That's an interesting point, to say the least. Probably the CONTRIBUTING.md file, or that sort of thing, now has to account for AI agents and include some instructions for them.
Gorkem Ercan (11:18)
Yeah.
Yeah, we are adding those instruction files as well. But as with anything AI, there is no standard there either. I think AGENTS.md is trying to get there, but it's not there yet. You have AGENTS.md, CLAUDE.md, and then, what is it, copilot-instructions.md. I'm probably missing some other MDs or YAMLs there. So that also is an issue, right? You have these instructions for coding agents, but you can't have just one. With editor configs, there's just one: you want tabs or spaces, you can just define that. There isn't anything like that for agents yet.
Rohit (12:15)
Yeah.
Gorkem Ercan (12:18)
I'm hoping there will be one day. But with the way the industry works, it usually takes a really, really long time to get to a standard.
Rohit (12:27)
Yeah, definitely. That was one thing I had in my notes that I wanted to discuss with you: standardization, especially in terms of how models and agents are packaged. We'll probably get there a little further down the line in the episode.
A quick question first: other than development, where does AI fit into your day to day? How has it changed your day-to-day life as a CTO? What other AI products are in your arsenal?
Gorkem Ercan (12:59)
So we actually use a lot of generic AI products. For some things we are able to run our own LLMs, open source LLMs. And then, for everyone, we have access to Claude and Copilot. I think we stopped using OpenAI now, but that's just because Claude gives us better results for what we are trying to do. There are different things we do with them. I don't want to say agents, because I think people understand that a little differently, but we do have small AI applications that do very specific jobs for us. For instance, being able to collect analytics and create analysis out of them is important to us. So we have a mechanism that collects our analytics from GitHub, our websites, and so on, makes correlations, and sometimes lets us see patterns that we are not seeing ourselves. That basically works in the background for us, and compared to the past it was very easy to get it online and working. The big difference was that we didn't have to implement an ML model to analyze that data; rather, we were able to prompt an LLM to analyze it. That's a big difference for us, because training an ML model on your data takes time and, well, it's expensive, although not substantially expensive if you are very specific about the goal. With LLMs we don't have to train them at all: we can just prompt them, or in some cases do a little bit of fine-tuning, and get them to a state where they can create these reports and show us patterns we are missing in our data. So that's one example.
Another one that we are working on is code review. A lot of code review is, honestly,
Rohit (15:31)
Mm-hmm.
Gorkem Ercan (15:32)
simple stuff, right? You have certain criteria in your code base that you want fulfilled. And again, I don't want to call it an agent, but we have a task that goes and executes a series of prompts on the code base when a PR arrives. We are working on that. I don't think we are at 100% yet, but we are getting close to where we want to be, so that we have a way of taking the easy stuff off the developers' plates. Then they can concentrate on the things that actually require developer attention when they do code reviews. For us, code review is more about two things: how do we have a common understanding of the code base, and how do we keep a common understanding of the architecture and stay within the architecture we want to have. Those are the main things we want to go after. And, you know, typos:
they should be caught before the code ever reaches a human reviewer.
Rohit (16:50)
Yeah. Awesome.
Gorkem Ercan (16:51)
And that's what we
are trying to do.
Rohit (16:53)
And just out of curiosity, do you use self-hosted open source models even for these two use cases, or do you rely on some of the vendors?
Gorkem Ercan (17:03)
The coding agent is using Claude, well, essentially Claude Code with a lot of prompts. The analytics is using a self-hosted LLM.
Rohit (17:13)
Yeah, for the coding ones I think it's difficult to beat Claude at the moment, at least.
Gorkem Ercan (17:19)
I'm pretty sure you can. It's just not worth the effort at the moment. Our main goal is not to provide a coding agent. If we were working to create a coding agent, we probably would
Rohit (17:24)
Right.
Gorkem Ercan (17:40)
have a coding agent that gives good results, again for that specific task, by the way. Claude Code does a lot more, so that's different. But for the specific tasks of doing quality engineering, security reviews, and so on, because that's how it works for us, we do a QA review, a security review, a performance review, and all that in an order. For those specific tasks, I think we could get to a state where we are getting results similar to Claude Code, but that would require us to work on it more than we want to.
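(A rough sketch of the kind of ordered review passes described here, not Jozu's actual pipeline. It assumes the PR diff has already been fetched, uses the Anthropic Python SDK, and the model id and prompt wording are placeholders.)

```python
import anthropic

# Fixed sequence of review passes, run in order over the same diff.
REVIEW_PASSES = [
    ("qa", "Review this diff for missing tests and obvious functional bugs."),
    ("security", "Review this diff for injection, authz, and secret-handling issues."),
    ("performance", "Review this diff for needless allocations, N+1 calls, and blocking I/O."),
]

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def review(diff: str) -> dict[str, str]:
    findings = {}
    for name, instruction in REVIEW_PASSES:
        message = client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model id
            max_tokens=1024,
            system="You are a code reviewer. Report only concrete issues; do not restate the diff.",
            messages=[{"role": "user", "content": f"{instruction}\n\n{diff}"}],
        )
        findings[name] = message.content[0].text
    return findings
```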
Rohit (18:22)
Got it, makes sense. So before we head off to LLMOps or MLOps per se, I'm curious about your opinion: where should conventional cloud operations engineers leverage AI more? Where do you see the current wave of LLMs having the most impact on the conventional work of
running a Kubernetes cluster and keeping applications up and running on the cloud? Where in that journey does the LLM have the most potential to disrupt?
Gorkem Ercan (18:56)
So yeah, good question. We are a Kubernetes shop. All of our stuff runs on Kubernetes, and we ship our product so that you can run it on-prem, in air-gapped environments, on Kubernetes as well. So I guess our team is not your typical team, because some of us have been with Kubernetes for a very long time and we know how it works. But one of the things we seem to be getting very good results with is log analysis. There are two things you get from a Kubernetes cluster, right? You get the logs, and you get the metrics, the signals. And when you get the signals, how do you determine the cause behind them? That's always the investigative work. We have been getting really good results from log analysis, but to be honest we haven't yet tied our signals to the log analysis, so it's a kind of manual process. Once we automate that, I have a feeling we can just get an email saying, hey, this is probably what's happening right now in your production or staging environment, and then act on it. That's one way to look at it. I haven't really experimented with "LLM, tell me how to run a pod or scale a deployment," because those are things that I, and most of my team, already know. We haven't tried to automate that, because we can already do it; some of us have been doing it for a decade now. So we haven't used LLMs for that purpose, but we have used them for log analysis. And I think once we tie the signals to the log analysis, we can get to more automated remediation processes and so on.
Again, the thing to keep in mind is that AI is a tool, and how useful it is depends a lot on your context, on what your situation is. In our situation, we have a team that is capable with Kubernetes, so it helps us a little less. But, for instance, there was one situation where we had to help a customer with their cloud provider, and we weren't that familiar with that cloud provider. In that case, the LLM shortens things, although it also makes mistakes. It shortens things, but you get cases like, yeah, it's telling me to run this command, and that command doesn't exist. Still, it is helpful; it gets you to a place where you can work more fluently. But if you are in our situation, where you are really familiar with the environment, then it doesn't help as much.
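(A minimal sketch of the signal-to-log-analysis idea described above, assuming a self-hosted model behind an OpenAI-compatible endpoint, for example vLLM or Ollama, and a kubectl context pointed at the cluster. The alert payload, model name, and log window are illustrative only.)

```python
import subprocess
from openai import OpenAI

# Self-hosted inference server exposing the OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")


def probable_cause(alert: str, namespace: str, deployment: str) -> str:
    # Pull recent logs for the deployment the alert points at.
    logs = subprocess.run(
        ["kubectl", "logs", f"deployment/{deployment}", "-n", namespace,
         "--since=30m", "--tail=2000"],
        capture_output=True, text=True, check=True,
    ).stdout
    response = client.chat.completions.create(
        model="local-model",  # whatever name the inference server exposes
        messages=[
            {"role": "system",
             "content": "You are an SRE assistant. Given an alert and recent logs, "
                        "state the most likely cause and the evidence, nothing else."},
            {"role": "user", "content": f"Alert:\n{alert}\n\nRecent logs:\n{logs}"},
        ],
    )
    return response.choices[0].message.content
```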
Rohit (22:11)
Correct. Yeah, interesting. You mentioned log analysis; in our use case, in very recent experience, it also helps to
Gorkem Ercan (22:14)
Okay.
Rohit (22:23)
dumb down the error messages or log messages for the end user. Ours is a platform where platform engineers can build automations and plug them in, and developers can use a canvas to build things out. Once in a while, developers run into corner cases with how a module is written or with some invalid configuration they provided. That used to be a point of back and forth between the infrastructure engineer and the developer. Now the LLM is able to
interject in between and explain to the developer: hey, you made this configuration mistake, that's probably why, why don't you try this? And that seems to smooth things out as well. This is something very interesting we did recently, where we gave it tools to go fetch the automation source, go fetch the configuration the developer provided, mix and match, and give them suggestions. That was one experiment, and we have had very cool results with it of late.
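(A simplified sketch of the experiment described here: the model gets two tools, one for the automation's source and one for the developer's configuration, and uses them to explain the misconfiguration. The tool names, the gpt-4o model, the single conversation loop, and the stub fetchers are illustrative, not the actual platform implementation.)

```python
import json
from openai import OpenAI

client = OpenAI()


# Stand-ins for the real platform lookups; in practice these would read from the
# automation catalog and the developer's saved configuration.
def fetch_module_source(module: str) -> str:
    return f"# source of automation module '{module}' (stub)"


def fetch_user_config(deployment_id: str) -> str:
    return f"# configuration supplied for deployment '{deployment_id}' (stub)"


TOOLS = [
    {"type": "function", "function": {
        "name": "get_module_source",
        "description": "Fetch the source of the automation module the developer used.",
        "parameters": {"type": "object",
                       "properties": {"module": {"type": "string"}},
                       "required": ["module"]}}},
    {"type": "function", "function": {
        "name": "get_user_config",
        "description": "Fetch the configuration values the developer supplied.",
        "parameters": {"type": "object",
                       "properties": {"deployment_id": {"type": "string"}},
                       "required": ["deployment_id"]}}},
]


def explain_failure(error: str, module: str, deployment_id: str) -> str:
    messages = [
        {"role": "system", "content": "Explain automation failures to application "
                                      "developers in plain language, with a concrete fix."},
        {"role": "user", "content": f"Module: {module}\nDeployment: {deployment_id}\n"
                                    f"The run failed with:\n{error}"},
    ]
    response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=TOOLS)
    # Keep answering tool calls until the model produces its final explanation.
    while response.choices[0].message.tool_calls:
        messages.append(response.choices[0].message)
        for call in response.choices[0].message.tool_calls:
            args = json.loads(call.function.arguments)
            result = (fetch_module_source(args["module"])
                      if call.function.name == "get_module_source"
                      else fetch_user_config(args["deployment_id"]))
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
        response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=TOOLS)
    return response.choices[0].message.content
```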
Gorkem Ercan (23:02)
Yeah. I mean, it's a lot of text. As you were talking, I realized one thing that has changed for us: we are less frugal with our logging. We do more logging nowadays, because we know we don't have to read all of it when the time comes. So that has changed as well. Now you're thinking, a thousand lines of logs, I don't need to read a thousand lines myself, but the system that reads them may actually benefit from it. So that also helps.
Rohit (23:59)
Yeah. So Görkem, the next part is where I personally would love to learn from you. My very first question is: why is MLOps different from conventional operations on the cloud, where you deploy an application? How is it different in practice, and why is there a complete vertical of products for LLMOps?
Gorkem Ercan (24:27)
Yeah, that's a big question. Why is it different? MLOps, and AIOps in general, is different for several reasons. The first one is complexity. If you think about your DevOps,
things are very certain, right? You can compile the same code 100 times and you get the same result; it will not be different. You put that in a Docker container, and your Docker container will have the same SHA, if you did the containerization correctly. So it's very deterministic. But you don't get that with
Rohit (24:51)
Yes.
Yes.
Gorkem Ercan (25:07)
AI and ML. I think that's one part of the reason why all this experimentation needs to happen before an AI application, or an ML application, and let's include predictive ML here as well, gets to a state where you are able to ship something. Before you can ship, you need to do experimentation, you need to do evaluations.
Rohit (25:14)
Mm-hmm.
Gorkem Ercan (25:34)
These are all things you are adding before you even think about production or inference time, and they make the whole pipeline much more complex. The second part is that getting to inference is more challenging as well.
Inference is more complicated in the sense that the inference runtimes are larger for LLMs, not so much for classic ML, but for LLMs they are larger, and your model weights are larger. And also, I don't think we have figured out how to do prompts well. Some companies have, but in general, getting prompts to production is not done well. Most companies basically just put them in a Git repository and ship them with their code. Is that the right way to do it? All of these factors go into it. So that's the complexity side of MLOps and AIOps.
Rohit (26:36)
Mm-hmm.
Gorkem Ercan (26:48)
And then you have the other problem, which is that all of the tools in the MLOps and AIOps space are built to solve maybe 80% of the problem they are designed for. You have something like MLflow, which solves your experimentation problem. You have another tool that does pipeline management, another tool that does evaluation, and so on and so forth. The problem is that none of these tools talk to each other. They have no way of communicating; they don't understand each other's language. So now you have a pipeline that is very hard to implement, because every tool does things its own way. For instance, MLflow has a very specific format for its metadata, and then you go to the next stage and the evaluation tools have their own very specific way of doing things. That is actually the problem we are trying to solve with KitOps. All of these tools along the pipeline work really well for what they are trying to do; they just don't work well with each other. So we thought we needed an abstraction they could all use and move along the pipeline, all the way from training to inference. That's how we started the KitOps project, because we actually needed it ourselves. We needed an abstraction that would let us do all of those things.

But while we were doing that, we also thought: hey, we need a format that is friendly to lineage, to attestations, to provenance and all of that. Because that pipeline starts with your data and goes all the way to inference, and you need proofs, lineage, and immutability on all of those properties. Otherwise the pipeline loses coherency somewhere along the way and you will never be able to figure out what went wrong. Take the classic question: this model, what data was it trained with? And the answer is, there is an Excel sheet somewhere, which is correct only if the data scientist wrote down the right data set. And you're like, yeah, okay. There are so many things like that, and none of it works for regulated industries. They need records showing that the thing deciding whether you have cancer or not was trained with the correct data.

So that was our concern when we started, and that's why we said: okay, we need something standard, so that MLOps does not remain so different from DevOps. That was the other part of it. MLOps started from the attitude that these artifacts are very different from DevOps artifacts, so you can't use your DevOps teams, you can't use your platform engineering teams, because everything is different. But there are existing DevOps and platform engineering platforms out there that handle this very well, and there is talent that is able to do these things, yet you cannot use any of it. You're basically telling your data scientists to build containers so they can go to production. I think that was wrong as well. So we came up with the KitOps project. We implemented an OCI-based packaging format called ModelKits, plus a CLI, and also a Python library.
Our CLI is very friendly to any pipeline you're going to put it into. The idea there was: you're doing MLflow? Keep doing your MLflow. But at the point where you want to move your experimentation to the next stage, put it into a ModelKit. ModelKits are OCI artifacts, which you can push to an OCI registry, essentially a Docker registry. And an OCI registry gives you everything you want: first of all, it's a very scalable infrastructure that you are probably already using; it provides you with RBAC; it is immutable; and you can have signing, attestations, all of those properties are part of it. The way we designed ModelKits, you can put your parameters, model weights, and datasets into a ModelKit, and because it's immutable, it keeps the relationships, the lineage, by default. You don't even have to think about it; it's just part of how it works. Then you can move that to the next stage, and the next stage. And here's the good thing for data scientists: today you're trying to train your data scientists to be proficient at creating container images. Now they don't have to be, because they can train their model, put the parameters and everything else into a ModelKit, and hand that over to a DevOps team, and the DevOps team does what they do best: getting things into production. They can start with the ModelKit, create a container out of it, and push that into production, which is exactly where their existing pipelines already are. They already know how to work with OCI. They don't even need to re-implement logins to the platform, because they have been doing Docker logins to OCI registries for a decade. All of that makes it easier for DevOps and platform engineering teams to work with AI/ML artifacts.
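(As a rough illustration of that hand-off, based on my reading of the KitOps docs rather than anything stated in the episode: the data scientist describes the artifacts in a Kitfile, packs them into a ModelKit, and pushes it to an ordinary OCI registry. The Kitfile fields, registry path, and kit CLI flags may differ from the current release; treat this as a sketch.)

```python
import pathlib
import subprocess

# Kitfile describing what goes into the ModelKit; field names per the KitOps docs,
# check the current schema before relying on them.
KITFILE = """\
manifestVersion: "1.0"
package:
  name: churn-predictor
  version: 0.3.0
model:
  name: churn-predictor
  path: ./model.safetensors
datasets:
  - name: training-data
    path: ./data/train.parquet
code:
  - path: ./src
"""

REFERENCE = "registry.example.com/ml/churn-predictor:0.3.0"  # placeholder registry

pathlib.Path("Kitfile").write_text(KITFILE)
subprocess.run(["kit", "pack", ".", "-t", REFERENCE], check=True)  # build the OCI artifact
subprocess.run(["kit", "push", REFERENCE], check=True)             # push like any image
# The DevOps side can then pull/unpack the same reference, or pin it by digest
# (registry.example.com/ml/churn-predictor@sha256:...) for immutable lineage.
```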
And I think this is true not just for models but for prompts as well. Because one of the things we see with prompts is that there is no isolation of production versus development. You have some prompts in your Git repository, and at some point you are going to take a prompt and put it into production. But how do you get there? When we
ship code, we don't just put it into the Git repository and ship it right away. It goes through testing; there are steps, depending on your organization, that are executed even for code, something far more deterministic than a prompt. But with prompts, we're like: send it there, we're just gonna ship it.
Rohit (33:39)
Alright.
Gorkem Ercan (33:58)
With no evaluations, no optimizations, nothing. And that's not the way it should be, in my opinion. I think we need to develop best practices for how to ship prompts to production as well. And I don't think the way to do that is to just put them in Git together with my code.
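(A toy illustration of that point: treat the prompt as an artifact and run it through a small evaluation gate before promoting it. The eval cases, threshold, substring check, and model name are stand-ins for a real evaluation suite.)

```python
from openai import OpenAI

client = OpenAI()

# Tiny evaluation set; in practice this would be a curated suite with scoring.
EVAL_CASES = [
    {"input": "Summarize: the deploy failed because the image tag was missing.",
     "must_contain": "image tag"},
    {"input": "Summarize: the pod was OOMKilled after the cache grew unbounded.",
     "must_contain": "memory"},
]


def passes_evals(candidate_prompt: str, threshold: float = 0.9) -> bool:
    passed = 0
    for case in EVAL_CASES:
        out = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system", "content": candidate_prompt},
                      {"role": "user", "content": case["input"]}],
        ).choices[0].message.content
        passed += case["must_contain"].lower() in out.lower()
    return passed / len(EVAL_CASES) >= threshold


# Only tag and publish the prompt (e.g. package it alongside the model) if the gate passes.
if __name__ == "__main__":
    new_prompt = "You are an incident summarizer. Answer in one sentence."
    print("promote" if passes_evals(new_prompt) else "keep iterating")
```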
Rohit (34:03)
Right.
Gorkem Ercan (34:22)
Another thing about prompts is that you have experts who can work with those prompts, and if you're keeping them in Git repositories, is that the most friendly thing you can do for them? So that's the other consideration. But to tie it back to your original question: MLOps is different. It's more complex.
Rohit (34:37)
Yeah.
Gorkem Ercan (34:43)
And the tooling space in MLOps is more fragmented. That's why it is different from DevOps. I do not think it needs to be that different, and with the KitOps project we are trying to help level that out a little by providing an abstraction, an AI/ML artifact, that works with
your platform engineering and DevOps tools and also with your AI/ML tools.
Rohit (35:11)
Yeah, that makes a lot of sense now. In fact, in my personal opinion, when I'm asked what a platform engineer should do, I often answer: design the right abstraction for the right persona in the whole process. So what you said about data scientists and the abstraction they should be working at perfectly resonates with that opinion. If I go back to the time when everybody was talking about the shift-left approach to testing or whatever, it was never just, okay, hey, you guys do the testing or this part as well. It should come with the right abstraction, so that they, with their skill set and their knowledge, can still do it. It's not just dumping the responsibility on the person before you in the pipeline.
So yeah, that makes a lot of sense with respect to MLOps. And I think this fragmentation of tooling is something conventional cloud ops has also navigated over the last decade, and we have reached some sort of consensus on how things should work; at least Kubernetes has become a common API that most clouds speak, so that helps. This brings me to standardization. You mentioned the OCI-based format you are bringing to models. What other pieces in this whole journey require standardization? And how is adoption of the OCI format going? Do you see more models being offered in that format?
Gorkem Ercan (36:43)
Yeah.
So two years ago, when we started, we were kind of alone, and we were a startup. We were talking to a lot of VCs at the time, and one of them, I still remember this, basically said: so you want to shoot the biggest elephant. Because we were going after OCI; we were trying to get an OCI standard for AI/ML artifact packaging. So what happened in the last two years? We came up with ModelKits and started building our product around them, and so on. But one thing that happened is: if you look at Docker Model Runner, for instance, it actually runs OCI. It doesn't run ModelKits, although we're talking with them, but it is OCI-based. If you look at Ollama, all of its models use OCI-based packaging. It's not ModelKits, it's not a single standard, but it is OCI. They package things into OCI artifacts. So what do ModelKits bring to that?

ModelKits define what that artifact looks like. The spec basically says: if you see these media types, these layers, and these files in your OCI artifact, then it's a ModelKit. That's our specification. If you go to kitops.org, you will see it: if you see an OCI artifact with this media type and these layers, and this is the definition of those layers, then it's a ModelKit. It essentially describes the manifest you need to generate for your OCI artifact, if you're familiar with OCI. So Ollama uses one thing, Docker Model Runner uses another, and other tools like RamaLama also come with OCI support. KServe supports ModelKits directly, and it also has a more generic OCI solution as well.

On top of that, a group of companies reached out to us thinking: can we actually create a standard for this? Because although KitOps is part of the CNCF, we are not a pure specification project; we are an implementation project. What they wanted was to get a specification out. So together with ByteDance, Red Hat, and lately Docker, we started the ModelPack project. ModelPack, whose name is very close to our ModelKits completely by coincidence, by the way, is a specification. If you look at it and compare it with ModelKits, it is very close to the ModelKit specification. It specifies how an OCI artifact should look for AI/ML, independent of the tool you are using. So at some point you can imagine our Kit CLI supporting it, but you can also imagine, hopefully one day, Docker Model Runner or Ollama supporting it. At that point, your artifacts are very flexible. And let's take that a little further, because we are having that conversation: what happens when your OCI runtimes, CRI-O and others, support it? It means you now have an artifact that is Kubernetes-native. Just like with a Docker container, you can say: hey, run this ModelPack, and it will just run.
Rohit (40:30)
Nice.
Gorkem Ercan (40:41)
It's like a Docker container at that point. So that's where we are trying to go. For me, the current state, after almost two years of this journey, is that OCI has won. I think OCI is recognized by most of the industry as the best way to store these AI/ML artifacts.
Rohit (40:41)
Yeah.
Gorkem Ercan (41:02)
The industry is still deciding what that artifact looks like. I think ModelPack is a very good specification, and ModelKits is a very good implementation of both, because ModelKits and ModelPack are very close to each other. At some point, I think there will be a consolidation of the different formats people are using. But two years ago, when we said we're storing things in OCI, everybody asked why and how. Nowadays it's, OCI, yeah, that makes a lot of sense, because we need governance, we need attestations, provenance.
Regulated industries want to be able to keep the lineage. For instance, some MLOps pipeline implementers were looking for a way to have a single ID in their pipelines that travels through multiple tools. And we told them: you're putting all this effort into creating this ID and passing it through different stages of your pipeline, but if you are using ModelKits, that ID is essentially your container reference, your Docker reference: the registry plus the SHA digest, much like a Git commit. That's your ID. It's unique and it will never change, so you can pass it around as much as you want and it will stay the same.
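(For illustration: the "free" ID is just the content digest of the OCI manifest plus the registry path. The manifest bytes and registry below are stand-ins.)

```python
import hashlib

manifest_bytes = b'{"schemaVersion":2,"layers":[]}'   # whatever bytes the registry stores
digest = "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()
pipeline_id = f"registry.example.com/ml/churn-predictor@{digest}"
print(pipeline_id)  # immutable: the same bytes always yield the same reference
```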
Rohit (42:20)
Yeah.
Gorkem Ercan (42:37)
And you can retrieve it, do whatever you want with it, and then continue on to the next stage. They were like, yeah, all that effort for nothing; we can just switch to ModelKits. They had built all of this so that they would have an ID, plus a service behind it that takes that ID and gives them something else, and OCI already does that. So I think OCI has won, and now the industry is trying to come up with the best specification to be the standard shape of that OCI artifact. There, I think the leading one, maybe the only one, is ModelPack. From the looks of it, it is going to be able to satisfy all the needs. We have a lot of companies using ModelPack and ModelKits in production now. ByteDance, the makers of TikTok, have been using ModelPack heavily; they have their own in-house implementation of it. So yeah.
Rohit (43:31)
Yeah.
Yeah, it's awesome. So there is a convergence in how it is all packaged together, but the specification of it is something that we hope to come to a consensus on.
Gorkem Ercan (43:50)
Yeah, these things take time, right? Standards. I think that's what the VC meant when we were talking about it: it's a big elephant, and it takes time to get there, because you need to talk to a lot of companies, a lot of people, a lot of thought leaders, and get them on your side. So that takes time.
Rohit (43:52)
Yeah.
Gorkem Ercan (44:14)
And I am glad that we are in a place where we have agreement on OCI and enough companies backing ModelPack right now. That's the other important thing. So ModelPack, with ModelKits as an extension of it, is going to be
the shape of OCI artifacts for AI and ML.
Rohit (44:39)
And on a similar note, who in today's world actually needs to host their own models? The AI-first or ML-first companies offering AI products, of course,
have that need. But in terms of consumption, of the people who consume these models, how many actually need to self-host, and how much of an MLOps expert would you have to be to do that and operate it at scale?
Gorkem Ercan (45:11)
Yeah, so it depends. The thing about ModelKits and ModelPack is that they are not specific to LLMs. Many companies have decades of ML behind them, and there are companies operating in
Rohit (45:19)
Yeah.
Gorkem Ercan (45:28)
different industries, including regulated industries, that ship hundreds of ML models. They need this. Unfortunately, until now, ML and data science have been sort of the strange cousin in the organization, to put it politely. I think what is happening now is that the industry is starting to see that it can't remain a strange cousin, essentially because of LLMs. There are a lot of companies who have been doing Kubeflow and KServe and all that, but not getting the benefit of the modern
Rohit (46:00)
Yeah.
Gorkem Ercan (46:14)
stacks. Now those modern stacks are extending. For instance, Jozu and KitOps support traditional ML artifacts and models as well, so these teams are starting to adopt this too. We see a lot of interest in Jozu from regulated industries because of that, because they have been doing this for years. Some of them never got to a state where their ML models are governed, or they don't have a good way of keeping provenance and attestations, and they want to take advantage of the new tooling growing up around models and AI. In those cases, they come to our platform and say: hey, let's change this. The good thing about OCI and our platform is that it's very easy to make that change. You do your training or your experimentation, and when you reach a checkpoint, instead of pushing it to an S3 bucket, you put it into a ModelKit and push it to a Docker registry. As easy as that. The conversion is so painless that we have seen cases where a team converted their whole pipeline in a week, and we're talking about really large organizations.
So that's one part of it. For the LLM crowd, I think we haven't seen the real usage yet. What we are seeing right now is companies coming in, doing some of the early work, getting their toes in the water, and saying: I can actually do something agentic and get some results. Think of it as what we did with Claude Code, essentially. We're doing something; do we need our own models for that? Probably not. These early use cases are sophisticated but not that sophisticated, in the sense that they look at one workflow, one use case, one area. But once you go into more sophisticated cases, where you're talking about multiple AI agents with access to more sensitive data, those kinds of situations, you need to start thinking about compliance, about sovereignty, sovereign AI, data sovereignty, all of those issues. Then you're getting to a place where it starts to make sense to run your own LLMs. Most companies are not there yet. Some companies are there already, and they are starting to think about how to do this: how to manage multiple agents, how to manage compliance and governance for agents, models, prompts, and everything else that goes with AI projects. So as the environment gets more complicated, we are going to see more companies doing this. The other thing we need to think about is cost. For instance, there are papers out there, if I can find
Rohit (49:44)
Yeah.
Gorkem Ercan (49:48)
a reference, I'll send it to you so you can include it in the notes. There are papers out there saying that a small, fine-tuned LLM can give you better results than a generic LLM. The difference is that it's not going to be able to write a poem about it.
Rohit (50:06)
Yeah.
Gorkem Ercan (50:10)
And this has been our experience. If all you need to do is analyze a bunch of tabular data and turn it into something else, then a small LLM is enough, and you can run it very cheaply as well. The counterargument is that OpenAI and everything else is getting cheaper to use, the token price is getting cheaper, sure, but running a small LLM is still comparably much, much cheaper. The question is: can you do that without spending too much engineering time? That's where platforms like Jozu come in and say, I'll make it easier for you to run your inference. Not everyone has the scale where they need to run their own LLMs; if you're doing this for one job, one task, that's kind of a luxury. But once you start using multiple agents and need to run many models at the same time for them, it gets to a place where perhaps it makes more sense to run this in-house than through an AI provider like OpenAI or Anthropic. But yeah, I guess we are in a state where the industry is still forming around making this fully work. There are, though, a lot of good reasons why you shouldn't be on AI providers only.
Rohit (51:47)
Yeah, and to add to the fine-tuning point, you probably also get more predictable and consistent model behavior with a fixed, fine-tuned model tuned to your purpose than with a general-purpose one hosted by a provider.
Gorkem Ercan (52:06)
Yeah, exactly. And you actually own that. I was reading about someone's experience: they switched their LLM provider recently, and all of their agents started behaving in completely unexpected ways. The reason is that the new LLM provider is cheaper, but they are using quantized models, not unquantized ones, which makes a difference to the results you get. And of course, because they are quantized, they are smaller, they require fewer resources, and hence they are cheaper. When you self-host, you control that. Depending on your task, you can run a quantized
Rohit (52:43)
Yeah.
Gorkem Ercan (52:52)
LLM yourself and still get the benefit of using fewer resources, but now you know you're using a quantized model. So if the quantization is not a factor in your results, just use it. It's simply cheaper.
Rohit (53:08)
Makes sense, yeah. I think that also sheds a lot of light on a question I asked in an earlier episode: what is the economics of self-hosting a model? There are lots of things to consider other than just price per token.
Gorkem Ercan (53:22)
Yeah.
Yeah, and there are really good open source projects out there helping with that in the Kubernetes world. vLLM is an excellent project for large language model inference. The llm-d project, which is essentially a way to run LLMs, or multiple LLMs, at scale
in a Kubernetes-native way, is an excellent project as well. As we make more progress on these projects, it will become even more efficient to run LLMs on your own Kubernetes clusters, which will bring costs down and performance up. That's what we want.
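(For reference, a minimal example of the kind of self-hosted inference being discussed, using vLLM's offline Python API. The model name is a placeholder; in production you would more likely run an OpenAI-compatible vLLM server, or llm-d on Kubernetes, and call it over HTTP.)

```python
from vllm import LLM, SamplingParams

# Load any HF-format model the hardware can hold; the name here is a placeholder.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.generate(["Summarize the last 100 log lines: ..."], params)
print(outputs[0].outputs[0].text)
```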
Rohit (54:06)
Awesome. This has been a fantastic episode; we got to learn a lot from you, especially a lot of insights into where the industry is headed and where it should head in terms of standardization. Thanks a lot for your time, Görkem. We would love to have you on the podcast again
for another episode, and by that time I hope ModelKits and ModelPack are the standard and its implementation.
Gorkem Ercan (54:35)
Yeah, thank you. Thank you for having me. This was fun.