Transcript
Stefan Tilkov: Hello and welcome to a new conversation about software engineering. This is Stefan Tilkov. Today my guest is Michele Leroux Bustamante. She's co-founder and CIO at Solliance, she's a cloud and security architect, she's also a Microsoft regional director and an Azure MVP. Welcome to the show, Michele!
Michele L.B.: Thank you. Nice to meet you, Stefan.
Stefan Tilkov: Our topic today is the one absolutely everyone talks about - microservices. I'm really happy to talk about that, and I'd like for you to maybe give us a brief definition of what microservices and a microservice architecture actually are.
Michele L.B.: Yes, because of course, there's only one definition, right?
Stefan Tilkov: Yes, of course.
Michele L.B.: I find that amusing a little bit, but without getting into buzzwords and such, I would say we progressed in the 2000s with the concept of SOA, the concept of decoupling parts of the system and getting better scale and distribution and statelessness, and microservices continues from there to give us an approach to solution design and architecture. It sort of adds new principles that promise to solve maybe some of the more modern architectural problems we face today, like DevOps and visibility into system movement, and self-healing, and there's a long list of things we could talk about there, and sort of promises to solve some of the issues SOA did not. I suppose that will come out in our discussion, as well.
Michele L.B.: It's an approach to architecture. There are some principles you can follow that help guide you in your path, but the truth is - any good consultant knows - it depends. When we do a lot of our practical work, we're doing a lot of finessing of choices, because that's what makes it customized to the situation.
Stefan Tilkov: You mentioned that it has evolved from things we've had in the past - what makes it different? What are some of the differences that you can think of as opposed to SOA style services or maybe even modules or components or objects, or what have you?
Michele L.B.: If you think about the goal of decoupling and reuse, there's constantly this battle between "How much should I decouple in order to achieve a goal and reduce friction when we update things? For example, not having too many copies of code." There's always been that friction around that goal, and you have to balance it against distribution and scale. Sometimes those are conflicting things, because the smaller services get - for example with microservices - the more we have to think about sharing, and about the downsides of sharing code as a binary component.
Michele L.B.: With SOA, the way we broke things up - we really solved the problem at the enterprise architect level, which is, you know, I have this whole system (a CRM, an ERP and other types of applications) that own their own data, and when you want to talk to that data, you have to go through the services layer, there's no other way in. The purest view of SOA was that services own their data, there's no other way in, simply put. The problem with that is when you got into building your own solutions and you had many services that you had to build across customers and orders (the typical example), or if you think security, it'll be permissions and users, login history, user management stuff. There's relations between those tables if you think about it at the data layer, and now it becomes difficult to think "Well, is that just one service, and I can only go through the service boundary, or do I have many services, which means I have to aggregate the data between those services, which is not going to perform well?"
Michele L.B.: We start facing this problem of what I used to call "Big SOA, Little SOA." I might have even coined that term, I don't know that it went viral though... With Little SOA, we're building our own solutions and we're trying to find a way to build our solutions and still follow the premises of isolation of data behind a service boundary, for example... But you can't, unless you go down to the level of adding data services. When we start doing that, we start getting into performance issues because we're aggregating across data services, and it just kind of became messy.
Michele L.B.: What microservices gets us thinking about are things like "Why not allow services to own their data, but allow for an eventual consistency model across services and allow for the idea of building projections of data that fit the other microservices in our ecosystem?" For example, if I'm thinking about a security system, I create users, I manage users, I update profiles, and there's a system of record for that information. But when we think about logging in, I have an identity service somewhere else that needs to just do login, and all it cares about is username and password, e-mail confirmation and things like that. So why not have the user management side project a sort of runtime system of record with eventual consistency over to the read-only login view, for example.
Michele L.B.: There are imperfections with that when you start talking about security scenarios, which I won't get into right now, but let's just say it's an easy way to view isolation, right? I now can project another view of that data for another service and now we have that purist view. The runtime identity service owns that read path, and the user management service owns its read/write system of record. Then if we need to project, say, registrations or history of assignment of permissions and things like that, that can be another projection to another set of services. We didn't do that in SOA, we didn't think about eventual consistency; we were still trying to do transactions across service boundaries, which never worked, or at least never was widely adopted, for the obvious problems it brings.
Stefan Tilkov: Let me try to get clearer about this... When you say "project", you're not suggesting that it's just a view to the live data, you're suggesting that it's actually a copy of the data, of parts of the data, projected from one system to the other?
Michele L.B.: That is correct. The typical way to achieve that goal would be by working with a message-based system. A next step to that would be an event sourcing system. There's a difference between those two approaches.
Michele L.B.: With message-based, let's imagine that every API I call is done through a message queue; something like Kafka or Kinesis or Event Hub, that can allow for multiple consumers to review the messages in different ways and project different results. So we now have multiple topics, topics can be aligned with microservices, consumers decide which topics they care about, and project results. But they all get the same message, and the message then becomes the record of the thing that actually happened.
Michele L.B.: If you do a CQRS-style design, the message could be what we would look at as a command - "Create a user." But if you look in past tense and say "This already happened", the message is the true system of record actually at this point; then "User was created" is the message. The idea is you project from those messages to read models that become the projected views for the services. In some cases we use the actual message queue as the system of record, but that requires additional work, which we might talk about when we focus on the event sourcing part of the discussion. Let's just say I have messages; they record everything that happened in the system. I can now project history tables, I can now project views for other services, and I also have the main service that that message was targeting in terms of a microservice, like a user management service, or something.
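To make that concrete, here is a minimal sketch in Python. An in-memory topic stands in for a real broker like Kafka, Kinesis or Event Hub, and the past-tense event plus the read models (a login view and a history list) are illustrative names, not taken from the episode.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    kind: str      # past tense, e.g. "UserWasCreated": it already happened
    payload: dict


@dataclass
class Topic:
    """Stands in for a broker topic that fans the same message out to every consumer."""
    consumers: list = field(default_factory=list)

    def subscribe(self, consumer: Callable[[Event], None]) -> None:
        self.consumers.append(consumer)

    def publish(self, event: Event) -> None:
        for consumer in self.consumers:   # every consumer sees the same message
            consumer(event)


# Read model 1: the runtime identity service only needs what login requires.
login_view: dict[str, dict] = {}

def project_login_view(event: Event) -> None:
    if event.kind == "UserWasCreated":
        u = event.payload
        login_view[u["username"]] = {"password_hash": u["password_hash"],
                                     "email_confirmed": u["email_confirmed"]}


# Read model 2: a history of everything that happened, without a custom history table.
history: list[Event] = []

def project_history(event: Event) -> None:
    history.append(event)


users_topic = Topic()
users_topic.subscribe(project_login_view)
users_topic.subscribe(project_history)

# The user-management service publishes the fact; the consumers project it, eventually.
users_topic.publish(Event("UserWasCreated", {
    "username": "alice", "password_hash": "x1y2z3", "email_confirmed": False}))

print(login_view)    # {'alice': {'password_hash': 'x1y2z3', 'email_confirmed': False}}
print(len(history))  # 1
```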
Stefan Tilkov: Very interesting. To me it's interesting because I've heard so many different definitions of microservices, and this is one that has a lot in common with many of the definitions that I know, but also adds some things as parts of the definition that I would have considered just one variant; that's really fascinating. I'm not at all saying there's anything right or wrong, as you probably suggested at the beginning, because there's so many definitions, but it's absolutely fascinating.
Stefan Tilkov: The key distinction to you is the fact that we're actually keeping data redundantly across multiple services, and each of those services is focusing on the parts that it's interested in and it's responsible for. You see that as opposed to something that is more cut apart along the boundaries of the entities with more of a centralized data model sitting behind that.
Michele L.B.: Correct.
Stefan Tilkov: Would that be a fair way to paraphrase it?
Michele L.B.: Right, and think about it this way... If you look at the pure principles of microservices, they follow the same mantra SOA did around data ownership - a service is the only means of communicating with a set of data, and no other services communicate with the same data. That's just a golden rule. The question is "How do you realistically get there, when we live in a world that has relations?" So there's sort of a progression here, where when you start with your design, you go and you look at the whole solution, the whole system that we're trying to, say, reform into a microservices architecture, and we start to learn, "Okay, here are some off-the-shelf products, here are some websites we've built, some services we've built, some data stores", then we've got maybe even some mainframe data, we've got big data analytics over here... There's all these moving parts in this whole entire solution that you may have in existence, and you have to look at it from now the business domain to break it into microservices, because there's really no value to microservices unless the business will benefit.
Michele L.B.: How does the business benefit? From being able to release new features in parallel, without impact to other parts of the system. How do you get there? You have to decouple the way developers think about a set of data and websites and services, and turn it into "How does the business use those services?" You've got the business needing features; there might be three different parts of the business that use the same set of data in a different way. Their UI's are different, the services they should go through are different, and the back-end data that they need to interact with could be different. But there's still sort of a starting point for all that data, so user management is still, again, another story, just because it's easy to imagine security; if I create a bunch of users and I have a management interface, that's one way to look at that data. But then I've got these users that are self-service; I forgot my password, I need to change my e-mail, I need to -- I guess maybe those are some key flows... I want to update my profile. The list could go on.
Michele L.B.: Those flows don't go through the same UI, and they really don't have the same feature evolution as the user management side. So although there may be some things in common, the parts that might evolve as new features are driven by a different group of people. Then you've got the identity service that does login and maybe federation to other providers and all these other things. Those features are also completely independent of the other two.
Michele L.B.: There's things in common, of course, and there's certain data that's in common, and if you did it the old way, then you'd probably have some different web pages, maybe even different websites, and then you'd have a central service or two that all talk to the same data store that was relational across all the things, including permissions and users and user profiles and login history, and stuff like that.
Michele L.B.: If you think in a microservices way, you could break that for performance improvements, because the runtime needs the services to scale a certain way, to log in with millions of users, and the management part, and the user self-service part, and now when somebody needs new features, you're not stomping on each other.
Michele L.B.: Again, security is just one example, and I know people can come up with "Well, what about this, and that?" and trust me, I've been there, because we do a lot of security consulting, but the point is every single set of features has the same discussion around it that you could get to, and start thinking from the business perspective.
Stefan Tilkov: If I understand you correctly, the key point here is that you want to have separate stakeholders lead to separate systems having to be touched, or separate services to be touched, so that they don't step on each other's toes all the time.
Michele L.B.: That's a great way to put it. It's a business domain that really has its own model associated with it. That's where you get into the Domain-Driven Design mantra, where there's a whole process you can go through to try to tease out your business domains. They talk about things like aggregates, where you can imagine each microservice owns an aggregate, and let's say one aggregate could be the user, another aggregate could be the permissions. There's clearly a relationship between the two, but the way aggregates in Domain-Driven Design are supposed to relate is through an identifier.
Michele L.B.: My user has an ID, and that ID will have to somehow be used in the permissions aggregate as well, but those could be two document databases, if you will, that are only talked to by their respective services. When I go and look up a list of users, I'm probably not going to join on all their permissions, but if I needed to do that, guess what I would do? I would project that as another view, that included all the things in one collection, as opposed to messing with these two separate services that do nothing but manage one or the other.
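A minimal sketch, in Python, of that aggregate idea: users and permissions live in separate stores owned by separate services and relate only by an identifier, and the "users with their permissions" listing is produced as yet another projection rather than a cross-service join. The store and field names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class User:                 # aggregate owned by the user-management service
    user_id: str
    name: str


@dataclass
class Permission:           # aggregate owned by the permissions service
    user_id: str            # related to User only by identifier, no foreign-key join
    role: str


# Each service talks only to its own store (imagine two document databases).
user_store = {"u1": User("u1", "Alice"), "u2": User("u2", "Bob")}
permission_store = [Permission("u1", "admin"), Permission("u2", "reader")]


def project_users_with_permissions() -> list[dict]:
    """Projected read view for the 'list users with their permissions' case,
    built ahead of time instead of aggregating across the two services per request."""
    roles_by_user: dict[str, list[str]] = {}
    for p in permission_store:
        roles_by_user.setdefault(p.user_id, []).append(p.role)
    return [{"user_id": u.user_id, "name": u.name,
             "roles": roles_by_user.get(u.user_id, [])}
            for u in user_store.values()]


print(project_users_with_permissions())
```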
Stefan Tilkov: Okay. I think we'll have to do a separate episode on DDD (Domain-Driven Design), so I don't want to go into too much detail.
Michele L.B.: You should.
Stefan Tilkov: I'll sort of forward-reference an episode we haven't even recorded yet... But I'd like to address the other aspect, or one other aspect that you mentioned again this time, which is if you project this additional view on the whole thing, and we've clarified that this is actually a separate copy of the data, how much of a problem is the fact that this data is going to not be 100% consistent all the time?
Michele L.B.: And that is always something that the business has to decide their tolerance for. On eventual consistency - I've now worked with so many people who have years more experience than I do building message-based systems, and we have a whole team behind us that has done many of these microservices designs and implementations across many different platforms at this point - it's really interesting to me that not one of those situations that had any size to it, in terms of an enterprise-level implementation, has been viable without looking towards eventual consistency and message-based design.
Michele L.B.: Message-based systems give you visibility into what's happened across all the things. They let you have history tables without having to build custom history tables. Everyone's done it, nobody really wants to do it, because it gets messy. "Oh, we forgot to build one for this table." Well, you always have your history if everything's a message, and asynchronous design is already part of how we live today, in every development platform, so that's not really as difficult as it used to be considered.
Michele L.B.: I'm sending messages, they become the history of what's happened in the system - user was created, user was locked out after three retries, user changed their password, user was deleted, so-and-so gave them admin permission, so-and-so took away their admin permissions... You see where I'm going. Everything being a message, I now can go back in time and say "Oh, what happened to that user?", or "How many users have been created in the last week?" or other analytics. The messages are now not a liability, but actually an asset.
Michele L.B.: Now, at the end of the day, the eventual consistency part has to be part of this, because now what we're saying is each microservice will access data associated to what it cares about, and that means that whoever is listening on the message topics for each microservice is responsible for projecting those read stores. They may not all project at the same time, and you'll have to have metrics in place for "Hey, our queue is down and things aren't projecting. We're way out of sync." That simply can't happen; you have to now start putting faith in the system working, to do its job, and you have to have lots of checks and balances to make sure your system actually is working and that it's recovered quickly. Mind you, that's something we could talk about in a separate thread here, which is the DevOps story around microservices is paramount for anything to be actually viable. You can't do it without strong DevOps and disaster recovery and visibility into diagnostics and errors and alerts and so on. So that's just something I'm assuming is in place, so that I can have faith that my projections are happening.
Michele L.B.: Now the question becomes "What happens if this report is out of date by five seconds?" I mean, it should be within the second, right? Millisecond even, if everything is running, so what's my level of tolerance and at what point are we breaking SLA? That is a decision the business has to make for every use case, which is why we have to look at microservices as things that fit in your head, and look at them individually. For every single microservice you have to think "What are my SLAs? What's my concern if this is delayed?" because then we can start looking at "Well, maybe this one can't be eventually consistent. How do we work around that?"
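To illustrate the kind of check that backs this up, here is a minimal sketch in Python of monitoring projection lag against a per-view tolerance. The view names and SLA numbers are made-up examples, not recommendations from the episode.

```python
import time

# Each read model can have its own tolerance, decided by the business per use case.
SLA_SECONDS = {"login_view": 1.0, "weekly_report": 300.0}

# Timestamp of the last event each projector successfully applied (illustrative values).
last_projected_at = {"login_view": time.time() - 0.2,
                     "weekly_report": time.time() - 900.0}


def check_projection_lag(now: float) -> list[str]:
    """Return an alert for every read model whose projection lag breaks its SLA."""
    alerts = []
    for view, limit in SLA_SECONDS.items():
        lag = now - last_projected_at[view]
        if lag > limit:
            alerts.append(f"{view} is {lag:.0f}s behind (SLA {limit:.0f}s)")
    return alerts


for alert in check_projection_lag(time.time()):
    print("ALERT:", alert)   # here: the weekly report projection is way out of sync
```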
Stefan Tilkov: This leads me to a follow-up question - if the goal of microservices is to be really that small, as the name implies, and if the individual services are so small and easy to understand, which is all fine, don't we just move the complexity into the spaces between the services? And I'm not only talking about the infrastructure, even though you could count that in as well; I'm just more talking about the collaboration of the whole scenario of event-emitting services. Is anyone able to understand the system that emerges from this collaboration of microservices, and do they have to even?
Michele L.B.: Again, that's an excellent point and it's an excellent question. I always come back to this - if there isn't a business value for implementing microservices at this level in the first place, then you shouldn't be doing it. Most systems don't need microservices; maybe they could follow some of the principles in terms of isolating and decoupling logic and services from dependency on one another, but to get to this sort of beautiful endpoint whereby every service fits in your head, none of them are dependent on one another, they're independently deployable and schedulable, you have a versioning process in place that never causes problems...
Michele L.B.: If I'm the size of Netflix or Amazon or other similar enterprises, microservices carries a ton of value for them, even though they probably literally have thousands and thousands of services, because they also have teams managing the breakdown of those things; they also have orchestration platforms helping with the self-healing and recovery, they also have versioning patterns in place, and deep implementation efforts into diagnostics and dashboards and so on.
Michele L.B.: That's the absolute far end of the spectrum, which is it's very complex, but it's absolutely needed. Then there's the people that can just build a system out of several services, break it up with the principles in mind, deploy a couple Docker containers, run them manually, use some Jenkins or just do a simple DevOps process to replace and run the containers, but not really do any fancy discovery or self-healing discovery orchestration platforms and so on. So there's both ends of the spectrum, and then there's all this stuff in the middle.
Michele L.B.: The first and number one question that I go to people with is "Where is the business value?" When we do a design, we spend time on the whole solution, the whole system first, we pull together some patterns that look like a fit for microservices, but in the process we talk to the people that are the business side and ask them how they use the system today, and we discover things...
Michele L.B.: On probably three occasions I've worked with customers where the developers were actually troubleshooting problems in the enterprise, because nobody knows where the data is, because this job didn't run, or that data didn't land where it should, or this UI is not up to date. And literally a developer has to go in, which is very expensive, because those developers actually have other more important things they should be doing for the business.
Michele L.B.: Those DevOps folks, those developer folks shouldn't be troubleshooting individual one-off problems, but that's a sign that lack of visibility is a problem here, so why don't we take the biggest problem you have and let's just turn that into a microservices architecture, following the big picture of where we might want to get to, and let's show the biggest business value first, so that the business sees why this effort is being put in place, because the first time you do it you will have to spend time and money to get there. You're going to spend ten months and lots of money and lots of resources to get your perfected DevOps orchestration platform for services in place, with all the messaging if you do that, all the dashboard visibility and alerts. Then, after that, adding more is much easier.
Michele L.B.: If that ten months doesn't show something of high business value, nobody's going to continue and you will have wasted your time and the business just doesn't back you from there.
Stefan Tilkov: What are some ways that you would go about building that business case? What are some considerations as to the return on investment or ROI that you would talk to the management about?
Michele L.B.: You do have to weigh all of the potential value, and a lot of times the value is things like new features, more rapidly solving business problems, like "Every time I need this it takes a month before we can get this one thing done", which maybe costs the business money. And if the business is trying to grow, that's extremely expensive long-term, so they're just in such a rut that they're never going to get out of it unless they change something. That's one side of it.
Michele L.B.: The other side obviously is the staff it takes to solve problems. You could do full-blown ROI, but usually - at least in my experience - a lot of the red tape around a full-blown ROI is just costing the company more money and delaying the decision, because most of this you can finesse in open discussions with the right people in the room. Here are the problems, here are the costs; generally speaking, people know in their head. There are various costs. You get all the right people in the room and the CTO for a decision, and then you talk about the risks.
Michele L.B.: The risks are, obviously -- well, if your team's really busy right now, then you need new people to run the new platform before you can sort of start migrating people over to participate. You might need external consultants to get you up to speed, because if your team doesn't internally know all these platforms, then it would be kind of silly to say "Yeah, let's just figure it out as we go" if you have any form of deadline. So it will save you money to probably work with other people.
Michele L.B.: I always bring in people that know more than me about various things, whether it's Kafka or certain orchestration platforms or messaging platforms, and coordinate all that effort together so that we reduce the risk. I think that's another cost that you have to weigh.
Michele L.B.: The other downside is if you don't go the full way the first release, you could run into lots of issues and see it as a failure. You have to make sure you're prepared really early upfront to say "Look, I know this is going to cost, and I know I have to take this all the way in our first ten-month release (if you will)." Again, that cost has to mean something to the business people in the room.
Michele L.B.: If you want to get spreadsheets going to weigh all that out in an ROI, you can. I'm not saying ROI analysis isn't important in some cases, but I just repeat that in a lot of cases that I've been part of, as long as the right people are in the room, they're willing to make those bets and make those decisions. It usually becomes pretty obvious when you look at what the business problems really are.
Stefan Tilkov: Let's suppose you've managed to convince management that the money has to be made available for the business to succeed and you can start this effort. What are some ways to attack the blank slate that you have now when you have to come up with your first microservices architecture in actual reality?
Michele L.B.: This is where in fact management has probably already made a small commitment by saying "Let's do a design." Because if you don't start with a whiteboarding session for 3-5 days, depending on the size of the system/solution/company and how much you want to attack - I would say that's enough to get started - you need to take a look at all the pieces that are in the ecosystem of your solution, your system. Again, I would repeat, you probably start with a list of "Here are our data stores...", maybe there's various different kinds - there's NoSQL, there's SQL, there's mainframe, there could be legacy systems integrated there, there could be off-the-shelf products that have their own data, you have integration with those products, you have custom UI's you've built, legacy ones, new ones, a bunch of evolution, a bunch of technical debt, and each of those may be representing any number of high-level business functions. From a very high level you have ERPs and CRMs and websites that drive how your operation works, or other UIs, mobile devices etc. and you may come up with 20-30 major moving parts to start with. When we look at that, what we've started with is a very high-level microservices design, meaning okay, I can compartmentalize this into, say, 30 pieces: 30 groups of people, business domain, high-level... It's not going to be correct yet, because there's going to be a granular layer below that that's still important, but I might come up with high-level, almost SOA-style breakdown.
Michele L.B.: Then I would say "What's the area that's the biggest pain point right now?" What are the problems you're trying to solve today? And maybe that leads us to 2-3 of those areas where we open up the kimono, if you will, and sort of look inside and see "How can we tease this into a more granular set of services, UIs, workflows, business domains and use cases?" You have to start small, but it's nice to see the big picture because then you can sort of take every design decision you make when you look at a few services in more granularity and map that or marry that to "Could that pattern fit over there?"
Michele L.B.: For example, if I said "Yes, we're going to do a messaging architecture. It makes sense here. You have a problem with the visibility, you need history and diagnostics, you need history tables for audit and compliance, you need better visibility into security breaches", the list goes on. "So let's try this out, but let's do it here, in this particular area." But we'll always go back to the big picture and say "Could that still fit over there?" or "Are we still going in a direction that fits the whole solution eventually one day?"
Michele L.B.: You start with this high level, then you open up two or three areas and start defining a real microservices architecture, and trying to get down to the business domains. I do that with a typical pattern of use cases, which we've been doing for years and years and years, where you just sort of think through 2-3 use cases for each UI, and "Okay, what do users do? They start here, they go here. It hits this service, it hits that service." And while you're talking to people, they'll say "Yes, but there's this other special case. Yes, there's this other thing that happens, too."
Michele L.B.: When you bring the business owners in, with the developers, the interesting thing that comes out of that is even the business owners contribute things the developers forgot happen, or they don't know this pain point. "Oh, I didn't know you did that every time. Why do you do that?" "Oh, because this doesn't work. Today I have to do this." That's why they say, when you go through the process of designing the microservices, breaking it into domains, it's so important that the business owners are there, because you don't talk to them often enough.
Michele L.B.: Developers will say "Yes, I could help architect this. I already know how the whole system works, I wrote it all. I can design the microservices. We don't need to bother them." But you do need to bother them, because they're going to tell you stuff that you haven't talked to them about lately. That's where the business problems pop up, and that's where we get to a real conversation.
Michele L.B.: I would say you do a high-level design, you break up the pieces that feel like pain points, and as you're going through that, you're listening to what people are saying and you're saying "Okay, let's now take a step back. If we solve this problem with microservices, here's why it will benefit you. You're now going to have logs, you're going to have your history. You don't have an audit right now, you're not meeting your compliance requirements. That's a problem for you in the future; you're trying to solve it, let's solve it the right way. These people never know where we are in the state of the workflow. If we have messages, we can do a process manager and track the state of the workflow and surface that in a UI, and now you won't have to call the developer to find out, because you'll know this job didn't run, or the job ran but it failed etc."
Michele L.B.: I would say at that point you're starting to have real conversations with the business about "Look, we've come up with a design, we've come up with some of the real business issues that you seem to need to solve, so here's where I would take you." I think that answers the first part of the question. There's follow-on work that must proceed from there, but I think what we're trying to get to is "Do we understand the personality of your solution, your system, and do we understand what type of fit microservices is for you, in terms of should we bother with the messaging architecture? Do you have the type of team and the size of solutions that warrants an orchestration platform and all the things, or should we maybe keep it simple for now and just start with better isolation of your websites and services?" Because there's maybe a baby step in the middle.
Michele L.B.: Maybe we do have [unintelligible]. This is another common pattern - a services tier in the middle, which becomes the sort of central, scalable owner of business functionality that a monolithic set of UIs can consume and a monolithic back-end can be stored to. The services in the middle, even though they will eventually have bottlenecks at the data layer if it doesn't scale well, at least give you sort of that micro-view of functionality that you can start adding features to. That's another pattern that could be sort of a step in the middle, for example.
Stefan Tilkov: Let's talk a bit about this technical aspect - how much of a platform do you actually need? One of the things that bothered people about SOA was the heavyweight infrastructure requirements in terms of ESBs and BPM platforms. Do you need any of that stuff or something comparable if you're building microservices?
Michele L.B.: Not necessarily. Again, we go back to the "I could just build microservices and call them microservices and deploy them as websites on VMs", for example, which won't be using the technologies that we're talking about that back microservices, like orchestration platforms, but could still get you toward the principles that you need to follow, and maybe even some of the data isolation.
Michele L.B.: When we look at many small services that fit in your head, we start to have to think about all these other principles, like self-healing and versioning and "How do I deploy new instances without affecting others? How do I roll in an update that doesn't depend on three other services also being updated? How do we design for that?"
Michele L.B.: Messages could be a completely separate thing, so no, you don't have to do messaging; no, I don't need an ESB. In fact, I would say ESB is an antipattern, only because we want to have, as Martin Fowler coined it, "smart endpoints and dumb pipes." The pipes, if you're using messaging, should just be that - messaging. Let's not put a bunch of business logic in the center; that really causes coupling between the services.
Michele L.B.: If you want to have some sort of coordination across services and you have messages calling those services, you can use a process manager, which is a separate thing, and in that way there's no coupling between the services themselves.
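As a rough illustration of that idea, here is a minimal sketch in Python of a process manager that only reacts to messages and tracks workflow state, so the services themselves stay uncoupled. The message kinds and workflow steps are hypothetical.

```python
class RegistrationProcessManager:
    """Tracks one workflow (user registration) purely from the messages it sees."""

    def __init__(self):
        self.state: dict[str, str] = {}   # user_id -> current step in the workflow

    def handle(self, message: dict) -> list[dict]:
        """Update workflow state and return the next command(s) to publish."""
        kind, user_id = message["kind"], message["user_id"]
        if kind == "UserWasCreated":
            self.state[user_id] = "awaiting email confirmation"
            return [{"kind": "SendConfirmationEmail", "user_id": user_id}]
        if kind == "EmailWasConfirmed":
            self.state[user_id] = "active"
            return [{"kind": "GrantDefaultPermissions", "user_id": user_id}]
        return []


pm = RegistrationProcessManager()
next_commands = pm.handle({"kind": "UserWasCreated", "user_id": "u1"})
print(pm.state["u1"])    # "awaiting email confirmation" - visible in a UI, no developer needed
print(next_commands)     # the command the process manager would publish next
```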
Michele L.B.: Do you need an orchestration platform? That's a better question, because once you start having many services to manage and deploy, you want the self-healing, you want the scheduler capability where I can submit a job that might spin up 25 instances of that service in order to handle the load and then scale down. What you're trying to take advantage of here is server density.
Michele L.B.: A microservices platform promises - if I use a platform - to use every available resource (memory, CPU, disk) in my cluster. If I have five nodes, ten nodes, 25 nodes, depending on how big my solution is, I could say "Let's run a job" and it'll run as fast as it can based on how much is free across all those machines, and then release the resources when it's done.
Michele L.B.: If I'm running ongoing services that are constantly available and using up space, they can autoscale and fill density when there's a burst of requests. At some point, that autoscale fills the machines and you need an autoscale on the machine itself. Your cloud provider would have to provide that autoscale story. On-premise that's a bit more difficult, because you have to have spare capacity.
Michele L.B.: The orchestration platform also promises to give you service discovery. I can have automatic load balancing and stitching of these additional service instances without having to worry "Where are they?" There's a lot of benefits that come with that.
Michele L.B.: Then of course the single cluster management tool across all of my nodes now becomes my one way to sort of view my system, right? So as long as my actual physical VMs or nodes are healthy, now the next thing is I can do interesting things to inspect all the services that run across all 25 or 3,000 machines, if there's that many.
Michele L.B.: Obviously, when you get into the high, high numbers, you need to compartmentalize; there's probably more than one platform, there's probably groups of clusters that solve different problems that don't need to talk to each other. That's a really big enterprise solution. Most people start with 10 nodes, 10 agents.
Stefan Tilkov: Is it a good approach to start small and wait for problems to hit you before you scale up?
Michele L.B.: I wouldn't do that... That's a loaded question, I think. I don't want to wait for problems, but I think you know upfront how you are going to manage your deployment. You can automate things to a VM and just have -- let's say I have a web service API that uses port 80, and I need to deploy that across three machines; I have a cluster, and it's going to be load-balanced. When I get a request, I don't know which of the machines it will go to; one machine dies, it hits the other two, right? Your traditional sort of load balance view.
Michele L.B.: There's nothing wrong with me creating a microservice, even using Docker containers, deploying it to those VMs, but what will happen is I can only have one of those on each VM, because I'm using the resource port 80.
Michele L.B.: A way to work around that is "Okay, now let's put an HAProxy on each VM. That will listen on port 80 and route." So now I'm going to create routing rules for API 1, 2, 3. Now I can have three APIs, each of them at whatever port, but I still have to fix the port - 3000, 3001, 3002 - so that I know which one HAProxy is talking to. So it's not dynamic yet.
Michele L.B.: When we talk about service discovery and needing that, it all depends. Can I get away with just having an HAProxy routing to three, five, ten, fifteen services and then have Jenkins or some other automation tool push those images out and trigger an update to the deployment and it'll just function fine? Yes, because I've got a fixed view of my architecture, and I'm gonna automate with that in mind. But when I want a fully dynamic view of scheduling deployment, management, statistics on container usage, versioning of containers so I can spin up another one V2 and it starts getting complicated, then I'm starting to think "Well, if I have an orchestration platform, I might be able to manage that a lot better."
Stefan Tilkov: Makes a lot of sense. As you said before, the standard consultant's answer is "It all depends", which is unfortunately just true. What can you do?
Michele L.B.: It's a little bit of a finessing process. I wish I could say there's one recipe to "Follow these principles and you'll have your architecture", but architecture is an art, and you follow principles to guide you to the right path, and then there are reasons not to go all the way, if you will, when there's timelines to meet or resource issues to address - people, availability, knowledge. You've gotta be careful. If you go full-blown orchestration platform, now I need to know how to manage that platform, so my people either have to get up to speed or I have to bring in people that know, to stay with me until I know.
Stefan Tilkov: And one of the problems is that once you get to know it, there is a certain risk it's obsolete already, because things are moving so fast at the moment.
Michele L.B.: Yes, that's a real problem, in the sense that when we go to make decisions today around choosing a platform... The main ones I run into - Docker now has Docker Enterprise, and that is the Docker Data Center platform. That's a little bit newer, but of course, it's backed by the company that provides Docker, so there's something nice about that in terms of a direction and a path, and it's built on Swarm and that's come a long way, obviously, over the past year.
Michele L.B.: So we've got that platform, we've got Mesosphere with DCOS (Data Center Operating System), and that's a really strong platform, with Marathon at its underpinning, which has been around for a good ten years as a scheduler, so it's really highly valued and trusted.
Michele L.B.: You've got Kubernetes, which is newer, but obviously Google runs on Kubernetes and it's a very strong orchestration platform as well. You've got Amazon with their own built-in ECS or EC2 Container Service, you've got Azure with the Azure Container Service, which actually supports either Swarm, Kubernetes or DCOS, and then you've got Service Fabric which, if you're doing .NET development, is an interesting path to go down, because they actually have some features that the traditional orchestration platforms don't have, and it's also a platform for building services that leverage stateful services, for example, and we've got the actor model that's implemented, so stateful actors, stateful queues, reliable queues, as they call it, and that's something that gives you that out of the box. "My service owns its data, and by the way, it also replicates it across the cluster, so I don't have to think about it."
Michele L.B.: As I'm getting started, if I'm a smaller business just getting started out of the box, I might not have to even build the data backend, potentially. It just depends on the size of the system. That's really powerful. Then I can also do, of course, containerization of those things, because it supports containers now, and it's a scheduler as well.
Stefan Tilkov: Let's just assume I've managed to actually make up my mind and have picked one of those, and decided to stay with it for the foreseeable future, like for the next 12 months, or something like that. What is the best approach for building the individual services? Because now as they're containerized, dockerized, I can potentially build them with any technology I like, as long as they follow whatever the platform needs. Is that actually a good idea? How much standardization would you suggest for each of the services internals?
Michele L.B.: I think what you're asking me is "Does it matter how I go about building the services and containerizing them across any of the platform choices?"
Stefan Tilkov: Yes.
Michele L.B.: The answer is no, it doesn't. As far as I'm concerned, your development process around, let's say, ASP.NET Core, or Golang, or Node.js or Python, Java... I think Spring Boot is the go-to right now for the Java stack (JVM) in terms of containerization, although it's not the only one. So I pick any language that is the best fit for my team or for what we're doing. I might choose Python if I'm doing machine learning, just because it's better for that, or I might choose Python because my people know Python. Or my people know .NET, so I go ASP.NET Core.
Michele L.B.: I build my services as I usually would, but what I need to do is target containerizing them. That means I'm building a Docker file describing the dependencies of the service, that means I'm building Docker images and running those maybe even locally in development to verify that everything's running containerized nicely, and that means I'm building Docker Compose files which would help me with local development to spin up an entire environment in some cases, so I can even build the back-end into a -- I do this a lot with my team, we will build Docker Compose implementations that spin up let's say a Kafka event store, SQL or RDS (or Postgres, I should say, because RDS is in Amazon) and we'll run those locally.
Michele L.B.: It'll actually script out all the data models required to run and everything, so now the developer literally does a “git pull” and “compose up -d” and they're ready to test their stuff. All of the configurations are pointing directly at localhost:X for each of those data stores and they don't have to know how to set it up and it saves a ton of time.
Michele L.B.: I can do all of that work, and then when we're ready to push those things out to a platform, let's say we have a centralized dev instance of the cluster, or test, or then production; my automated check-in should probably build images, push those to a registry, tag them as development latest, and then when we're ready to push them to test, we tag them with test, as an example, and when we're ready to push those to UAT for user acceptance, we tag them as UAT. When we are ready to push those to production, we tag them with prod.
Michele L.B.: We've got now this guarantee that the original image built is the thing that traversed all the way through, but it was irrespective of how I developed those actual services inside the container, because they can be anything from ASP.NET Core, to Java, to Golang, to Python and so on.
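A minimal sketch, in Python, of that promotion flow: the image built once at check-in is only re-tagged and pushed as it moves through environments, never rebuilt. The registry URL and image name are placeholders, and the `docker tag`/`docker push` calls assume the Docker CLI is available on the machine running the script.

```python
import subprocess

REGISTRY = "registry.example.com"   # placeholder: your private registry
IMAGE = "user-service"              # placeholder: the service image built by CI
BUILD_TAG = "dev-latest"            # tag applied by the automated check-in build


def promote(to_env: str) -> None:
    """Re-tag the already-built image for the target environment and push it."""
    source = f"{REGISTRY}/{IMAGE}:{BUILD_TAG}"
    target = f"{REGISTRY}/{IMAGE}:{to_env}"
    subprocess.run(["docker", "tag", source, target], check=True)
    subprocess.run(["docker", "push", target], check=True)


# The same binary image travels dev -> test -> uat -> prod; only the tag changes.
for env in ("test", "uat", "prod"):
    promote(env)
```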
Stefan Tilkov: Okay. Let me ask a few other things that I wrote down as potential questions to ask you about microservices, because I was interested to get as many people's perspective on those things. Just very quickly... I think you answered the first one, but maybe you could elaborate a little bit - synchronous or asynchronous communication? What do you prefer?
Michele L.B.: Asynchronous, because of messaging. Because I'm typically faced with larger solutions, so I guess I'm kind of now brainwashed into thinking that way, regardless of if we go through an actual message broker.
Stefan Tilkov: I think you've answered the next one to a large degree as well, but let me ask it anyway - data sharing between microservices? Is it ever okay for two microservices to access the same database?
Michele L.B.: Can I say "Hell no!"?
Stefan Tilkov: You can.
Michele L.B.: Or do you need to bleep that?
Stefan Tilkov: No, we're European, at least at the moment, so we don't mind.
Michele L.B.: Okay. That's where the eventual consistency comes in, right? So that would be an absolute no.
Stefan Tilkov: Code sharing by means of libraries between services?
Michele L.B.: That's an interesting one, because there's a smell people seem to get from duplicate code. People hate duplicate code, right? But then, of course, we don't want dependencies between microservices, so we don't want the impact of one piece of code to affect other microservices, because it sort of defeats the purpose of them being isolated and independently deployable and non-dependent on one another.
Michele L.B.: There are some patterns that I like for this. If you have something very central to your ecosystem, like "All of our services are built with .NET and all of them need to validate tokens at the service boundary, and we all want them to do it the same way, so we're going to build a component that does that" - fine, you can have that shared component, but that shared component has to be distributed in a NuGet package, for example. And if you're using other languages, maybe you're using NPM, but you have to have packaging with versions, so that each service can independently choose to update to the next version and test that they are not going to have a regression out of that.
Michele L.B.: So if you at least build components that, again, have an intentional upgrade path, then I think you can say that shared code is sometimes a necessary evil for those types of things. But also, for certain things you might not want to be afraid to duplicate code. It might have been Sam Newman that coined the statement that there's a far worse evil in shared code in binary form than there is in duplicating code across your microservices. I think it's important to be careful about sharing. It is a smell, but componentization of that with intentional updates is something I've found successful.
Michele L.B.: That doesn't work when you go cross-language, so now I have to deal with "Well, what if we also have services that are Python, and those need to do the security API / token security as well?" Then I would obviously need two libraries, right? One for my .NET people and another for my others, and that way we could at least have consistency in how we do those things.
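As a rough sketch of the packaged-shared-code idea, assuming a hypothetical internal package called acme-security: the token-validation helper lives in its own versioned package, and each consuming service pins the version it has tested against, so upgrades are intentional. The validation here is deliberately incomplete; a real implementation would verify signatures, expiry and audience.

```python
# --- shared package: acme_security/tokens.py, published as acme-security==1.2.0 ---
import base64
import json


def validate_token(token: str) -> dict:
    """Minimal boundary check of a token (sketch only: no signature/expiry verification)."""
    header, payload, _signature = token.split(".")
    padded = payload + "=" * (-len(payload) % 4)          # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(padded))
    if "sub" not in claims:
        raise ValueError("token has no subject claim")
    return claims


# --- each consuming service declares its own pin, e.g. in requirements.txt: ---
#   acme-security==1.2.0
# so service A can move to 1.3.0 and run its regression tests while service B
# stays on 1.2.0 until its team deliberately chooses to upgrade.

# Example: a fake, unsigned token just to exercise the helper.
fake_payload = base64.urlsafe_b64encode(
    json.dumps({"sub": "alice"}).encode()).decode().rstrip("=")
print(validate_token(f"header.{fake_payload}.signature"))   # {'sub': 'alice'}
```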
Stefan Tilkov: Okay. Next question - what about the user interface? What's the UI's relation to each service? Is it part of the service, does it sit above each service? Does each service have its own UI, or is there a monolith sitting on top of it? What's the best approach?
Michele L.B.: There is a purist view of the microservices principles that would state that a service emits its own representation of the UI, but you could argue JSON responses are an example of that. So if you're working with something simple like a JSON response for APIs to gather information or get responses, then the UI can decide how to represent that in some other friendly form. So then to the question of "Does every service have its own UI?", I don't think that's realistic. It can be the case, and I think you want to get away from a monolith, in the sense that it's not likely all your services should be called by one UI, so that's where perhaps the UI can have a little more flexibility in "How do we aggregate how we interact with this data at the UI level in terms of which services we should call?" and "Can multiple UIs potentially call the same service?" Absolutely. I think that's another smell that you just want to watch for, because perhaps theoretically that should be another view into that data, because is it really the same service?
Michele L.B.: "Is your perspective from a mobile device?" - here's a good example. Even just get away from microservices, when I expose just a general API to a web app, my perspective on that API, that content - let's call it user management - is very different in the way I want to return data to a mobile device for efficiency of that UI, versus the interface that I would use for a website that has a fully functional management interface, versus a third-party entry point through an API gateway that a third-party is building UI's in front of it. The intention of that integration shapes the API and/or microservice then.
Michele L.B.: Now what we're talking about is "Is there only one service in a microservice, or can there be multiple services that share still the same data store, but expose a different sort of way into the data?" and that example I just gave would be an example where those three services are different to target their UI, but they actually still have to talk to the same data because they're all three managing data the same way, they're all three doing user management. So until I see the need for an eventually consistent projection for performance reasons, I might consider those three services a microservice single deployment that shares the schema.
Michele L.B.: So there is no rule that a microservice is always one service instance, right? Because imagine CQRS - a read and a write path.
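To illustrate the "one microservice, several service interfaces" point, here is a minimal sketch in Python: three entry points shaped for mobile, the management website and third parties all read the same user-management store, and would be deployed and versioned together as one unit. The field choices are illustrative.

```python
# One shared store inside a single microservice deployment (illustrative schema).
user_store = {
    "u1": {"name": "Alice", "email": "alice@example.com",
           "roles": ["admin"], "login_history": ["2024-01-01", "2024-01-05"]},
}


def mobile_view(user_id: str) -> dict:
    """Trimmed payload for a mobile device, for efficiency of that UI."""
    u = user_store[user_id]
    return {"name": u["name"]}


def management_view(user_id: str) -> dict:
    """Full payload for the management website."""
    return dict(user_store[user_id])


def partner_view(user_id: str) -> dict:
    """Restricted payload exposed to third parties through an API gateway."""
    u = user_store[user_id]
    return {"name": u["name"], "roles": u["roles"]}


print(mobile_view("u1"))    # {'name': 'Alice'}
print(partner_view("u1"))   # {'name': 'Alice', 'roles': ['admin']}
```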
Stefan Tilkov: Well, I do have to ask now - isn't that the thing that you said "Hell no" to? Having three services accessing the same data? Why is that okay...?
Michele L.B.: Three microservices. A microservice can have three service instances, but a microservice - that means these three all go together: they deploy together, they version together, they are considered the same.
Stefan Tilkov: Okay, so just to be clear on my understanding - your granularity would be that a microservice is the deployment unit, and the service is what's inside the deployment unit.
Michele L.B.: Correct.
Stefan Tilkov: Okay. Fascinating.
Michele L.B.: And again, with large organizations that have the idea of scale in mind with every step of the way, they may out of the gate always say "Everything's gotta be a projection; never share", which means those would break up into their own independently deployable microservices with eventual consistency.
Michele L.B.: But imagine a smaller company that is just trying to achieve a microservice goal, but maybe they don't have millions of users; maybe they have 500 users. Maybe scale is not always a concern about these things, but they still want to keep that different perspective on how the API looks to the different devices: mobile, web and partners, third-party. Those could be separately deployed services, or they could just be separate controllers in the same service. Again, I hate to say it, it depends... But it does, and these are the things you finesse.
Michele L.B.: What you want to do though is have patterns that you follow everywhere - that's another thing. The whole "it depends" argument is fine as long as whatever it is you decide, do that everywhere. Have only a few choices, and do those choices only, everywhere.
Stefan Tilkov: One of the questions I wonder about is if you have this complex system of microservices working together to achieve a common goal, how do you make sure that things actually work? How do you test for regressions if you have somebody independently deploy a new version of a microservice? How do you make sure everything is still okay?
Michele L.B.: Okay, this is a problem not only with microservices, but I think with any integration of services into a greater system. The people that build the service can document the service and every single method in that service to a level of detail that helps people understand what's happening inside the method, but that doesn't cover all integration tests, unfortunately. Because until you think about the intent of the caller or the potential for a use case touching this service, but then calling this other one and this other one and this other one in the system, until you look at the integration test, you actually don't know if something could be wrong with the way you're thinking about this individual method. It still has to integrate.
Michele L.B.: My view (and probably others') on testing with microservices is that the tests should be written by the people consuming the service. That way, if you have three or four UIs that potentially might use the same service, and they want you to test it in the context of their perspective, they provide the tests; they provide the list of tests that you must execute, and then you as a developer of the service can say "Okay, these are the tests that I need to cover when I make a change, so that I know everything still works across everything."
Michele L.B.: Until someone tells me how you might use my service, then my tests are just unit tests. They're not integration tests and they may absolutely not work. Microservices' success depends on tests being written by the consumers.
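A minimal sketch, in Python, of what a consumer-provided test might look like: the UI team hands the service team the workflow it depends on, and that test runs on every change to the service. The service and scenario here are toy examples, not from the episode.

```python
# --- a toy user service that the contributed tests run against ---
class UserService:
    def __init__(self):
        self.users: dict[str, dict] = {}

    def create_user(self, username: str, password: str) -> None:
        self.users[username] = {"password": password}

    def change_password(self, username: str, new_password: str) -> None:
        self.users[username]["password"] = new_password

    def login(self, username: str, password: str) -> bool:
        return self.users.get(username, {}).get("password") == password


# --- contributed by the consuming UI team: "this is the flow our page depends on" ---
def test_change_password_then_login():
    svc = UserService()
    svc.create_user("alice", "old-secret")
    svc.change_password("alice", "new-secret")
    assert svc.login("alice", "new-secret")        # the workflow the UI relies on
    assert not svc.login("alice", "old-secret")    # the old credential must stop working


test_change_password_then_login()
print("consumer contract tests passed")
```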
Stefan Tilkov: If you have those consumer-written tests, do you actually deploy them by means of more containers?
Michele L.B.: It could be, absolutely. People frequently use containers for spinning up test cases. Who writes the tests ultimately can be a collaboration, but the people who are going to use the service say, "I'm a UI, I'm going to have a page, I'm going to click a button, I'm going to change your password; I expect it to do this, I expect that when I then go to this page it shows me the end result, or I get a link in an e-mail, and then I click it and I go here…" These test workflows are important. Somebody has to define "I'm going to use it this way."
Michele L.B.: I run into this a lot. When you work with large organizations that have lots of services and lots of teams, they don't always know how the teams are going to use their services, so then we're constantly in this back-and-forth of "Oh, but this didn't work" and "That didn't work", and that's okay... That's another way to solve the problem, right? Find out as you go. But had those test cases been written upfront, you can avoid a lot of those problems.
Stefan Tilkov: Makes a lot of sense. Very good. Anything else I should have asked you?
Michele L.B.: Well, at the end of the day one thing that probably comes out of this discussion and others like it is it's not trivial; there's a lot of things to know here. I've had a fair number of experiences - myself and our extended team - around many of these platforms and it's not easy... I'm just not afraid of it, because we've done so much. But I have to put myself in the shoes of people just getting started - this sounds like a lot of work. I think the truth is it is a lot of work, so that's why the business value is so important, so I would just leave it with, you know, if you target business value and you understand why you're heading to microservices and why that's going to benefit you, that will help drive your success, because then the investment feels less painful.
Michele L.B.: The other thing I would leave people with is make sure you do take it all the way. Get your DevOps in order, get your orchestration platform drills done, do testing for failure, be ready for disaster recovery... Because it's actually forcing us to do the things we should arguably already be doing, which is those good practices for failure and recovery and rolling forward quickly and fixing problems quickly and diagnostics into what's going on. So yes, it's a lot of work, but it's the kind of stuff we should do. If you see the business value, be prepared to invest and get in and do it. It will pay off on the other side.
Stefan Tilkov: Awesome. Michele, it was a pleasure to talk to you about this whole topic. Thank you so much for your time, and I'm looking forward to having you on another episode maybe in the future.
Michele L.B.: Yes, thanks very much for the invitation. I enjoyed the conversation, Stefan.
Stefan Tilkov: Thanks, Michele. Bye, listeners!