Challenges with Microservices
October 2019
In my previous blog post, I provided insights into the benefits of microservice architecture. This article focuses on the challenges that one should be aware of before embracing microservice architecture.
Decomposition Model
A monolithic application accrues technical debt over time. Converting the monolith into microservices requires decomposition, which helps to pay down this debt in exchange for long-term benefits. If the decomposition is not done correctly, the transition can result in chaos and is likely to fail. Therefore, it is essential to know the widely used strategies, methods and best practices for decomposing a monolithic application.
Before we consider the various decomposition strategies and methods, I suggest you read and understand the three-dimensional scalability model (the scale cube) defined in the book The Art of Scalability. This model describes how to scale software along three dimensions, including the functional scaling that microservices emphasize.
X-axis scaling achieves scalability horizontally by adding more servers behind a load balancer to handle high request traffic. The stateless application is cloned and run on many servers, each of which services requests identically.
Z-axis scaling achieves scalability through data sharding or partitioning. For example, requests are routed to the nearest data center based on the origin of the request.
Y-axis scaling achieves scalability through functional decomposition of the application into services, each defined by a bounded context.
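As a rough illustration of the three axes, here is a minimal Python sketch of one routing decision per axis. The clone, service and shard names are made up for the example, and the Z-axis function shards by a hash of a customer ID rather than by request origin, which is just one of several possible partitioning keys.

```python
import hashlib
from itertools import cycle

# Hypothetical clone, service and shard names, used only for illustration.
CLONES = ["app-1", "app-2", "app-3"]
SERVICES = {"/orders": "order-service", "/payments": "payment-service"}
SHARDS = ["shard-a", "shard-b"]

_next_clone = cycle(CLONES)


def route_x_axis() -> str:
    """X-axis: every clone is identical, so simply rotate through them."""
    return next(_next_clone)


def route_y_axis(path: str) -> str:
    """Y-axis: route by function, one service per bounded context."""
    for prefix, service in SERVICES.items():
        if path.startswith(prefix):
            return service
    raise LookupError(f"no service owns {path}")


def route_z_axis(customer_id: str) -> str:
    """Z-axis: the same key always lands on the same data shard."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]
```

In practice the three axes are combined: a request is first routed to a service (Y), then to the shard that owns its data (Z), and finally to one of the identical instances serving that shard (X).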
There are four strategies that one can consider for decomposition:
Decompose by Business Capability
Decompose by sub-domain using DDD (Domain-Driven Design)
Decompose by verb or use case and define services that are responsible for actions
Decompose by nouns or resources by defining a service that is responsible for all operations on entities/resources of a given type.
For more information, see http://microservices.io/articles/scalecube.html
The key principles to be considered during decomposition are:
Single Responsibility
Cohesive Services
Loosely Coupled Services
Each Service is Independently Testable
Each service must be small enough to be managed by a “two-pizza” team
Team Restructuring and Culture Shift
In 1967, Melvin Conway described what is now known as Conway’s Law, which states:
“organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations”
The same hypothesis applies to software systems. Organizations often structure their teams along their existing communication lines, e.g., frontend developers, database engineers, business analysts, testers, etc. In most cases, only the business analysts speak and understand the domain-specific language, while others such as developers, testers and DBAs speak in their own technical terms. This leads to a constrained environment where most teams lack a basic understanding of the business challenge that they were indirectly told to solve. Since microservices emphasize building teams around business capabilities, team restructuring is inevitable. When implemented correctly, microservices mean you stop building teams based on technical specialty.
For example, a SQL developer may no longer be required for an “OTP” microservice if it calls for people who are experienced in NoSQL databases.
This is indeed transformative; however, the organization needs to invest heavily in managing a growing network of teams that include people of various competencies and backgrounds. This has its own learning curve and takes time, as processes and tools need to be built around delivery, collaboration, and communication channels.
Infrastructural and Operational Complexities
When you decompose the domain/function into granular services, you end up building more and more microservices to fulfill specific business cases. This leads to more moving parts resulting in operational and infrastructural overheads such as configuration management, provisioning, integration, deployment, monitoring, etc.
For example, imagine provisioning and monitoring ‘n’ instances of each of ‘m’ microservices, compared to ‘n’ instances of a monolithic application.
Containerization is one way to simplify some of the complexities involved, such as provisioning, configuration and deployment of microservices. One can use Docker in combination with Kubernetes to address these complexities; however, there is a learning curve associated with this. So, you end up either building automation capabilities in-house or buying readily available tools/frameworks to address some of these challenges.
Inter-Service Communication
When you switch from a monolith to microservices, in-process invocations become remote invocations, as microservices are distributed in nature. This inter-service communication is the most complex factor in a microservices architecture, and hence it is essential to understand the patterns that help to overcome the challenges involved.
Synchronous vs. Asynchronous calls
Synchronous - Preferably used for one-to-one communication between microservices. For example, the “Authorization” microservice must be invoked synchronously by all other microservices to authorize tokens. As HTTP follows a request/response model, REST over HTTP is a natural fit.
Asynchronous - Preferably used for both one-to-one and one-to-many communications between microservices.
In the case of one-to-one, it’s mostly a notification (or event). For example, the “ChangePassword” microservice sends an event to the “NotifyEmail” microservice to notify the user via email.
In the case of one-to-many, a pub/sub model is best. For example, the “Payment” microservice, on successful payment, publishes an event which is then read by subscribers such as the “Shipment” and “NotifyEmail” microservices for further processing.
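To make the one-to-many case concrete, here is a minimal in-memory sketch of the pub/sub flow described above. The EventBus class, the topic name and the two list-based "subscribers" are illustrative assumptions that stand in for a real message broker and real Shipment and NotifyEmail services.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Handler = Callable[[dict], None]


class EventBus:
    """In-memory stand-in for a message broker (illustrative assumption)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._pending: Deque[Tuple[str, dict]] = deque()

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        """Queue the event and return at once; the publisher does not wait."""
        self._pending.append((topic, event))

    def deliver_pending(self) -> int:
        """Hand each queued event to every subscriber of its topic."""
        deliveries = 0
        while self._pending:
            topic, event = self._pending.popleft()
            for handler in list(self._subscribers.get(topic, [])):
                handler(event)
                deliveries += 1
        return deliveries


# Stand-ins for the Shipment and NotifyEmail microservices.
shipments: List[str] = []
emails: List[str] = []

bus = EventBus()
bus.subscribe("payment.succeeded", lambda e: shipments.append(e["order_id"]))
bus.subscribe("payment.succeeded", lambda e: emails.append(e["order_id"]))

# The Payment service publishes once and moves on.
bus.publish("payment.succeeded", {"order_id": "order-1"})
# Both subscribers receive the same event when the broker delivers it.
delivered = bus.deliver_pending()
```

The publisher knows nothing about who consumes the event, so a new subscriber can be added without changing the Payment service.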
Communication Pattern (Service Mesh or side-car proxy)
One of the benefits of microservices being a distributed architecture is resiliency. However, you must be prepared to mitigate the communication failures that can occur between an ever-growing number of microservices that communicate with each other remotely. A communication failure could be due to a network failure or latency issue, or to a service that is down or throwing errors. To anticipate such failures and build a resilient application, it is essential to have an abstraction layer in the infrastructure that handles the underlying communication patterns, such as service registry, service discovery, and fault tolerance through circuit breakers, latency-aware load balancing, failover, timeouts and retries. This infrastructure-level abstraction layer is called a service mesh. It provides a uniform way to communicate and manage traffic flows between microservices, thereby eliminating the need to implement these communication patterns in each microservice.
Refer to this wonderful writeup on the evolution of network communications starting from computer networking to microservices networking.
Several frameworks and platforms address this space, such as Linkerd, Spring Cloud, Istio, Netflix’s Prana, and a few others.
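To give a feel for one of the fault-tolerance patterns a service mesh takes off your hands, here is a minimal circuit breaker sketch in Python. It is not how any of the products above implement it; the class name, thresholds and the injectable clock are assumptions made for illustration.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying the remote service."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Open: fail fast instead of waiting on a broken dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed (half-open): let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.max_failures:
                # Trip the breaker, or re-open it after a failed trial call.
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each remote invocation, for example breaker.call(fetch_order, order_id), where fetch_order is whatever function performs the remote request. With a service mesh, equivalent logic lives in the proxy layer, so individual services do not need to carry it.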
Service Discovery - Client Side vs. Server Side
In the complex topology of microservices, most services are fine-grained in nature (for example, NotifyEmail, Authentication, GetOrder, etc.). However, in exceptional cases there could be a few coarse-grained microservices as well (e.g., PlaceOrder), which in turn invoke fine-grained microservices in a chained or composed manner. This type of invocation by various clients through different means (HTTP/RPC/event-driven) brings complexity in terms of service configuration, lookup and invocation. Invocation gets even more complicated when microservices run in a containerized environment, where the number of instances of a service and their locations change dynamically. Hence it is essential to segregate internal and external calls and to have a layer between client applications and microservices that frees the clients from those complexities.
A service mesh is more suitable for inter-service communication between microservices, as explained above; however, it may not be ideal for services that are consumed by clients directly, because the client accesses APIs through the standard HTTP/REST protocol, whereas service mesh calls happen through a transport layer beneath the HTTP protocol. From a feasibility point of view, having an API gateway between clients and the underlying microservices helps to keep the communication consistent, and also addresses common API functionality such as security, metering, logging and auditing. This API gateway should be complemented by a service mesh as the inter-service communication channel for the underlying microservices.
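The lookup problem described above, where instances come and go and their addresses change, is usually solved with a service registry. Here is a minimal sketch of one with client-side round-robin resolution; the class, its method names and the addresses are illustrative assumptions, not the API of any particular registry product.

```python
import itertools
from typing import Dict, Iterator, List


class ServiceRegistry:
    """Tracks live instances per service name (illustrative, in-memory)."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[str]] = {}
        self._counters: Dict[str, Iterator[int]] = {}

    def register(self, service: str, address: str) -> None:
        """Called by an instance when it starts up."""
        instances = self._instances.setdefault(service, [])
        if address not in instances:
            instances.append(address)

    def deregister(self, service: str, address: str) -> None:
        """Called on shutdown, or by a health checker when an instance dies."""
        instances = self._instances.get(service, [])
        if address in instances:
            instances.remove(address)

    def resolve(self, service: str) -> str:
        """Pick the next live instance in round-robin order."""
        instances = self._instances.get(service)
        if not instances:
            raise LookupError(f"no live instances of {service}")
        turn = next(self._counters.setdefault(service, itertools.count()))
        return instances[turn % len(instances)]
```

With client-side discovery, each caller queries the registry and picks an instance itself. With server-side discovery, a router or gateway performs the same resolve step on the caller's behalf, which is the role the API gateway plays for external clients.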
Many Points of Failure
In a distributed architecture such as microservices, there is no single point of failure. This is advantageous, as microservices provide better fault tolerance than monolithic applications. However, the same characteristic can be harmful, as the point of failure can be anywhere in the distributed system. This is why you need to “plan for failure” by designing your microservices-based system around fault-tolerance techniques such as health checks, self-healing, retries, load balancing, and failover caching. As stated earlier, a service mesh can be handy here, as it provides many of these techniques.
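Two of the techniques listed above, retries and failover caching, can be sketched in a few lines of Python. The function and class names, the retry count and the delays are assumptions chosen for illustration; real systems usually add jitter and retry only on errors known to be transient.

```python
import time
from typing import Any, Callable, Dict


def call_with_retry(func: Callable[[], Any], attempts: int = 3,
                    base_delay: float = 0.1,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Retry a failing call with exponential backoff between attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


class FailoverCache:
    """Serve the last known good value when the live call keeps failing."""

    def __init__(self) -> None:
        self._last_good: Dict[str, Any] = {}

    def fetch(self, key: str, loader: Callable[[], Any], **retry_options: Any) -> Any:
        try:
            value = call_with_retry(loader, **retry_options)
        except Exception:
            if key in self._last_good:
                return self._last_good[key]  # stale, but better than an error
            raise
        self._last_good[key] = value
        return value
```

The trade-off with a failover cache is that callers may briefly see stale data, so it suits reads such as catalog or pricing lookups better than operations that must reflect the latest state.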
Lack of Transaction Management Capability
We apply transaction management to keep data in a consistent state. In a monolith, this is not a challenge, as it involves in-process invocations between components and layers; thus each invocation is context-aware. However, in a microservices architecture, a shared database is an anti-pattern, which means each microservice has its own private database. As microservices are stateless and distributed, fulfilling a single request results in multiple remote invocations between many microservices, which constrains your ability to handle transactions across the entities participating in a call. This complexity grows when a system involves both SQL and NoSQL data stores.
One way to address this challenge is to apply the Eventual Consistency pattern through an event-based programming model, triggering events to the relevant microservices when state has changed. You may consider an eventual consistency platform such as http://eventuate.io/ that provides an abstraction over the underlying implementation, or you may consider building your own. As Jonas Bonér of Lightbend says, “No one wants Eventual Consistency but it is a necessary evil”, and hence a thoughtful approach is necessary when decomposing the modules and architecting the microservices end-to-end.
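Here is a minimal sketch of what event-based eventual consistency looks like between two services that each own a private data store. The service classes, the OrderPlaced event and the in-memory queue are assumptions for illustration and are not the Eventuate API.

```python
from collections import deque
from typing import Deque, Dict, Set


class OrderService:
    def __init__(self, outbox: Deque[dict]) -> None:
        self.orders: Dict[str, dict] = {}  # private store, not shared
        self._outbox = outbox

    def place_order(self, order_id: str, sku: str, qty: int) -> None:
        # Commit locally, then announce the state change as an event.
        self.orders[order_id] = {"sku": sku, "qty": qty}
        self._outbox.append(
            {"type": "OrderPlaced", "order_id": order_id, "sku": sku, "qty": qty}
        )


class InventoryService:
    def __init__(self, stock: Dict[str, int]) -> None:
        self.stock = dict(stock)  # private store, not shared
        self._seen: Set[str] = set()

    def handle(self, event: dict) -> None:
        # Idempotent handler: a redelivered event must not be applied twice.
        if event["type"] != "OrderPlaced" or event["order_id"] in self._seen:
            return
        self._seen.add(event["order_id"])
        self.stock[event["sku"]] -= event["qty"]


def deliver_all(queue: Deque[dict], consumer: InventoryService) -> None:
    """Stand-in for a broker delivering queued events some time later."""
    while queue:
        consumer.handle(queue.popleft())
```

Between place_order and deliver_all, the two stores disagree: the order exists but the stock has not yet been reduced. That window is the "eventual" part, and the design has to tolerate it, along with duplicate deliveries, which is why the handler tracks the order IDs it has already processed.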
Security
In the microservices world, some of the communication between microservices can be on the same machine, some between different machines and some between different data centers. With such complexity in communication, it is essential to govern and intercept each call before authorizing access to a protected resource, as the origin of a request is uncertain: it may come from an external client or from another microservice. And as the number of microservices increases, the number of endpoints explodes, leading to complex distributed systems that require various security processes and patterns applied at various levels:
API Gateway pattern that provides abstraction, security, auditing and monitoring of the calls from external clients to the underlying microservices
Authorized access to protected resources at an application level. Consider using a delegated authorization scheme such as OAuth 2.0, with tokens in a format such as JWT
Tried and tested encryption algorithms to encrypt data in transit and at rest
Securing infrastructure-level components such as containers, the SSL/TLS communication layer, firewalls, etc.
DevSecOps approach that helps in integrating security processes, principles, best practices and tools early in the development lifecycle, thereby encouraging collaboration among security experts, business analysts, architects, and development and operations teams, thus making everyone accountable and responsible for building secure systems.
Continuous Security testing – Automating security tests for both application and infrastructure layers and integrating with CI/CD pipeline provides a platform for continuous testing of security elements of the distributed system that you are building.
Static code analysis that checks for code vulnerabilities
Security testing of functional and non-functional aspects including application and infrastructural components. One can consider using frameworks such as BDD-Security that help in automating security tests using a standardized BDD approach through natural language.
Rapid Pace of Release
How do you approach release management when cross-functional teams keep delivering multiple versions of microservices day in and day out? A Continuous Delivery process is going to play a key role, which in turn pushes for Continuous Testing through automated tests, complemented by a DevOps/DevSecOps strategy. This calls for tight collaboration between multiple stakeholders such as the release management board, business, operations, architects, security experts and development teams.
Handling Logging, Tracing, Auditing And Monitoring Capabilities
Logs and other business metrics generated by each microservice are going to be key in understanding the application’s behavior, and will further help with troubleshooting and debugging. Hence it is essential to centralize logging and metrics generation so that they can be cross-correlated across microservices for ease of tracing. One can use the ELK stack or Loggly to centralize logging, and use the Event Sourcing technique to collect business metrics from various microservices.
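Cross-correlating logs across microservices usually relies on a correlation ID that travels with each request and is stamped on every log line. Here is a minimal sketch using Python's standard logging module; the logger names, the log format and the start_request helper are assumptions for illustration.

```python
import contextvars
import io
import logging
import uuid
from typing import Optional

# Holds the ID of the request currently being processed.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Stamp every log record with the current correlation ID."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


def build_logger(service_name: str, stream) -> logging.Logger:
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(CorrelationFilter())
    handler.setFormatter(
        logging.Formatter("%(name)s correlation_id=%(correlation_id)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


def start_request(incoming_id: Optional[str] = None) -> str:
    """Reuse the caller's ID if one was sent, otherwise start a new trace."""
    request_id = incoming_id or uuid.uuid4().hex
    correlation_id.set(request_id)
    return request_id


# Two hypothetical services writing to the same (centralized) sink.
sink = io.StringIO()
order_log = build_logger("order-service", sink)
payment_log = build_logger("payment-service", sink)

start_request("req-42")
order_log.info("order received")
payment_log.info("payment charged")
```

In a real deployment the ID would be passed between services in a request header or message attribute, and the sink would be a log shipper feeding a central store, so that searching for one ID returns the full path of a request.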
Deployment Strategy
In a monolith, the usual strategy is to deploy one application per host/machine, which is then scaled horizontally when needed. In microservices, various patterns can be applied, such as:
Multiple service instances per Host
This approach provides benefits such as efficient resource utilization and faster deployment due to less configuration. However, the lack of isolation between service instances is a drawback. As resources are shared among multiple instances, one service instance can consume all the memory and thereby impact the other instances.
Service instance per Virtual Machine
Each service instance runs in its own isolated VM, with a fixed amount of CPU and memory allocated to it. However, resource utilization is less efficient. If the instances are hosted in the cloud, they can leverage cloud infrastructure capabilities such as auto-scaling and load balancing.
Service instance per Container
This approach has many benefits such as resource isolation, monitoring of resource utilization, encapsulation of technology, ease of packaging and faster startup time. But there could be some overheads such as administering containers, security, etc. This pattern is the most preferred approach, given microservices’ demand for self-contained, lightweight, independent and auto-scaling capabilities. However, the same capabilities can also be provisioned through VMs by leveraging cloud infrastructure.
Besides these patterns, one should consider other aspects such as blue-green deployments for safer releases, rollback if there are issues, canary releases to support multiple versions of services, A/B testing, etc.
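A canary release boils down to a routing decision: send a small, stable slice of users to the new version and everyone else to the current one. Here is a minimal sketch of that decision; the function name, the version labels and the choice of hashing the user ID are assumptions for illustration, and in practice this logic usually lives in a load balancer, gateway or service mesh rather than in application code.

```python
import hashlib


def pick_version(user_id: str, canary_percent: int,
                 stable: str = "v1", canary: str = "v2") -> str:
    """Route a fixed slice of users to the canary version."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # Hashing keeps each user in the same bucket (0-99) on every request.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return canary if bucket < canary_percent else stable
```

Raising canary_percent step by step widens the rollout without moving users who are already on the new version back to the old one, and setting it back to 0 acts as an immediate rollback. A blue-green switch is the special case of jumping straight from 0 to 100.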
Testing Strategy
Microservices emphasize teams that are organized around business capabilities and responsible for end-to-end aspects. That includes the functional and sanity checks of their respective microservices. But integration testing remains a challenge, as there is still a need to test end-to-end aspects of applications that involve many microservices. The complexity grows when each team starts delivering multiple iterations of its microservices. Thus, there is a need to adopt various testing methodologies and continuous testing capabilities, through automation and close collaboration between teams using a standard Agile process.
For example:
100% unit test coverage
Automated acceptance tests using a standardized approach such as BDD for functional coverage
Pact-based consumer-driven contract testing to check the integrity of contracts between microservices
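The idea behind consumer contract testing can be sketched without any framework: the consumer declares which fields it relies on, and the provider's build fails if a response stops satisfying that declaration. The code below is a simplified illustration of the concept and is not the Pact API; the contract format and the function name are assumptions.

```python
from typing import Any, Dict, List

# The consumer's expectations: field name -> required type.
Contract = Dict[str, type]


def contract_violations(response: Dict[str, Any], contract: Contract) -> List[str]:
    """List every way a provider response breaks the consumer's contract."""
    problems: List[str] = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(response[field]).__name__}"
            )
    return problems


# What a hypothetical Shipment service needs from a GetOrder response.
shipment_contract: Contract = {"order_id": str, "address": str, "items": list}

good_response = {"order_id": "o-1", "address": "1 Main St", "items": ["sku-1"], "total": 9.5}
bad_response = {"order_id": 1, "items": ["sku-1"]}
```

Extra fields such as total are deliberately ignored, since a consumer should only constrain what it actually uses. That leaves the provider free to evolve its response, as long as no consumer's declared expectations break.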