CHAPTER 6 Processing Events with Stateful Functions
Designing Event-Driven Systems: Concepts & Patterns for Streaming Services with Apache Kafka
The PDF version is available here: https://assets.confluent.io/m/7a91acf...
Imperative styles of programming are some of the oldest of all, and their popularity persists for good reason.
Procedures execute sequentially, spelling out a story on the page and altering the program’s state as they do so.
As mainstream applications became distributed in the 1980s and 1990s, the same mindset was applied to this distributed domain.
Approaches like CORBA and EJB (Enterprise JavaBeans) raised the level of abstraction, making distributed programming more accessible.
History has not always judged these so well.
EJB, while touted as a panacea of its time, fell quickly by the wayside as systems creaked with the pains of tight coupling and the misguided notion that the network was something that should be abstracted away from the programmer.
In fairness, things have improved since then, with popular technologies like gRPC and Finagle adding elements of asynchronicity to the request-driven style.
But the application of this mindset to the design of distributed systems isn’t necessarily the most productive or resilient route to take.
Two styles of programming that better suit distributed design, particularly in a services context, are the dataflow and functional styles.
You will have come across dataflow programming if you’ve used utilities like sed or languages like awk.
These are used primarily for text processing; for example, a stream of lines might be pushed through a regex, one line at a time, with the output piped to the next command, chaining through stdin and stdout.
This style of program is more like an assembly line, with each worker doing a specific task, as the products make their way along a conveyor belt.
Since each worker is concerned only with the availability of data inputs, there is no “hidden state” to track.
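To make the assembly-line analogy concrete, here is a minimal Python sketch of a dataflow pipeline in the spirit of chaining grep and sed through stdin and stdout. The stage names (read_lines, grep, substitute, pipeline) are hypothetical. Each stage is a generator that reacts only to the next item arriving and keeps nothing between items.

```python
from __future__ import annotations

import re
from typing import Iterable, Iterator

# Each stage consumes an input stream one item at a time and yields results
# to the next stage, like a command reading stdin and writing stdout.
# No stage keeps state between items.

def read_lines(text: str) -> Iterator[str]:
    """Source stage: emit the input one line at a time."""
    yield from text.splitlines()

def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Keep only lines matching the regex (like `grep`)."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line

def substitute(pattern: str, repl: str, lines: Iterable[str]) -> Iterator[str]:
    """Rewrite each line with a regex substitution (like `sed 's/.../.../'`)."""
    regex = re.compile(pattern)
    for line in lines:
        yield regex.sub(repl, line)

def pipeline(text: str) -> list[str]:
    # Equivalent in spirit to: cat log | grep ERROR | sed 's/ERROR/error/'
    return list(substitute(r"ERROR", "error", grep(r"ERROR", read_lines(text))))

if __name__ == "__main__":
    log = "INFO start\nERROR disk full\nINFO retry\nERROR timeout"
    for line in pipeline(log):
        print(line)
```

Because every stage is lazy, a line flows through the whole chain before the next one is read. This is the "conveyor belt" behaviour described above.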
This is very similar to the way streaming systems work.
Events accumulate in a stream processor waiting for a condition to be met, say, a join operation.
Making Services Stateful
There is a well-held mantra that statelessness is good, and for good reason.
Stateless services start instantly (no data load required) and can be scaled out linearly, cookie-cutter-style.
Web servers are a good example: to increase their capacity for generating dynamic content, we can scale a web tier horizontally, simply by adding new servers.
So why would we want anything else?
The rub is that most applications aren’t really stateless.
A web server needs to know what pages to render, what sessions are active, and more.
It solves these problems by keeping the state in a database.
So the database is stateful and the web server is stateless.
The state problem has just been pushed down a layer.
But as traffic to the website increases, programmers are usually led to cache state locally; local caching leads to cache invalidation strategies, and a spiral of coherence issues typically ensues.
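Here is a minimal Python sketch of that spiral. It assumes an in-memory Database class standing in for the shared stateful layer and a hypothetical WebServer class with a naive read-through cache. Once one server has cached a value, an update written to the database stays invisible to that server until something invalidates the entry.

```python
from __future__ import annotations

# Hypothetical sketch: a "stateless" web tier backed by a shared database,
# with a local cache added for speed. All names are illustrative only.

class Database:
    """Stand-in for the shared, stateful database layer."""
    def __init__(self) -> None:
        self.rows: dict[str, str] = {}

    def get(self, key: str) -> str | None:
        return self.rows.get(key)

    def put(self, key: str, value: str) -> None:
        self.rows[key] = value

class WebServer:
    """Web server that caches database reads locally (read-through cache)."""
    def __init__(self, db: Database) -> None:
        self.db = db
        self.cache: dict[str, str | None] = {}

    def render(self, key: str) -> str | None:
        # Serve from the local cache if present; otherwise read from the
        # database and remember the result. Nothing notifies this server
        # when the database changes.
        if key not in self.cache:
            self.cache[key] = self.db.get(key)
        return self.cache[key]

db = Database()
db.put("page:home", "v1")
server_a, server_b = WebServer(db), WebServer(db)

server_a.render("page:home")          # server A caches "v1"
db.put("page:home", "v2")             # another writer updates the database
print(server_a.render("page:home"))   # still "v1": stale until invalidated
print(server_b.render("page:home"))   # "v2": the two servers now disagree
```

Fixing this requires an invalidation or expiry strategy for every cache. That is where the coherence problems begin.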
Streaming platforms approach this problem of where state should live in a slightly different way.
First, recall that events are also facts, converging toward the stream processor like conveyor belts on an assembly line.
So, for many use cases, the events that trigger a process into action contain all the data the program needs, much like the dataflow programs just discussed.
If you’re validating the contents of an order, all you need is its event stream.
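Here is a sketch of this stateless case. It assumes a hypothetical OrderCreated event that carries every field validation needs, so the check can be a pure function of the event.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass(frozen=True)
class OrderCreated:
    # Hypothetical event shape: everything validation needs travels in the event.
    order_id: str
    customer_id: str
    quantity: int
    unit_price: float

def validate(event: OrderCreated) -> list[str]:
    """Stateless check: the result depends only on the event itself."""
    errors = []
    if event.quantity <= 0:
        errors.append("quantity must be positive")
    if event.unit_price < 0:
        errors.append("unit_price must not be negative")
    if not event.customer_id:
        errors.append("customer_id is required")
    return errors

stream = [
    OrderCreated("o1", "c1", 2, 9.99),
    OrderCreated("o2", "", 0, 5.00),
]
for e in stream:
    print(e.order_id, validate(e) or "OK")
```

Because validate reads nothing but its argument, any number of instances could process the stream in parallel without sharing or loading any state.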
Sometimes this style of stateless processing happens naturally; other times implementers deliberately enrich events in advance, to ensure they have all the data they need for the job at hand.
But enrichments inevitably mean looking things up, usually in a database.
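The enrichment case can be sketched in the same way. Here RemoteCustomerDb is a hypothetical stand-in for a database reached over the network. A counter makes the cost visible: every event triggers one lookup.

```python
from __future__ import annotations

from typing import Iterable, Iterator

# Hypothetical sketch: enrich each order with customer data by looking it
# up in a remote database. RemoteCustomerDb simulates the network service.

class RemoteCustomerDb:
    def __init__(self, customers: dict[str, str]) -> None:
        self._customers = customers
        self.calls = 0  # counts round trips to the "remote" database

    def lookup_email(self, customer_id: str) -> str | None:
        self.calls += 1  # in a real system, each call is a network hop
        return self._customers.get(customer_id)

def enrich(orders: Iterable[dict], db: RemoteCustomerDb) -> Iterator[dict]:
    """Attach the customer's email to each order event, one lookup per event."""
    for order in orders:
        yield {**order, "email": db.lookup_email(order["customer_id"])}

db = RemoteCustomerDb({"c1": "alice@example.com", "c2": "bob@example.com"})
orders = [
    {"order_id": "o1", "customer_id": "c1"},
    {"order_id": "o2", "customer_id": "c2"},
    {"order_id": "o3", "customer_id": "c1"},
]
enriched = list(enrich(orders, db))
print(enriched[2]["email"], db.calls)  # alice@example.com 3
```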
Stateful stream processing engines, like Kafka’s Streams API, go a step further: they ensure all the data a computation needs is loaded into the API ahead of time, be it events or any tables needed to do lookups or enrichments.
In many cases this makes the API, and hence the application, stateful, and if it were restarted for some reason, it would need to reacquire that state before it could proceed.
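Here is a simplified simulation of the stateful style. It assumes customer data arrives as a changelog of (customer_id, email) events, where None means deletion. The hypothetical StatefulEnricher replays that changelog into a local dictionary before it processes any orders. This is plain Python, not Kafka's Streams API; it only mirrors the idea described above.

```python
from __future__ import annotations

# Hypothetical sketch: local state is rebuilt from a changelog of customer
# events, kept current as new events arrive, and used for local lookups.

class StatefulEnricher:
    def __init__(self) -> None:
        self.customers: dict[str, str] = {}  # local state: customer_id -> email
        self.ready = False

    def restore(self, changelog: list[tuple[str, str | None]]) -> None:
        """Rebuild local state by replaying the changelog from the start."""
        self.customers.clear()
        for customer_id, email in changelog:
            self.apply(customer_id, email)
        self.ready = True

    def apply(self, customer_id: str, email: str | None) -> None:
        """Apply one customer change event; None is treated as a deletion."""
        if email is None:
            self.customers.pop(customer_id, None)
        else:
            self.customers[customer_id] = email

    def enrich(self, order: dict) -> dict:
        if not self.ready:
            raise RuntimeError("state not restored; cannot process yet")
        # Local lookup: no database round trip per event.
        return {**order, "email": self.customers.get(order["customer_id"])}

changelog = [("c1", "alice@old.example"), ("c2", "bob@example.com"),
             ("c1", "alice@example.com")]  # the later event supersedes the earlier one

service = StatefulEnricher()
service.restore(changelog)
print(service.enrich({"order_id": "o1", "customer_id": "c1"})["email"])

# A new customer change arrives on the stream and is applied in place.
service.apply("c2", "robert@example.com")
changelog.append(("c2", "robert@example.com"))

# After a restart, the service must replay the changelog before proceeding.
restarted = StatefulEnricher()
restarted.restore(changelog)
print(restarted.enrich({"order_id": "o2", "customer_id": "c2"})["email"])
```

Customer changes arrive as events and are applied to the local table directly. As a result, this local state behaves like a cache that the stream keeps current, with no separate invalidation step. The cost appears at restart: restarted cannot serve a lookup until it has replayed the changelog.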
This should seem a bit counterintuitive.
Why would you want to make a service stateful?
Another way to look at this is as an advanced form of caching that better suits data-intensive workloads.
To make this clearer, let’s look at three examples: one that uses database lookups, one that is event-driven but stateless, and one that is event-driven but stateful.
The Event-Driven Approach
Say we have an email service that listens to an event stre