Message brokers are powerful components in the architecture of distributed applications, which must absorb unknown or unforeseen volumes of workload and data movement.
When a message broker or messaging queue runs in production, it is often the crucial component that has to stay alive for the rest of the app or platform to be fully functional. Therefore, there are plenty of things to put in place around a typical message broker before reaching production.
The Top 8.
- Centralized monitoring, from clients to broker, all the way to the infrastructure. Many different events can break infrastructure and clients, and it is often hard to get to the root cause. A unified tool that correlates those events can significantly reduce the time to fix.
- Auto-scaling. Unpredictable scale and workloads are commonplace in data streaming. You should be able to handle them using code and predefined policies.
- Self-healing. In streaming, there are multiple crash scenarios, such as network lag, unbalanced brokers, consumer crashes, and schema transformation errors. Having an auto-scaling policy, real-time notifications, and a self-healing mechanism installed and configured can save you a lot of effort and time in keeping your broker alive during peak events and unexpected workloads.
- A dead-letter queue, for auditing and root-cause fixes. Depending on the use case, having a "recycle bin" with a "restore" option is a much easier way to debug an async issue, as debugging data usually requires multiple cycles of re-sending the "bad" event to the client.
- Retransmit mechanism. Consumers have three states: listening, processing, and acknowledging. If a consumer crashes during the processing stage, before acknowledging the message, both the consumer and the broker should know to retransmit only the "unfinished" message rather than send the next one.
- Storing offsets. To make retransmission possible, it is the client's responsibility to preserve the offsets (or message IDs) it has already read and acked, so that it knows where the needle is and what needs to be retransmitted or re-requested.
- Enforce schemas. Messages are pushed from everywhere. The ability to enforce a single standard across the different producers is a must for healthy scaling and proper governance.
- Real-time notifications. Make sure you are aware of every bit and component. In the best case, if something goes wrong on your streaming layer, you fail to submit a BI report. In the worst case, you fail to process payment transactions, and the quicker you respond, the lower the impact will be.
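To make the retransmit, offset, and dead-letter items more concrete, here is a minimal in-memory sketch in Python. It is not the API of any real broker: `ToyBroker`, `Consumer`, `next_offset`, `max_attempts`, and `restore` are all hypothetical names chosen for illustration, and a real client would persist its offset somewhere durable so that it survives a crash.

```python
class ToyBroker:
    """In-memory stand-in for a broker (hypothetical, for illustration only)."""

    def __init__(self):
        self.log = []           # append-only list; the index is the offset
        self.dead_letters = []  # (offset, message, error text) tuples

    def publish(self, message):
        self.log.append(message)

    def dead_letter(self, offset, message, error):
        # The "recycle bin": keep the bad message and why it failed.
        self.dead_letters.append((offset, message, str(error)))

    def restore(self, index):
        """Move a dead-lettered message back onto the log for another try."""
        _, message, _ = self.dead_letters.pop(index)
        self.publish(message)


class Consumer:
    def __init__(self, broker, handler, max_attempts=3):
        self.broker = broker
        self.handler = handler
        self.max_attempts = max_attempts  # assumed retry budget per message
        self.next_offset = 0  # client-side record of what has been acked
        self.attempts = 0     # failed deliveries of the current offset

    def run(self):
        processed = []
        while self.next_offset < len(self.broker.log):    # listening
            offset = self.next_offset
            message = self.broker.log[offset]
            try:
                processed.append(self.handler(message))   # processing
            except Exception as exc:
                self.attempts += 1
                if self.attempts < self.max_attempts:
                    continue  # no ack: the same offset is delivered again
                # Retry budget exhausted: park the message and move on.
                self.broker.dead_letter(offset, message, exc)
            self.next_offset = offset + 1                 # acknowledging
            self.attempts = 0
        return processed
```

With a handler that rejects negative amounts and three published messages (`10`, `-5`, `20`), `run()` processes the first and third, retries the second up to `max_attempts` times without ever skipping ahead, and then moves it to `dead_letters`, from where `restore(0)` can put it back on the log once the cause is fixed.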
In case you prefer not to handle the above 8 yourself, Memphis.dev offers most of them out of the box.