Over the last six weeks we have experienced several outages, each lasting around an hour on average.
Our engineering team quickly traced the root cause to a third-party infrastructure provider that supplies the message queue our internal services use to communicate with each other. This queue is essential to the platform: it lets us absorb spikes in demand while still serving API requests quickly and at scale. It is a core part of modern software architecture and a vital part of our infrastructure here at Fundstack.
As part of our outage response procedures, we conduct extensive post-mortem investigations and work to remediate the issue. When an issue results from a failure by a service provider, our first action is always to talk to that provider and attempt to work with them to improve the service.
Unfortunately, we were unable to find a constructive path to improvement with our existing provider, so we have begun working with a new provider that can offer us a stronger SLA and SLO.
That transition was completed yesterday with no disruption to service.
We hope this will greatly improve the reliability of our product. Going forward, you can monitor the platform status at status.fundstack.com.