Previous incidents
Juniper Core Switch Issue | Mitigation in progress
Resolved Jun 09 at 05:45pm MSK
The software update rectified the situation, and the load on the software side has now been reduced to a minimum.
However, we are still considering replacing the Juniper stack with another one in the near future.
Partial Outage - RX-Line Cluster Node (Ryzen 9 9950X)
Resolved Jun 06 at 12:57am MSK
After a more thorough inspection, our engineer identified a faulty motherboard. It has been replaced, and the server is back online; no recurrence is expected.
The faulty motherboard will be sent to the vendor for further analysis to help prevent similar incidents in the future.
High w_await on RX nodes
Resolved May 19 at 12:05am MSK
Our monitoring systems recorded a sharp increase in the w_await metric on some servers in the RX cluster. This metric reflects the response time of the NVMe drives for write operations.
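For readers who want to check this metric on their own Linux machine, here is a minimal sketch of how a w_await-style value can be derived from two samples of the kernel's per-device write counters (the "writes completed" and "milliseconds spent writing" fields of /proc/diskstats). The function names and the sample line are illustrative assumptions, not part of our monitoring stack.

```python
def parse_diskstats_line(line):
    """Return (device, writes_completed, ms_spent_writing) from one /proc/diskstats line."""
    fields = line.split()
    # fields[2] = device name, fields[7] = writes completed, fields[10] = ms spent writing
    return fields[2], int(fields[7]), int(fields[10])


def w_await_ms(prev, curr):
    """Average time per completed write, in ms, between two counter samples.

    Each sample is a (writes_completed, ms_spent_writing) pair.
    """
    delta_writes = curr[0] - prev[0]
    delta_ms = curr[1] - prev[1]
    # No writes completed in the interval: report 0 rather than divide by zero
    return delta_ms / delta_writes if delta_writes > 0 else 0.0


# Hypothetical sample line, for illustration only
device, writes, write_ms = parse_diskstats_line(
    "259 0 nvme0n1 1000 0 8000 300 2000 0 16000 4000 0 4300 4300"
)
```

A sustained rise in this value means that writes are, on average, taking longer to complete, whether they are waiting in the queue or being serviced by the drive.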
Because we use RAID1 on our servers to make data storage more reliable, the write speed of an array is limited by its slowest disk. If even one disk in the array starts misbehaving, the entire array is affected.
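The "slowest disk" effect can be illustrated with a deliberately simplified model: if a mirrored write is acknowledged only once every member of the array has stored the data, the latency seen by the application is the maximum of the members' latencies. The function name and the latency figures below are hypothetical and chosen only for illustration. Real RAID implementations add their own overhead and tuning options.

```python
def raid1_write_latency_ms(member_latencies_ms):
    """Latency of one mirrored write under a simplified model.

    Assumes the write completes only after every member disk has written it,
    so the slowest member determines the result.
    """
    return max(member_latencies_ms)


# Hypothetical figures: two healthy drives vs. one healthy and one degraded drive
healthy = raid1_write_latency_ms([0.05, 0.06])
degraded = raid1_write_latency_ms([0.05, 12.0])
```

In this model a single degraded drive raises the write latency of the whole array to its own level, even though the other drive is still fast.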
During an internal investigation, we found that all the problematic drives...
CloudFlare Network Unavailability
Resolved May 03 at 07:03pm MSK
Because we did not receive a clear explanation from CloudFlare of what caused this incident, we decided to replace the affected IP addresses with others. The replacement was successful, the old subnet has been decommissioned, and customers no longer have problems working with the CloudFlare network.