Wide Area Networks (WANs), the global backbones and workhorses of today’s internet that connect billions of computers across continents and oceans, are the foundation of modern online services. Because the COVID-19 pandemic has made society far more reliant on those services, today’s networks are struggling to meet the high bandwidth and availability demands imposed by new workloads.
Scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and from Facebook recently devised a method for keeping the network running, and lowering costs, when a fibre goes down. Their technology, ARROW, reroutes the optical light that a broken fibre was carrying onto a healthy fibre, while an online algorithm proactively plans for potential future fibre cuts based on real-time internet traffic demands.
ARROW is based on two distinct approaches: “failure-aware traffic engineering,” which directs traffic to where the bandwidth resources are after fibre breaks, and “wavelength reconfiguration,” which recovers failed bandwidth resources by rearranging the light.
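To make the payoff of combining the two approaches concrete, here is a toy calculation, an illustration only and not ARROW’s formulation: it computes how much bandwidth can be guaranteed between two sites across a set of single fibre-cut scenarios, first treating a cut as a total loss and then assuming that a hypothetical fraction of the cut link’s capacity is recovered by wavelength reconfiguration. Per-scenario maximum flow (Edmonds-Karp) stands in for the optimization a production traffic-engineering system would solve; the topology, the capacities, the function names and the `restored_fraction` parameter are all assumptions made for illustration.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    # Residual graph: forward capacities plus zero-capacity reverse edges.
    residual = {}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {}).setdefault(v, 0)
            residual[u][v] += c
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Walk back from the sink, then push the bottleneck amount along the path.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck


def surviving_capacity(links, cut, restored_fraction):
    """Capacity graph after the links in `cut` fail.

    restored_fraction is a hypothetical share of each cut link's capacity
    recovered by moving its wavelengths onto healthy fibre (0.0 = none).
    """
    graph = {}
    for (u, v), cap in links.items():
        graph.setdefault(u, {})[v] = cap * restored_fraction if (u, v) in cut else cap
    return graph


def guaranteed_bandwidth(links, scenarios, src, dst, restored_fraction=0.0):
    """Bandwidth deliverable from src to dst in the worst failure scenario."""
    return min(
        max_flow(surviving_capacity(links, cut, restored_fraction), src, dst)
        for cut in scenarios
    )


# Hypothetical four-node topology: two disjoint 100-unit directed paths from A to D.
LINKS = {("A", "B"): 100, ("B", "D"): 100, ("A", "C"): 100, ("C", "D"): 100}
SCENARIOS = [{("A", "B")}, {("C", "D")}]  # single fibre cuts to plan for

print(guaranteed_bandwidth(LINKS, SCENARIOS, "A", "D"))                         # 100
print(guaranteed_bandwidth(LINKS, SCENARIOS, "A", "D", restored_fraction=0.5))  # 150.0
```

In this example either cut halves the 200 units of capacity between A and D, so only 100 units can be promised; if half of a cut link’s capacity can be shifted onto healthy fibre, the guaranteed figure rises to 150. A real traffic-engineering system must also decide in advance how traffic is split across tunnels, which this sketch ignores.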
Powerful as this combination is, the joint problem is NP-hard, so finding exact solutions is computationally difficult.
To make the problem tractable, the researchers developed a method that generates “lottery tickets,” an abstraction of the wavelength reconfiguration problem on optical fibres, and feeds only the relevant information into the traffic engineering problem. This works in conjunction with their optical restoration technology, which transfers light from the severed fibre to surrogate healthy fibres to re-establish network connectivity.
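A minimal sketch of the lottery-ticket idea, under heavily simplified assumptions: a single fibre cut drops several wavelengths belonging to two IP links, healthy surrogate fibres have a few spare wavelength slots, and each ticket is one randomly sampled restoration plan, summarized only as restored capacity per IP link. The traffic-engineering side sees nothing but these summaries and picks the ticket that serves the most demand. Random greedy sampling stands in for the paper’s own ticket-generation procedure, which is not reproduced here; the functions `generate_tickets`, `served` and `pick_ticket`, the link and fibre labels, and the demands are all hypothetical.

```python
import random


def generate_tickets(failed_waves, spare_slots, n_tickets, seed=0):
    """Sample candidate restoration plans ("tickets") for one fibre cut.

    failed_waves: list of (ip_link, gbps) wavelengths lost on the cut fibre.
    spare_slots:  dict surrogate_fibre -> number of free wavelength slots.
    Returns a list of dicts mapping ip_link -> restored Gbps.
    """
    rng = random.Random(seed)
    tickets = []
    for _ in range(n_tickets):
        slots = dict(spare_slots)   # fresh copy of free slots for this ticket
        order = list(failed_waves)
        rng.shuffle(order)          # random priority among the lost wavelengths
        restored = {}
        for ip_link, gbps in order:
            free = [f for f, n in slots.items() if n > 0]
            if not free:            # no spare slot left on any surrogate fibre
                break
            slots[rng.choice(free)] -= 1  # place this wavelength on a healthy fibre
            restored[ip_link] = restored.get(ip_link, 0) + gbps
        tickets.append(restored)
    return tickets


def served(ticket, demand):
    """Demand (Gbps) that the restored capacity in a ticket can carry."""
    return sum(min(d, ticket.get(link, 0)) for link, d in demand.items())


def pick_ticket(tickets, demand):
    """Traffic-engineering side: choose the ticket serving the most demand."""
    return max(tickets, key=lambda t: served(t, demand))


failed = [("L1", 100), ("L1", 100), ("L2", 100), ("L2", 100)]
spare = {"F2": 1, "F3": 1}  # only two of the four lost wavelengths can be restored
demand = {"L1": 100, "L2": 200}

tickets = generate_tickets(failed, spare, n_tickets=8)
best = pick_ticket(tickets, demand)
print(best, served(best, demand))
```

Because only two slots are free, every ticket restores 200 Gbps in total, but a ticket that spends both slots on L1 serves only 100 Gbps of demand, while any other ticket serves 200 Gbps; the selection step exists to prefer the latter. Constraints that make the real problem hard, such as wavelength continuity and optical reach, are ignored here.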
“ARROW can be used to improve service availability and enhance the resiliency of the internet infrastructure against fiber cuts. It renovates the way we think about the relationship between failures and network management—previously failures were deterministic events, where failure meant failure, and there was no way around it except over-provisioning the network,” says MIT postdoc Zhizhen Zhong, the lead author on a new paper about ARROW. “With ARROW, some failures can be eliminated or partially restored, and this changes the way we think about network management and traffic engineering, opening up opportunities for rethinking traffic engineering systems, risk assessment systems, and emerging applications too.”