## Sunflow

#### Efficient Optical Circuit Scheduling for Coflows







**Xin Sunny Huang**, Xiaoye Steven Sun, T. S. Eugene Ng Rice University





**Big Data and Optical Lightpaths Driven Lab** 

## This Work

- Optical Circuit Switching has many advantages over packet switching.
- **Disadvantage**: usually worse traffic performance.
- Sunflow overcomes disadvantage with efficient circuit scheduling.

#### Optical Circuit Switch v.s. Electrical Packet Switch

|                      | Optical Circuit | Electrical Packet | OCS                              |
|----------------------|-----------------|-------------------|----------------------------------|
|                      | Switch (OCS)    | Switch (EPS)      | better?                          |
| Traffic              | No packet       | Store and forward | Energy                           |
| processing           | processing      | EACH packet       | efficiency                       |
| Bandwidth<br>upgrade | Reuse old       | Buy new           | Future proof,<br>cost efficiency |
| Switching speed      |                 |                   |                                  |

| Optical Circuit Switch Witch V.S. |                                                                                                                                                                                               |                         |                                  |  |  |
|-----------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------|----------------------------------|--|--|
| E                                 |                                                                                                                                                                                               | et Swit                 | ch                               |  |  |
|                                   |                                                                                                                                                                                               | cal Packet<br>ch (EPS)  | OCS<br>better?                   |  |  |
| Traffic<br>processin              | $ \begin{array}{c}     \mathbb{N} - \mathbb{I} \\     \mathbb{N} \\   \end{array} $ $ \begin{array}{c}     \mathbb{N} \\   \end{array} $ $ \begin{array}{c}     \mathbb{N} \\   \end{array} $ | and forward<br>H packet | Energy<br>efficiency             |  |  |
| Bandwidt<br>upgrade               | Send at "link rate",<br>up to <b>10s of ms</b><br><b>switching delay</b>                                                                                                                      | uy new                  | Future proof,<br>cost efficiency |  |  |
| Switching<br>speed                |                                                                                                                                                                                               |                         |                                  |  |  |



#### Optical Circuit Switch v.s. Electrical Packet Switch

|                      | Optical Circuit                                      | Electrical Packet               | OCS                              |
|----------------------|------------------------------------------------------|---------------------------------|----------------------------------|
|                      | Switch (OCS)                                         | Switch (EPS)                    | better?                          |
| Traffic              | No packet                                            | Store and forward               | Energy                           |
| processing           | processing                                           | EACH packet                     | efficiency                       |
| Bandwidth<br>upgrade | Reuse old                                            | Buy new                         | Future proof,<br>cost efficiency |
| Switching speed      | Setting up a circuit<br>up to 10s of <mark>ms</mark> | Packet granularity<br>10s of ns | Traffic delay                    |









Circuit Switching



Due to circuit switching delay, the performance of circuit switching is usually **worse** than packet switching for **small data**.



For **larger data**, performance of circuit switching may become **closer** to packet switching.



**Fundamental question:** Can circuit-switching be **as good as** packet-switching for **big data** traffic?

### Big data often comes in Coflows

- Coflow <sup>[1]</sup>: A set of parallel flows.
- Produced by distributed applications (e.g. Hadoop & Spark).
- Performance is measured by Coflow Completion Time (CCT), i.e. the last flow's completion time.



[1] Chowdhury, M. et al. Coflow: An application layer abstraction for cluster networking. (HotNets'12)

| Packet<br>Switching                          | Performance                                                 |
|----------------------------------------------|-------------------------------------------------------------|
| Circuit<br>Switching                         |                                                             |
| Fundamental question as good as packet-swite | : Can circuit-switching be ching for <b>Coflow</b> traffic? |



## **Existing circuit scheduling** algorithms: performance suffers from **inefficient** scheduling.

Existing circuit scheduling algorithms all rely on matrix decomposition



Coflow demand matrix

|      | out.6 out.7             |                         |  |  |  |
|------|-------------------------|-------------------------|--|--|--|
| in.1 | <b>p</b> <sub>1,6</sub> | <b>p</b> <sub>1,7</sub> |  |  |  |
| in.2 | <b>p</b> <sub>2,6</sub> | <b>p</b> <sub>2,7</sub> |  |  |  |
| in.3 | <b>P</b> <sub>3,6</sub> | <b>P</b> <sub>3,7</sub> |  |  |  |
| in.4 | <b>P</b> <sub>4,6</sub> | <b>P</b> <sub>4,7</sub> |  |  |  |
| in.5 | <b>P</b> <sub>5,6</sub> | <b>P</b> <sub>5,7</sub> |  |  |  |

scheduling algorithm (**decompose**)















Too strong: All circuits stop during switching.



Too strong: All circuits stop during switching.



Too strong: All circuits stop during switching.



Too strong: All circuits stop during switching.



## $\begin{bmatrix} A_1 & A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_1 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} = \begin{bmatrix} A_1 \neq A_2 \\ A_2 \neq A_2 \end{bmatrix} =$

Too strong: All circuits stops during switching.

## 

#### **Not-all-stop Model**

In practice: Unchanged circuits remain active.



Too strong: All circuits stops during switching.

#### **Not-all-stop Model**

In practice: Unchanged circuits remain active.





Too strong: All circuits stops during switching.



#### **Not-all-stop Model**

In practice: Unchanged circuits remain active.



#### All-stop Model No Too strong: All circuits In

stops during switching.

#### **Not-all-stop Model**

In practice: Unchanged circuits remain active.



#### Not-all-stop switch model: Less stringent and more accurate.



← duration →
 ■ reconfiguration
 Output port No.
 circuit scheduled
 idle
 transmitting







## Simple & Efficient!

Greedy heuristic

Provably within 2x optimal, 1.03x in practice ...

|                  | Sunflow                                                                                                 | Other circuit schedulers                                                                        |
|------------------|---------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------|
| Intra-<br>Coflow | <ul> <li>Not allow subflows to preempt each other.</li> <li>Proved within 2x of the optimal.</li> </ul> | <ul> <li>Lots of preemptions<br/>and switching delay.</li> <li>Observed 10x optimal.</li> </ul> |
| Inter-<br>Coflow | <ul> <li>Flexible preemption<br/>policy. (e.g. shortest-<br/>Coflow-first)</li> </ul>                   | X Aggregated demand<br>matrix loses Coflow<br>boundary.                                         |
| Switch<br>model  | ✓ Not-all-stop (flexible)                                                                               | × All-stop (too strong)                                                                         |

## Simulation setup

- Implemented a flow-level, discrete-event simulator
- Workload[1] : realistic trace derived from Facebook cluster
  - 1hr traffic trace, ~500 Coflows, ~700k flows
- Circuit switching delay 10 ms (typical of today's products)
- Evaluated at the intra-Coflow and inter-Coflow level

## Simulation results

- At the intra-Coflow level:
  - Sunflow is close to the optimal
  - Sunflow is more efficient than the most viable circuit scheduling alternative, **Solstice (CoNEXT'15)**

Intra-Coflow circuit scheduling (Sunflow vs Solstice)



## Intra-Coflow circuit scheduling (Sunflow vs Solstice)





#### Intra-Coflow circuit scheduling (Sunflow vs Solstice)



## Simulation results

- At the intra-Coflow level:
  - Sunflow is close to the optimal
  - Sunflow is more efficient than the most viable circuit scheduling alternative, Solstice (CoNEXT'15)
- At the inter-Coflow level:
  - Sunflow's circuit switching achieves performance close to packet-switched Coflow schedulers, Varys (SIGCOMM'14)
  - Same link rate for Sunflow and Varys
  - Sunflow: 10ms switching delay. Varys: no switching delay.

## Circuit Switching vs Packet Switching (Sunflow vs Varys)

Sunflow's average-CCT over Varys's average-CCT



#### Circuit Switching vs Packet Switching (Sunflow vs Varys)

Sunflow's average-CCT over Varys's average-CCT



Change idleness by changing link rate and/or traffic size, but Coflow structures (flow endpoints) remain the same. 44

### More in the paper

- At the intra-Coflow level:
  - Sunflow's optimality based on Coflow structures
  - Sunflow v.s. packet switching based on Coflow sizes.
  - Switching overhead for Sunflow and Solstice
  - Sensitivity to flow ordering.
- At the inter-Coflow level:
  - Sunflow v.s. **Aalo (SIGCOMM'15)**, another Coflow schedulers based on packet switching.
- Sensitivity to **switching delay** at both levels.
- **Proof** of Sunflow's **performance guarantee** against circuit (packet) switching and Sunflow's **complexity**.

## Conclusions

We **can** simultaneously obtain

- benefits of optical circuit switching and
- good traffic performance for Coflows!
   Enabled by Sunflow:
- Efficient & flexible *not-all-stop* switch model
- Provably within 2x of the optimal, 1.03x in practice
- Near-packet-switching performance



#### Circuit Switching v.s. Packet Switching





This work is an apple-to-apple comparison between circuit switching and packet switching.

#### Circuit Switching v.s. Packet Switching



## Conclusions

We can simultaneously obtain

- benefits of optical circuit switching and
- good traffic performance for Coflows!
   Enabled by Sunflow:
- Efficient & flexible *not-all-stop* switch model
- Provably within 2x of the optimal, 1.03x in practice
- Near-packet-switching performance



## Backup slides

## Sensitivity to circuit switching delay on *intra*-Coflow scheduling



# Sunflow v.s. the optimal on *intra*-Coflow scheduling

| # senders      | > 1              | > 1     | 1       | 1       | any              |
|----------------|------------------|---------|---------|---------|------------------|
| # receivers    | > 1              | 1       | > 1     | 1       | any              |
| % bytes        | 99.9%            | 0.028%  | 0.024%  | 0.005%  | 100%             |
| % Coflows      | 26.6%            | 40.1%   | 9.9%    | 23.4%   | 100%             |
| Sunflow<br>CCT | 1.10x<br>optimal | optimal | optimal | optimal | 1.03x<br>optimal |

#### Circuit Switching Overhead (Sunflow v.s. Solstice)



#### Sunflow v.s. Varys on *inter*-Coflow scheduling

| avoraga flavu                          |                 |                  |       |
|----------------------------------------|-----------------|------------------|-------|
| average now<br>size ≥ 5 MB             | Long<br>Coflows | Short<br>Coflows | All   |
| % bytes                                | 98.8%           | 1.2%             | 100%  |
| % Coflows                              | 25.2%           | 74.8%            | 100%  |
| Per Coflow<br>Sunflow CCT<br>Varys CCT | 1.07x           | 2.16x            | 1.87x |



- Sort Coflow on priority.
- Assign circuit active time for flows.



- Sort Coflow on priority.
- Assign circuit active time for flows.



- Sort Coflow on priority.
- Assign circuit active time for flows.



High priority Coflow can preempt circuits from low priority Coflow.



#### Previous work on Coflow-aware network scheduling



- Why? Optimizing CCT reduces job completion time.<sup>[1]</sup>
- Key idea: Coordinate and schedule Coflows upon contention.
- Previous works are all in **packet** switching:
  - Min  $\Sigma$  CCTs (*Varys* in SIGCOMM'14)<sup>[2]</sup>  $\star$
  - Other variants:
    - uncertain Coflow byte size ('15)<sup>[3]</sup>, uncertain Coflow structures ('16)<sup>[4]</sup>
- [1] Chowdhury, M. et al. Coflow: An application layer abstraction for cluster networking. (HotNets'12)
- [2] Chowdhury, M. et al. Efficient coflow scheduling with Varys. (SIGCOMM'14)
- [3] Chowdhury, M. et al. Efficient coflow scheduling without prior knowledge. (SIGCOMM'15)
- [3] Zhang, H. et al. CODA: Toward Automatically Identifying and Scheduling Coflows in the Dark. (SIGCOMM'16)

# Thank You!







Xin Sunny Huang, Xiaoye Steven Sun, T. S. Eugene Ng Rice University





**Big Data and Optical Lightpaths Driven Lab**