ccamp                                                         Q. Hu, Ed.
Internet-Draft                                                   W. Wang
Intended status: Informational                                  J. Zhang
Expires: 24 April 2026                                           Y. Zhao
                      Beijing University of Posts and Telecommunications
                                                             Y. Tan, Ed.
                                                                Y. Zheng
                                                            China Unicom
                                                         21 October 2025


   A Control Framework for Unified Optical Networks and AI Computing
                         Orchestration (UONACO)
               draft-hu-ccamp-uonaco-control-framework-00

Abstract

   This document presents the control framework for Unified Optical
   Networks and AI Computing Orchestration (UONACO).  Specifically, it
   defines the AI computing service model over wide-area networks,
   outlines the UONACO control architecture, identifies a set of UONACO
   components and interfaces, and describes their interactions.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 24 April 2026.

Copyright Notice

   Copyright (c) 2025 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.


Hu, et al.                Expires 24 April 2026                 [Page 1]

Internet-Draft          UONACO Control Framework            October 2025


   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Service Model for AI Computing over Optical Network
     2.1.  Customer
     2.2.  Service Provider
     2.3.  Network Provider
     2.4.  Computing Power Provider
   3.  UONACO Control and Management Architecture
     3.1.  Overview
     3.2.  Service Orchestrator
     3.3.  Unified Compute-Optical Orchestrator
     3.4.  Optical Network Controller
     3.5.  Computing Power Scheduler
     3.6.  UONACO Interfaces
   4.  IANA Considerations
   5.  Security Considerations
   6.  References
     6.1.  Normative References
   Authors' Addresses

1.  Introduction

   Distributed AI computing has become a dominant paradigm for
   delivering large-scale AI services, enabling providers to meet
   stringent performance and scalability requirements by leveraging
   geographically dispersed AI data centers (AIDCs).  In such
   environments, the efficiency of distributed training, inference, and
   remote service access depends critically on tight coordination
   between optical transport networks and compute orchestration systems.


   However, today's infrastructure operates with fundamentally isolated
   control planes: the optical transport layer, despite providing the
   high-bandwidth, low-latency, and deterministic backbone for wide-area
   AI collaboration, remains blind to the dynamic, heterogeneous demands
   of AI workloads.  It cannot discern whether a traffic flow stems from
   a bandwidth-intensive distributed training job requiring synchronized
   all-reduce operations across thousands of GPUs, or from a
   latency-critical inference request demanding a sub-10 ms end-to-end
   response.
   Consequently, optical networks provision static or best-effort
   lightpaths without adapting to the real-time compute intent, leading
   to underutilized spectral resources or, worse, congestion-induced
   stalls during critical gradient synchronization phases.

   Conversely, AI compute schedulers (e.g., Kubernetes-based
   orchestrators in AIDCs) make placement decisions based solely on
   local GPU/CPU availability and memory capacity, with no awareness of
   the underlying optical fabric's state, such as available wavelength
   continuity, end-to-end propagation delay, per-link bandwidth
   headroom, or even the presence of OXC-based reconfigurable paths.  As
   a result, a training job may be split across geographically distant
   AIDCs with abundant but poorly interconnected GPU pools, causing
   prolonged communication phases and severe "compute efficiency loss."
   Similarly, a low-latency inference service might be deployed in a
   remote AIDC simply because it has idle GPUs, even though the optical
   path violates the application's SLA due to high round-trip delay or
   lack of dedicated wavelength isolation.

   To address these challenges, this document introduces the Unified
   Optical Networks and AI Computing Orchestration (UONACO) framework.
   UONACO establishes a unified control architecture that enables
   bidirectional signaling, joint resource modeling, and synchronized
   orchestration between the compute and optical domains.  The framework
   supports three representative service patterns: AI training, AI
   inference, and access to remote AI inference services.  By aligning
   network provisioning with compute intent, and vice versa, UONACO aims
   to improve the efficiency of wide-area collaborative AI computing.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.


2.  Service Model for AI Computing over Optical Network

   The deployment of wide-area AI services over optical infrastructure
   involves multiple stakeholders, each playing a distinct role in the
   end-to-end service delivery chain.  To clarify responsibilities and
   interactions, this document defines a service model comprising the
   Customer, Service Provider, Network Provider, and Computing Power
   Provider.

2.1.  Customer

   The Customer is the end user or enterprise that consumes AI
   capabilities.  Three primary service patterns are observed:

   • In AI training, the customer delegates the training of large-scale
     AI models to service providers, typically specifying performance,
     scale, and data privacy requirements.

   • In AI inference, the customer leases computing resources to deploy
     and operate inference models, often serving downstream internet
     users with real-time or batch inference services.

   • In remote AI inference service access, the customer invokes
     pre-deployed inference APIs offered by third parties, expecting
     deterministic latency, reliability, and quality of service without
     managing the underlying infrastructure.

2.2.  Service Provider

   The Service Provider acts as the business orchestrator, interfacing
   directly with the Customer to translate high-level service
   intents—such as SLAs, geographic constraints, or performance
   targets—into concrete resource demands.  It coordinates with both the
   Network Provider and the Computing Power Provider to fulfill these
   demands, and is responsible for service lifecycle management,
   billing, and customer support.

2.3.  Network Provider

   The Network Provider operates and manages the underlying optical
   transport infrastructure.  It delivers high-bandwidth, low-latency,
   and deterministic connectivity services, including inter-AIDC
   backbone links and user-to-AIDC dedicated access circuits.  The
   Network Provider exposes network capabilities—such as available
   bandwidth, path latency, and reliability—through standardized control
   interfaces to enable coordinated service provisioning.


2.4.  Computing Power Provider

   The Computing Power Provider owns and operates one or more Artificial
   Intelligence Data Centers (AIDCs).  It offers compute, memory, and
   accelerator resources (e.g., GPUs, TPUs) for AI training and
   inference workloads.  The Computing Power Provider reports real-time
   resource availability and performance metrics to the Service Provider
   and supports dynamic task placement and scaling based on
   orchestration instructions.

3.  UONACO Control and Management Architecture


                 +----------------------+
                 |       Customer       |
                 +---------+-^----------+
                           | |
                 +---------v-+----------+
                 | Service Orchestrator |
                 +---------+-^----------+
                           | | SUI
           +---------------v-+------------------+
           |Unified Compute-Optical Orchestrator|
           +-----+-^-------------------+-^------+
                 | | UCI               | | UOI
   +-------------v-+-------+ +---------v-+--------------+
   |Compute Power Scheduler| |Optical Network Controller|
   +-----------+-^---------+ +------+-^-----------------+
               | |                  | |
               | |                  | |    +------------+
         +-----v-+-----+            | |    |User Access |
         |Compute Power|            | |    |   Point #1 |
         |    Pool     |            | |    +------+-----+
         |  +-------+  |            | |           |
         |  |AIDC #1|+-+-+   +------v-+------+    |
         |  +-------+  |  \+-|               |----+
         |             |     |Optical Network|
         |  +-------+  |   +-|               |----+
         |  |AIDC #2|+-+-+/  +---------------+    |
         |  +-------+  |                          |
         |             |                   +------+-----+
         +-------------+                   |User Access |
                                           |   Point #2 |
                                           +------------+


            Figure 1: UONACO Control and Management Architecture

3.1.  Overview

   As shown in Figure 1, the UONACO framework establishes a layered
   control architecture that enables end-to-end coordination between
   service intent, compute resources, and optical transport
   infrastructure.  This architecture comprises five core functional
   components: the Customer, the Service Orchestrator (SO), the Unified
   Compute-Optical Orchestrator (UCOO), the Optical Network Controller
   (ONC), and the Computing Power Scheduler (CPS).  They are
   interconnected through three interfaces: the SUI, the UCI, and the
   UOI (see Section 3.6).

3.2.  Service Orchestrator

   The SO serves as the business-facing interface of the UONACO
   framework.  It is responsible for accepting AI service requests from
   customers, such as "deploy a distributed training job across multiple
   AIDCs with end-to-end latency under X ms" or "provision an inference
   service with Y GPU instances and guaranteed bandwidth", and for
   translating these intent-based specifications into structured
   resource requirements.  The SO also handles service lifecycle
   management, including billing, SLA enforcement, and user
   authentication.  It does not manage physical resources directly but
   instead communicates abstracted demands to the UCOO via the SUI.

3.3.  Unified Compute-Optical Orchestrator

   The UCOO is the central coordination engine of the UONACO
   architecture.  It receives service intents from the SO and
   continuously collects real-time telemetry from both the optical
   network (via ONC) and compute infrastructure (via CPS).  Based on
   this global view, the UCOO executes joint optimization algorithms
   that consider both compute capabilities (e.g., GPU availability,
   memory) and network conditions (e.g., path latency, available
   bandwidth, congestion).  The output of this decision process is a
   pair of synchronized instructions: one for optical path provisioning
   and another for compute task placement.  The UCOO thus bridges the
   semantic and operational gap between the service layer and the
   infrastructure layer.
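
   As an illustration only, the joint decision described above can be
   sketched as a feasibility filter followed by a simple objective.
   This document does not specify an optimization algorithm; the type
   names (AidcState, PathState), their fields, and the "lowest latency
   wins" objective are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AidcState:
    """Compute view of one AIDC, as a CPS might report it (hypothetical)."""
    name: str
    free_gpus: int


@dataclass(frozen=True)
class PathState:
    """Optical view toward one AIDC, as an ONC might report it (hypothetical)."""
    aidc: str
    latency_ms: float
    free_gbps: float


def select_placement(
    aidcs: List[AidcState],
    paths: List[PathState],
    gpus: int,
    max_latency_ms: float,
    min_gbps: float,
) -> Optional[Tuple[str, PathState]]:
    """Return (aidc_name, path) meeting both compute and network bounds.

    Among feasible candidates the lowest-latency path is preferred; this
    objective is an illustrative choice, not one the framework mandates.
    """
    compute_ok = {a.name for a in aidcs if a.free_gpus >= gpus}
    feasible = [
        p for p in paths
        if p.aidc in compute_ok
        and p.latency_ms <= max_latency_ms
        and p.free_gbps >= min_gbps
    ]
    if not feasible:
        return None
    best = min(feasible, key=lambda p: p.latency_ms)
    return best.aidc, best


aidcs = [AidcState("AIDC-1", free_gpus=8), AidcState("AIDC-2", free_gpus=64)]
paths = [
    PathState("AIDC-1", latency_ms=2.0, free_gbps=400.0),
    PathState("AIDC-2", latency_ms=25.0, free_gbps=400.0),
]
# A compute-only scheduler would pick AIDC-2 for 16 GPUs. The joint check
# rejects it because its path exceeds the 10 ms bound, so no placement fits.
print(select_placement(aidcs, paths, gpus=16, max_latency_ms=10.0, min_gbps=100.0))
# A 4-GPU request fits AIDC-1, whose path also meets the latency bound.
print(select_placement(aidcs, paths, gpus=4, max_latency_ms=10.0, min_gbps=100.0))
```

   In a real deployment the returned pair would be turned into the two
   synchronized instructions: a path request toward the ONC and a
   placement directive toward the CPS.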


3.4.  Optical Network Controller

   The ONC represents the control plane of the underlying optical
   transport infrastructure.  It may encompass a hierarchy of
   controllers, including intra-domain optical controllers and inter-
   domain coordinators (e.g., multi-domain WSON or OXC orchestrators).
   The ONC is responsible for managing physical and virtual optical
   resources—such as wavelengths, time slots, fgOTN/OSU slices, and OXC
   cross-connects—and for executing path computation, signaling, and
   protection mechanisms.  In the UONACO framework, the ONC exposes
   network topology, available capacity, and performance metrics to the
   UCOO through the UOI, and applies provisioning commands
   issued by the UCOO to establish, adjust, or release optical
   connections in response to compute workload dynamics.

3.5.  Computing Power Scheduler

   The CPS acts as the controller for the AI compute pool, typically
   spanning one or more AIDCs.  It manages heterogeneous compute
   resources, including CPUs, GPUs, TPUs, memory, and storage, and
   reports their real-time availability, utilization, and performance
   characteristics (e.g., FLOPS, VRAM usage) to the UCOO.  Upon
   receiving placement instructions from the UCOO via the UCI, the CPS
   schedules AI workloads (e.g., training jobs or inference containers)
   onto appropriate nodes, configures runtime environments, and ensures
   that compute tasks are aligned with the concurrently provisioned
   optical connectivity.

3.6.  UONACO Interfaces

   The UONACO framework defines three key interfaces, shown in Figure 1,
   to enable interoperability and the decoupled evolution of its
   components.

   SUI (SO-UCOO Interface): The SUI connects the SO and the UCOO.
   Through this northbound interface, the SO conveys high-level service
   intent, including abstracted SLA requirements (e.g., maximum
   end-to-end latency, minimum bandwidth, geographic constraints),
   service type (e.g., AI training, inference, or remote access), and
   lifecycle events (e.g., service activation, modification, or
   termination).  The UCOO interprets these intents as concrete resource
   demands and initiates joint optimization.  The SUI thus serves as the
   bridge between business-oriented service definitions and
   infrastructure-aware orchestration.
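
   To make the information carried over the SUI more concrete, the
   following sketch models a service intent as a small typed record and
   serializes it to JSON.  No SUI data model is defined in this
   document, so every class name, field name, and enumeration value
   below is hypothetical, and JSON is used only for readability.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional, Tuple


class ServiceType(Enum):
    TRAINING = "ai-training"
    INFERENCE = "ai-inference"
    REMOTE_ACCESS = "remote-inference-access"


class LifecycleEvent(Enum):
    ACTIVATE = "activate"
    MODIFY = "modify"
    TERMINATE = "terminate"


@dataclass
class ServiceIntent:
    """Hypothetical SUI payload; field names are not defined by UONACO."""
    service_id: str
    service_type: ServiceType
    event: LifecycleEvent
    max_latency_ms: Optional[float] = None
    min_bandwidth_gbps: Optional[float] = None
    allowed_regions: Tuple[str, ...] = ()

    def validate(self) -> None:
        # SLA bounds are optional, but a bound that is present must be positive.
        for name in ("max_latency_ms", "min_bandwidth_gbps"):
            value = getattr(self, name)
            if value is not None and value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")

    def to_json(self) -> str:
        self.validate()
        body = asdict(self)
        # Enum members are not JSON serializable; send their string values.
        body["service_type"] = self.service_type.value
        body["event"] = self.event.value
        return json.dumps(body, sort_keys=True)


intent = ServiceIntent(
    service_id="svc-001",
    service_type=ServiceType.TRAINING,
    event=LifecycleEvent.ACTIVATE,
    max_latency_ms=10.0,
    min_bandwidth_gbps=400.0,
    allowed_regions=("region-a", "region-b"),
)
print(intent.to_json())
```

   Validating on the SO side before sending keeps malformed intents
   from reaching the UCOO; a model-driven encoding could enforce the
   same constraints in the schema itself.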

   UOI (UCOO-ONC Interface): The UOI links the UCOO with the ONC.  This
   interface enables bidirectional communication: the UCOO sends
   optical resource requests specifying required connectivity
   attributes such as bandwidth, end-to-end latency bounds, path
   isolation level, and resilience requirements; in return, the ONC
   provides real-time network state updates, including topology,
   available wavelengths or time slots, link utilization, propagation
   delay, and fault status.  By exposing network capabilities and
   constraints to the orchestration layer, the UOI allows the UCOO to
   make network-feasible decisions and enables the ONC to provision
   optical paths that are aligned with compute workload dynamics.
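
   One example of a network constraint that state reported over the
   UOI can expose is wavelength continuity: without wavelength
   conversion, a lightpath needs the same wavelength free on every link
   it traverses.  The sketch below checks this with a first-fit rule.
   It assumes directed links keyed by (from, to) node names and integer
   wavelength indices; these representations, and the function name,
   are illustrative and not part of UONACO.

```python
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, str]


def first_fit_wavelength(path: List[str], free: Dict[Link, Set[int]]) -> Optional[int]:
    """Return the lowest wavelength index free on every link of `path`.

    Models the wavelength continuity constraint (no conversion along the
    path). Returns None if the path has no links or no common wavelength.
    """
    links = list(zip(path, path[1:]))
    if not links:
        return None
    # Start from the first link's free set and intersect hop by hop.
    common = set(free.get(links[0], set()))
    for link in links[1:]:
        common &= free.get(link, set())
    return min(common) if common else None


free = {
    ("A", "B"): {1, 2, 4},
    ("B", "C"): {2, 3, 4},
    ("C", "D"): {4, 5},
}
print(first_fit_wavelength(["A", "B", "C"], free))       # prints 2
print(first_fit_wavelength(["A", "B", "C", "D"], free))  # prints 4
```

   A UCOO that receives per-link wavelength availability could run a
   check of this kind before requesting a path, rather than discovering
   infeasibility only after the ONC rejects the request.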

   UCI (UCOO-CPS Interface): The UCI connects the UCOO and the CPS.
   Through this interface, the UCOO issues compute resource demands and
   task placement directives, such as the number and type of
   accelerators required, memory footprint, and preferred deployment
   topology, based on the outcome of joint compute-optical
   optimization.  Conversely,
   the CPS reports real-time compute resource availability, node load,
   energy efficiency metrics, and task execution status (e.g., job
   progress, failure alerts).  This feedback loop ensures that compute
   allocation respects both application requirements and the quality of
   the concurrently provisioned optical connectivity, thereby avoiding
   placements that would violate network SLAs.
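
   The exchange over the UCI can be pictured as matching a placement
   directive against the node state that the CPS reports.  The
   following is a minimal sketch under assumed data shapes: the
   NodeReport and PlacementDirective records, their fields, and the
   greedy strategy are hypothetical, and memory_gb_per_accelerator is
   assumed to be greater than zero.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NodeReport:
    """Per-node availability, as a CPS might report it (hypothetical)."""
    node: str
    accelerator: str
    free_accelerators: int
    free_memory_gb: int


@dataclass(frozen=True)
class PlacementDirective:
    """Compute demand, as a UCOO might issue it (hypothetical)."""
    accelerator: str
    count: int
    memory_gb_per_accelerator: int


def plan_allocation(directive: PlacementDirective,
                    reports: List[NodeReport]) -> Dict[str, int]:
    """Greedily map a directive onto reported nodes; {} if it cannot fit."""
    remaining = directive.count
    plan: Dict[str, int] = {}
    # Visit nodes with the most free accelerators first so the job spans
    # as few nodes as this simple heuristic allows.
    for r in sorted(reports, key=lambda r: r.free_accelerators, reverse=True):
        if r.accelerator != directive.accelerator:
            continue
        by_memory = r.free_memory_gb // directive.memory_gb_per_accelerator
        take = min(remaining, r.free_accelerators, by_memory)
        if take > 0:
            plan[r.node] = take
            remaining -= take
        if remaining == 0:
            return plan
    return {}


reports = [
    NodeReport("n1", "gpu", free_accelerators=4, free_memory_gb=256),
    NodeReport("n2", "gpu", free_accelerators=8, free_memory_gb=256),
    NodeReport("n3", "tpu", free_accelerators=8, free_memory_gb=512),
]
print(plan_allocation(PlacementDirective("gpu", 10, 32), reports))
print(plan_allocation(PlacementDirective("gpu", 20, 32), reports))
```

   An empty result corresponds to the kind of feedback the CPS would
   return to the UCOO so that the joint optimization can be rerun with
   updated compute state.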

   These interfaces are designed to be protocol-agnostic but are
   expected to leverage standardized, model-driven approaches (e.g.,
   YANG data models accessed via NETCONF or RESTCONF) to ensure vendor
   neutrality and scalability.

4.  IANA Considerations

   TBD

5.  Security Considerations

   TBD

6.  References

6.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.


Authors' Addresses

   Qiaojun Hu (editor)
   Beijing University of Posts and Telecommunications
   Email: qiaoj475@bupt.edu.cn


   Wei Wang
   Beijing University of Posts and Telecommunications
   Email: weiw@bupt.edu.cn


   Jie Zhang
   Beijing University of Posts and Telecommunications
   Email: jie.zhang@bupt.edu.cn


   Yongli Zhao
   Beijing University of Posts and Telecommunications
   Email: yonglizhao@bupt.edu.cn


   Yanxia Tan (editor)
   China Unicom
   Email: tanyx11@chinaunicom.cn


   Yanlei Zheng
   China Unicom
   Email: zhengyanlei@chinaunicom.cn

