AI Performance and Latency: How Infrastructure and Networks Matter

I’ve been thinking about this topic for a few days. I believe it’s a huge issue that spans multiple sectors: infrastructure, server cluster controllers, content delivery networks (CDNs), and even the physical distance between users and AI server farms.

Physical Distance Between AI Infrastructure and Users
We need to understand how we use AI Large Language Models (#LLM) like #ChatGPT or #Grok. Users open a website or a mobile application and submit a query to the LLM. The AI server farm then processes the query and sends a response back to the user.

In this interaction, how far away is the user from the AI infrastructure? And how much latency does this distance create?
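
As a rough lower bound, the propagation delay can be estimated from distance alone. The sketch below assumes light travels through fiber at about 200,000 km/s (roughly two thirds of c) and that the fiber follows the great-circle path, which real routes never do. The function names and the sample coordinates (Singapore and a site in Virginia) are illustrative assumptions, not measurements.

```python
import math

FIBER_KM_PER_S = 200_000.0  # assumed speed of light in fiber (~2/3 of c)
EARTH_RADIUS_KM = 6371.0    # mean Earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def min_rtt_ms(distance_km):
    """Best-case round-trip time from propagation delay only (no queuing, no processing)."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000

# Illustrative only: a user in Singapore and a server farm assumed to be in Virginia.
distance = great_circle_km(1.35, 103.82, 39.04, -77.49)
print(f"{distance:.0f} km, at least {min_rtt_ms(distance):.0f} ms RTT")
```

Every 1,000 km of fiber adds about 10 ms of round-trip time under this assumption, before any routing detours, queuing, or inference time are counted.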

We might think about whether a CDN could help deliver responses (via caching) to users in order to reduce latency. However, replies and generated images are often dynamic content, which typically cannot be cached effectively. Of course, there are techniques like WAN acceleration, byte caching, or session caching that can improve efficiency to some extent.

That said, we all know CDNs still provide important functionality beyond caching, such as DDoS protection and DNS services.

According to one report, the top six AI infrastructure hubs in the world are located in the United States, Canada, Norway and Sweden, Singapore, India, and the United Arab Emirates [https://www.rcrwireless.com/20250501/fundamentals/top-ai-infrastructure]. Users, however, generally do not know where the LLM they query is actually running. Determining this would require deeper measurements, such as DNS queries and web interaction analysis. Either way, long-distance traffic between regions remains unavoidable.

Internet Infrastructure and Edge Computing
If the above assumption is correct, what role can Internet infrastructure and Tier 1 network operators play? Additional network deployment will certainly be required to support the high bandwidth demands of AI usage. However, this is not only a matter of bandwidth — latency is equally important.

In the earlier era of the web and HTTP content delivery, the processing requirements were relatively lightweight. A cache server could be deployed at the “edge” to serve static or semi-dynamic web content for multiple users in a given region. This model works well for traditional CDNs.

For AI, however, the situation is different. While lightweight LLMs might be deployed on edge computing nodes—provided those nodes have sufficient processing capability—large-scale AI infrastructure is far more demanding. Deploying an AI system with thousands of GPUs worldwide is not straightforward. Beyond hardware availability, challenges include massive power consumption and the cooling requirements of such clusters, which vary greatly depending on the region.

It’s an interesting question: how do we bring the scale of AI infrastructure closer to the edge while balancing efficiency, cost, and sustainability?

One CDN provider, Cloudflare, has already taken action in collaboration with several AI companies. Its AI Gateway and MCP Gateway could help address some of these technical questions. Let’s take a look at the latest updates:
[https://blog.cloudflare.com/ai-gateway-aug-2025-refresh/] [https://blog.cloudflare.com/how-cloudflare-runs-more-ai-models-on-fewer-gpus/?utm_campaign=cf_blog&utm_source=linkedin&utm_medium=organic_social&utm_content=20250827/] #20250827

AI Infrastructure, Controllers and Network Nodes
Building on the above points, it’s important to note that most AI infrastructure today is already quite mature. The robust building blocks developed during the cloud computing era, such as VXLAN, Kubernetes and other cluster node controllers, and load balancers, provide a strong foundation for managing large-scale AI workloads.

Major cloud providers like Google, AWS, and Microsoft have already demonstrated the stability and scalability of these architectures, showing how well-established the underlying infrastructure has become.

Managing a stable cluster node controller capable of handling millions of AI requests is truly an art of operations. It requires careful consideration of latency at every level, including the low-level hardware channels.

When a request is processed, the load balancer sends it to the appropriate nodes through the infrastructure’s network devices and routing protocols. The nodes then process the request, consolidate the results, and send the final response back to the user. Each of these steps is interrelated, and optimizing them is crucial for delivering fast, reliable AI responses.
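
A minimal latency-budget sketch of that flow, assuming the load balancer fans the request out to all nodes in parallel and waits for every result before consolidating. Under that assumption the slowest node, not the average one, sets the response time. The function name and all numbers are hypothetical.

```python
def request_latency_ms(lb_ms, node_ms, merge_ms):
    """Latency of one fan-out request.

    lb_ms    -- time spent in the load balancer and fabric on the way in
    node_ms  -- per-node processing plus network time, one entry per node
    merge_ms -- time to consolidate results and send the response
    """
    # Nodes work in parallel, so only the slowest one counts.
    return lb_ms + max(node_ms) + merge_ms

nodes = [12.0, 15.0, 48.0]  # hypothetical values; one node is a straggler
print(request_latency_ms(lb_ms=1.0, node_ms=nodes, merge_ms=2.0))
```

With these made-up numbers, speeding up the two fast nodes changes nothing. Only the straggler matters, which is why tail latency gets so much attention in cluster operations.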

Some researchers have started studying how to reduce latency in these processes, or how low it actually needs to be. This spans multiple layers: operating systems, RAM, GPU channels, network connections, network protocols, control and data planes, fiber channels, and more.

If all distributed nodes could be interconnected with specialized high-speed links, such as PCIe, ASIC boards, Scalable Link Interface (SLI), or CrossFire links, what would that mean for latency? It’s an interesting thought experiment worth exploring. A recent Nvidia blog post introduced Quantum-X InfiniBand Photonics, a switch with co-packaged optics: the optical engine is integrated into the switch package, which removes the separate electrical-to-optical conversion at a pluggable transceiver between the switch and the fiber [https://developer.nvidia.com/blog/scaling-ai-factories-with-co-packaged-optics-for-better-power-efficiency/?ncid=so-link-416360&linkId=100000378640319]. I believe that while it may initially be used only in AI infrastructure, one day it could become essential for every Tier 1 network infrastructure.

Open discussion.

Geo-Mapping ASN and Prefixes: Still Relevant?

I am currently revamping the entire BGPtrace system. Many ideas, but limited time.

One topic I keep thinking about is: how important is the country code associated with an ASN or prefix? We understand that it can affect localization applications or impose IP-level service restrictions.

Applications may attempt to force the client (browser or app) to use location data for verification (yes, we know about GPS spoofing tricks).

However, at the ISP or network service provider level, with multihoming and Anycast structures, prefixes are advertised globally without strict geographic boundaries. So, is it still useful to assign a country code to them? (Yes, we could use the country of company registration, but that’s a different issue.)

This ties into another topic: summarizing traffic by country code. For that, probes or NetFlow exporters should be geographically distributed. Only then can accurate data reports be produced. This is common practice in Internet traffic analysis.
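
A minimal sketch of such a per-country summary, assuming you already have a prefix-to-country table and a list of (destination IP, bytes) flow records. The table below uses documentation prefixes with made-up country codes, and the function names are hypothetical. Lookup is a simple longest-prefix match.

```python
import ipaddress
from collections import defaultdict

# Hypothetical prefix-to-country table (documentation prefixes, made-up codes).
PREFIX_COUNTRY = {
    "192.0.2.0/24": "HK",
    "198.51.100.0/24": "SG",
    "198.51.100.128/25": "JP",  # more specific prefix inside the /24 above
}

# Sort longest prefix first so the most specific match wins.
_NETS = sorted(
    ((ipaddress.ip_network(p), cc) for p, cc in PREFIX_COUNTRY.items()),
    key=lambda item: item[0].prefixlen,
    reverse=True,
)

def country_of(ip):
    """Return the country code of the longest matching prefix, or '??'."""
    addr = ipaddress.ip_address(ip)
    for net, cc in _NETS:
        if addr in net:
            return cc
    return "??"

def bytes_by_country(flows):
    """Sum bytes per country for an iterable of (dst_ip, nbytes) records."""
    totals = defaultdict(int)
    for dst_ip, nbytes in flows:
        totals[country_of(dst_ip)] += nbytes
    return dict(totals)

flows = [("192.0.2.10", 100), ("198.51.100.5", 200), ("198.51.100.200", 50)]
print(bytes_by_country(flows))
```

The result is only as good as the table: an Anycast prefix maps to one code here even though it is announced from many places, which is exactly the problem raised above.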

If that’s the case, then using data from multiple RouteViews servers becomes essential for prefix analysis. Or perhaps combining tunneling with NetFlow/BGP route exports (an older method) is still the better approach?

Thoughts? Discussion welcome.

TCP Congestion Control – again

This topic keeps coming up again and again:

Which TCP congestion control algorithm has the best performance?
Which handles high latency, delays, and bottlenecks better?
And some might ask, what does it mean when the congestion window becomes zero? AH…

Yes, it’s true — no matter how much technology improves, how fast chipsets and processors become, or how advanced fast-forwarding gets — congestion still happens.

Today’s topic isn’t about local datacenter LAN-side congestion (which may be caused by high network usage in clustered, fabric-based structures — AI workloads, maybe?).

We’re talking about long-haul network connections — like between two distant endpoints. Take Starlink, for example: a satellite-based system with unpredictable latency and environmental effects.
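
To see why long-haul paths stress congestion control, it helps to compute the bandwidth-delay product (BDP): the amount of data that must be in flight to keep the path full. The sketch below is plain arithmetic. The 1 Gbps and 150 ms figures are illustrative assumptions, and 65,535 bytes is the largest TCP window without window scaling.

```python
def bdp_bytes(bandwidth_bps, rtt_ms):
    """Bytes that must be in flight to fill a path of this bandwidth and RTT."""
    return bandwidth_bps * rtt_ms / 1000 / 8

def max_throughput_bps(window_bytes, rtt_ms):
    """Throughput ceiling of one flow limited to `window_bytes` per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms

# Illustrative path: 1 Gbps with 150 ms round-trip time.
print(bdp_bytes(1_000_000_000, 150))   # bytes needed in flight
print(max_throughput_bps(65_535, 150)) # ceiling with an unscaled 64 KB window
```

A flow on this path needs almost 19 MB in flight to fill the link, and every time the congestion window collapses it takes many round trips to grow back. That is the gap the high-speed and long-distance algorithms try to close.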

So, how are these “OLD” TCP congestion control algorithms still being used today?


Can monitoring TCP congestion control behavior — like how the congestion window changes — be used to detect potential congestion and trigger a BGP or transit path switch?
Maybe — but it depends on the access technology. If the last mile is wireless, measurements might not be reliable enough.
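
One way to picture the idea is a small detector that looks for a sustained drop in sampled congestion window values. This is only a sketch: the function name, the 50% drop ratio, and the three-sample run are hypothetical choices, and it assumes the samples are collected elsewhere (on Linux, for example, from the cwnd field in `ss -ti` output).

```python
def sustained_cwnd_drop(samples, drop_ratio=0.5, min_run=3):
    """Return True if the last `min_run` samples all sit below
    `drop_ratio` times the peak observed before them.

    samples -- congestion window values (in segments), oldest first
    """
    if len(samples) <= min_run:
        return False  # not enough history to form a baseline
    baseline = max(samples[:-min_run])
    recent = samples[-min_run:]
    return all(s < baseline * drop_ratio for s in recent)

print(sustained_cwnd_drop([100, 120, 110, 40, 35, 30]))    # window stays collapsed
print(sustained_cwnd_drop([100, 120, 110, 40, 115, 118]))  # single dip, then recovery
```

A signal like this would need to be aggregated across many flows toward the same destination prefix before it could justify a path change, since a single flow can collapse for reasons unrelated to the transit link.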


So, which one is best suited for the modern era?

Here’s a list of Linux-supported TCP congestion control algorithms (summarized by ChatGPT):

highspeed: Designed for networks with large bandwidth-delay products.

reno: The traditional TCP congestion control algorithm.

cubic: Default in Linux since kernel 2.6.19; optimized for high-speed networks.

bbr: Developed by Google (since kernel 4.9); focuses on bottleneck bandwidth and round-trip time.

bic: Binary Increase Congestion Control; predecessor to CUBIC.

htcp: Hamilton TCP; designed for high-speed, long-distance networks.

vegas: A delay-based algorithm that adjusts the sending rate based on RTT variations.

westwood: Optimized for lossy networks; adjusts congestion window based on bandwidth estimation.

yeah: Yet Another Highspeed TCP; combines delay and loss-based congestion detection.

hybla: Improves performance over high-latency networks by compensating for RTT.

illinois: Dynamically adjusts the congestion window for high-speed, long-distance links.

lp: Low Priority; designed for background traffic to yield to more important flows.

veno: Hybrid of Reno and Vegas; optimized for wireless networks.

scalable: Increases congestion window more rapidly than traditional TCP.

nv: New Vegas; an improvement over TCP Vegas.

cdg: CAIA Delay-Gradient; uses delay gradients to detect congestion.

dctcp: Data Center TCP; optimized for data center networks using ECN.
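
To check which of these are actually usable on a given Linux host, and to try one on a single connection without changing the system default, something like the following sketch works. It reads the kernel's list from /proc and uses the TCP_CONGESTION socket option, which Python exposes on Linux only. The helper names are my own, and algorithms built as modules appear in the list only after they are loaded.

```python
import socket

PROC_AVAILABLE = "/proc/sys/net/ipv4/tcp_available_congestion_control"

def parse_algorithms(text):
    """Split the space-separated list the kernel exposes into names."""
    return text.split()

def available_algorithms(path=PROC_AVAILABLE):
    """Return the algorithms currently registered, or [] if the file is unreadable."""
    try:
        with open(path) as fh:
            return parse_algorithms(fh.read())
    except OSError:
        return []

def set_congestion_control(sock, name):
    """Select the congestion control algorithm for one TCP socket (Linux only)."""
    if not hasattr(socket, "TCP_CONGESTION"):
        raise OSError("TCP_CONGESTION is not available on this platform")
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, name.encode())

def get_congestion_control(sock):
    """Read back the algorithm name in use on a TCP socket (Linux only)."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    return raw.split(b"\x00", 1)[0].decode()
```

Calling available_algorithms() on a typical host returns a short list such as reno and cubic. Setting the algorithm per socket makes it easy to run the same transfer under two algorithms and compare, which is more convincing than any summary table.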

Can AI Operate the Internet?

Skynet? No, no Terminator, please.

Just joking—don’t jump to the movie scene right away. Let’s consider the idea of AI operating a network, similar to how automation handles network device configuration and zero-touch provisioning. Is it possible?

Let’s start with automation. A device comes with an initial configuration that includes an IP address and gateway. Once it’s network-accessible, the central server recognizes the device and pushes the final configuration. That’s a simple flow, right?
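
That flow can be sketched in a few lines. This is a toy model, not a real provisioning system: the class name, the serial-number inventory, and the configuration strings are all hypothetical, and the "push" is just a return value.

```python
from dataclasses import dataclass, field

@dataclass
class ProvisioningServer:
    # Hypothetical inventory: device serial number -> final configuration text.
    inventory: dict
    # Devices that have received their final configuration: serial -> management IP.
    provisioned: dict = field(default_factory=dict)

    def on_device_seen(self, serial, mgmt_ip):
        """Called once a device with its initial config becomes reachable.

        Returns the final configuration to push, or None for an unknown device.
        """
        config = self.inventory.get(serial)
        if config is None:
            return None  # unknown device: leave it for a human operator
        self.provisioned[serial] = mgmt_ip
        return config

server = ProvisioningServer({"SN123": "hostname edge-1"})
print(server.on_device_seen("SN123", "192.0.2.10"))  # known device gets its config
print(server.on_device_seen("SN999", "192.0.2.11"))  # unknown device gets nothing
```

Everything here is a lookup and a push. There is no decision-making, which is why the next question matters.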

But what happens next—who operates the device?
Today, it’s still a human network operator.

Could AI replace the network operator? It depends on how the AI model is trained, how it handles fault tolerance, and how it avoids operational errors. There’s no such thing as a 100% guarantee.

However, using AI to operate parts of the network—such as traffic tuning, AS path changes, or metric modifications—seems realistic.

AI can assist with tasks like network measurements, route and traffic tuning, DDoS detection and auto-mitigation (which already exists), device fault detection and traffic redirection, and DNS updates. That all seems feasible. You could train an AI module specifically for this purpose.

But then again—do we need AI if existing technologies already meet these requirements?

Let’s consider a scenario and see what AI (specifically, ChatGPT) can do.


Scenario:

  1. An Internet provider has 2 transit links:
    • Link A: 1 Gbps
    • Link B: 10 Gbps
  2. The subnets 10.0.0.0/24 and 11.0.0.0/24 (AS 65000) are advertised to Link A.
  3. The same subnets, but with AS path prepending (65000, 65000), plus 12.0.0.0/24 (65000), are advertised to Link B.
  4. Link A is congested—11.0.0.0/24 is using nearly 900 Mbps.
  5. Link B’s total usage is just 5 Gbps.
  6. A customer on 10.0.0.0/24 complains they cannot access the Internet.

ChatGPT’s Suggested Operation:

Summary:

  • Remove or reduce AS path prepend on Link B.
  • Optionally add prepends on Link A.
  • Let traffic shift to Link B (10 Gbps).
  • This will relieve congestion on Link A.
  • It should also restore Internet access for users on 10.0.0.0/24.

Business Follow-Up:

“From a technical perspective, that works. But from a business point of view, Link A is now underutilized. If Link B has a higher cost per bps, this is inefficient.”

ChatGPT’s Response:

Aspect | Strategy
Technical QoS | Offload heavy prefixes like 11.0.0.0/24 to Link B
Cost Control | Keep lighter prefixes like 10.0.0.0/24 on Link A to avoid overusing B
Business Fit | Avoid paying premium for B unless capacity is needed
Scalable Option | Automate prepend adjustments using monitoring-based thresholds
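
The "monitoring-based thresholds" idea can be sketched as a small decision function. It only decides which prefix to shift. Actually changing the prepend on the routers is out of scope. The 85% and 70% thresholds, the names, and the 80 Mbps figure for 10.0.0.0/24 are assumptions I added, while the other numbers come from the scenario.

```python
from dataclasses import dataclass

@dataclass
class Link:
    name: str
    capacity_mbps: float
    prefix_mbps: dict  # prefix -> current traffic in Mbps

    @property
    def utilization(self):
        return sum(self.prefix_mbps.values()) / self.capacity_mbps

def suggest_move(congested, spare, high=0.85, low=0.70):
    """If `congested` is above `high`, propose moving its heaviest prefix to
    `spare`, provided `spare` would stay below `low` afterwards."""
    if congested.utilization <= high or not congested.prefix_mbps:
        return None
    prefix, mbps = max(congested.prefix_mbps.items(), key=lambda kv: kv[1])
    after = (sum(spare.prefix_mbps.values()) + mbps) / spare.capacity_mbps
    return prefix if after < low else None

# Scenario numbers; 80 Mbps for 10.0.0.0/24 is an assumed value.
link_a = Link("A", 1_000, {"10.0.0.0/24": 80, "11.0.0.0/24": 900})
link_b = Link("B", 10_000, {"12.0.0.0/24": 5_000})
print(suggest_move(link_a, link_b))
```

With these inputs the function picks 11.0.0.0/24, the same prefix ChatGPT suggested offloading, and leaves 10.0.0.0/24 on the cheaper link. A rule this simple needs no AI at all, which brings back the earlier question of whether existing tools already meet the requirement.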

Yes, that’s a great example of how AI can analyze your needs if you provide enough detail and criteria.

The remaining concern is AI’s operational cost and power consumption.

This topic definitely deserves more discussion—SD-WAN + AI?

#BGP #bgptrace #AI #chatGPT #AS #ASPrepend #path #link #transit #INTERNET #SDWAN

Enhance Internet performance by using the right Public DNS servers

If you are thinking about how to enhance your Internet performance, subscribing to a higher-bandwidth Internet service may seem like the obvious step. But is it the right way?

Not really.

Increasing bandwidth cannot shorten the latency between you and the destination server, and you cannot control your provider’s network path.

In current web server deployments, using a Content Delivery Network (CDN) to deliver content is common. However, your network provider’s DNS server may not return the optimal server for the requested domain, so you may not get the lowest latency between you and the server you are requesting.

How to?

You can do a small measurement yourself: query several public DNS servers for the same domain, for example 8.8.8.8 (Google) and 1.1.1.1 (Cloudflare). The responses may not be the same. Then run a simple ping test against each returned address and record which one has the lowest latency. Finally, you can set up a BIND server that forwards queries for that domain to the DNS server that gave the better result.
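
A minimal sketch of that measurement using only the Python standard library. It sends a plain DNS A query over UDP to each resolver and, as a stand-in for ping (ICMP needs raw sockets), times a TCP connect to port 443 on each returned address. The helper names are my own, and the parser only handles simple responses with no TCP fallback and no IPv6.

```python
import socket
import struct
import time

def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record, with recursion desired."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.strip(".").split(".")
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels)
    return header + qname + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def _skip_name(msg, pos):
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[pos]
        if (length & 0xC0) == 0xC0:  # compression pointer: 2 bytes, ends the name
            return pos + 2
        if length == 0:              # root label ends the name
            return pos + 1
        pos += 1 + length

def parse_a_records(msg):
    """Extract IPv4 addresses from the answer section of a DNS response."""
    _id, _flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", msg[:12])
    pos = 12
    for _ in range(qdcount):
        pos = _skip_name(msg, pos) + 4  # skip QTYPE and QCLASS
    addrs = []
    for _ in range(ancount):
        pos = _skip_name(msg, pos)
        rtype, _rclass, _ttl, rdlen = struct.unpack("!HHIH", msg[pos:pos + 10])
        pos += 10
        if rtype == 1 and rdlen == 4:   # A record
            addrs.append(socket.inet_ntoa(msg[pos:pos + 4]))
        pos += rdlen                    # other types (e.g. CNAME) are skipped
    return addrs

def resolve_via(resolver, name, timeout=2.0):
    """Ask one specific DNS server for the A records of `name`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_query(name), (resolver, 53))
        data, _ = s.recvfrom(4096)
    return parse_a_records(data)

def connect_ms(ip, port=443, timeout=2.0):
    """TCP connect time in milliseconds, used here instead of ICMP ping."""
    start = time.perf_counter()
    with socket.create_connection((ip, port), timeout=timeout):
        return (time.perf_counter() - start) * 1000

def compare(name, resolvers=("8.8.8.8", "1.1.1.1")):
    """Return {resolver: [(ip, connect_ms), ...]} for each resolver."""
    return {r: [(ip, connect_ms(ip)) for ip in resolve_via(r, name)] for r in resolvers}
```

Calling compare("www.example.com") from your own connection returns the addresses each resolver handed out together with their connect times. Run it several times and at different hours before picking a forwarder, because CDN answers and path latency both change over time.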

See our work for reference. Feel free to discuss.

https://www.bgptrace.com/DNS/running_result.html

#DNS #Internet #Measurement #Ping #Latency #CloudFlare #Google #1.1.1.1 #8.8.8.8