AI Performance and Latency: How Infrastructure and Networks Matter

I’ve been thinking about this topic for a few days. I believe it’s a huge issue that spans multiple sectors: infrastructure, server cluster controllers, content delivery networks (CDNs), and even the physical distance between users and AI server farms.

Physical Distance Between AI Infrastructure and Users
We need to understand how we use AI Large Language Models (#LLM) like #ChatGPT or #Grok. Users access the web or a mobile application and input a query into the AI LLM. The AI server farm then processes the query and sends back a response to the user.

In this interaction, how far away is the user from the AI infrastructure? And how much latency does this distance create?

We might think about whether a CDN could help deliver responses (via caching) to users in order to reduce latency. However, replies and generated images are often dynamic content, which typically cannot be cached effectively. Of course, there are techniques like WAN acceleration, byte caching, or session caching that can improve efficiency to some extent.
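To make the caching limitation concrete, here is a minimal sketch (all names are illustrative, not any CDN's actual API) of an exact-match response cache: a stored reply can only be reused when the model and prompt match byte-for-byte, which almost never happens with free-form LLM queries.

```python
import hashlib

class ResponseCache:
    """Toy exact-match cache keyed on (model, prompt). A real gateway cache
    would also key on sampling parameters and user context."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, model: str, prompt: str) -> str:
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get(self, model: str, prompt: str):
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        return None

    def put(self, model: str, prompt: str, response: str):
        self._store[self._key(model, prompt)] = response

cache = ResponseCache()
cache.put("llm-x", "What is BGP?", "BGP is the Border Gateway Protocol...")
assert cache.get("llm-x", "What is BGP?") is not None  # identical prompt: hit
assert cache.get("llm-x", "What is BGP!") is None      # one character off: miss
```

One character of difference produces a different key, so cache hit rates on generative replies stay very low.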

That said, we all know CDNs still provide important functionality beyond caching, such as DDoS protection and DNS services.

According to reports, the top six AI infrastructure hubs in the world are located in the United States, Canada, Norway and Sweden, Singapore, India, and the United Arab Emirates [https://www.rcrwireless.com/20250501/fundamentals/top-ai-infrastructure]. That said, users generally do not know where the queried LLM model is actually running. To determine this, deeper measurements—such as DNS queries and web interaction analysis—would be required. Regardless, long-distance traffic transmission between regions is still unavoidable.

Internet Infrastructure and Edge Computing
If the above assumption is correct, what role can Internet infrastructure and Tier 1 network operators play? Additional network deployment will certainly be required to support the high bandwidth demands of AI usage. However, this is not only a matter of bandwidth — latency is equally important.

In the earlier era of the web and HTTP content delivery, the processing requirements were relatively lightweight. A cache server could be deployed at the “edge” to serve static or semi-dynamic web content for multiple users in a given region. This model works well for traditional CDNs.

For AI, however, the situation is different. While lightweight LLMs might be deployed on edge computing nodes—provided those nodes have sufficient processing capability—large-scale AI infrastructure is far more demanding. Deploying an AI system with thousands of GPUs worldwide is not straightforward. Beyond hardware availability, challenges include massive power consumption and the cooling requirements of such clusters, which vary greatly depending on the region.

It’s an interesting question: how do we bring the scale of AI infrastructure closer to the edge while balancing efficiency, cost, and sustainability?

A CDN provider has already taken action in collaboration with several AI companies. The AI Gateway or MCP Gateway could help address those technical questions. Let’s take a look at the latest update:
[https://blog.cloudflare.com/ai-gateway-aug-2025-refresh/] [https://blog.cloudflare.com/how-cloudflare-runs-more-ai-models-on-fewer-gpus/?utm_campaign=cf_blog&utm_source=linkedin&utm_medium=organic_social&utm_content=20250827/] #20250827

AI Infrastructure, Controllers and Network Nodes
Building on the above points, it’s important to note that most AI infrastructure today is already quite mature. The robust frameworks developed during the Cloud Computing era—such as VXLAN, Kubernetes, and other cluster controllers and load balancers—provide a strong foundation for managing large-scale AI workloads.

Major cloud providers like Google, AWS, and Microsoft have already demonstrated the stability and scalability of these architectures, showing how well-established the underlying infrastructure has become.

Managing a stable cluster node controller capable of handling millions of AI requests is truly an art of operations. It requires careful consideration of latency at every level, including the low-level hardware channels.

When a request is processed, the load balancer sends it to the appropriate nodes through the infrastructure’s network devices and routing protocols. The nodes then process the request, consolidate the results, and send the final response back to the user. Each of these steps is interrelated, and optimizing them is crucial for delivering fast, reliable AI responses.
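As a toy illustration of the dispatch step, here is a least-connections picker, one common load-balancing policy (the node names are invented, and a real balancer would also weigh GPU memory and queue depth):

```python
class LeastConnectionsBalancer:
    """Toy least-connections dispatcher for AI worker nodes."""

    def __init__(self, nodes):
        # Track how many requests each node is currently handling.
        self.active = {node: 0 for node in nodes}

    def dispatch(self, request_id):
        # Pick the node with the fewest in-flight requests.
        node = min(self.active, key=self.active.get)
        self.active[node] += 1
        return node

    def complete(self, node):
        # Called when a node finishes a request.
        self.active[node] -= 1

lb = LeastConnectionsBalancer(["gpu-node-1", "gpu-node-2", "gpu-node-3"])
assignments = [lb.dispatch(i) for i in range(3)]
# Three requests spread across all three idle nodes.
assert sorted(assignments) == ["gpu-node-1", "gpu-node-2", "gpu-node-3"]
```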

Some researchers have started studying how to reduce latency in these processes—and how much latency is actually acceptable. This spans multiple layers: operating systems, RAM, GPU channels, network connections, network protocols, control and data planes, fiber channels, and more.

If all distributed nodes could be interconnected with specialized high-speed links—such as PCIe, ASIC boards, Scalable Link Interface (SLI), or CrossFire links—what would that mean for latency? It’s an interesting thought experiment worth exploring. In the latest Nvidia blog, the Quantum-X InfiniBand Photonics was introduced as an integrated fiber-optic switch, which reduces the electrical-to-optical conversion between the switch and the fiber at the optical transceiver. [https://developer.nvidia.com/blog/scaling-ai-factories-with-co-packaged-optics-for-better-power-efficiency/?ncid=so-link-416360&linkId=100000378640319]. I believe that while it may initially be used only in AI infrastructure, one day it could become essential for every Tier 1 network infrastructure.

Open discussion.

Can AI Replace Network Engineers? A Real-World Perspective from the Trenches

Today, I watched the latest YouTube video by NetworkChuck.
He discussed how AI is affecting network engineers. It was really interesting — and it motivated me to continue my journey as a network engineer.

Of course, I’m facing some difficulties right now.
If you have any open positions for network engineers, please feel free to message me anytime. I’d really appreciate it.

I’m currently based in the UK.
Yes, I could join Amazon — I mean, Amazon Flex.
No joking. I’m serious.


Right now, I’m especially interested in data center networking and hyperscale networking for AI.
Regardless of the use case — whether it’s business, research, or something more creative like cryptocurrency — I find these areas fascinating.

Also, Internet route checking (RPKI again? Yes, definitely) is something I want to explore further.


On the topic of device deployment — automation, zero-touch provisioning, scripting, etc.:

Yes, scripting in Bash or Perl (SSH into the device, wait for a prompt, automate the commands) has been used for a long time. But today, there are many modern approaches: agents, tailored remote access methods, tunneling deployments, and more.

How do they compare? Which method is best?
Ansible? Python scripting? YAML? Network programming?

And… you prepare every configuration manually? (No way — maybe for 10,000+ devices?!)
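For configuration preparation at that scale, the usual answer is template-driven generation: one template, one inventory, thousands of rendered configs. A minimal stdlib sketch (the template text and inventory fields are illustrative, not any vendor's exact syntax):

```python
from string import Template

# Hypothetical device config template; variable names are illustrative only.
TEMPLATE = Template("""hostname $hostname
interface $uplink
 ip address $ip 255.255.255.0
 no shutdown""")

def render_configs(inventory):
    """Render one config per device dict; the same loop scales to 10,000+ devices."""
    return {dev["hostname"]: TEMPLATE.substitute(dev) for dev in inventory}

inventory = [
    {"hostname": "edge-sw-001", "uplink": "GigabitEthernet0/1", "ip": "10.1.0.1"},
    {"hostname": "edge-sw-002", "uplink": "GigabitEthernet0/1", "ip": "10.1.0.2"},
]
configs = render_configs(inventory)
assert "hostname edge-sw-001" in configs["edge-sw-001"]
assert len(configs) == len(inventory)
```

In practice, tools like Ansible do exactly this with Jinja2 templates, but the principle is the same.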


OK, let’s step back. Here’s how it usually goes:

  1. Gather and analyze all requirements
  2. Define the needed solutions
  3. Prepare configurations
  4. Plan deployment and coordinate across multiple parties

So the question is: what can AI help with?

I asked ChatGPT about step 3 — configuration preparation and the best automation method.
It gave me this:

Use Case                  | Best Tool
--------------------------|-----------
First-time provisioning   | 🟠 ZTP
Day-2 config, updates     | 🟢 Ansible
Dynamic runtime scripting | 🔵 Python

You don’t have to choose just one — combining them leads to full end-to-end automation.

Great! But… what about the rest of the steps to complete a full solution?
Can AI help us with the entire plan?

I think this is how you should approach it:

  1. Analyze the requirements in detail
  2. Identify what the solution needs
  3. Prepare the configurations
  4. Design the deployment plan

You’d need to provide AI with a comprehensive scope of work, and then ask it to generate a plan.


But can you trust it?

YOU need to verify all outputs from AI.
YOU need to plan the project timeline and human resources.
YOU need to review the configurations generated by AI carefully.

If you don’t know networking, project planning, or resource management — how would you even do that?

That’s why:
YOU = Network Engineer, Network Project Manager

We’re always learning new things. AI is becoming a great assistant, but right now, we shouldn’t fully trust it without verification.


As I said before:
Can AI operate the Internet?
Well… that depends on how the AI is designed to manage networks, and which parts of the network it’s capable of operating.

Discussion continues…

Can AI Operate the Internet?

Skynet? No, no Terminator, please.

Just joking—don’t jump to the movie scene right away. Let’s consider the idea of AI operating a network, similar to how automation handles network device configuration and zero-touch provisioning. Is it possible?

Let’s start with automation. A device comes with an initial configuration that includes an IP address and gateway. Once it’s network-accessible, the central server recognizes the device and pushes the final configuration. That’s a simple flow, right?
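That flow can be sketched as a tiny lookup on the central server; the serial numbers and config snippets below are made up for illustration:

```python
# Toy ZTP lookup: the central server maps a device identifier (serial or MAC)
# to its final configuration. All names and fields are illustrative assumptions.
FINAL_CONFIGS = {
    "SN-0001": "hostname core-r1\nrouter bgp 65000\n",
    "SN-0002": "hostname core-r2\nrouter bgp 65000\n",
}

def on_device_checkin(serial: str):
    """Called when a freshly booted device phones home with its serial number."""
    config = FINAL_CONFIGS.get(serial)
    if config is None:
        return ("unknown-device", None)  # quarantine / flag for manual review
    return ("push-config", config)

action, config = on_device_checkin("SN-0001")
assert action == "push-config" and "core-r1" in config
assert on_device_checkin("SN-9999")[0] == "unknown-device"
```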

But what happens next—who operates the device?
Today, it’s still a human network operator.

Could AI replace the network operator? It depends on how the AI model is trained, how it handles fault tolerance, and how it avoids operational errors. There’s no such thing as a 100% guarantee.

However, using AI to operate parts of the network—such as traffic tuning, AS path changes, or metric modifications—seems realistic.

AI can assist with tasks like network measurements, route and traffic tuning, DDoS detection and auto-mitigation (which already exists), device fault detection and traffic redirection, and DNS updates. That all seems feasible. You could train an AI module specifically for this purpose.
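As a toy illustration of the auto-mitigation idea, here is a volumetric check against a traffic baseline; the 5x threshold and the action name are my own assumptions, and real DDoS detection uses far richer signals:

```python
def detect_and_mitigate(samples_bps, baseline_bps, factor=5.0):
    """Toy volumetric-DDoS check: flag any sample exceeding `factor` times
    the baseline and return a (time, action) list. Threshold is assumed."""
    actions = []
    for t, bps in enumerate(samples_bps):
        if bps > factor * baseline_bps:
            actions.append((t, "redirect-to-scrubbing"))
    return actions

# Baseline 1 Gbps; the 8 Gbps spike at t=2 trips the 5x threshold.
actions = detect_and_mitigate([1.2e9, 0.9e9, 8.0e9, 1.1e9], baseline_bps=1.0e9)
assert actions == [(2, "redirect-to-scrubbing")]
```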

But then again—do we need AI if existing technologies already meet these requirements?

Let’s consider a scenario and see what AI (specifically, ChatGPT) can do.


Scenario:

  1. An Internet provider has 2 transit links:
    • Link A: 1 Gbps
    • Link B: 10 Gbps
  2. The subnets 10.0.0.0/24 and 11.0.0.0/24 (AS 65000) are advertised to Link A.
  3. The same subnets, but with AS path prepending (65000, 65000), plus 12.0.0.0/24 (65000), are advertised to Link B.
  4. Link A is congested—11.0.0.0/24 is using nearly 900 Mbps.
  5. Link B’s total usage is just 5 Gbps.
  6. A customer on 10.0.0.0/24 complains they cannot access the Internet.

ChatGPT’s Suggested Operation:

Summary:

  • Remove or reduce AS path prepend on Link B.
  • Optionally add prepends on Link A.
  • Let traffic shift to Link B (10 Gbps).
  • This will relieve congestion on Link A.
  • It should also restore Internet access for users on 10.0.0.0/24.

Business Follow-Up:

“From a technical perspective, that works. But from a business point of view, Link A is now underutilized. If Link B has a higher cost per bps, this is inefficient.”

ChatGPT’s Response:

Aspect          | Strategy
----------------|----------
Technical QoS   | Offload heavy prefixes like 11.0.0.0/24 to Link B
Cost Control    | Keep lighter prefixes like 10.0.0.0/24 on Link A to avoid overusing B
Business Fit    | Avoid paying premium for B unless capacity is needed
Scalable Option | Automate prepend adjustments using monitoring-based thresholds

Yes, that’s a great example of how AI can analyze your needs if you provide enough detail and criteria.
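The "automate prepend adjustments" option can be sketched as a small threshold policy. The 80%/60% thresholds and the one-prepend-at-a-time step are my own assumptions, not anything ChatGPT or a vendor specifies; a real system would re-render and push BGP policy after each decision:

```python
def adjust_prepends(link_util, link_capacity, prepends, high=0.8, low=0.6):
    """Toy threshold policy (assumed thresholds): add one prepend on any link
    above `high` utilization, remove one on any link below `low`.
    Returns the new prepend counts per link."""
    new = dict(prepends)
    for link, used in link_util.items():
        ratio = used / link_capacity[link]
        if ratio > high:
            new[link] += 1                    # make the congested link less attractive
        elif ratio < low and new[link] > 0:
            new[link] -= 1                    # pull traffic back toward this link
    return new

# Scenario above: Link A (1 Gbps) carrying 0.9 Gbps, Link B (10 Gbps) carrying 5 Gbps.
util = {"A": 0.9, "B": 5.0}
cap = {"A": 1.0, "B": 10.0}
new = adjust_prepends(util, cap, {"A": 0, "B": 2})
assert new == {"A": 1, "B": 1}  # prepend congested A, un-prepend underused B
```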

The remaining concern is AI’s operational cost and power consumption.

This topic definitely deserves more discussion—SD-WAN + AI?

#BGP #bgptrace #AI #chatGPT #AS #ASPrepend #path #link #transit #INTERNET #SDWAN

Model Training on AMD 16-core CPU with 8GB RAM running in a virtual machine for Bitcoin Price Prediction – Part 2 – Updated

Continuing with Over 500,000 Data Points for Bitcoin (BTC) Price Prediction

Using the Python program, the first method I tried was SVR (Support Vector Regression) for prediction. However… how many steps should I use for prediction? 🤔
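What the number of steps actually controls is the length of the input window fed to the model: each training sample is `steps` past prices predicting the next one. A minimal stdlib sketch:

```python
def make_windows(prices, steps):
    """Build supervised pairs: `steps` past prices -> the next price.
    This is the window length that the "steps" setting controls."""
    X, y = [], []
    for i in range(len(prices) - steps):
        X.append(prices[i:i + steps])
        y.append(prices[i + steps])
    return X, y

prices = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]
X, y = make_windows(prices, steps=3)
assert X[0] == [100.0, 101.0, 102.0] and y[0] == 103.0
# More steps = more context per sample, but fewer samples and slower training.
assert len(make_windows(prices, steps=5)[0]) < len(X)
```

That trade-off is why the 120-step run took so much longer than the 60-step run.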

Previously, I used a Raspberry Pi 4B (4GB RAM) for prediction, and… OH… 😩
I don’t even want to count the time again. Just imagine training a new model on a Raspberry Pi!

So, I switched to an AMD 16-core CPU with 8GB RAM running in a virtual machine to perform the prediction.

  • 60 steps calculation: took 7 hours 😵
  • 120 steps: …man… still running after 20 hours! 😫 Finally finished: 33 hours!

Do I need an M4 machine for this? 💻⚡

ChatGPT provided another approach.
OK, let’s test it… I’ll let you know how it goes! 🚀

🧪 Quick Example of More Time Steps Effect

Time Step (X Length) | Predicted Accuracy | Notes
---------------------|--------------------|-----------------------------------------------
30                   | ⭐⭐⭐                | Quick but less accurate for long-term trends.
60                   | ⭐⭐⭐⭐               | Balanced context and performance.
120                  | ⭐⭐⭐⭐½              | Better for long-term trends but slower.
240                  | ⭐⭐                 | Risk of overfitting and slower training.

#SVR #Prediction #Computing #AI #Step #ChatGPT #Python #Bitcoin #crypto #Cryptocurrency #trading #price #virtualmachine #vm #raspberrypi #ram #CPU #CUDA #AMD #Nvidia

Model Training Using TensorFlow on Raspberry Pi 4B (4GB RAM) for Bitcoin Price Prediction

The development of a CRYPTO gaming system https://www.cryptogeemu.com/ has been ongoing for around two years. What does it actually do? Well… just for fun!

The system captures data from several major crypto market sites to fetch the latest price list every minute. It then calculates the average values to determine the price. Users can create a new account and are given a default balance of $10,000 USD to buy and sell crypto—but there’s no actual real-market trading.
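The averaging step can be sketched in a few lines; the site names are made up, and a production system would also discard stale or outlier quotes:

```python
def consensus_price(quotes):
    """Average the latest quotes from several market sites (names are illustrative)."""
    return sum(quotes.values()) / len(quotes)

quotes = {"site-a": 95560.0, "site-b": 95575.0, "site-c": 95565.0}
# Simple mean of the three latest quotes.
assert abs(consensus_price(quotes) - 95566.6667) < 1e-3
```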

The Thought Process

Suddenly, I started wondering:
How can I use this kind of historical data? Can I make a prediction?

So, I simply asked ChatGPT about my idea. I shared the data structure and inquired about how to perform predictions.

ChatGPT first suggested using Linear Regression for calculations. However, the predicted values had a large difference compared to the next actual data point.

Next, it introduced me to the Long Short-Term Memory (LSTM) method for training under the TensorFlow library.

I fed 514,709 lines of BTC price data into the training program on a Raspberry Pi 4B (4GB RAM).
The first run took 7 hours to complete the model!

But the result… um… 😐

I’m currently running the second round of training. I’ll update you all soon!

Sample Data:

YYYY/MM/DD-hh:mm:ss  Price  
2025/02/17-20:06:09 95567.20707189501
2025/02/17-20:07:07 95582.896334665
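For completeness, here is a small stdlib parser for records in that format (assuming whitespace-separated fields, as in the sample above):

```python
from datetime import datetime

def parse_line(line: str):
    """Parse one 'YYYY/MM/DD-hh:mm:ss  Price' record into (datetime, float)."""
    ts_str, price_str = line.split()
    return datetime.strptime(ts_str, "%Y/%m/%d-%H:%M:%S"), float(price_str)

ts, price = parse_line("2025/02/17-20:06:09 95567.20707189501")
assert ts.year == 2025 and ts.minute == 6
assert abs(price - 95567.20707189501) < 1e-9
```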

P.S.: I’m not great at math. 😅

#BTC #Bitcoin #TensorFlow #AI #CryptoGeemu #RaspberryPi #Training #Crypto #ChatGPT #LinearRegression #LSTM #LongShortTermMemory

AI Network Operator – under Deepseek case

We all know how successful Deepseek has been in recent months. It demonstrates that a low-processing-power, CPU-based AI is possible. Adopting this type of AI anywhere, including IoT devices or even routers, could be feasible.

Cisco, Juniper, Arista, and other network device manufacturers already produce hardware with high processing power. Some of these devices run Linux- or Unix-based platforms, allowing libraries and packages to be installed on the system. If that’s the case, can AI run on them?

Based on Deepseek’s case, tests have shown that an ARM Linux-based Raspberry Pi can successfully run AI. Although the response time may not meet business requirements, it still functions.

Running AI on a router (perhaps within the control plane?) could enable AI to control and modify router configurations. (Skynet? Terminator?) But then, would the AI become uncontrollable?

There are several key questions to consider:

  1. What can AI do on routers and firewall devices?
  2. Can AI self-learn the network environment and take further control?
  3. Can AI troubleshoot operational issues?

It seems like an interesting topic for further research; at this point, teaching AI the basics of network operations no longer looks like the major hurdle.

Paragraph proofreading by #ChatGPT

AI Picture generated by #CANVA

#AI #Network #internet #networkoperation #operation #IP #Router #RaspberryPI #PI #Cisco #Juniper #Arista #opensource #BGP #routing

Install and Run OLLAMA on Linux Machine

Many tech people have already shared how to install Ollama, so I won’t go into too much detail. Here are the brief steps for you.

  1. Prepare a machine with a good GPU, CPU, and more than 16 GB of RAM. (A Raspberry Pi can run DeepSeek 1.5B; for others, please check my last post.)
  2. Update your Linux repos:
    sudo apt-get update -y
    sudo apt-get upgrade -y

  3. Install Ollama with the following command:
    curl -fsSL https://ollama.com/install.sh | sh
  4. Run the LLM model. If you don’t have the model on your machine, it will be downloaded automatically:
    ollama run <model>
    e.g.: ollama run deepseek-r1:8b

  5. The model will be downloaded to /usr/share/ollama/.ollama/models/
  6. Which models can you run? Check here:
    https://ollama.com/search
  7. The Ollama command line is a little similar to Docker’s.
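Beyond the CLI, Ollama also serves a local REST API on port 11434, which is handy for scripting. A minimal stdlib sketch of a non-streaming generate call (it assumes the server is running and the model is already pulled):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local API

def build_request(prompt, model="deepseek-r1:8b"):
    """Build the JSON body for a non-streaming /api/generate call."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(prompt, model="deepseek-r1:8b"):
    """Send the request to a locally running Ollama server (requires the server up)."""
    data = json.dumps(build_request(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

body = build_request("Why is the sky blue?")
assert body["model"] == "deepseek-r1:8b" and body["stream"] is False
# ask_ollama("Why is the sky blue?")  # uncomment with the server running
```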

PS: You can also install Ollama on Windows; please check the Ollama website.

Let’s try your own AI locally!

#OLLAMA #Model #AI #CPU #GPU #CUDA #RAM #RaspberryPI #Docker

Raspberry Pi 4B 4GB RAM with DeepSeek 8B…

The answer is… fail.

I cannot run DeepSeek 8B on my Raspberry Pi 4B 4GB RAM version.

Ollama keeps loading, crashing, and loading again… I suspect the storage is full.

Anyone?

2025-02-06

Hooray!!! Mapped the model directory to a shared drive.

ERROR!!! Not enough memory!!!

#AI #deepseek #raspberrypi #R1-8B #memory #ram #ERROR

Deepseek 1.5b vs 8b version

Well, we all expect that the 1.5B and 8B versions differ in knowledge.

We ran a test:
1. 1.5B on a Raspberry Pi 4B with 4GB RAM.
2. 8B on a virtual machine with an AMD Radeon GPU and 16GB RAM, running Ubuntu.

We asked only one question:

“what is the difference between you and chatGPT”

  • 1. The 1.5B version’s response

  • 2. The 8B version’s response

Of course, the 8B version’s knowledge base will be better. However, our main concern is resource usage: can the Raspberry Pi’s CPU process this efficiently?

#deepseek #AI #CPU #raspberrypi #GPU #nvidia #CUDA #AMD

Deepseek on Raspberry PI?????

Tech people are interested in how AI and LLM models perform on IoT and other low-power devices, such as the Raspberry Pi.

But??!!!!

NO GPU!!!!!!!!!!!!

How to run the AI model????

OK, we don’t want to talk about how to install and run Ollama here.

We have tried the 1.5B version of DeepSeek on our Pi 4 4GB RAM device.

Amazingly, it works! However, you can’t expect the response time and token rate to be good enough for a fast response.

Given this success, we can imagine many other models running on CPU-based IoT devices. So, will home assistants adopt them widely?

Let’s see…