A Queueing Theory-Based Dynamic Load Balancing Algorithm for Optimizing Software-Defined Networking Performance in Homogeneous and Heterogeneous Environments

Maghrib Abidalreda Maky Alrammahi^* | Mohanad Yahya Al-hamami | Ali Mohammed Taher

Information Technology Research and Development Centre, University of Kufa, Najaf 54001, Iraq

Department of Mathematics, Faculty of Basic Education, University of Kufa, Najaf 54001, Iraq

Corresponding Author Email:

maghrib.alramahi@uokufa.edu.iq

Received:

6 March 2026

Revised:

3 May 2026

Accepted:

19 May 2026

Available online:

31 May 2026

jesa_59.05_22.pdf

OPEN ACCESS

Abstract:

Software-Defined Networking (SDN) is one of the most prominent modern technologies in network management because it separates the control plane from the data plane, enabling improved performance and more dynamic resource management. Load balancing among servers is a major challenge in SDN networks because it directly affects service quality and network efficiency. To address this issue, a Dynamic Load Balancing based on Queueing Theory (DLBQT) algorithm is proposed to distribute traffic among servers with the aim of improving network performance. Simulations were conducted in Mininet with a Python OpenFlow eXtensible (POX) SDN controller using real traffic traces from Facebook data centers. Three representative load levels were evaluated: 1,000, 100,000, and 1,000,000 requests. For each algorithm and environment (homogeneous and heterogeneous), the experiments were repeated three times under identical request sequences, and the performance metrics were averaged. The reported percentage gains in packet loss, response time, and throughput are computed using a standard percentage-improvement metric over the aggregated results across the three load levels. Within the evaluated SDN setup, DLBQT reduces packet loss and response time by up to 37.5% and 8.61%, respectively, and improves throughput by 6.61% in the homogeneous environment, while in the heterogeneous environment, the corresponding improvements reach 39.2%, 11.4%, and 18.4%, respectively.

Keywords:

Software-Defined Networking, load balancing, Least Central Processing Unit, Least Random Access Memory, Least Connection, Python OpenFlow eXtensible, Mininet, queueing theory

1. Introduction

Software-Defined Networking (SDN) is a recent development in network management, providing more flexibility since it decouples the data plane and the control plane. This division allows programmability in the control center, optimum utilization of resources, and better performance of the network [1]. The SDN architecture comprises three main layers. The application layer defines the network policies and services. The control layer contains the network controller, which processes traffic and decides how to route flows. The data plane consists of forwarding devices that route traffic according to the controller’s instructions [2].

Despite these benefits, load balancing remains one of the most significant challenges in SDN [3]. Load should be distributed across servers efficiently to minimize congestion and packet loss, shorten response times, and increase the data transfer rate across servers [4]. Network environments are either homogeneous or heterogeneous [5].

In a homogeneous environment, servers have similar hardware specifications, such as Central Processing Unit (CPU) and Random Access Memory (RAM) capacity. In contrast, a heterogeneous environment contains servers with different CPU and memory capacities [6]. Heterogeneous environments are closer to real-world deployments, where resource variability helps accommodate diverse workloads and optimize utilization.

Load balancing techniques can be classified as either static or dynamic [7]. In static algorithms, the decision parameters are fixed during execution and do not change in response to real-time network conditions. As a result, static schemes cannot adapt to dynamic traffic and often lead to inefficient resource usage [8]. Conversely, dynamic algorithms use current measurements of server load, response time, and traffic status to make decisions, and can therefore be more adaptive and offer higher performance [9]. Previous work has shown that dynamic approaches are more effective, especially in heterogeneous environments.

This paper proposes a load balancing algorithm, namely Dynamic Load Balancing based on Queueing Theory (DLBQT), to optimize load distribution in SDN networks. The algorithm is designed to achieve more accurate decision-making and a balance between performance and decision complexity.

This research included the following key contributions:

- Proposal of a dynamic DLBQT algorithm based on the queueing theory concept to improve load balancing in SDN networks.

- Evaluation of the algorithm in homogeneous and heterogeneous environments using Mininet and the Python OpenFlow eXtensible (POX) controller.

- Comparing performance with the Least CPU (LCPU), Least Connection (LC), and Least RAM (LRAM) algorithms using packet loss, response time, and throughput.

- Providing a practical solution to improve the performance of dynamic networks and reduce congestion.

The remainder of this paper is organized as follows. Section 2 reviews previous studies in the field of load balancing. Section 3 provides an explanation of performance metrics and the improvement method. Section 4 explains the proposed methodology, flowchart, and general diagram. Section 5 explains and describes the proposed network and discusses the results. Section 6 discusses and analyses the work achieved for the proposed algorithm. Finally, Section 7 presents the conclusion and future work.

The main abbreviations used throughout this paper are summarized in Table 1 for ease of reference.

Table 1. Abbreviations used in the paper

Abbreviation	Description
SDN	Software-Defined Networking
DLBQT	Dynamic Load Balancing based on Queueing Theory
LC	Least Connection
LCPU	Least CPU
LRAM	Least RAM
QoS	Quality of Service
CI	Confidence Interval
SD	Standard Deviation
MM1	M/M/1 queueing model
R	Number of experimental runs per configuration

2. Related Work

This section reviews previous studies on dynamic load balancing in SDN networks, their key techniques, results, and weaknesses.

In the study [10], the Adaptive Lowest Load Ratio (ALLR) algorithm was proposed, which estimates server load through monitoring and adaptive computations. Compared with the Dynamic Least Connection (DLC) and Dynamic Least Bandwidth (DLB) algorithms, it achieved better response time (13.37%) and throughput (8%) with low CPU consumption. This method supports decentralized decision-making, but it relies on periodic updates, which can cause problems when the load is high.

In the study [11], a dynamic weighting method for the Weighted Round Robin (WRR) algorithm was proposed that requires fewer communications. This method adjusts the weights based on the actual data rate, which enhances load balancing performance and resource consumption. The findings showed improvements in response time and packet loss; however, the optimal weights are hard to compute because of unexpected traffic variations, which can make the algorithm inefficient and affect overall network reliability.

In the study [12], the Load Balancing based on Resource Utilization (LBORU) algorithm was developed for a hybrid SDN network, using several measurements, including CPU, I/O, and network speed. The algorithm achieved high load balancing while maintaining a CPU variance of 10%. Although it offers better resource use and lower latency, it is more complex and has not been demonstrated to be effective in practice, particularly in real-world scenarios where varying workloads and network conditions can affect performance.

In the study [13], a dynamic load balancing algorithm for SDN data centers was implemented and evaluated using Mininet and the Floodlight controller. This method diverts data flows to less loaded routes, improving throughput (by as much as 50%) and response time. Nevertheless, routing decision overhead and controller load can increase when network state updates are frequent, which may lead to increased latency and reduced overall performance of data center operations.

In the study [14], Enhanced_Conn, an extension of the LC algorithm that combines dynamic and static factors, was proposed. It showed better load balancing and shorter response times than the RR and LC algorithms. Although it is efficient, its need for continuous data processing complicates its computation, particularly when compared with the static RR and LC algorithms, which do not require continuous data processing.

In the study [15], four dynamic load balancing algorithms, namely LRAM, LCPU, Least Connection-Least CPU-RAM (LCLCPURAM), and Least CPU-RAM (LCPURAM), were proposed and compared with the well-known LC algorithm in a scenario with servers of different capacities. The experiments were performed with Mininet, an OpenFlow switch, and a POX controller. The findings revealed that the LCLCPURAM algorithm achieved a substantial improvement in latency and waiting time, although the LC algorithm had better service time. In general, the study demonstrated the efficacy of dynamic approaches for enhancing performance in a heterogeneous environment.

In the study [16], a deep learning-based multi-label classification method with an adaptive load balancing mechanism was proposed for distributed computing systems. The method is based on a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) architecture for feature extraction and traffic classification, and then a Deep Reinforcement Learning (DRL) agent is used to optimize the load distribution dynamically. The proposed model showed remarkable improvements: the average throughput rose from 85-115 requests/s to 120-150 requests/s, and the average latency fell from 95-120 ms to 65-95 ms. However, the model misclassifies some traffic types (such as video streaming), and the combination of several deep learning components makes the model more complex, potentially restricting its real-time use in resource-limited SDN environments.

In the study [17], DeepBalance, a DRL framework for dynamic load balancing in SDNs, was proposed. A Deep Q-Network (DQN) agent continuously observes real-time network states, such as link utilization, queue sizes, and flow statistics, and learns optimal routing policies using a multi-objective reward function, which consists of three components: load distribution (60%), congestion avoidance (30%), and throughput-to-latency efficiency (10%). Experimental results demonstrated that the variance of link utilization was reduced by 37%, the throughput was increased by 28%, and the latency was decreased by 42% compared to shortest-path routing. Despite its excellent performance, DeepBalance requires substantial training time and computation power, and the initial exploration period may result in sub-optimal early routing decisions. Additionally, the high state-action space complexity makes it challenging to scale up to hundreds of nodes.

In the study [18], a decentralized Quality of Service (QoS)-aware load balancer for edge and cloud environments, named QEdgeProxy, was proposed. The load balancing problem is formulated as a Multi-Player Multi-Armed Bandit (MP-MAB) problem with heterogeneous per-client QoS rewards. The load balancers work independently and use Kernel Density Estimation (KDE) to estimate the QoS success probability of each service instance while also using an adaptive exploration mechanism to cope with performance changes and non-stationary traffic patterns. The system outperformed the two baselines (proximity and reinforcement learning) in terms of per-client QoS satisfaction. This approach targets compute-continuum and IoT environments and is decentralized, unlike classical SDN server-side load balancing, and the bandit-based model does not offer the analytical guarantees of a queueing-theoretic formulation.

In summary, the dynamic load balancing algorithms presented in the literature (e.g., ALLR, WRR with dynamic weights, LBORU, Enhanced_Conn, and the LCPU/LRAM family) have many merits compared to static algorithms but still have some disadvantages in real SDN scenarios. Their limitations are as follows: (i) they are less accurate when sudden traffic bursts occur or when server capacities vary, because they rely on single or loosely coupled metrics (such as CPU or connections only); (ii) they have high control overhead and decision complexity because they need frequent state updates; and (iii) they have not been fully validated with realistic traffic traces in both homogeneous and heterogeneous environments.

In particular, the LCPU/LRAM family of algorithms introduced by reference [15] in 2026 represents a recent state-of-the-art baseline for dynamic server-side load balancing in SDN, demonstrating clear advantages over the classic LC strategy in heterogeneous environments. In this work, these drawbacks are explicitly addressed and empirically demonstrated by comparing the proposed DLBQT algorithm, which is based on queueing theory, with three widely used dynamic algorithms (LCPU, LC, and LRAM) under identical network topologies, traffic loads, and performance metrics (packet loss, response time, and throughput), as detailed in Sections 5 and 6. It should be noted that this comparison is conducted in terms of the algorithms’ mechanisms of operation, implementation, and efficiency in load distribution within the specific simulation environment of this research, without relying on the quantitative results, network design, or operating conditions reported in previous studies. Instead, the study is based on the network design proposed in this work, evaluated under two distinct environments (homogeneous and heterogeneous) and using a real-world dataset adopted for generating packets and requests. This ensures that the comparison provides a current and realistic assessment of the DLBQT algorithm’s ability to enhance SDN performance under practical network conditions.

The proposed DLBQT algorithm explicitly models each server as a queueing system, where the arrival rate (λ), service rate (μ), utilization (ρ), and average waiting time are continuously estimated in real time. Based on these parameters, the controller computes a weight for each server and always selects the server with the lowest weight for incoming requests, while assigning a high penalty weight when the utilization approaches saturation. This analytically grounded decision mechanism allows DLBQT to proactively avoid congestion and to distribute load more accurately than heuristics that rely only on CPU usage, RAM, or connection counts.

The recent state-of-the-art studies in SDN load balancing have continued to seek dynamic and hybrid schemes. Some of the publications provided better dynamic weighted round robin and hybrid dynamic-static algorithms that combine heuristic measurements and metaheuristics or intelligent controllers to enhance response time and throughput in SDN systems. Multi-parameter decision algorithms and load balancing at the controller level have been proposed in other work. In large-scale SDN deployments, researchers have used metrics such as CPU, memory, connection utilization, and QoS limitations to reallocate load across controllers or servers. Recent surveys and reviews (2018–2026) have also noted a trend towards multi-metric, QoS-conscious, and AI-assisted load balancing solutions, as well as the importance of accurate traffic modeling and realistic testing environments. Nevertheless, most of these methods still depend on empirically tuned metrics or heuristic combinations rather than an analytically founded queueing model to make decisions. They are often evaluated under limited traffic conditions or without a direct comparison between homogeneous and heterogeneous settings. In contrast to the above AI-driven and QoS-aware approaches, which rely on empirically trained models, heuristic reward functions, or bandit-based probabilistic policies, the proposed DLBQT algorithm provides an analytically grounded decision mechanism based on M/M/1 queueing theory. This allows for interpretable, closed-form weight computation in real time without requiring training data or exploration phases, while achieving competitive performance improvements in both homogeneous and heterogeneous SDN environments.

3. Performance Metrics and Improvement Method

This section defines the performance metrics used to measure network efficiency, the mathematical methods used in the calculations, and the method used to quantify improvement. Packet and request transmission in the simulation environment is driven by real-world data, which makes the evaluation of the proposed algorithm more relevant and realistic. Network analysis identifies three types of incoming requests (small, medium, and large) based on repetition and data volume [19]. Small requests have a low arrival rate, usually less than 1,000 requests per second, and add only a small increment to the system load. Medium requests have arrival rates between 1,000 and 499,999 requests per second and account for a large share of the network load. Large requests have arrival rates of 500,000 requests per second or more, reaching 1,000,000 requests per second or beyond. This categorization covers all possible request rates without gaps or overlap and represents the load levels used in the experiments (1,000, 100,000, and 1,000,000 requests). The classification is useful for distributing resources, enhancing performance, and minimizing congestion.
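The three request classes can be expressed as a small lookup, sketched here in Python. The function name is hypothetical, and treating fractional rates between 499,999 and 500,000 requests per second as medium is an assumption, since the text only gives integer bounds.

```python
def classify_request_rate(requests_per_second: float) -> str:
    """Map an arrival rate to the small/medium/large classes described in the text."""
    if requests_per_second < 0:
        raise ValueError("arrival rate must be non-negative")
    if requests_per_second < 1_000:
        # Small: fewer than 1,000 requests per second.
        return "small"
    if requests_per_second < 500_000:
        # Medium: 1,000 up to 499,999 requests per second
        # (assumption: fractional rates below 500,000 also count as medium).
        return "medium"
    # Large: 500,000 requests per second or more.
    return "large"


if __name__ == "__main__":
    for rate in (500, 100_000, 1_000_000):
        print(rate, classify_request_rate(rate))
```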

3.1 Percentage improvement

This metric is central to the evaluation, since it is the ratio used to compare the proposed DLBQT algorithm with the other load balancing algorithms. It quantifies how much the proposed algorithm outperforms each of them [20]. The comparison is based on the sum of the values over all tested load levels (1,000, 100,000, and 1,000,000 requests): the results obtained at each load level are summed into a total value for each algorithm. The average is then calculated for the proposed algorithm. For parameters where lower values are better (e.g., packet loss), the formula is defined in Eq. (1):

$P I=\frac{{Original\ Number}-{New\ Number}}{{Original\ Number}}\times 100$ (1)

where,

Original Number: Value recorded by the other algorithm.

New Number: Value recorded by the DLBQT algorithm.

In this paper, the percentage improvement values reported in the abstract and in the result tables correspond to this metric applied to the aggregated performance across the three evaluated load levels (1,000, 100,000, and 1,000,000 requests) for each pair of algorithms.

This technique is widely used in network performance analysis because it is simple and easy to measure; it summarizes performance by aggregating the outputs of different algorithms under various load conditions without the need for statistical normalization or weighting.
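A minimal Python sketch of Eq. (1) applied to values summed over the load levels is given below. The function names and the sample packet loss figures are made up for illustration and are not results from this paper; the sketch covers only the lower-is-better case that Eq. (1) describes.

```python
from typing import Sequence


def percentage_improvement(original: float, new: float) -> float:
    """Eq. (1): relative reduction of `new` with respect to `original`, in percent."""
    if original == 0:
        raise ValueError("original value must be non-zero")
    return (original - new) / original * 100.0


def aggregated_improvement(baseline: Sequence[float], dlbqt: Sequence[float]) -> float:
    """Apply Eq. (1) to the totals summed over all load levels."""
    if len(baseline) != len(dlbqt):
        raise ValueError("both algorithms need one value per load level")
    return percentage_improvement(sum(baseline), sum(dlbqt))


if __name__ == "__main__":
    # Hypothetical packet loss values (%) at three load levels, for illustration only.
    baseline_loss = [4.0, 6.0, 10.0]   # another algorithm, total 20.0
    dlbqt_loss = [3.0, 5.0, 7.0]       # DLBQT, total 15.0
    print(aggregated_improvement(baseline_loss, dlbqt_loss))
```

With these illustrative inputs the totals are 20.0 and 15.0, so the improvement is 25%.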

3.2 Real-world datasets

In this paper, real-world datasets were used in order to obtain a realistic and accurate evaluation of the proposed algorithm. The evaluation used network traffic statistics collected from Facebook data centers, as reported in [21]. To examine traffic patterns in an SDN load balancing simulation, the data was separated into 16 text files of real-world sizes. The file sizes ranged from 1 KB to 6.8 MB, covering light to heavy traffic loads, to enable a thorough test of network behavior. The 1 KB and 2 KB files represent light traffic, while the larger files, ranging from 196 KB to 6,836 KB, represent heavy traffic [22]. This distribution allows the proposed algorithm to be tested on diverse real-life network scenarios.

3.3 Performance metrics

The following fundamental measures are used to determine overall network performance: packet loss, response time, and throughput.

A) Packet Loss (PL): The percentage of data packets sent across the network that fail to arrive because of network congestion, drops during transmission, or interrupted connections. A high packet loss rate causes delays in execution and response time, resulting in poor network performance and QoS [23]. Optimal performance is attained when packet loss is low. This metric is expressed in Eq. (2) [24]:

$P L(\%)=\frac{\mathrm{N}_{ {lost }}}{\mathrm{N}_{ {sent }}} \times 100$ (2)

where,

PL (%): Packet Loss.

N_lost: Number of lost packets.

N_sent: Total number of packets sent.

B) Response Time (RT): The time, in seconds, that a system takes to respond to a request. It begins when the request is sent and ends when the response is fully received. This metric helps measure system performance and the user experience [25]. Low response times indicate more responsive systems [26]. The formula for this metric is given in Eq. (3) [27]:

$RT=T_{end}-T_{Start}$ (3)

where,

RT: Response time, in seconds.

T_end: The time the service ends or the response is received.

T_Start: The time the request or transmission begins.

C) Throughput (TH): The actual rate at which data is transferred across the network over a given period of time, in bits per second (bits/s). The higher the throughput, the better the performance [27]. The formula for this metric is given in Eq. (4) [28]:

$Throughput =\frac{{ Total\ Data\ Transferred }({bits})}{ {Total\ Time }(s)}$ (4)

where,

Throughput (bits/s): The rate at which data is transferred through a system or network, in bits per second.

Total Data Transferred (bits): The total amount of data transferred during the measured period, in bits.

Total Time (s): The total time taken to transfer that data, in seconds.
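Eqs. (2)-(4) translate directly into three small functions. The Python sketch below uses hypothetical function names and made-up sample inputs; how the counters and timestamps are collected in the actual controller is not specified here.

```python
def packet_loss_percent(n_lost: int, n_sent: int) -> float:
    """Eq. (2): share of sent packets that were lost, in percent."""
    if n_sent <= 0:
        raise ValueError("n_sent must be positive")
    return 100.0 * n_lost / n_sent


def response_time(t_start: float, t_end: float) -> float:
    """Eq. (3): elapsed time between request start and response end, in seconds."""
    if t_end < t_start:
        raise ValueError("t_end must not precede t_start")
    return t_end - t_start


def throughput_bps(total_bits: float, total_time_s: float) -> float:
    """Eq. (4): transferred data divided by transfer time, in bits per second."""
    if total_time_s <= 0:
        raise ValueError("total_time_s must be positive")
    return total_bits / total_time_s


if __name__ == "__main__":
    # Illustrative values only, not measurements from the paper.
    print(packet_loss_percent(n_lost=5, n_sent=200))
    print(response_time(t_start=10.0, t_end=10.25))
    print(throughput_bps(total_bits=8_000_000, total_time_s=4.0))
```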

4. Proposed Methodology

This section explains the proposed dynamic model for load balancing in SDN environments. It begins by clarifying the general design framework associated with SDN, then proposes the DLBQT model and its flowchart, which illustrates the steps and mechanism of operation, in addition to the proposed methodology and pseudocode for the model.

4.1 Proposed model

The proposed model for dynamic load balancing in SDN, shown in Figure 1, consists of three layers: the application layer, the control layer, and the data layer. The application layer represents the multiple users or requests, which, as previously mentioned, fall into three types. The control layer represents the SDN controller integrated with the proposed load balancing algorithm at the core of the architecture; it serves as the central hub for policy formulation and for deciding how load is distributed. The final layer is the data layer, where the server that handles each request is selected dynamically, using queueing theory, based on a set of mathematical calculations.

Figure 1. The diagram illustrates the architectural design of the proposed model

The schematic in Figure 2 illustrates the proposed Dynamic Load Balancing based on Queueing Theory (DLBQT) algorithm in an SDN network, which aims to make real-time decisions. The term “multiple requests” refers to the transmission of several requests from the application layer to the SDN controller, which incorporates the proposed DLBQT model. Through this model, decisions are made to efficiently distribute data traffic among available servers at the data plane, leveraging the real-time monitoring capability built into the controller. The controller uses the arrival rate (λ) and service rate (μ) to determine the server utilization (ρ) and the average waiting time (W). Using these parameters, the controller can make the appropriate decision and dynamically reroute data traffic away from congested paths to optimize server load. With this configuration, dynamic load balancing can be performed proactively by analyzing data based on the above parameters. This ensures network responsiveness, reducing latency and improving resource utilization across widely distributed server networks.

Figure 2. Diagram of the proposed Dynamic Load Balancing based on Queueing Theory (DLBQT) algorithm

4.2 Methodology

The proposed DLBQT algorithm can be summarized in four steps. First, the SDN controller maintains a continuous record of incoming requests and the service rate for each server. Second, it calculates the utilization and the corresponding queueing-theory weight for each server. Third, it chooses the server with the lowest weight for every new request, while giving a very high weight to saturated servers. Lastly, the controller updates the statistics after every time window and continues the cycle for the next requests. This conceptual perspective is summarized in a flowchart (Figure 3) and in Table 2.

Figure 3. Simplified conceptual flow of the proposed Dynamic Load Balancing based on Queueing Theory (DLBQT) algorithm

Table 2. Main modules and parameters of the Dynamic Load Balancing based on Queueing Theory (DLBQT) controller

Module	Input Data	Main Computation	Output / Role
Traffic monitoring	Packet/flow counters per server, time	Compute arrival rate λ and service rate μ	Updated λ and μ for each server
Utilization estimator	λ, μ	Compute utilization ρ = λ/μ	ρ for each server
Weight calculator	ρ, queueing model (M/M/1)	Compute weight W, apply penalty if ρ ≈ 1	W for each server
Decision and routing	W for all servers	Select server with minimum W, break ties deterministically	Server choice for each incoming request
QoS statistics collector	Completed requests, timestamps, packet counts	Compute packet loss, response time, throughput	Performance metrics for evaluation

Figure 3 illustrates a simplified conceptual flow of the DLBQT algorithm. Incoming requests are first observed at the controller, which updates the arrival and service rates for each server over a sliding time window. From these measurements, the utilization and weight of each server are computed using the M/M/1 queueing model. The controller then selects the server with the minimum weight for the next request, while assigning a large penalty weight to servers whose utilization is close to saturation. After the request is processed, the QoS metrics are updated and the process repeats. Table 2 summarizes the main functional modules of the DLBQT controller and clarifies how the parameters λ, μ, ρ, and W are computed and used in the decision process.

Table 3 summarizes the main parameters used in the proposed DLBQT algorithm and clarifies their meanings and units. These parameters are used by the SDN controller to estimate server load conditions and support the routing decision process in real time.

Table 3. Main symbols and units used in the Dynamic Load Balancing based on Queueing Theory (DLBQT) algorithm

Symbol	Meaning	Unit
λ	Arrival rate of incoming requests/tasks to a server	requests/s
μ	Service rate of completed requests/tasks at a server	requests/s
ρ	Server utilization, computed as λ/μ	dimensionless
W	Estimated average time a request spends in the system	s
T_win	Observation window length	s

The DLBQT model was developed to distribute network traffic efficiently among servers by calculating key parameters based on queueing theory, where the final result depends on the weight of each server (W) and the selection of the server with the lowest weight, which determines the routing decision. This technique aims to optimize resource utilization to achieve optimal and balanced performance. The DLBQT model continuously monitors task metrics in real time.

It relies on two basic parameters, both of which are considered inputs calculated in real time for each request. The first is the arrival rate (λ), which is the rate at which requests or tasks arrive at the system per unit of time, and second, the service rate (μ), which is the number of requests or tasks the system can process or serve per unit of time. These are mathematically formulated in Eqs. (5) and (6):

$\lambda=\frac{{Number\ of\ incoming\ requests }}{ {Elapsed\ time }}$ (5)

$\mu=\frac{{Number\ of\ completed\ requests }}{ {Elapsed\ time }}$ (6)

where λ, the arrival rate, and μ, the service rate, are both measured in requests per second (requests/s).

Each server in the proposed model is modeled as an M/M/1 queueing system. Our Mininet configuration does not strictly enforce a Poisson arrival process; rather, the traffic generator produces a large number of separate flows with randomized inter-arrival periods and file sizes. However, the aggregate behavior of many such separate random flows has commonly been modeled with M/M/1-type assumptions in networking and SDN studies. Under this approximation, the server utilization ρ is given by Eq. (7):

$\rho=\lambda / \mu$ (7)

where, ρ is the server utilization ratio and is dimensionless.

Finally, the weight (W) is calculated as the average time a request spends in the system, based on each server's utilization and service rate. The server selected to process incoming requests or tasks is the one with the lowest weight (W), computed according to Eq. (8):

$W=\rho /(\mu(1-\rho))$ (8)

where, W denotes the estimated average system time in seconds (s).

This quantity is used as the queueing-theoretic routing criterion in DLBQT: at each decision period, the controller chooses the server with the smallest estimated W, thus minimizing the predicted time a request spends in the system. Although actual SDN traffic may not conform to the idealized Poisson and exponential assumptions, M/M/1-based models can nonetheless give valuable approximations and design guides for SDN and other networking systems. The randomized traffic patterns in our simulation environment lead to aggregate server loads that are well approximated by the M/M/1 model, and therefore the expression for W is not simply an intuitive heuristic, but a theoretically grounded approximation of the expected system time in the simulated SDN setting.
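As a minimal sketch of Eqs. (5)-(8), the Python function below derives λ, μ, ρ, and W for one server from two counters and an elapsed time. The function name, argument names, and sample counts are illustrative assumptions, not taken from the authors' controller code, and a saturated server (ρ ≥ 1) is simply rejected here rather than penalized.

```python
def mm1_weight(incoming: int, completed: int, elapsed_s: float) -> dict:
    """Compute lambda, mu, rho, and W for one server following Eqs. (5)-(8)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    if completed <= 0:
        raise ValueError("at least one completed request is needed to estimate mu")

    lam = incoming / elapsed_s        # Eq. (5): arrival rate, requests/s
    mu = completed / elapsed_s        # Eq. (6): service rate, requests/s
    rho = lam / mu                    # Eq. (7): utilization, dimensionless
    if rho >= 1.0:
        # Eq. (8) is only defined below saturation.
        raise ValueError("server is saturated (rho >= 1)")
    w = rho / (mu * (1.0 - rho))      # Eq. (8): weight, in seconds
    return {"lambda": lam, "mu": mu, "rho": rho, "W": w}


if __name__ == "__main__":
    # Hypothetical counters: 50 arrivals and 100 completions observed in 1 s.
    print(mm1_weight(incoming=50, completed=100, elapsed_s=1.0))
```

With these illustrative counters, ρ = 0.5 and W = 0.01 s. For reference, in the standard M/M/1 results the expression ρ/(μ(1−ρ)) equals the mean waiting time in the queue, while the mean time in the system is 1/(μ−λ), which is larger by the mean service time 1/μ.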

4.3 Online estimation of queueing parameters

In the implementation, each server keeps two counters: one for newly arrived requests and one for completed requests. Whenever the controller needs to make a routing decision, it takes the time elapsed since the last update as the observation window. This window is never shorter than one second, even if decisions are made more frequently.

For each server, the algorithm stores a short history of the last ten observation windows. At every decision step, it computes:

the average number of arrivals over the last ten windows,
the average number of completions over the last ten windows,
the arrival rate λ as “average arrivals ÷ window length”,
the service rate μ as “average completions ÷ window length”, with a small lower bound on μ to avoid division by zero,
the utilization ρ as the ratio between λ and μ.

Using these quantities, the weight W is calculated from the M/M/1 queueing model, so that W represents the estimated average time a request spends in the system (waiting plus service). After each update, the per-window counters are reset and the start time of the next window is set to the current time. This procedure smooths out short-term fluctuations and bases the decision on recent load history rather than on a single instantaneous measurement.
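The estimation procedure described above can be sketched as follows. This is an illustrative reading of the text, not the controller code used in the experiments: the class name `ServerStats`, the value of `MU_FLOOR`, and the choice to divide the ten-window averages by the length of the current window are assumptions.

```python
from collections import deque

MIN_WINDOW_S = 1.0  # the observation window is never shorter than one second
HISTORY = 10        # number of recent observation windows kept per server
MU_FLOOR = 1e-6     # assumed lower bound on mu; the text does not give a value


class ServerStats:
    """Hypothetical per-server state for estimating lambda, mu, and rho."""

    def __init__(self, now: float):
        self.arrivals = 0
        self.completions = 0
        self.window_start = now
        self.arrival_hist = deque(maxlen=HISTORY)
        self.completion_hist = deque(maxlen=HISTORY)

    def on_arrival(self) -> None:
        self.arrivals += 1

    def on_completion(self) -> None:
        self.completions += 1

    def update(self, now: float):
        """Close the current window and return (lam, mu, rho)."""
        window = max(now - self.window_start, MIN_WINDOW_S)
        self.arrival_hist.append(self.arrivals)
        self.completion_hist.append(self.completions)
        avg_arrivals = sum(self.arrival_hist) / len(self.arrival_hist)
        avg_completions = sum(self.completion_hist) / len(self.completion_hist)
        lam = avg_arrivals / window
        mu = max(avg_completions / window, MU_FLOOR)  # avoid division by zero
        rho = lam / mu
        # Reset the per-window counters and start the next window now.
        self.arrivals = 0
        self.completions = 0
        self.window_start = now
        return lam, mu, rho
```

Because the deques are bounded to ten entries, the oldest window drops out automatically, which gives the smoothing over recent load history that the text describes.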

4.4 Routing, tie-breaking, and penalty policy

At each routing decision, the controller updates the values of λ, μ, ρ, and W for all live servers as described above. The new request is then sent to the server with the minimal weight W, i.e., the server with the minimum estimated average system time.

When the estimated utilization of a server reaches or exceeds one (ρ ≥ 1), the algorithm sets its weight to a very large value (W = 1000). This “penalty” makes the server very unlikely to be selected and effectively avoids sending new requests to a saturated server. If two or more servers share the same smallest W value, the tie is resolved deterministically by selecting the first server in the internal list.
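The routing rule, penalty, and tie-breaking policy can be summarized in a few lines of Python. The function names and the tuple layout of `estimates` are hypothetical; the sketch also assumes μ > 0, which the lower bound on μ from Section 4.3 guarantees.

```python
PENALTY_WEIGHT = 1000.0  # weight assigned to a saturated server


def queue_weight(lam: float, mu: float) -> float:
    """Eq. (8) with the saturation penalty; assumes mu > 0."""
    rho = lam / mu
    if rho >= 1.0:
        return PENALTY_WEIGHT
    return rho / (mu * (1.0 - rho))


def select_server(estimates):
    """Pick the server with the smallest W.

    `estimates` is a list of (name, lam, mu) tuples in the controller's
    internal order. Python's min() returns the first minimal element,
    so ties fall to the first server in the list.
    """
    weights = [queue_weight(lam, mu) for _, lam, mu in estimates]
    best = min(range(len(weights)), key=weights.__getitem__)
    return estimates[best][0], weights


# Hypothetical estimates: server1 is busier than server2.
chosen, weights = select_server([("server1", 9.0, 10.0), ("server2", 4.0, 10.0)])
```

In this example `chosen` is `"server2"`. If every server is saturated, all weights equal the penalty value and the first server in the list is selected.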

4.5 Flowchart of the proposed dynamic load balancing based on queueing theory model

This section describes the flowchart in Figure 4, which applies queueing theory and real-time performance analysis within an SDN environment. The process starts when a request is sent; the controller then determines the arrival rate (λ) and the service rate (μ) of the servers and computes the utilization factor (ρ) to assess how busy each server is. Next, the weight (W), which is the estimated average time that a request spends in the system, is computed; if W exceeds the threshold, a high default weight is assigned so that requests are redirected to less busy servers and QoS is maintained. The system then selects the server with the lowest weight to balance the distribution, and the QoS measures of packet loss, response time, and throughput are computed. This process is repeated until the request is fulfilled, keeping the network efficient, stable, and scalable in line with SDN principles.

Figure 4. Flowchart of the proposed dynamic load balancing based on queueing theory (DLBQT) algorithm

Algorithm 1: Proposed Dynamic Load Balancing based on Queueing Theory

Input: Arrival Rate (λ), Service Rate (μ)

Output: Weight (W), Selected Server

1. Begin

2. While (true) do:

3. For each server, calculate Arrival Rate (λ) based on Eq. (5)

4. For each server, calculate Service Rate (μ) based on Eq. (6)

5. For each server, calculate Utilization (ρ) based on Eq. (7)

6. For each server, calculate Weight (W) based on Eq. (8):

7. If ρ < 1:

W = ρ / (μ * (1 - ρ))

8. Else:

W = 1000

Select the server with the minimum W; if several servers share the minimum, select the first server in the server list.

9. If the request is finished:

Return QoS metrics

10. Else:

Continue processing requests

11. End If

12. End While

13. End

The pseudocode in Algorithm 1 implements dynamic load balancing in an SDN environment using queueing theory, as illustrated in the flowchart above. The controller continuously checks the status of each server and calculates the arrival rate (λ) and service rate (μ) of each server to determine its weight (W). Requests are directed to the server with the lowest weight, a server whose weight reaches the maximum limit is penalized so that new requests are steered away from it, and QoS metrics are measured upon request completion.

5. Network Setup and Experimental Methods

This section explains the general network structure, the types of environments and their characteristics, along with server specifications, and finally discusses the results of the proposed algorithm compared with other algorithms.

5.1 Network structure

The simulation environment was implemented in Mininet. Figure 5 illustrates the proposed topology in an SDN environment, known as the “Single Topology,” in which a group of hosts (host1 through host4) sends data traffic to a central switch. The switch, in turn, routes this traffic to Server1 or Server2 according to the rules installed by the network controller. The controller, which is a POX controller, dynamically manages flow rules to achieve optimal data traffic distribution and load balancing among the servers.

Figure 5. The proposed network topology in Software-Defined Networking (SDN)

5.2 Environments

The proposed model was tested in two environments: homogeneous and heterogeneous. In the homogeneous environment, the servers have the same specifications and computational resources, providing a stable and balanced testing environment. The heterogeneous environment reflects real-world networks, where server capabilities vary in terms of processor and memory, which challenges load-balancing algorithms to adapt and distribute load effectively. Table 4 details the full specifications of the servers for both environments, as well as the computer specifications.

Table 4. Server specifications for homogeneous and heterogeneous experimental environments

Environment	Server	CPU Model	CPU Cores	Random Access Memory (RAM) (GB)
Homogeneous	Server 1	Intel i7-10850H	12	32
Homogeneous	Server 2	Intel i7-10850H	12	32
Heterogeneous	Server 1	Intel i7-10850H	3	8
Heterogeneous	Server 2	Intel i7-10850H	1	3

To ensure a fair baseline comparison, the LC, LCPU, and LRAM algorithms are implemented and configured exactly as in our previous work, where they were introduced and evaluated as server-side load balancing schemes for SDN. In this study, we reuse the same parameter settings (e.g., CPU and RAM sampling intervals and thresholds for LCPU and LRAM, and the standard configuration for LC). All load balancing algorithms (LC, LCPU, LRAM, and DLBQT) are implemented in the same POX controller module and use the same event-driven update mechanism. Table 5 summarizes the decision metrics and main configuration parameters of each algorithm, helping to ensure that performance differences arise from the decision rules rather than from different monitoring frequencies or controller settings.

Table 5. Configuration of load balancing algorithms

Algorithm	Decision Metric(s)	Update Mechanism	Key Parameters
LC	Number of active connections	On new-flow installation	Configured as in [15]
LCPU	CPU utilization	Event-driven (same as DLBQT)	Configured as in [15]
LRAM	RAM utilization	Event-driven (same as DLBQT)	Configured as in [15]
DLBQT	Queueing weight (W)	Event-driven at each routing decision using λ, μ from the last 10 windows	$T_{win} \geq 1$ s; penalty $W=1000$ when utilization ≥ 1

Note: LC = Least Connection; LCPU = Least CPU; LRAM = Least RAM; DLBQT = Dynamic Load Balancing based on Queueing Theory; CPU = Central Processing Unit; RAM = Random Access Memory

All simulations were executed on a dedicated Linux workstation equipped with an Intel Core i7‑10850H CPU, 32 GB of RAM, and a 512 GB SSD, running Ubuntu 22.04 LTS. Mininet version 2.3.0 and the POX controller (git snapshot 2023_01_15) were used as the SDN emulation platform. The main configuration parameters of the experimental setup are summarized in Table 6.

Table 6. Unified experimental configuration for the Software-Defined Networking (SDN) simulations

Category	Parameter	Description / Value
SDN platform	Simulator	Mininet
SDN platform	Controller	POX (single centralized controller)
Topology	Network structure	Single switch with multiple hosts sending traffic to two servers (as in Figure 5)
Environments	Homogeneous environment	Two servers with identical CPU and RAM specifications (see Table 4)
Environments	Heterogeneous environment	Two servers with different CPU cores and RAM capacities (see Table 4)
Server specs	CPU model	Intel i7‑10850H for all servers
Load-balancing schemes	Compared algorithms	LC, LCPU, LRAM, and DLBQT are implemented in the same POX module
Dataset	Traffic traces	16 real-world trace files derived from Facebook data-center traffic
Dataset	File-size range	1 KB to 6.8 MB, covering light and heavy objects
Traffic generation	Request pattern	Hosts repeatedly request files using a pseudo-random permutation of the 16 traces
Traffic generation	Inter-arrival times	Uniformly distributed random inter-arrival times over a fixed interval
Randomization	Random seed	Fixed seed per run, shared across all algorithms for fair comparison
Randomization	Additional seeds	Different recorded seeds used when multiple runs are executed
Load levels	Number of requests	1,000; 100,000; and 1,000,000 total requests per experiment
Run configuration	Number of runs	3 runs per configuration (results reported as averages)
Evaluation metrics	Performance metrics	Packet loss, response time, and throughput

Note: POX = Python OpenFlow eXtensible; LC = Least Connection; LCPU = Least CPU; LRAM = Least RAM; DLBQT = Dynamic Load Balancing based on Queueing Theory; CPU = Central Processing Unit; RAM = Random Access Memory

5.2.1 Traffic generation from real-world dataset

Traffic is generated from 16 real-world trace files with sizes ranging from 1 KB to 6.8 MB. In both experiments, hosts repeatedly request these files from the servers. The file index of each new request is selected using a pseudo-random permutation of the 16 files, so that all 16 files are exercised across the different load levels. This process ensures that the traffic mix contains both light (1-2 KB) and heavy (196-6836 KB) objects, reflecting realistic content diversity.

5.2.2 Randomization and reproducibility

To make the experiments reproducible, all pseudo-random choices in the traffic generator (file selection and inter-arrival times) are driven by a fixed random seed at the start of each run. Using the same seed for different algorithms guarantees that DLBQT, LCPU, LC, and LRAM are evaluated under identical request sequences. When additional runs are performed, different seeds are used, but the seeds are recorded so that the entire sequence of requests can be replayed if needed.

5.2.3 Inter-arrival time distribution

Inter-arrival times between consecutive requests are generated from a uniform distribution over a fixed time interval. This uniform randomization models bursty yet bounded request arrivals and, combined with the diversity of file sizes, produces a wide range of instantaneous loads on the servers. The same inter-arrival time sequence (determined by the fixed random seed) is used for all algorithms, ensuring a fair comparison under identical traffic conditions.
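A seeded request schedule of this kind can be sketched in Python as follows. The function name, the reshuffling of the permutation after every 16 requests, and the bound `max_gap_s` on the uniform interval are assumptions made for illustration; the interval used in the experiments is not stated here.

```python
import random


def make_schedule(seed: int, n_requests: int, n_files: int = 16,
                  max_gap_s: float = 0.1):
    """Return a list of (send_time_s, file_index) pairs.

    A private random.Random(seed) instance drives both the file
    permutation and the uniform inter-arrival times, so the same seed
    always replays the same request sequence.
    """
    rng = random.Random(seed)
    schedule = []
    order = []
    t = 0.0
    for _ in range(n_requests):
        if not order:
            # Start a fresh pseudo-random permutation of the file indices.
            order = list(range(n_files))
            rng.shuffle(order)
        t += rng.uniform(0.0, max_gap_s)  # bounded, uniformly random gap
        schedule.append((t, order.pop()))
    return schedule


# The same seed yields the same schedule for every algorithm under test.
run_a = make_schedule(seed=42, n_requests=48)
run_b = make_schedule(seed=42, n_requests=48)
```

Generating the schedule once per seed and replaying it for each algorithm is one simple way to obtain the identical request sequences described above.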

5.2.4 Number of runs and averaging

Each experimental condition (algorithm, environment, and total number of requests 1,000, 100,000, and 1,000,000) is executed R = 3 times with different random seeds. For each metric (packet loss, response time, throughput), the value reported in Tables 7-12 at each load level is the average across the R runs. The “Improving” rows in these tables are then computed using the percentage-improvement metric in Eq. (1) on the aggregated results across the three load levels for DLBQT versus each baseline algorithm.

5.3 Results and Discussion

For all metrics, the results at each load level (1,000, 100,000, and 1,000,000 requests) represent the averages over three independent runs per algorithm and environment, while the percentage improvements in the last row summarize the overall gain of DLBQT using Eq. (1).

Figure 6. Packet loss in homogeneous and heterogeneous environments

The packet-loss results for the homogeneous and heterogeneous environments are summarized in Tables 7 and 8 and illustrated in Figure 6, which compare DLBQT with the LCPU, LC, and LRAM algorithms at different load levels. No packet loss was observed at the low load (1,000 requests), while loss increases with the load for all algorithms. In the homogeneous environment, DLBQT achieved the lowest packet loss (200 and 1848 packets at 100,000 and 1,000,000 requests, respectively), with improvements of 37.5%, 26.4%, and 29.2% over LCPU, LC, and LRAM, respectively. In the heterogeneous environment, DLBQT again achieved the lowest packet loss values (172 and 1730 packets), with improvement rates of 36.8%, 28.2%, and 39.2%, respectively. These results indicate that DLBQT reduces packet loss effectively and adapts well to variable resources in the heterogeneous environment.

Table 7. Packet loss (number of packets) in homogeneous environment

Total Requests	LCPU	LC	LRAM	DLBQT
1000	0.00	0.00	0.00	0.00
100000	301	245	251	200
1000000	3160	2819	2987	1848
Improving	37.5%	26.4%	29.2%	-
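As a consistency check on the last row of Table 7, the short Python sketch below recomputes the improvements from the tabulated values. It assumes that Eq. (1) is the usual relative reduction, (baseline − DLBQT) / baseline × 100, and that the per-load improvements are averaged over the load levels with non-zero loss; this is one reading that reproduces the reported figures, not a statement of the authors' exact procedure. The function names are hypothetical.

```python
def pct_improvement(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return (baseline - proposed) / baseline * 100.0


def mean_improvement(baseline_vals, proposed_vals) -> float:
    """Average the per-load improvements, skipping levels with zero baseline loss."""
    gains = [pct_improvement(b, p)
             for b, p in zip(baseline_vals, proposed_vals) if b > 0]
    return sum(gains) / len(gains)


# Packet-loss values from Table 7 (1,000; 100,000; 1,000,000 requests).
table7 = {
    "LCPU": [0, 301, 3160],
    "LC": [0, 245, 2819],
    "LRAM": [0, 251, 2987],
}
dlbqt = [0, 200, 1848]

results = {name: round(mean_improvement(vals, dlbqt), 1)
           for name, vals in table7.items()}
print(results)  # {'LCPU': 37.5, 'LC': 26.4, 'LRAM': 29.2}
```

Under these assumptions, the recomputed values match the 37.5%, 26.4%, and 29.2% shown in the table.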