Maximizing IoT Throughput with Optimized IRS-Assisted Symbiotic Radio

ABSTRACT


INTRODUCTION
The applications of the Internet of Things (IoT) in sixth-generation (6G) networks are expected to require high connectivity and high energy and spectral efficiency, because massive numbers of IoT devices will need to connect to one another [1,2]. 6G introduces several technologies to satisfy these requirements, such as artificial intelligence (AI), intelligent reflecting surfaces (IRS) and symbiotic radio (SR). AI can make wireless networks intelligent and automated by emulating human reasoning processes and intelligent behavior. AI also serves as a data analysis tool, in which machine learning models are used to make correct decisions automatically [3,4]. Machine learning (ML) is an approach to building AI systems, and ML algorithms have many uses in wireless communication networks, where they optimize network resources to improve performance. ML methods include supervised learning, unsupervised learning and deep reinforcement learning (DRL). DRL, which has become particularly popular, trains artificial agents to learn and make decisions in complex environments through interaction with, and feedback from, the environment. There are several types of DRL algorithms, such as the Deep Q-Network (DQN), which combines deep neural networks with the Q-learning algorithm to learn an approximation of the Q-function.
DQN has been successful in solving complex tasks, especially in wireless communication networks. Proximal policy optimization (PPO) is another type of DRL algorithm; it seeks an optimal policy by iteratively updating the policy parameters and uses a surrogate objective function to ensure stable and effective policy updates. PPO has gained popularity because of its simplicity, speed and sample efficiency, which make it appropriate for a variety of uses; it also performs well in wireless communication networks, so we employ it in this paper. Overall, DRL algorithms can learn policies that make intelligent decisions about wireless network aspects such as resource allocation, throughput, capacity and latency, with the aim of improving them [5,6].
The IRS, another 6G technology, uses passive surfaces to efficiently control and manipulate radio signals. The IRS is configurable: it is typically composed of many tiny passive reflecting elements, each of which can modify the wave that impinges on it, and these elements are controlled to reflect the signal in a particular direction. This allows the IRS to enhance the communication channel between the source and the destination, which improves the performance of the communication system [7][8][9].
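To illustrate how phase control lets an IRS strengthen a link, the following Python sketch computes the effective power gain of a single-antenna link with a direct path plus an N-element IRS path (unit-amplitude elements), and compares random phase shifts with phases chosen so that every reflected component adds in phase with the direct path. It is a simplified illustration, not this paper's channel model: the Rayleigh-fading channels and the names `h_d`, `h_ti`, `h_ik`, `received_gain` and `aligned_phases` are hypothetical.

```python
import numpy as np

def received_gain(h_d, h_ti, h_ik, theta):
    """Effective power gain |h_d + sum_n h_ti[n] * e^{j*theta[n]} * h_ik[n]|^2
    of a direct path plus an IRS-reflected path with unit-amplitude elements."""
    return np.abs(h_d + np.sum(h_ti * np.exp(1j * theta) * h_ik)) ** 2

def aligned_phases(h_d, h_ti, h_ik):
    """Phase shifts that rotate every reflected component onto the direct path's phase."""
    return np.mod(np.angle(h_d) - np.angle(h_ti * h_ik), 2 * np.pi)

def rayleigh(rng, size=None):
    """Unit-power circularly symmetric complex Gaussian samples (assumed fading model)."""
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

rng = np.random.default_rng(0)
N = 16                                            # number of reflecting elements (illustrative)
h_d = rayleigh(rng)                               # direct tag -> smartphone channel
h_ti, h_ik = rayleigh(rng, N), rayleigh(rng, N)   # tag -> IRS and IRS -> smartphone channels

g_random = received_gain(h_d, h_ti, h_ik, rng.uniform(0, 2 * np.pi, N))
g_aligned = received_gain(h_d, h_ti, h_ik, aligned_phases(h_d, h_ti, h_ik))
print(f"random phases: {g_random:.2f}, aligned phases: {g_aligned:.2f}")
```

With aligned phases the gain equals (|h_d| + Σ_n |h_ti,n h_ik,n|)², which by the triangle inequality upper-bounds the gain for any other choice of phase shifts.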
Symbiotic radio (SR) is a technology that supports passive IoT communication. SR systems enable IoT tags to share the transmitter, spectrum and receiver of a primary user by backscattering the signal exchanged between this primary user and the MBS or a Wi-Fi access point, which increases the energy and spectral efficiency of the system. At the primary signal receiver, the information signals of the primary user and the IoT tags can be decoded using successive interference cancellation (SIC): the receiver first decodes the primary signal, subtracts it from the received signal, and then detects the signal backscattered by the tags. Recently, machine learning has been applied to symbiotic radio to improve IoT system performance [10][11][12].
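The two-stage SIC detection described above can be sketched in Python as follows. The sketch assumes BPSK for both the primary symbols and the tag symbols, perfectly known channels `h_d` (direct link) and `h_b` (backscatter link), and a tag symbol that spans `L` primary symbols; all of these values are illustrative assumptions rather than parameters of this paper.

```python
import numpy as np

rng = np.random.default_rng(1)
L, n_tag = 16, 200               # primary symbols per tag symbol, number of tag symbols (assumed)
sigma = 0.1                      # noise std per real/imaginary component (assumed)
h_d = 1.0 * np.exp(1j * 0.3)     # direct link: RF source -> smartphone (assumed known)
h_b = 0.2 * np.exp(1j * 1.1)     # backscatter link: RF source -> tag -> smartphone (assumed known)

s = rng.choice([-1.0, 1.0], size=n_tag * L)   # primary BPSK symbols
x = rng.choice([-1.0, 1.0], size=n_tag)       # tag BPSK symbols
x_rep = np.repeat(x, L)                       # each tag symbol rides on L primary symbols
noise = sigma * (rng.standard_normal(s.size) + 1j * rng.standard_normal(s.size))
y = h_d * s + h_b * s * x_rep + noise         # received signal at the smartphone

# Step 1: detect the stronger primary signal, treating the backscatter term as interference.
s_hat = np.sign(np.real(np.conj(h_d) * y))
# Step 2: subtract the detected primary component, then detect each tag symbol
# by combining the L residual samples it spans.
residual = y - h_d * s_hat
x_hat = np.sign(np.sum(np.real(np.conj(h_b * s_hat) * residual).reshape(n_tag, L), axis=1))

print(f"primary BER = {np.mean(s_hat != s):.4f}, tag BER = {np.mean(x_hat != x):.4f}")
```

Combining the L samples of each tag symbol compensates for the much weaker backscatter link; detecting each sample separately would leave the tag signal far closer to the noise floor.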
Furthermore, non-orthogonal multiple access (NOMA) is a technique used to increase the spectral efficiency of wireless communication networks: it enables more than one user to share the same spectrum at the same time, so the spectrum is used more effectively [13,14].

A. Related work
Recent studies have examined how to maximize the throughput of IoT systems using 6G techniques. Liu et al. [15] established an SR model for multi-user random access and used a coding algorithm to prevent the multiple reflected signals from interfering with one another, which enhances system performance. Liang et al. [11] proposed SR to improve the backscatter link between the primary and secondary users and to obtain highly reliable communication through joint decoding at the primary receiver. Long et al. [16] used SR to support passive IoT, where the IoT device is parasitic on the primary user's signal, which improves the power efficiency of the system. In [17], the downlink rate of a symbiotic radio system combined with a cellular NOMA system was analysed and then used to derive the outage probability of the signal-to-interference-plus-noise ratio (SINR). Naeem et al. [18,19] present the IRS as a 6G solution that smartly controls the wireless communication channel to enhance the spectral and energy efficiency of 6G networks. Al-Abbasi et al. [20] proposed an IRS-assisted NOMA wireless network, improving the performance of the IRS-NOMA combination by optimizing the number of IRS reflecting elements and then using a new multiple-IRS NOMA approach to boost the received signal quality. Zhang et al. [21] used a DRL algorithm to solve the sum-rate maximization problem of an SR-based IoT system, with the DRL agent deciding the IoT device association. Bharadia et al. [22] used an SR scheme to support passive IoT devices and enhance IoT system performance over either LTE or Wi-Fi networks. Furthermore, a Double Deep Q-Learning (DDQL) algorithm has been proposed to find the optimal IoT tag clusters and thereby improve IoT system performance. These studies show that researchers are interested in using 6G technologies to meet IoT requirements such as reliability, power efficiency, throughput and capacity.

B. Contribution
Our goal in this work is to enhance the performance of the IoT network used for communication among IoT devices. To this end, we propose an SR technique that allows the IoT tags to backscatter the signal transmitted from the MBS or AP to the smartphones, where it is subsequently decoded. SR provides intelligent cooperation among the system devices, which enhances system performance. Furthermore, NOMA is used among the IoT tags that use the LTE network, enabling multiple users to share the same time-frequency resource and thereby increasing the system's spectral efficiency. Additionally, we optimize the IRS location and the phase shifts of its reflecting elements to enhance communication between the IoT tags and the smartphones. An IRS improves wireless communication systems by intelligently manipulating wireless signals in complex environments; its ability to adaptively optimize signal reflections can enhance performance, coverage and capacity in various wireless communication scenarios. We propose the proximal policy optimization algorithm to solve this problem. PPO is a reinforcement learning algorithm for training policies in complex environments; here, it is used to optimize the IRS location and the phase shifts of the reflecting elements so as to maximize the communication performance between the IoT tags and the smartphones. Together, these techniques can improve IoT system performance, which is one of the challenges of 6G. The main contributions of this work are:
(1) We propose a symbiotic radio communication technique for IoT tags that use either the LTE or the Wi-Fi network, together with a NOMA technique for IoT devices that use the LTE network, to increase the throughput of the IoT system.
(2) We propose an IRS to enhance the communication between the IoT tags and the smartphones, and formulate an optimization problem to maximize the total uplink data rate of the IoT system.
(3) We apply the proximal policy optimization algorithm to solve this problem by finding the optimal location and phase shifts of the IRS.
(4) We evaluate the performance of the proposed IRS-based scheme by simulation, showing the increase in the system's total data rate compared with the system without an IRS and with a system that uses the DDQL algorithm for clustering.

The remainder of this paper is structured as follows. Section II introduces the proposed system model, including the MBS, AP, IoT tags, smartphones and IRS. Section III formulates the throughput maximization problem based on optimizing the IRS phase shifts and location. Section IV explains the proximal policy optimization algorithm. Section V details the proposed algorithm. Section VI presents the simulation results. Finally, Section VII concludes the paper.

SYSTEM MODEL
Consider an IRS-assisted symbiotic radio communication system between a set of IoT tags, indexed by t, and a set of smartphones, under the coverage of a micro base station (MBS) and a Wi-Fi access point (AP), as shown in Figure 1. The downlink primary signal sent to a smartphone by the MBS or the AP is backscattered by the IoT tags: each tag modulates its information onto the primary signal, which is then received by the smartphone using the SR communication system. An IRS with N reflecting elements assists the communication between the IoT tags and the smartphones: the tags reflect their signals to the IRS, which redirects them to the smartphones. The amplitude and phase shift of the n-th reflecting element, n ∈ {1, …, N}, take values in [0, 1] and θ_n ∈ [0, 2π), respectively, and the horizontal location of the IRS is represented by its two-dimensional (2-D) coordinates. For simplicity, we set the amplitude of every element to 1. The optimal IRS location and phase shifts are the values that achieve the maximum total data rate of the system.

NOMA allows users to share the same spectrum resources, which increases the system's spectral efficiency and therefore its throughput. Each tag associates with its nearest smartphone, and tags that are associated with the same smartphone and use the LTE network share the same subcarriers using NOMA. The smartphone then uses SIC to detect the received signal: it first detects the primary signal, subtracts it from the total received signal, and then estimates the signal backscattered by the tag. Tags that transmit over the Wi-Fi network, on the other hand, rely on orthogonal frequency division multiplexing (OFDM), in which all subcarriers of a channel are occupied by one user that sends a complete data packet. OFDM is distinguished by its ability to cope with severe channel conditions.

The signal detected at the smartphone through the LTE network is given by Eq. (1), where B(t) denotes the smartphone's downlink primary signal and x(t) denotes the tag's information. Eq. (1) involves the channel gain between the smartphone and the MBS, the channel gain between tag t and the RF source (MBS), the channel gain between tag t and the smartphone, the channel gain between tag t and the IRS reflectors, the channel gain between the IRS reflectors and the smartphone, the IRS phase-shift matrix, the additive white Gaussian noise (AWGN) and the tag's reflection coefficient. The signal-to-interference-plus-noise ratio (SINR) of the tag's signal detected at the smartphone through the LTE network is given by Eq. (2), in which the power term is the downlink RF signal power at the tag and the interference term accounts for the other tags that backscatter the same LTE downlink primary signal of the smartphone on the same subcarriers (NOMA) [17,21,23]. Using the Shannon formula, the data rate of each tag t at its smartphone over the LTE network, with the tags sharing the same subcarriers S of a given bandwidth, follows from this SINR (Eq. (3)); this rate is used to calculate the IoT system throughput, one of the system's performance metrics. In addition, the SINR of the tag's information signal at the smartphone over the Wi-Fi network is given by Eq. (4), whose interference term accounts for the other tags that backscatter the same primary Wi-Fi signal between the AP and the smartphone [20,22,24]. Similarly, the rate of each tag at its smartphone over the Wi-Fi network is given by Eq. (5).
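As a rough illustration of the association and rate computation described in this section, the Python sketch below associates each tag with its nearest smartphone, computes a NOMA-style SINR in which the other tags served by the same smartphone act as interference (assuming the primary signal has already been removed by SIC), and applies the Shannon formula. The per-link effective gains are passed in as a hypothetical matrix `gain` instead of being computed from the channel model of this section, and the function names and numbers are illustrative assumptions.

```python
import numpy as np

def associate_nearest(tag_xy, phone_xy):
    """Index of the nearest smartphone for every tag (Euclidean distance)."""
    d = np.linalg.norm(tag_xy[:, None, :] - phone_xy[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def noma_sinr(gain, assoc, p_tx, noise_power):
    """SINR of each tag at its associated smartphone, assuming the primary signal
    has been removed by SIC and the other tags on the same smartphone (same
    subcarriers) act as interference. gain[t, k] is a hypothetical effective
    backscatter power gain of tag t at smartphone k."""
    n_tags = gain.shape[0]
    sinr = np.empty(n_tags)
    for t in range(n_tags):
        k = assoc[t]
        others = assoc == k
        others[t] = False                      # exclude the tag itself
        interference = p_tx * gain[others, k].sum()
        sinr[t] = p_tx * gain[t, k] / (interference + noise_power)
    return sinr

def shannon_rate(sinr, bandwidth_hz):
    """Achievable rate in bit/s: W * log2(1 + SINR)."""
    return bandwidth_hz * np.log2(1.0 + sinr)

tag_xy = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
phone_xy = np.array([[0.0, 1.0], [10.0, 1.0]])
gain = np.array([[1.0, 0.01], [0.5, 0.01], [0.01, 1.0], [0.01, 0.25]])  # illustrative values

assoc = associate_nearest(tag_xy, phone_xy)   # tags 0, 1 -> phone 0; tags 2, 3 -> phone 1
sinr = noma_sinr(gain, assoc, p_tx=1.0, noise_power=0.1)
rates = shannon_rate(sinr, bandwidth_hz=1e6)
print(assoc, sinr.round(3), (rates / 1e6).round(2))
```

Under the reading that the Wi-Fi SINR has the same interference structure (other tags backscattering the same primary signal), the same functions would apply to Wi-Fi tags with a different bandwidth.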

PROBLEM FORMULATION
In this section, we formulate the problem of maximizing the throughput of the IoT system by jointly optimizing the IRS location and the phase shifts of its reflecting elements. This problem is expressed as the optimization problem P, and a proximal policy optimization (PPO) algorithm is proposed to solve it because of its simplicity, speed and ability to handle complex environments. Constraint C1 guarantees perfect SIC: SIC starts by estimating the primary signal, which has the greater SINR; the receiver then subtracts the estimated signal from the received signal, and the signal backscattered by the tags is finally detected. Constraint C2 guarantees that the IoT tags and the smartphones are covered by the MBS or AP network. Constraint C3 indicates backscattering: the backscatter indicator is a binary variable that is either 1 or 0. By solving this problem and obtaining the optimal IRS location and phase shifts, which improve the communication channel between the IoT tags and the smartphones, we expect the performance of the IoT system to be greatly enhanced.
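For reference, one possible way to write problem P in symbols is sketched below. The notation is ours and hypothetical: q is the 2-D location of the IRS, θ = (θ_1, …, θ_N) its phase shifts, R_t the rate of tag t defined in Section II, γ_k^p and γ_t the SINRs of the primary and tag signals at the associated smartphone k, u_t and u_k the positions of tags and smartphones, 𝒜 the MBS/AP coverage region and c_t the binary backscatter indicator. The exact forms of C1 and C2 in the original formulation may differ.

```latex
\begin{aligned}
\mathrm{P}:\ \max_{\mathbf{q},\,\boldsymbol{\theta}} \quad & \sum_{t} c_t\, R_t(\mathbf{q}, \boldsymbol{\theta}) \\
\text{s.t.} \quad \mathrm{C1}:\ & \gamma^{\mathrm{p}}_{k} > \gamma_{t}, \quad \forall t \text{ associated with smartphone } k, \\
\mathrm{C2}:\ & \mathbf{u}_t \in \mathcal{A},\ \mathbf{u}_k \in \mathcal{A}, \quad \forall t, k, \\
\mathrm{C3}:\ & c_t \in \{0, 1\}, \quad \forall t, \\
& \theta_n \in [0, 2\pi), \quad n = 1, \dots, N.
\end{aligned}
```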

PROXIMAL POLICY OPTIMIZATION ALGORITHM
In recent years, several reinforcement learning approaches that use neural network function approximators have been proposed, such as deep Q-learning, policy gradient methods and trust region policy optimization (TRPO). However, Q-learning is poorly understood and relatively complicated: it stores observed transitions in a replay buffer and randomly samples from this stored experience to learn its decisions. Policy gradient methods have limited robustness, and TRPO is relatively complex and is not compatible with architectures that include noise (such as dropout). PPO retains the stability and reliability of TRPO but is simpler to implement and has better overall performance. Instead of imposing a strict constraint, PPO limits the policy change in each iteration, either through the Kullback-Leibler (KL) divergence, which measures the difference between two policies, or through clipping, and it uses the advantage function rather than the expected reward because this lowers the variance of the estimate. PPO has two variants that avoid bad policy decisions by constraining the change of the objective function in each iteration. The first variant, which follows the theoretical foundation of TRPO, incorporates the KL divergence as a soft penalty, as in Eq. (7):

$$L^{KLPEN}(\theta) = \hat{\mathbb{E}}_t\left[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)}\hat{A}_t - \beta\,\mathrm{KL}\left[\pi_{\theta_{old}}(\cdot \mid s_t),\ \pi_\theta(\cdot \mid s_t)\right]\right] \quad (7)$$

where Â_t is the advantage estimator at time step t and β controls the weight of the penalty, which penalizes the objective when the new policy deviates from the old one. This variant follows from the fact that a certain surrogate objective, which computes the maximum KL divergence over the states, forms a lower bound on the performance of the policy [24]. In practice, however, this penalty is excessively restrictive and leads to only minimal updates, and it is difficult to find a single value of β that works across multiple problem settings. The second variant, clipped PPO, is the one adopted in this paper. Clipped PPO simply restricts the range of the policy change by ε, as shown in Eq. (8):

$$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta), 1-\epsilon, 1+\epsilon\right)\hat{A}_t\right)\right] \quad (8)$$

where r_t(θ) = π_θ(a_t | s_t) / π_θold(a_t | s_t) denotes the probability ratio, and the unclipped term r_t(θ)Â_t is the objective L^CPI, where CPI refers to conservative policy iteration. The term clip(r_t(θ), 1 − ε, 1 + ε)Â_t modifies the surrogate objective by clipping the probability ratio, which removes the incentive to move r_t(θ) outside the interval [1 − ε, 1 + ε]; the objective takes the minimum of the clipped and unclipped terms, as shown in Figure 2. Clipped PPO can outperform the penalty-based variant and is simpler to implement [25,26].
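The two surrogate objectives in Eqs. (7) and (8) can be estimated on a batch of samples as in the following NumPy sketch. It assumes that the log-probabilities of the taken actions under the old and new policies, the advantage estimates and the per-state KL values are already available; the function names are hypothetical.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, adv, eps=0.2):
    """Sample estimate of L^CLIP in Eq. (8)."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))   # r_t(theta)
    unclipped = ratio * adv                                       # L^CPI term
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return float(np.mean(np.minimum(unclipped, clipped)))

def ppo_kl_penalty_objective(logp_new, logp_old, adv, kl, beta=1.0):
    """Sample estimate of L^KLPEN in Eq. (7); kl[t] is KL[pi_old(.|s_t), pi_new(.|s_t)]."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    return float(np.mean(ratio * adv - beta * np.asarray(kl)))

# Two samples: the new policy makes action 1 more likely (ratio 1.6)
# and action 2 less likely (ratio 0.4).
logp_old = np.log([0.5, 0.5])
logp_new = np.log([0.8, 0.2])
adv = np.array([1.0, -1.0])
print(ppo_clip_objective(logp_new, logp_old, adv))                       # about 0.2
print(ppo_kl_penalty_objective(logp_new, logp_old, adv, kl=[0.1, 0.1]))  # about 0.5
```

The unclipped mean here would be 0.6; clipping caps the gain from the first sample at 1.2 and keeps the pessimistic value of -0.8 for the second, which is how the minimum removes the incentive for large policy changes.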

PROPOSED PPO ALGORITHM FOR FINDING THE OPTIMUM IRS' LOCATION AND THE PHASE SHIFTS
Our objective is to determine the optimal location and phase shifts of the IRS reflectors that solve the system throughput maximization problem. We first initialize the clipping threshold ε and the initial policy parameters θ_0, which define the policy function. We also initialize the environment, which includes one MBS, one AP, the IoT tags and the smartphones. The input of the neural network at the initial state s_0 is the set of IoT tag SINRs from Eqs. (2) and (4), computed from the initial IRS location coordinates and the initial phase shifts of its reflecting elements, for tags using either the LTE or the Wi-Fi network. At every time step t, the agent sends an action a_t to the environment and receives a reward from it; the policy is updated using the objective function in Eq. (8) after the advantage Â_t and the probability ratio r_t(θ) have been estimated. PPO decides whether the objective is clipped, and the state changes to a new state s_{t+1} with new IRS location coordinates and phase shifts; if the ratio r_t(θ) moves outside the interval [1 − ε, 1 + ε], it is clipped as in Eq. (8).
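The state/action/reward interface described above could be organised as in the following Python sketch of an environment step. It is a hypothetical, heavily simplified environment, not the simulator used in this paper: the class name `IRSEnvSketch`, the distance-based path loss with random but fixed link phases, the nearest-smartphone association, the NOMA-style interference and the choice of the sum Shannon rate as the reward are all assumptions made for illustration.

```python
import numpy as np

class IRSEnvSketch:
    """Hypothetical IRS-assisted SR environment.
    state  = SINR of every tag at its nearest smartphone,
    action = [irs_x, irs_y, theta_1, ..., theta_N],
    reward = sum of the Shannon rates of all tags (bit/s)."""

    def __init__(self, tag_xy, phone_xy, n_elements=8, p_tx=1.0, noise=1e-3,
                 bandwidth=1e6, pl_exp=2.0, seed=0):
        rng = np.random.default_rng(seed)
        self.tag_xy = np.asarray(tag_xy, dtype=float)
        self.phone_xy = np.asarray(phone_xy, dtype=float)
        self.p, self.noise, self.W, self.pl_exp = p_tx, noise, bandwidth, pl_exp
        n_tags, n_phones = len(self.tag_xy), len(self.phone_xy)
        # Fixed random phases of the direct, tag->IRS and IRS->smartphone links.
        self.ph_d = rng.uniform(0, 2 * np.pi, (n_tags, n_phones))
        self.ph_ti = rng.uniform(0, 2 * np.pi, (n_tags, n_elements))
        self.ph_ik = rng.uniform(0, 2 * np.pi, (n_elements, n_phones))
        self.d_tk = np.linalg.norm(self.tag_xy[:, None] - self.phone_xy[None], axis=2)
        self.assoc = np.argmin(self.d_tk, axis=1)   # nearest-smartphone association

    def _amp(self, d):
        # Amplitude of a d^-pl_exp path loss, with distances floored at 1 m.
        return np.maximum(d, 1.0) ** (-self.pl_exp / 2)

    def step(self, action):
        action = np.asarray(action, dtype=float)
        irs_xy, theta = action[:2], np.mod(action[2:], 2 * np.pi)
        direct = self._amp(self.d_tk) * np.exp(1j * self.ph_d)                     # (T, K)
        d_ti = np.linalg.norm(self.tag_xy - irs_xy, axis=1)
        d_ik = np.linalg.norm(self.phone_xy - irs_xy, axis=1)
        h_ti = self._amp(d_ti)[:, None] * np.exp(1j * self.ph_ti)                  # (T, N)
        h_ik = self._amp(d_ik)[None, :] * np.exp(1j * self.ph_ik)                  # (N, K)
        gain = np.abs(direct + (h_ti * np.exp(1j * theta)) @ h_ik) ** 2            # (T, K)
        n_tags = len(self.tag_xy)
        sinr = np.empty(n_tags)
        for t in range(n_tags):
            k = self.assoc[t]
            others = self.assoc == k
            others[t] = False            # NOMA: other tags on smartphone k interfere
            interference = self.p * gain[others, k].sum()
            sinr[t] = self.p * gain[t, k] / (interference + self.noise)
        reward = float(np.sum(self.W * np.log2(1.0 + sinr)))
        return sinr, reward

env = IRSEnvSketch(tag_xy=[[0, 0], [2, 0], [10, 0]], phone_xy=[[1, 3], [10, 3]])
state, reward = env.step(np.r_[5.0, 1.0, np.zeros(8)])
print(state.round(2), f"{reward / 1e6:.2f} Mb/s")
```

A PPO agent would feed `state` to its policy network and propose the next `action`; keeping the environment separate from the learner makes it easy to swap in the paper's actual channel and SINR expressions.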

SIMULATION RESULTS
In this section, the performance of the proposed IRS-based PPO scheme is evaluated using the parameters listed in Table 1 [12,27].
Figure 3 shows that the proposed scheme, which uses the PPO algorithm to find the optimal location and phase shifts of the IRS, improves the system data rate compared with a system that uses the DDQL algorithm without an IRS to find the optimal tag clusters and with a system without an IRS. The optimal IRS location and phase shifts found by PPO improve the communication channel between the IoT tags and the smartphones. As the figure shows, the proposed scheme increases the data rate of the IoT system by 10% on average compared with the DDQL_clustering system and by 24% compared with the system without an IRS; improving the data rate in this way addresses one of the challenges of 6G. The figure also shows that, with two smartphones, the system data rate first increases with the number of tags and then decreases, because of the growing mutual interference among the tags that are associated with the same smartphone under NOMA. Figure 4 illustrates the performance of the proposed algorithm for different numbers of smartphones with a constant number of 20 tags: with four smartphones, the proposed algorithm outperforms the DDQL_clustering system and the system without an IRS by 25% and 40%, respectively.
Additionally, the total system data rate increases with the number of smartphones, because fewer tags are associated with each smartphone on the same resources, which reduces the mutual interference among tags. Furthermore, the total system data rate increases only slightly beyond four smartphones, as a result of associating no more than two tags per smartphone; we therefore consider four smartphones to be the recommended number for 20 tags.
Figure 5 shows how the system outage probability is affected by increasing the number of tags, where the outage probability is defined as the percentage of tags whose achievable data rate is less than a required reference rate of 5 Mb/s. As the figure shows, the proposed algorithm achieves a lower system outage probability than the system that uses the DDQL_clustering algorithm, which finds the optimal IoT tag clusters based on the total data rate achieved by each cluster in order to maximize the IoT system data rate [12]. It also achieves a lower outage probability than the system without an IRS, which may face obstacles in the channel between the IoT tags and the smartphones. An IRS at its optimal location and phase shifts can significantly enhance the communication between the tags and the smartphones and avoid obstacles, which raises the achieved tag data rate above that of the DDQL_clustering algorithm and the system without an IRS.
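The outage metric above is a simple fraction of tags. A minimal Python sketch, using hypothetical per-tag rates and the 5 Mb/s reference rate from the text, is:

```python
import numpy as np

def outage_probability(rates_bps, r_min_bps=5e6):
    """Fraction of tags whose achievable rate is below the reference rate."""
    rates = np.asarray(rates_bps, dtype=float)
    return float(np.mean(rates < r_min_bps))

rates = np.array([7.2e6, 4.1e6, 5.0e6, 12.3e6, 3.9e6])  # hypothetical per-tag rates (bit/s)
print(outage_probability(rates))  # 2 of 5 tags below 5 Mb/s -> 0.4
```

A tag achieving exactly the reference rate is counted as not in outage, matching the "less than" wording of the definition.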
Figure 6 shows the IoT system capacity versus the number of smartphones for a data rate of 10 Mb/s per IoT tag, as listed in Table 2, with 20 tags. The figure demonstrates that the system capacity increases with the number of smartphones, because fewer tags backscatter their signals to each smartphone, which decreases the mutual interference. We also observe the effect of the IRS on system capacity: compared with the system without an IRS, the proposed IRS-based scheme increases the system capacity even with a small number of smartphones; for example, with four smartphones it improves the system capacity by 40% on average.

The IoT system capacity without using IRS: 7, 13, 17, 17

CONCLUSION
In this paper, symbiotic radio (SR) technology is proposed for Internet of Things (IoT) networks to support passive IoT tags and improve the uplink transmission performance of the IoT system, which is one of the 6G challenges for IoT systems. SR enables the IoT tags to backscatter the signals of neighboring smartphones. Furthermore, we use an intelligent reflecting surface (IRS) to enhance the channels and avoid obstacles between the IoT tags and the smartphones, whether they use the LTE or the Wi-Fi network. The optimal IRS phase shifts and location are obtained by formulating an optimization problem that maximizes the total system data rate, and a proximal policy optimization (PPO) algorithm is employed to solve this problem. The simulation results show that, with two smartphones, the proposed scheme increases the total system data rate by 10% on average compared with a system that uses the Double Deep Q-learning (DDQL) algorithm and by 24% compared with the system without an IRS. Additionally, with two smartphones, the proposed scheme improves the system capacity by 45% compared with the system without an IRS. Overall, the proposed scheme, which relies on 6G technologies and a machine learning algorithm, can enhance the performance of IoT networks, as required by IoT applications. Improving IoT performance is critical for connecting everything intelligently; we therefore recommend that future research address other aspects of IoT networks, such as latency, power consumption and reliability.

Algorithm 1: PPO algorithm for obtaining the optimal IRS location and phase shifts
• Initialization: smartphones, IoT tags, IRS and the coordinates of the AP location.
• Initialize the phase shifts of the IRS elements.
• Initialize the policy parameters θ_0 and the clipping threshold ε.
• for episode = 1, …, k do
    • for t = 1, …, T do
        Calculate the SINR of each tag as the neural network input.
        Run policy π_θ(a_t | s_t).
        Estimate the advantages Â_t.
        Select the IRS location and phase-shift action a_t, and observe the reward and the next state s_{t+1}.
    • Training of PPO: compute the policy update θ_{k+1} = arg max_θ L^CLIP(θ).
        If the policies deviate too far, r_t(θ) is clipped.
    end for
end for
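The core of Algorithm 1 can be illustrated with a toy, phase-only version of clipped PPO in Python. This is a sketch under strong simplifying assumptions, not the paper's implementation: the IRS location is omitted, each episode is a single step (a bandit setting), the policy is a Gaussian over the N phase shifts with a learned mean and fixed standard deviation instead of a neural network, the advantage is the normalized reward minus its batch mean, and the gradient of L^CLIP is derived by hand instead of with an autodiff library. The channel model, the reward proxy and all hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, SNR = 8, 10.0                      # toy IRS size and SNR (assumed)

def rayleigh(size):
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

h_d, c = rayleigh(None), rayleigh(N)  # direct channel and cascaded tag->IRS->phone channels

def reward(theta):
    """Rate proxy log2(1 + SNR * |h_d + sum_n c_n e^{j theta_n}|^2); one action or a batch."""
    return np.log2(1.0 + SNR * np.abs(h_d + np.exp(1j * theta) @ c) ** 2)

# Gaussian policy over phase shifts: a ~ N(mu, sigma^2 I); only the mean mu is learned.
mu, sigma, eps, lr = np.zeros(N), 0.3, 0.2, 0.03
batch, epochs, iterations = 64, 4, 200
r_start = reward(mu)

for _ in range(iterations):                                   # one policy update per iteration
    mu_old = mu.copy()
    a = mu_old + sigma * rng.standard_normal((batch, N))      # sampled phase-shift actions
    rew = reward(a)
    adv = (rew - rew.mean()) / (rew.std() + 1e-8)             # advantage with a mean baseline
    logp_old = -np.sum((a - mu_old) ** 2, axis=1) / (2 * sigma ** 2)
    for _ in range(epochs):                                   # several ascent steps on L^CLIP
        logp = -np.sum((a - mu) ** 2, axis=1) / (2 * sigma ** 2)
        ratio = np.exp(logp - logp_old)                       # r_t(theta)
        # min(r*A, clip(r)*A) has zero gradient whenever the clipped branch is active
        active = np.where(adv >= 0, ratio <= 1 + eps, ratio >= 1 - eps)
        grad = np.mean((active * ratio * adv)[:, None] * (a - mu) / sigma ** 2, axis=0)
        mu = mu + lr * grad                                   # gradient ascent step

r_end = reward(mu)
r_opt = reward(np.angle(h_d) - np.angle(c))                   # closed-form aligned phases
print(f"start {r_start:.2f}, PPO {r_end:.2f}, aligned optimum {r_opt:.2f} bit/s/Hz")
```

Because the closed-form aligned phases are known for this toy channel, they provide an upper bound against which the learned phases can be checked; in the full problem, where the IRS location also changes the channels, no such closed form is assumed to exist.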

Table 2. The IoT system capacity vs. the number of smartphones