Security Surveillance Using UAVs and Embedded Systems in Industrial Areas

ABSTRACT


INTRODUCTION
In the last decade, drones have changed considerably, and their use has steadily increased each year in many application domains such as navigation, military operations, delivery, and topography. In various fields, drones have exceeded expectations in performance, accuracy, and the tasks assigned to them. Our approach therefore exploits drones, together with a specific algorithm, for the surveillance of industrial areas, with the aim of keeping industrial facilities safe and secure. The surveillance system proposed here includes secure real-time photography, motion and object detection, camera calibration, and tracking of the object's trajectory using coordinates constructed from the 3D coordinates of calibrated images captured by three drone cameras. Relying solely on ground-based surveillance limits visibility, especially in sprawling industrial complexes with complex layouts. Blind spots and obscured areas can easily become vulnerable points for unauthorized access or security breaches. Hence the need for advanced surveillance systems mounted on UAVs, such as the one proposed in this paper.
Motion and object detection is an important research and application area in security and surveillance. With the advent of neural network technology, it is increasingly used in various fields of application, particularly in security. Several approaches have been proposed for object detection. Lightweight feature-enhanced convolutional neural network methods have been used for low-altitude, small-size targets [1] to meet real-time requirements and to improve guidance information for suppressing black-flying (unauthorized) UAVs. A variety of approaches [2][3][4][5][6][7] enhance the appearance of objects and increase the precision of algorithms based on deep neural networks (Faster R-CNN, SSD, YOLO, LSL-Net, etc.); these methods differ mainly in their processing architecture. As for application areas [8][9][10][11][12][13][14][15] (underwater, robotic arm, tracking, etc.), researchers tend to develop algorithms that improve quality only in the field in which they are applied, so the algorithms sometimes become unsuitable in other areas.
Our drone uses a three-camera rig for 360-degree photography instead of a single 360-degree camera. A 360-degree camera captures information in every direction, but as a result it sometimes lacks focus.
One of the biggest challenges is the large barrel distortion caused by the ultra-wide-angle fisheye lens. A 360-degree camera is also less versatile than traditional cameras. For example, all photos taken with a 360-degree camera can look very similar, which is a problem for us because we need to detect the movement of an object in an area. Identifying the type of object then becomes more complicated, and errors can have serious consequences. It is also very difficult to correct radial distortion over the full 360 degrees. The wide-angle cameras we use on the drone are severely affected by radial distortion, so correcting it is an important step toward accurate object coordinates. Nowakowski and Skarbek [16] use a homography of central points for lens radial distortion calibration.
Their method for determining the center of distortion is reported to give accurate, error-free results. Zhu et al. [17] use QR factorization to correct radial distortion non-iteratively, which can be faster but loses some parts of the image. Henrique Brito et al. [18] propose self-calibration based on observing straight lines through the distortion center. Zhang et al. [19] propose a new robust line-based distortion estimation method to correct radial distortion.
Cho et al. [20] propose an automatic distortion-coefficient estimation method that performs well in radial distortion estimation; however, for further correction, the algorithm must be repeated until a termination condition is satisfied. Kim et al. [21] compensate for radial distortion by illuminating epipolar lines with a projector.
The authors of [22] propose a cascaded one-parameter split model that requires execution time for each block and a repetitive process to reach a satisfactory result. The method developed by Huang et al. [23] is based on the principles of direct linear transformation. Liao et al. [24] propose a distortion-rectification estimation approach based on a network training process that contains two nested loops of repetition to improve performance.
In our situation, with a drone carrying three cameras, things are different because of the drone's movement: its Euler angles change at every moment, which makes it difficult to determine the actual coordinates.
Huu et al. [25] propose a model with two fixed cameras to calculate the distance between the camera system and the installation.
In this research, we provide a mathematical model that corrects camera distortion and identifies true object coordinates in real time. Single-board computers are preferred in these circumstances: they integrate a microprocessor, memory, input/output, and other useful components on a single circuit board [26].
Because they can combine different fields of technology, serve a wide range of applications, and are inexpensive, single-board computer systems are often chosen [27].
The rest of the paper is organized as follows. Section II gives an overview of the work and describes the basic concept of the surveillance system. Section III explains the proposed method. Section IV discusses the experimental results. Finally, Section V concludes the paper and suggests some future directions.

OVERVIEW
In this work, we propose an SoC-based computing system in which the CPU multitasks between inputs, heavy processing units, and outputs. We optimized the motion and object detectors, computed the center-of-mass coordinates of the detected objects, and implemented the pipeline on different embedded systems. Using our proposed system, we identified the configuration with the best processing time, which delivers good results with low energy consumption and thus increases the UAV's flight time.
Figure 1 shows an overview of the proposed surveillance system. As shown in the figure, the autonomous drone records video with three cameras tilted at 130 degrees to each other. Each camera captures the scene in its field of view, and a dedicated processing system processes the extracted information (frames). Each camera has its own sub-algorithm within the overall algorithm.

METHODOLOGY
The proposed method consists of eight main processing steps. First, the three cameras mounted on the drone record all scenes within their respective fields of view. Second, noise-removal techniques are applied to the stream to enhance data quality. Third, motion detection identifies the areas of the recorded scenes that contain movement. Fourth, once motion is detected, the system scans the affected region. Fifth, the method identifies the nature of the objects within the scanned area and calculates their center-of-mass coordinates. Sixth, these center-of-mass coordinates serve as input for computing the actual (x, y, z) coordinates. Seventh, these real-world coordinates are saved in a structured CSV file in the system's database. Finally, in the eighth step, the results are displayed on the supervisor's screen, providing valuable and actionable information.
Figure 2 represents the eight-step process that ensures the effective use of the drone system for various applications, including surveillance and data collection.
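As a minimal sketch of the seventh step, the Python snippet below writes one CSV row per tracked object, with the fields that the motion detection algorithm saves (time, object id, IWC coordinates, real coordinates). The column names and the `TrackRecord` type are hypothetical, and only the standard `csv` module is used.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, TextIO

# Hypothetical column names: the paper lists time, object id, IWC (xg, yg)
# and the real coordinates, but does not specify a header.
FIELDS = ["t", "object_id", "xg", "yg", "X", "Y", "Z"]


@dataclass
class TrackRecord:
    t: float          # timestamp of the frame (s)
    object_id: int    # identifier of the tracked object
    xg: float         # IWC x coordinate in the image (pixels)
    yg: float         # IWC y coordinate in the image (pixels)
    X: float          # real-world coordinates computed in step six
    Y: float
    Z: float


def write_records(stream: TextIO, records: Iterable[TrackRecord], header: bool = True) -> None:
    """Append tracking records to an open text stream as CSV rows."""
    writer = csv.writer(stream)
    if header:
        writer.writerow(FIELDS)
    for r in records:
        writer.writerow([r.t, r.object_id, r.xg, r.yg, r.X, r.Y, r.Z])


# Usage: an in-memory buffer stands in for the CSV file in the database.
buf = io.StringIO()
write_records(buf, [TrackRecord(0.04, 1, 320.5, 241.0, 2.1, -0.4, 12.0)])
```

When writing to a real file, Python's `csv` documentation recommends opening it with `newline=""` so that the writer controls line endings.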

Frame extraction
The stream from the three cameras, i.e., real-time video collection (Figure 3), can be represented as a sequence of N frames, S = (fs1, fs2, …, fsN). Given the build quality, a triple camera is preferable to a 360-degree camera. Each frame is treated as a color image with red, green, and blue (RGB) components, and the frames are passed periodically to the next step.
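The stages that follow frame extraction typically operate on a single intensity channel. The sketch below is an illustration, not the paper's implementation: it converts an RGB frame (a nested list of (R, G, B) tuples) to grayscale with the ITU-R BT.601 luma weights, the same weights OpenCV uses for RGB-to-gray conversion, and applies a 3×3 mean filter as a stand-in for the unspecified noise-removal step.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Gray = List[List[float]]


def to_gray(frame: List[List[Pixel]]) -> Gray:
    """Convert an RGB frame to grayscale using BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in frame]


def mean_filter3(img: Gray) -> Gray:
    """3x3 box filter; border pixels average only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out
```

A mean filter is only one possible choice; a median or Gaussian filter would serve the same purpose.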

Motion detection
Motion detection serves a variety of functions in this application. As soon as motion is detected by any of the three cameras, the security process starts immediately.
Real-time capture can be treated as a set of frames. Here, we compare each frame to the first frame using the frame differencing technique. Figure 4 shows the motion detection algorithm applied to each frame.
Only the small moving fragments of the picture need further processing, not the whole frame. This reduces the drone's energy consumption and speeds up processing.
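A minimal sketch of frame differencing against the first (reference) frame, assuming grayscale frames stored as nested lists: pixels whose absolute change exceeds a threshold are marked as moving, and the bounding box of those pixels gives the small region that later stages process instead of the whole frame. The threshold of 25 intensity levels and the function names are illustrative assumptions, not values from the paper.

```python
from typing import List, Optional, Tuple

Gray = List[List[float]]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def motion_mask(reference: Gray, current: Gray, threshold: float = 25.0) -> List[List[bool]]:
    """Mark pixels whose absolute intensity change exceeds the threshold."""
    return [[abs(c - r) > threshold for r, c in zip(ref_row, cur_row)]
            for ref_row, cur_row in zip(reference, current)]


def motion_box(mask: List[List[bool]], min_pixels: int = 1) -> Optional[Box]:
    """Bounding box of all changed pixels, or None if too few pixels changed."""
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, changed in enumerate(row):
            if changed:
                xs.append(x)
                ys.append(y)
    if len(xs) < min_pixels:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def crop(img: Gray, box: Box) -> Gray:
    """Extract the moving fragment so later stages process only that region."""
    x0, y0, x1, y1 = box
    return [row[x0:x1 + 1] for row in img[y0:y1 + 1]]
```

Raising `min_pixels` suppresses isolated noisy pixels at the cost of missing very small objects.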

Object detection
Object detection, a sub-discipline of deep learning, is used in this application to identify objects in real-time video. Put simply, this approach locates objects in the frame (object localization), which helps us track each item in the following processing step.

Figure 4. The motion detection algorithm
This method requires image processing algorithms to verify image content. To implement our approach on various hardware, we must follow the guidelines offered by the manufacturers (Intel, NVIDIA, Raspberry Pi, etc.), which enable us to apply these AI models to our UAV application. For instance, the OpenVINO toolkit offers a collection of pre-trained Intel models that can be used for software development, learning, and demos.
Various detection models can detect a set of the most common objects. The SSD-MobileNet v2 framework is widely used for real-time object detection: MobileNet V2 serves as the feature extractor and SSD as the object localizer. Since most such networks are SSD-based and offer a reasonable trade-off between efficiency and performance, we chose SSD-MobileNet v2 for the object detection part. Networks that offer higher accuracy or broader applicability at the cost of lower speed can be expected to detect objects of the same type more precisely.
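After inference, SSD-style detectors (including SSD-MobileNet v2 models converted for OpenVINO) commonly report each detection as seven values: image id, class label, confidence, and a bounding box in normalized coordinates, with a negative image id used as padding after the last valid detection. Assuming that layout, the sketch below keeps confident detections, scales their boxes to pixels, and retains only those overlapping the motion area, so that object identification is restricted to the moving region. The 0.5 confidence threshold and the function names are illustrative choices.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
Detection = Tuple[int, float, Box]  # (label, confidence, pixel box)


def parse_ssd(detections: Sequence[Sequence[float]], width: int, height: int,
              conf_threshold: float = 0.5) -> List[Detection]:
    """Convert rows [image_id, label, conf, x_min, y_min, x_max, y_max]
    (box in normalized [0, 1] coordinates) into (label, conf, pixel box)."""
    results = []
    for row in detections:
        image_id, label, conf, x0, y0, x1, y1 = row
        if image_id < 0:  # assumed padding marker: no valid detections follow
            break
        if conf < conf_threshold:
            continue
        box = (int(x0 * width), int(y0 * height), int(x1 * width), int(y1 * height))
        results.append((int(label), float(conf), box))
    return results


def overlaps(a: Box, b: Box) -> bool:
    """True if two inclusive pixel boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def in_motion_area(results: List[Detection], motion: Box) -> List[Detection]:
    """Keep only detections that intersect the detected motion region."""
    return [r for r in results if overlaps(r[2], motion)]
```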

Intensity weighted centroiding coordinates of moving objects for accurate area tracking
Once we have the position of a moving object in the image, we use the intensity-weighted centroid (IWC) to compute its center coordinates.
The center of gravity (CoG) is the basis of the IWC calculation [44]. The CoG is the same calculation as in physics, but applied to pixel intensities in the image instead of masses (Figure 5).
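In formula form, the IWC of a region is xg = Σ x·I(x, y) / Σ I(x, y) and yg = Σ y·I(x, y) / Σ I(x, y), where I(x, y) is the pixel intensity. The sketch below computes it for a grayscale patch; applying it to the cropped motion region and adding the crop offsets to return full-frame coordinates is our illustrative assumption.

```python
from typing import List, Optional, Tuple


def iwc(patch: List[List[float]], x_offset: int = 0,
        y_offset: int = 0) -> Optional[Tuple[float, float]]:
    """Intensity-weighted centroid (xg, yg) of a patch, in full-frame pixels.

    Each pixel acts like a point mass whose 'mass' is its intensity,
    exactly as in a centre-of-gravity computation.
    """
    total = sx = sy = 0.0
    for y, row in enumerate(patch):
        for x, intensity in enumerate(row):
            total += intensity
            sx += x * intensity
            sy += y * intensity
    if total == 0:
        return None  # no intensity: centroid undefined
    return (x_offset + sx / total, y_offset + sy / total)
```

Intensities are assumed non-negative; with signed values (e.g. raw frame differences) the weighted sum could cancel out.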

Determination of the correct real coordinates of a moving object
As soon as the IWC coordinates are identified, we calculate the real (X, Y, Z) coordinates. For this, we use the pinhole camera model [45], which uses a perspective transformation to project 3D points onto the image plane and thereby form a view of the scene (Figure 6). Eq. (2) gives the actual coordinates [45][46][47][48]. However, the drone does not stay in its initial position: there is an infinitesimally small shift δψ between two successive frames (Figure 6 and Figure 7), which is taken into account at every instant of our calculation (Figure 8). This difference is corrected with Eq. (4).
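The pinhole relation is commonly written as s·[u, v, 1]ᵀ = A·[R | t]·[X, Y, Z, 1]ᵀ, where A is the intrinsic camera matrix (fx, fy, cx, cy), [R | t] is the rotation-translation matrix, and s is a scale factor; we assume this standard form for Eq. (2). The sketch below projects a 3D point with that relation and inverts it for a pixel whose camera-frame depth is known. How depth is obtained is left as an input, and folding the yaw shift into R as a rotation about the camera's vertical (y) axis, e.g. R = yaw_rotation(ψ0 + δψ), is an assumption about Eq. (4), not the paper's formula.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def intrinsics(fx: float, fy: float, cx: float, cy: float) -> Mat3:
    """Intrinsic camera matrix A (no skew)."""
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]


def matvec(m: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def transpose(m: Mat3) -> Mat3:
    return [list(r) for r in zip(*m)]


def yaw_rotation(dpsi: float) -> Mat3:
    """Rotation by dpsi radians about the camera's vertical (y) axis (assumed convention)."""
    c, s = math.cos(dpsi), math.sin(dpsi)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def project(A: Mat3, R: Mat3, t: Vec3, P: Vec3) -> Tuple[float, float]:
    """s*[u, v, 1]^T = A [R|t] [X, Y, Z, 1]^T; returns pixel (u, v)."""
    Pc = tuple(a + b for a, b in zip(matvec(R, P), t))  # world -> camera frame
    su, sv, s = matvec(A, Pc)
    return su / s, sv / s


def backproject(A: Mat3, R: Mat3, t: Vec3, u: float, v: float, z_cam: float) -> Vec3:
    """Invert project() for a pixel whose camera-frame depth z_cam is known."""
    fx, fy, cx, cy = A[0][0], A[1][1], A[0][2], A[1][2]
    Pc = ((u - cx) * z_cam / fx, (v - cy) * z_cam / fy, z_cam)
    shifted = tuple(p - ti for p, ti in zip(Pc, t))
    return matvec(transpose(R), shifted)  # R is orthonormal, so R^-1 = R^T
```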
For camera calibration and distortion correction applications, lens distortions must be corrected first. To determine the distortion parameters, we provide sample images of a known pattern, such as a chessboard, and find specific points in them (the square corners of the chessboard). We know the coordinates of these points both in the real world and in the image, and from these data the distortion coefficients are computed.
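For illustration, the sketch below assumes the common polynomial radial model x_d = x(1 + k1·r² + k2·r⁴ + k3·r⁶), with tangential terms omitted, where (x, y) are normalized image coordinates and r² = x² + y². Since the model has no closed-form inverse, undistortion uses a fixed-point iteration. In practice k1 to k3 would come from chessboard calibration, for example with OpenCV's `calibrateCamera`; the coefficient values used in the usage check are made up.

```python
from typing import Tuple


def radial_factor(x: float, y: float, k1: float, k2: float, k3: float) -> float:
    """1 + k1*r^2 + k2*r^4 + k3*r^6 for normalized coordinates (x, y)."""
    r2 = x * x + y * y
    return 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3


def distort(x: float, y: float, k1: float, k2: float = 0.0, k3: float = 0.0) -> Tuple[float, float]:
    """Apply radial distortion to an ideal (undistorted) normalized point."""
    f = radial_factor(x, y, k1, k2, k3)
    return x * f, y * f


def undistort(xd: float, yd: float, k1: float, k2: float = 0.0, k3: float = 0.0,
              iterations: int = 20) -> Tuple[float, float]:
    """Invert distort() by fixed-point iteration x = xd / f(x, y)."""
    x, y = xd, yd  # start from the distorted point
    for _ in range(iterations):
        f = radial_factor(x, y, k1, k2, k3)
        x, y = xd / f, yd / f
    return x, y


def pixel_to_normalized(u: float, v: float, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    return (u - cx) / fx, (v - cy) / fy


def normalized_to_pixel(x: float, y: float, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    return x * fx + cx, y * fy + cy
```

The iteration converges quickly for moderate distortion; for the strong distortion of a fisheye lens a dedicated fisheye model would be more appropriate.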

SYSTEM PROCESSING ARCHITECTURE OF THE IMPLEMENTATION
Many processing systems could be used to implement our approach, but we need one that is compatible with our surveillance system. Two main factors govern the choice of a suitable device: energy consumption and processing time.

CPU: Parallel computing
The core is the heart of the CPU, where all computation and logic take place. A core normally runs through a process known as the instruction cycle, in which instructions are fetched from memory, decoded into machine operations, and then executed through the core's logic gates. Initially, all processors were single-core; as multi-core processors became common, computing power increased and parallel processing arrived (Figure 9).

Figure 9. CPU internal architecture
Large-scale parallel computing hardware is often housed in a single data center, with multiple processors (or multiple cores within a processor) spread across server racks. The application server distributes compute requests in small chunks, which are then executed concurrently on each server.
In security surveillance using UAVs and embedded systems, CPUs are suitable for lower-complexity tasks such as data preprocessing, basic motion detection, and simple object identification.
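To illustrate splitting work into small chunks that are executed concurrently, the sketch below divides a difference image into row chunks, counts the changed pixels in each chunk with a thread pool, and combines the partial counts. Threads keep the example simple; in CPython, CPU-bound pure-Python work is limited by the global interpreter lock, so `ProcessPoolExecutor` or native libraries would be needed for real speedups. Chunk size, worker count, and threshold are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def count_changed(rows: List[List[float]], threshold: float) -> int:
    """Work item: count pixels in one chunk whose difference exceeds the threshold."""
    return sum(1 for row in rows for value in row if value > threshold)


def parallel_count(diff: List[List[float]], threshold: float = 25.0,
                   workers: int = 4, chunk_rows: int = 2) -> int:
    """Split the image into row chunks, process them concurrently, then reduce."""
    chunks = [diff[i:i + chunk_rows] for i in range(0, len(diff), chunk_rows)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_changed, chunks, [threshold] * len(chunks))
        return sum(partials)
```

The result is identical to a sequential count; only the scheduling of the work changes.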

CPU and GPU computing
Graphics processing is typically regarded as one of the more difficult tasks for a CPU. Because of the complexity of the solution, the technology developed for it now has uses far beyond graphics. The difficulty is that rendering visuals properly requires complex mathematics that must be computed in parallel. For instance, a graphically demanding computer game may have hundreds or thousands of polygons on the screen at once, each with its own movement, color, lighting, and other characteristics. CPUs are not designed for such a workload; this is where graphics processing units (GPUs) come in.
GPUs are built like CPUs, with cores, memory, and other components. GPU acceleration emphasizes parallel data processing across a large number of cores rather than context switching between many activities. Each of these cores is typically less powerful than a CPU core. GPUs are also frequently incompatible with various hardware APIs and storage types, but they handle the simultaneous transfer of large volumes of processed data very well.
Instead of switching between graphics processing tasks, the GPU takes instructions in batches and processes them at high volume for faster processing and display (Figure 10).

Figure 10. CPU and GPU internal architecture
GPUs can increase both application data throughput and the number of simultaneous computations.
Because of this parallelism, a GPU can do more work than a CPU in a given period.
GPUs excel in parallel processing and are well-suited for tasks that require high-throughput computations, such as image and video processing.For security surveillance, GPUs are advantageous in scenarios that involve real-time object detection, tracking, and advanced image analysis.

Cluster computing
In cluster computing, a group of closely connected or ad hoc computers works together as a single unit; their collective action creates the impression of a single system. Clusters are typically connected by fast local area networks (LANs). Cluster computing offers a low-cost alternative to large server or mainframe platforms, meeting the demands of critical workloads and processing services more quickly (Figure 11).
Many businesses and IT companies use cluster computing to improve scalability, availability, processing speed, and resource management at a reasonable cost. It guarantees constant access to computing power and offers a superior approach to designing and operating high-performance parallel systems that do not depend on specific hardware vendors or their product lines [49]. However, clusters might be less suitable for UAVs because of the UAV's inherent resource constraints and the need for real-time processing.
The platform provided by a Vision Processing Unit (VPU), by contrast, allows companies to differentiate their products by customizing camera features. A VPU is specialized hardware that supports cameras and carries out real-time processing tasks. In the past, these tasks were often passed to the CPU or GPU, but the VPU consumes only a fraction of the power. The VPU can be used alone or together with the CPU/GPU in a truly heterogeneous computing environment sharing the same memory subsystem, for complex multi-application or multi-function workloads.
For security surveillance UAVs, VPUs are highly suitable as they provide efficient and fast processing of visual data.They are particularly valuable for real-time object detection, tracking, and other computationally intensive vision tasks.VPUs optimize power consumption while delivering the required processing power for surveillance applications, making them well-suited for embedded systems on UAVs.
In summary, as shown in Table 1, while each architecture has its strengths, the VPU emerges as a promising option for security surveillance using UAVs and embedded systems. It offers a well-balanced combination of high processing speed, power efficiency, real-time capability, and accuracy, making it suitable for real-time object detection, tracking, and image analysis, all essential for effective security surveillance applications.

Implementation
We implemented our approach on hardware representing the processing architectures discussed above: a Raspberry Pi 4B (Figure 13), a Jetson Nano (Figure 14), a Raspberry Pi 4B with an Intel Neural Compute Stick 2 (Intel NCS2) (Figure 15), a personal computer (Figure 16), and a Google Colab cluster, in order to evaluate parallel CPU (CPU and GPU), cluster, and Vision Processing Unit (VPU) computing. For model execution on the Movidius NCS2, Intel provides an open-source deep learning toolkit, OpenVINO. The OpenVINO toolkit allows us to deploy pretrained deep learning models through the high-level Inference Engine Python API, combined with our application logic. The model must first be converted into an Intermediate Representation (IR), which the Inference Engine can run. The IR consists of two files: an .xml file describing the network topology and a .bin file containing the weights. We used Windows on the personal computer and the cluster and Linux on the other systems, converting the model with the Model Optimizer included in the OpenVINO package. The average processing time and the power consumption of each system are the most important quantities we measure. They depend on the parameters and variables of the model, including the number of layers, the number of kernels, the kernel size, and the activation function. Figure 17 shows the video processing results for a moving object in the monitored area.
To measure processing time, the system records a time stamp at the start and end of each processing step; the difference between the two gives the elapsed time for that step, in milliseconds. We used the time module built into Python. Power is quantified with sensors that measure the electrical current and voltage of the components, and power consumption is then calculated as P = V × I, where V is the measured voltage and I the measured current.
The workflow scenario in this work concerns motion detection and identification of moving vehicles in an industrial complex, and tracking their trajectories in open space using the different monitoring platforms. Input images were captured from a video, and the initial settings default to the coordinates of the proposed drone system. As shown in Table 1, the Raspberry Pi 4 has poor performance and high power consumption. The Jetson Nano and the cluster have higher power requirements and consumption because of their greater hardware capabilities. Analyzing model classification time, we see that classification runs faster on the Raspberry Pi 4 + VPU, which is also a lower-power system. This system, which consumes less power and has a short average processing time, is therefore the preferred processing system for our surveillance system. The NVIDIA Jetson Nano offers adequate real-time performance and energy consumption for our surveillance UAV applications at resolutions up to 4K.
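A minimal sketch of the measurement scheme just described, using `time.perf_counter` from Python's standard `time` module for the start and end stamps (it is monotonic and has higher resolution than `time.time`) and P = V × I for power. The energy-per-frame helper (E = P × t) and the sample readings are illustrative additions, not measurements from this work.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

timings_ms: Dict[str, float] = {}


@contextmanager
def timed(step: str, store: Dict[str, float] = timings_ms) -> Iterator[None]:
    """Record the elapsed wall-clock time of a processing step in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[step] = (time.perf_counter() - start) * 1000.0


def power_w(voltage_v: float, current_a: float) -> float:
    """Electrical power P = V * I, in watts."""
    return voltage_v * current_a


def energy_per_frame_j(power: float, frame_time_ms: float) -> float:
    """Energy spent on one frame, E = P * t, with t converted to seconds."""
    return power * frame_time_ms / 1000.0


# Usage with a placeholder workload and hypothetical sensor readings.
with timed("motion_detection"):
    _ = sum(i * i for i in range(10_000))
p = power_w(5.0, 0.6)  # e.g. 5 V supply, 0.6 A measured current
```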
It uses CUDA (Compute Unified Device Architecture), an architecture developed by NVIDIA for parallel computation. On the other hand, we encountered some limitations with this board: setting up the environment is rather complicated, and it does not support some of the latest tool versions.
Intel hardware offers high deep-learning performance, simplified development, and a write-once, deploy-anywhere workflow.
Intel's OpenVINO toolkit makes adopting and maintaining our approach easier. By building an optimized network and controlling inference on specific devices, we can use the runtime (Inference Engine) to optimize performance. Optimization is also done automatically through device detection, load balancing, and parallel inference across the CPU, GPU, and VPU.
The results show that the average processing time and power consumption decreased slightly with the Raspberry Pi + VPU system, whereas they increased significantly for the personal computer and cluster systems. Our Raspberry Pi + VPU system (Figure 18) achieves a power consumption 1 W lower and an average processing time 18 ms lower than the personal computer and the cluster, respectively. The parameters of each system are shown in Figure 17.
We compare our results with those of the Small-Scale Object Detection for Unmanned Aerial Vehicles (UAVs) system proposed by Saeed et al. [50], which modified the architecture of the detection network and ran it on different embedded systems; as presented earlier, our system can detect and locate objects in the surveillance area in real time. Singhal and Barick [51] propose an application-aware Multi-Path Weighted Load-balancing (MWL) routing protocol for managing congestion; this system executes its processing at a ground center, which increases processing time and leaves it out of service and powerless in the event of an interruption or intrusion. Teng et al. [52] developed a trajectory planner based on particle swarm optimization with surveillance-area priority, relying on existing high-consumption UAVs to obtain optimal trajectories.
In our work, we provide a surveillance system that can reduce the energy consumed and processing time to locate any object in the surveilled area, whatever the object's trajectory.
Figure 18. Average processing time and power consumption of each system

CONCLUSION
This paper presents a novel approach to industrial area surveillance that combines the Raspberry Pi 4B with an Intel Neural Compute Stick 2 VPU as the edge computing device, providing high-performance image processing with low energy consumption, which can increase the UAV's flight time over the surveilled area. In this study, performance tests of our big data approach were performed on the Jetson Nano, the Raspberry Pi 4, the Raspberry Pi 4 + Intel VPU, a cluster, and a personal computer. The benchmarks covered power consumption and average processing time.
Based on the benchmark evaluation, we aim to minimize hardware and cost and to make an informed hardware choice for our real-time monitoring applications. In this context, the model was developed using a CNN, a deep learning algorithm, and mathematical equations were used to transform between real-world coordinates and image (pixel) coordinates.
According to the test results, the cluster consumes more power but delivers better performance, with a shorter average processing time. The major challenge of using this system in our application is guaranteeing that shorter processing time: we would need to transfer data over UDP instead of TCP, and UDP is less secure than TCP. Finally, the Raspberry Pi 4 + Intel VPU system is the preferred processing system for our surveillance system: it uses less power and has a reasonably good average processing time (Table 2).

Figure 1. Proposed surveillance system overview

Figure 11. Cluster internal architecture

Vision Processing Unit (VPU) computing
The Vision Processing Unit (VPU) is a more recent class of microprocessor: a special type of AI accelerator designed specifically to speed up computer vision operations. The VPU is a fast 500 MHz DSP linked with the image signal processors (ISPs) (Figure 12).

Figure 12. Vision Processing Unit (VPU) internal architecture
Real-time depth of field is just one of the camera functions a VPU supports. It offers a dedicated processing platform that frees up the CPU and GPU, conserving power and computing resources.

Input: framei, framei+1; camera matrix (matrix of intrinsic parameters); rotation-translation matrix [R|t]; UAV initial position
Initialize: framei = CaptureFirstFrame()
While (framei+1 captured)
    Save framei+1
    Pretreatment, noise removal, and filtering
    Match (framei+1 pixels) with (framei pixels)
    If (any motion is generated in the vision area)
        Localization of the motion area
        Object identification in the motion area
        Calculation of IWC coordinates (xg, yg) for every object in the motion area
        Determination of δψ between framei and framei+1
        Calculation of the real coordinates xg'
        Save in .csv file (Time (t) || Object id || (xg, yg) || xg')
        Print real objects' trajectory
        Remove framei; framei ← framei+1
    Else (capture new framei+1)

Table 1. The key attributes of each embedded system

Table 2. Technical specifications of processing systems and results