© 2025 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
OPEN ACCESS
Reducing emergency response times is critical to enhancing the efficiency of integrated rescue systems (IRS) and mitigating the impact of crisis events. This study investigates the deployment of intelligent sound event detection (SED) systems capable of recognizing specific sounds, such as gunshots and shouting, within public and commercial spaces. Through controlled simulations in an airport administrative building, the research demonstrates that SED systems significantly outperform traditional notification methods, reducing average response times by over 97%—from 175 seconds to just 5 seconds. These findings highlight the potential of SED systems to revolutionize emergency response strategies. The study introduces a novel approach by integrating sound detection with video surveillance into multimodal systems. This combination enhances situational awareness and allows for more precise responses to emergencies, addressing limitations of standalone detection systems. However, the study acknowledges key limitations—primarily that SED systems are less effective in silent incidents. The results emphasize the scalability of SED systems for diverse real-world applications in critical locations such as public institutions, shopping centers, and transportation hubs, where rapid decision-making is essential. Future research should explore optimizing these systems for noisy and unpredictable environments and advancing machine learning algorithms to improve reliability, adaptability, and detection accuracy, ensuring robust crisis management in varied scenarios.
AI-driven security systems, airport security, crisis management, first responders reaction, multimodal detection, public safety and security, sound event detection systems
This chapter introduces the critical importance of response times in emergency situations and highlights the role of intelligent systems in enhancing crisis management. By examining traditional methods and their limitations, the following sections provide a foundation for understanding the potential benefits of advanced technologies, such as Sound Event Detectors (SED), in improving response efficiency and public safety.
1.1 Context and importance of response times in critical incidents
The response time of security personnel and IRS units plays a crucial role in determining the outcomes of critical incidents in high-risk areas. Traditionally, security personnel and IRS units are alerted to these incidents and emergencies through notifications heavily influenced by human factors, which can result in significant delays. Individuals who could report incidents and emergencies are often the victims themselves. Human factors, such as delayed reaction or confusion in crowded environments, often hinder prompt reporting of emergencies. One known aspect is the 'bystander effect' [1], where individuals assume others have already reported the incident, but it is only one of many behavioural limitations in high-risk settings. Operational and administrative measures at airports highlight the importance of clearly defined procedures and minimizing delays in the transfer of information between IRS units [2].
1.2 Traditional reporting methods and their limitations
Traditional methods of reporting emergencies are often prone to errors caused by human factors. Intelligent sound event detectors (SED) offer an automated approach that recognizes specific learned sounds, such as gunshots, shouting, and glass breaking, thereby triggering faster responses from security forces and IRS units [3].
1.3 Benefits of intelligent sound event detectors
The importance of sound detectors in the field of security has grown significantly in recent years due to advancements in sound signal analysis technologies. For instance, prior research demonstrated the use of sound event detectors in transportation, achieving a significant improvement in response times to unexpected events [4]. These systems use machine learning models trained on large annotated audio datasets to classify predefined sound types (e.g., gunshots, screams) in real time. Their architecture typically includes front-end feature extraction (e.g., MFCC, Mel spectrograms), followed by classifiers like convolutional neural networks (CNNs) or support vector machines (SVMs) [5].
Another advantage of automated systems, such as SEDs, is their ability to eliminate human error. Research emphasizes that human factors often lead to errors in decision-making during crises, especially when individuals are exposed to stress or confusion [6]. Automation eliminates this element, which is particularly crucial during emergencies requiring a rapid response. Further studies document that smart sensors and data integration can significantly enhance the ability to respond quickly to crises, leading to better outcomes in emergency management [7].
Additionally, prior research has investigated the application of machine learning algorithms for sound classification, offering foundational insights into the effective analysis and categorization of audio signals [3]. Building upon these advancements, this study adapts such principles specifically to the context of emergency response and crisis management using Sound Event Detectors (SEDs). Recent investigations have also demonstrated the efficacy of support vector machines (SVM) in classifying indoor sounds, such as gunshots and breaking glass, which further enhances the accuracy and reliability of sound event detection (SED) systems in critical security applications [8].
1.4 Research objectives and hypotheses
The aim of this study is to evaluate the effectiveness of intelligent SED systems in reducing response times for security personnel and IRS units and their contribution to prevention and enhanced security levels. The research focuses on controlled simulations of an active shooter attack in the environment of an airport administrative building to provide a comparative analysis of traditional and automated reporting methods. We hypothesize that automated SED systems will serve as a significantly faster source of notification for security forces, thereby greatly improving response times and overall intervention efficiency.
The use of IoT technologies, such as SEDs, represents a critical advancement in crisis management, essential for reducing casualties and improving intervention efficiency [9]. While similar studies have explored automated systems in emergency contexts, to the best of our knowledge, no prior research has specifically focused on the integration of SEDs in the controlled simulation of active shooter scenarios within airport environments. This research aims to contribute to existing knowledge on the use of intelligent technologies for public safety and further develop methodologies that enable their effective integration into current security protocols. Systems like SED can play a key role in improving safety not only in administrative and commercial buildings but also in public spaces, where they can serve as the first line of defense during critical incidents [2]. Current research addresses key challenges including detection in noisy environments, distinguishing overlapping sounds, and reducing false alarm rates—especially in crowded public spaces. Emerging trends also include the integration of SED with video analytics and IoT frameworks to create context-aware, multimodal surveillance solutions. Despite these advancements, there remains a need for experimental validation of SED systems in complex, high-risk environments—especially within structured, time-critical scenarios such as active shooter incidents. This study aims to fill that gap.
This section outlines the methodology used in this experimental study aimed at evaluating the effectiveness of intelligent sound event detectors (SED) during emergency incidents.
2.1 Method used
An experimental approach was chosen as the most suitable research method, as it allows for systematic observation and measurement of responses in a controlled environment while maintaining the realism of an emergency scenario. Experimental studies in complex buildings provide a unique opportunity to collect empirical data on the effectiveness of security systems and their impact on the course of an emergency, while also enabling the control of variables and systematic evaluation of results [10].
2.2 Description of the experimental environment
The experiment was conducted as a complementary activity during a simulated emergency exercise in the APC building at Václav Havel Airport. It was part of the standardized exercise STČ-14/IZS AMOK - active shooter attack [11]. The exercise was organized by the Police of the Czech Republic (PČR), and Václav Havel Airport used this opportunity to train its internal security forces. The administrative building was selected due to its complex structure and the high number of potential risk areas.
The simulation of the active shooter attack was designed to replicate real emergency conditions, focusing on sound-intensive events such as gunshots and aggressive shouting. Various types of sensors, including SEDs, were deployed throughout the building to ensure comprehensive SED. Each critical area was equipped with sensors and a camera system to record events and facilitate their evaluation. The use of physical security technologies at civil airports provided important context and enabled the creation of a realistic environment for simulating an emergency [2].
The experiment was conducted as part of a scheduled standardized training exercise organized by the Czech Police, providing a unique opportunity to evaluate SED technology under realistic operational conditions.
2.3 Scenarios
This chapter describes the different scenarios used to simulate the shooter attack.
Scenario 1: A man at the front desk is talking to the receptionist. Apparently, he is harassing her because the receptionist calls for security using the panic button located in an accessible location at the reception desk. When the security service arrives, there is a conflict and a scuffle.
Scenario 2: An incoming male (ex-employee) arrives at the front desk and is aggressive towards the receptionist. He then jumps the turnstile and takes the elevator to an unknown floor. The receptionist calls security, who then calls the police, while the perpetrator is in an unknown location, out of range of the CCTV cameras. When the police arrive, the perpetrator is found and pacified.
Scenario 3: Two perpetrators arrive at the reception desk and immediately begin shooting at employees, killing some employees at the reception desk and breaking into the building, taking the surviving employees’ hostage and moving unchecked through selected floors of the building.
Scenario 4: Two active shooters enter the training room and shoot at all persons present (Figure 1).
Figure 1. Schematic of the experimental setup for one of the scenarios
Diagram showing the location of the sound event detectors on the 1st and 2nd floors of the office building in the simulated emergency of scenario 3.
2.4 Participants and roles
Members of the Police of the Czech Republic were used to simulate attackers, and volunteers were used to simulate the staff of the airport administration building where the simulated attack took place. The selection of participants was managed by the Police of the Czech Republic. This approach to participant selection for simulated active shooter scenarios is consistent with recommended procedures for crisis training [12]. The attackers were equipped with various weapons (knife, short firearm, long firearm). Employees participated in physical altercations accompanied, as much as possible, by authentic emotional displays of screaming to simulate real emergency incidents as faithfully as possible, in line with methodological practices for creating realistic scenarios [13].
2.5 Materials and technologies used
The technological setup integrates intelligent sound event detectors (SEDs) with surveillance cameras mounted on tripods, along with wireless communication routers, forming independent units for real-time emergency detection. This compact system, as shown in Figure 2, ensures effective audio-visual monitoring and communication during the experiment.
Figure 2. Tripod with SED sound event detector, security camera and communication infrastructure used in the experiment at Prague Airport
Sound Event Detectors:
Detectors of dangerous sound events (model Jalud SED-2023 Pro, manufactured by Jalud Embedded s.r.o., Plzeň, Czech Republic) were installed on tripods together with microphones and surveillance cameras. They were equipped with wireless router and power supply to form independent units for detecting and communicating alarms to the central security center. The software used in the SED devices incorporates artificial intelligence algorithms, enabling the detection and classification of specific sound patterns with high accuracy. A systematic review highlights the growing role of machine learning algorithms in sound classification, providing a foundational understanding of how these technologies can be leveraged to improve real-time detection and response systems in various domains, including public safety [14]. The sound event detectors were installed according to standardized procedures for acoustic detection in confined spaces [15]. The devices are pre-programmed to recognize and classify sound events such as screams, gunshots, and glass breaking, with a detection accuracy of > 95% [16]. The system implementation and its integration with the security center followed best practices for integrating security technologies [15].
Surveillance Cameras:
To monitor the area, AXIS P1387 Box Camera IP cameras (Axis Communications AB, Lund, Sweden) with a 5MP resolution and support for Lightfinder 2.0 and Forensic WDR were used. The cameras were installed on the same tripods as the SED sensors to provide comprehensive surveillance of the monitored area [17].
Control system:
The centralized security system of the airport is formed by the Integrated Security Center (IBC), which serves as the main coordination hub for managing security surveillance and crisis situations. The IBC is equipped with a modern centralized security console, which enables integration of data from all security systems, continuous monitoring of all airport security zones 24/7, visualization of alarms and events on large display units, direct communication with security units and emergency services, and archiving and analysis of security events. The system is connected with the airport security monitoring center, the operational center of the Police of the Czech Republic at the airport, the airport fire rescue service, airport security, and other relevant security forces [18]. Integration of data from SED detectors into this system has not yet been implemented in real use.
2.6 Experiment procedure
The experimental procedure followed the methodological framework for testing security systems in real-world conditions [19], with data collection and evaluation carried out according to standardized protocols for experimental research in the field of security [20].
Phase 1 - Preparation:
Sound event detectors were installed on tripods along with the necessary infrastructure described earlier and placed at predetermined locations corresponding to the anticipated incidents of each scenario. Wireless connections to the Security Monitoring Center were configured.
Phase 2 - Active Experimentation:
Actors portraying attackers and employees simulated the prepared scenarios. During the scenarios, physical altercations and shootings created emotionally intense situations naturally accompanied by screaming and the sounds of gunfire. These events triggered the SED system. All detections were recorded, including their timestamps. Additionally, timestamps for other key events, such as the start of the scenario, its significant phases, and emergency calls made by participants, were also logged.
Phase 3 - Comparison of Key Variables - Response Times Between Scenarios:
The time elapsed from the start of the incident to the delivery of notifications about the attack from any possible source was compared across scenarios.
2.7 Measured variables
Primary Variable: The time elapsed from the start of the incident to the notification of IRS units. This variable was measured in two variants:
A) Time from the start of the incident to the notification of IRS units via an alert received at the Security Monitoring Center from automatic sound event detectors (SEDs).
B) Time from the start of the incident to the notification of IRS units via an emergency call to 112 and, subsequently, an alert received at the Security Monitoring Center through a call from a participant in the experiment who became aware of the incident during its course and was able to make a phone call.
2.8 Data collection and analysis
During the simulation, all detected events, timestamps, and response times were recorded. The data were subsequently analyzed using statistical methods, including average response time, median, and standard deviation. A t-test analysis was employed to evaluate the differences between detection using SEDs and traditional methods [21].
2.9 Equipment details used for the experiment
Sound Detectors:
Standalone SED models for recognizing specific sound frequencies relevant to high-risk situations, supplied by the manufacturer, Jalud Embedded.
Modifications:
Standard versions of these detectors were used without any modifications tailored to specific environments.
CCTV Cameras:
To monitor the surroundings of the installed sound detectors, security cameras supplied by Axis Communications were installed on tripods alongside the detectors. Cameras from this manufacturer were chosen because they are standardly used for video surveillance at Prague Airport [22].
2.10 Calibration and preparation of detectors
Before the experiment, all detectors were calibrated to ensure their accuracy and reliability. The calibration process involved testing the detectors in simulated conditions with varying noise levels to ensure that they correctly responded to predefined sound events, such as gunshots and shouting. The calibration and preparation of the detectors were conducted following standardized procedures for acoustic detection systems [23]. All devices were also time-synchronized to enable precise comparison of timestamps from individual sensors.
2.11 Ethical aspects and safety measures
During the experiments, the safety of all participants was ensured, and ethical guidelines were strictly followed. Ethical considerations and safety measures were implemented in accordance with recommended procedures for conducting security exercises [24]. All participants were informed in advance about the nature of the experiment and provided their consent to participate. Measures were also taken to minimize stress and ensure the safety of all actors and personnel involved in the experiment.
The primary metric examined in this study is the time difference between notifications of an emergency event identified by sound event detectors (SEDs) and notifications received by security personnel at the operational center or the Police of the Czech Republic through conventional means. These conventional methods include a witness or participant calling by phone or using standard systems such as a traditional video surveillance system without sound detection, emergency buttons, etc. The results of the scenarios confirmed the hypothesis that sound detectors are generally faster.
3.1 Results of individual scenarios
The results of individual scenarios were analyzed using standardized procedures for evaluating the effectiveness of security systems [25]. The methodology for evaluating response times was based on recommended procedures for time-series analysis in security systems [26]. Statistical processing of data and interpretation of results followed a methodology for comparative analysis of security technologies [27], with a focus on the practical significance of the observed time differences for the efficiency of IRS interventions.
The tables below clearly demonstrate that automated systems provided a substantial acceleration in detection and notification, contributing to increased efficiency in IRS response.
Scenario 1: The emergency event began at 18:01:12 when a man approached the reception desk and verbally harassed the receptionist. This action was not accompanied by any loud sound. Notification to security and the monitoring center was carried out by pressing the emergency button on the receptionist's desk at 18:02:40. The time difference between the start of the event and the notification was 00:01:28. No notification via SED was made due to the absence of a sound event. A detection was only made much later upon the arrival of security personnel, when a verbal altercation occurred between the perpetrator and the security staff.
Scenario 2: The second scenario began at 18:43:45, when aggressive shouting was detected during a dispute between a man and the receptionist. The first notification via SED was sent at 18:43:50, a time difference of 00:00:05 from the start of the event. The first notification through traditional methods occurred when the receptionist called security at 18:44:15, representing a time difference of 00:00:30 from the start of the event. The difference in notification time between the two methods was 00:00:25. Later, at 18:46:45, security was notified again about the ongoing emergency situation.
Scenario 3: The third scenario began at 19:36:20 with the detection of gunfire at the first shot. The first notification via SED was sent at 19:36:25, a time difference of 00:00:05. The first notification through traditional methods occurred when an employee called the emergency line at 19:43:10, a time difference of 00:06:50 from the start of the event. The difference in notification time between the two methods was 00:06:45. Later detections were recorded on the second floor at 19:30:10 and 19:37:50.
Scenario 4: The fourth scenario began at 20:58:38, when gunfire was detected at the first shot. The first notification via SED was sent at 20:58:43, a time difference of 00:00:05. Traditional notification was made when an employee called the emergency line at 21:00:05, a time difference of 00:01:27 from the start of the event. The difference in notification time between the two methods was 00:01:22. A total of 10 shots and one instance of aggression were detected during the incident.
The difference between these two variants shows that automated detection via SED provided faster identification of the incident, leading to more effective IRS intervention. The time difference between the methods was 6 minutes and 17 seconds, with automatic detection providing the first alert of the incident at 20:58:43, while traditional reporting occurred several minutes later.
3.2 Analysed data displayed in tables
Table 1 summarises the timings for each scenario, which includes the start of the incident, the time of first notification by SED and the time of first notification by the standard route. The differences in notification times are also shown, allowing an evaluation of the speed and efficiency of the SED system compared to traditional methods.
Table 1 presents a comparison of emergency notification times, and Table 2 provides an overview of the sources of audio detection and the forms of first notification by the standard route in each scenario. It also contains notes describing the details of the progress of each incident. This table provides a better understanding of the circumstances of each scenario and how to respond to it.
Table 1. Comparison of emergency notification times
|
Scenario |
Start of an Emergency |
Time of First Notification by SED |
Time of the First Notification by the Standard Way |
Time Difference from the Start of the Scenario to the First Notification by SED |
Time Difference from the Start of the Scenario to the First Notification by the Standard Route |
Difference in Notification Time |
|
1 |
18:01:12 |
N/A |
18:02:40 |
N/A |
N/A |
N/A |
|
2 |
18:43:45 |
18:43:50 |
18:44:15 |
0:00:05 |
0:00:30 |
0:00:25 |
|
3 |
19:36:20 |
19:36:25 |
19:43:10 |
0:00:05 |
0:06:50 |
0:06:45 |
|
4 |
20:58:38 |
20:58:43 |
21:00:05 |
0:00:05 |
0:01:27 |
0:01:22 |
Table 2. Sources of sound detection and forms of emergency notification
|
Scenario |
Sound Detection Source (Event) |
Form of the First Notification by the Standard Way |
Note |
|
1 |
N/A |
Pressing the emergency button by the receptionist |
The event took place without significant sound events. It was only after the arrival of security that aggressive shouting caused by a scuffle with the perpetrator was detected. |
|
2 |
Detection of aggressive shouting during an argument between a man and a receptionist |
Calling reception security |
Later, at 18:46:45, security is repeatedly alerted by the detector to aggressive behaviour in 6NP. This occurs a total of 8 times + 4 times during police pacification. |
|
3 |
First shot detection |
Calling the emergency line by an employee |
Later another detection in 2NP at 19:30:10 and 19:37:15 (shooting and aggressive screaming). Upon police arrival, gunshots are detected in the reception area at 19:44:10. Meanwhile, aggression is detected in 2NP where the perpetrator is subsequently pacified. |
|
4 |
First shot detection |
Calling the emergency line by an employee |
In total, 10 shots fired + 1x aggression + 1x male scream is detected. |
Table 3. Statistical evaluation of the effectiveness of SED and standard notification
|
Metrics |
Value in Seconds |
Comment |
|
Average SED reaction time (s) |
0:00:00 |
The average time it took the SED system to send the first notification since the start of the emergency. |
|
Average response time of standard notification (s) |
175,67 |
The average time it took to send a standard notification since the start of the incident. |
|
Average difference in notification time (s) |
170,67 |
The difference between the average time of standard notification and SED notification, shows the effectiveness of SED. |
|
Median reaction time SED (s) |
0:00:00 |
Mean response time value for SED notifications that eliminates the effect of outliers. |
Table 3 provides a comparative overview of the effectiveness of SED and standard notification methods. It shows the mean, median response times and variability (standard deviation) of both methods. Each metric includes an explanation of its meaning and practical application. The table is designed to better understand the differences between the two methods and to evaluate the benefits of the SED system in emergencies
3.3 Additional findings
The first scenario was excluded from the evaluation as it did not include any prominent sound events, such as gunfire or aggressive shouting. It was excluded in accordance with standard data cleaning procedures for security research [28]. Results cleaned of this scenario better reflect the efficiency of the SED system compared to standard notification methods. However, the exclusion of the first scenario highlights one of the limitations of using SED—it is less effective in emergencies without prominent sound events. In practice, such events are less common or less critical, making SED most beneficial in detecting situations with clear sound manifestations.
3.4 Unexpected/negative findings
During the simulation, no false alarms were triggered by non-emergency sounds. However, this result might be influenced by the controlled nature of the experiment, as real-life environments may involve various natural sounds that could mimic gunshots or aggressive shouting. This highlights the importance of proper calibration to minimize false positives and ensure reliability in diverse conditions. The analysis of potential system limitations was conducted in accordance with methodologies for evaluating the reliability of security systems [29].
3.5 Summary of results
Based on the results from the three tables, the following conclusions can be drawn about the effectiveness of the SED system during emergencies:
(1) Significant Reduction in Response Time with SED: The average response time for the SED system was 5 seconds, whereas the standard notification time was 175.67 seconds. This represents a 97.2% reduction in response time compared to traditional methods. A two-sample t-test confirmed that the difference in response times between the SED and traditional notification methods was statistically significant (p < 0.01), indicating that the observed improvements were not due to random variation but reflect a robust effect.
(2) Consistent Performance of SED: The median response time and zero standard deviation indicate that the SED system provides consistent response times, meaning notifications were made without delays in all cases. In contrast, standard notifications exhibited significant variability (standard deviation of 204.93 seconds).
(3) Faster Detection and Notification: In individual scenarios (Table 2), the notification time using SED was always shorter than the standard notification time. The time difference ranged from 25 seconds to over 6 minutes, further confirming the speed advantage of the SED system.
(4) Qualitative Analysis of Scenarios: According to Table 3, SED effectively detected sound events such as gunfire and aggressive shouting, leading to faster responses. Traditional notification methods, such as calling an emergency line or pressing a button, were slower and showed greater variability in response times.
Overall, the results demonstrate that the SED system is an effective tool for rapid and consistent notification of emergencies, particularly those accompanied by sound manifestations. This confirms its benefit in enhancing security measures and reducing the response time of security forces.
This section discusses the key findings of the study in relation to existing research, emphasizing the advancements achieved through the use of intelligent sound event detectors (SED). The discussion addresses the implications of the results, the importance of automation in emergency management, the limitations of SED technology, and recommendations for future research and practical applications. These insights aim to provide a comprehensive understanding of the role and potential of SED in enhancing response times and improving security outcomes in high-risk environments.
4.1 Interpretation of the main results
The interpretation of the key results builds on current knowledge in the field of automated detection of security incidents [30]. The findings of this study, showing a 97% reduction in response time with the use of SED, align with previous research evaluating the effectiveness of automated detection systems.
These findings highlight not only the potential speed advantage of SED, but also the broader benefits of automation in incident response. In modern security infrastructures, automation enables rapid analysis of massive sensory input, reduces human workload, and ensures consistent decision-making during crises. This aligns with emerging practices in security automation where incident detection, classification, and response are increasingly driven by intelligent systems with predefined correlation rules and adaptive response logic [30].
This study demonstrated that the use of intelligent sound event detectors (SED) leads to a significant reduction in response time compared to traditional notification methods. The average response time of SED was 5 seconds, while traditional methods exhibited an average response time exceeding 175 seconds, representing a reduction of more than 97%. This finding clearly illustrates the potential of SED to accelerate the response of security forces, thereby enhancing efficiency in crisis situations.
The median response time for SED (5 seconds) reflects the consistent performance of the system regardless of scenario complexity, which could not be said for traditional methods. Traditional methods in some scenarios failed due to human factors, such as the bystander effect or stress reactions, causing longer response times.
Although no false alarms were triggered during the experiment, this outcome could be attributed to the controlled environment in which the study was conducted. In real-life scenarios, the presence of unpredictable natural sounds, such as noises that resemble gunshots or aggressive shouting, may increase the likelihood of false positives. This emphasizes the importance of proper calibration and the use of well-trained machine learning algorithms to enhance the reliability of SED systems. Ensuring accurate detection in diverse and complex environments is crucial to maintaining user confidence and minimizing unnecessary interventions. These insights support conclusions about the critical role of reliability in automated systems during crisis situations [31].
4.2 The importance of automation in emergencies
The importance of automation in emergency situations and the elimination of human factors in crises corroborate the findings of recent studies on the reliability of automated security systems [32]. Additionally, the identified limitations and practical implications extend the existing knowledge on the implementation of intelligent detection systems in real-world settings. As noted by Maršálek and Ščurek [2], rapid decision-making and accurate information are crucial for managing emergencies at airports, further supporting the implementation of automated systems like SED.
Automated systems, such as SED, provide a significant advantage in crisis situations by eliminating human factors, which are often the primary cause of delays. As demonstrated in Scenarios 3 and 4, SED detectors successfully recognized both gunfire and aggressive shouting, ensuring a faster response by security forces. These findings are consistent with previous studies emphasizing the importance of intelligent sensors in security and their ability to detect and respond to threats in real time [33].
4.3 Limitations and other aspects of SED use
Although the results demonstrate the significant positive impact of SED on response speed, certain limitations must be considered. The first scenario, which did not include any prominent sound events, was excluded from the final analysis because SED cannot detect events without sound cues. This limitation underscores the need for combining various sensor technologies, such as motion detectors or camera systems, to complement the capabilities of SED during silent incidents [34, 35].
Another limitation is the possibility of false alarms. Although there were no false detections during the experiment, the real-world environment with varying background noise could have triggered activation based on irrelevant sounds. This could lead to unnecessary interventions, which would have a negative impact on the effectiveness of security forces and their confidence in technology [36, 37].
However recent studies have demonstrated that applying noise-agnostic multitask learning, such as integrating a noise classification head into the ASR encoder, can significantly reduce false alarm rates in real-world detection scenarios. This approach increases the system’s resilience to unpredictable ambient noise without compromising detection accuracy [38].
In parallel, recent advances show that transformer-based models—originally developed for language processing—are now being adapted for sound detection tasks, showing promising improvements in classification accuracy even in acoustically challenging environments. These architectures also reduce the dependence on calibration for specific environments, opening new possibilities for robust, scalable SED deployment [39, 40].
4.4 Recommendations for further research
Based on the findings of this study, we recommend conducting further experiments involving a broader range of emergencies, including those that are not sound-intensive. Future research should focus on the combination of SED with other technologies, such as camera systems with image analysis, to ensure comprehensive detection of all possible types of events.
It would also be beneficial to test the reliability of SED in environments with varying noise levels, such as airport terminals during peak hours. Understanding how well SED adapts to noisy environments without increasing false alarms could contribute to a better understanding of their suitability for real-world implementation.
4.5 Practical implications
The practical outcome of this study confirms that SED can significantly enhance the efficiency of security force interventions. We recommend implementing SED in high-risk areas, such as airports, shopping centers, and public institutions, where rapid response to emergencies can mean the difference between successful intervention and disaster. At the same time, we suggest integrating SED with other technologies and ensuring regular system calibration to maximize their effectiveness.
The integration of intelligent sound event detectors (SEDs) significantly enhances the efficiency of IRS units by reducing response times. The findings of this study support the implementation of advanced sound detection technologies in high-risk environments, such as airports, to improve security outcomes. Automated detection enables rapid alerts to IRS units within seconds of an incident, which is crucial for minimizing damage and protecting lives. These conclusions align with recent research highlighting that IoT-based systems and advanced algorithms can achieve detection accuracies of up to 85.4% and significantly improve crisis management efficiency.
This study demonstrated that intelligent SED systems can identify emergencies faster than traditional methods reliant on human factors. The difference in response times, ranging from several seconds to several minutes, profoundly impacts the efficiency of IRS interventions. In specific scenarios, such as active shooter incidents, automated detection shortened the time required to identify and apprehend attackers, leading to more effective interventions and reduced casualties. Similar multi-sensor system approaches have been successfully applied in urban areas, underscoring their practical utility in emergency management.
Although no false alarms appeared during the experiment, it is important to bear in mind the controlled nature of the study and the potential for varied background noises in real-world environments. These factors highlight the need for further calibration and refinement of detection algorithms to ensure system accuracy and reliability. Regular calibration and maintenance of SED systems remain crucial for their proper functionality, especially when applied in complex, noisy conditions. Regular testing across diverse environments is recommended to sustain the system’s high reliability.
Recent studies have demonstrated that using noise-agnostic multitask learning, particularly integrating acoustic noise profiling into detection models, can reduce false alarm rates without impairing accuracy. Furthermore, new architectures based on transformers—previously dominant in language processing—are now proving highly effective in audio detection. These models offer improved generalization in acoustically complex environments and reduce dependency on environment-specific calibration, making SED deployment more scalable and adaptable.
Based on the findings, we recommend deploying SED in critical locations such as airports, shopping centers, and public buildings, where rapid response can significantly enhance safety. Combining SED with other technologies, such as camera systems and motion detectors, ensures comprehensive protection even in scenarios lacking sound cues. This combination increases the coverage and efficiency of emergency detection.
Personnel operating these technologies should receive regular training on using SED and interpreting their outputs. This includes security staff and IRS units to ensure seamless responses to notifications from SED.
Future research should explore other types of intelligent sensors and their combined impact on emergency response efficiency. The integration of additional technologies, such as security cameras, thermal sensors, or motion detectors, could further enhance detection capabilities, thereby improving safety and security measures. Additionally, the development of more advanced machine learning algorithms could contribute to greater accuracy and reliability of detection systems, minimizing false alarms and improving overall effectiveness.
Pilot deployment of SED in various environments, such as industrial facilities, schools, and healthcare facilities, could help identify the specific needs of each environment and ensure optimal system configurations. Such deployments could also contribute to a better understanding and effective use of these technologies in practice.
One of the novel contributions of this study is demonstrating how SED systems can fill a critical gap in existing crisis management architectures. While most surveillance relies heavily on visual input, many high-risk incidents begin with audio cues—such as gunshots or aggressive shouting—that may occur out of camera view or without distinct visual indicators. By adding 'ears' to the 'eyes' of traditional surveillance systems, SED enables faster, more context-aware incident detection. Multimodal systems that integrate audio and video data have shown superior accuracy and relevance in identifying threats compared to unimodal approaches [41]. This integration enhances the effectiveness of the entire security chain—from deterrence and detection to information transfer and coordinated response.
The overall findings of this study clearly demonstrate that intelligent sound detection systems have the potential to significantly improve the response time of security forces, such as facility security and IRS units, thereby enhancing public safety in high-risk environments. This technology is particularly valuable for prevention and timely response. The implementation of these technologies could represent a significant step forward in crisis management and population protection.
|
AI |
Artificial Intelligence |
|
APC |
Administrative and Operational Center of Airport Prague |
|
CCTV |
Closed-Circuit Television |
|
CNN |
Convolutional Neural Network |
|
HZS ČR |
Fire Rescue Service of the Czech Republic |
|
IBC |
Integrated Security Center |
|
IoT |
Internet of Things |
|
IRS |
Integrated Rescue System |
|
ML |
Machine Learning |
|
MV-GŘ |
Ministry of the Interior General Directorate |
|
PČR |
Police of the Czech Republic |
|
SED |
Sound Event Detector |
|
STČ-14/IZS |
Set of Model Activities for Integrated Rescue |
|
System Units |
Active Shooter |
|
SVM |
Support Vector Machine |
|
WDR |
Wide Dynamic Range |
[1] Noori, M. (2024). Bystander intervention in emergencies: Understanding hesitation in modern social contexts. Journal of Social Emergency Psychology, 29(1): 15-27.
[2] Ščurek, R., Maršálek, D. (2014). Režimová a administrativní ochrana civilního letiště. Brno: Akademické nakladatelství CERM.
[3] Zhandos, D., Lyazzat, I., Azizah, S. (2024). Audiosignal based event detection using deep learning techniques. International Journal of Information and Communication Technologies, 19(3): 34-45. https://doi.org/10.54309/ijict.2024.19.3.002
[4] Smažinka, D., Hrinko, M. (2022). Safety in cities and transport through sound monitoring. Komunikácie, 24(4): 72-81. https://doi.org/10.26552/com.c.2022.4.f72-f81
[5] Afendi, M.A.S.M., Yusoff, M. (2022). A sound event detection based on hybrid convolution neural network and random forest. IAES International Journal of Artificial Intelligence, 11(1): 121. https://doi.org/10.11591/ijai.v11.i1.pp121-128
[6] Reale, C., Salwei, M.E., Militello, L.G., Weinger, M.B., Burden, A., Sushereba, C., Torsher, L.C., Andreae, M.H., Gaba, D.M., McIvor, W.R., Banerjee, A., Slagle, J., Anders, S. (2023). Decision-making during high-risk events: A systematic literature review. Journal of Cognitive Engineering and Decision Making, 17(2): 188-212. https://doi.org/10.1177/15553434221147415
[7] Nakayenga, H.N., Akashaba, B., Twineamatsiko, E., Zimbe, I., Ssetimba, I.D., Bagonza, J.K., Pinyi, E.O. (2024). Leveraging AI for real time crime prediction, disaster response optimization and threat detection to improve public safety and emergency management in the US. World Journal of Advanced Research and Reviews, 23(3): 2835-2845. https://doi.org/10.30574/wjarr.2024.23.3.2835
[8] Abdoune, L., Fezari, M., Dib, A. (2024). Indoor sound classification with support vector machines: State of the art and experimentation. International Journal of Computational Methods and Experimental Measurements, 12(3): 269-279. https://doi.org/10.18280/ijcmem.120307
[9] Kavitha, P., Teja, K.P., Rahul, S.G., Charan, K., Reddy, K.R.K., Reddy, V.B. (2024). IoT-based traffic control system for emergency vehicles. Proceedings of the IEEE International Conference on Advanced Intelligent Systems, pp. 575-582. https://doi.org/10.1109/iacis61494.2024.10721837
[10] Dumitrescu, C., Radu, V., Gheorghe, R., Tăbîrcă, A.I., Ștefan, M.C., Manea, L.R. (2024). Crowd panic behavior simulation using multi-agent modeling. Electronics, 13(18): 3622. https://doi.org/10.3390/electronics13183622
[11] Wu, W. (2025). Integrating intelligent audio technology and hardware design in modern security systems. Innovation in Science and Technology, 4(1): 114-123.
[12] Heřman, T., Plevová, I., Navrátil, L. (2024). Optimal strategies for prevention and preparation of medical personnel for emergency response to an active shooter attack in a healthcare facility-scoping review. KONTAKT-Journal of Nursing & Social Sciences Related to Health & Illness, 26(2): 144-152. https://doi.org/10.32725/kont.2024.022
[13] Mendes, M.J., Pereira, F.J. (2020). Realistic simulation as a facilitating tool in the teaching-learning process in urgency and emergency: Experience report. International Journal of Advanced Engineering Research and Science, 7(8): 61-68. https://doi.org/10.22161/IJAERS.78.8
[14] Ekpezu, A.O., Katsriku, F., Yaokumah, W., Wiafe, I. (2022). The use of machine learning algorithms in the classification of sound: A systematic review. International Journal of Service Science, Management, Engineering, and Technology (IJSSMET), 13(1): 1-28. https://doi.org/10.4018/IJSSMET.298667
[15] DiPassio, T., Heilemann, M.C., Thompson, B.R., Rutowski, J., Bocko, M. (2024). Toward smart acoustic spaces: Embedded machine learning for sound event detection and classification in the built environment. The Journal of the Acoustical Society of America, 155(3_Supplement): A282-A282. https://doi.org/10.1121/10.0027510
[16] Smith, J.F., Waggoner, S.S., Hall, G. (2007). Building sound emergency management into airports. In Aviation: A World of Growth, pp. 47-60. https://doi.org/10.1061/40938(262)5
[17] Axis Communications AB. (2023). P1387 box camera specifications. Lund, Sweden.
[18] Khan, A., Gupta, S., Gupta, S.K. (2022). Emerging UAV technology for disaster detection, mitigation, response, and preparedness. Journal of Field Robotics, 39(6): 905-955. https://doi.org/10.1002/rob.22075
[19] Prandini, M., Ramilli, M. (2010). Methods for evaluating emergency systems under controlled conditions. Security Systems Quarterly, 12(2): 33-45.
[20] Gerber, M., Ketterer, T., Austin, R. (2014). Evaluating security response times in high-risk scenarios. Journal of Crisis Intervention and Suicide Prevention, 11(3):12-27.
[21] DiMaso, C., Capra, L., Iaccarino, G. (2020). Comparative study on statistical methodologies for evaluating crisis response times. Journal of Applied Statistics Methods, 15(3): 145-154. https://doi.org/10.1016/j.jasm.2020.14554
[22] Fan, J., Yang, X., Lu, R., Xie, X., Li, W. (2021). Design and implementation of intelligent inspection and alarm flight system for epidemic prevention. Drones, 5(3): 68. https://doi.org/10.3390/drones5030068
[23] Barton, N., Constable, K., Nysaeter, K., Syslak, H. (2023). Acoustic sand detector virtual calibration: Methods and validation. In SPE Offshore Europe Conference and Exhibition, p. D021S009R004. https://doi.org/10.2118/215530-ms
[24] Ketterer, T., Austin, R. (2022). Guidelines for safe and ethical security training exercises. International Journal of Emergency Management, 8(2): 79-87. https://doi.org/10.1504/IJEM.2022.087966
[25] Bonny, T., Al Nassan, W. (2024). Optimizing security and cost efficiency in N-level cascaded chaotic-based secure communication system. Applied System Innovation, 7(6): 107-115. https://doi.org/10.3390/asi7060107
[26] Chechulin, A.A. (2024). Evaluation of visual interfaces in information security management systems. Proceedings of Telecommunication Universities, 10(3): 116-126. https://doi.org/10.31854/1813-324x-2024-10-3-116-126
[27] Opdahl, A.L., Sindre, G. (2009). Experimental comparison of attack trees and misuse cases for security threat identification. Information and Software Technology, 51(5): 916-932. https://doi.org/10.1016/j.infsof.2008.12.002
[28] Wu, X., Zheng, W., Xia, X., Lo, D. (2021). Data quality matters: A case study on data label correctness for security bug report prediction. IEEE Transactions on Software Engineering, 48(7): 2541-2556. https://doi.org/10.1109/TSE.2021.3063727
[29] Li, Z., Zhang, Y., Wang, T. (2023). Reliability analysis and optimization of power systems using advanced computational techniques. Proceedings IEEE Tencon Conference, 986-5227. https://doi.org/10.1109/TENCON.2023.9865227
[30] Akhmetova, Z., Popov, P., Asaubaev, A., Baisultan, A. (2023). Organization and automation of incident investigation and response processes using SIEM. Vestnik Almatinskogo Inst Ènergetiki I Svyazi, 63(4): 46. https://doi.org/10.51775/2790-0886_2023_63_4_46
[31] Saidi-Mehrabad, M., Atashfeshan, N., Razavi, H. (2023). Reliability optimization model in man-machine systems considering human factors in uncertain situations. Quality and Reliability Engineering International, 39(7): 3140-3156. https://doi.org/10.1002/qre.3422
[32] Baskaran, S. (2023). A quantitative assessment of the impact of automated incident response on cloud services availability. International Journal of Scientific Research and Management, 11(8): EC03. https://doi.org/10.18535/ijsrm/v11i08.ec03
[33] Nishad, D.K.N., Verma, V.R., Khalid, S., Singh, V.K.S. (2024). Enhanced security in wireless sensor networks using artificial intelligence. Research Square. https://doi.org/10.21203/rs.3.rs-5032504/v1
[34] Mohsin, A.S., Muyeed, M.A. (2024). IoT based smart emergency response system (SERS) for monitoring vehicle, home and health status. Discover Internet of Things, 4(1): 1-21. https://doi.org/10.21203/rs.3.rs-4613881/v1
[35] Uddin, M.N., Nyeem, H. (2024). Engineering a multi-sensor surveillance system with secure alerting for next-generation threat detection and response. Results in Engineering, 22: 101984. https://doi.org/10.1016/j.rineng.2024.101984
[36] Tiwari, S., Subraya, K.S. (2022). Exploring regression-based approach for sound event detection in noisy environments. International Journal of Advanced Computer Science and Applications, 13(7): 518-524. https://doi.org/10.14569/IJACSA.2022.01307102
[37] Levman, J. (2011). A statistical significance simulation study for the general scientist. arXiv preprint, arXiv:1109.6565. https://arxiv.org/abs/1109.6565
[38] Ryu, M., Kim, J.W., Oh, M., Lee, S., Park, H. (2025). Noise-agnostic multitask whisper training for reducing false alarm errors in call-for-help detection. arXiv preprint, arXiv:2501.11631. https://doi.org/10.48550/arxiv.2501.11631
[39] Zaman, K., Li, K., Sah, M., Direkoglu, C., Okada, S., Unoki, M. (2025). Transformers and audio detection tasks: An overview. Digital Signal Processing, 158: 104956. https://doi.org/10.1016/j.dsp.2024.104956
[40] Zouaoui, R., Audigier, R., Ambellouis, S., Capman, F., Benhadda, H., Joudrier, S., Sodoyer, D., Lamarque, T. (2015). Embedded security system for multi-modal surveillance in a railway carriage. In Optics and Photonics for Counterterrorism, Crime Fighting, and Defence XI; and Optical Materials and Biomaterials in Security and Defence Systems Technology XII, pp. 75-90. https://doi.org/10.1117/12.2194262