Publications
dSalmon: High-Speed Anomaly Detection for Evolving Multivariate Data Streams
Alexander Hartl, Félix Iglesias, Tanja Zseby
In: 16th EAI International Conference on Performance Evaluation Methodologies and Tools. Springer, 2023.
We introduce dSalmon, a highly efficient framework for outlier detection on streaming data. dSalmon can be used with both Python and C++, meeting the requirements of modern data science research. It provides an intuitive interface and has almost no package dependencies. dSalmon implements main stream outlier detection approaches from literature. By using pure C++ in its core and making the most of available parallelism, data is analyzed with superior processing speed.
We describe design decisions and outline the software architecture of dSalmon. Additionally, we perform thorough evaluations on benchmarking datasets to measure execution time, memory requirements and energy consumption when performing outlier detection. Experiments show that dSalmon requires substantially fewer resources and in most cases is able to process datasets between one and three orders of magnitude faster than established Python implementations.
Anomaly detection in streaming data: A comparison and evaluation study
Félix Iglesias Vázquez, Alexander Hartl, Tanja Zseby, Arthur Zimek
In: Expert Systems with Applications. Elsevier Ltd., 2023.
The detection of anomalies in streaming data faces complexities that make traditional static methods unsuitable due to computational costs and nonstationarity. We test and evaluate eight state-of-the-art algorithms against prominent challenges related to streaming data. Results show insights regarding accuracy, memory-dependency, parameterization, and pre-knowledge exploitation, thus revealing the high impact of some data characteristics on the choice of the most appropriate algorithm, namely: locality (i.e., whether outlierness is relative to local contexts), relativeness (i.e., whether past data defines outlierness), and concept drift (whether it is expected, and with what intensity and frequency). In most applied cases, such factors can be inferred in advance through the use of historical data and domain knowledge. Assuming the viability of the studied methods in terms of time efficiency, this work discloses key findings to achieve optimal designs of streaming data anomaly detection in real-life applications.
SDOoop: Capturing Periodical Patterns and Out-of-phase Anomalies in Streaming Data Analysis
Alexander Hartl, Félix Iglesias Vázquez, Tanja Zseby
arXiv Preprint arXiv:2409.02973, 2024.
Streaming data analysis is increasingly required in applications, e.g., IoT, cybersecurity, robotics, mechatronics or cyber-physical systems. Despite its relevance, it is still an emerging field with open challenges. SDO is a recent anomaly detection method designed to meet requirements of speed, interpretability and intuitive parameterization. In this work, we present SDOoop, which extends the capabilities of SDO's streaming version to retain temporal information of data structures. SDOoop spots contextual anomalies undetectable by traditional algorithms, while enabling the inspection of data geometries, clusters and temporal patterns. We used SDOoop to model real network communications in critical infrastructures and extract patterns that disclose their dynamics. Moreover, we evaluated SDOoop with data from intrusion detection and natural science domains and obtained performances equivalent or superior to state-of-the-art approaches. Our results show the high potential of new model-based methods to analyze and explain streaming data. Since SDOoop operates with constant per-sample space and time complexity, it is ideal for big data, being able to instantly process large volumes of information. SDOoop conforms to next-generation machine learning, which, in addition to accuracy and speed, is expected to provide highly interpretable and informative models.
Anomaly Detection for Network Security based on Streaming Data
Alexander Hartl
Dissertation, TU Wien, 2023.
Identifying attacks in network traffic constitutes a promising application area of Machine Learning (ML) and data mining techniques. While in related work many traditional ML techniques are presented with impressive detection performance under laboratory conditions, they show severe shortcomings and performance drops when implemented in real life. This can be explained when considering several challenges that data scientists in this area have to face. In particular,
(a) traditional static models cannot cope with dynamics of network data,
(b) model predictions often lack explainability, impeding successful deployability in practice,
(c) systems that aim at detecting network attacks are faced with a highly adversarial environment, and
(d) detectors developed in the past frequently relied on information that is not available for encrypted traffic.
In this thesis, we address these challenges by developing novel methods for network traffic analysis and attack detection.
In particular, we investigate techniques appropriate for dealing with concept drift in the context of network traffic that allow continuous training throughout usage. We analyze algorithms suited for streaming anomaly detection, which are thus able to adjust to evolving characteristics of observed traffic, and present a new algorithm suited specifically for the high-speed requirements in data network environments. We propose and evaluate the use of visualization techniques for explainable ML in the field of network traffic analysis, which are applicable even when deploying opaque recurrent deep learning techniques, and we develop novel techniques for analyzing encrypted traffic.
The methods and approaches we outline in this thesis are highly relevant for network traffic analysis in high-security infrastructures due to the very specific combination of challenges in this field. However, there is a variety of other fields and application areas in data science to which our methods can be applied. With this thesis, we introduce new directions for future research, and we outline methods and algorithms to address the challenges that analysis of network traffic yields in modern times.
Separating Flows in Encrypted Tunnel Traffic
Alexander Hartl, Joachim Fabini, Tanja Zseby
In: 21st IEEE International Conference on Machine Learning and Applications. IEEE, 2022.
In many scenarios like wireless Internet access or encrypted VPN tunnels, encryption is performed on a per-packet basis. While this encryption approach effectively protects the confidentiality of the transmitted payload, it leaves traffic patterns involving inter-arrival times and packet lengths observable, e.g., to eavesdroppers on the air interface. It is a widespread belief that by only observing interleaved packets of different parallel flows, analysis and classification of the corresponding traffic by an eavesdropper is very difficult or close to impossible.
In this paper, we show that it is indeed possible to separate packets belonging to different flows purely from patterns observed in the interleaved packet sequence. We devise a novel deep recurrent neural network architecture that allows us to detect individual anomalous packets in a flow. Based on this anomaly detector, we develop an algorithm to find a separation into flows that minimizes the anomaly score indicated by our model. Our experimental results obtained with synthetically crafted flows and real-world network traces indicate that our approach is indeed able to separate flows successfully with high accuracy.
Being able to recover a flow's packet sequence from multiple interleaved flows, we show with this paper that common packet-level encryption might be insufficient in scenarios where high levels of privacy have to be achieved. On the defender's side, our approach constitutes a valuable tool in encrypted traffic analysis, but also contributes a novel neural network architecture in the field of network intrusion detection in general.
SecTULab: A Moodle-Integrated Secure Remote Access Architecture for Cyber Security Laboratories
Joachim Fabini, Alexander Hartl, Fares Meghdouri, Claudia Breitenfellner, Tanja Zseby
In: The 16th International Conference on Availability, Reliability and Security. ACM, 2021.
The Covid-19 crisis has challenged cyber security teaching by creating the need for secure remote access to existing cyber security laboratory infrastructure. In this paper, we present requirements, architecture and key functionalities of a secure remote laboratory access solution that has been instantiated successfully for two existing laboratories at TU Wien. The proposed design prioritizes security and privacy aspects while integrating with existing Moodle eLearning platforms to leverage available authentication and group collaboration features. Performance evaluations of the prototype implementation for real cyber security classes support a first estimate of dimensioning and resources that must be provisioned when implementing the proposed secure remote laboratory access.
Subverting Counter Mode Encryption for Hidden Communication in High-Security Infrastructures
Alexander Hartl, Joachim Fabini, Christoph Roschger, Peter Eder-Neuhauser, Marco Petrovic, Tanja Zseby
In: The 16th International Conference on Availability, Reliability and Security. ACM, 2021.
In highly security-critical network environments, it is a popular design decision to offload cryptographic tasks like encryption or signature generation to a dedicated trusted module or key server with paramount security features, which we refer to in this paper by the general term Cryptographic Key Management Device (CKMD). While this network design yields several benefits, we demonstrate that the use of popular counter mode encryption modes like CTR or GCM can show substantial shortcomings in terms of security when used in conjunction with this network design. In particular, we show how the use of authenticated encryption using GCM enables the possibility of establishing a subliminal channel by exploiting the authentication information within messages. We show how decoding of hidden information can proceed in addition to decryption of overt information without raising authentication failures.
With an exemplary but typical infrastructure, we show how the subliminal channel might be exploited and discuss approaches to mitigating the threat by preventing the ability to embed hidden information. In contrast to previous work, we conclude that, when an infrastructure involving a CKMD and GCM is deployed, the use of random, CKMD-generated Initialization Vectors (IVs) is beneficial to avoid the subliminal channel described in this paper. However, the most potent remedy is deploying a different operational mode like GCM-SIV.
Explainability and Adversarial Robustness for RNNs
Alexander Hartl, Maximilian Bachl, Joachim Fabini, Tanja Zseby
In: The Sixth IEEE International Conference on Big Data Computing Service and Machine Learning Applications. IEEE, 2020.
Recurrent Neural Networks (RNNs) yield attractive properties for constructing Intrusion Detection Systems (IDSs) for network data. With the rise of ubiquitous Machine Learning (ML) systems, malicious actors have been catching up quickly to find new ways to exploit ML vulnerabilities for profit. Recently developed adversarial ML techniques focus on computer vision and their applicability to network traffic is not straightforward: Network packets expose fewer features than an image, are sequential and impose several constraints on their features.
We show that despite these completely different characteristics, adversarial samples can be generated reliably for RNNs. To understand a classifier's potential for misclassification, we extend existing explainability techniques and propose new ones, suitable particularly for sequential data. Applying them shows that even the first packets of a communication flow are of crucial importance and are likely to be targeted by attackers. Feature importance methods show that even relatively unimportant features can be effectively abused to generate adversarial samples. We thus introduce the concept of feature sensitivity, which quantifies how much potential a feature has to cause misclassification.
Since traditional evaluation metrics such as accuracy are not sufficient for quantifying the adversarial threat, we propose the Adversarial Robustness Score (ARS) for comparing IDSs and show that an adversarial training procedure can significantly and successfully reduce the attack surface.
SDOstream: Low-Density Models for Streaming Outlier Detection
Alexander Hartl, Félix Iglesias, Tanja Zseby
In: 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN). 2020.
Data commonly changes over time. Algorithms for anomaly detection must therefore be adapted to overcome the challenges of evolving data. We present SDOstream, a distance-based outlier detection algorithm for stream data that uses low-density models, therefore operating in linear time and avoiding the limitations of sliding windows and instance-based methods.
SDOstream is designed to ensure a good integration in applications, hence the definition of “outlier” is not predetermined, but can be decided by the application based on distances to representative point locations. We describe the algorithm and evaluate algorithm performance with several datasets.
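The idea of scoring a point by its distances to representative point locations can be illustrated with a minimal sketch. This is not the SDOstream algorithm itself: the fixed `observers` list, the parameter `k`, the use of the median, and the threshold are assumptions made purely for illustration, and the sketch omits how the model is built and updated as the stream evolves.

```python
from math import dist
from statistics import median


def outlier_score(point, observers, k=3):
    """Median Euclidean distance from `point` to its k nearest observers.

    `observers` stands in for the representative point locations of a
    low-density model; here it is a fixed, hypothetical list.
    """
    distances = sorted(dist(point, o) for o in observers)
    return median(distances[:k])


# Hypothetical model: four representative points on the unit square.
observers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

near_score = outlier_score((0.5, 0.5), observers)  # point inside the model
far_score = outlier_score((5.0, 5.0), observers)   # point far from the model

# The application, not the algorithm, decides what counts as an outlier.
# The threshold below is an arbitrary illustrative choice.
THRESHOLD = 2.0
is_far_outlier = far_score > THRESHOLD
```

Because the function returns a score rather than a label, the decision rule stays with the application, which can apply a fixed threshold as above or rank points by score.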
Walling up Backdoors in Intrusion Detection Systems
Maximilian Bachl, Alexander Hartl, Joachim Fabini, Tanja Zseby
In: 3rd Workshop on Big Data, Machine Learning and Artificial Intelligence for Data Communication Networks. ACM, 2019.
Interest in poisoning attacks and backdoors recently resurfaced for Deep Learning (DL) applications. Several successful defense mechanisms have been recently proposed for Convolutional Neural Networks (CNNs), for example in the context of autonomous driving. We show that visualization approaches can aid in identifying a backdoor independent of the used classifier. Surprisingly, we find that common defense mechanisms fail utterly to remove backdoors in DL for Intrusion Detection Systems (IDSs). Finally, we devise pruning-based approaches to remove backdoors for Decision Trees (DTs) and Random Forests (RFs) and demonstrate their effectiveness for two different network security datasets.
BeaconBlocks: Augmenting Proof-of-Stake with On-Chain Time Synchronization
Alexander Hartl, Tanja Zseby, Joachim Fabini
In: 2019 IEEE International Conference on Blockchain. IEEE, 2019.
Blockchain protocols based on Proof-of-Stake (PoS) algorithms aim to provide an alternative to the energy-consuming Proof-of-Work mining procedure. Following a PoS algorithm, nodes have to agree on the miner next eligible to contribute a block and on the point in time at which this miner is allowed to broadcast it. The latter requirement gives rise to the need for synchronous clocks. In this paper we describe BeaconBlocks, a new scheme for constructing PoS protocols. A major difference to former work is incorporating time synchronization as an essential element of the protocol itself, gaining independence of the nodes' clocks and allowing the protocol to resist attacks on clock synchronization infrastructure. To this end, we describe both a mechanism for obtaining the correct time during node startup and for retaining synchronicity of estimated time during a node's lifetime. In contrast to prior work, our approach for miner selection exhibits an interleaved unslotted structure. We show that fairness is achieved when miners follow our scheme and we provide a discussion of attack possibilities, allowing developers to choose secure parameters when adopting the scheme.
Are Network Attacks Outliers? A Study of Space Representations and Unsupervised Algorithms
Félix Iglesias, Alexander Hartl, Tanja Zseby, Arthur Zimek
In: Workshop on Machine Learning For Cybersecurity. 2020.
Among network analysts, "anomaly" and "outlier" are terms commonly associated with network attacks. Attacks are outliers (or anomalies) in the sense that they exploit communication protocols with novel infiltration techniques against which there are no defenses yet. But due to the dynamic and heterogeneous nature of network traffic, attacks may look like normal traffic variations. Also, attackers try to make attacks indistinguishable from normal traffic. Then, are network attacks actual anomalies? This paper tries to answer this important question from analytical perspectives. To that end, we test the outlierness of attacks in a recent, complete dataset for evaluating Intrusion Detection by using five different feature vectors for network traffic representation and five different outlier ranking algorithms. In addition, we craft a new feature vector that maximizes the discrimination power of outlierness. Results show that attacks exhibit significantly higher outlierness than legitimate traffic, especially in representations that profile network endpoints, although attack and non-attack outlierness distributions strongly overlap. Given that network spaces are noisy and show density variations in non-attack spaces, algorithms that measure outlierness locally are less effective than algorithms that measure outlierness with global distance estimations. Our research confirms that unsupervised methods are suitable for attack detection, but also that they must be combined with methods that leverage pre-knowledge to prevent high false positive rates. Our findings expand the basis for using unsupervised methods in attack detection.
Subliminal Channels in High-Speed Signatures
Alexander Hartl, Robert Annessi, Tanja Zseby
In: Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications. 2018.
Subliminal channels in digital signatures can be used to secretly transmit information between two or more communication partners. If subliminal messages are embedded in standard signatures in network protocols, neither network operators nor legitimate receivers notice any suspicious activity. Subliminal channels already exist in older signatures, such as ElGamal and ECDSA. Nevertheless, in classical network protocols such signatures are used only sparsely, e.g., during authentication in the protocol setup. Therefore, the overall potential subliminal bandwidth and their usability as carrier for hidden messages or information leakage is limited. However, with the advent of high-speed signatures such as EdDSA and MQ-based signatures such as PFlash or MQQ-SIG, scenarios such as signed broadcast clock synchronization or signed sensor data export become feasible. In those scenarios large sequences of packets are each individually signed and then transferred over the network. This increases the available bandwidth for transmitting subliminal information significantly and makes subliminal channels usable for large scale data exfiltration or even the operation of command and control structures. In this paper, we show the existence of subliminal channels in recent high-speed signatures and discuss the implications of the ability to hide information in a multitude of packets in different example scenarios: broadcast clock synchronization, signed sensor data export, and classical TLS. In a previous paper we already presented subliminal channels in the EdDSA signature scheme. We here extend this work by investigating subliminal channels in MQ signatures. We present specific results for existing MQ signatures but also show that whole classes of MQ-based methods for constructing signature schemes are prone to the existence of subliminal channels. 
We then discuss the applicability of different countermeasures against subliminal channels but conclude that none of the existing solutions can sufficiently protect against data exfiltration in network protocols secured by EdDSA or MQ signatures.
A Subliminal Channel in EdDSA: Information Leakage with High-Speed Signatures
Alexander Hartl, Robert Annessi, Tanja Zseby
In: ACM International Workshop on Managing Insider Security Threats. ACM, 2017.
Subliminal channels in digital signatures provide a very effective method to clandestinely leak information from inside a system to a third party outside. Information can be hidden in signature parameters in a way that both network operators and legitimate receivers would not notice any suspicious traces. Subliminal channels have previously been discovered in other signatures, such as ElGamal and ECDSA. Those signatures are usually just sparsely exchanged in network protocols, e.g. during authentication, and their usability for leaking information is therefore limited. With the advent of high-speed signatures such as EdDSA, however, scenarios become feasible where numerous packets with individual signatures are transferred between communicating parties. This significantly increases the bandwidth for transmitting subliminal information. Examples are broadcast clock synchronization or signed sensor data export. A subliminal channel in signatures appended to numerous packets allows the transmission of a high amount of hidden information, suitable for large scale data exfiltration or even the operation of command and control structures.
In this paper, we show the existence of a broadband subliminal channel in the EdDSA signature scheme. We then discuss the implications of the subliminal channel in practice using three different scenarios: broadcast clock synchronization, signed sensor data export, and classic TLS. We perform several experiments to show the use of the subliminal channel and measure the actual bandwidth of the subliminal information that can be leaked. We then discuss the applicability of different countermeasures against subliminal channels from other signature schemes to EdDSA but conclude that none of the existing solutions can sufficiently protect against data exfiltration in network protocols secured by EdDSA.
Subliminal channels in high-speed signatures
Alexander Hartl
Master's thesis, TU Wien, 2018.
One of the fundamental building blocks for achieving security in data networks is the use of digital signatures. A digital signature is a bit string which allows the receiver of a message to ensure that the message indeed originated from the apparent sender and has not been altered along the path. In certain cases, however, the functioning of signature schemes allows an adversary to additionally utilize the signature string as a hidden information channel. These channels are termed subliminal channels and have been known and tolerated since the 1980s. Due to the recent progress in the development of high-speed signature algorithms, however, application scenarios for digital signatures become feasible that lead to a large exploitable bit rate for data exfiltration, given that the deployed signature scheme can be utilized as a subliminal channel.
This thesis shows how certain high-speed signature schemes can be exploited to carry hidden information. In particular, we analyse the recent EdDSA signature scheme, which yields substantial future potential, as well as the class of Multivariate Quadratic (MQ) signature schemes. We discuss how an adversary can proceed to embed and recover subliminal information and what bit rate the adversary can achieve for transmitting hidden information. Scenarios like signed NTP broadcasts, signed sensor data transmissions and the TLS key exchange are depicted, where the existence of a subliminal channel gives rise to new attack possibilities threatening network security. To confirm these findings we discuss the results of performed experiments, which attest a considerable subliminal bandwidth to the analysed signature schemes.
Furthermore, we depict several methods for preventing the exploitation of subliminal channels in EdDSA, but we have to conclude that none of them is viable in a practical situation, reinforcing the threats that originate from the described subliminal channels.