Alexander Hartl

ML Researcher and IT Security Enthusiast · me@alexhartl.eu

I am an AI engineer at PhonicScore and a former PhD student at the Institute of Telecommunications at TU Wien. I am excited about various research topics that touch on machine learning, communication networks and, in particular, network security. Besides the field of network traffic analysis, I have done academic work on network steganography and on time synchronization for blockchain consensus algorithms. In my PhD thesis, I evaluated streaming outlier detection methods and explainable AI in the context of network traffic analysis. I am proud to contribute to machine learning research, as the machine learning community has recently delivered outstanding results in various fields of application.


Skills

Python
100%
Linux System Administration
80%
C++ Development
80%
Machine Learning in Python
90%
Web Frontend Development
50%
LaTeX
60%
ML Frameworks
  • TensorFlow
  • scikit-learn
  • PyOD
  • PyTorch
  • pandas
Development Tools
  • Version control with git/svn
  • CI/CD on GitLab/GitHub
  • Docker
  • SQL
  • Elasticsearch, Kibana, Logstash
  • Redis
  • Google Compute Engine
ML Techniques
  • Anomaly Detection (PyOD, dSalmon)
  • Supervised data mining (DTs, SVMs, kNN, ...)
  • Deep Learning
  • GANs and VAEs
  • Explainable AI
  • Reinforcement Learning

Projects and Work

PhonicScore

I am currently employed at PhonicScore, the MusicTech Company from Vienna, where I make use of my machine learning skills.

At PhonicScore, I am involved in all steps of the training pipeline for the development of deep learning models in the field of Music Information Retrieval (MIR). My work targets model deployment on mobile devices and makes heavy use of transfer learning to adapt cutting-edge research in the field to our needs. I also contribute to applications for project funding.

February 2023 - now

MALORI

FFG-funded project with several partners in industry and academia

IT security is a main concern with the increasing usage of communication networks in critical infrastructures. The MALware cOmmunication in cRitical Infrastructures (MALORI) project explored possibilities for hidden communication in communication networks related to e-charging and smart metering. With a consortium consisting of TU Vienna, the Austrian Institute of Technology (AIT), IKARUS Security Software GmbH, illwerke vkw AG, Wiener Netze GmbH, the University of Vienna and the Austrian Federal Ministry of the Interior, we were able to conduct application-oriented research and investigate the infrastructures from several perspectives.

My contributions to MALORI cover all phases of the project. Besides participating in lively discussions and security testing of the investigated infrastructures, I made major contributions to academic work, maintenance of project-related IT infrastructure, and administrative tasks. As first author of one of the project's main publications, Subverting Counter Mode Encryption for Hidden Communication in High-Security Infrastructures, I contributed to MALORI's success from an academic perspective.

For more information, see #SuccessStory: Smarte Energienetze vor IT-Angriffen schützen.

January 2020 - June 2022

secTULab

TU Wien

The Covid-19 pandemic required prompt adaptations of the technology used in teaching to sustain TU Vienna's high standards in education. In this project, funded by the .digital office of TU Vienna, we developed novel tooling for accessing our cyber security laboratories remotely, focusing on strong information security for all participants and ease of use. We thus not only enabled distance learning during the Covid-19 pandemic, but also contributed to TU Vienna's constant striving for digitalization in the teaching process.

In the secTULab project, I was the main contributor to the initial development of the concept and implementation. I took on advisory and leading roles in later phases of the project and contributed to the resulting publication SecTULab: A Moodle-Integrated Secure Remote Access Architecture for Cyber Security Laboratories.

October 2020 - June 2021

Education

ML Research and Network Traffic Analysis

PhD program at TU Wien
September 2018 - September 2023

Telecommunications

Master program at TU Wien
October 2015 - June 2018

Electrical Engineering

Bachelor program at TU Wien
  • Graduation with distinction
  • Thesis: LASSO based Graphical Model Selection for High-Dimensional Time Series
October 2011 - August 2015

Teaching

Communication Networks

Communication networks are becoming increasingly important in today's technological and societal landscape. During my time as a university assistant at TU Vienna, I enjoyed supervising a laboratory that allows students to practice their theoretical skills on packet-switched networks. Besides being a main contact person for theoretical questions, I maintained the lab's technical infrastructure and helped conduct exams, both for the laboratory and for a bachelor course on data communication.

Teaching activities:

Telecommunications

I enjoyed the study of electrical engineering as a combination of strongly mathematical, yet highly application-oriented concepts. As such, I tutored in classes on signals and systems, supervising exercises on the basic mathematical concepts of modern signal processing, and on digital communications, supervising exercises on channel coding techniques.

Teaching activities:


Publications

dSalmon: High-Speed Anomaly Detection for Evolving Multivariate Data Streams

Alexander Hartl, Félix Iglesias, Tanja Zseby
In: 16th EAI International Conference on Performance Evaluation Methodologies and Tools. Springer, 2023.

We introduce dSalmon, a highly efficient framework for outlier detection on streaming data. dSalmon can be used with both Python and C++, meeting the requirements of modern data science research. It provides an intuitive interface and has almost no package dependencies. dSalmon implements main stream outlier detection approaches from literature. By using pure C++ in its core and making the most of available parallelism, data is analyzed with superior processing speed.

We describe design decisions and outline the software architecture of dSalmon. Additionally, we perform thorough evaluations on benchmarking datasets to measure execution time, memory requirements and energy consumption when performing outlier detection. Experiments show that dSalmon requires substantially fewer resources and in most cases is able to process datasets between one and three orders of magnitude faster than established Python implementations.

Anomaly detection in streaming data: A comparison and evaluation study

Félix Iglesias Vázquez, Alexander Hartl, Tanja Zseby, Arthur Zimek
In: Expert Systems with Applications. Elsevier Ltd., 2023.

The detection of anomalies in streaming data faces complexities that make traditional static methods unsuitable due to computational costs and nonstationarity. We test and evaluate eight state of the art algorithms against prominent challenges related to streaming data. Results show insights regarding accuracy, memory-dependency, parameterization, and pre-knowledge exploitation, thus revealing the high impact of some data characteristics to establish a most appropriate algorithm—namely: locality (i.e., whether outlierness is relative to local contexts), relativeness (i.e., if past data defines outlierness), and concept drift (if it is expected, its intensity and frequency). In most applied cases, such factors can be inferred in advance through the use of historical data and domain knowledge. Assuming the viability of the studied methods in terms of time efficiency, this work discloses key findings to achieve optimal designs of streaming data anomaly detection in real-life applications.

Anomaly Detection for Network Security based on Streaming Data

Alexander Hartl
Dissertation, TU Wien, 2023.

Identifying attacks in network traffic constitutes a promising application area of Machine Learning (ML) and data mining techniques. While in related work many traditional ML techniques are presented with impressive detection performance under laboratory conditions, they show severe shortcomings and performance drops when implemented in real life. This can be explained when considering several challenges that data scientists in this area have to face. In particular, (a) traditional static models cannot cope with dynamics of network data, (b) model predictions often lack explainability, impeding successful deployability in practice, (c) systems that aim at detecting network attacks are faced with a highly adversarial environment, and (d) detectors developed in the past frequently relied on information that is not available for encrypted traffic. In this thesis, we address these challenges by developing novel methods for network traffic analysis and attack detection.

In particular, we investigate techniques appropriate for dealing with concept drift in the context of network traffic that allow continuous training throughout usage. We analyze algorithms suited for streaming anomaly detection, which are thus able to adjust to evolving characteristics of observed traffic, and present a new algorithm suited specifically for the high-speed requirements in data network environments. We propose and evaluate the use of visualization techniques for explainable ML in the field of network traffic analysis, which are applicable even when deploying opaque recurrent deep learning techniques, and we develop novel techniques for analyzing encrypted traffic.

The methods and approaches we outline in this thesis are highly relevant for network traffic analysis in high-security infrastructures due to the very specific combination of challenges in this field. However, there is a variety of other fields and application areas in data science to which our methods can be applied. With this thesis, we introduce new directions for future research, and we outline methods and algorithms to address the challenges that analysis of network traffic yields in modern times.

Separating Flows in Encrypted Tunnel Traffic

Alexander Hartl, Joachim Fabini, Tanja Zseby
In: 21st IEEE International Conference on Machine Learning and Applications. IEEE, 2022.

In many scenarios like wireless Internet access or encrypted VPN tunnels, encryption is performed on a per-packet basis. While this encryption approach effectively protects the confidentiality of the transmitted payload, it leaves traffic patterns involving inter-arrival times and packet lengths observable, e.g., to eavesdroppers on the air interface. It is a widespread belief that by only observing interleaved packets of different parallel flows, analysis and classification of the corresponding traffic by an eavesdropper is very difficult or close to impossible.

In this paper, we show that it is indeed possible to separate packets belonging to different flows purely from patterns observed in the interleaved packet sequence. We devise a novel deep recurrent neural network architecture that allows us to detect individual anomalous packets in a flow. Based on this anomaly detector, we develop an algorithm to find a separation into flows that minimizes the anomaly score indicated by our model. Our experimental results obtained with synthetically crafted flows and real-world network traces indicate that our approach is indeed able to separate flows successfully with high accuracy.

Being able to recover a flow's packet sequence from multiple interleaved flows, we show with this paper that the common packet-level encryption might be insufficient in scenarios where high levels of privacy have to be achieved. On the defender's side, our approach constitutes a valuable tool in encrypted traffic analysis, but also contributes a novel neural network architecture in the field of network intrusion detection in general.

SecTULab: A Moodle-Integrated Secure Remote Access Architecture for Cyber Security Laboratories

Joachim Fabini, Alexander Hartl, Fares Meghdouri, Claudia Breitenfellner, Tanja Zseby
In: The 16th International Conference on Availability, Reliability and Security. ACM, 2021.

The Covid-19 crisis has challenged cyber security teaching by creating the need for secure remote access to existing cyber security laboratory infrastructure. In this paper, we present requirements, architecture and key functionalities of a secure remote laboratory access solution that has been instantiated successfully for two existing laboratories at TU Wien. The proposed design prioritizes security and privacy aspects while integrating with existing Moodle eLearning platforms to leverage available authentication and group collaboration features. Performance evaluations of the prototype implementation for real cyber security classes support a first estimate of dimensioning and resources that must be provisioned when implementing the proposed secure remote laboratory access.

Subverting Counter Mode Encryption for Hidden Communication in High-Security Infrastructures

Alexander Hartl, Joachim Fabini, Christoph Roschger, Peter Eder-Neuhauser, Marco Petrovic, Tanja Zseby
In: The 16th International Conference on Availability, Reliability and Security. ACM, 2021.

In highly security-critical network environments, it is a popular design decision to offload cryptographic tasks like encryption or signature generation to a dedicated trusted module or key server with paramount security features, which we refer to in this paper by the general term Cryptographic Key Management Device (CKMD). While this network design yields several benefits, we demonstrate that the use of popular counter mode encryption modes like CTR or GCM can show substantial shortcomings in terms of security when used in conjunction with this network design. In particular, we show how the use of authenticated encryption using GCM enables the possibility of establishing a subliminal channel by exploiting the authentication information within messages. We show how decoding of hidden information can proceed in addition to decryption of overt information without raising authentication failures.

With an exemplary but typical infrastructure, we show how the subliminal channel might be exploited and discuss approaches to mitigating the threat by preventing the ability to embed hidden information. In contrast to previous work, we conclude that, when an infrastructure involving a CKMD and GCM is deployed, the use of random, CKMD-generated Initialization Vectors (IVs) is beneficial to avoid the subliminal channel described in this paper. However, the most potent remedy is deploying a different operational mode like GCM-SIV.

Explainability and Adversarial Robustness for RNNs

Alexander Hartl, Maximilian Bachl, Joachim Fabini, Tanja Zseby
In: The Sixth IEEE International Conference on Big Data Computing Service and Machine Learning Applications. IEEE, 2020.

Recurrent Neural Networks (RNNs) yield attractive properties for constructing Intrusion Detection Systems (IDSs) for network data. With the rise of ubiquitous Machine Learning (ML) systems, malicious actors have been catching up quickly to find new ways to exploit ML vulnerabilities for profit. Recently developed adversarial ML techniques focus on computer vision and their applicability to network traffic is not straightforward: Network packets expose fewer features than an image, are sequential and impose several constraints on their features.

We show that despite these completely different characteristics, adversarial samples can be generated reliably for RNNs. To understand a classifier's potential for misclassification, we extend existing explainability techniques and propose new ones, suitable particularly for sequential data. Applying them shows that already the first packets of a communication flow are of crucial importance and are likely to be targeted by attackers. Feature importance methods show that even relatively unimportant features can be effectively abused to generate adversarial samples. We thus introduce the concept of feature sensitivity which quantifies how much potential a feature has to cause misclassification.

Since traditional evaluation metrics such as accuracy are not sufficient for quantifying the adversarial threat, we propose the Adversarial Robustness Score (ARS) for comparing IDSs and show that an adversarial training procedure can significantly and successfully reduce the attack surface.

SDOstream: Low-Density Models for Streaming Outlier Detection

Alexander Hartl, Félix Iglesias, Tanja Zseby
In: 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN). 2020.

Data commonly changes over time. Algorithms for anomaly detection must therefore be adapted to overcome the challenges of evolving data. We present SDOstream, a distance-based outlier detection algorithm for stream data that uses low-density models, therefore operating in linear time and avoiding the limitations of sliding windows and instance-based methods.

SDOstream is designed to ensure a good integration in applications, hence the definition of “outlier” is not predetermined, but can be decided by the application based on distances to representative point locations. We describe the algorithm and evaluate algorithm performance with several datasets.
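
As a rough illustration of the general idea described above (scoring each arriving point by its distance to a small, fixed-size set of representative points), the following minimal Python sketch may help. It is not the published SDOstream algorithm: the class name, the parameters (`n_observers`, `n_nearest`, `fading`), the 10% replacement probability and the update rule are all assumptions made purely for this example.

```python
import math
import random
from statistics import median


class ObserverOutlierScorer:
    """Toy streaming outlier scorer based on a small set of representative points.

    Hypothetical illustration only; not the SDOstream reference implementation.
    """

    def __init__(self, n_observers=20, n_nearest=3, fading=0.99, seed=0):
        self.n_observers = n_observers  # fixed model size, independent of stream length
        self.n_nearest = n_nearest      # number of closest representatives used for scoring
        self.fading = fading            # exponential forgetting factor for activity counts
        self.rng = random.Random(seed)
        self.observers = []             # entries are [point, activity]

    def process(self, point):
        """Return an outlier score for `point`, then update the model with it."""
        point = tuple(point)
        if not self.observers:
            self.observers.append([point, 1.0])
            return 0.0

        # Score: median distance to the closest representative points.
        ranked = sorted(self.observers, key=lambda obs: math.dist(obs[0], point))
        nearest = ranked[: self.n_nearest]
        score = median(math.dist(obs[0], point) for obs in nearest)

        # Fade all activity counts, then reward the representatives that were used.
        for obs in self.observers:
            obs[1] *= self.fading
        for obs in nearest:
            obs[1] += 1.0

        # Grow the model until it is full; afterwards occasionally replace the
        # least active representative so the model can follow evolving data.
        if len(self.observers) < self.n_observers:
            self.observers.append([point, 1.0])
        elif self.rng.random() < 0.1:
            idle = min(self.observers, key=lambda obs: obs[1])
            idle[0], idle[1] = point, 1.0
        return score


# Usage: feed a stream of 2-D points around the origin, then score two probes.
scorer = ObserverOutlierScorer()
data_rng = random.Random(1)
for _ in range(500):
    scorer.process((data_rng.gauss(0, 1), data_rng.gauss(0, 1)))

inlier_score = scorer.process((0.0, 0.0))
outlier_score = scorer.process((25.0, 25.0))
```

Because the number of representative points is fixed, the work per arriving point does not grow with the length of the stream, and the application can decide itself which score it considers an outlier.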

Walling up Backdoors in Intrusion Detection Systems

Maximilian Bachl, Alexander Hartl, Joachim Fabini, Tanja Zseby
In: 3rd Workshop on Big Data, Machine Learning and Artificial Intelligence for Data Communication Networks. ACM, 2019.

Interest in poisoning attacks and backdoors recently resurfaced for Deep Learning (DL) applications. Several successful defense mechanisms have been recently proposed for Convolutional Neural Networks (CNNs), for example in the context of autonomous driving. We show that visualization approaches can aid in identifying a backdoor independent of the used classifier. Surprisingly, we find that common defense mechanisms fail utterly to remove backdoors in DL for Intrusion Detection Systems (IDSs). Finally, we devise pruning-based approaches to remove backdoors for Decision Trees (DTs) and Random Forests (RFs) and demonstrate their effectiveness for two different network security datasets.

BeaconBlocks: Augmenting Proof-of-Stake with On-Chain Time Synchronization

Alexander Hartl, Tanja Zseby, Joachim Fabini
In: 2019 IEEE International Conference on Blockchain. IEEE, 2019.

Blockchain protocols based on Proof-of-Stake (PoS) algorithms aim to provide an alternative to the energy-consuming Proof-of-Work mining procedure. Following a PoS algorithm, nodes have to agree on the miner next eligible to contribute a block and on the point in time he is allowed to broadcast it. The latter requirement gives rise to the need for synchronous clocks. In this paper we describe BeaconBlocks, a new scheme for constructing PoS protocols. A major difference to former work is incorporating time synchronization as an essential element of the protocol itself, gaining independence of the nodes' clocks and allowing the protocol to resist attacks on clock synchronization infrastructure. To this end, we describe both a mechanism for obtaining the correct time during node startup and for retaining synchronicity of estimated time during a node's lifetime. In contrast to prior work, our approach for miner selection exhibits an interleaved unslotted structure. We show that fairness is achieved when miners follow our scheme and we provide a discussion of attack possibilities, allowing developers to choose secure parameters when adopting the scheme.

Are Network Attacks Outliers? A Study of Space Representations and Unsupervised Algorithms

Félix Iglesias, Alexander Hartl, Tanja Zseby, Arthur Zimek
In: Workshop on Machine Learning For Cybersecurity. 2020.

Among network analysts, "anomaly" and "outlier" are terms commonly associated with network attacks. Attacks are outliers (or anomalies) in the sense that they exploit communication protocols with novel infiltration techniques against which there are no defenses yet. But due to the dynamic and heterogeneous nature of network traffic, attacks may look like normal traffic variations. Also, attackers try to make attacks indistinguishable from normal traffic. Then, are network attacks actual anomalies? This paper tries to answer this important question from analytical perspectives. To that end, we test the outlierness of attacks in a recent, complete dataset for evaluating Intrusion Detection by using five different feature vectors for network traffic representation and five different outlier ranking algorithms. In addition, we craft a new feature vector that maximizes the discrimination power of outlierness. Results show that attacks are significantly more outlier than legitimate traffic, especially in representations that profile network endpoints, although attack and non-attack outlierness distributions strongly overlap. Given that network spaces are noisy and show density variations in non-attack spaces, algorithms that measure outlierness locally are less effective than algorithms that measure outlierness with global distance estimations. Our research confirms that unsupervised methods are suitable for attack detection, but also that they must be combined with methods that leverage pre-knowledge to prevent high false positive rates. Our findings expand the basis for using unsupervised methods in attack detection.

Subliminal Channels in High-Speed Signatures

Alexander Hartl, Robert Annessi, Tanja Zseby
In: Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications. 2018.

Subliminal channels in digital signatures can be used to secretly transmit information between two or more communication partners. If subliminal messages are embedded in standard signatures in network protocols, neither network operators nor legitimate receivers notice any suspicious activity. Subliminal channels already exist in older signatures, such as ElGamal and ECDSA. Nevertheless, in classical network protocols such signatures are used only sparsely, e.g., during authentication in the protocol setup. Therefore, the overall potential subliminal bandwidth and their usability as carrier for hidden messages or information leakage is limited. However, with the advent of high-speed signatures such as EdDSA and MQ-based signatures such as PFlash or MQQ-SIG, scenarios such as signed broadcast clock synchronization or signed sensor data export become feasible. In those scenarios large sequences of packets are each individually signed and then transferred over the network. This increases the available bandwidth for transmitting subliminal information significantly and makes subliminal channels usable for large scale data exfiltration or even the operation of command and control structures. In this paper, we show the existence of subliminal channels in recent high-speed signatures and discuss the implications of the ability to hide information in a multitude of packets in different example scenarios: broadcast clock synchronization, signed sensor data export, and classical TLS. In a previous paper we already presented subliminal channels in the EdDSA signature scheme. We here extend this work by investigating subliminal channels in MQ signatures. We present specific results for existing MQ signatures but also show that whole classes of MQ-based methods for constructing signature schemes are prone to the existence of subliminal channels. 
We then discuss the applicability of different countermeasures against subliminal channels but conclude that none of the existing solutions can sufficiently protect against data exfiltration in network protocols secured by EdDSA or MQ signatures.

A Subliminal Channel in EdDSA: Information Leakage with High-Speed Signatures

Alexander Hartl, Robert Annessi, Tanja Zseby
In: ACM International Workshop on Managing Insider Security Threats. ACM, 2017.

Subliminal channels in digital signatures provide a very effective method to clandestinely leak information from inside a system to a third party outside. Information can be hidden in signature parameters in a way that both network operators and legitimate receivers would not notice any suspicious traces. Subliminal channels have previously been discovered in other signatures, such as ElGamal and ECDSA. Those signatures are usually just sparsely exchanged in network protocols, e.g. during authentication, and their usability for leaking information is therefore limited. With the advent of high-speed signatures such as EdDSA, however, scenarios become feasible where numerous packets with individual signatures are transferred between communicating parties. This significantly increases the bandwidth for transmitting subliminal information. Examples are broadcast clock synchronization or signed sensor data export. A subliminal channel in signatures appended to numerous packets allows the transmission of a high amount of hidden information, suitable for large scale data exfiltration or even the operation of command and control structures.

In this paper, we show the existence of a broadband subliminal channel in the EdDSA signature scheme. We then discuss the implications of the subliminal channel in practice using three different scenarios: broadcast clock synchronization, signed sensor data export, and classic TLS. We perform several experiments to show the use of the subliminal channel and measure the actual bandwidth of the subliminal information that can be leaked. We then discuss the applicability of different countermeasures against subliminal channels from other signature schemes to EdDSA but conclude that none of the existing solutions can sufficiently protect against data exfiltration in network protocols secured by EdDSA.

Subliminal channels in high-speed signatures

Alexander Hartl
Master's thesis, TU Wien, 2018.

One of the fundamental building blocks for achieving security in data networks is the use of digital signatures. A digital signature is a bit string which allows the receiver of a message to ensure that the message indeed originated from the apparent sender and has not been altered along the path. In certain cases, however, the functioning of signature schemes allows an adversary to additionally utilize the signature string as a hidden information channel. These channels are termed subliminal channels and have been known and tolerated since the 80s. Due to the recent progress in the development of high-speed signature algorithms, however, application scenarios for digital signatures become feasible that lead to a large exploitable bit rate for data exfiltration, given that the deployed signature scheme allows the utilization as subliminal channel.

This thesis shows how certain high-speed signature schemes can be exploited to carry hidden information. In particular, we analyse the recent EdDSA signature scheme, which yields substantial future potential, as well as the class of Multivariate Quadratic (MQ) signature schemes. We discuss how an adversary can proceed to embed and recover subliminal information and what bit rate the adversary can achieve for transmitting hidden information. Scenarios like signed NTP broadcasts, signed sensor data transmissions and the TLS key exchange are depicted, where the existence of a subliminal channel gives rise to new attack possibilities threatening network security. To confirm these findings we discuss the results of performed experiments, which attest a considerable subliminal bandwidth to the analysed signature schemes.

Furthermore, we depict several methods for preventing the exploitation of subliminal channels in EdDSA, but we have to conclude that none of them is viable in a practical situation, reinforcing the threats that originate from the described subliminal channels.