\section{Introduction}
The security posture of the Internet has improved significantly over the last decade. For example, the cleaned-up and formally verified TLS 1.3 protocol that underpins HTTPS has been rolled out gradually~\cite{tls-timeline}, the certificates that specify which public keys to use when bootstrapping a secure connection can be obtained for free and automatically~\cite{le}, and web browsers have shifted from positive to negative security indicators in favor of security-by-default~\cite{browser-ui}. The use of end-to-end encryption has further become the norm with services such as DNS-over-HTTPS~\cite{rfc8484}, virtual private networks~\cite{wireguard}, Tor~\cite{tor}, and secure messaging~\cite{mls} gaining traction. In other words, the era of attackers that can passively snoop and actively tamper with unencrypted~network~traffic~is~over.

What will remain the same is the incentive for attackers to snoop on and tamper with network traffic. Therefore, the focus is (and will likely continue to be) on circumventing protocols that add security and privacy as they are deployed in the real world. For example, there is a long history of certificate mis-issuance that allows attackers to impersonate websites and thus insert themselves as machines-in-the-middle (``MitM'') without actually breaking TLS~\cite{sok-https,sslmate-history}. Or, when an encrypted channel is hard to intercept, attackers instead analyze traffic patterns to infer user activity, like which website is being visited~\cite{cheng98,herrmann09,hintz02,liberatore06,panchenko11,sun02}. The bad news is that attackers only need to find one vulnerability in a deployed protocol or its software. Sometimes, such vulnerabilities can be purchased from zero-day brokers like Zerodium~\cite{zerodium}.

To address an attack vector, it is common to add countermeasures that frustrate attackers and/or increase the risk involved in exploiting a system. A good example is how the certificate authority ecosystem evolved. For background, certificate authorities are trusted parties that validate domain names before issuing certificates that list their public keys. Web browsers are shipped with hundreds of trusted certificate authorities, which means that the resulting TLS connections can be no more secure than the weakest-link certificate authority~\cite{sok-https}. A proposal eventually deployed to mitigate this issue is Certificate Transparency: an ecosystem of public append-only logs that publish all certificates so that any mis-issuance can be \emph{detected} by monitors~\cite{ct,rfc6962}. These logs have a cryptographic foundation that holds them and the issuing certificate authorities \emph{accountable}, at least in theory. In practice, the logs are essentially trusted parties that must act honestly due to how web browsers shape their policies to respect user privacy~\cite{apple-log-policy,google-log-policy,sok-sct-auditing,ct-history}.

The first objective of this thesis is to better understand the current limits of Certificate Transparency by proposing and evaluating improvements which \emph{reduce} the amount of trust that needs to be placed in third-party monitors and logs. We make a dent in the problem of Certificate Transparency verification both generally and concretely in the context of Tor Browser, which, unlike Google Chrome and Apple's Safari, does not yet support Certificate Transparency.
For context, Tor Browser is a fork of Mozilla's Firefox that (among other things) routes user traffic through the low-latency anonymity network Tor~\cite{tor,tb}. As part of our pursuit to improve the status quo for Certificate Transparency verification in Tor Browser, the second objective of this thesis is to evaluate how the protocols used during website visits affect unlinkability between senders (web browsers) and receivers (websites). Our evaluation covers our proposed addition of Certificate Transparency as well as protocols already in use, e.g., DNS, real-time bidding~\cite{rtb}, and certificate revocation checking~\cite{ocsp}.

The remainder of the introductory summary is structured as follows. Section~\ref{sec:background} introduces background that will help the reader understand the context and preliminaries of the appended papers. Section~\ref{sec:rqs} defines our research questions and overall objective. Section~\ref{sec:methods} provides an overview of our research methods. Section~\ref{sec:contribs} describes our contributions succinctly. Section~\ref{sec:appended} summarizes the appended papers, which are published in NordSec (Paper~\ref{paper:lwm}), SECURWARE (Paper~\ref{paper:ctga}), PETS (Papers~\ref{paper:ctor} and~\ref{paper:cat}), WPES (Paper~\ref{paper:sauteed}), and USENIX Security (Paper~\ref{paper:tlwo}). Section~\ref{sec:related} positions our contributions with regard to related work. Section~\ref{sec:concl} concludes and briefly discusses future work.

\section{Background}
\label{sec:background}
This section introduces background on Certificate Transparency and Tor.

\subsection{Certificate Transparency}
The web's public-key infrastructure depends on certificate authorities to issue certificates that map domain names to public keys. For example, the certificate of \texttt{www.example.com} is issued by DigiCert and lists a 2048-bit RSA key~\cite{crt:www.example.com}. The fact that DigiCert signed this certificate means that they claim to have verified that the requesting party is really \texttt{www.example.com}, typically by first ensuring that a specified DNS record can be uploaded on request~\cite{ca/b}. If all certificate authorities performed these checks correctly and the checks themselves were foolproof, a user's browser could be sure that any certificate signed by a certificate authority would list a verified public key that can be used for authentication when connecting to a website via TLS. Unfortunately, there are hundreds of trusted certificate authorities and a long history of issues surrounding their operations in practice~\cite{bambo-cas,sok-https,sslmate-history}. One of the most famous incidents took place in 2011: an attacker managed to mis-issue certificates from DigiNotar to intercept traffic towards Google and others in Iran~\cite{black-tulip}. The astonishing part is that this incident was not detected until \emph{seven weeks later}.

Certificate Transparency aims to facilitate \emph{detection} of issued certificates, thus holding certificate authorities \emph{accountable} for any certificates that they mis-issue~\cite{ct,rfc6962}. The basic idea is shown in Figure~\ref{fig:ct-idea}. In addition to applying the regular validation rules, browsers ensure that certificates are included in a public append-only Certificate Transparency \emph{log}. This allows anyone to get a concise view of all certificates that users may encounter, including domain owners like Google, who can then see for themselves whether any of the published certificates are mis-issued.
The parties inspecting the logs are called \emph{monitors}. Some monitors mirror all log entries~\cite{crt.sh}, while others discard most of them in pursuit of finding matches for pre-defined criteria like \texttt{*.example.com}~\cite{certspotter}. Another option is subscribing to certificate notifications from a trusted third-party~\cite{ct-monitors}.

\begin{figure}[!t]
\centering\includegraphics[width=0.8\textwidth]{src/introduction/img/ct}
\caption{%
The idea of Certificate Transparency. Certificates encountered by users must be included in a public log so that monitors can detect mis-issuance.
}
\label{fig:ct-idea}
\end{figure}

What makes Certificate Transparency a significant improvement over the certificate authority ecosystem is that the logs stand on a cryptographic foundation that can be verified. A log can be viewed as an append-only tamper-evident list of certificates. It is efficient\footnote{%
Efficient refers to space-time complexity $\mathcal{O}(\log n)$, where $n$ is the number of log entries.
} to prove cryptographically that a certificate is in the list, and that a current version of the list is append-only with regard to a previous version (i.e., no tampering or reordering).\footnote{%
Interested readers can refer to our Merkle tree and proof technique introduction online~\cite{merkle-intro}.
} These properties follow from using a Merkle tree structure that supports \emph{inclusion} and \emph{consistency} proofs~\cite{history-trees,ct-formal,rfc6962,merkle}. The reader only needs to know that these proofs are used to reconstruct a log's Merkle tree head, often referred to as a \emph{root hash}: a cryptographic hash that uniquely identifies a list of certificates in a tree data structure. The logs sign root hashes together with the number of entries and a timestamp to form \emph{signed tree heads}; if an inconsistency is discovered, it cannot be denied. Log operators are therefore held accountable for maintaining the append-only property. A party that verifies the efficient transparency log proofs without downloading all the logs is called an \emph{auditor}.

A log that signs two inconsistent tree heads is said to perform a \emph{split-view}. To ensure that everyone observes the same append-only logs, all participants of the Certificate Transparency ecosystem must engage in a \emph{gossip protocol}~\cite{chuat,nordberg}. In other words, just because Alice observes an append-only log, it is not necessarily the \emph{same append-only log} that Bob observes. Therefore, Alice and Bob must exchange signed tree heads and verify consistency to assert that the log operators play by the rules and only append certificates. Without a secure gossip protocol, log operators would have to be trusted blindly (much like certificate authorities before Certificate Transparency). RFC~6962 defers the specification of gossip~\cite{rfc6962}, and little or no meaningful gossip has been deployed yet.

Rolling out Certificate Transparency without breakage on the web is a challenge~\cite{does-ct-break-the-web}. Certificates must be logged, associated proofs delivered to end-user software, and more. One solution RFC~6962 ultimately put forth was the introduction of \emph{signed certificate timestamps}. A signed certificate timestamp is a log's \emph{promise} that a certificate will be appended to the log within a \emph{maximum merge delay} (typically 24 hours). Verifying whether a log holds its promise is usually called \emph{auditing}. The sketch below illustrates the Merkle tree mechanics that inclusion proofs and such audits build on.
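To make this concrete, the following minimal Python sketch (ours, not part of the RFCs) reconstructs a root hash from an entry and its inclusion proof, following the hashing and path-traversal rules of RFC~9162: leaves and interior nodes are hashed with distinct prefixes, and the audit path supplies the sibling hashes needed to recompute the root.

\begin{verbatim}
import hashlib

def leaf_hash(entry):
    return hashlib.sha256(b"\x00" + entry).digest()   # 0x00 prefixes leaves

def node_hash(left, right):
    return hashlib.sha256(b"\x01" + left + right).digest()  # 0x01: interior

def root_from_inclusion_proof(index, size, entry, path):
    # Recompute the root hash from a leaf and its audit path, following
    # the verification algorithm in RFC 9162, Section 2.1.3.2.
    fn, sn = index, size - 1
    r = leaf_hash(entry)
    for p in path:
        if sn == 0:
            raise ValueError("proof longer than expected")
        if fn % 2 == 1 or fn == sn:
            r = node_hash(p, r)
            while fn % 2 == 0 and fn != 0:
                fn, sn = fn >> 1, sn >> 1
        else:
            r = node_hash(r, p)
        fn, sn = fn >> 1, sn >> 1
    if sn != 0:
        raise ValueError("proof shorter than expected")
    return r

# Example: a three-entry tree and an inclusion proof for the first entry.
a, b, c = b"cert-a", b"cert-b", b"cert-c"
root = node_hash(node_hash(leaf_hash(a), leaf_hash(b)), leaf_hash(c))
assert root_from_inclusion_proof(0, 3, a, [leaf_hash(b), leaf_hash(c)]) == root
\end{verbatim}

Comparing the reconstructed root against a signed tree head verifies inclusion; consistency proofs similarly reconstruct two roots from shared sibling hashes. Gossip then amounts to checking that everyone sees the same roots, since two validly signed tree heads for the same tree size but with different root hashes are undeniable evidence of a split-view.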
Certificate authorities can obtain signed certificate timestamps and embed them in their final certificates by logging a \emph{pre-certificate}. As such, there is no added latency from building the underlying Merkle tree and no need for server software to be updated (as the final certificate contains the information needed). The current policy for Google Chrome and Apple's Safari is to reject certificates with fewer than two signed certificate timestamps~\cite{apple-log-policy,google-log-policy}. How to request an inclusion proof for a promise without leaking the user's browsing history to the log is an open problem~\cite{sok-sct-auditing}: asking for an inclusion proof trivially reveals the certificate of interest to the log. Besides being embedded in certificates, signed certificate timestamps can be delivered dynamically to end-users in TLS extensions and stapled certificate status responses. For example, Cloudflare uses the TLS extension delivery method to recover from log incidents without their customers needing to acquire new certificates~\cite{cloudflare-scts}. Several log incidents have already happened in the past, ranging from split-views~\cite{trustasia-err,izenpe-err,venafi-err} to broken promises of timely logging~\cite{wosign-err,google-err,digicert-err,starcom-err} and potential key compromise~\cite{digicert-kc}. These are all good \emph{scares} motivating continued completion of Certificate Transparency in practice.

In summary, the status quo is for web browsers to require at least two signed certificate timestamps before accepting a certificate as valid. Merkle tree proofs are not verified. Gossip is not deployed. The lack of a reliable \emph{gossip-audit model} means that the logs are largely trusted parties.\footnote{%
Historical remark: the lack of verification led Google to require that all certificates be disclosed in at least one of their logs to be considered valid~\cite{ct-history}. The so-called \emph{one-Google log requirement} was recently replaced. Google Chrome instead interacts with Google's trusted auditor. See Section~\ref{sec:related}.
} We defer discussion of related work in the area of gossip-audit models until Section~\ref{sec:related}.

\subsection{Tor}
The Tor Project is a 501(c)(3) US nonprofit that advances human rights and defends privacy online through free software and open networks~\cite{tpo}. Some of the maintained and developed components include Tor Browser and Tor's relay software. Thousands of volunteers operate relays as part of the Tor network, which routes the traffic of millions of daily users with low latency~\cite{mani}. This frustrates attackers like Internet service providers that may try linking \emph{who is communicating with whom} from their local (non-global) vantage points~\cite{tor}.

Usage of Tor involves tunneling the TCP traffic of different destinations (such as all flows associated with a website visit to \texttt{example.com}) in fixed-size \emph{cells} on independent \emph{circuits}. A circuit is built through a guard, a middle, and an exit relay. At each hop of the circuit, one layer of symmetric encryption is peeled off. The keys used are ephemeral and discarded, together with all other circuit state, after at most 10 minutes (the maximum circuit lifetime). This setup allows guard relays to observe users' IP addresses but none of the destination traffic. In contrast, exit relays can observe destination traffic but no user IP addresses. The relays used in a circuit are determined by Tor's end-user software. Such path selection is randomized and bandwidth-weighted, but starts with a largely static guard set to protect users from \emph{eventually} entering the network through a relay an attacker volunteered to run. The sketch below illustrates both ideas: bandwidth-weighted relay selection and layered encryption.
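For intuition, here is a conceptual Python sketch; all relay names, weights, and keys are made up, and real Tor uses a different cell format, key exchange, and selection algorithm. It shows weighted relay selection and how one encryption layer is added or peeled per hop with AES-CTR, where applying the same keystream twice cancels out.

\begin{verbatim}
import os
import random
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

# Hypothetical consensus excerpt: (nickname, bandwidth weight, flags).
RELAYS = [
    ("alpha",   9000, {"Guard", "Fast", "Stable"}),
    ("bravo",   4000, {"Fast"}),
    ("charlie", 7000, {"Exit", "Fast", "Stable"}),
    ("delta",   2000, {"Fast", "Stable"}),
]

def pick(flag, exclude):
    # Bandwidth-weighted random choice among relays with the required flag.
    pool = [r for r in RELAYS if flag in r[2] and r[0] not in exclude]
    return random.choices(pool, weights=[r[1] for r in pool], k=1)[0]

guard = pick("Guard", set())
exit_relay = pick("Exit", {guard[0]})
middle = pick("Fast", {guard[0], exit_relay[0]})
circuit = (guard[0], middle[0], exit_relay[0])

# One ephemeral key and nonce per hop, discarded with the circuit.
keys = {name: (os.urandom(16), os.urandom(16)) for name in circuit}

def layer(name, data):
    key, nonce = keys[name]
    return Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor().update(data)

cell = b"CONNECT example.com:443"
for name in reversed(circuit):  # the client wraps layers inside-out
    cell = layer(name, cell)
for name in circuit:            # each relay peels exactly one layer
    cell = layer(name, cell)
assert cell == b"CONNECT example.com:443"
\end{verbatim}

Note how only the guard learns the client's address and only the exit sees the innermost plaintext, matching the observation capabilities described above.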
Tor's \emph{consensus} lists the relays that make up the network. As the name suggests, it is a document agreed upon by a majority of trusted \emph{directory authorities}; five votes are currently needed to reach a consensus. Examples of information added to the Tor consensus include tunable network parameters and uploaded relay descriptors with relevant metadata, e.g., public key, available bandwidth, and exit policy. Each relay in the consensus is also assigned different flags based on its configuration and observed performance, e.g., \texttt{Guard}, \texttt{MiddleOnly}, \texttt{Fast}, \texttt{Stable}, and \texttt{HSDir}. The latter means that the relay is a \emph{hidden service directory}: part of a distributed hash table that helps users look up \emph{onion service introduction points}.

An onion service is a self-authenticated server identified by its public key. Onion services are only reachable through the Tor network. Users that are aware of a server's \emph{onion address} can consult the distributed hash table to find its introduction points. To establish a connection, a user builds a circuit to a \emph{rendezvous point}. A request is then sent to one of the current introduction points, which informs the onion service that it may build its own circuit to meet the user at their rendezvous point. In total, six relays are traversed while interacting with an onion service. This setup allows not only the sender but also the receiver to be anonymous. The receiver also benefits from a large degree of censorship resistance as the server location may be hidden. The main drawback of onion services is that their non-mnemonic names are hard to discover and remember. Some sites try to overcome this by setting their onion addresses in \emph{onion location} HTTP headers or HTML attributes~\cite{onion-location}.

Many users connect to the Tor network with Tor Browser. In addition to routing traffic as described above, Tor Browser ships with privacy-preserving features like first-party isolation to not share any state across different origins, settings that frustrate browser fingerprinting, and \emph{disk-avoidance} to not store browsing-related history or other identifying information on disk~\cite{tb}. Tor Browser is a fork of Mozilla's Firefox. Unfortunately, neither Firefox nor Tor Browser supports any form of Certificate Transparency. Conducting undetected machine-in-the-middle attacks against Tor users is thus relatively straightforward: compromise or coerce the weakest-link certificate authority, then volunteer to operate an exit relay and intercept network traffic. Such interception has previously been found with self-signed certificates~\cite{spoiled-onions}.

While global attackers are not within Tor's threat model, it is in scope to guard against various local attacks~\cite{tor}. For example, the intended attacker may passively observe a small fraction of the network and actively inject their own packets. Figure~\ref{fig:wf} shows the typical attacker setting of \emph{website fingerprinting}, where the attacker observes a user's entry traffic with the goal of inferring which website was visited solely based on analyzing encrypted traffic~\cite{cheng98,herrmann09,hintz02,liberatore06,panchenko11,sun02}.
Website fingerprinting attacks are evaluated in the \emph{open-world} or \emph{closed-world} settings. In the closed-world setting, the attacker monitors (not to be confused with Certificate Transparency monitoring) a fixed list of websites. A user visits one of the monitored sites, and the attacker needs to determine which one. The open-world setting is the same, except that the user may also visit unmonitored sites. The practicality of website fingerprinting attacks is up for debate, with challenges ranging from handling false positives to machine-learning dataset drift~\cite{onlinewf,juarez14,perryCrit,realistic}.

\begin{figure}[!t]
\centering\includegraphics[width=0.85\textwidth]{src/cat/img/setting}
\caption{
The setting of a website fingerprinting attack. A local passive attacker analyzes a user's encrypted network traffic as it enters the network. The goal is to infer which website is visited. (Figure reprinted from Paper~\ref{paper:cat}.)
}
\label{fig:wf}
\end{figure}

In summary, Tor is a low-latency anonymity network often accessed with Tor Browser. Among the threats that Tor aims to protect against are local attackers that see traffic as it enters or leaves the network (but not both sides at all times). A website fingerprinting attack is an example of a passive attack that operates on entry traffic. A machine-in-the-middle attack is an example of an active attack that typically operates on exit traffic. Discussion of related work in the area of website fingerprinting is deferred until Section~\ref{sec:related}.

\section{Research Questions}
\label{sec:rqs}
The overall research objective spans two different areas: transparency logs and low-latency anonymity networks. We aim to reduce trust assumptions in transparency log solutions and to apply such solutions in anonymous settings for improved security and privacy. We defined the following research questions to make this concrete in Certificate Transparency and Tor, the two ecosystems with the most history and dominant positions in their respective areas.

\begin{researchquestions}
\item[Can trust requirements in Certificate Transparency be reduced in practice?]
Transparency logs have a cryptographic foundation that supports efficient verification of inclusion and consistency proofs. Such proofs are useful to reduce the amount of trust that is placed in the logs. The roll-out of Certificate Transparency has yet to make use of these proofs, or to employ a gossip protocol that ensures the same append-only logs are observed. Part of the challenge relates to privacy concerns as parties interact with each other, as well as deploying gradually without breakage. We seek practical solutions that reduce the trust requirements currently placed in the logs and third-party monitors while preserving user privacy.

\item[How can authentication of websites be improved in the context of Tor?]
Tor Browser has yet to support Certificate Transparency to facilitate detection of hijacked websites. This includes HTTPS sites but also onion services that may be easier to discover reliably with more transparency. We seek incremental uses of Certificate Transparency in Tor that preserve user privacy while engaging in new verification protocols to reduce trust.

\item[How do the protocols used during website visits affect unlinkability between Tor users and their destination websites?]
Several third-parties become aware of a user's browsing activities while a website is visited.
For example, DNS resolvers and certificate status responders may be consulted for domain name resolution and for verifying whether a certificate has been revoked. Fetching an inclusion proof from a Certificate Transparency log would reveal the same type of information. We seek to explore how unlinkability between Tor users and their exit destinations is affected by the multitude of protocols used during website visits. The considered setting is the same as in website fingerprinting, except that the attacker may take additional passive and active measures. For example, the attacker may volunteer to run a Certificate Transparency log (passive) or inject carefully crafted packets into Tor~(active).
\end{researchquestions}

\section{Research Methods}
\label{sec:methods}
We tackle privacy and security problems in the field of computer science~\cite{icss,smics}. Our work is applied, following the scientific method for security and experimental networking research. \emph{Exactly} what it means to use the scientific method in these areas is up for debate~\cite{rfenr,sse}. However, at a glance, it is about forming precise and consistent theories with falsifiable predictions \emph{as in other sciences}, except that the objects of study are \emph{information systems in the real world}.

A prerequisite to formulating precise, consistent, and falsifiable theories is that there are few implicit assumptions. Therefore, scientific security research should be accompanied by definitions of security goals and attacker capabilities: what does it mean that the system is secure, and what is the attacker (not) allowed to do while attacking it~\cite{secdefs}? Being explicit about the overall \emph{setting} and \emph{threat model} is prevalent in formal security work like cryptography, where an abstract (mathematical) model is used to show that security can be \emph{proved} by reducing to a computationally hard problem (like integer factorization) or a property of some primitive (like the collision resistance of a hash function)~\cite{provsec}. It is nevertheless just as crucial in less formal work that deals with the security of systems in the real (natural) world---the exclusive setting of the scientific method---which usually lends itself towards break-and-fix cycles in light of new observations. Where to draw the line between \emph{security work} and \emph{security research} is not trivial. However, a few common \emph{failures} of past ``security research'' include not bringing observations in contact with theory, not making claims and assumptions explicit, or simply relying on unfalsifiable claims~\cite{sse}. While deductive approaches (like formal reduction proofs) are instrumental in managing complexity and gaining confidence in different models, more is required, as a model's \emph{instantiation} must also be secure~\cite{secdefs}.

It is common to complement abstract modeling with \emph{real-world measurements} as well as \emph{systems prototyping and evaluations}~\cite{rfenr}. Real-world measurements capture properties of deployed systems like the Internet, the web, and the Tor network. For example, a hypothesis in a real-world measurement could be that (non-)Tor users browse according to the same website popularity distribution.
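As a toy illustration of how such a hypothesis can be confronted with data (all counts below are made up), a goodness-of-fit test compares observed Tor visit counts against the distribution implied by a non-Tor dataset:

\begin{verbatim}
from scipy.stats import chisquare

# Hypothetical visit counts to four sites: a large non-Tor dataset and a
# small sample of (synthetic) Tor measurements.
non_tor = [5000, 3000, 1500, 500]
tor = [52, 31, 12, 5]

# Expected Tor counts under the hypothesis of identical popularity.
expected = [sum(tor) * count / sum(non_tor) for count in non_tor]
stat, p = chisquare(tor, f_exp=expected)
print(f"p = {p:.2f}")  # a small p-value would falsify the hypothesis
\end{verbatim}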
Sometimes these measurements involve the use of research prototypes, or the research prototypes themselves become the objects of study to investigate properties of selected system parts (say, whether a packet processor with new features is indistinguishable from some baseline as active network attackers adaptively inject packets of their choosing). If it is infeasible, expensive, or unsafe (see below) to study a real-world system, a simulation may be studied instead. The downside of simulation is that the model used may not be a good approximation of the natural world, similar to formal cryptographic~modeling.

The appended papers use all of the above approaches to make claims about security, privacy, and performance in different systems, sometimes with regard to an abstract model that can be used as a foundation in the natural world to manage complexity. Paper~\ref{paper:lwm} contains a reduction proof sketch to show reliance on standard cryptographic assumptions. Paper~\ref{paper:cat} extends past simulation setups to show the impact of an added attacker capability. Meanwhile, Paper~\ref{paper:ctor} models part of the Tor network with mathematical formulas to estimate performance overhead. All but Paper~\ref{paper:sauteed} contain real-world measurements relating to Internet infrastructure, websites, certificates, Tor, or the practical deployability of our proposals. All but Paper~\ref{paper:ctor} contain research prototypes with associated evaluations, e.g., performance profiling, as well as corroborating or refuting our security definitions in experimental settings. All papers include discussions of security and privacy properties as well as their limitations and strengths in the chosen settings (where assumptions are explicit and threat models motivated).

Throughout our experiments, we strove to follow best practices like documenting the used setups, making datasets and associated tooling available, reducing potential biases by performing repeated measurements from multiple different vantage points, and discussing remaining biases (or the lack thereof)~\cite{rfenr}. We also interacted with Tor's research safety board~\cite{trsb} to discuss the ethics and safety of our measurements in Paper~\ref{paper:tlwo}, and refrained from measuring real (i.e., non-synthetic) usage of Tor whenever possible (Papers~\ref{paper:ctor} and~\ref{paper:cat}). Finally, the bugs and vulnerabilities uncovered in Papers~\ref{paper:cat}--\ref{paper:tlwo} were responsibly disclosed to the Tor Project, including suggestions on how to move forward.

\section{Contributions}
\label{sec:contribs}
The main contributions of this thesis are listed below. An overview of how they relate to our research questions and appended papers is shown in Figure~\ref{fig:contrib}.

\begin{figure}[!t]
\centering
\includegraphics[width=0.83\textwidth]{src/introduction/img/contribs}
\caption{%
Overview of appended papers, contributions, and research questions.
}
\label{fig:contrib}
\end{figure}

\vspace{1cm}

\begin{contributions}
\item[Reduced trust in third-party monitoring with a signed tree head extension that shifts trust from non-cryptographic certificate notifications to a log's gossip-audit model (or if such a model does not exist yet, the logs themselves).]
Paper~\ref{paper:lwm} applies existing cryptographic techniques for constructing static and lexicographically ordered Merkle trees so that certificates can be wild-card filtered on subject alternative names with (non-)membership proofs.
This building block is evaluated in the context of Certificate Transparency, including a security sketch and performance benchmarks.

\item[Increased probability of split-view detection by proposing gossip protocols that disseminate signed tree heads without bidirectional communication.]
Paper~\ref{paper:ctga} explores aggregation of signed tree heads at line speed in programmable packet processors, facilitating consistency proof verification at the level of an entire autonomous system. Such verification can be indistinguishable from an autonomous system that performs no split-view detection at all, thereby achieving herd immunity, i.e., protection also for users whose autonomous systems do not aggregate. Aggregation at 32 autonomous systems can protect 30--50\% of the IPv4 space. Paper~\ref{paper:ctor} explores signed tree heads in Tor's consensus. To reliably perform an undetected split-view against log clients that have Tor in their trust root, a log must collude with a majority of directory authorities.

\item[Improved detectability of website hijacks targeting Tor Browser by proposing privacy-preserving and gradual roll-outs of Certificate Transparency in Tor.]
Paper~\ref{paper:ctor} explores adoption of Certificate Transparency in Tor Browser with signed certificate timestamps as a starting point, then leveraging the decentralized network of relays to cross-log certificates before ultimately verifying inclusion proofs against a single view in Tor's consensus. The design is probabilistically secure with tunable parameters that result in modest overheads. Paper~\ref{paper:sauteed} shows that Certificate Transparency logging of domain names with associated onion addresses helps provide forward censorship-resistance and detection of unwanted onion associations.

\item[An extension of the attacker model for website fingerprinting that provides attackers with the capability of querying a website oracle.]
A website oracle reveals whether a monitored website was (not) visited by any network user during a specific time frame. Paper~\ref{paper:cat} defines and simulates website fingerprinting attacks with website oracles, showing that most false positives can be eliminated for all but the most frequently visited websites. A dozen sources of real-world website oracles follow from the protocols used during website visits. We enumerate and classify those sources based on ease of accessibility, reliability, and coverage. The overall analysis includes several Internet measurements.

\item[Remotely exploitable probing attacks on Tor's DNS cache that instantiate a real-world website oracle without any special attacker capabilities or reach.]
Paper~\ref{paper:cat} shows that timing differences in end-to-end response times can be measured to determine whether a domain name is (not) cached by a Tor relay. An estimated true positive rate of 17.3\% can be achieved while trying to minimize false positives. Paper~\ref{paper:tlwo} improves the attack by exploiting timeless timing differences that depend on concurrent processing. The improved attack has no false positives or false negatives. Our proposed bug fixes and mitigations have been merged in Tor.

\item[A complete redesign of Tor's DNS cache that defends against all (timeless) timing attacks while retaining or improving performance compared~to~today.]
Paper~\ref{paper:tlwo} suggests that Tor's DNS cache should only share the same preloaded domain names across different circuits, removing the remotely probeable state that reveals information about past exit traffic.
A network measurement with real-world Tor relays shows which popularity lists are good approximations of Tor usage and, thus, appropriate to preload. Cache-hit ratios can be retained or improved compared to today's Tor.
\end{contributions}

\section{Summary of Appended Papers}
\label{sec:appended}
The appended papers and their contexts are summarized below. Notably, all papers are in publication-date order except that Paper~\ref{paper:cat} predates Papers~\ref{paper:ctor}--\ref{paper:sauteed}.

{
\hypersetup{linkcolor=black}
\listofsummaries
}

\section{Related Work}
\label{sec:related}
This section positions the appended papers with regard to related work. For Certificate Transparency, this includes approaches towards signed certificate timestamp verification, gossip, and the problem of monitoring the logs. The related work with regard to Tor is focused on the practicality of website fingerprinting attacks and prior use of side-channels (such as timing attacks).

\subsection{Certificate Transparency Verification}
Approaches that fetch inclusion proofs have in common that they should preserve privacy by not revealing the link between users and visited websites. Eskandarian~\emph{et~al.}\ mention that Tor could be used to overcome privacy concerns; however, this comes at the cost of added infrastructure requirements~\cite{eskandarian}. Lueks and Goldberg~\cite{lueks} and Kales~\emph{et~al.}~\cite{kales} suggest that logs could provide inclusion proofs using multi-server private information retrieval. This requires a non-collusion assumption while also adding significant overhead. Laurie suggests that users can fetch inclusion proofs via DNS, as their resolvers already learn the destination sites~\cite{ct-over-dns}. While surveying signed certificate timestamp auditing, Meiklejohn~\emph{et~al.}\ point out that Certificate Transparency over DNS may have privacy limitations~\cite{sok-sct-auditing}. For example, the times of domain lookups and inclusion proof queries are detached. Paper~\ref{paper:ctga} uses Laurie's approach as a premise while proposing a gossip protocol. Paper~\ref{paper:ctor} applies Certificate Transparency in a context where Tor is not additional infrastructure~(Tor~Browser).

Several proposals try to avoid inclusion proof fetching altogether. Dirksen~\emph{et~al.}\ suggest that all logs could be involved in the issuance of a signed certificate timestamp~\cite{dirksen}. This makes it harder to violate maximum merge delays without detection but involves relatively large changes to log deployments. Nordberg~\emph{et~al.}\ suggest that signed certificate timestamps can be handed back to the origin web servers on subsequent revisits~\cite{nordberg}, which has the downside that detection requires machine-in-the-middle attacks to eventually cease. Nordberg~\emph{et~al.}~\cite{nordberg} as well as Chase and Meiklejohn~\cite{chase} suggest that some clients/users could collaborate with a trusted auditor. Stark and Thompson describe how users can opt in to use Google as a trusted auditor~\cite{opt-in-sct-auditing}. This approach was recently replaced by opt-out auditing that cross-checks a fraction of signed certificate timestamps with Google using k-anonymity~\cite{opt-out-sct-auditing}. Henzinger~\emph{et~al.}\ show how such k-anonymity can be replaced with a single-server private information retrieval setup that approaches the performance of prior multi-server solutions~\cite{henzinger}.
Neither of the latter two proposals provides a solution for privately reporting that a log may have violated its maximum merge delay, because the trusted auditor is assumed to know about all signed certificate timestamps. Eskandarian~\emph{et~al.}\ show how to prove privately that a log omitted a certificate~\cite{eskandarian}. However, they use an invalid assumption about today's logs being in strict timestamp order~\cite{sok-sct-auditing}. Paper~\ref{paper:ctor} suggests that Tor Browser could submit a fraction of signed certificate timestamps to randomly selected Tor relays. These relays perform further auditing on Tor Browser's behalf, much like a trusted auditor, except that no single entity is running it.

Merkle trees fix log content---not promises of logging. Therefore, inclusion proof fetching by users or their trusted parties must be accompanied by consistency verification and gossip to get a complete gossip-audit model~\cite{rfc6962}. Chuat~\emph{et~al.}\ suggest that users and web servers can pool signed tree heads, gossiping about them as they interact~\cite{chuat}. Nordberg~\emph{et~al.}\ similarly suggest that users can pollinate signed tree heads as they visit different web servers~\cite{nordberg}. Hof and Carle suggest that signed tree heads could be cross-logged to make all logs intertwined~\cite{hof}. Gunn~\emph{et~al.}\ suggest multi-path fetching of signed tree heads~\cite{gunn}, which may make persistent split-views hard depending on the paths used. Syta~\emph{et~al.}\ suggest that independent witnesses could cosign the logs using threshold signatures~\cite{syta}. Smaller-scale versions of witness cosigning received attention in industry~\cite{sigsum-witness,trustfabric-arxiv}, and generally in other types of transparency logs as well~\cite{parakeet}. Larger browser vendors could decide to push the same signed tree heads to their users, as proposed by Sleevi and Messeri~\cite{sth-push}. Paper~\ref{paper:ctga} uses the operators of network vantage points to aggregate and verify signed tree heads, providing their users with gossip-as-a-service, however assuming plaintext DNS traffic and a sound signed tree head frequency as defined by Nordberg~\emph{et~al.}~\cite{nordberg}. We used the multi-path assumptions of Gunn~\emph{et~al.}\ to break out of local vantage points. In contrast, Paper~\ref{paper:ctor} ensures that the same logs are observed in the Tor network by incorporating signed tree heads into Tor's consensus (thus making directory authorities into witnesses).

Li~\emph{et~al.}\ argue that it would be too costly for most domains to run a monitor~\cite{li}.\footnote{%
Whether the third-party monitors in this study misbehaved or not can be questioned~\cite{ayer-on-li}.
} Similar arguments have been raised before and led to alternative data structures that could make monitoring more efficient than today's overhead~\cite{vds,coniks,tomescu}. Paper~\ref{paper:lwm} falls into this category, as the root of an additional static lexicographically-ordered Merkle tree is added to a log's signed tree heads to encode batches of included certificates. The downside is that a non-deployed signed tree head extension is assumed~\cite{rfc9162}, as well as a tree head frequency similar to those described by Nordberg~\emph{et~al.}~\cite{nordberg}~to~get~efficiency~in~practice. Paper~\ref{paper:sauteed} uses a Mozilla Firefox web extension to verify embedded signed certificate timestamps in Tor Browser.
Such verification is similar to the gradual deployments of Certificate Transparency in other browsers~\cite{ct-history,does-ct-break-the-web}, and the starting point to improve upon in Papers~\ref{paper:ctga}--\ref{paper:ctor}. Moreover, the use of Certificate Transparency to associate human-meaningful domain names with non-mnemonic onion addresses (as in Paper~\ref{paper:sauteed}) is one of many proposals for alternative naming systems and onion search solutions~\cite{kadianakis,muffet-onions,nurmi,onion-location,h-e-securedrop,onio-ns}.

\subsection{Website Fingerprinting and Side-Channels}
Several researchers outline how past website fingerprinting attacks have been evaluated in unrealistic conditions~\cite{juarez14,perryCrit,realistic}. This includes not accounting for the size of the open-world setting, failing to keep false positive rates low enough to be useful, assuming that homepages are browsed one at a time, not accounting for dataset drift, and training classifiers on synthetic network traces. While some of these challenges have been addressed~\cite{onlinewf,realistic}, the question of how to deal with false positives remains open. Papers~\ref{paper:cat}--\ref{paper:tlwo} make a significant dent in this problem by providing evidence that the website fingerprinting attacker model could be made \emph{stronger} to capture \emph{realistic real-world capabilities} that eliminate most false positives around Alexa top-10k and the long tail~of~unpopular~sites (a back-of-the-envelope sketch below illustrates why low base rates make false positives so damaging).

Others have evaluated traffic analysis attacks against Tor beyond the website fingerprinting setting. On one side of the spectrum are end-to-end correlation/confirmation attacks that typically consider a global passive attacker that observes all network traffic~\cite{johnson13,nasr18,oh22,rimmer22}. Such strong attackers are not within the scope of Tor~\cite{tor}. On the other side of the spectrum are local attackers that see a small fraction of the network, typically in a position to observe a user's encrypted entry traffic (Figure~\ref{fig:wf}). Many have studied these \emph{weaker attacks} in lab settings where, e.g., advances in deep learning improved the accuracy significantly~\cite{wfdef,tiktok,df}. Others have focused on improved attacks that are \emph{active} in the Tor network from their own local vantage points~\cite{chakravarty10,mittal11,murdoch05}, which is similar to the techniques in Papers~\ref{paper:cat}--\ref{paper:tlwo}. Greschbach~\emph{et~al.}\ show that an attacker who gains access to (or traffic to~\cite{siby20}) commonly used DNS resolvers like Google's \texttt{8.8.8.8} gets valuable information to improve both end-to-end correlation and website fingerprinting attacks~\cite{greschbach}. Paper~\ref{paper:cat} generalizes the attacker capability they uncovered by allowing the attacker to query Tor's receiver anonymity set with a website oracle of time-frame~$t$. It is further shown that such an abstraction can be instantiated in the real world while \emph{staying within Tor's threat model}. In other words, the attacker is still local but may employ passive and active measures to narrow down the receiver anonymity set. Paper~\ref{paper:ctor} proposes Certificate Transparency verification that gives log operators website oracle access; Tor's directory authorities tune $t$. Website oracles exist because Tor is designed for anonymity---not unobservable communication~\cite{anonterm}.
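To make the base-rate problem concrete, consider the following calculation (all numbers are hypothetical): even an accurate classifier drowns in false alarms when visits to a monitored site are rare, whereas an oracle that rules out visits during the relevant time frame suppresses most of them.

\begin{verbatim}
# Hypothetical classifier: 95% true positive rate, 1% false positive rate.
tpr, fpr = 0.95, 0.01
base = 0.001  # made-up fraction of page loads going to the monitored site

precision = (tpr * base) / (tpr * base + fpr * (1 - base))
print(f"without oracle: {precision:.1%}")  # ~8.7%: mostly false alarms

# Suppose a website oracle dismisses 99% of the alarms raised during time
# frames in which nobody visited the monitored site.
fpr_o = fpr * 0.01
precision = (tpr * base) / (tpr * base + fpr_o * (1 - base))
print(f"with oracle:    {precision:.1%}")  # ~90%: actionable
\end{verbatim}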
The instantiation of a real-world website oracle is either a direct result of observing network flows from the protocols used during website visits, or of the state that these network flows leave behind being stored and inferable. Inferring secret system state is widely studied in applied cryptography and hardware architecture~\cite{lucky13,ge18,kocher96,mart21,tsunoo03,heist}, where the goal is usually to determine a key, decrypt a ciphertext, forge a message, or the like using side-channels. A side-channel can be local or remote and ranges from analysis of power consumption to cache states and timing differences. There is a long history of remote timing attacks that are practical~\cite{bbrumley11,dbrumley03,crosby09,wang22}. A recent improvement in this area that is relevant for Tor is timeless timing attacks, which exploit concurrency and message reordering to eliminate network jitter~\cite{timeless}. Paper~\ref{paper:cat} demonstrates a remote timing attack against Tor's DNS cache that achieves up to 17.3\% true positive rates while minimizing false positives. Paper~\ref{paper:tlwo} instead uses a remote timeless timing attack with no false positives, no false negatives, and a small time-frame $t$. This approaches an ideal website oracle without special attacker capabilities or reach into third-parties.

\section{Conclusions and Future Work}
\label{sec:concl}
Throughout the thesis, we contributed to the understanding of how trust requirements in Certificate Transparency can be reduced. Efficient and reliable monitoring of the logs is easily overlooked. If the broader ecosystem achieves monitoring through third-parties, they should be subject to the same scrutiny as logs. We proposed a solution that makes it hard for third-party monitors to provide subscribers with selective certificate notifications. We also proposed a gossip-audit model that plugs into the existing practice of interacting with the logs over DNS by having programmable packet processors verify that the same append-only logs are observed. Avoiding the addition of complicated verification logic to end-user software is likely a long-term win because it reduces the number of moving parts. In other words, simple gossip-audit models will be much easier to deploy in the wide variety of end-user software that embeds TLS clients.

We also contributed to the understanding of how Certificate Transparency can be applied in the context of Tor Browser. Compared to a regular browser, this results in a different setting with its own challenges and opportunities. On the one hand, Tor Browser benefits from the ability to preserve privacy due to using the anonymity network Tor. On the other hand, data relating to website visits cannot be persisted to disk (such as signed certificate timestamps blocked by maximum merge delays). Our incrementally-deployable proposal keeps the logic in Tor Browser simple by offloading all Certificate Transparency verification to randomly selected Tor relays. The design is complete because mis-issued certificates can eventually reach a trusted auditor who acts on incidents. In addition to proposing Certificate Transparency in Tor Browser, we also explored how certificates with onion addresses may improve the association of domain names with onion addresses. Such certificates ensure domain owners know which onion addresses can be discovered for their sites, much like Certificate Transparency does for public TLS keys. This also adds censorship resistance to the discovery as logs are append-only.
As part of exploring Certificate Transparency in Tor Browser, we further contributed to the understanding of how the protocols used during website visits affect unlinkability between Tor users and their destination websites. For example, fetching an inclusion proof from a Certificate Transparency log is one such protocol. We extended the attacker model of website fingerprinting attacks with website oracles that reveal whether any network user visited a website during a specific time frame. Our results show that website oracles eliminate most false positives for all but the most frequently visited websites. In addition to the theoretical evaluation of the extended attacker model, we could exploit (timeless) timing attacks in Tor's DNS cache to instantiate real-world website oracles without any special capabilities or reach into third-parties. This led us to contribute to the understanding of how Tor's DNS cache performs today, including a proposal for a performant alternative that preloads the same popular domains on all Tor relays to withstand all (timeless) timing attacks.

As an outlook, our angle on Certificate Transparency verification has mostly been \emph{reactive} for end-users. In other words, some or all certificate verification occurs asynchronously after a website visit. An alternative would be upfront delivery of inclusion proofs that reconstruct tree heads which witnesses cosigned, a form of \emph{proactive} gossip as proposed by Syta~\emph{et~al.}~\cite{syta}. The significant upside is that the browser's verification could become non-interactive, eliminating privacy concerns and ensuring that end-users only see certificates merged into the append-only logs. Investigating what the blockers for such a proposal are in practice---today---would be valuable, as log verification quickly becomes complicated with signed certificate timestamps and reactive gossip-audit models. Are these blockers significant? Do they remain significant over time, given that other \emph{eventual} changes, like post-quantum secure certificates, will be needed anyway? New transparency log applications are unlikely to need the complexity of Certificate Transparency, and should likely not copy something that was designed to fit into an existing system with a large amount of legacy (such as certificate authorities, their established processes for certificate issuance, and the many client-server implementations already deployed on the Internet).

Orthogonal to the verification performed by end-users, how domains (fail to) use Certificate Transparency for detecting mis-issued certificates is largely unexplored. For example, subscribing to email notifications of newly issued certificates becomes less useful in an era where certificates are renewed frequently and automatically. Instead, domain owners need easy-to-use solutions that raise alarms only if there is a problem.

Finally, the mitigation deployed to counter our (timeless) timing attacks in Tor's DNS cache is just that: a mitigation, not a defense. It applies to modestly popular websites but not to the long tail, where the base rate is so low that the attacker's needed website oracle time frame dwarfs any fuzzy time-to-live value. Practical aspects of a preloaded DNS cache need to be explored further before deployment, such as the assumption of a third-party that visits popular domains to assemble an allowlist.
We may also have \emph{underestimated} the utility of the existing Umbrella list, which in and of itself does not require any new third-party. Does the use of Umbrella impact page-load latency, the most crucial parameter to keep minimized? The open question is whether frequently looked-up domains are missed by skipping the website-visit step, as with the non-extended Alexa and Tranco lists.

More broadly, the question of how to strike a balance between the \emph{efficiency} and \emph{effectiveness} of website fingerprinting defenses is open. How much overhead in terms of added latency and/or bandwidth is needed? How much of that overhead is sustainable, both from a user perspective (where, e.g., latency is crucial for web browsing and other interactive activities) and a network health perspective (such as the amount of volunteered relay bandwidth that is wasted)? It is paramount to neither overestimate nor underestimate attacker capabilities, which goes back to the still-debated threat model of website fingerprinting attacks. Regardless of whether Tor's DNS cache becomes preloaded, it will be difficult to prevent DNS lookups from happening. Someone---be it a weak attacker like ourselves or a recursive DNS resolver at an Internet service provider---is in a position to narrow down the destination anonymity set. This is especially true when also considering other protocols that reveal information about the destination anonymity set during website visits. Accepting that sources of real-world website oracles are prevalent implies that \emph{the world can be closed}. Therefore, a closed world is more realistic than an open world.

\subsection*{Acknowledgments}
I received valuable feedback while writing the introductory summary from Simone Fischer-H\"{u}bner, Johan Garcia, Stefan Lindskog, and Tobias Pulls. The final draft was further improved with helpful nudges from Grammarly.

\bibliographystyle{plain}
\bibliography{src/introduction/refs}