From 385cc92bc91e1a6c3724085c060e76bf40c13ed3 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Tue, 15 Oct 2024 16:08:16 +0200
Subject: Import PhD thesis

---
 summary/src/introduction/main.tex | 826 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 826 insertions(+)
 create mode 100644 summary/src/introduction/main.tex

diff --git a/summary/src/introduction/main.tex b/summary/src/introduction/main.tex
new file mode 100644
index 0000000..da013ab
--- /dev/null
+++ b/summary/src/introduction/main.tex
@@ -0,0 +1,826 @@
\section{Introduction}

The security posture of the Internet has improved significantly over the last
decade. For example,
  the cleaned-up and formally verified TLS 1.3 protocol that underpins HTTPS
  has been rolled out gradually~\cite{tls-timeline},
  the certificates that specify which public keys to use when bootstrapping a
  secure connection can be obtained for free and automatically~\cite{le}, and
  web browsers have shifted from positive to negative security indicators in
  favor of security-by-default~\cite{browser-ui}.
The use of end-to-end encryption has further become the norm with services such
as
  DNS-over-HTTPS~\cite{rfc8484},
  virtual private networks~\cite{wireguard},
  Tor~\cite{tor}, and
  secure messaging~\cite{mls}
gaining traction. In other words, the era of attackers that can passively snoop
and actively tamper with unencrypted~network~traffic~is~over.

What will remain the same is the incentive for attackers to snoop and tamper
with network traffic. Therefore, the focus is (and will likely continue to be)
on circumventing protocols that add security and privacy as they are deployed
in the real world. For example, there is a long history of certificate
mis-issuance that allows attackers to impersonate websites and thus insert
themselves as machines-in-the-middle (``MitM'') without actually breaking
TLS~\cite{sok-https,sslmate-history}.
Or, in the case of encrypted channels that are hard to intercept, attackers
instead analyze traffic patterns to infer user activity, such as which website
is being visited~\cite{cheng98,herrmann09,hintz02,liberatore06,panchenko11,sun02}.
The bad news is that attackers only need to find one vulnerability in a deployed
protocol or its software. Sometimes, such vulnerabilities can be purchased by
zero-day brokers like Zerodium~\cite{zerodium}.

To address an attack vector, it is common to add countermeasures that frustrate
attackers and/or increase the risk involved in exploiting a system. A good
example is how the certificate authority ecosystem evolved. For background,
certificate authorities are trusted parties that validate domain names before
issuing certificates that list their public keys. Web browsers are shipped with
hundreds of trusted certificate authorities, which means that the resulting TLS
connections cannot be more secure than the difficulty of hijacking the
weakest-link certificate authority~\cite{sok-https}. A proposal eventually
deployed to mitigate this issue is Certificate Transparency: an ecosystem of
public append-only logs that publish all certificates so that any mis-issuance
can be \emph{detected} by monitors~\cite{ct,rfc6962}. These logs have a
cryptographic foundation that holds them and the issuing certificate authorities
\emph{accountable}, at least in theory. In practice, the logs are essentially
trusted parties that must act honestly due to how web browsers shape their
policies to respect user
privacy~\cite{apple-log-policy,google-log-policy,sok-sct-auditing,ct-history}.

The first objective of this thesis is to better understand the current limits
of Certificate Transparency by proposing and evaluating improvements that
\emph{reduce} the amount of trust that needs to be placed in third-party
monitors and logs.
We make a dent in the problem of Certificate Transparency verification both
generally and concretely in the context of Tor Browser, which, unlike Google
Chrome and Apple's Safari, does not yet support Certificate Transparency. For
context, Tor Browser is a fork of Mozilla's Firefox that (among other things)
routes user traffic through the low-latency anonymity network
Tor~\cite{tor,tb}. As part of our pursuit to improve the status quo for
Certificate Transparency verification in Tor Browser, the second objective of
this thesis is to evaluate how the protocols used during website visits affect
unlinkability between senders (web browsers) and receivers (websites). Our
evaluation applies to our addition of Certificate Transparency as well as to
other protocols already in use, e.g.,
  DNS,
  real-time bidding~\cite{rtb}, and
  certificate revocation checking~\cite{ocsp}.

The remainder of the introductory summary is structured as follows.
Section~\ref{sec:background} introduces background that will help the reader
  understand the context and preliminaries of the appended papers.
Section~\ref{sec:rqs} defines our research questions and overall objective.
Section~\ref{sec:methods} provides an overview of our research methods.
Section~\ref{sec:contribs} describes our contributions succinctly.
Section~\ref{sec:appended} summarizes the appended papers, which are published in
  NordSec (Paper~\ref{paper:lwm}),
  SECURWARE (Paper~\ref{paper:ctga}),
  PETS (Papers~\ref{paper:ctor} and~\ref{paper:cat}),
  WPES (Paper~\ref{paper:sauteed}), and
  USENIX Security (Paper~\ref{paper:tlwo}).
Section~\ref{sec:related} positions our contributions with regard to related
work. Section~\ref{sec:concl} concludes and briefly discusses future work.

\section{Background} \label{sec:background}

This section introduces background on Certificate Transparency and Tor.

\subsection{Certificate Transparency}

The web's public-key infrastructure depends on certificate authorities to issue
certificates that map domain names to public keys. For example, the
certificate of \texttt{www.example.com} is issued by DigiCert and lists a
2048-bit RSA key~\cite{crt:www.example.com}. The fact that DigiCert signed
this certificate means that they claim to have verified that the requesting
party is really \texttt{www.example.com}, typically by first ensuring that a
specified DNS record can be uploaded on request~\cite{ca/b}. If all
certificate authorities performed these checks correctly and the checks
themselves were foolproof, a user's browser could be sure that any
certificate signed by a certificate authority would list a verified public key
that can be used for authentication when connecting to a website via TLS.
Unfortunately, there are hundreds of trusted certificate authorities and a long
history of issues surrounding their operations in
practice~\cite{bambo-cas,sok-https,sslmate-history}. One of the most famous
incidents took place in 2011: an attacker managed to mis-issue certificates from
DigiNotar to intercept traffic towards Google and others in
Iran~\cite{black-tulip}. The astonishing part is that this incident was first
detected \emph{seven weeks later}.

Certificate Transparency aims to facilitate \emph{detection} of issued
certificates, thus holding certificate authorities \emph{accountable} for any
certificates that they mis-issue~\cite{ct,rfc6962}. The basic idea is shown in
Figure~\ref{fig:ct-idea}. In addition to regular validation rules, browsers
ensure that certificates are included in a public append-only Certificate
Transparency \emph{log}. This allows anyone to get a concise view of all
certificates that users may encounter, including domain owners like Google,
who can then see for themselves whether any of the published certificates are
mis-issued.
The parties inspecting the logs are called \emph{monitors}. Some monitors
mirror all log entries~\cite{crt.sh}, while others discard most of them in
pursuit of finding matches for pre-defined criteria like
\texttt{*.example.com}~\cite{certspotter}. Another option is subscribing to
certificate notifications from a trusted third-party~\cite{ct-monitors}.

\begin{figure}[!t]
  \centering\includegraphics[width=0.8\textwidth]{src/introduction/img/ct}
  \caption{%
    The idea of Certificate Transparency. Certificates encountered by users
    must be included in a public log so that monitors can detect mis-issuance.
  }
  \label{fig:ct-idea}
\end{figure}

What makes Certificate Transparency a significant improvement compared to the
certificate authority ecosystem is that the logs stand on a cryptographic
foundation that can be verified. A log can be viewed as an append-only
tamper-evident list of certificates. It is efficient\footnote{%
  Efficient refers to space-time complexity $\mathcal{O}(\log n)$,
  where $n$ is the number of log entries.
} to prove cryptographically that a certificate is in the list, and that a
current version of the list is append-only with regard to a previous version
(i.e., no tampering or reordering).\footnote{%
  Interested readers can refer to our Merkle tree and proof technique
  introduction online~\cite{merkle-intro}.
} These properties follow from using a Merkle tree structure that supports
\emph{inclusion} and \emph{consistency}
proofs~\cite{history-trees,ct-formal,rfc6962,merkle}. The reader only needs to
know that these proofs are used to reconstruct a log's Merkle tree head, often
referred to as a \emph{root hash}: a cryptographic hash that uniquely
identifies a list of certificates in a tree data structure. The logs sign root
hashes with the number of entries and a timestamp to form \emph{signed tree
heads}. Thus, if an inconsistency is discovered, it cannot be denied.
Log operators are therefore held accountable for maintaining the append-only
property. A party that verifies the efficient transparency log proofs without
downloading all the logs is called an \emph{auditor}.

A log that signs two inconsistent tree heads is said to perform a
\emph{split-view}. To ensure that everyone observes the same append-only logs,
all participants of the Certificate Transparency ecosystem must engage in a
\emph{gossip protocol}~\cite{chuat,nordberg}. In other words, just because
Alice observes an append-only log, it is not necessarily the \emph{same
append-only log} that Bob observes. Therefore, Alice and Bob must exchange
signed tree heads and verify consistency to assert that the log operators play
by the rules and only append certificates. Without a secure gossip protocol,
log operators would have to be trusted blindly (much like certificate
authorities before Certificate Transparency). RFC~6962 defers the specification
of gossip~\cite{rfc6962}, with little or no meaningful gossip deployed yet.

Rolling out Certificate Transparency without breakage on the web is a
challenge~\cite{does-ct-break-the-web}. Certificates must be logged, associated
proofs delivered to end-user software, and more. One solution RFC~6962
ultimately put forth was the introduction of \emph{signed certificate
timestamps}. A signed certificate timestamp is a log's \emph{promise} that a
certificate will be appended to the log within a \emph{maximum merge delay}
(typically 24 hours). Verifying whether a log holds its promise is usually
called \emph{auditing}. Certificate authorities can obtain signed certificate
timestamps and embed them in their final certificates by logging a
\emph{pre-certificate}. As such, there is no added latency from building the
underlying Merkle tree and no need for server software to be updated (as the
final certificate contains the information needed).
The current policy for Google Chrome and Apple's Safari is to reject
certificates with fewer than two signed certificate
timestamps~\cite{apple-log-policy,google-log-policy}. How to request an
inclusion proof for a promise without leaking the user's browsing history to
the log is an open problem~\cite{sok-sct-auditing}. In other words, asking for
an inclusion proof trivially reveals the certificate of interest to the log.

Besides being embedded in certificates, signed certificate timestamps can be
delivered dynamically to end-users in TLS extensions and stapled certificate
status responses. For example, Cloudflare uses the TLS extension delivery
method to recover from log incidents without their customers needing to acquire
new certificates~\cite{cloudflare-scts}. Several log incidents have already
happened in the past, ranging from
split-views~\cite{trustasia-err,izenpe-err,venafi-err} to broken promises of
timely logging~\cite{wosign-err,google-err,digicert-err,starcom-err} and
potential key compromise~\cite{digicert-kc}. These are all good \emph{scares}
motivating the continued completion of Certificate Transparency in practice.

In summary, the status quo is for web browsers to require at least two signed
certificate timestamps before accepting a certificate as valid. Merkle tree
proofs are not verified. Gossip is not deployed. The lack of a reliable
\emph{gossip-audit model} means that the logs are largely trusted
parties.\footnote{%
  Historical remark: the lack of verification led Google to require that all
  certificates be disclosed in at least one of their logs to
  validate~\cite{ct-history}. The so-called \emph{one-Google log requirement}
  was recently replaced. Google Chrome instead interacts with Google's trusted
  auditor. See Section~\ref{sec:related}.
} We defer discussion of related work in the area of gossip-audit models until
Section~\ref{sec:related}.
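To make the Merkle tree mechanics described above concrete, the following
Python sketch follows the hashing structure of RFC~6962: domain-separated leaf
and interior hashes, a root hash computed over a list of entries, and
inclusion proofs as audit paths of sibling hashes. It is an illustrative toy
rather than a production implementation (all function names are ours): it
recomputes subtree hashes in memory and omits signatures, signed tree heads,
consistency proofs, and the exact wire encodings.

```python
import hashlib

def leaf_hash(entry: bytes) -> bytes:
    # RFC 6962 domain separation: leaves are hashed with a 0x00 prefix...
    return hashlib.sha256(b"\x00" + entry).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    # ...and interior nodes with a 0x01 prefix, preventing type confusion.
    return hashlib.sha256(b"\x01" + left + right).digest()

def split_point(n: int) -> int:
    # Largest power of two strictly smaller than n (n >= 2).
    k = 1
    while 2 * k < n:
        k *= 2
    return k

def root(entries: list[bytes]) -> bytes:
    # Merkle tree head ("root hash") over a list of log entries.
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = split_point(len(entries))
    return node_hash(root(entries[:k]), root(entries[k:]))

def inclusion_proof(m: int, entries: list[bytes]) -> list[bytes]:
    # Audit path for entry m: the sibling subtree hashes needed to
    # recompute the root, O(log n) hashes in total.
    if len(entries) == 1:
        return []
    k = split_point(len(entries))
    if m < k:
        return inclusion_proof(m, entries[:k]) + [root(entries[k:])]
    return inclusion_proof(m - k, entries[k:]) + [root(entries[:k])]

def verify_inclusion(m: int, tree_size: int, entry: bytes,
                     proof: list[bytes], expected_root: bytes) -> bool:
    # Fold the audit path into a candidate root and compare (RFC 6962-style).
    node, fn, sn = leaf_hash(entry), m, tree_size - 1
    for sibling in proof:
        if fn % 2 == 1 or fn == sn:
            node = node_hash(sibling, node)
            while fn % 2 == 0 and fn > 0:
                fn, sn = fn >> 1, sn >> 1
        else:
            node = node_hash(node, sibling)
        fn, sn = fn >> 1, sn >> 1
    return sn == 0 and node == expected_root
```

In this sketch a verifier needs only the entry, its index, the tree size, and
the audit path to recompute the signed root hash, which is what makes
efficient auditing possible without downloading the entire log.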

\subsection{Tor}

The Tor Project is a 501(c)(3) US nonprofit that advances human rights and
defends privacy online through free software and open networks~\cite{tpo}. Some
of the maintained and developed components include Tor Browser and Tor's relay
software. Thousands of volunteers operate relays as part of the Tor network,
which routes the traffic of millions of daily users with low
latency~\cite{mani}. This frustrates attackers like Internet service providers
that may try linking \emph{who is communicating with whom} from their local
(non-global) vantage points~\cite{tor}.

Usage of Tor involves tunneling the TCP traffic of different destinations (such
as all flows associated with a website visit to \texttt{example.com}) in
fixed-size \emph{cells} on independent \emph{circuits}. A circuit is built
through a guard, a middle, and an exit relay. At each hop of the circuit, one
layer of symmetric encryption is peeled off. The keys used are ephemeral and
are discarded together with all other circuit state after at most 10 minutes
(the maximum circuit lifetime). This setup allows guard relays to observe
users' IP addresses but none of the destination traffic. In contrast, exit
relays can observe destination traffic but no user IP addresses. The relays
used in a circuit are determined by Tor's end-user software. Such path
selection is randomized and bandwidth-weighted but starts with a largely static
guard set to protect users from \emph{eventually} entering the network from a
relay an attacker volunteered to run.

Tor's \emph{consensus} lists the relays that make up the network. As the name
suggests, it is a document agreed upon by a majority of trusted \emph{directory
authorities}. Five votes are currently needed to reach a consensus. Examples
of information added to the Tor consensus include tunable network parameters and
uploaded relay descriptors with relevant metadata, e.g., public key, available
bandwidth, and exit policy.
Each relay in the consensus is also assigned different flags based on their
configuration and observed performance, e.g., \texttt{Guard},
\texttt{MiddleOnly}, \texttt{Fast}, \texttt{Stable}, and \texttt{HSDir}. The
latter means that the relay is a \emph{hidden service directory}, which refers
to being part of a distributed hash table that helps users look up \emph{onion
service introduction points}.

An onion service is a self-authenticated server identified by its public key.
Onion services are only reachable through the Tor network. Users that are aware
of a server's \emph{onion address} can consult the distributed hash table to
find its introduction points. To establish a connection, a user builds a
circuit to a \emph{rendezvous point}. A request is then sent to one of the
current introduction points, which informs the onion service that it may build
its own circuit to meet the user at their rendezvous point. In total, six
relays are traversed while interacting with an onion service. This setup allows
not only the sender but also the receiver to be anonymous. The receiver also
benefits from a large degree of censorship resistance as the server location may
be hidden. The main drawback of onion services is that their non-mnemonic names
are hard to discover and remember. Some sites try to overcome this by setting
their onion addresses in \emph{onion location} HTTP headers or HTML
attributes~\cite{onion-location}.

Many users use Tor Browser to connect to the Tor network. In addition to
routing traffic as described above, Tor Browser ships with privacy-preserving
features like first-party isolation to not share any state across different
origins, settings that frustrate browser fingerprinting, and
\emph{disk-avoidance} to not store browsing-related history as well as other
identifying information to disk~\cite{tb}. Tor Browser is a fork of Mozilla's
Firefox.
Unfortunately, neither Firefox nor Tor Browser supports any form of
Certificate Transparency. Conducting undetected machine-in-the-middle attacks
against Tor users is thus relatively straightforward: compromise or coerce the
weakest-link certificate authority, then volunteer to operate an exit relay and
intercept network traffic. Such interception has previously been found with
self-signed certificates~\cite{spoiled-onions}.

While global attackers are not within Tor's threat model, it is in scope to
guard against various local attacks~\cite{tor}. For example, the intended
attacker may passively observe a small fraction of the network and actively
inject their own packets. Figure~\ref{fig:wf} shows the typical attacker
setting of \emph{website fingerprinting}, where the attacker observes a user's
entry traffic with the goal of inferring which website was visited solely based
on analyzing encrypted
traffic~\cite{cheng98,herrmann09,hintz02,liberatore06,panchenko11,sun02}.
Website fingerprinting attacks are evaluated in the \emph{open-world} or
\emph{closed-world} settings. In the closed-world setting, the attacker
monitors (not to be confused with Certificate Transparency monitoring) a fixed
list of websites. A user visits one of the monitored sites, and the attacker
needs to determine which one. The open-world setting is the same as the
closed-world setting, except that the user may also visit unmonitored sites.
The practicality of website fingerprinting attacks is up for debate, with
challenges ranging from the handling of false positives to machine-learning
dataset drift~\cite{onlinewf,juarez14,perryCrit,realistic}.

\begin{figure}[!t]
  \centering\includegraphics[width=0.85\textwidth]{src/cat/img/setting}
  \caption{
    The setting of a website fingerprinting attack. A local passive attacker
    analyzes a user's encrypted network traffic as it enters the network. The
    goal is to infer which website is visited.
(Figure reprinted from Paper~\ref{paper:cat}.)
  }
  \label{fig:wf}
\end{figure}

In summary, Tor is a low-latency anonymity network often accessed with Tor
Browser. Among the threats that Tor aims to protect against are local attackers
that see traffic as it enters or leaves the network (but not both at the same
time all the time). A website fingerprinting attack is an example of a passive
attack that operates on entry traffic. A machine-in-the-middle attack is an
example of an active attack that typically operates on exit traffic. Discussion
of related work in the area of website fingerprinting is deferred until
Section~\ref{sec:related}.

\section{Research Questions} \label{sec:rqs}

The overall research objective spans two different areas: transparency logs and
low-latency anonymity networks. We aim to reduce trust assumptions in
transparency log solutions and to apply such solutions in anonymous settings for
improved security and privacy. We defined the following research questions to
make this concrete in Certificate Transparency and Tor, the two ecosystems with
the most history and dominant positions in their respective areas.

\begin{researchquestions}
  \item[Can trust requirements in Certificate Transparency be reduced in
  practice?]

  Transparency logs have a cryptographic foundation that supports efficient
  verification of inclusion and consistency proofs. Such proofs are useful to
  reduce the amount of trust that is placed in the logs. The roll-out of
  Certificate Transparency has yet to start using these proofs, and to employ a
  gossip protocol that ensures the same append-only logs are observed. Part of
  the challenge relates to privacy concerns as parties interact with each other,
  as well as deploying gradually without breakage.

  We seek practical solutions that reduce the trust requirements currently
  placed in the logs and third-party monitors while preserving user privacy.

  \item[How can authentication of websites be improved in the context of Tor?]

  Tor Browser has yet to support Certificate Transparency to facilitate
  detection of hijacked websites. This includes HTTPS sites but also onion
  services that may be easier to discover reliably with more transparency.

  We seek incremental uses of Certificate Transparency in Tor that preserve user
  privacy while engaging in new verification protocols to reduce trust.

  \item[How do the protocols used during website visits affect
  unlinkability between Tor users and their destination websites?]

  Several third parties become aware of a user's browsing activities while a
  website is visited. For example, DNS resolvers and certificate status
  responders may be consulted for domain name resolution and verification of
  whether a certificate has been revoked. Fetching an inclusion proof from a
  Certificate Transparency log would reveal the same type of information.

  We seek to explore how unlinkability between Tor users and their exit
  destinations is affected by the multitude of protocols used during website
  visits. The considered setting is the same as in website fingerprinting,
  except that the attacker may take additional passive and active measures.
  For example, the attacker may volunteer to run a Certificate Transparency log
  (passive) or inject carefully-crafted packets into Tor~(active).
\end{researchquestions}

\section{Research Methods} \label{sec:methods}

We tackle privacy and security problems in the field of computer
science~\cite{icss,smics}. Our work is applied, following the scientific method
for security and experimental networking research. \emph{Exactly} what it means
to use the scientific method in these areas is up for debate~\cite{rfenr,sse}.
However, at a glance, it is about forming precise and consistent theories with
falsifiable predictions \emph{as in other sciences}, except that the objects of
study are \emph{information systems in the real world}.

A prerequisite to formulating precise, consistent, and falsifiable theories is
that there are few implicit assumptions. Therefore, scientific security
research should be accompanied by definitions of security goals and attacker
capabilities: what does it mean for the system to be secure, and what is the
attacker (not) allowed to do while attacking it~\cite{secdefs}? Being explicit
about the overall \emph{setting} and \emph{threat model} is prevalent in formal
security work like cryptography, where an abstract (mathematical) model is used
to show that security can be \emph{proved} by reducing to a computationally
hard problem (like integer factorization) or a property of some primitive (like
the collision resistance of a hash function)~\cite{provsec}. It is nevertheless
just as crucial in less formal work that deals with the security of systems in
the real (natural) world---the exclusive setting of the scientific
method---which usually lends itself towards break-and-fix cycles in light of
new observations. Where to draw the line between \emph{security work} and
\emph{security research} is not trivial. However, a few common \emph{failures}
of past ``security research'' include not bringing observations in contact with
theory, not making claims and assumptions explicit, or simply relying on
unfalsifiable claims~\cite{sse}.

While deductive approaches (like formal reduction proofs) are instrumental in
managing complexity and gaining confidence in different models, they are not
sufficient on their own because a model's \emph{instantiation} must also be
secure~\cite{secdefs}. It is common to complement abstract modeling with
\emph{real-world measurements} as well as \emph{systems prototyping and
evaluations}~\cite{rfenr}.
Real-world measurements capture properties of deployed systems like the
Internet, the web, and the Tor network. For example, a hypothesis in a
real-world measurement could be that (non-)Tor users browse according to the
same website popularity distribution. Sometimes these measurements involve the
use of research prototypes, or the research prototypes themselves become the
objects of study to investigate properties of selected system parts (say,
whether a packet processor with new features is indistinguishable from some
baseline as active network attackers adaptively inject packets of their
choosing). If it is infeasible, expensive, or unsafe (see below) to study a
real-world system, a simulation may be studied instead. The downside of
simulation is that the model used may not be a good approximation of the
natural world, similar to formal cryptographic~modeling.

The appended papers use all of the above approaches to make claims about
security, privacy, and performance in different systems, sometimes with regard
to an abstract model that can be used as a foundation in the natural world to
manage complexity. Paper~\ref{paper:lwm} contains a reduction proof sketch to
show reliance on standard cryptographic assumptions. Paper~\ref{paper:cat}
extends past simulation setups to show the impact of an added attacker
capability. Meanwhile, Paper~\ref{paper:ctor} models part of the Tor network
with mathematical formulas to estimate performance overhead. All but
Paper~\ref{paper:sauteed} contain real-world measurements relating to Internet
infrastructure, websites, certificates, Tor, or practical deployability of our
proposals. All but Paper~\ref{paper:ctor} contain research prototypes with
associated evaluations, e.g., performance profiling, as well as corroborating or
refuting our security definitions in experimental settings.
All papers include discussions of security and privacy properties as well as
their limitations and strengths in the chosen settings (where assumptions are
explicit and threat models motivated).

Throughout our experiments, we strove to follow best practices like documenting
the setups used, making datasets and associated tooling available, reducing
potential biases by performing repeated measurements from multiple different
vantage points, and discussing potential biases (or lack thereof)~\cite{rfenr}.
We also interacted with Tor's research safety board~\cite{trsb} to discuss the
ethics and safety of our measurements in Paper~\ref{paper:tlwo}, and refrained
from measuring real (i.e., non-synthetic) usage of Tor whenever possible
(Papers~\ref{paper:ctor} and~\ref{paper:cat}). Finally, the uncovered bugs and
vulnerabilities in Papers~\ref{paper:cat}--\ref{paper:tlwo} were responsibly
disclosed to the Tor project. This included suggestions on how to move forward.

\section{Contributions} \label{sec:contribs}

The main contributions of this thesis are listed below. An overview of how they
relate to our research questions and appended papers is shown in
Figure~\ref{fig:contrib}.

\begin{figure}[!t]
  \centering
  \includegraphics[width=0.83\textwidth]{src/introduction/img/contribs}
  \caption{%
    Overview of appended papers, contributions, and research questions.
  }
  \label{fig:contrib}
\end{figure}

\vspace{1cm}
\begin{contributions}
  \item[Reduced trust in third-party monitoring with a signed tree head
  extension that shifts trust from non-cryptographic certificate notifications
  to a log's gossip-audit model (or if such a model does not exist yet, the
  logs themselves).]

  Paper~\ref{paper:lwm} applies existing cryptographic techniques for
  constructing static and lexicographically ordered Merkle trees so that
  certificates can be wild-card filtered on subject alternative names with
  (non-)membership proofs.
This building block is evaluated in the context of
  Certificate Transparency, including a security sketch and performance
  benchmarks.

  \item[Increased probability of split-view detection by proposing gossip
  protocols that disseminate signed tree heads without bidirectional
  communication.]

  Paper~\ref{paper:ctga} explores aggregation of signed tree heads at line
  speed in programmable packet processors, facilitating consistency proof
  verification on the level of an entire autonomous system. Such verification
  can be indistinguishable from an autonomous system without any split-view
  detection, thus achieving herd immunity, i.e., protection even without
  aggregation. Aggregation at 32 autonomous systems can protect 30--50\% of
  the IPv4 space. Paper~\ref{paper:ctor} explores signed tree heads in Tor's
  consensus. To reliably perform an undetected split-view against log clients
  that have Tor in their trust root, a log must collude with a majority of
  directory authorities.

  \item[Improved detectability of website hijacks targeting Tor Browser by
  proposing privacy-preserving and gradual roll-outs of Certificate
  Transparency in Tor.]

  Paper~\ref{paper:ctor} explores adoption of Certificate Transparency in Tor
  Browser with signed certificate timestamps as a starting point, then
  leveraging the decentralized network of relays to cross-log certificates
  before ultimately verifying inclusion proofs against a single view in Tor's
  consensus. The design is probabilistically secure with tunable parameters
  that result in modest overheads. Paper~\ref{paper:sauteed} shows that
  Certificate Transparency logging of domain names with associated onion
  addresses helps provide forward censorship-resistance and detection of
  unwanted onion associations.

  \item[An extension of the attacker model for website fingerprinting that
  provides attackers with the capability of querying a website oracle.]

  A website oracle reveals whether a monitored website was (not) visited by
  any network user during a specific time frame. Paper~\ref{paper:cat}
  defines and simulates website fingerprinting attacks with website oracles,
  showing that most false positives can be eliminated for all but the most
  frequently visited websites. A dozen sources of real-world website oracles
  follow from the protocols used during website visits. We enumerate and
  classify those sources based on ease of accessibility, reliability, and
  coverage. The overall analysis includes several Internet measurements.

  \item[Remotely exploitable probing attacks on Tor's DNS cache that instantiate
  a real-world website oracle without any special attacker capabilities or
  reach.]

  Paper~\ref{paper:cat} shows that timing differences in end-to-end response
  times can be measured to determine whether a domain name is (not) cached by
  a Tor relay. An estimated true positive rate of 17.3\% can be achieved
  while trying to minimize false positives. Paper~\ref{paper:tlwo} improves
  the attack by exploiting timeless timing differences that depend on
  concurrent processing. The improved attack has no false positives or false
  negatives. Our proposed bug fixes and mitigations have been merged in Tor.

  \item[A complete redesign of Tor's DNS cache that defends against all
  (timeless) timing attacks while retaining or improving performance
  compared~to~today.]

  Paper~\ref{paper:tlwo} suggests that Tor's DNS cache should only share the
  same preloaded domain names across different circuits to remove the remotely
  probeable state that reveals information about past exit traffic. A network
  measurement with real-world Tor relays shows which popularity lists are good
  approximations of Tor usage and, thus, appropriate to preload. Cache-hit
  ratios can be retained or improved compared to today's Tor.
+\end{contributions}
+
+\section{Summary of Appended Papers} \label{sec:appended}
+
+The appended papers and their contexts are summarized below. Notably, all
+papers are in publication-date order except that Paper~\ref{paper:cat}
+predates Papers~\ref{paper:ctor}--\ref{paper:sauteed}.
+
+{
+  \hypersetup{linkcolor=black}
+  \listofsummaries
+}
+
+\section{Related Work} \label{sec:related}
+
+This section positions the appended papers with regard to related work. For
+Certificate Transparency, this includes approaches towards signed certificate
+timestamp verification, gossip, and the problem of monitoring the logs. The
+related work on Tor focuses on the practicality of website fingerprinting
+attacks and prior use of side-channels (such as timing attacks).
+
+\subsection{Certificate Transparency Verification}
+
+Approaches that fetch inclusion proofs have in common that they should
+preserve privacy by not revealing the link between users and visited
+websites. Eskandarian~\emph{et~al.}\ mention that Tor could be used to
+overcome privacy concerns; however, it comes at the cost of added
+infrastructure requirements~\cite{eskandarian}. Lueks and
+Goldberg~\cite{lueks} and Kales~\emph{et~al.}~\cite{kales} suggest that logs
+could provide inclusion proofs using multi-server private information
+retrieval. This requires a non-collusion assumption while also adding
+significant overhead. Laurie suggests that users can fetch inclusion proofs
+via DNS as their resolvers have already learned the destination
+sites~\cite{ct-over-dns}. While surveying signed certificate timestamp
+auditing, Meiklejohn~\emph{et~al.}\ point out that Certificate Transparency
+over DNS may have privacy limitations~\cite{sok-sct-auditing}. For example,
+the times of domain lookups and inclusion proof queries are detached.
+Paper~\ref{paper:ctga} uses Laurie's approach as a premise while proposing a
+gossip protocol.
+Paper~\ref{paper:ctor} applies Certificate Transparency in a context where
+Tor is not additional infrastructure~(Tor~Browser).
+
+Several proposals try to avoid inclusion proof fetching altogether.
+Dirksen~\emph{et~al.}\ suggest that all logs could be involved in the
+issuance of a signed certificate timestamp~\cite{dirksen}. This makes it
+harder to violate maximum merge delays without detection but involves
+relatively large changes to log deployments. Nordberg~\emph{et~al.}\ suggest
+that signed certificate timestamps can be handed back to the origin web
+servers on subsequent revisits~\cite{nordberg}, which has the downside of
+assuming that machine-in-the-middle attacks eventually cease so that
+detection can take place. Nordberg~\emph{et~al.}~\cite{nordberg} as well as
+Chase and Meiklejohn~\cite{chase} suggest that some clients/users could
+collaborate with a trusted auditor. Stark and Thompson describe how users
+can opt in to use Google as a trusted auditor~\cite{opt-in-sct-auditing}.
+This approach was recently replaced by opt-out auditing that cross-checks a
+fraction of signed certificate timestamps with Google using
+$k$-anonymity~\cite{opt-out-sct-auditing}. Henzinger~\emph{et~al.}\ show how
+such $k$-anonymity can be replaced with a single-server private information
+retrieval setup that approaches the performance of prior multi-server
+solutions~\cite{henzinger}. Neither of the latter two proposals provides a
+solution for privately reporting that a log may have violated its maximum
+merge delay because the trusted auditor is assumed to know about all signed
+certificate timestamps. Eskandarian~\emph{et~al.}\ show how to privately
+prove that a log omitted a certificate~\cite{eskandarian}. However, they use
+an invalid assumption about today's logs being in strict timestamp
+order~\cite{sok-sct-auditing}. Paper~\ref{paper:ctor} suggests that Tor
+Browser could submit a fraction of signed certificate timestamps to randomly
+selected Tor relays.
These relays perform further auditing on Tor Browser's behalf: much like a
+trusted auditor, except that no single entity is running it.
+
+Merkle trees fix log content---not promises of logging. Therefore, inclusion
+proof fetching by users or their trusted parties must be accompanied by
+consistency verification and gossip to get a complete gossip-audit
+model~\cite{rfc6962}. Chuat~\emph{et~al.}\ suggest that users and web
+servers can pool signed tree heads, gossiping about them as they
+interact~\cite{chuat}. Nordberg~\emph{et~al.}\ similarly suggest that users
+can pollinate signed tree heads as they visit different web
+servers~\cite{nordberg}. Hof and Carle suggest that signed tree heads could
+be cross-logged to make all logs intertwined~\cite{hof}. Gunn~\emph{et~al.}\
+suggest multi-path fetching of signed tree heads~\cite{gunn}, which may make
+persistent split-views hard depending on the multi-paths used.
+Syta~\emph{et~al.}\ suggest that independent witnesses could cosign the logs
+using threshold signatures~\cite{syta}. Smaller-scale versions of witness
+cosigning received attention in
+industry~\cite{sigsum-witness,trustfabric-arxiv}, and generally in other
+types of transparency logs as well~\cite{parakeet}. Larger browser vendors
+could decide to push the same signed tree heads to their users, as proposed
+by Sleevi and Messeri~\cite{sth-push}. Paper~\ref{paper:ctga} uses the
+operators of network vantage points for aggregating and verifying signed
+tree heads to provide their users with gossip-as-a-service, albeit assuming
+plaintext DNS traffic and a sound signed tree head frequency as defined by
+Nordberg~\emph{et~al.}~\cite{nordberg}. We used the multi-path assumptions
+of Gunn~\emph{et~al.}\ to break out of local vantage points. In contrast,
+Paper~\ref{paper:ctor} ensures that the same logs are observed in the Tor
+network by incorporating signed tree heads into Tor's consensus (thus making
+directory authorities into witnesses).
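To make the verification being gossiped about concrete, the following is a minimal, illustrative Python sketch of RFC 6962-style Merkle tree hashing and inclusion-proof checking. The 0x00/0x01 domain-separation prefixes follow the RFC; the recursive tree builder, audit-path generator, and verifier are simplified stand-ins for what a real log and log client implement, not production code:

```python
import hashlib

def leaf_hash(entry: bytes) -> bytes:
    # RFC 6962 prefixes leaf input with 0x00 for domain separation.
    return hashlib.sha256(b"\x00" + entry).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    # Interior nodes are prefixed with 0x01.
    return hashlib.sha256(b"\x01" + left + right).digest()

def split(n: int) -> int:
    # Largest power of two strictly less than n (requires n >= 2).
    k = 1
    while k * 2 < n:
        k *= 2
    return k

def tree_head(entries):
    # Merkle tree head over a list of entries (RFC 6962's MTH).
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = split(len(entries))
    return node_hash(tree_head(entries[:k]), tree_head(entries[k:]))

def inclusion_path(m, entries):
    # Audit path for entry m: sibling subtree heads, bottom-up.
    if len(entries) == 1:
        return []
    k = split(len(entries))
    if m < k:
        return inclusion_path(m, entries[:k]) + [tree_head(entries[k:])]
    return inclusion_path(m - k, entries[k:]) + [tree_head(entries[:k])]

def root_from_path(m, n, leaf, path):
    # Recompute the tree head from a leaf hash and its audit path; a
    # client accepts the proof iff the result equals the root in a
    # signed tree head it trusts.
    if n == 1:
        assert not path
        return leaf
    k = split(n)
    if m < k:
        return node_hash(root_from_path(m, k, leaf, path[:-1]), path[-1])
    return node_hash(path[-1], root_from_path(m - k, n - k, leaf, path[:-1]))
```

A client that verifies inclusion this way still needs consistency proofs and gossip to know that everyone is shown the same root, which is exactly the gap that the aggregation-based and consensus-based designs discussed above try to close.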
+
+Li~\emph{et~al.}\ argue that it would be too costly for most domains to run
+a monitor~\cite{li}.\footnote{%
+  Whether the third-party monitors in this study misbehaved or not can be
+  questioned~\cite{ayer-on-li}.
+} Similar arguments have been raised before, and led to alternative data
+structures that could make monitoring more efficient than today's
+overhead~\cite{vds,coniks,tomescu}. Paper~\ref{paper:lwm} falls into this
+category, as the root of an additional static lexicographically-ordered
+Merkle tree is added to a log's signed tree heads to encode batches of
+included certificates. The downside is that a non-deployed signed tree head
+extension is assumed~\cite{rfc9162}, as well as a tree head frequency
+similar to those described by
+Nordberg~\emph{et~al.}~\cite{nordberg}~to~get~efficiency~in~practice.
+
+Paper~\ref{paper:sauteed} uses a Mozilla Firefox web extension to verify
+embedded signed certificate timestamps in Tor Browser. Such verification is
+similar to the gradual deployments of Certificate Transparency in other
+browsers~\cite{ct-history,does-ct-break-the-web}, and the starting point to
+improve upon in Papers~\ref{paper:ctga}--\ref{paper:ctor}. Moreover, the use
+of Certificate Transparency to associate human-meaningful domain names with
+non-mnemonic onion addresses (as in Paper~\ref{paper:sauteed}) is one of
+many proposals for alternative naming systems and onion search
+solutions~\cite{kadianakis,muffet-onions,nurmi,onion-location,h-e-securedrop,onio-ns}.
+
+\subsection{Website Fingerprinting and Side-Channels}
+
+Several researchers outline how past website fingerprinting attacks have
+been evaluated in unrealistic
+conditions~\cite{juarez14,perryCrit,realistic}. This includes not accounting
+for the size of the open-world setting, failing to keep false positive
+rates low enough to be useful, assuming that homepages are browsed one at a
+time, failing to account for dataset drift, and training classifiers on
+synthetic network traces.
While some of these challenges were addressed~\cite{onlinewf,realistic}, the
+question of how to deal with false positives remains open.
+Papers~\ref{paper:cat}--\ref{paper:tlwo} make a significant dent in this
+problem by providing evidence that the website fingerprinting attacker
+model could be made \emph{stronger} to capture \emph{realistic real-world
+capabilities} that eliminate most false positives around the Alexa top-10k
+and the long tail~of~unpopular~sites.
+
+Others have evaluated traffic analysis attacks against Tor beyond the
+website fingerprinting setting. On one side of the spectrum are end-to-end
+correlation/confirmation attacks that typically consider a global passive
+attacker that observes all network
+traffic~\cite{johnson13,nasr18,oh22,rimmer22}. Such strong attackers are
+not within the scope of Tor~\cite{tor}. On the other side of the spectrum
+are local attackers that see a small fraction of the network, typically in
+a position to observe a user's encrypted entry traffic
+(Figure~\ref{fig:wf}). Many have studied those \emph{weak attackers} in lab
+settings where, e.g., advances in deep learning improved the accuracy
+significantly~\cite{wfdef,tiktok,df}. Others have focused on improved
+attacks that are \emph{active} in the Tor network from their own local
+vantage points~\cite{chakravarty10,mittal11,murdoch05}, which is similar to
+the techniques in Papers~\ref{paper:cat}--\ref{paper:tlwo}.
+Greschbach~\emph{et~al.}\ show that an attacker who gains access to (or
+traffic to~\cite{siby20}) commonly used DNS resolvers like Google's
+\texttt{8.8.8.8} gets valuable information to improve both end-to-end
+correlation and website fingerprinting attacks~\cite{greschbach}.
+Paper~\ref{paper:cat} generalizes the attacker capability they uncovered by
+allowing the attacker to query Tor's receiver anonymity set with a website
+oracle of time-frame~$t$.
It is further shown that it is possible to instantiate such an abstraction
+in the real world while \emph{staying within Tor's threat model}. In other
+words, the attacker is still local but may employ passive and active
+measures to narrow down the receiver anonymity set. Paper~\ref{paper:ctor}
+proposes Certificate Transparency verification that gives log operators
+website oracle access. Tor's directory authorities tune $t$.
+
+Website oracles exist because Tor is designed for anonymity---not
+unobservable communication~\cite{anonterm}. The instantiation of a
+real-world website oracle is either a direct result of observing network
+flows from the protocols used during website visits, or a result of state
+from those network flows being stored and inferable. Inferring secret
+system state is widely studied in applied cryptography and hardware
+architecture~\cite{lucky13,ge18,kocher96,mart21,tsunoo03,heist}, where the
+goal is usually to determine a key, decrypt a ciphertext, forge a message,
+or similar using side-channels. A side-channel can be local or remote and
+ranges from analysis of power consumption to cache states and timing
+differences. There is a long history of remote timing attacks that are
+practical~\cite{bbrumley11,dbrumley03,crosby09,wang22}. A recent
+improvement in this area that is relevant for Tor is timeless timing
+attacks, which exploit concurrency and message reordering to eliminate
+network jitter~\cite{timeless}. Paper~\ref{paper:cat} demonstrates a remote
+timing attack against Tor's DNS cache that achieves up to 17.3\% true
+positive rates while minimizing false positives. Paper~\ref{paper:tlwo}
+instead uses a remote timeless timing attack with no false positives, no
+false negatives, and a small time-frame $t$. This approaches an ideal
+website oracle without special attacker capabilities or reach into third
+parties.
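The base-rate intuition behind why a website oracle is so powerful can be illustrated with Bayes' theorem. The numbers in this sketch are hypothetical, chosen only to show the effect, and the oracle is assumed to fire independently of the classifier; this is a simplification of the analysis in the papers, not a reproduction of it:

```python
def precision(tpr, fpr, base_rate):
    # P(monitored site was visited | classifier flags a visit), via Bayes:
    # true positives divided by all positives.
    tp = tpr * base_rate
    fp = fpr * (1.0 - base_rate)
    return tp / (tp + fp)

# Hypothetical unpopular site: prior visit probability of 1-in-10,000
# traces, classifier with a 95% true positive rate and 1% false positive
# rate. Precision is terrible despite the strong classifier.
alone = precision(0.95, 0.01, 1e-4)

# Now suppose every classifier flag is cross-checked against a website
# oracle that, for an unvisited site, fires only with probability 0.001
# during the time frame (someone else on the network happened to visit).
# Flags must survive both checks, so the effective false positive rate
# becomes 0.01 * 0.001, and precision recovers.
with_oracle = precision(0.95, 0.01 * 0.001, 1e-4)
```

With these toy numbers, `alone` is below one percent while `with_oracle` exceeds ninety percent. Conversely, for frequently visited sites the oracle fires almost always, its filtering power vanishes, and false positives remain, consistent with the observation that only the most popular websites retain protection.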
+
+\section{Conclusions and Future Work} \label{sec:concl}
+
+Throughout the thesis, we contributed to the understanding of how trust
+requirements in Certificate Transparency can be reduced. Efficient and
+reliable monitoring of the logs is easily overlooked. If the broader
+ecosystem achieves monitoring through third parties, those third parties
+should be subject to the same scrutiny as logs. We proposed a solution that
+makes it hard for third-party monitors to provide subscribers with
+selective certificate notifications. We also proposed a gossip-audit model
+that plugs into interacting with the logs over DNS by having programmable
+packet processors verify that the same append-only logs are observed.
+Avoiding the addition of complicated verification logic into end-user
+software is likely a long-term win because it reduces the number of moving
+parts. In other words, simple gossip-audit models will be much easier to
+deploy in the wide variety of end-user software that embeds TLS clients.
+
+We also contributed to the understanding of how Certificate Transparency
+can be applied in the context of Tor Browser. Compared to a regular
+browser, this results in a different setting with its own challenges and
+opportunities. On the one hand, Tor Browser benefits from the ability to
+preserve privacy due to using the anonymity network Tor. On the other hand,
+data relating to website visits cannot be persisted to disk (such as signed
+certificate timestamps blocked by maximum merge delays). Our
+incrementally-deployable proposal keeps the logic in Tor Browser simple by
+offloading all Certificate Transparency verification to randomly selected
+Tor relays. The design is complete because mis-issued certificates can
+eventually reach a trusted auditor who acts on incidents. In addition to
+proposing Certificate Transparency in Tor Browser, we also explored how
+certificates with onion addresses may improve the association of domain
+names with onion addresses.
Such certificates ensure domain owners know which onion addresses can be
+discovered for their sites, much like Certificate Transparency does for
+public TLS keys. This also adds censorship resistance to the discovery
+process as logs are append-only.
+
+As part of exploring Certificate Transparency in Tor Browser, we further
+contributed to the understanding of how the protocols used during website
+visits affect unlinkability between Tor users and their destination
+websites. For example, fetching an inclusion proof from a Certificate
+Transparency log is one such protocol. We extended the attacker model of
+website fingerprinting attacks with website oracles that reveal whether any
+network user visited a website during a specific time frame. Our results
+show that website oracles eliminate most false positives for all but the
+most frequently visited websites. In addition to the theoretical evaluation
+of the extended attacker model, we could exploit (timeless) timing attacks
+in Tor's DNS cache to instantiate real-world website oracles without any
+special capabilities or reach into third parties. This led us to contribute
+to the understanding of how Tor's DNS cache performs today, including a
+proposal for a performant alternative that preloads the same popular
+domains on all Tor relays to withstand all (timeless) timing attacks.
+
+As an outlook, our angle on Certificate Transparency verification has
+mostly been \emph{reactive} for end-users. In other words, some or all
+certificate verification occurs asynchronously after a website visit. An
+alternative to this would be upfront delivery of inclusion proofs that
+reconstruct tree heads which witnesses cosigned; a form of \emph{proactive}
+gossip as proposed by Syta~\emph{et~al.}~\cite{syta}. The significant
+upside is that the browser's verification could become non-interactive,
+eliminating privacy concerns and ensuring end-users only see certificates
+merged into the append-only logs.
+
+Investigating what the blockers for such a proposal are in
+practice---today---would be valuable as log verification quickly becomes
+complicated with signed certificate timestamps and reactive gossip-audit
+models. Are these blockers significant? Do they remain significant over
+time, given that other \emph{eventual} changes, such as post-quantum secure
+certificates, will be needed anyway? New transparency log applications are
+unlikely to need the complexity of Certificate Transparency, and should
+likely not copy something that was designed to fit into an existing system
+with a large amount of legacy (such as certificate authorities, their
+established processes for certificate issuance, and the many client-server
+implementations already deployed on the Internet).
+
+Orthogonal to the verification performed by end-users, how domains (fail
+to) use Certificate Transparency for detecting mis-issued certificates
+remains largely unexplored. For example, subscribing to email notifications
+of newly issued certificates becomes less useful in an era where
+certificates are renewed frequently and automatically. Instead, domain
+owners need easy-to-use solutions that raise alarms only if there is a
+problem.
+
+Finally, the mitigation deployed to counter our (timeless) timing attacks
+in Tor's DNS cache is just that: a mitigation, not a defense, that applies
+to modestly popular websites but not the long tail where the base rate is
+low. This is because the attacker's needed website oracle time frame is so
+large that a fuzzy time-to-live value does nothing. Practical aspects of a
+preloaded DNS cache need to be explored further before deployment, such as
+the assumption of a third party that visits popular domains to assemble an
+allowlist. We may also have \emph{underestimated} the utility of the
+existing Umbrella list, which in and of itself does not require any new
+third party. Does the use of Umbrella impact page-load latency?
Latency is the most crucial parameter to minimize. The open question is
+whether skipping the website-visit step, as with the non-extended Alexa
+and Tranco lists, misses frequently looked-up domains or not.
+
+More broadly, the question of how to strike a balance between
+\emph{efficiency} and \emph{effectiveness} of website fingerprinting
+defenses is open. How much overhead in terms of added latency and/or
+bandwidth is needed? How much of that overhead is sustainable, both from a
+user perspective (where, e.g., latency is crucial for web browsing and
+other interactive activities) and a network health perspective (such as
+the amount of volunteered relay bandwidth that is wasted)? It is paramount
+to neither overestimate nor underestimate attacker capabilities, which
+goes back to the still-debated threat model of website fingerprinting
+attacks. Regardless of whether Tor's DNS cache becomes preloaded or not,
+it will be difficult to prevent DNS lookups from happening. Someone---be
+it a weak attacker like ourselves or a recursive DNS resolver at an
+Internet service provider---is in a position to narrow down the
+destination anonymity set. This is especially true when also considering
+other protocols that reveal information about the destination anonymity
+set during website visits. Accepting that sources of real-world website
+oracles are prevalent implies that \emph{the world can be closed}.
+Therefore, a closed world is more realistic than an open world.
+
+\subsection*{Acknowledgments}
+I received valuable feedback while writing the introductory summary from
+  Simone Fischer-H\"{u}bner,
+  Johan Garcia,
+  Stefan Lindskog, and
+  Tobias Pulls.
+The final draft was further improved with helpful nudges from Grammarly.
+
+\bibliographystyle{plain}
+\bibliography{src/introduction/refs}