\section{Introduction}

The security posture of the Internet has improved significantly over the
last decade.  For example,
  the cleaned-up and formally verified TLS 1.3 protocol that underpins HTTPS
    has been rolled out gradually~\cite{tls-timeline},
  the certificates that specify which public keys to use when bootstrapping a
    secure connection can be obtained for free and automatically~\cite{le}, and
  web browsers have shifted from positive to negative security indicators in
    favor of security-by-default~\cite{browser-ui}.
The use of end-to-end encryption has further become the norm with services such
as
  DNS-over-HTTPS~\cite{rfc8484},
  virtual private networks~\cite{wireguard},
  Tor~\cite{tor}, and
  secure messaging~\cite{mls}
gaining traction.  In other words, the era of attackers that can passively snoop
and actively tamper with unencrypted~network~traffic~is~over.

What will remain the same is the incentive for attackers to snoop and tamper
with network traffic.  Therefore, the focus is (and will likely continue to be)
on circumventing protocols that add security and privacy as they are deployed
in the real world.  For example, there is a long history of certificate
mis-issuance that allows attackers to impersonate websites and thus insert
themselves as machines-in-the-middle (``MitM'') without actually breaking
TLS~\cite{sok-https,sslmate-history}.  Another example, in the case of
encrypted channels that are hard to intercept, is to instead analyze traffic
patterns to infer user activity like which website is being
visited~\cite{cheng98,herrmann09,hintz02,liberatore06,panchenko11,sun02}.  The
bad news is that attackers only need to find one vulnerability in a deployed
protocol or its software.  Sometimes, such vulnerabilities can even be bought
and sold by zero-day brokers like Zerodium~\cite{zerodium}.

To address an attack vector, it is common to add countermeasures that frustrate
attackers and/or increase the risk involved while trying to exploit a system.
A good example is how the certificate authority ecosystem evolved.  For
background, certificate authorities are trusted parties that validate domain
names before issuing certificates that list their public keys.  Web browsers
are shipped with hundreds of trusted certificate authorities, which means that
the resulting TLS connections cannot be more secure than the difficulty of
hijacking the weakest-link certificate authority~\cite{sok-https}.  A proposal
eventually deployed to mitigate this issue is Certificate Transparency: an
ecosystem of public append-only logs that publishes all certificates so that
any mis-issuance can be \emph{detected} by monitors~\cite{ct,rfc6962}.
These logs have a cryptographic foundation that holds them and the issuing
certificate authorities \emph{accountable}, at least in theory.  In practice,
the logs are essentially trusted parties that must act honestly due to how web
browsers shape their policies to respect user
privacy~\cite{apple-log-policy,google-log-policy,sok-sct-auditing,ct-history}.

The first objective of this thesis is to better understand the current limits
of Certificate Transparency by proposing and evaluating improvements which
\emph{reduce} the amount of trust that needs to be placed in third-party
monitors and logs.  We make a dent in the problem of Certificate Transparency
verification both generally and concretely in the context of Tor Browser, which
unlike Google Chrome and Apple's Safari does not support Certificate
Transparency yet.  For context, Tor Browser is a fork of
Mozilla's Firefox that (among other things) routes user traffic through the
low-latency anonymity network Tor~\cite{tor,tb}.  As part of our pursuit to
improve the status quo for Certificate Transparency verification in Tor
Browser, the second objective of this thesis is to evaluate how the protocols
used during website visits affect unlinkability between senders (web browsers)
and receivers (websites).  Our evaluation applies to our addition of
Certificate Transparency and other protocols already in use, e.g.,
  DNS,
  real-time bidding~\cite{rtb}, and
  certificate revocation checking~\cite{ocsp}.

The remainder of the introductory summary is structured as follows.
Section~\ref{sec:background} introduces background that will help the reader
  understand the context and preliminaries of the appended papers.
Section~\ref{sec:rqs} defines our research questions and overall objective.
Section~\ref{sec:methods} provides an overview of our research methods.
Section~\ref{sec:contribs} describes our contributions succinctly.
Section~\ref{sec:appended} summarizes the appended papers that are published in
  NordSec (Paper~\ref{paper:lwm}),
  SECURWARE (Paper~\ref{paper:ctga}),
  PETS (Papers~\ref{paper:ctor} and~\ref{paper:cat}),
  WPES (Paper~\ref{paper:sauteed}), and
  USENIX Security (Paper~\ref{paper:tlwo}).
Section~\ref{sec:related} positions our contributions with regard to related
work.  Section~\ref{sec:concl} concludes and briefly discusses future work.

\section{Background} \label{sec:background}

This section introduces background on Certificate Transparency and Tor.

\subsection{Certificate Transparency}

The web's public-key infrastructure depends on certificate authorities to issue
certificates that map domain names to public keys.  For example, the
certificate of \texttt{www.example.com} is issued by DigiCert and lists a
2048-bit RSA key~\cite{crt:www.example.com}.  The fact that DigiCert signed
this certificate means that they claim to have verified that the requesting
party is really \texttt{www.example.com}, typically by first checking that the
requesting party can publish a specified DNS record~\cite{ca/b}.  If all
certificate authorities performed these checks correctly and the checks
themselves were foolproof, a user's browser could be sure that any
certificate signed by a certificate authority would list a verified public key
that can be used for authentication when connecting to a website via TLS.
Unfortunately, there are hundreds of trusted certificate authorities and a long
history of issues surrounding their operations in
practice~\cite{bambo-cas,sok-https,sslmate-history}.  One of the most famous
incidents took place in 2011: an attacker managed to mis-issue certificates from
DigiNotar to intercept traffic towards Google and others in
Iran~\cite{black-tulip}.  The astonishing part is that this incident was first
detected \emph{seven weeks later}.
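
To make the domain validation step described above concrete, the following
Python sketch captures the general pattern of a DNS-based challenge (the
record label and helper names are hypothetical illustrations, not the exact
procedure mandated by~\cite{ca/b}):

\begin{verbatim}
import secrets

def issue_challenge(domain):
    # The certificate authority generates an unpredictable token and
    # asks the requesting party to publish it in a TXT record.
    token = secrets.token_urlsafe(32)
    return ("_validation-challenge." + domain, token)

def validated(lookup_txt, record_name, token):
    # lookup_txt is a resolver callback; validation succeeds only if
    # the token appears under the agreed-upon record name.
    return token in lookup_txt(record_name)
\end{verbatim}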

Certificate Transparency aims to facilitate \emph{detection} of issued
certificates, thus holding certificate authorities \emph{accountable} for any
certificates that they mis-issue~\cite{ct,rfc6962}.  The basic idea is shown in
Figure~\ref{fig:ct-idea}.  In addition to regular validation rules, browsers
ensure certificates are included in a public append-only Certificate
Transparency \emph{log}.
This allows anyone to get a concise view of all certificates that users
may encounter, including domain owners like Google who can then see for
themselves whether any of the published certificates are mis-issued.  The
parties inspecting the logs are called \emph{monitors}.  Some monitors mirror
all log entries~\cite{crt.sh}, while others discard most of them in pursuit of
finding matches for pre-defined criteria like
\texttt{*.example.com}~\cite{certspotter}.  Another option is subscribing to
certificate notifications from a trusted third-party~\cite{ct-monitors}.

\begin{figure}[!t]
  \centering\includegraphics[width=0.8\textwidth]{src/introduction/img/ct}
  \caption{%
    The idea of Certificate Transparency.  Certificates encountered by users
    must be included in a public log so that monitors can detect mis-issuance.
  }
  \label{fig:ct-idea}
\end{figure}

What makes Certificate Transparency a significant improvement compared to the
certificate authority ecosystem is that the logs stand on a cryptographic
foundation that can be verified.  A log can be viewed as an append-only
tamper-evident list of certificates.  It is efficient\footnote{%
  Efficient refers to space-time complexity $\mathcal{O}(\log n)$,
  where $n$ is the number of log entries.
} to prove cryptographically that a certificate is in the list, and that a
current version of the list is append-only with regard to a previous version
(i.e., no tampering or reordering).\footnote{%
  Interested readers can refer to our Merkle tree and proof technique
  introduction online~\cite{merkle-intro}.
} These properties follow from using a Merkle tree structure that supports
\emph{inclusion} and \emph{consistency}
proofs~\cite{history-trees,ct-formal,rfc6962,merkle}.
The reader
only needs to know that these proofs are used to reconstruct a log's Merkle tree
head, often referred to as a \emph{root hash}: a cryptographic hash that
uniquely identifies a list of certificates in a tree data structure.  The logs
sign root hashes together with the number of entries and a timestamp to form
\emph{signed tree heads}.  Thus, if an inconsistency is discovered, it cannot
be denied.  Log
operators are therefore held accountable for maintaining the append-only
property.  A party that verifies the efficient transparency log proofs without
downloading all the logs is called an \emph{auditor}.
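
For intuition, the following Python sketch recomputes a root hash from a leaf
and its inclusion proof, following the verification algorithm described in
RFC~9162~\cite{rfc9162} (a sketch for illustration, not the code used in the
appended papers):

\begin{verbatim}
import hashlib

def H(data):
    return hashlib.sha256(data).digest()

def verify_inclusion(leaf, index, size, proof, root):
    # Recompute the root hash from a leaf and its audit path; leaves
    # and interior nodes are domain-separated with 0x00 and 0x01.
    if index >= size:
        return False
    fn, sn = index, size - 1
    r = H(b"\x00" + leaf)
    for p in proof:
        if sn == 0:
            return False
        if fn & 1 or fn == sn:
            r = H(b"\x01" + p + r)
            if not fn & 1:
                while fn & 1 == 0 and fn != 0:
                    fn, sn = fn >> 1, sn >> 1
        else:
            r = H(b"\x01" + r + p)
        fn, sn = fn >> 1, sn >> 1
    return sn == 0 and r == root
\end{verbatim}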

A log that signs two inconsistent tree heads is said to perform a
\emph{split-view}.  To ensure that everyone observes the same append-only logs,
all participants of the Certificate Transparency ecosystem must engage in a
\emph{gossip protocol}~\cite{chuat,nordberg}.  In other words, just because
Alice observes an append-only log, it is not necessarily the \emph{same
append-only log} that Bob observes.  Therefore, Alice and Bob must exchange
signed tree heads and verify consistency to assert that the log operators play
by the rules and only append certificates.  Without a secure gossip protocol,
log operators would have to be trusted blindly (much like certificate
authorities before Certificate Transparency).  RFC~6962 defers the
specification of gossip~\cite{rfc6962}; little or no meaningful gossip has
been deployed yet.
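
A minimal sketch of the comparison step in such gossip follows (Python;
signature verification is assumed to have happened already, and
\texttt{consistent} holds the result of verifying a consistency proof between
the two tree sizes):

\begin{verbatim}
def split_view(size_a, root_a, size_b, root_b, consistent):
    # Two signed tree heads from the same log: equal tree sizes must
    # have equal root hashes, and the larger tree must extend the
    # smaller one (as shown by a valid consistency proof).
    if size_a == size_b:
        return root_a != root_b
    return not consistent
\end{verbatim}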

Rolling out Certificate Transparency without breakage on the web is a
challenge~\cite{does-ct-break-the-web}.  Certificates must be logged, associated
proofs delivered to end-user software, and more.  One solution RFC~6962
ultimately put forth was the introduction of \emph{signed certificate
timestamps}.  A signed certificate timestamp is a log's \emph{promise} that a
certificate will be appended to the log within a \emph{maximum merge delay}
(typically 24 hours).  Verifying whether a log holds its promise is
usually called \emph{auditing}.
Certificate authorities can obtain signed certificate
timestamps and embed them in their final certificates by logging a
\emph{pre-certificate}.  As such, there is no added latency from building the
underlying Merkle tree and no need for server software to be updated (as the
final certificate contains the information needed).  The current policy for
Google Chrome and Apple's Safari is to reject certificates with fewer than two
signed certificate timestamps~\cite{apple-log-policy,google-log-policy}.  How to
request an inclusion proof for a promise without leaking the user's browsing
history to the log is an open problem~\cite{sok-sct-auditing}.  In other words,
asking for an inclusion proof trivially reveals the certificate of interest to
the log.
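
As a small illustration of what auditing entails, the sketch below computes
when a log's promise falls due (Python; the 24-hour delay is the typical
policy value mentioned above):

\begin{verbatim}
from datetime import datetime, timedelta, timezone

MAX_MERGE_DELAY = timedelta(hours=24)

def must_prove_inclusion(sct_timestamp_ms):
    # An SCT's timestamp plus the maximum merge delay is the point in
    # time after which the log must be able to prove inclusion.
    issued = datetime.fromtimestamp(sct_timestamp_ms / 1000,
                                    tz=timezone.utc)
    return datetime.now(timezone.utc) > issued + MAX_MERGE_DELAY
\end{verbatim}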

Other than being embedded in certificates, signed certificate timestamps can
be delivered dynamically to end-users in TLS extensions and stapled
certificate status responses.  For example, Cloudflare uses the TLS extension
delivery
method to recover from log incidents without their customers needing to acquire
new certificates~\cite{cloudflare-scts}.  Several log incidents have already
happened in the past, ranging from
split-views~\cite{trustasia-err,izenpe-err,venafi-err} to broken promises of
timely logging~\cite{wosign-err,google-err,digicert-err,starcom-err} and
potential key compromise~\cite{digicert-kc}.  These are all good \emph{scares}
that motivate completing the deployment of Certificate Transparency in
practice.

In summary, the status quo is for web browsers to require at least two signed
certificate timestamps before accepting a certificate as valid.  Merkle tree
proofs are not verified.  Gossip is not deployed.  The lack of a reliable
\emph{gossip-audit model} means that the logs are largely trusted
parties.\footnote{%
  Historical remark: the lack of verification led Google to require that all
  certificates be disclosed in at least one of their logs to be considered
  valid~\cite{ct-history}.  The so-called \emph{one-Google log requirement}
  was recently replaced.  Google Chrome instead interacts with Google's trusted
  auditor.  See Section~\ref{sec:related}.
} We defer discussion of related work in the area of gossip-audit models until
Section~\ref{sec:related}.

\subsection{Tor}

The Tor Project is a 501(c)(3) US nonprofit that advances human rights and
defends privacy online through free software and open networks~\cite{tpo}.  Some
of the maintained and developed components include Tor Browser and Tor's relay
software.  Thousands of volunteers operate relays as part of the Tor network,
which routes the traffic of millions of daily users with low
latency~\cite{mani}.  This frustrates attackers like Internet service providers
that may try linking \emph{who is communicating with whom} from their local
(non-global) vantage points~\cite{tor}.

Usage of Tor involves tunneling the TCP traffic of different destinations (such
as all flows associated with a website visit to \texttt{example.com}) in
fixed-size \emph{cells} on independent \emph{circuits}.  A circuit is built
through a guard, a middle, and an exit relay.  At each hop of the circuit, one
layer of symmetric encryption is peeled off.  The keys used are ephemeral and
discarded together with all other circuit state after at most 10 minutes (the
maximum circuit lifetime).  This setup allows guard relays to observe users' IP
addresses but none of the destination traffic.  In contrast, exit relays can
observe destination traffic but no user IP addresses.  The relays used in a
circuit are determined by Tor's end-user software.  Such path selection
is randomized and bandwidth-weighted but starts with a largely static guard set
to protect users from \emph{eventually} entering the network from a relay an
attacker volunteered to run.
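
Conceptually, the layered encryption looks as follows (Python sketch; real
circuits use per-hop symmetric keys negotiated during circuit construction,
with integrity checks on fixed-size cells):

\begin{verbatim}
def wrap(payload, encrypt_for_hop):
    # encrypt_for_hop = [guard, middle, exit]: apply the exit's layer
    # first so that it ends up innermost.
    for encrypt in reversed(encrypt_for_hop):
        payload = encrypt(payload)
    return payload

def peel(cell, decrypt_one_layer):
    # Each relay removes exactly one layer and forwards the rest;
    # only the exit recovers the destination traffic.
    return decrypt_one_layer(cell)
\end{verbatim}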

Tor's \emph{consensus} lists the relays that make up the network.  As the name
suggests, it is a document agreed upon by a majority of trusted \emph{directory
authorities}.  Five votes are currently needed to reach a consensus.  Examples
of information added to the Tor consensus include tunable network parameters and
uploaded relay descriptors with relevant metadata, e.g., public key, available
bandwidth, and exit policy.  Each relay in the consensus is also assigned
different flags based on its configuration and observed performance, e.g.,
\texttt{Guard}, \texttt{MiddleOnly}, \texttt{Fast}, \texttt{Stable}, and
\texttt{HSDir}.  The latter means that the relay is a \emph{hidden service
directory}, which refers to being part of a distributed hash table that helps
users look up \emph{onion service introduction points}.

An onion service is a self-authenticated server identified by its public key.
Onion services are only reachable through the Tor network.  Users that are aware
of a server's \emph{onion address} can consult the distributed hash table to
find its introduction points.  To establish a connection, a user builds a
circuit to a \emph{rendezvous point}.  A request is then sent to one of the
current introduction points, which informs the onion service that it may build
its own circuit to meet the user at their rendezvous point.  In total, six
relays are traversed while interacting with an onion service.  This setup allows
not only the sender but also the receiver to be anonymous.  The receiver also
benefits from a large degree of censorship resistance as the server location may
be hidden.  The main drawback of onion services is that their non-mnemonic names
are hard to discover and remember.  Some sites try to overcome this by setting
their onion addresses in \emph{onion location} HTTP headers or HTML
attributes~\cite{onion-location}.
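
For example, a site that is reachable at both a registered domain and an onion
address can announce the latter roughly as follows (Python sketch with a
placeholder address; see~\cite{onion-location} for the exact semantics):

\begin{verbatim}
# Placeholder: real v3 onion addresses are 56 base32 characters.
ONION_URL = "http://" + "x" * 56 + ".onion/"

def add_onion_location(headers):
    # Tor Browser can offer to redirect when this header is present
    # on an HTTPS response.
    headers["Onion-Location"] = ONION_URL
    return headers
\end{verbatim}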

Many users connect to the Tor network with Tor Browser.  In addition to
routing traffic as described above, Tor Browser ships with privacy-preserving
features like first-party isolation to not share any state across
different origins, settings that frustrate browser fingerprinting, and
\emph{disk-avoidance} to not store browsing-related history and other
identifying information on disk~\cite{tb}.  Tor Browser is a fork of Mozilla's
Firefox.  Unfortunately, neither Firefox nor Tor Browser supports any form of
Certificate Transparency.  Conducting undetected machine-in-the-middle attacks
against Tor users is thus relatively straightforward: compromise or coerce the
weakest-link certificate authority, then volunteer to operate an exit relay and
intercept network traffic.  Such interception has previously been found with
self-signed certificates~\cite{spoiled-onions}.

While global attackers are not within Tor's threat model, it is in scope to
guard against various local attacks~\cite{tor}.  For example, the intended
attacker may passively observe a small fraction of the network and actively
inject their own packets.  Figure~\ref{fig:wf} shows the typical attacker
setting of \emph{website fingerprinting}, where the attacker observes a user's
entry traffic with the goal of inferring which website was visited solely based
on analyzing encrypted
traffic~\cite{cheng98,herrmann09,hintz02,liberatore06,panchenko11,sun02}.
Website fingerprinting attacks are evaluated in the \emph{open-world} or
\emph{closed-world} settings.  In the closed-world setting, the attacker
monitors (not to be confused with Certificate Transparency monitoring) a fixed
list of websites.  A user visits one of the monitored sites, and the attacker
needs to determine which one.  The open-world setting is the same as the
closed-world setting, except that the user may also visit unmonitored sites.
The practicality of website fingerprinting attacks is up for debate, with
concerns ranging from challenges in handling false positives to
machine-learning dataset drift~\cite{onlinewf,juarez14,perryCrit,realistic}.
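
To see why false positives are so punishing in the open-world setting,
consider a standard base-rate computation (our illustration, not taken from
the cited works).  Let $\alpha$ be the attacker's true positive rate, $\beta$
the false positive rate, and $\pi$ the prior probability that a given page
load goes to a monitored site.  The probability that an alarm is correct is
\[
  \Pr[\text{monitored} \mid \text{alarm}]
  = \frac{\alpha \pi}{\alpha \pi + \beta (1 - \pi)}.
\]
For $\alpha = 0.95$, $\beta = 0.01$, and $\pi = 0.001$, fewer than $9\%$ of
all alarms are correct.  Narrowing down the anonymity set (e.g., with a
website oracle, as in Paper~\ref{paper:cat}) attacks the $\beta(1-\pi)$ term
directly.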

\begin{figure}[!t]
  \centering\includegraphics[width=0.85\textwidth]{src/cat/img/setting}
  \caption{
    The setting of a website fingerprinting attack.  A local passive attacker
    analyzes a user's encrypted network traffic as it enters the network.  The
    goal is to infer which website is visited.  (Figure reprinted from
    Paper~\ref{paper:cat}.)
  }
  \label{fig:wf}
\end{figure}

In summary, Tor is a low-latency anonymity network often accessed with Tor
Browser.  Among the threats that Tor aims to protect against are local attackers
that see traffic as it enters or leaves the network (but not both ends at all
times).  A website fingerprinting attack is an example of a passive
attack that operates on entry traffic.  A machine-in-the-middle attack is an
example of an active attack that typically operates on exit traffic.  Discussion
of related work in the area of website fingerprinting is deferred until
Section~\ref{sec:related}.

\section{Research Questions} \label{sec:rqs}

The overall research objective spans two different areas: transparency logs and
low-latency anonymity networks.  We aim to reduce trust assumptions in
transparency log solutions and to apply such solutions in anonymous settings for
improved security and privacy.  We defined the following research questions to
make this concrete in Certificate Transparency and Tor, the two ecosystems with
the most history and dominant positions in their respective areas.

\begin{researchquestions}
  \item[Can trust requirements in Certificate Transparency be reduced in
    practice?]

  Transparency logs have a cryptographic foundation that supports efficient
  verification of inclusion and consistency proofs.  Such proofs are useful to
  reduce the amount of trust that is placed in the logs.  The roll-out of
  Certificate Transparency has yet to make use of these proofs or to employ a
  gossip protocol that ensures the same append-only logs are observed.  Part of
  the challenge relates to privacy concerns as parties interact with each other,
  as well as deploying gradually without breakage.

  We seek practical solutions that reduce the trust requirements currently
  placed in the logs and third-party monitors while preserving user privacy.

  \item[How can authentication of websites be improved in the context of Tor?]

  Tor Browser has yet to support Certificate Transparency to facilitate
  detection of hijacked websites.  This includes HTTPS sites but also onion
  services that may be easier to discover reliably with more transparency.

  We seek incremental uses of Certificate Transparency in Tor that preserve user
  privacy while engaging in new verification protocols to reduce trust.

  \item[How do the protocols used during website visits affect
    unlinkability between Tor users and their destination websites?]

  Several third parties become aware of a user's browsing activities while a
  website is visited.  For example, DNS resolvers and certificate status
  responders may be consulted for domain name resolution and for checking
  whether a certificate has been revoked.  Fetching an inclusion proof from a
  Certificate Transparency log would reveal the same type of information.

  We seek to explore how unlinkability between Tor users and their exit
  destinations is affected by the multitude of protocols used during website
  visits.  The considered setting is the same as in website fingerprinting,
  except that the attacker may take additional passive and active measures.
  For example, the attacker may volunteer to run a Certificate Transparency log
  (passive) or inject carefully-crafted packets into Tor~(active).
\end{researchquestions}

\section{Research Methods} \label{sec:methods}

We tackle privacy and security problems in the field of computer
science~\cite{icss,smics}.  Our work is applied, following the scientific method
for security and experimental networking research.  \emph{Exactly} what it means
to use the scientific method in these areas is up for debate~\cite{rfenr,sse}.
However, at a glance, it is about forming precise and consistent theories with
falsifiable predictions \emph{as in other sciences}, except that the objects of
study are \emph{information systems in the real world}.

A prerequisite to formulating precise, consistent, and falsifiable theories is
that there are few implicit assumptions.  Therefore, scientific security
research should be accompanied by definitions of security goals and attacker
capabilities: what does it mean that the system is secure, and what is the
attacker (not) allowed to do while attacking it~\cite{secdefs}?  Being explicit about the
overall \emph{setting} and \emph{threat model} is prevalent in formal security
work like cryptography, where an abstract (mathematical) model is used to show
that security can be \emph{proved} by reducing to a computationally hard problem
(like integer factorization) or a property of some primitive (like the collision
resistance of a hash function)~\cite{provsec}.  It is nevertheless just as crucial in less
formal work that deals with security of systems in the real (natural)
world---the exclusive setting of the scientific method---which usually lends
itself to break-and-fix cycles in light of new observations.  Where to draw
the line between \emph{security work} and \emph{security research} is not
trivial.  However, a few common \emph{failures} of past ``security research''
include not bringing observations into contact with theory, not making claims
and assumptions explicit, and relying on unfalsifiable claims~\cite{sse}.

While deductive approaches (like formal reduction proofs) are instrumental in
managing complexity and gaining confidence in different models, more than
these approaches is required, as a model's \emph{instantiation} must also be
secure~\cite{secdefs}.  It is common to complement abstract modeling with
\emph{real-world measurements} as well as \emph{systems prototyping and
evaluations}~\cite{rfenr}.  Real-world measurements capture properties of
deployed systems like the Internet, the web,
and the Tor network.  For example, a hypothesis in a real-world measurement
could be that (non-)Tor users browse according to the same website popularity
distribution.  Sometimes these measurements involve the use of research
prototypes, or the research prototypes themselves become the objects of study to
investigate properties of selected system parts (say, whether a packet processor
with new features is indistinguishable from some baseline as active network
attackers adaptively inject packets of their choosing).  If it is infeasible,
expensive, or unsafe (see below) to study a real-world system, a simulation may
be studied instead.  The downside of simulation is that the model used may not
be a good approximation of the natural world, similar to formal
cryptographic~modeling.

The appended papers use all of the above approaches to make claims about
security, privacy, and performance in different systems, sometimes with regard
to an abstract model that can be used as a foundation in the natural world to
manage complexity.  Paper~\ref{paper:lwm} contains a reduction proof sketch to
show reliance on standard cryptographic assumptions.  Paper~\ref{paper:cat}
extends past simulation setups to show the impact of an added attacker
capability.  Meanwhile, Paper~\ref{paper:ctor} models part of the Tor network
with mathematical formulas to estimate performance overhead.  All but
Paper~\ref{paper:sauteed} contain real-world measurements relating to Internet
infrastructure, websites, certificates, Tor, or practical deployability of our
proposals.  All but Paper~\ref{paper:ctor} contain research prototypes with
associated evaluations, e.g., performance profiling, as well as corroborating or
refuting our security definitions in experimental settings.  All papers include
discussions of security and privacy properties as well as their limitations and
strengths in the chosen settings (where assumptions are explicit and threat
models motivated).

Throughout our experiments, we strove to follow best practices like
documenting the setups used, making datasets and associated tooling available,
reducing
potential biases by performing repeated measurements from multiple different
vantage points, and discussing potential biases (or lack thereof)~\cite{rfenr}.
We also interacted with Tor's research safety board~\cite{trsb} to discuss the
ethics and safety of our measurements in Paper~\ref{paper:tlwo}, and refrained
from measuring real (i.e., non-synthetic) usage of Tor whenever possible
(Papers~\ref{paper:ctor} and~\ref{paper:cat}).  Finally, the uncovered bugs and
vulnerabilities in Papers~\ref{paper:cat}--\ref{paper:tlwo} were responsibly
disclosed to the Tor Project.  This included suggestions on how to move forward.

\section{Contributions} \label{sec:contribs}

The main contributions of this thesis are listed below.  An overview of how they
relate to our research questions and appended papers is shown in
Figure~\ref{fig:contrib}.

\begin{figure}[!t]
  \centering
  \includegraphics[width=0.83\textwidth]{src/introduction/img/contribs}
  \caption{%
    Overview of appended papers, contributions, and research questions.
  }
  \label{fig:contrib}
\end{figure}

\vspace{1cm}
\begin{contributions}
  \item[Reduced trust in third-party monitoring with a signed tree head
    extension that shifts trust from non-cryptographic certificate notifications
    to a log's gossip-audit model (or, if such a model does not exist yet, the
    logs themselves).]

    Paper~\ref{paper:lwm} applies existing cryptographic techniques for
    constructing static and lexicographically ordered Merkle trees so that
    certificates can be wild-card filtered on subject alternative names with
    (non-)membership proofs.  This building block is evaluated in the context of
    Certificate Transparency, including a security sketch and performance
    benchmarks.

  \item[Increased probability of split-view detection by proposing gossip
    protocols that disseminate signed tree heads without bidirectional
    communication.]

    Paper~\ref{paper:ctga} explores aggregation of signed tree heads at line
    speed in programmable packet processors, facilitating consistency proof
    verification on the level of an entire autonomous system.  Such verification
    can be made indistinguishable from an autonomous system without any
    split-view detection, thereby achieving herd immunity, i.e., protection
    without aggregation.  Aggregation at 32 autonomous systems can protect
    30--50\% of the IPv4 space.
    Paper~\ref{paper:ctor} explores signed tree heads in Tor's consensus.  To
    reliably perform an undetected split-view against log clients that have Tor
    in their trust root, a log must collude with a majority of directory
    authorities.

  \item[Improved detectability of website hijacks targeting Tor Browser by
    proposing privacy-preserving and gradual roll-outs of Certificate
    Transparency in Tor.]

    Paper~\ref{paper:ctor} explores adoption of Certificate Transparency in Tor
    Browser with signed certificate timestamps as a starting point, then
    leveraging the decentralized network of relays to cross-log certificates
    before ultimately verifying inclusion proofs against a single view in Tor's
    consensus.  The design is probabilistically secure with tunable parameters
    that result in modest overheads.  Paper~\ref{paper:sauteed} shows that
    Certificate Transparency logging of domain names with associated onion
    addresses helps provide forward censorship-resistance and detection of
    unwanted onion associations.

  \item[An extension of the attacker model for website fingerprinting that
    provides attackers with the capability of querying a website oracle.]

    A website oracle reveals whether a monitored website was (not) visited by
    any network user during a specific time frame.  Paper~\ref{paper:cat}
    defines and simulates website fingerprinting attacks with website oracles,
    showing that most false positives can be eliminated for all but the most
    frequently visited websites.  A dozen sources of real-world website oracles
    follow from the protocols used during website visits.  We enumerate and
    classify those sources based on ease of accessibility, reliability, and
    coverage.  The overall analysis includes several Internet measurements.

  \item[Remotely exploitable probing attacks on Tor's DNS cache that instantiate
    a real-world website oracle without any special attacker capabilities or reach.]

    Paper~\ref{paper:cat} shows that timing differences in end-to-end response
    times can be measured to determine whether a domain name is (not) cached by
    a Tor relay.  An estimated true positive rate of 17.3\% can be achieved
    while trying to minimize false positives.  Paper~\ref{paper:tlwo} improves
    the attack by exploiting timeless timing differences that depend on
    concurrent processing.  The improved attack has no false positives or false
    negatives.  Our proposed bug fixes and mitigations have been merged in Tor.

  \item[A complete redesign of Tor's DNS cache that defends against all (timeless)
    timing attacks while retaining or improving performance compared~to~today.]

    Paper~\ref{paper:tlwo} suggests that Tor's DNS cache should share only
    preloaded domain names across different circuits, removing the remotely
    probeable state that reveals information about past exit traffic.  A
    network measurement with real-world Tor relays shows which popularity lists
    are good approximations of Tor usage and, thus, appropriate to preload.
    Cache-hit ratios can be retained or improved compared to today's Tor.
\end{contributions}

\section{Summary of Appended Papers} \label{sec:appended}

The appended papers and their contexts are summarized below.  Notably, all
papers are in publication-date order except that Paper~\ref{paper:cat} predates
Papers~\ref{paper:ctor}--\ref{paper:sauteed}.

{
  \hypersetup{linkcolor=black}
  \listofsummaries
}

\section{Related Work} \label{sec:related}

This section positions the appended papers with regard to related work.  For
Certificate Transparency, this includes approaches towards signed certificate
timestamp verification, gossip, and the problem of monitoring the logs.  The
related work with regard to Tor is focused on the practicality of website
fingerprinting attacks and prior use of side-channels (such as timing attacks).

\subsection{Certificate Transparency Verification}

Approaches that fetch inclusion proofs have in common that they should preserve
privacy by not revealing the link between users and visited websites.
Eskandarian~\emph{et~al.}\ mention that Tor could be used to overcome privacy
concerns; however, this comes at the cost of added infrastructure
requirements~\cite{eskandarian}.  Lueks and Goldberg~\cite{lueks} and
Kales~\emph{et~al.}~\cite{kales} suggest that logs could provide inclusion
proofs using multi-server private information retrieval.  This requires a
non-collusion assumption while also adding significant overhead.
Laurie suggests that users can fetch inclusion proofs via DNS as their
resolvers already learn the destination sites~\cite{ct-over-dns}.  While surveying
signed certificate timestamp auditing,
Meiklejohn~\emph{et~al.}\ point out that Certificate Transparency over DNS may
have privacy limitations~\cite{sok-sct-auditing}.  For example, the times of
domain lookups and inclusion proof queries are detached.
Paper~\ref{paper:ctga} uses Laurie's approach as a premise while
proposing a gossip protocol.
Paper~\ref{paper:ctor} applies Certificate Transparency in a context
where Tor is not additional infrastructure~(Tor~Browser).

Several proposals try to avoid inclusion proof fetching altogether.
Dirksen~\emph{et~al.}\ suggest that all logs could be involved in the issuance of
a signed certificate timestamp~\cite{dirksen}.  This makes it harder to violate
maximum merge delays without detection but involves relatively large changes to
log deployments.  Nordberg~\emph{et~al.}\ suggest that signed certificate
timestamps can be handed back to the origin web servers on subsequent
revisits~\cite{nordberg}, which has the downside of assuming that
machine-in-the-middle attacks eventually cease so that detection can happen.
Nordberg~\emph{et~al.}~\cite{nordberg} as well as Chase and
Meiklejohn~\cite{chase} suggest that some clients/users could collaborate with a
trusted auditor.  Stark and Thompson describe how users can opt in to using
Google as a trusted auditor~\cite{opt-in-sct-auditing}.  This approach was recently
replaced by opt-out auditing that cross-checks a fraction of signed
certificate timestamps with Google using
k-anonymity~\cite{opt-out-sct-auditing}.  Henzinger~\emph{et~al.}\ show how such
k-anonymity can be replaced with a single-server private information retrieval
setup that approaches the performance of prior multi-server
solutions~\cite{henzinger}.  Neither of the latter two proposals provides a solution
for privately reporting that a log may have violated its maximum merge delay
because the trusted auditor is assumed to know about all signed certificate
timestamps.  Eskandarian~\emph{et~al.}\ show how to privately prove that a log
omitted a certificate~\cite{eskandarian}.  However, they use an invalid
assumption about today's logs being in strict timestamp
order~\cite{sok-sct-auditing}.  Paper~\ref{paper:ctor} suggests that Tor Browser
could submit a fraction of signed certificate timestamps to randomly selected
Tor relays.  These relays perform further auditing on Tor Browser's behalf: much
like a trusted auditor, except that no single entity runs it.

Merkle trees fix log content---not promises of logging.  Therefore,
inclusion proof fetching by users or their trusted parties must be accompanied
by consistency verification and gossip to get a complete gossip-audit
model~\cite{rfc6962}.  Chuat~\emph{et~al.}\ suggest that users and web servers
can pool signed tree heads, gossiping about them as they interact~\cite{chuat}.
Nordberg~\emph{et~al.}\ similarly suggest that users can pollinate signed tree
heads as they visit different web servers~\cite{nordberg}.  Hof and Carle
suggest that signed tree heads could be cross-logged to make all logs
intertwined~\cite{hof}.  Gunn~\emph{et~al.}\ suggest multi-path fetching of
signed tree heads~\cite{gunn}, which may make persistent split-views hard
depending on the multi-paths used.  Syta~\emph{et~al.}\ suggest that independent
witnesses could cosign the logs using threshold signatures~\cite{syta}.
Smaller-scale versions of witness cosigning have received attention in
industry~\cite{sigsum-witness,trustfabric-arxiv}, and more generally in other
types of transparency logs as well~\cite{parakeet}.  Larger browser vendors could
decide to push the same signed tree heads to their users, as proposed by Sleevi
and Messeri~\cite{sth-push}.  Paper~\ref{paper:ctga} uses the operators of
network vantage points for aggregating and verifying signed tree heads to
provide their users with gossip-as-a-service, however assuming plaintext
DNS traffic and a sound signed tree head frequency as defined by
Nordberg~\emph{et~al.}~\cite{nordberg}.  We used the multi-path assumptions of
Gunn~\emph{et~al.}\ to break out of local vantage points.  In contrast,
Paper~\ref{paper:ctor} ensures that the same logs are observed in the Tor
network by incorporating signed tree heads into Tor's consensus (thus making
directory authorities into witnesses).

Li~\emph{et~al.}\ argue that it would be too costly for most domains to run a
monitor~\cite{li}.\footnote{%
  Whether the third-party monitors in this study misbehaved or not can be
  questioned~\cite{ayer-on-li}.
} Similar arguments have been raised before and have led to alternative data
structures that could make monitoring more efficient than today's
overhead~\cite{vds,coniks,tomescu}.  Paper~\ref{paper:lwm} falls into this
category, as the root of an additional static lexicographically-ordered Merkle
tree is added to a log's signed tree heads to encode batches of included
certificates.  The downside is that a non-deployed signed tree head extension is
assumed~\cite{rfc9162}, as well as a tree head frequency similar to those
described by Nordberg~\emph{et~al.}~\cite{nordberg} to get efficiency in
practice.

Paper~\ref{paper:sauteed} uses a Mozilla Firefox web extension to verify
embedded signed certificate timestamps in Tor Browser.  Such verification is
similar to the gradual deployments of Certificate Transparency in other
browsers~\cite{ct-history,does-ct-break-the-web}, and the starting point to
improve upon in Papers~\ref{paper:ctga}--\ref{paper:ctor}.  Moreover, the use of
Certificate Transparency to associate human-meaningful domain names with
non-mnemonic onion addresses (as in Paper~\ref{paper:sauteed}) is one of many
proposals for alternative naming systems and onion search
solutions~\cite{kadianakis,muffet-onions,nurmi,onion-location,h-e-securedrop,onio-ns}.

\subsection{Website Fingerprinting and Side-Channels}

Several researchers outline how past website fingerprinting attacks have been
evaluated in unrealistic conditions~\cite{juarez14,perryCrit,realistic}.
This includes not accounting for the size of the open-world setting, failing to
keep false positive rates low enough to be useful, assuming that homepages are
browsed one at a time, overlooking dataset drift, and training classifiers on
synthetic network traces.  While some of these challenges were
addressed~\cite{onlinewf,realistic}, the question of how to deal with false
positives remains open.  Papers~\ref{paper:cat}--\ref{paper:tlwo} make a
significant dent in this problem by providing evidence that the website
fingerprinting attacker model could be made \emph{stronger} to capture
\emph{realistic real-world capabilities} that eliminate most false positives
around the Alexa top-10k and the long tail~of~unpopular~sites.

Others have evaluated traffic analysis attacks against Tor beyond the website
fingerprinting setting.  On one side of the spectrum are end-to-end
correlation/confirmation attacks that typically consider a global passive
attacker that observes all network
traffic~\cite{johnson13,nasr18,oh22,rimmer22}.  Such strong attackers are not
within the scope of Tor~\cite{tor}.  On the other side of the spectrum are local
attackers that see a small fraction of the network, typically in a position to
observe a user's encrypted entry traffic (Figure~\ref{fig:wf}).  Many have
studied those \emph{weak attacks} in lab settings where, e.g., advances in deep
learning improved the accuracy significantly~\cite{wfdef,tiktok,df}.  Others
have focused on improved attacks that are \emph{active} in the Tor network from
their own local vantage points~\cite{chakravarty10,mittal11,murdoch05}, which is
similar to the techniques in Papers~\ref{paper:cat}--\ref{paper:tlwo}.
Greschbach~\emph{et~al.}\ show that an attacker who gains access to (or traffic
to~\cite{siby20}) commonly used DNS resolvers like Google's \texttt{8.8.8.8}
gets valuable information to improve both end-to-end correlation and website
fingerprinting attacks~\cite{greschbach}.  Paper~\ref{paper:cat} generalizes the
attacker capability they uncovered by allowing the attacker to query Tor's
receiver anonymity set with a website oracle of time-frame~$t$.  It is further
shown that it is possible to instantiate such an abstraction in the real world
while \emph{staying within Tor's threat model}.  In other words, the attacker is
still local but may employ passive and active measures to narrow down the
receiver anonymity set.  Paper~\ref{paper:ctor} proposes Certificate
Transparency verification that gives log operators website oracle access, with
$t$ tuned by Tor's directory authorities.

Website oracles exist because Tor is designed for anonymity---not unobservable
communication~\cite{anonterm}.  The instantiation of a real-world website oracle
is either a direct result of observing network flows from the protocols
used during website visits, or due to the state of these network flows being
stored and inferable.  Inferring secret system state is widely studied in applied
cryptography and hardware
architecture~\cite{lucky13,ge18,kocher96,mart21,tsunoo03,heist}, where the goal
is usually to determine a key, decrypt a ciphertext, forge a message, or similar
using side-channels.  A side-channel can be local or remote and ranges from
analysis of power consumption to cache states and timing differences.  There is
a long history of remote timing attacks that are
practical~\cite{bbrumley11,dbrumley03,crosby09,wang22}.  A recent improvement in
this area that is relevant for Tor is timeless timing attacks, which exploit
concurrency and message reordering to eliminate network jitter~\cite{timeless}.
Paper~\ref{paper:cat} demonstrates a remote timing attack against Tor's DNS
cache that achieves up to 17.3\% true positive rates while minimizing false
positives.  Paper~\ref{paper:tlwo} instead uses a remote timeless timing attack
with no false positives, no false negatives, and a small time-frame $t$.  This
approaches an ideal website oracle without special attacker capabilities or
reach into third parties.
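
To make these oracle instantiations concrete, the following Python sketch
captures both probing ideas at a conceptual level (the callbacks are
hypothetical; the calibrated thresholds and request pairing in
Papers~\ref{paper:cat} and~\ref{paper:tlwo} are more involved):

\begin{verbatim}
import time

def classic_probe(resolve, domain, threshold_s):
    # Remote timing probe: a cache hit at the exit relay answers
    # noticeably faster than a full DNS resolution.
    start = time.monotonic()
    resolve(domain)
    return time.monotonic() - start < threshold_s

def timeless_probe(send_concurrent, reference, target):
    # Timeless variant: two queries arrive together, and the order of
    # the responses replaces absolute time, removing network jitter.
    first = send_concurrent(reference, target)
    return first == target
\end{verbatim}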

\section{Conclusions and Future Work} \label{sec:concl}

Throughout the thesis, we contributed to the understanding of how trust
requirements in Certificate Transparency can be reduced.  Efficient and reliable
monitoring of the logs is easily overlooked.  If the broader ecosystem achieves
achieves monitoring through third parties, those parties should be subject to
the same scrutiny as the logs.  We proposed a solution that makes it hard for
third-party monitors to
provide subscribers with selective certificate notifications.  We also proposed
a gossip-audit model that plugs into interacting with the logs over DNS by
having programmable packet processors verify that the same append-only logs are
observed.  Avoiding the addition of complicated verification logic into end-user
software is likely a long-term win because it reduces the number of moving
parts.  In other words, simple gossip-audit models will be much easier to deploy
in the wide variety of end-user software that embeds TLS clients.

We also contributed to the understanding of how Certificate Transparency can be
applied in the context of Tor Browser.  Compared to a regular browser, this
results in a different setting with its own challenges and opportunities.  On
the one hand, Tor Browser benefits from the ability to preserve privacy due to
using the anonymity network Tor.  On the other hand, data relating to
website visits cannot be persisted to disk (such as signed certificate
timestamps blocked by maximum merge delays).  Our incrementally-deployable
proposal keeps the logic in Tor Browser simple by offloading all Certificate
Transparency verification to randomly selected Tor relays.  The design is
complete because mis-issued certificates can eventually reach a trusted auditor
who acts on incidents.  In addition to proposing Certificate Transparency in Tor
Browser, we also explored how certificates with onion addresses may improve the
association of domain names with onion addresses.  Such certificates ensure
domain owners know which onion addresses can be discovered for their sites,
much like Certificate Transparency does for public TLS keys.  This also adds
censorship resistance to the discovery process as logs are append-only.

As part of exploring Certificate Transparency in Tor Browser, we further
contributed to the understanding of how the protocols used during website visits
affect unlinkability between Tor users and their destination websites.  For
example, fetching an inclusion proof from a Certificate Transparency log is one
such protocol.  We extended the attacker model of website fingerprinting attacks
with website oracles that reveal whether any network user visited a website
during a specific time frame.  Our results show that website oracles eliminate
most false positives for all but the most frequently visited websites.  In
addition to the theoretical evaluation of the extended attacker model, we could
exploit (timeless) timing attacks in Tor's DNS cache to instantiate real-world
website oracles without any special capabilities or reach into third-parties.
This led us to contribute to the understanding of how Tor's DNS cache performs
today, including a proposal for a performant alternative that preloads the same
popular domains on all Tor relays to withstand all (timeless) timing attacks.

As an outlook, our angle on Certificate Transparency verification has mostly
been \emph{reactive} for end-users.  In other words, some or all certificate
verification occurs asynchronously after a website visit.  An alternative to
this would be upfront delivery of inclusion proofs that reconstruct tree heads
which witnesses cosigned; a form of \emph{proactive} gossip as proposed by
Syta~\emph{et~al.}~\cite{syta}.  The significant upside is that the browser's
verification could become non-interactive, eliminating privacy concerns and
ensuring end-users only see certificates merged into the append-only logs.
Investigating what the blockers for such a proposal are in
practice---today---would be valuable as log verification quickly becomes
complicated with signed certificate timestamps and reactive gossip-audit models.
Are these blockers significant?  Are they significant over time as other
\emph{eventual} changes will be needed, like post-quantum secure certificates?
New transparency log applications are unlikely to need the complexity of
Certificate Transparency, and should likely not copy something that was designed
to fit into an existing system with a large amount of legacy (such as
certificate authorities, their established processes for certificate issuance,
and the many client-server implementations already deployed on the Internet).

Orthogonal to the verification performed by end-users, contributing to the
understanding of how domains (fail to) use Certificate Transparency for
detecting mis-issued certificates is largely unexplored.  For example,
subscribing to email notifications of newly issued certificates becomes less
useful in an era where certificates are renewed frequently and automatically.
Instead, domain owners need easy-to-use solutions that raise alarms only if
there is a problem.

Finally, the mitigation deployed to counter our (timeless) timing attacks in
Tor's DNS cache is just that: a mitigation, not a defense, that applies to
modestly popular websites but not the long tail where the base rate is low.
This is because the attacker's needed website oracle time frame is so large that
a fuzzy time-to-live value does nothing.  Practical aspects of a preloaded DNS
cache need to be explored further before deployment, such as the assumption of a
third-party that visits popular domains to assemble an allowlist.  We may also
have \emph{underestimated} the utility of the existing Umbrella list, which in
and of itself does not require any new third party.  Does the use of Umbrella
impact page-load latency, the most crucial parameter to keep minimized?  The
question is whether frequently looked-up domains are missed by skipping the
website-visit step, as is the case for the non-extended Alexa and Tranco
lists.

More broadly, the question of how to strike a balance between \emph{efficiency}
and \emph{effectiveness} of website fingerprinting defenses is open.  How much
overhead in terms of added latency and/or bandwidth is needed?  How much of that
overhead is sustainable, both from a user perspective (where, e.g., latency is
crucial for web browsing and other interactive activities) and a network health
perspective (such as the amount of volunteered relay bandwidth that is wasted)?  It is
paramount to neither overestimate nor underestimate attacker capabilities, which
goes back to the still-debated threat model of website fingerprinting attacks.
Regardless of whether Tor's DNS cache becomes preloaded or not, it will be
difficult to prevent DNS lookups from happening.  Someone---be it a weak attacker like
ourselves or a recursive DNS resolver at an Internet service provider---is in a
position to narrow down the destination anonymity set.  This is especially true
when also considering other protocols that reveal information about the
destination anonymity set during website visits.  Accepting that sources of
real-world website oracles are prevalent implies that \emph{the world can be
closed}.  Therefore, a closed world is more realistic than an open world.

\subsection*{Acknowledgments}
I received valuable feedback while writing the introductory summary from
  Simone Fischer-H\"{u}bner,
  Johan Garcia,
  Stefan Lindskog, and
  Tobias Pulls.
The final draft was further improved with helpful nudges from Grammarly.

\bibliographystyle{plain}
\bibliography{src/introduction/refs}