1 files changed, 150 insertions, 0 deletions
diff --git a/summary/src/ctor/src/background.tex b/summary/src/ctor/src/background.tex
new file mode 100644
index 0000000..85d972f
--- /dev/null
+++ b/summary/src/ctor/src/background.tex
@@ -0,0 +1,150 @@
+\section{Background} \label{ctor:sec:background}
+The theory and current practise of CT is introduced first, then Tor
+and its privacy-preserving Tor Browser.
+
+\subsection{Certificate Transparency} \label{ctor:sec:background:ct}
+The idea to transparently log TLS certificates emerged at Google in response to
+a lack of proposals that could be deployed without drastic ecosystem changes
+and/or significant downsides~\cite{ct/a}.  By making the set of issued
+certificate chains\footnote{%
+	A domain owner's certificate is signed by an intermediate CA, whose
+	certificate is in turned signed by a root CA that acts as a trust
+	anchor~\cite{ca-ecosystem}.  Such a \emph{certificate chain} is valid if it
+	ends in a trusted anchor that is shipped in the user's system software.
+} transparent, anyone that inspect the logs can detect certificate
+mis-issuance \emph{after the fact}.  It would be somewhat circular to solve
+issues in the CA ecosystem by adding trusted CT logs.  Therefore, the
+cryptographic foundation of CT is engineered to avoid any such reliance.
+Google's \emph{gradual} CT roll-out started in 2015, and evolved from 
+downgrading user-interface indicators in Chrome to the current state of hard
+failures unless a certificate is accompanied by a signed \emph{promise} that it
+will appear in two CT logs~\cite{does-ct-break-the-web}.  Unlike Apple's
+Safari~\cite{apple-log-policy}, these two logs must additionally be operated by
+Google and not-Google to ensure independence~\cite{google-log-policy}.
+
+The lack of mainstream verification, i.e., beyond checking signatures, allows an
+attacker to side-step the current CT enforcement with minimal risk of exposure
+\emph{if the required logs are controlled by the attacker}.  
+CTor integrates into the gradual CT roll-out by starting on the
+premise of pairwise-independently trusted CT logs, which
+avoids the risk of bad user experience~\cite{does-ct-break-the-web}
+and significant system complexity.  For example, web pages are unlikely to
+break, TLS handshake latency stays about the same, and no robust management of
+suspected log misbehavior is needed.  Retaining the latter property as part of
+our incremental designs simplifies deployment.
+
+\subsubsection{Cryptographic Foundation}
+The operator of a CT log maintains a tamper-evident append-only Merkle
+tree~\cite{ct,ct/bis}.  At any time, a Signed Tree Head (STH) can be produced
+which fixes the log's structure and content.  Important attributes of an STH
+include
+	the tree head (a cryptographic hash),
+	the tree size (a number of entries), and
+	the current time.
+Given two tree sizes, a log can produce a \emph{consistency proof} that proves
+the newer tree head entails everything that the older tree head does.  As such,
+anyone can verify that the log is append-only without downloading all entries
+and recomputing the tree head.  Membership of an entry can also be proven
+by producing an \emph{inclusion proof} for an STH.  These proof techniques are
+formally verified~\cite{secure-logging-and-ct}.
+
+Upon a valid request, a log must add an entry and produce a new STH that covers
+it within a time known as the Maximum Merge Delay (MMD), e.g., 24~hours.  This
+policy aspect can be verified because in response, a Signed Certificate
+Timestamp (SCT) is returned.  An SCT is a signed promise that an entry will
+appear in the log within an MMD.  A log that violates its MMD is said to perform
+an \emph{omission attack}.  It can be detected by challenging the log to prove
+inclusion.  A log that forks, presenting one append-only version
+to some entities and another to others, is said to perform a \emph{split-view
+attack}.  Split-views can be detected by STH
+gossip~\cite{chuat,dahlberg,nordberg,syta}.
+
+\subsubsection{Standardization and Verification}
+The standardized CT protocol defines public HTTP(S) endpoints that allow anyone
+to check the log's accepted trust anchors and added certificates, as well as
+to obtain the most recent STH and to fetch proofs~\cite{ct,ct/bis}.  For
+example, the \texttt{add-chain} endpoint returns an SCT if the added certificate
+chain ends in a trust anchor returned by the \texttt{get-roots} endpoint.  We
+use \texttt{add-chain} in Section~\ref{ctor:sec:incremental}, as well as several
+other endpoints in Section~\ref{ctor:sec:base} to fetch proofs and STHs.  It might be
+helpful to know that an inclusion proof is fetched based on two parameters: a
+certificate hash and the tree size of an STH.  The former specifies the log entry
+of interest, and the latter with regards to which view inclusion should be
+proven.  The returned proof is valid if it can be used in combination with the
+certificate to reconstruct the STH's tree head.
+
+The CT landscape provides a limited value unless it is verified that the logs
+play by the rules.  What the rules are changed over time, but they are largely
+influenced by the major browser vendors that define \emph{CT policies}.  For
+example, what is required to become a recognized CT log in terms of uptime and
+trust anchors, and which criteria should pass to consider a certificate CT
+compliant~\cite{apple-log-policy,google-log-policy}.  While there are several ways that
+a log can misbehave with regards to these policy aspects, the most fundamental
+forms of cheating are omission and split-view attacks.  A party that follows-up
+on inclusion and consistency proofs is said to \emph{audit} the logs.
+
+Widespread client-side auditing is a premise for CT logs to be untrusted, but
+none of the web browsers that enforce CT engage in such activities yet.  For
+example, requesting an inclusion proof is privacy-invasive because it leaks
+browsing patterns to the logs, and reporting suspected log misbehavior comes
+with privacy~\cite{ct-with-privacy} as well as operational challenges.
+Found log incidents are mostly reported manually to the CT policy
+list~\cite{ct-policy-mailing-list}.  This is in contrast to automated
+\emph{CT monitors}, which notify domain owners
+of newly issued certificates based on what actually appeared in the public
+logs~\cite{lwm,ct-monitors}.
+
+\subsection{Tor} \label{ctor:sec:background:tor}
+
+Most of the activity of Tor's millions of daily users starts with Tor Browser
+and connects to some ordinary website via a circuit comprised of three
+randomly-selected Tor relays. In this way no identifying information from
+Internet protocols (such as IP address) are automatically provided to the
+destination, and no single entity can observe both the source and destination of
+a connection. Tor Browser is also configured and performs some filtering to resist
+browser fingerprinting, and first party isolation to resist sharing state or
+linking of identifiers across origins. More generally it avoids storing
+identifying configuration and behavioral information to disk.
+
+Tor relays in a circuit are selected at random, but not uniformly. A typical
+circuit is comprised of a \emph{guard}, a \emph{middle}, and an \emph{exit}. A
+guard is selected by a client and used for several months as the entrance to all
+Tor circuits. If the guard is not controlled by an adversary, that adversary
+will not find itself selected to be on a Tor circuit adjacent to (thus
+identifying) the client. And because some relay operators do not wish to act as
+the apparent Internet source for connections to arbitrary destinations, relay
+operators can configure the ports (if any) on which they will permit connections
+besides to other Tor relays. Finally, to facilitate load balancing, relays are
+assigned a weight based on their apparent capacity to carry traffic. In keeping
+with avoiding storing of linkable state, even circuits that share an origin will
+only permit new connections over that circuit for ten minutes. After that, if
+all connections are closed, all state associated with the circuit is cleared.
+
+Tor clients use this information when choosing relays with which to build a
+circuit. They receive the information via an hourly updated \emph{consensus}.
+The consensus assigns weights as well as flags such as \texttt{guard} or
+\texttt{exit}. It also assigns auxiliary flags such as
+\texttt{stable}, which, e.g.,
+is necessary to obtain the \texttt{guard} flag since guards must have good
+availability. Self-reported information by relays in their \emph{extra-info
+document}, such as statistics on their read and written bytes, are also part of
+the consensus and uploaded to \emph{directory authorities}. Directory
+authorities determine the consensus by voting on various components making up
+the shared view of the state of the Tor network. Making sure that all clients
+have a consistent view of the network prevents epistemic attacks wherein clients
+can be separated based on the routes that are consistent with their
+understanding~\cite{danezis:pets2008}. This is only a very rough sketch of Tor's
+design and operation.  More details can be found by following links at Tor's
+documentation site~\cite{tor-documentation}.
+
+Tor does not aim to prevent end-to-end correlation attacks. An adversary
+controlling the guard and exit, or controlling the destination and observing the
+client ISP, etc., is assumed able to confirm who is connected to whom on that
+particular circuit. The Tor threat model assumes an adversary able to control
+and/or observe a small to moderate fraction of Tor relays measured by both
+number of relays and by consensus weight, and it assumes a large
+number of Tor clients
+able to, for example, flood individual relays to detect traffic signatures of
+honest traffic on a given circuit~\cite{long-paths}. Also, the adversary can
+knock any small number of relays offline via either attacks from clients or
+direct Internet DDoS.