Port Contention Goes Portable: Port Contention Side Channels in Web Browsers

CPU port contention, a recently discovered microarchitectural side channel, can be implemented in web browsers, substantially increasing its attack surface.

Microarchitectural attacks have been studied extensively in recent years. They can have a major impact on security by allowing attackers to circumvent system isolation. However, to mount such an attack, the attacker must run code on the victim's machine, which is a strong requirement in today's systems, where users have grown distrustful of unknown code.

In this post, we show how we implemented port-contention side channels entirely in JavaScript, dramatically widening their threat surface. We explain how we solved the challenges posed by the browser sandbox, and show concrete examples of port contention attacks. In particular, our attack yields the fastest covert channel in the browser, breaking the isolation model at the core of modern browsers' security. With this covert channel, an attacker can exchange cookies with other tabs or leak secret information.

📰
Port Contention Goes Portable: Port Contention Side Channels in Web Browsers
Thomas Rokicki, Clémentine Maurice, Marina Botvinnik, Yossi Oren
AsiaCCS 2022

Citation

@inproceedings{RokickiMBO22,
  author    = {Thomas Rokicki and Cl{\'{e}}mentine Maurice and Marina Botvinnik and Yossi Oren},
  title     = {Port Contention Goes Portable: Port Contention Side Channels in Web Browsers},
  booktitle = {AsiaCCS},
  pages     = {1182--1194},
  publisher = {{ACM}},
  year      = {2022}
}

Microarchitectural Side Channels

Modern processors are highly optimized pieces of hardware. With the evolution of computing, they have shrunk remarkably while offering exponentially greater computing power. To deliver greater performance without raising the CPU frequency, hardware vendors have developed a wide range of targeted optimizations that reduce the execution time of certain operations.

However, these optimizations can leak sensitive information. Because they sit below the software isolation enforced by the OS, the optimized hardware components are shared between users. A malicious user can exploit differences in computation time to retrieve secret information. These microarchitectural side channels can target many different hardware components, but most of them share two major prerequisites:

  • A high-resolution timer to detect the subtle timing differences caused by optimizations. Generally speaking, these attacks require a resolution of a few nanoseconds.
  • Running malicious code on the victim's machine, in order to use its hardware.

The best-known example of a microarchitectural side channel is based on the CPU cache. The cache is a very small (around 64 MB), high-speed memory. It dynamically stores frequently used data, as loading data from the cache is faster than loading it from DRAM. The cache is shared by all processes on the machine. A malicious user can measure the loading time of certain addresses to monitor other processes: if an access is faster than usual, the address is in the cache, i.e., another process used it recently. These attacks allow a process to monitor other processes or retrieve cryptographic secrets. More recently, transient-execution attacks such as Spectre have often relied on microarchitectural side channels to extract sensitive information.

Port Contention

Port contention is a microarchitectural side channel first described by Aldaya et al. in 2019. It exploits HyperThreading, a technology that, at an abstract level, creates two or more logical execution threads on a single physical core. At the OS level, each physical core therefore appears as several threads, but logical threads on the same core share the hardware components of the execution pipeline.

Figure 1: Simplification of the execution pipeline on a single physical core.

Port contention exploits the CPU ports. Figure 1 presents an abstracted view of the execution pipeline for a single physical core. Thread 1 and thread 2 are both HyperThreading logical threads. The decoder fetches the instructions to execute and decomposes them into micro-operations (or uops), a smaller, more atomic form of operation. The uops are then sent to the execution engine, which is composed of several execution units. Each execution unit is specialized for a certain type of operation, e.g., arithmetic operations or memory loads. The uops reach their associated execution units through CPU ports. These ports act as a bottleneck, as only a single uop can go through a given port per cycle.

Figure 2.a: All the attacker uops are executed in a row.
Figure 2.b: The attacker's uops are delayed by the victim's computations.

By repeatedly calling and timing instructions with a specific port usage, an attacker can monitor the port usage of other processes on the same core. Figure 2 illustrates the process. When no other process emits uops on port 1, as in Figure 2.a, all the attacker's uops are executed in a row, resulting in fast execution. However, when the victim emits a uop on port 1, as in Figure 2.b, the second attacker uop is delayed in the queue, resulting in slower execution. In this simplified scenario, the attacker learns whether or not the victim executed instructions that use port 1. Using this principle, Aldaya et al. mounted an attack against OpenSSL's TLS implementation to recover private keys. Port contention was also leveraged to mount a speculative execution attack in SMoTherSpectre.
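The effect in Figure 2 can be reproduced with a toy cycle-level model. The sketch below is only an illustration under simplifying assumptions: a single monitored port accepts one uop per cycle, the victim issues its uops in order, and the two threads alternate when both want the same port. The function name `attacker_cycles` and the arbitration rule are hypothetical and do not describe a real CPU scheduler.

```python
from collections import deque


def attacker_cycles(attacker_uops, victim_ports, port=1):
    """Cycles the attacker needs to push `attacker_uops` uops through `port`.

    Toy model (assumption): one uop per port per cycle, and the two
    threads take turns when they both want the monitored port.
    """
    victim = deque(victim_ports)  # ports of the victim's uops, in program order
    remaining = attacker_uops
    victim_turn = True            # who wins the next conflict on `port`
    cycles = 0
    while remaining > 0:
        cycles += 1
        victim_wants_port = bool(victim) and victim[0] == port
        if victim_wants_port and victim_turn:
            victim.popleft()      # the victim's uop takes the port this cycle
            victim_turn = False   # the attacker wins the next conflict
        else:
            remaining -= 1        # the attacker's uop takes the port
            victim_turn = True
            if victim and victim[0] != port:
                victim.popleft()  # victim uop runs in parallel on another port
    return cycles


if __name__ == "__main__":
    print(attacker_cycles(4, []))         # no victim activity
    print(attacker_cycles(4, [5, 5, 5]))  # victim only uses port 5
    print(attacker_cycles(4, [1, 1]))     # victim competes for port 1
```

In this model the attacker's loop only gets slower when the victim issues uops on the monitored port, which is exactly the timing difference the attacker looks for.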

Port Contention in Browsers

Challenges

Like all microarchitectural attacks, port contention requires running code on the victim's hardware. This is a strong assumption for native attacks, as users rarely download and run malicious code voluntarily.

The main contribution of our paper is to implement port contention in a web setting. Client-side languages, such as JavaScript or WebAssembly, are by design meant to be downloaded automatically from a web server and run on the client's hardware. This fulfills one of the prerequisites for microarchitectural attacks, substantially increasing their threat surface. However, in the case of port contention, it also introduces new challenges:

  • [C1]: Port contention attacks, like most timing attacks, require access to high-resolution timers, which are not available by default in the browser.
  • [C2]: The high level of abstraction of the browser means that the attacker can neither know nor control on which core the code is run.
  • [C3]: Client-side languages run inside a secured sandbox, which restricts access to native instructions and hides memory addresses and port usage.

High-resolution timers - C1

Schwarz et al. showed how an attacker can reconstruct high-resolution timers in JavaScript by using SharedArrayBuffers. These are arrays shared between the main JavaScript thread and Web Workers, JavaScript's multithreading mechanism. A Web Worker is created and increments a value in the array in a continuous loop. As the duration of an increment is relatively small and steady, the counter can serve as a time measurement. When the main thread needs a timestamp, it simply reads the shared value.

Fig 3: SharedArrayBuffer-based clock

This clock has a resolution of 10 nanoseconds, which is sufficient to mount our attack, thus resolving C1. However, it is less precise than native clocks, and the continual increments introduce noise that can degrade the results.
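The structure of such a counting clock can be sketched outside the browser. The following is a minimal Python analogue, not the SharedArrayBuffer implementation itself: a background thread plays the role of the Web Worker and an instance attribute plays the role of the shared array. The class name `CounterClock` and the helper `measure` are hypothetical, and because of Python's interpreter lock the resolution here is far coarser than the figure quoted above.

```python
import threading
import time


class CounterClock:
    """A background thread increments a counter that readers use as a timestamp."""

    def __init__(self):
        self.ticks = 0            # stands in for the shared array cell
        self._running = False
        self._thread = threading.Thread(target=self._count, daemon=True)

    def _count(self):
        # Tight increment loop, like the Web Worker in the browser version.
        while self._running:
            self.ticks += 1

    def start(self):
        self._running = True
        self._thread.start()

    def stop(self):
        self._running = False
        self._thread.join()

    def now(self):
        # Reading the counter is the "timestamp".
        return self.ticks


def measure(clock, fn):
    """Return how many ticks elapsed while `fn` ran."""
    start = clock.now()
    fn()
    return clock.now() - start


if __name__ == "__main__":
    clock = CounterClock()
    clock.start()
    print("ticks during a 50 ms sleep:", measure(clock, lambda: time.sleep(0.05)))
    clock.stop()
```

The measured value is a tick count rather than a duration in seconds; an attack only needs to compare such counts between a fast and a slow case.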

Core management - C2

As port contention exploits shared components inside a physical core, we need both the attacker and the victim on the same core. Due to their sandboxed design, JavaScript and WebAssembly do not allow the user to know or control on which core the code is running.

We can, however, exploit the OS scheduler to try to place both parties on the same core. The scheduler dynamically dispatches software tasks between cores. Its heuristics are complicated, but its goal is to balance the workload between all cores to enhance performance. If we create several Web Workers running port contention computations, they have a high probability of being placed on different cores, as they require heavy computations. With enough workers, at least one of them is very likely to share a physical core with the victim.
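To get a feeling for how the number of workers affects the chance of sharing a core with the victim, here is a deliberately simplified back-of-the-envelope model. It assumes each worker lands on a physical core uniformly at random and independently of the others, which is not how a real scheduler behaves; the function name `coverage_probability` is hypothetical.

```python
def coverage_probability(physical_cores, workers):
    """Probability that at least one worker shares the victim's core.

    Assumption: each worker is placed on one of `physical_cores` cores
    uniformly and independently (a toy model, not a real scheduler).
    """
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    miss_one = 1 - 1 / physical_cores   # one worker misses the victim's core
    return 1 - miss_one ** workers      # complement of "all workers miss"


if __name__ == "__main__":
    for workers in (1, 2, 4, 8):
        print(workers, round(coverage_probability(4, workers), 3))
```

A scheduler that actively spreads heavy workers over distinct cores can cover the victim's core with fewer workers than this random-placement model suggests.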

PC Detector - C3

JavaScript and WebAssembly are high-level languages. They offer a strong abstraction from memory management, data structures, and hardware. This is highly practical on the web for building simple and portable scripts that run in the browser. However, in our attack scenario, it prevents us from knowing what really happens at the microarchitectural level. In particular, we know neither how our WebAssembly instructions are translated into native x86 instructions, nor the micro-operations or port usage of our script.

To create and exploit contention, we must find instructions that are decomposed into uops dispatched to a specific port. To do so, we built PC-Detector, a framework that automatically goes through all instructions available on a system and tests whether each one creates contention on a specific port.

To detect contention on the web, we use known native instructions paired with unknown WebAssembly instructions. For instance, native vpermd emits one micro-operation on port 5. If we detect contention when repeatedly calling both vpermd and our WebAssembly instruction, this means our WebAssembly instruction emits at least one micro-operation on port 5.

By repeating this operation for all instructions on different ports, we identified instructions creating contention on port 0, port 1, ports 2 and 3 (the load ports, often grouped as P23), port 5, and port 6 on an Intel Skylake CPU.
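The pairing logic described above can be summarized in a short sketch. This is not the PC-Detector code: timings are simulated instead of measured, the candidate names and their port assignments are made up for illustration, and the 1.2 slowdown threshold is an arbitrary assumption.

```python
def simulated_time(candidate_port, native_port, iterations=1000):
    """Simulated loop time in cycles (stand-in for a real measurement).

    Assumption: the loop costs one cycle per iteration, and twice that
    when a co-running native instruction occupies the same port.
    """
    if native_port is not None and native_port == candidate_port:
        return 2 * iterations
    return iterations


def detect_contention(time_alone, time_with_native, threshold=1.2):
    """Flag contention when the slowdown ratio exceeds `threshold`."""
    return time_with_native / time_alone > threshold


def scan(candidates, native_ports):
    """Map each candidate instruction to the native ports it contends with."""
    results = {}
    for name, hidden_port in candidates.items():
        alone = simulated_time(hidden_port, None)
        results[name] = [
            port
            for port in native_ports
            if detect_contention(alone, simulated_time(hidden_port, port))
        ]
    return results


if __name__ == "__main__":
    # Hypothetical candidates; the port is hidden from `scan`'s decision
    # logic and only drives the simulated timings.
    candidates = {"candidate_a": 1, "candidate_b": 5}
    print(scan(candidates, native_ports=[0, 1, 5, 6]))
```

In a real setting, `simulated_time` would be replaced by timing the WebAssembly loop in the browser while a native program repeatedly executes an instruction with known port usage.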

Proofs of Concept

We now know that port contention can be achieved in the browser. We can even target several ports to try to leak more data. However, this is still a basic building block, an attack vector. To illustrate the capabilities of port contention in the browser, we created two proofs of concept: a covert channel and an artificial side-channel example.

Artificial Side Channel Gadget

Programs often change behavior based on user input or environment values. For instance, vulnerable cryptographic implementations execute different computations depending on the secret key. This difference in behavior can be detected indirectly through the information leaked to side-channel attacks.

Fig. 4: Illustration of the workflow of the victim. If the secret is 0, it will run POPCNT in a loop, otherwise it will run VPERMD.

We built a synthetic and generic example (Fig. 4) showing how a program whose execution depends on secret information is vulnerable to WebAssembly port contention. The victim is a native unprivileged program that runs different code sections based on the bits of secret information. As port usage differs between branches, an attacker monitoring port contention can infer the secret. One branch executes the POPCNT instruction in a loop, creating contention on port 1. The other branch executes VPERMD, creating contention on port 5. By measuring port contention on either port 1 or port 5 in the browser, we can detect which branch was executed and infer the associated secret.

The victim and the attacker are two different processes. This means that the attacker does not know on which core the victim runs, and must measure contention on all cores and analyze all the traces.
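The decision step of this gadget can be sketched as follows. The sketch assumes the attacker has already turned each per-core trace into two hypothetical "slowdown" values (measured time divided by a contention-free baseline), one for port 1 and one for port 5; the function names and the choice of picking the trace with the strongest signal are illustrative assumptions, not the analysis used in the paper.

```python
def infer_secret_bit(port1_slowdown, port5_slowdown):
    """Guess the secret bit from contention on ports 1 and 5.

    Following Fig. 4: secret 0 runs POPCNT (port 1), otherwise VPERMD (port 5).
    """
    return 0 if port1_slowdown > port5_slowdown else 1


def infer_from_traces(traces):
    """Pick the per-core trace with the strongest contention and classify it.

    `traces` holds one (port1_slowdown, port5_slowdown) pair per monitored
    core; only the core shared with the victim shows a clear signal.
    """
    strongest = max(traces, key=max)
    return infer_secret_bit(*strongest)


if __name__ == "__main__":
    # Hypothetical measurements: the second core is shared with the victim.
    print(infer_from_traces([(1.00, 1.01), (1.90, 1.02), (1.00, 1.00)]))
```

Cores that do not host the victim yield slowdowns close to 1 on both ports, so they are ignored by the maximum.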

The smallest number of instructions detectable in a branch is critical, as it represents our spatial resolution. A high resolution means that more pieces of code are vulnerable, hence widening the attack scope. At best, in a single trace setting, we achieved a spatial resolution of 1024 native instructions, i.e., 3072 bytes. This number is on par with other microarchitectural side channels in the browser, such as the Prime+Probe cache attack.

Covert Channel

A covert channel exploits data leakage in the microarchitecture to create a communication channel between two parties that should not be able to communicate. In our case, a script running inside the sandbox could communicate with a native program through port contention. This can be particularly dangerous, as it could be used to monitor users, steal cookies, or extract sensitive data.

The physical layer of our channel, i.e., sending 0s and 1s, is fairly simple. A sender creates contention on a specific port for a specific time to send a 1 and idles for the same time to send a 0. The receiver just has to measure contention on this port to receive the flow of information.
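This on/off signalling can be illustrated with a small simulation. The sketch below replaces real contention measurements with made-up slowdown samples; the slot length, the `busy` and `idle` values, and the decision threshold are all assumptions chosen for readability, and the synchronization and error handling a real channel needs are left out.

```python
def text_to_bits(text):
    """Encode text as a list of bits, most significant bit first."""
    return [(byte >> shift) & 1
            for byte in text.encode()
            for shift in range(7, -1, -1)]


def bits_to_text(bits):
    """Decode a list of bits (length a multiple of 8) back into text."""
    data = bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))
    return data.decode()


def send(bits, slot_samples=4, busy=2.0, idle=1.0):
    """Sender: contention during a slot encodes 1, idling encodes 0.

    Returns the simulated slowdown samples the receiver would observe.
    """
    samples = []
    for bit in bits:
        samples.extend([busy if bit else idle] * slot_samples)
    return samples


def receive(samples, slot_samples=4, threshold=1.5):
    """Receiver: average each slot and compare it against a threshold."""
    bits = []
    for i in range(0, len(samples) - slot_samples + 1, slot_samples):
        slot = samples[i:i + slot_samples]
        bits.append(1 if sum(slot) / len(slot) > threshold else 0)
    return bits


if __name__ == "__main__":
    received = receive(send(text_to_bits("hi")))
    print(bits_to_text(received))
```

Averaging several samples per slot is one simple way to tolerate noisy individual measurements, at the cost of a lower bit rate.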

Fig 5: Illustration of our covert channel

We built a two-way covert channel between the browser and a native program. It handles errors and desynchronization with a request-to-send protocol and offers a bandwidth of 200 bits per second, which is higher than other microarchitectural (or even software!) covert channels in the browser.

Our covert channel also works when one of the two parties is situated inside a VM or a container, and we can even communicate between two different browsers or two tabs in the same browser.

Countermeasures

Countermeasures for this vulnerability could potentially be implemented at three different levels:

  • Hardware level: At its very core, port contention exploits the sharing of CPU ports between a victim and an attacker, relying heavily on HyperThreading. Some parties have chosen to disable it entirely, but this comes at a high performance cost, with a performance degradation of 15%. Researchers have proposed alternative solutions, mainly based on keeping HyperThreading but partitioning resources so that they are not shared.

  • OS level: At the OS/software level, some have suggested port-independent code, i.e., code whose port usage does not depend on a secret. However, this means rewriting all pre-existing code and training all software developers. The scheduler could also be made aware of SMT attacks and, for instance, run highly sensitive operations on different physical cores than other computations.

  • Browser level: After the publication of previous microarchitectural attacks in the JavaScript sandbox, several countermeasures have been implemented. A popular solution is to remove access to high-resolution timers. However, we have seen that this is no easy task, as auxiliary timers can always be built. Browser vendors also tried to add more isolation, mainly at the process level, but this has little to no impact on port contention attacks. Browser vendors claim that, given the high level of abstraction of browsers, browser-level mitigations are not viable and cost too much for too little benefit.

Useful Links