# True-Time Delay Cancellers for Full-Duplex 

VEJDE NILSSON \& VILGOT SNYGG
MASTER'S THESIS
DEPARTMENT OF ELECTRICAL AND INFORMATION TECHNOLOGY FACULTY OF ENGINEERING | LTH | LUND UNIVERSITY


Vejde Nilsson \& Vilgot Snygg<br>Department of Electrical and Information Technology Lund University

Supervisors: Jonas Lindstrand, Rehman Akbar \& Henrik Sjöland
Examiner: Pietro Andreani
February 29, 2024

## Abstract

The concept of simultaneously receiving and transmitting at the same frequency is known as Full-Duplex (FD). Such a wireless system is a novel technique which could effectively halve the required channel Bandwidth (BW) for the same data rate. In FD systems, the Transmitter (Tx) signal leaks into the Receiver (Rx). This leakage, called the Self-Interference (SI) signal, is a major hardware-related obstacle for reliable FD operation. Multiple leakage paths exist for the SI signal, such as the direct path (DP) antenna-to-antenna feedthrough or environment-based multi-surface reflection paths (RP). By taking the output of the Tx, introducing delay and amplitude compensation, and then subtracting this signal from the SI signal in the Rx front end, the SI can effectively be cancelled in the analog Radio Frequency (RF) domain. This thesis investigates how such a Self-Interference Cancellation (SIC) circuit, targeting cancellation in the RF analog domain, could be designed. Large focus is given to True-Time Delay (TTD) generation, which is a major part of the SIC system. The proposed RF SIC is a hybrid solution with a passive pre-LNA (Low-Noise Amplifier) SIC and an active post-LNA SIC, aimed at an operating frequency of 10 GHz with 100 MHz of carrier bandwidth. The passive TTD is implemented as a binary weighted delay chain with cascaded lumped $LC$ transmission line filters for coarse tuning and an $LR$-$RL$ lattice all-pass filter with tunable coupled inductors for fine tuning. For the active TTD, a Time Interleaved (TI) N-path circuit was used. The passive TTD achieves a delay range of 62-1661 ps with an estimated rms (Root Mean Square) cancellation of 18.64 dB over the BW. For the active TI N-path TTD, a delay range of 22-1772 ps is showcased with an estimated rms cancellation of 19.53 dB over the BW. The TI N-path TTD can be scaled up to achieve longer delays and branched to generate multiple outputs for cancellation of multipath reflections with reduced chip area.

## Popular Science Summary

You have two friends called Adam and Bob. Imagine Adam standing next to you and shouting to Bob across the street. Meanwhile, Bob is trying to communicate with you. It will be very difficult for you to hear what Bob is saying because Adam will be overpowering him.

This is an analogy for a radio system trying to operate in full-duplex, where you represent the receiver, Adam the transmitter and Bob another radio device. In more technical terms, full-duplex is the concept of simultaneously sending and receiving signals at the same frequency. The problem with such radios is that the transmitter (Adam) overpowers the received signal from another radio device (Bob), such that the receiver (you) cannot interpret the information. This signal coming from the transmitter is called self-interference (SI), and in order for full-duplex to work, the SI must be cancelled.

If we could recreate the SI signal perfectly we could cancel the SI by combining it with the inverse of the recreated signal. The transmitted signal is an electromagnetic wave which when traveling through the air will lose some of its power and bounce off objects in the environment around the radio. All of this gives the SI signal some delay and some amplitude loss before entering the receiver. Therefore, some delay also needs to be applied to the recreated SI signal.

This thesis focuses on how to generate this delay using modern semiconductor circuits, which is more difficult than you might think when considering that the delay must be adjustable down to only one picosecond. This is the time it takes for light to travel 0.3 mm!

## Preface

This master's thesis is the final project of an engineering master's degree within high frequency electronics. The project has been carried out at Ericsson AB, who have provided the authors with office space, computers, knowledge and guidance on top of an inspiring and innovative environment. We would like to thank Stefan Andersson, manager for RF frontend and PA at Ericsson Research in Lund, for approving this thesis project. We want to thank Jonas Lindstrand and Rehman Akbar, the project supervisors at Ericsson, who have been very generous in their commitment. They provided both insightful feedback and learning opportunities. A final word of appreciation is also dedicated to Henrik Sjöland, the Lund University supervisor.

## Table of Contents

1 Introduction
2 Full Duplex System Overview
2.1 Full-duplex vs Half-duplex
2.2 Self-Interference Paths
2.3 Rate gain region
2.4 Different SIC-Techniques
2.4.1 Antenna domain
2.4.2 Analog domain
2.4.3 Digital domain
2.5 System Specifications
3 Considerations for RF SIC
3.1 System Hardware with ideal sub-blocks
3.2 Noise Figure degradation from noise injection and non-linear distortions
3.3 Attenuation Balance and Injection Points
3.4 Cancellation Required in the analog RF domain
3.5 Cancellation vs Delay and Amplitude Error
3.6 Proposed Hybrid Topology
4 True Time Delay Circuits
4.1 Passive TTD Circuits
4.1.1 Delay generation
4.1.2 Tuning
4.2 Active TTD Circuits
4.2.1 N-path Circuit
4.2.2 Time Interleaved N-path
4.2.3 Adding a Second Time Interleaved Stage and Branching the TI N-path
4.2.4 $g_{m}$ based delays
5 Circuit Design
5.1 Design of Passive TTD
5.1.1 Binary weighted delay with lumped $LC$ transmission lines
5.1.2 $LR$-$RL$ lattice fine tuning filter with coupled inductors
5.2 Design of the N-path TTD circuit
5.2.1 Original N-path
5.2.2 Time-Interleaving the N-path
5.2.3 Branched Time-Interleaved N-path
6 Results and Discussion
6.1 True-Time Delay Performance for Active and Passive Techniques
6.1.1 Results of the Passive Binary Weighted TTD
6.1.2 Results of the TI N-path TTD
6.2 Showcasing high delay with Branched Double TI N-path
6.3 Estimated Performance of the Combined Hybrid SIC
7 Summary and Conclusion
A Appendix
A.1 RF SIC Block Level Calculations

## List of Figures

2.1 The difference between half-duplex and full-duplex illustrated in a time-frequency graph
2.2 Self-interference paths from Tx to Rx for different antenna structures.
2.3 The three domains of the transceiver where SIC can be performed [5]
2.4 Overview of the SIC techniques in a tree diagram [5]
3.1 Overview of the transceiver topology and the different signal levels
3.2 Noise injection from the SIC into the Rx
3.3 Simplified model of an SIC block
3.4 The trade-off in attenuation distribution for the SIC. Both power levels in this figure are output referred.
3.5 Contour line of NF degradation for pre-LNA injection.
3.6 Contour line of NF degradation for post-LNA injection
3.7 Contour line of IMD3 power levels normalized to the noise floor for pre-LNA injection
3.8 Contour line of IMD3 power levels normalized to the noise floor for post-LNA injection.
3.9 Simulated contour plot of cancellation versus amplitude error and delay error
4.1 General lattice all-pass filter
4.2 Phase response of a lattice low-in-phase filter
4.3 Phase response of a lattice high-in-phase filter
4.4 RC-CR lattice filter
4.5 LR-RL lattice filter
4.6 LC-CL lattice filter
4.7 Differential lumped LC transmission line
4.8 Bode diagram of a second order low-pass filter for different quality factors
4.9 Equally sized delay blocks in a tapped topology
4.10 Binary weighted delay chain
4.11 The delay path and bypass path of a delay block
4.12 Transistors stacked in series for increased linearity and isolation
4.13 Ground connected transistors for increased isolation
4.14 Tuning of a binary weighted delay chain
4.15 Mutually coupled inductors with a capacitor in the secondary circuit
4.16 Binary weighted tunable capacitor with three bits
4.17 Switchable capacitor
4.18 N-path sample and hold. The next stage needs to provide high output impedance.
4.19 Simple sample and hold circuit. The input impedance of the buffer ($Z_{\text{out}}$) is high to not load the sample node.
4.20 N sample and reconstruct branches of an N-path circuit
4.21 Visualization of the sample and hold mechanism of an $N=8$ N-path TTD.
4.22 Output and input matching of the N-path.
4.23 Parasitic effects of the MOSFET as a switch for N-path circuit
4.24 Summary of the parasitic elements in the full N-path circuit.
4.25 Contour plot of the maximum achievable delay normalized by the input signal time period $T_{rf}$. This is plotted against the number of phases $N$ on the vertical axis and the sample clock frequency $f_{s}$ normalized against $f_{rf}$. The red curves show the sample rate in sample/waveform.
4.26 Schematic of an $N$ by $M$ TI N-path.
4.27 Clock programming for a TI N-path. In this example, the input sample and output reconstruct ($\Phi$ and $\Phi_{d}$ respectively) are set to the same phase. $d_{clk \Psi}$ and $d_{clk \Psi d}$ are the delays of the interleaved clocks relative to the input sample clock
4.28 Introducing a second TI stage into the N-path circuit.
4.29 Clock programming of a double TI N-path circuit with the following number of stages: $N=5, M=4, K=3$.
4.30 Visual representation of the delay tuning mechanism for a double TI N-path circuit.
4.31 Comparison of the conventional parallel TI N-path and the proposed branched TI N-path
4.32 The left side plot presents the phase of an ideal TTD, the magenta line, compared to a first order approximation, the blue line. To the right, the resulting time delay is plotted.
4.33 Delay plots for different values of $\omega_{0}$ to showcase the increased delay variation over frequency
5.1 The entire passive binary weighted TTD, with its two fundamental parts
5.2 Delay block with $\lambda$ or longer delay time
5.3 The impedance is inverted when switching on and off the switch after the quarter wavelength transmission line
5.4 The $LC_{LP}$ stage with a low-pass switching mechanism
5.5 The $LC_{cc}$ stage with $N_{cc}$ cascaded $LC$ stages
5.6 Topology for the $\lambda / 2$ delay block
5.7 Topology for the $\lambda / 4$ and $\lambda / 8$ delay blocks
5.8 Layout of the inductor for the $LC$ stage with 25 ps of delay
5.9 Layout of the inductor for the $LC$ stage with 12.5 ps of delay
5.10 Signal power being attenuated the most by the biggest delay block
5.11 The tunable coupled inductor implemented in an $LR$-$RL$ lattice filter
5.12 Effective inductance as a function of capacitance for different primary inductances
5.13 Effective inductance as a function of capacitance for different secondary inductances
5.14 Layout of the primary inductor coil
5.15 Layout of the secondary inductor coil
5.16 Delay vs capacitance of the $LR$-$RL$ lattice filter using the drawn coupled inductor and an ideal capacitor
5.17 Capacitance of the tunable capacitor for all the levels
5.18 Effective inductance for all the levels
5.19 Delay of the $LR$-$RL$ lattice filter for all levels at 10 GHz
5.20 Delay of the $LR$-$RL$ lattice filter for all levels over frequency
5.21 Impact of output buffer input impedance $R_{L}$ on TTD performance of the original N-path. Both designs were set to generate around 100 ps of delay. For design one, this means a total of four clock pulses $(\tau=1 /(8 \cdot 5 \mathrm{GHz})=25 \mathrm{ps})$ and for design two a total of five clock pulses $(\tau=1 /(16 \cdot 3 \mathrm{GHz})=20.8 \mathrm{ps})$
5.22 Total delay for design one versus design two.
5.23 Output magnitude of first design $\left(f_{s}=5 \mathrm{GHz}, N=8\right)$ N-path circuit with increasing output buffer input capacitance $C_{in, A2}$
5.24 LC resonator placed at the reconstruction node to resonate the parasitic capacitance of the output buffer and switches. $C_{S}$ prevents a DC short to ground through the inductor.
5.25 Isolated simulations of LC resonator to be implemented at the reconstruction node of the N-path carried out for unloaded and loaded circuits with $C_{in, A2}=20f$, $50f$ \& $100f$ and $Q_{L}=20$.
5.26 Magnitude of the output signal of the N-path after the LC Resonator from figure 5.25 is implemented.
5.27 Phase response of the N-path with an LC resonator at the reconstruction node for different values of $C_{in, A2}$. As the capacitance increases, the quality factor also increases and the 90 degree phase shift is more visible.
5.28 Transient output signal of the N-path without the resonator (in magenta) and with the resonator (in red)
5.29 Different clock settle times in TI part versus the fast sampling and reconstruction stages
5.30 Buffer topologies used in the TI N-path [28].
5.31 Inverter loading of the output of the TI part.
5.32 Simulated frequency content of the output signal of TI N-path with an input signal power of -20 dBm
5.33 The principle of introducing a second TI stage, effectively splitting the already interleaved N-path once again.
5.34 Showcase of the implemented branching structure. One RF input is sampled by the N-path sample stage, interleaved twice and branched after the second TI into two reconstructed RF outputs.
6.1 Delay for the passive TTD circuit for all levels vs frequency
6.2 Zoomed in view of delay vs frequency for the passive TTD circuit
6.3 Delay vs level at 10 GHz for the passive TTD circuit
6.4 Zoomed in view of delay vs level at 10 GHz for the passive TTD circuit
6.5 Delay error for the passive TTD circuit at 10 GHz vs level with the rms of all levels
6.6 Zoomed in view of delay error vs level at 10 GHz for the passive TTD circuit
6.7 Loss for all the levels vs frequency for the passive TTD circuit
6.8 Loss for the passive TTD circuit vs level with the rms of all levels
6.9 Zoomed in view of Loss vs level for the passive TTD circuit
6.10 Amplitude variation for the passive TTD circuit vs level with the rms of all levels
6.11 Zoomed in view of amplitude variation vs level for the passive TTD circuit
6.12 IP3 for the passive TTD circuit with the $8 \lambda$-delay block turned on
6.13 IP3 for the passive TTD circuit with the $8 \lambda$-delay block turned off
6.14 Noise figure and loss for the passive TTD circuit with the $8 \lambda$ delay block turned on and turned off
6.15 Full delay range of the N-path using coarse tuning. The $\Psi_{d}$ clock sets coarse tuning by steps of 200 ps and clock programming sets medium tuning in steps of 25 ps.
6.16 Amplitude of the output signal for the coarse tuning settings. Note the discrepancy of the lowest TI clock ($\Psi_{d}$) highlighted in the figure.
6.17 Delay at 10 GHz for the TI N-path with all the coarse tuning levels
6.18 Delay variation over the 100 MHz bandwidth at 10 GHz for all coarse tuning levels
6.19 Gain for all coarse tuning levels
6.20 Amplitude variation of the output signal for all coarse tuning levels.
6.21 Showcase of the fine tuning mechanism in which a delay is applied to the output reconstruction clock to tune the delay achieved by the TI N-path.
6.22 Delay at 10 GHz for each step in the fine tuning example.
6.23 Simulated delay resolution of the TI N-path for 1 ps tuning increment.
6.24 Simulated large signal compression curve of the output power for the TI N-path. The black marker marks the 1 dB compression point at -9.07 dB when the LC is not implemented.
6.25 Simulated noise figure for the TI N-path with and without the LC resonator at the output. The LC tank increases the integrated NF from 4.06 dB to 6.35 dB in the bandwidth of 9.95-10.05 GHz.
6.26 Transient simulation of Branched TI N-path. Both outputs are shown to achieve different delays of 5600 ps and 6150 ps respectively.
6.27 Clock timing used to achieve the delay presented in figure 6.26.
6.28 Estimated cancellation across the entire 100 MHz BW for the passive pre-LNA SIC, the TI N-path post-LNA SIC using 1 ps and 2 ps of clock resolution and the combined pre- and post-LNA SIC using 1 ps of clock resolution
A.1 Visualisation of the looped calculation method for the system calculations.
A.2 Flow chart displaying the method by which the performance degradation parameters were calculated.

## List of Tables

2.1 System specifications
3.1 Rx Specifications
4.1 The different modes of operation for the N-path circuit.
5.1 Delay requirements of the passive delay
5.2 The delay and number of cascaded stages for all delay blocks with $\geq \lambda$ of delay
5.3 Design parameters and results of the designed inductors
5.4 Simulation results for all delay blocks in the binary weighted delay chain
5.5 Simulation results for the layout of the coupled inductors at 10 GHz
5.6 Design parameters of the LSB switchable capacitor
5.7 Simulation results of the LSB switchable capacitor
5.8 Summary of design equations for the original N-path including constraints from [31]
5.9 Constraints check for the final design of original N-path
5.10 Constraints check for the final design of TI N-path
5.11 Sizing of A1 and A2 buffer with performance parameters.
6.1 Summary of the delay results for the passive TTD
6.2 Summary of the amplitude results for the passive TTD
6.3 Summary of linearity and noise for the passive TTD
6.4 Summary of the delay results for TI N-path. The resolution was calculated from the fine tuning simulations and the variation was calculated across the entire delay range using the coarse tuning steps.
6.5 Summary of the output amplitude of the TI N-path across the entire tuning range
6.6 Noise figure for the injection points and the total noise figure degradation of the Rx chain

## Abbreviations

| Abbreviation | Meaning |
| :--- | :--- |
| FD | Full-Duplex |
| Tx | Transmitter |
| SI | Self-Interference |
| Rx | Receiver |
| SIC | Self-Interference Cancellation |
| RF | Radio Frequency |
| TTD | True-Time Delay |
| BW | Bandwidth |
| LNA | Low-Noise Amplifier |
| HD | Half-Duplex |
| FDD | Frequency-Division Duplexing |
| TDD | Time-Division Duplexing |
| DP | Direct Path |
| AR | Antenna Reflection |
| RP | Reflection Path |
| EBD | Electrical Balancing-Duplexer |
| SNR | Signal-to-Noise Ratio |
| SINR | Signal-to-Interference plus Noise Ratio |
| PA | Power Amplifier |
| IF | Intermediate Frequency |
| BB | Baseband frequency |
| ADC | Analog-to-Digital Converter |
| IMD3 | Third-order Intermodulation Distortion |
| NF | Noise Figure |
| OIP3 | Output referred Third order Intercept Point |
| FIR | Finite Impulse Response |
| LSB | Least Significant Bit |
| LO | Local Oscillator |
| NMOS | $n$-type Metal-Oxide Semiconductor |
| TI | Time Interleaved |
| DC | Direct Current |
| MOSFET | Metal-Oxide Semiconductor Field-Effect Transistor |
| rms | Root Mean Square |
| IP3 | Third order Intercept Point |

## Introduction

The monthly data traffic carried by radio systems worldwide surpassed 140 exabytes in Q3 of 2023 and is expected to increase by almost $30 \%$ during 2024 [1]. Meanwhile, spectrum licenses are ever increasing in value, not only for the crowded sub-6 GHz spectrum used in 5G New Radio. Exemplifying this is the recent license purchase in Sweden by operator Telia AB of 120 MHz in the 3.50 to 3.62 GHz frequency range with a total price tag of SEK 760 million [2]. Technologies which increase spectral efficiency could save the industry money and effectively help push towards higher data rates. A candidate for such technologies is full-duplex (FD) radio, which can theoretically double the data rate for a single channel by allowing the Transmitter (Tx) and Receiver (Rx) to send and receive simultaneously at the same frequency, thus sharing time and frequency resources. One major hardware-related obstacle in FD communication is the strong Tx-to-Rx self-interference (SI) signal, which consists of both high power direct path (DP) signals and lower power multipath reflections (RP). The DP SI signals are characterized by short delays and are transceiver specific, correlated to the antenna isolation, while the RP SI signals may consist of many environment-based multi-surface reflections.

This thesis investigates the implementation of a Self-Interference Canceller (SIC) circuit for cancellation in the analog radio frequency (RF) domain at an operating frequency of 10 GHz. Such a circuit would be benchmarked by its ability to recreate the SI signals in terms of amplitude and delay, but also by the degradation in Rx performance due to noise and distortion injection. The main focus has been directed towards the true-time delay (TTD) generation circuits, where two techniques, categorized as passive and active, are investigated. A passive TTD circuit, intended for pre-LNA (Low-Noise Amplifier) injection, is designed in two parts: a binary weighted delay chain using lumped $LC$ transmission lines for coarse tuning, and an $LR$-$RL$ lattice filter with tunable coupled inductors for fine tuning. Further, an active TTD is designed using a Time Interleaved (TI) N-path circuit, including an innovative way of branching the N-path to reduce chip area when multiple delays must be generated.

The report begins with an overview of the entire FD system in chapter 2, before moving on to a pre-study of the analog RF SIC in chapter 3. The SIC pre-study covers the specific considerations for an analog RF SIC and concludes with the proposed hybrid SIC topology. This is followed by the principle of operation of the designed TTD circuits in chapter 4, where passive and active TTD circuits are covered separately. In chapter 5, detailed designs of these circuits are presented. Chapter 6 covers the simulation results for each TTD circuit. The TTD results are first presented separately, before the cancellation that these circuits could achieve when implemented in an SIC is estimated. Finally, the work is summarized and concluded in chapter 7.

In this work, neither layout nor tape-out for any circuit is considered as an end goal. The main aim is to understand the dynamics and requirements of the SIC block in an FD system and, based on this, propose different TTD circuit topologies that could be considered as candidates for future FD transceivers. The authors conducted the FD overview and SIC pre-study together, but the work on the passive and active TTD was done separately. Vejde investigated a passive binary weighted TTD circuit while Vilgot focused on an active TI N-path TTD circuit. The results and discussion regarding the implementation of a hybrid SIC were produced jointly.

Ericsson AB provided the necessary CAD tools: Cadence Virtuoso and ADE for circuit simulations, and the Layout XL suite and Momentum plugin for inductor layout and simulations. The work was carried out in 22 nm fully depleted silicon on insulator CMOS technology. Office space and computers were also provided by Ericsson AB.

## Full Duplex System Overview

FD communication could in theory double the capacity in comparison with half-duplex (HD) communication. However, the use of FD introduces the new challenge of Tx-to-Rx SI. This chapter gives an overview of the different challenges and advantages of FD along with a summary of the many different SIC techniques used to cancel the SI. Finally, the set of system specifications that this thesis targets is defined.

### 2.1 Full-duplex vs Half-duplex

A fundamental challenge for telecommunication systems is to separate the Rx signal from the Tx signal. The down-link from the base station is transmitted at far higher signal power than the up-link power received from the user end. Therefore, these signals need to be separated in order for the received up-link signal to be distinguishable for the base station and vice versa for the user end. The most common way to do this has long been the half-duplex system. An HD system allows communication in both directions, but not at the same frequency and at the same time. HD systems differentiate the Rx signals from the Tx signals using frequency-division duplexing (FDD) or time-division duplexing (TDD). This way the Tx signals are transmitted on other frequencies or in other time slots than those on which the Rx signals are received. This is an efficient way of isolating the Tx signals from the Rx signals in order to ensure clean bidirectional communication, by minimizing the interference and ensuring co-existence with other wireless equipment. [3]

However, a drawback of half-duplex systems is that the spectral efficiency is not as high as it could be because of the frequency or time division. In order to get the maximum spectral efficiency one should use the entire allocated BW for both Rx and Tx simultaneously. Therefore, full-duplex, also called in-band full-duplex (IBFD), should be used in order to achieve maximum spectral efficiency, since FD transmits and receives signals on the same frequencies and at all times. The difference between HD and FD is illustrated in figure 2.1. [4]


Figure 2.1: The difference between half-duplex and full-duplex illustrated in a time-frequency graph

An FD system, however, has the inherent problem of self-interference (SI): the strong Tx signal leaks into the Rx path and drowns out the weaker Rx signals. This desensitizes the Rx by drowning the Rx signal in the SI signal, compressing the Rx and/or introducing in-band distortion through the Rx non-linearities. Therefore, an FD system needs SIC in order to suppress the Tx signal to a level that allows reliable signal detection in the Rx. [3]

### 2.2 Self-Interference Paths

In an FD system, the Tx signal can take different paths from the Tx side of the transceiver to the Rx side. These are usually divided into three categories of interference paths: direct path (DP), antenna reflection (AR) and multipath reflection path (RP). The paths differ depending on whether a single antenna is used for both Tx and Rx or the Tx and Rx have separate antennas or antenna panels, as shown in figure 2.2. [3]


Figure 2.2: Self-interference paths from Tx to Rx for different antenna structures.

If a single antenna is used for both Tx and Rx, then an antenna interface is needed to isolate the incoming Rx signal from the outgoing Tx signal. For a single antenna solution, the antenna interface is usually implemented using a circulator or an electrical balancing-duplexer (EBD). In either case there will inevitably be some leakage of the Tx signal into the Rx chain; this is the DP interference. In the single antenna case, the antenna will also cause some small reflections of the Tx signal back into the Rx path. This is the AR interference, which only exists for the single antenna case. [3]

When separate antennas are used for Tx and Rx there will be no interference paths through any antenna interface or the antenna itself. For the separated antenna case, the DP refers to the shortest path over the air from the Tx antenna to the Rx antenna. In both the single and separate antenna case, there will also be multipath reflections, caused by the Tx signal reflecting off different objects in the environment. [5]

The Tx signal will appear in the receiver with different signal levels and after different delay times depending on which path of propagation the signal has taken. The DP and AR signals will have a constant signal level and delay time since these paths only depend on the transceiver itself. The RP signals will instead have a dynamic behavior since the environment surrounding the transceiver could change. The self-interference from these different types of paths must therefore be cancelled differently by the SIC. [5]

For this report, separate antennas for Tx and Rx are assumed, so the AR and DP interference associated with an antenna interface needs no further consideration. A normal antenna array is assumed with $4 \times 2$ antenna panels with $2 \lambda$ of antenna element spacing and $4 \lambda$ Tx-to-Rx panel separation. This antenna solution has a maximum distance from a Tx antenna element to an Rx antenna element of $14 \lambda$. Therefore, the maximum delay of the DP interference corresponds to $14 \lambda$ of propagation, i.e. 14 carrier periods. The RP interference can be estimated to have up to several hundred $\lambda$ of propagation, but with a much lower signal level. Note that the signal level of this interference decreases and the delay time increases with increased distance to the object of reflection.
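To put the $14 \lambda$ direct-path figure into absolute units, the short sketch below converts a path length expressed in wavelengths into a propagation delay and a physical distance. It assumes free-space propagation at the speed of light and the 10 GHz operating frequency targeted by this thesis; the function names are illustrative only.

```python
C0 = 299_792_458.0  # speed of light in vacuum [m/s]


def path_delay_s(n_wavelengths: float, f_hz: float) -> float:
    """Propagation delay of a path that is n_wavelengths long.

    One wavelength of free-space travel takes one carrier period (1/f),
    so the delay is simply n_wavelengths / f.
    """
    return n_wavelengths / f_hz


def path_length_m(n_wavelengths: float, f_hz: float) -> float:
    """Physical length of the same path, assuming propagation at C0."""
    return n_wavelengths * C0 / f_hz


if __name__ == "__main__":
    f_c = 10e9  # operating frequency [Hz]
    print(f"DP delay : {path_delay_s(14, f_c) * 1e12:.0f} ps")
    print(f"DP length: {path_length_m(14, f_c) * 1e3:.1f} mm")
```

Under these assumptions the longest direct path corresponds to a delay of 1.4 ns over roughly 0.42 m.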

### 2.3 Rate gain region

The rate gain region is the region of system parameters where FD outperforms the conventional HD in terms of bit-rate. This is investigated in order to determine how much is gained from using FD instead of HD. According to the Shannon-Hartley theorem, the maximum channel capacity $C$ is [4]:

$$
\begin{equation*}
C=B W \cdot \log _{2}(1+S N R) \tag{2.1}
\end{equation*}
$$

where $BW$ is the bandwidth in [Hz], $SNR$ is the Signal-to-Noise Ratio in linear scale and the channel capacity $C$ is given in bits per second [bits/s].

In FD systems, the achieved SNR can be rewritten in terms of the Signal-to-Interference plus Noise Ratio (SINR). This term takes into account the self-interference (SI) signal coming from the adjacent Tx, which will leak into the Rx, assuming a single base station and a single user end. FD utilizes double the bandwidth for both Rx and Tx compared to FDD, since they share the entire BW instead of splitting it in half. The capacity for FD is also doubled compared to TDD, because both Tx and Rx are operated for double the amount of time when using FD instead of TDD. This means that double the amount of bits can be sent during this doubled amount of time. Therefore, the ratio $k$ of increase in maximum channel capacity for FD compared to HD is

$$
\begin{equation*}
k=\frac{C_{F D}}{C_{H D}}=\frac{2 \cdot B W \cdot \log _{2}\left(1+S I N R_{F D}\right)}{B W \cdot \log _{2}\left(1+S N R_{H D}\right)}=2 \cdot \frac{\log _{2}\left(1+S I N R_{F D}\right)}{\log _{2}\left(1+S N R_{H D}\right)} \tag{2.2}
\end{equation*}
$$

To gain the benefits of an FD system compared to an HD system, $SINR_{FD}$ needs to be increased. To achieve this, the interfering Tx signal needs to be cancelled. Further, the method by which the SI is mitigated should not increase the noise floor of the Rx significantly. It is also evident that the theoretical maximum rate gain for an FD system is 2, reached when $SINR_{FD}=SNR_{HD}$, i.e. when the SI power is well below the Rx thermal noise floor. [4]
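As a numerical illustration of equation (2.2), the sketch below evaluates the rate gain $k$ for a given HD SNR and a given level of residual SI. It is a simplified model, not taken from the thesis: it assumes the desired signal power and noise power are identical in the FD and HD cases, so that $SINR_{FD}=SNR_{HD}/(1+INR)$, where $INR$ (an illustrative symbol) is the residual SI power relative to the noise floor.

```python
import math
from typing import Optional


def rate_gain(snr_hd_db: float, inr_db: Optional[float] = None) -> float:
    """Rate gain k of FD over HD according to equation (2.2).

    snr_hd_db : SNR of the HD reference link in dB.
    inr_db    : residual SI power relative to the noise floor in dB,
                or None for perfectly cancelled SI (assumed model).
    """
    snr = 10 ** (snr_hd_db / 10)
    inr = 0.0 if inr_db is None else 10 ** (inr_db / 10)
    sinr_fd = snr / (1 + inr)  # SI adds to the noise seen by the Rx
    return 2 * math.log2(1 + sinr_fd) / math.log2(1 + snr)


if __name__ == "__main__":
    for inr_db in (None, -10.0, 0.0, 10.0, 20.0):
        label = "no SI" if inr_db is None else f"INR = {inr_db:+.0f} dB"
        print(f"{label:>14}: k = {rate_gain(20.0, inr_db):.2f}")
```

In this model $k$ drops below 1 once the residual SI is strong enough, meaning FD then performs worse than HD; that boundary delimits the rate gain region.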

### 2.4 Different SIC-Techniques

The total SIC needed for FD operation is usually more than $100 \mathrm{~dB}$ [3], [6], [7]. To achieve this, SIC techniques in the antenna, analog and digital domains all need to be used together. The different domains of the transceiver where SIC can be used are illustrated in figure 2.3. This chapter gives a brief summary of SIC in each domain in order to provide an overview of the entire FD system.


Figure 2.3: The three domains of the transceiver where SIC can be performed [5]


Figure 2.4: Overview of the SIC techniques in a tree diagram [5]

Within each domain there are many different approaches for SIC. An overview of the different techniques within each domain is shown in figure 2.4. In the following subsections, a brief summary of the SIC techniques within each domain is presented [5].

### 2.4.1 Antenna domain

In the antenna domain, a main factor for SIC is whether the transceiver uses a shared antenna for both Tx and Rx or if they are separated. For use of a shared antenna, the only way of separating the Tx and $R x$ is by use of an antenna interface which usually is a circulator [8] or an EBD [9].

When separated antennas are used, there are many ways of suppressing the Tx signal from entering the Rx antenna. The most basic method is to simply increase the physical distance between the antennas [10]. Another method is to use Rx and Tx antennas operating on orthogonal polarizations, since different polarizations do not couple [11]. If multiple Tx and Rx antennas are used, beamforming in the antenna domain can be used to steer the Tx and Rx beams away from each other [12].

### 2.4.2 Analog domain

For analog SIC, the Tx signal from the output of the power amplifier (PA) is injected into the analog domain of the Rx chain after being modified to match the SI. By recreating the SI using time-domain or frequency-domain approaches and inverting it, the recreated SI will cancel out the original SI when injected into the Rx. For frequency-domain approaches, tunable band-pass filters are used to modify the center frequency, phase response and quality factor to match the SI response [13]. For the time-domain approach, true-time delays are used to match the delay from the propagation via the interference paths [14].

The modified Tx signal can be injected at either radio frequency (RF) [15], intermediate frequency (IF) [16] or baseband frequency (BB) [5], which all have different advantages and disadvantages. For RF injection, both pre-LNA [15] and post-LNA [14] injection are possible. The analog SIC techniques can also be divided into fixed and adaptive SIC. Fixed SIC does not use any feedback and targets the DP and AR interference, which do not change over time [17]. Adaptive SIC is able to change its tuning according to changes in the surroundings using a feedback control circuit and can therefore target the dynamic RP interference [15].

Analog SIC can also be digitally assisted by using digitally implemented canceller taps. This allows for a significantly higher number of taps, which can improve canceller flexibility [18].

### 2.4.3 Digital domain

In the digital domain, a common SIC method is channel modelling, where the SI is reconstructed using the knowledge of the Tx signal. After correct filtering of the Tx signal it can be combined with the received signal after the analog-to-digital converter (ADC) to provide further interference suppression. There are multiple channel modeling techniques divided into linear[19] and non-linear methods[20].

Apart from digital channel modelling, one can also use beamforming in the digital domain instead of the antenna domain. By use of digital signal processing, the beams of the Tx and Rx antennas can be steered away from each other and isolate the $R x$ antennas from the Tx signal. [21]

### 2.5 System Specifications

There are many different types of radio transceivers, and an FD system has to be designed individually for each of them. Therefore, a set of assumptions is made for the radio system and for the rest of the FD system. The system specifications are shown in table 2.1 and are chosen for a general radio transceiver.

Table 2.1: System specifications

| Parameter | Value |
| :--- | :--- |
| Transmitter Power | 23 dBm |
| Carrier Frequency | 10 GHz |
| Operational Bandwidth | 100 MHz |
| Antenna Isolation | 60 dB |
| Digital Cancellation | 30 dB |
| Analog Baseband Cancellation | 15 dB |

It is not crucial for these specifications to be founded on strong scientific research, since they serve as a benchmark for the specific full-duplex system designed in this report. It is, however, important that the values are realistic in order for the system to be a viable option for real-life applications. The values are therefore chosen to be representative of modern mobile telecommunication systems. A main take-away is the carrier frequency of 10 GHz, which is higher than what most other full-duplex research aims at [22].

## Considerations for RF SIC

To ensure proper design of the TTD, the effect of non-idealities needs to be considered when looking at the implementation of the SIC system as a whole. In this chapter, system considerations such as Rx specifications, total cancellation, delay resolution, injection points and attenuation are discussed together with non-idealities such as noise and distortion. A sub-chapter covering the required cancellation and delay resolution is also included. The chapter concludes with a section introducing a proposed hybrid topology, in which a pre-LNA SIC is implemented with a passive TTD circuit and a post-LNA SIC is implemented with an active TTD circuit.

### 3.1 System Hardware with ideal sub-blocks

To understand and calculate performance degradation of the Rx depending on SIC parameters, a general FD transceiver block schematic was used. The transceiver utilizes two SIC paths, injecting both pre- and post-LNA. Each SIC is modeled as a delay cell in series with two variable attenuators. The attenuators are assumed to have infinite linearity and a noise figure equal to their attenuation (passive components). The Rx and Tx antennas are considered to be ideal split panel antennas and isolated by 60 dB. This system is presented in figure 3.1 along with an illustration of the different signals and non-idealities.


Figure 3.1: Overview of the transceiver topology and the different signal levels

Starting at the $R x$ antenna, the received signal will include the interfering $T x$ signal and distortions from the PA, i.e. the SI - both attenuated by the antenna isolation and path losses. Before the LNA, the first SIC block will cancel some of the SI signal. However, the SIC block will also inject noise ( $N_{p r e}$ ) and nonlinearities $\left(I M D_{\text {pre }}\right)$ created within the block itself and these non-idealities will not be cancelled since they are not correlated to the Tx non-linearities nor the Tx noise received by the Rx antenna.

After the LNA, an additional SIC block injects a second cancellation signal with its non-idealities ( $N_{\text {post }}$ and $I M D_{\text {post }}$ ). The signal then passes through the BB processing, where additional analog cancellation takes place in baseband SIC blocks (not shown in figure 3.1). This may occur prior to the ADC, but also after the ADC in the digital domain. Finally, at the output of the Rx, the interfering Tx signal has gone through the Rx signal chain and has been suppressed by numerous cancellation stages, but at the cost of an increased Rx noise figure, which reduces the Rx sensitivity. Note that it is assumed in this analysis that the combiner element of the SIC does not add any losses or noise to the system. The Rx specifications, presented in table 3.1, were chosen to reflect an Rx operating at 10 GHz in a typical cellular application.

Table 3.1: Rx Specifications

| Block | Gain [dB] | NF $[\mathrm{dB}]$ | OIP3 [dBm] |
| :--- | :--- | :--- | :--- |
| LNA | 20 | 3 | 10 |
| Mixer | -10 | 10 | 5 |
| BB Amplifier | 30 | 6 | 10 |
| ADC | 0 | 20 | 10.5 |

### 3.2 Noise Figure degradation from noise injection and non-linear distortions

Even if it were possible to construct a SIC block that fully cancels the SI signal, this SIC should ideally not degrade the sensitivity of the receiver in any way. However, this is not the case, due to the injection of noise and non-linear distortion, such as Third-order Intermodulation Distortion (IMD3), from the SIC into the Rx chain. Surveying the research within FD cancellation circuits shows that a sub 2 dB noise figure (NF) degradation of the Rx chain is a reasonable benchmark to aim for [23][24][25][26]. The degradation in NF of the Rx depends on the total noise power generated within the SIC itself. This noise power consists of amplified or attenuated thermal noise from the input source and additional noise introduced within the SIC. The noise figure is given by [27]:

$$
\begin{equation*}
N F=S N R_{i n[\mathrm{~dB}]}-S N R_{\text {out }[\mathrm{dB}]}=\left(S_{i n[\mathrm{~dB}]}-S_{\text {out }[\mathrm{dB}]}\right)+\left(N_{\text {out }[\mathrm{dB}]}-N_{i n[\mathrm{~dB}]}\right) \tag{3.1}
\end{equation*}
$$

If the input source and output load are considered to have the same noise temperature, then the only factor which determines the total degradation of the output noise floor is the additional noise created within the SIC. For circuits where the additional noise is zero, for example a purely passive circuit, the total noise power at the output is limited by the thermal noise, and is therefore not correlated with the NF of the circuit itself. To quantify the noise injection, the total output noise of the SIC was calculated and added to the noise floor of the receiver at the injection point, as shown in figure 3.2. The injected noise figure $N F_{i n j}$ gives the actual NF related to the Rx input and is used when calculating the Rx link budget.


Figure 3.2: Noise injection from the SIC into the Rx

The non-linear distortions are tackled by ensuring that the linearity of the SIC chain is high enough to keep the injected IMD3 distortion at least 15 dB below the noise floor at the point of injection. The calculation below shows that this margin degrades an arbitrary noise floor level $N$ by only 0.135 dB:

$$
\begin{gathered}
N_{\text {tot }}=10 \cdot \log \left(10^{N / 10}+10^{(N-15 \mathrm{~dB}) / 10}\right)=10 \cdot \log \left(10^{N / 10}\left(1+10^{-15 \mathrm{~dB} / 10}\right)\right)= \\
=N+10 \cdot \log \left(1+10^{-15 \mathrm{~dB} / 10}\right)=N+0.135
\end{gathered}
$$
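
The same power summation can be checked numerically. The sketch below adds uncorrelated powers expressed in dB and evaluates the rise of the noise floor for a few margins. The function names are hypothetical and chosen for illustration only.

```python
import math


def add_powers_db(*levels_db: float) -> float:
    """Sum uncorrelated powers given in dB (or dBm); result is in the same unit."""
    return 10 * math.log10(sum(10 ** (p / 10) for p in levels_db))


def floor_degradation_db(margin_db: float) -> float:
    """Rise of a noise floor when a distortion product sits margin_db below it."""
    # The floor is normalized to 0 dB, so the result is the increase in dB.
    return add_powers_db(0.0, -margin_db)


if __name__ == "__main__":
    for margin in (5, 10, 15, 20):
        print(f"{margin:2d} dB margin: floor rises by {floor_degradation_db(margin):.3f} dB")
```

For a 15 dB margin the function reproduces the 0.135 dB degradation derived above, and sweeping the margin shows how quickly the penalty grows when the margin is reduced.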

It is noted that other non-linear distortions, apart from IMD3, are also created within the SIC. Even-order intermodulation products are created as well, but can be suppressed by using a differential design. Harmonic distortion from the Tx signal is not a concern for this system, since these frequencies will be far above the channel band. Depending on the implementation of the delay circuit, spurious tones may be injected into the Rx. If this is the case, the delay cells need to be well designed, since the spurious tones could also degrade the sensitivity of the Rx.

### 3.3 Attenuation Balance and Injection Points

The model of the SIC block, as seen in figure 3.3, consists of a TTD between two variable attenuators. The total attenuation of the SIC block should match the combined attenuation from the fixed antenna isolation and the dynamic multipath propagation loss. The distribution of the attenuation between the input Tx side and the output Rx side is very important, since it affects the impact of the non-linearities.


Figure 3.3: Simplified model of an SIC block

If all attenuation is distributed towards the Tx, the TTD sees a much smaller input signal power and the linearity requirement is eased. However, if all attenuation happens before the delay, the noise injection into the Rx is increased since the noise is not attenuated at the end of the SIC block. Note that this is not the case for delay blocks that are limited by thermal noise, i.e. passive delay circuits, since these do not increase the noise above the thermal limit. The opposite is true when most of the attenuation is placed towards the Rx: the linearity requirement of the SIC is higher but the noise performance is better. This trade-off is shown in figure 3.4, for which the TTD circuit was given an arbitrary specification. As the attenuation is placed towards the Rx, the injected noise power is low while the non-linear IMD3 products increase. Placing the attenuation towards the Tx side results in higher noise power injection but a relaxed linearity requirement. The calculations used to generate figure 3.4 and all further graphs in this sub-section are described in the appendix, section A.1. As mentioned before, the attenuation blocks are assumed to be ideal passive attenuators with an NF equal to their attenuation. The extraction of the Tx signal and the injection of the SIC output signal into the Rx are assumed to be ideal with no losses.


Figure 3.4: The trade-off in attenuation distribution for the SIC. Both power levels in this figure are output referred.

Another design aspect is the choice of injection point. For pre-LNA injection, anything added to the signal path before the LNA will heavily impact the Rx NF and is therefore required to have a low noise power level. This can be achieved with low-noise circuits, or with highly linear circuits, which allow most of the attenuation to be shifted towards the Rx side and thus attenuate the noise. Post-LNA injection is less sensitive to noise, since the signal is already amplified by the gain of the LNA. This behavior can be seen in figures 3.5 and 3.6 below. The figures show two contour plots of the NF degradation in the Rx, for pre- and post-LNA injection, as a function of the noise figure of the delay and the attenuation distribution, at a Tx output power of 23 dBm. The thick, black lines indicate the 2 dB noise figure degradation target for both cases, and it is evident that the more attenuation is shifted to the input, the more stringent the NF requirements of the delay become.


Figure 3.5: Contour line of NF degradation for pre-LNA injection.


Figure 3.6: Contour line of NF degradation for post-LNA injection.

In the case of linearity, the requirements for the pre-LNA injection point are less stringent than those for the post-LNA injection point. The reason for this is that the post-LNA SIC has less total attenuation due to the gain of the LNA. In section 3.2, a goal was introduced for linearity, in which the IMD3 products were to be suppressed to 15 dB below the noise floor. For the delay block parameters, this can be translated into an Output-referred Third-Order Intercept Point (OIP3). By sweeping the OIP3 of the delay and the attenuation balance, the plots in figures 3.7 and 3.8 were created. The figures show contour plots of the IMD3 power levels normalized to the noise floor. The OIP3 is swept on the horizontal axis and the attenuation balance on the vertical axis.


Figure 3.7: Contour line of IMD3 power levels normalized to the noise floor for pre-LNA injection.


Figure 3.8: Contour line of IMD3 power levels normalized to the noise floor for post-LNA injection.

From figures 3.5 and 3.6, a certain NF of the TTD circuit gives an upper limit for the attenuation fraction towards the Tx, based on the crossing point of the 2 dB NF degradation line. If instead figures 3.7 and 3.8 are analysed, a lower limit for the attenuation fraction towards the Tx is given, based on the crossing point of the line where the IMD3 is 15 dB below the noise floor.

In the case of the pre-LNA SIC, the attenuation balance ought to be shifted towards the Rx side to alleviate the required NF of the delay block. However, this shift would put greater requirements on the linearity of the delay cell. When considering the fact that the noise and linearity requirement can be interchanged like this, it is obvious that pre-LNA injection puts greater demands on the delay block compared to post-LNA. In the post-LNA SIC, the linearity completely dominates the non-idealities while the NF degradation due to noise injection is very small even for a large NF of the TTD circuit. This implies that the trade-off in attenuation balance is less severe and that most of the attenuation ought to be biased towards the Tx.

To further show the difference between pre- and post-LNA injection, consider a TTD circuit with an NF of 20 dB. For the pre-LNA injection case, this means that the maximum fraction of attenuation shifted towards the Tx is $60 \%$ (figure 3.5), which in turn requires an OIP3 of at least 20 dBm (figure 3.7). In the post-LNA case in figure 3.6, an NF of 20 dB would not result in any unacceptable NF degradation regardless of attenuation balance. In other words, all of the attenuation should be shifted towards the Tx, which gives a required OIP3 of 10 dBm from figure 3.8. From this example it is evident that even though the linearity requirements are more stringent for post-LNA injection, the combined effect of the noise and linearity requirements for pre-LNA injection makes it overall harder to design delay blocks for the pre-LNA SIC.

### 3.4 Cancellation Required in the analog RF domain

As discussed previously, a fundamental prerequisite of any FD system is the need to suppress the interference coming from the Tx. The additional cancellation from the digital domain, antenna domain and BB analog domain is specified in section 2.5. The aim for the SIC block in this work is to, together with the cancellation from the other domains, suppress the interfering Tx signal to 15 dB below the noise floor of the receiver. In figure 3.1, one can follow the Tx interference signal throughout the Rx chain to arrive at the following expression for the required cancellation:

$$
\begin{gathered}
P_{T x}-A n t_{I S O}+G_{R x}-\left(S I C_{\text {pre }}+S I C_{\text {post }}+S I C_{B B}+S I C_{\text {Digital }}\right)<N_{\text {Floor Rx }}-15 \mathrm{~dB}
\end{gathered}
$$

where $G_{R x}$ is the cascaded gain of the Rx chain, $N_{\text {Floor Rx }}$ is the noise floor power level after the ADC, and $S I C_{\text {pre }}$, $S I C_{\text {post }}$, $S I C_{B B}$ and $S I C_{\text {Digital }}$ are the different sources of cancellation. With the specification for the Rx chain presented in table 3.1, the cascaded gain is equal to 40 dB and the cascaded NF is 3.8 dB. The input noise floor level is -93.8 dBm for a 100 MHz bandwidth according to eq. A.1. Thus, the total noise power at the output of the Rx is:

$$
\begin{gathered}
N_{\text {Floor Rx }}=-113.8[\mathrm{dBm}]+10 \log (B W[\mathrm{MHz}])+G_{R x}+N F_{R x}= \\
\quad-113.8 \mathrm{dBm}+10 \log (100 \mathrm{MHz})+40 \mathrm{~dB}+3.8 \mathrm{~dB}=-50 \mathrm{dBm}
\end{gathered}
$$

Using the digital, analog BB cancellation and antenna isolation specified in table 2.1 and the noise floor calculated above, the minimum required cancellation in the analog RF domain $\left(\right.$ SIC $_{\text {post }}+$ SIC $\left._{\text {pre }}\right)$ is:

$$
\begin{gathered}
S I C_{\text {post }}+S I C_{\text {pre }}=P_{T x}-A n t_{I S O}+G_{R x}-\left(S I C_{B B}+S I C_{\text {Digital }}\right)-N_{\text {Floor Rx }}+15 \mathrm{~dB}= \\
=23 \mathrm{dBm}-60 \mathrm{~dB}+40 \mathrm{~dB}-(15 \mathrm{~dB}+30 \mathrm{~dB})-(-50 \mathrm{dBm})+15 \mathrm{~dB}=23 \mathrm{~dB}
\end{gathered}
$$
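
The cancellation budget is simple bookkeeping in dB and can be written as a small helper, which makes it straightforward to repeat the calculation for other system specifications. The function name and argument names are hypothetical. The default values passed below are those of table 2.1 and the noise floor calculated above.

```python
def required_rf_sic_db(p_tx_dbm, ant_iso_db, g_rx_db, sic_bb_db, sic_digital_db,
                       noise_floor_dbm, margin_db=15.0):
    """Minimum combined pre- and post-LNA RF cancellation in dB."""
    # SI level at the Rx output when no RF cancellation is applied
    residual_dbm = p_tx_dbm - ant_iso_db + g_rx_db - sic_bb_db - sic_digital_db
    # Target level: margin_db below the Rx output noise floor
    target_dbm = noise_floor_dbm - margin_db
    return residual_dbm - target_dbm


if __name__ == "__main__":
    needed = required_rf_sic_db(p_tx_dbm=23, ant_iso_db=60, g_rx_db=40,
                                sic_bb_db=15, sic_digital_db=30,
                                noise_floor_dbm=-50)
    print(f"required RF SIC: {needed:.0f} dB")
```

With the values of this chapter the helper returns the 23 dB obtained above. Each additional dB of antenna isolation or digital cancellation reduces the RF requirement by the same amount.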

Note that the noise floor used in this calculation assumes that the SIC is not introduced into the system or that it is noise-less. In reality, when the SIC is added to the $R x$, the noise floor will increase which will relax the cancellation requirement but decrease the achieved SINR. Further, this is the required RF cancellation for suppressing DP SI since this interferer is only attenuated by the antenna isolation. The RP SI is however attenuated further because of multipath propagation losses.

### 3.5 Cancellation vs Delay and Amplitude Error

All self-interfering signals will have some random delay. If the goal is to cancel all of these interfering signals by 23 dB, then there is a minimum delay resolution that can achieve this. To cancel an interfering signal successfully, the SIC block needs to match its delay down to some small error. Otherwise, the delay mismatch will cause the cancellation to be very weak, since the signals overlap very little in time. To calculate the delay resolution needed for the above specified cancellation, a simple test bench was used. The channel was simulated by a transmission line, tuned to the wavelength at 10 GHz. At the output, an SIC signal was injected, which could be tuned by some delay error and amplitude error. The achieved cancellation at 10 GHz was simulated for different delay and amplitude errors. The results are presented in figure 3.9.


Figure 3.9: Simulated contour plot of cancellation versus amplitude error and delay error

The maximum delay error that achieves 23 dB of cancellation is around 1 ps for amplitude errors up to 0.5 dB. However, the error does not directly correspond to the resolution. The worst-case scenario occurs when the delay of an interfering signal is exactly in the middle between two delay tuning steps. The distance between these two steps is the resolution. Thus, the required resolution, i.e. the largest allowed tuning step, is twice the maximum error, which in this case is 2 ps.
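
The relation between tuning step and worst-case cancellation can be explored with a simplified single-tone model. The sketch below is not the transmission-line test bench used for figure 3.9: it assumes that the SI and the cancellation signal are ideal 10 GHz tones that differ only by a delay error and an amplitude error, so its numbers should only be expected to be of the same order as the simulated contours. The function names are hypothetical.

```python
import cmath
import math

F_CARRIER_HZ = 10e9  # carrier frequency from table 2.1


def cancellation_db(delay_err_s: float, amp_err_db: float,
                    freq_hz: float = F_CARRIER_HZ) -> float:
    """Single-tone cancellation for a given delay and amplitude error."""
    amp = 10 ** (amp_err_db / 20)
    phase = 2 * math.pi * freq_hz * delay_err_s
    # Residual after subtracting the mismatched replica from a unit-amplitude SI
    residual = abs(1 - amp * cmath.exp(-1j * phase))
    if residual == 0:
        return math.inf  # perfect match, complete cancellation
    return -20 * math.log10(residual)


def worst_case_cancellation_db(resolution_s: float, amp_err_db: float = 0.0) -> float:
    """Worst case for a tuning step: the SI delay lies midway between two steps."""
    return cancellation_db(resolution_s / 2, amp_err_db)


if __name__ == "__main__":
    for step_ps in (1, 2, 4):
        c = worst_case_cancellation_db(step_ps * 1e-12)
        print(f"{step_ps} ps step: worst-case cancellation {c:.1f} dB")
```

In this idealized model, a 2 ps tuning step with perfect amplitude matching leaves a worst-case delay error of 1 ps and a cancellation of about 24 dB at the carrier, and doubling the step costs about 6 dB.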

### 3.6 Proposed Hybrid Topology

An SIC circuit can be implemented using either passive or active TTD, which have different advantages and disadvantages. It is possible for active TTD topologies to generate long delays [28], which is more difficult for passive topologies without introducing too much loss or using too much chip-area. Unlike an SIC built with active components, a purely passive SIC will not inject any additional noise, apart from the thermal noise, into the Rx chain as discussed in section 3.2. Furthermore, the linearity of an active SIC is limited by the rail-to-rail voltage swing, while the passive components are completely linear until the circuit breaks down.

For a split antenna, as considered in this report, the DP interference has high signal power and short delays of up to $15 \lambda$. Because of its high power, the DP SI will desensitize the LNA and distort the signal if not cancelled beforehand. The RP interference, on the other hand, consists of multi-path reflections, which have lower signal power and longer delays of up to several tens of $\lambda$. Thus, a pre-LNA SIC must cancel the short-delay DP SI, but not necessarily the longer-delay RP SI, which can instead be cancelled post-LNA.

It can thus be concluded that the active SIC should be placed post-LNA in order to alleviate noise and linearity requirements, and should target the RP SI. In order to cancel the DP SI and avoid saturation of the LNA, the passive SIC is placed pre-LNA. This is possible since the passive SIC generates no additional noise and has no linearity limitations due to rail-to-rail voltage swing. In this report, the authors propose a hybrid technique which uses both passive and active TTD elements to achieve the full RF SIC block.

## True Time Delay Circuits

This chapter is split into two parts, where passive and active TTD generation are covered separately. Active delays have low linearity and high noise figure, which can limit their usage for pre-LNA injection. However, active delays are also known for achieving longer delays in smaller silicon area compared to passive delays. The low noise and high linearity performance of passive TTD circuits are better suited for pre-LNA injection. The combination of the active and passive delays can be required in FD SIC systems. This chapter presents common topologies of analog active and passive TTD circuits along with their limitations.

### 4.1 Passive TTD Circuits

The sub-chapters on passive TTD circuits are divided into two parts. The first part describes how to generate a delay using only passive circuit elements and covers known circuits that could be used in a TTD system. There are many ways of generating a delay using passive elements, and they are all based on utilizing the negative phase shift from some sort of filter. A negative phase shift translates into a delay time that may be independent of the operating frequency. The two topologies for creating a negative phase shift covered in this report are the lattice filter and the lumped transmission line filter. Within the field of lattice filters, three types of filters are presented. For both topologies, the TTD circuit is designed differentially for lower electromagnetic interference, lower ground noise and improved voltage handling.

The second part is about how to tune a passive TTD, and here two techniques are proposed: a binary weighted delay chain and a tunable coupled inductor. An advantage of passive TTD circuits is that the noise and non-linearity contributions are very low. However, to tune the delay time, some sort of switching mechanism is needed, which introduces more loss and non-linearity into the SIC system. Therefore, an important aim for the tuning methods is to avoid the need for switches in the signal path.

### 4.1.1 Delay generation

### 4.1.1.1 Lattice All-Pass Filters

A lattice filter, also called an X-section, is a differential all-pass filter formed by crossing two parallel impedances between the two differential paths, as shown in figure 4.1. This filter is symmetric, meaning the input and output are reciprocal [29].


Figure 4.1: General lattice all-pass filter

The impedance for the lattice filter $Z_{l a t}$ is given by eq.4.1. [29]

$$
\begin{equation*}
Z_{l a t}=\sqrt{Z \cdot Z^{\prime}} \tag{4.1}
\end{equation*}
$$

The crossed elements $Z^{\prime}$ are called the lattice elements, and they should be designed to be the dual of the series element $Z$ with respect to the characteristic impedance $Z_{0}$ according to eq.4.2. The lattice and series elements can be implemented as either resistances, capacitances or inductances.[29]

$$
\begin{equation*}
\frac{Z}{Z_{0}}=\frac{Z_{0}}{Z^{\prime}} \tag{4.2}
\end{equation*}
$$

If eq.4.2 is fulfilled, the circuit will have the same impedance for all frequencies. Since the impedance is the same for all frequencies, the filter will also have the same attenuation for all frequencies, making it an all-pass filter. The transfer function for the lattice filter is given by eq.4.3.[29]

$$
\begin{equation*}
H(j \omega)=\frac{Z_{\text {lat }}-Z}{Z_{\text {lat }}+Z} \tag{4.3}
\end{equation*}
$$

A low-in-phase lattice filter has a negative phase shift similar to a low-pass filter as shown in figure 4.2 and a high-in-phase lattice filter has a positive phase shift similar to a high-pass filter as shown in figure 4.3.[29]


Figure 4.2: Phase response of a lattice low-in-phase filter


Figure 4.3: Phase response of a lattice high-in-phase filter

The phase shift for a low-in-phase $\phi_{\text {low }}$ and a high-in-phase $\phi_{\text {high }}$ lattice filter is given in eq.4.4 and eq.4.5 respectively. The midpoint frequency $\omega_{0}$ is defined as the frequency where the absolute value of the phase shift is $90^{\circ}$.[29]

$$
\begin{align*}
& \phi_{\text {low }}=-2 \arctan \left(\frac{\omega}{\omega_{0}}\right)  \tag{4.4}\\
& \phi_{\text {high }}=2 \arctan \left(\frac{\omega_{0}}{\omega}\right) \tag{4.5}
\end{align*}
$$
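
To relate the phase shift of eq. 4.4 to a delay, the sketch below evaluates the phase and the group delay $\tau=-d \phi / d \omega$ of a low-in-phase lattice filter. The group-delay expression is obtained by differentiating eq. 4.4 and is not stated in [29] as quoted here. The midpoint frequency of 10 GHz is an assumed example value.

```python
import math


def phase_low_deg(f_hz: float, f0_hz: float) -> float:
    """Phase shift of a low-in-phase lattice filter, eq. 4.4, in degrees."""
    return math.degrees(-2 * math.atan(f_hz / f0_hz))


def group_delay_s(f_hz: float, f0_hz: float) -> float:
    """Group delay -d(phi)/d(omega) obtained by differentiating eq. 4.4."""
    w0 = 2 * math.pi * f0_hz
    return 2 / (w0 * (1 + (f_hz / f0_hz) ** 2))


if __name__ == "__main__":
    f0 = 10e9  # assumed midpoint frequency
    for f in (1e9, 5e9, 10e9, 20e9):
        print(f"{f / 1e9:4.0f} GHz: phase {phase_low_deg(f, f0):7.1f} deg, "
              f"group delay {group_delay_s(f, f0) * 1e12:5.1f} ps")
```

At the midpoint frequency the phase is $-90^{\circ}$ and the group delay equals $1 / \omega_{0}$, roughly 15.9 ps for a 10 GHz midpoint. The delay is only approximately constant well below $\omega_{0}$, which limits the delay obtainable from a single section.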

For implementing lattice filters on chip, the layout would be a problem since the wiring of the lattice elements would have to be crossed. This means that extra metal layers would have to be used for the crossing which could cause extra parasitic capacitances. The design would also be difficult to make symmetrical, which causes imbalance of the design. An advantage of this circuit is the all-pass behavior, which reduces losses and thus minimizes the needed tuning span for a tunable attenuator.

For generating a delay the phase shift should be negative, and therefore this report covers the different types of low-in-phase lattice filters. There are different types of low-in-phase lattice filters which use different combinations of circuit elements and have slightly different properties. Three different types are presented here; the $R C-C R$, the $L R-R L$ and the $L C-C L$.

The $R C-C R$ lattice all-pass filter uses resistances in series and capacitances as lattice elements as shown in figure 4.4.


Figure 4.4: $R C-C R$ lattice filter

The midpoint frequency of the $R C-C R$ filter is given by

$$
\begin{equation*}
\omega_{0}=\frac{1}{R C} \tag{4.6}
\end{equation*}
$$

This means that if the filter is to be designed for a negative $90^{\circ}$ phase shift and a characteristic impedance of $Z_{0}$, the resistance $R$ and capacitance $C$ should be designed to fulfill

$$
\begin{equation*}
\frac{R}{Z_{0}}=\omega_{0} C Z_{0} \quad \& \quad R C=\frac{1}{\omega_{0}} \tag{4.7}
\end{equation*}
$$

The $R C-C R$ filter uses resistances in the signal path, which introduce more loss to the SIC. This could be a problem when long delays are to be achieved by cascading this lattice filter.

The $L R-R L$ lattice filter uses inductances in series and resistances as lattice elements as shown in figure 4.5.


Figure 4.5: $L R-R L$ lattice filter

The midpoint frequency of the $L R-R L$ filter is given by

$$
\begin{equation*}
\omega_{0}=\frac{R}{L} \tag{4.8}
\end{equation*}
$$

For the $L R-R L$ filter to be designed for a negative $90^{\circ}$ phase shift and a characteristic impedance of $Z_{0}$, the resistance $R$ and inductance $L$ should be designed to fulfill

$$
\begin{equation*}
\frac{Z_{0}}{R}=\frac{\omega_{0} L}{Z_{0}} \quad \& \quad \omega_{0}=\frac{R}{L} \tag{4.9}
\end{equation*}
$$

This filter also uses resistances, which introduce loss to the SIC in the same way as for the $R C-C R$ filter.

An advantage of the $L R-R L$ filter is that it uses inductors, which can be coupled with a secondary inductor, i.e. a transformer. The effective inductance of the primary inductor can then be tuned by attaching tunable capacitors on the secondary circuit. This way the delay time of the $L R-R L$ filter can be tuned without switches in the signal path. More about the tunable coupled inductor is presented in section 4.1.2.2.

The $L C-C L$ lattice filter uses inductances in series and capacitances as lattice elements as shown in figure 4.6.


Figure 4.6: $L C-C L$ lattice filter

The midpoint frequency of the $L C-C L$ filter is given by

$$
\begin{equation*}
\omega_{0}=\frac{1}{\sqrt{L C}} \tag{4.10}
\end{equation*}
$$

For the $L C-C L$ filter to be designed for a negative $90^{\circ}$ phase shift and a characteristic impedance of $Z_{0}$, the capacitance $C$ and inductance $L$ should be designed to fulfill

$$
\begin{equation*}
\omega_{0} C Z_{0}=\frac{\omega_{0} L}{Z_{0}} \quad \& \quad \omega_{0}=\frac{1}{\sqrt{L C}} \tag{4.11}
\end{equation*}
$$

The $L C-C L$ filter has lower loss than the $R C-C R$ and $L R-R L$ filters since it does not use any resistances.
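
Solving the design conditions in eq. 4.7, eq. 4.9 and eq. 4.11 for the component values gives the same result for all three filter types: $R=Z_{0}$, $L=Z_{0} / \omega_{0}$ and $C=1 /\left(Z_{0} \omega_{0}\right)$. The sketch below evaluates these expressions. The midpoint frequency of 10 GHz and the characteristic impedance of $50 \Omega$ are assumed example values, not design values from this report.

```python
import math


def lattice_values(f0_hz: float, z0_ohm: float) -> dict:
    """Element values satisfying eq. 4.7, 4.9 and 4.11 for a given f0 and Z0."""
    w0 = 2 * math.pi * f0_hz
    return {
        "R": z0_ohm,             # resistance of the RC-CR and LR-RL filters
        "L": z0_ohm / w0,        # inductance of the LR-RL and LC-CL filters
        "C": 1 / (z0_ohm * w0),  # capacitance of the RC-CR and LC-CL filters
    }


if __name__ == "__main__":
    values = lattice_values(f0_hz=10e9, z0_ohm=50)  # assumed example values
    print(f"R = {values['R']:.0f} ohm, "
          f"L = {values['L'] * 1e9:.2f} nH, "
          f"C = {values['C'] * 1e12:.2f} pF")
```

For the assumed values this gives an inductance of about 0.8 nH and a capacitance of about 0.32 pF, which indicates the order of magnitude of the on-chip elements needed at 10 GHz.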

### 4.1.1.2 Lumped $L C$ transmission line

A differential lumped LC transmission line consists of two series inductors and two parallel capacitances as shown in figure 4.7. This circuit is symmetric and reciprocal, meaning it has the same behavior in both directions.


Figure 4.7: Differential lumped LC transmission line

The lumped $L C$ transmission line behaves as a second order low-pass filter, which attenuates high frequencies and generates a negative phase shift as shown in the Bode diagram in figure 4.8.[30]


Figure 4.8: Bode diagram of a second order low-pass filter for different quality factors

At the angular resonance frequency the amplitude starts to decrease by $40 \mathrm{~dB} / \mathrm{dec}$, and this frequency is therefore referred to as the angular cut-off frequency $\omega_{c}$, given in eq.4.12. The phase shift over frequency depends on the quality factor $Q$, but at the cut-off frequency the phase shift is always $-90^{\circ}$ [30].

$$
\begin{equation*}
\omega_{c}=\frac{1}{\sqrt{L C}} \tag{4.12}
\end{equation*}
$$

The impedance of the lumped $L C$ transmission line is given in eq.4.13. If matched to the characteristic impedance $Z_{0}$, the losses could be made very low since the lumped $L C$ transmission line uses no resistors.

$$
\begin{equation*}
Z=\sqrt{\frac{L}{C}} \tag{4.13}
\end{equation*}
$$

The loss of the lumped $L C$ transmission line also depends on the quality factor $Q$ of the inductors, which is equal to the absolute value of the reactance $X$ divided by the inherent resistance $R_{i}$ of the inductor coils, as shown in eq.4.14.

$$
\begin{equation*}
Q=\frac{|X|}{R_{i}}=\frac{\omega L}{R_{i}} \tag{4.14}
\end{equation*}
$$

This means that the loss is minimized by increasing the ratio between the reactance and the resistance of the inductors.
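
The relations in eq.4.12 to eq.4.14 can be evaluated together for a candidate section, as in the sketch below. All component values ($L=400$ pH, $C=160$ fF, $R_{i}=2\ \Omega$) are hypothetical and chosen only so that the impedance lands at 50 Ω. The function name is likewise an assumption.

```python
import math


def lumped_lc_properties(inductance, capacitance, r_i, f_hz):
    """Return (cut-off frequency in Hz, impedance in ohm, inductor Q at f_hz)."""
    w_c = 1 / math.sqrt(inductance * capacitance)   # eq. 4.12
    z = math.sqrt(inductance / capacitance)         # eq. 4.13
    q = 2 * math.pi * f_hz * inductance / r_i       # eq. 4.14
    return w_c / (2 * math.pi), z, q


# Hypothetical section evaluated at the 10 GHz operating frequency.
f_c, z_line, q_ind = lumped_lc_properties(400e-12, 160e-15, 2.0, 10e9)
print(f"f_c = {f_c / 1e9:.2f} GHz, Z = {z_line:.1f} ohm, Q = {q_ind:.1f}")
```

For these assumed values the cut-off frequency lies at about twice the operating frequency. Halving $R_{i}$ doubles $Q$, in line with the statement above that the loss is minimized by increasing the ratio between reactance and resistance.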

### 4.1.2 Tuning

For tuning the delay, two different methods are proposed: a binary weighted delay chain and a tunable coupled inductor. In this chapter the two methods are described without implementation, meaning that the methods could be used for different sorts of delay generation circuits. In chapter 5 these methods will be implemented on a design for a specific TTD.

### 4.1.2.1 Binary Weighted Delay Chain

In current SIC technology, a common approach for passive TTD is a finite impulse response (FIR) filter using a tapped topology as shown in figure 4.9. The signal is tapped off between each of the $N$ delay blocks, each delay block generating a delay of $T d$, and tuned in amplitude separately. Multiple signals with different delay time and different amplitudes are then combined and injected into the Rx.[6]


Figure 4.9: Equally sized delay blocks in a tapped topology

For this TTD topology, the ratio between maximum delay time $T d_{\text {max }}$ and tuning resolution $T d_{\text {res }}$ is

$$
\begin{equation*}
k_{d e l a y}=\frac{T d_{\max }}{T d_{r e s}}=\frac{N \cdot T d}{T d}=N \tag{4.15}
\end{equation*}
$$

This means that creating a high maximum delay with a fine delay resolution requires many delay blocks. Path switching and amplitude tuning are needed for every delay block, which causes high loss and high noise injection when many delay blocks are used. Therefore, a design trade-off exists between the number of filter taps and the degradation of the Rx noise figure, because more filter taps cause more noise degradation.[6]

In order to achieve long delays with fine delay resolution without causing too much noise degradation, a new topology is proposed in which binary weighted delay blocks are used, as shown in figure 4.10. For this topology, the delay times of the delay blocks are binary weighted in order to increase the ratio between maximum delay time and resolution.


Figure 4.10: Binary weighted delay chain

In order to tune the delay time using this topology, a bypass path is introduced as shown in figure 4.11. This way, each delay block can be switched on or off, where the on-mode is through the delay path and the off-mode is through the bypass path. A switching mechanism is used in both the bypass path and the delay path. A single switch in the bypass path with no switch in the delay path would not be enough, because the delay path needs to be switched off when the signal should go through the bypass path. Otherwise, part of the signal would go through the delay path and be delayed by $T d$, while the rest would go through the bypass path without any delay. This would distort the injected SIC signal and degrade the cancellation.


Figure 4.11: The delay path and bypass path of a delay block

The switch for the bypass path is implemented by stacking transistors in series in order to increase linearity and isolation, as shown in figure 4.12. The linearity is increased since the total voltage drop is distributed over multiple transistors, which makes the voltage drop over each individual transistor smaller and therefore reduces the intermodulation products. The isolation is also increased because the total off-resistance is increased. By adding large resistances between the gate and the enabling signal $E N$, the gate voltage will swing with the source voltage so that the gate-to-source voltage is maintained at the same level. Another way of improving the isolation is to add ground connected transistors as shown in figure 4.13. The ground connected transistors can be stacked in the same way as the series transistors for increased linearity and isolation.


Figure 4.12: Transistors stacked in series for increased linearity and isolation


Figure 4.13: Ground connected transistors for increased isolation

The entire delay is tuned by controlling which delay blocks are turned on and which are turned off, as shown in figure 4.14. The input word is a series of ones and zeroes, where each bit determines whether a specific delay block is turned on or off. This forms a binary number which, when multiplied by the delay of the least significant bit (LSB), in this case $T d$, equals the delay time at the output. The number of tuning levels is $2^{N}$ for $N$ delay blocks, as in any binary weighted system. The ratio between maximum delay time and delay resolution will then be

$$
\begin{equation*}
k_{\text {delay }}=\frac{T d_{\max }}{T d_{r e s}}=\frac{\left(2^{N}-1\right) \cdot T d}{T d}=2^{N}-1 \tag{4.16}
\end{equation*}
$$

which is much higher than the ratio for the tapped topology used in current technology.


Figure 4.14: Tuning of a binary weighted delay chain
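
The mapping from control word to delay can be sketched as follows, assuming that block $i$ (counted from the LSB) contributes $2^{i} \cdot T d$ when switched on and that the bypass path is ideal, i.e. adds no delay of its own. Function names and the 10 ps LSB are hypothetical.

```python
def chain_delay(control_word, td):
    """Delay of a binary weighted chain.

    control_word: sequence of 0/1 values, LSB first.
    td: delay of the LSB block (any consistent time unit).
    Assumes an ideal bypass path that contributes no delay.
    """
    return sum(bit * (2 ** index) * td for index, bit in enumerate(control_word))


def ratio_tapped(n_blocks):
    """Max delay over resolution for N equal blocks in a tapped topology."""
    return n_blocks


def ratio_binary(n_blocks):
    """Max delay over resolution for N binary weighted blocks."""
    return 2 ** n_blocks - 1


# Hypothetical example: LSB delay of 10 ps, blocks 0 and 2 switched on.
example_delay_ps = chain_delay([1, 0, 1], 10)
print(example_delay_ps, ratio_tapped(6), ratio_binary(6))
```

With six blocks the tapped topology reaches a ratio of 6, whereas the binary weighted chain reaches 63 under these assumptions. In a real implementation the bypass paths add a fixed minimum delay that this sketch ignores.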

### 4.1.2.2 Tunable Coupled Inductor

Another way of tuning a delay circuit without switches in the signal path is to tune the inductance of an inductor. This can be done using two coupled inductors and adding a capacitor in parallel with the secondary inductor. This way the reactance of the primary circuit can be tuned by varying the capacitance on the secondary circuit. The change of reactance in the primary circuit could be used to change the delay of a delay generating circuit containing one or more inductors.

Two mutually coupled inductors are shown in figure 4.15 together with a capacitor in the secondary circuit. The mutual coupling $M$ depends on the coupling coefficient $k$ and the two inductors $L_{p}$ and $L_{s}$.

$$
\begin{equation*}
M=k \sqrt{L_{p} L_{s}} \tag{4.17}
\end{equation*}
$$

Each inductor coil has an additional voltage induced by the current in the other coil equal to

$$
\begin{equation*}
V_{i n d}=\omega M I \tag{4.18}
\end{equation*}
$$



Figure 4.15: Mutually coupled inductors with a capacitor in the secondary circuit

The voltage drop $V_{p}$ around the primary circuit is

$$
\begin{equation*}
V_{p}=I_{p} j \omega L_{p}+\omega M I_{s} \tag{4.19}
\end{equation*}
$$

and the equation for the voltage around the secondary circuit is

$$
\begin{equation*}
\omega M I_{p}=I_{s} j \omega L_{s}+\frac{I_{s}}{j \omega C} \tag{4.20}
\end{equation*}
$$

which means the current through the secondary inductor is

$$
\begin{equation*}
I_{s}=\frac{\omega M I_{p}}{j \omega L_{s}+\frac{1}{j \omega C}} \tag{4.21}
\end{equation*}
$$

The voltage $V_{p}$ can then be written as

$$
\begin{equation*}
V_{p}=I_{p} j \omega L_{p}+\frac{(\omega M)^{2} I_{p}}{j \omega L_{s}+\frac{1}{j \omega C}} \tag{4.22}
\end{equation*}
$$

This means that the total reactance of the primary circuit is

$$
\begin{equation*}
X_{p}=\operatorname{Im}\left(\frac{V_{p}}{I_{p}}\right)=\omega L_{p}+\frac{\omega^{2} L_{p} L_{s} k^{2}}{\frac{1}{\omega C}-\omega L_{s}} \tag{4.23}
\end{equation*}
$$

This corresponds to an effective inductance of

$$
\begin{equation*}
L_{e f f}=\frac{X_{p}}{\omega}=L_{p}+\frac{\omega L_{p} L_{s} k^{2}}{\frac{1}{\omega C}-\omega L_{s}} \tag{4.24}
\end{equation*}
$$

This way, the tunable coupled inductor can be tuned between the maximum and minimum inductance of the tuning span, $L_{\max }$ and $L_{\text {min }}$, by tuning the capacitance between a maximum and a minimum capacitance $C_{\max }$ and $C_{\min }$. In order to find the capacitance needed for the maximum and minimum inductance, equation 4.24 is rewritten as

$$
\begin{equation*}
C=\left(\frac{\omega^{2} L_{p} L_{s} k^{2}}{L_{e f f}-L_{p}}+\omega^{2} L_{s}\right)^{-1} \tag{4.25}
\end{equation*}
$$

This, however, is the capacitance needed assuming no losses in the tunable capacitor. For accurate values of the maximum and minimum capacitance, the effective inductance would have to be simulated.
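
Eq.4.24 and eq.4.25 are inverses of each other, which can be checked numerically with the lossless sketch below. The inductor values, coupling coefficient and capacitance are hypothetical, and the function names are introduced only for this illustration.

```python
import math


def effective_inductance(f_hz, l_p, l_s, k, c):
    """Effective primary inductance according to eq. 4.24 (lossless)."""
    w = 2 * math.pi * f_hz
    return l_p + (w * l_p * l_s * k ** 2) / (1 / (w * c) - w * l_s)


def required_capacitance(f_hz, l_p, l_s, k, l_eff):
    """Secondary capacitance for a wanted L_eff according to eq. 4.25.

    Not defined for l_eff == l_p, which corresponds to C -> 0.
    """
    w = 2 * math.pi * f_hz
    return 1 / (w ** 2 * l_p * l_s * k ** 2 / (l_eff - l_p) + w ** 2 * l_s)


# Hypothetical values: 300 pH coils, k = 0.7, 200 fF on the secondary, 10 GHz.
l_eff_example = effective_inductance(10e9, 300e-12, 300e-12, 0.7, 200e-15)
c_back = required_capacitance(10e9, 300e-12, 300e-12, 0.7, l_eff_example)
print(f"L_eff = {l_eff_example * 1e12:.1f} pH, C = {c_back * 1e15:.1f} fF")
```

The chosen capacitance is kept well below the value $1 /\left(\omega^{2} L_{s}\right)$ at which the secondary circuit resonates and the denominator of eq.4.24 goes to zero. Close to that point the lossless expressions are no longer a useful approximation.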

A tunable capacitor is needed, which could be implemented using binary weighted switchable capacitors as shown in figure 4.16, where each switchable capacitor could be implemented as shown in figure 4.17. The capacitance is tuned by switching on and off these capacitors, which in turn will tune the effective inductance of the primary inductor. Each switchable capacitor can be implemented with two series capacitances $C_{S}$ and a transistor. On each side of the transistor a big resistance is connected to the inverse of the enabling signal $E N$. This way the isolation is increased when the capacitor is turned off, which is important in order to minimize the off-capacitance.


Figure 4.16: Binary weighted tunable capacitor with three bits


Figure 4.17: Switchable capacitor

The binary weighted tunable capacitor can be designed with different numbers of bits. The on- and off-capacitance of the least significant bit (LSB), $C_{B, o n}$ and $C_{B, o f f}$, depend on how many bits $N$ are used.

$$
\begin{equation*}
C_{B, o n}=\frac{C_{\max }}{2^{N}-1} \quad \& \quad C_{B, o f f}=\frac{C_{\min }}{2^{N}-1} \tag{4.26}
\end{equation*}
$$

When two equal capacitors are connected in series, the total capacitance is halved. Therefore, each of the two series capacitances $C_{S}$ should be twice the size of the on-capacitance. The off-capacitance is adjusted to the correct value by calibrating the off-capacitance of the transistor. The switchable capacitors also need to have a sufficiently large quality factor $Q$, which is given by

$$
\begin{equation*}
Q=\frac{X_{C}}{R_{o n}} \tag{4.27}
\end{equation*}
$$

The transistor for the switchable capacitors has to be designed with a small enough off-capacitance in order to achieve the desired on/off ratio but still have a small enough on-resistance to not degrade the quality factor too much. This means a trade-off occurs for the sizing of the transistor where a too wide gate will cause too much off-capacitance, but a too small gate will cause too much on-resistance.
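
The sizing of the capacitor bank can be sketched as below, assuming that the full-scale weight of an $N$-bit binary weighted bank is $2^{N}-1$ LSB units and that each bit contributes either its on- or its off-capacitance. The values of $C_{\max }$ and $C_{\min }$ and all function names are hypothetical.

```python
def lsb_capacitances(c_max, c_min, n_bits):
    """On- and off-capacitance of the LSB cell for an n_bits binary bank."""
    full_scale = 2 ** n_bits - 1
    return c_max / full_scale, c_min / full_scale


def bank_capacitance(code, n_bits, c_b_on, c_b_off):
    """Total capacitance for an integer control code (bit 0 is the LSB)."""
    total = 0.0
    for bit in range(n_bits):
        weight = 2 ** bit
        is_on = (code >> bit) & 1
        total += weight * (c_b_on if is_on else c_b_off)
    return total


def series_capacitor(c_on):
    """Each of the two series capacitors must be twice the on-capacitance."""
    return 2 * c_on


# Hypothetical three-bit bank tuned between 70 fF and 700 fF.
c_on_lsb, c_off_lsb = lsb_capacitances(700e-15, 70e-15, 3)
for code in (0, 5, 7):
    print(code, bank_capacitance(code, 3, c_on_lsb, c_off_lsb))
```

All bits off gives $C_{\min }$ and all bits on gives $C_{\max }$. The intermediate codes are not exactly linear in the code value, since the switched-off cells still contribute their off-capacitance.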

### 4.2 Active TTD Circuits

The active TTD circuits should be able to create long delays, at times delaying the signal by up to several tens of $\lambda$, over a large bandwidth with low variation over frequency. Further, since they target longer delays with less power, post-LNA is the targeted injection point. Many different active circuits are commonly used; in this report, N-path (also known as switched capacitor circuit) and $g_{m} C$ based TTD circuits are investigated. The N-path is a versatile circuit with favorable characteristics for SIC TTD operation, such as wideband delay with low delay variation and high tunability [31]. Section 4.2.1 investigates the theory of operation of the N-path and the limiting design trade-offs, and section 4.2.2 introduces the concept of time interleaving the N-path, which greatly increases the maximum delay. Finally, section 4.2.3 presents a way of branching multiple time interleaved N-path circuits to reduce chip area. Although the main focus of the report is on the N-path, $g_{m}$ based TTD circuits are another well-known type of active delay generation circuit, which is briefly discussed in section 4.2.4.

### 4.2.1 N-path Circuit

### 4.2.1.1 Principles of Operation

The unit cell of the N-path circuit is a shunt capacitor between two switches, see figure 4.18. In its simplest form, this mimics a sample and hold circuit, see figure 4.19. As the left-hand switch turns ON, the capacitor is charged and stores the voltage seen at the input. After some time, the right-hand switch turns ON and the voltage is transferred to the output. This means that the delay is determined by the time from when the input switch turns ON until the output switch turns ON.


Figure 4.18: N-path sample and hold. The next stage needs to provide high output impedance.


Figure 4.19: Simple sample and hold circuit. The input impedance of the buffer ( $Z_{\text {out }}$ ) is high to not load the sample node.

In order to reconstruct the input signal at the output, $N$ sample and reconstruct branches (called phases) are coupled in parallel, as shown in figure 4.20. In the general case with a sample clock frequency $f_{s}$ and $N$ phases, the effective sampling frequency becomes $N f_{s}$, and the number of samples per waveform is determined by the input signal frequency $f_{R F}$. As an example, the input waveform in figure 4.21, named $V_{R F}$, is sampled 8 times per waveform.


Figure 4.20: $N$ sample and reconstruct branches of an N-path circuit


Figure 4.21: Visualization of the sample and hold mechanism of an $N=8$ N-path TTD.

The delay introduced in the circuit is determined by the time delay $\tau_{d}$ between the $\Phi_{X}$ and $\Phi_{X d}$ clocks, where $X$ denotes the phase index. These clocks are generated by a common frequency generation circuit; its implementation was, however, considered to be outside the scope of this project. In simple terms, an LO (local oscillator) frequency $f_{s}$ with some duty cycle is transformed into $2 N$ phase clocks with a duty cycle of $1 / N$.

The settling time window for the sample or release event can be defined as the phase clock pulse width $\tau=1 /\left(N f_{s}\right)$. When comparing this time constant to the $R C$ constant formed by the output or input impedance and the sampling capacitance, the following regions of operation can be noted [31]:

Table 4.1: The different modes of operation for the N-path circuit.

| Region | Input RC | Output RC |
| :---: | :---: | :---: |
| Mixing | $\mathrm{RC} \gg \tau$ | $\mathrm{RC} \gg \tau$ |
| Delay | $\mathrm{RC} \approx \tau$ | $\mathrm{RC} \approx \tau$ |
| Sampling | $\mathrm{RC} \ll \tau$ | $\mathrm{RC} \gg \tau$ |

In the mixing mode, the switches can be seen as cascaded down- and up-converting mixers which introduce a narrowband phase shift at periodic intervals. The delay mode introduces wideband delay but has poor TTD performance due to large delay variation over frequency. Lastly, the sampling mode offers both wideband delay and low delay variation over frequency [31]. Thus, if implemented with TTD behavior in mind, the N-path should be designed to operate in the sampling region with the correct output and input matching. In order to provide this, two buffers are used, one at the input and one at the output (see figure 4.22) [31].


Figure 4.22: Output and input matching of the N-path.

Taking into account the switch on-resistance, the $R C$ constant design constraint can be formulated in equations as:

$$
\begin{equation*}
\left(R_{S}+R_{s w 1}\right) C \ll \tau, \quad\left(R_{L}+R_{s w 2}\right) C \gg \tau \tag{4.28}
\end{equation*}
$$

### 4.2.1.2 Parasitic Effects

The most straightforward implementation of the switch is a transistor placed in the signal path with the gate as the enable input. In figure 4.23, an NMOS ( $n$-type metal-oxide semiconductor) transistor is used as the switch to the left, and to the right the parasitic elements are extracted and the NMOS is replaced by an ideal switch. The parasitics included are the main ones that limit the circuit.


Figure 4.23: Parasitic effects of the MOSFET as a switch for the N-path circuit

When the switch is conducting in the on-state, the channel resistance of the NMOS device adds a switch resistance that affects the $R C$ constant of the charge and discharge events (as seen in eq.4.28). For the input sample event, the left-hand side transistors contribute a total of $R_{s w 1}$, while for the output reconstruction event, the right-hand side switches contribute $R_{s w 2}$. $C_{p D}$ and $C_{p S}$ are the shunt diffusion junction capacitances at the drain and source respectively. When the switch is in the off-state, a leakage capacitance $C_{o f f}$ decreases the isolation. This capacitance originates from $C_{d s}$ of the NMOS and is voltage dependent. It is shown in [31] that larger values of $C_{d s}$ (larger transistor widths) result in increased delay variation over bandwidth. The complete overview is shown in figure 4.24.

At the input of the N-path, the total shunt capacitance is the output capacitance of the input buffer $A 1$, $C_{\text {out A1 }}$, plus the total capacitance contributed by the $N$ switches, $N \cdot C_{p D 1}$. Similarly, the output node sees a total shunt capacitance of $N$ times the right-hand side output switch capacitance $C_{p D 2}$ in addition to the input capacitance of the output buffer, $C_{i n A 2}$.


Figure 4.24: Summary of the parasitic elements in the full N-path circuit.

Consider two transient sample events that follow each other in time. After the first sample, some charge is stored at the input and output nodes. Once the second sample event is initiated by switching on the first switch, residual charge left on the input node by the first sample will contribute to the charging of the second sample capacitor. This is known as charge sharing and is not a problem as long as the residual part settles much faster than the actual sample charging. To minimize the charge sharing at the input, the following constraint is introduced:

$$
\begin{equation*}
R_{S}\left(N C_{p D 1}+C_{\text {out A1 }}\right) \ll \tau \tag{4.29}
\end{equation*}
$$

At the output of the N-path the approach is different. As a consequence of the sampling nature of the N-path, the input impedance of the $A 2$ buffer is large and thus the settling time for the output node is slower. However, it is important that the sampling capacitor $C$ is able to discharge onto the output. It is therefore required that the capacitance at the output node is much smaller than $C$, which can be expressed as:

$$
\begin{equation*}
N C_{p D 2}+C_{i n A 2} \ll C \tag{4.30}
\end{equation*}
$$
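
The constraints in eq.4.28 to eq.4.30 can be collected in a simple design check, sketched below. The inequalities $\ll$ and $\gg$ are interpreted as "at least a factor `margin` apart", where the factor of 10 is an arbitrary assumption, and every component value in the example is hypothetical.

```python
def n_path_checks(n, f_s, c, r_s, r_sw1, r_l, r_sw2,
                  c_pd1, c_out_a1, c_pd2, c_in_a2, margin=10.0):
    """Evaluate eq. 4.28 to eq. 4.30 for an N-path operated in the sampling region.

    margin is the assumed ratio that counts as 'much smaller/larger'.
    """
    tau = 1.0 / (n * f_s)  # phase clock pulse width
    return {
        "input_rc": (r_s + r_sw1) * c * margin <= tau,                   # eq. 4.28, left
        "output_rc": (r_l + r_sw2) * c >= margin * tau,                  # eq. 4.28, right
        "charge_sharing": r_s * (n * c_pd1 + c_out_a1) * margin <= tau,  # eq. 4.29
        "output_loading": (n * c_pd2 + c_in_a2) * margin <= c,           # eq. 4.30
    }


# Hypothetical component values; only the number of phases differs.
common = dict(f_s=1.25e9, c=200e-15, r_s=20.0, r_sw1=10.0, r_l=100e3, r_sw2=10.0,
              c_pd1=5e-15, c_out_a1=10e-15, c_pd2=1e-15, c_in_a2=5e-15)
checks_8 = n_path_checks(n=8, **common)
checks_40 = n_path_checks(n=40, **common)
print(checks_8)
print(checks_40)
```

With these assumed values all four checks pass for $N=8$, while for $N=40$ the shorter pulse width and the larger node capacitances make the input settling, charge sharing and output loading checks fail.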

### 4.2.1.3 Design Trade-Offs

Assuming non-overlapping clocks, the maximum usable delay of an N-path filter with $N$ paths and a sample clock at $f_{s}$ is [31]:

$$
\begin{equation*}
t_{\max }=\frac{N-2}{N f_{s}} \tag{4.31}
\end{equation*}
$$

To investigate different trade-offs with respect to the number of phases and the sample clock frequency, eq.4.31 is used to produce the contour plot presented in figure 4.25. The horizontal axis shows the sample clock frequency $f_{s}$ normalized to the input frequency $f_{r f}$ and the vertical axis shows the number of phases $N$. The contours show two parameters: the black lines and coloured highlight show the achieved maximum delay, normalized to the time period of the input frequency $T_{r f}=1 / f_{r f}$, and the red contours show the relative sampling rate, given by $N f_{s} / f_{r f}$.


Figure 4.25: Contour plot of the maximum achievable delay normalized by the input signal time period $T_{r f}$. This is plotted against the number of phases $N$ on the vertical axis and the sample clock frequency $f_{s}$ normalized to $f_{r f}$ on the horizontal axis. The red curves show the sample rate in samples per waveform.

It is evident that, to achieve a large maximum delay, the sample clock frequency needs to be only a fraction of the input frequency. As an example, say that the required delay is in the order of 8 to $10 T_{r f}$. The resulting sample clock frequency would be in the range of $0.1 f_{r f}$. This implies that, for the circuit to accurately reproduce the input signal, a large number of phases is needed. In the case of 8 to $10 T_{r f}$, around 40 phases are needed to reconstruct four points of the input waveform at the output. However, if $N=40$, the capacitive loading of the input and output nodes would introduce a large amount of charge sharing (see eq.4.29 and eq.4.30) and deteriorate the TTD performance of the circuit. On the other end of the spectrum, where the sample clock frequency is only slightly lower than the input frequency, the system cannot achieve large delays.
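
The example above can be reproduced directly from eq.4.31 and the relative sampling rate $N f_{s} / f_{r f}$, as in the sketch below. The 10 GHz input frequency follows the operating frequency of this thesis, and the function names are assumptions.

```python
def n_path_max_delay(n, f_s):
    """Maximum usable delay of an N-path with non-overlapping clocks (eq. 4.31)."""
    return (n - 2) / (n * f_s)


def samples_per_waveform(n, f_s, f_rf):
    """Relative sampling rate, i.e. samples taken per period of the input."""
    return n * f_s / f_rf


f_rf = 10e9          # input frequency
f_s = 0.1 * f_rf     # sample clock at a tenth of the input frequency
n_phases = 40

delay_in_periods = n_path_max_delay(n_phases, f_s) * f_rf
rate = samples_per_waveform(n_phases, f_s, f_rf)
print(f"t_max = {delay_in_periods:.2f} T_rf, {rate:.1f} samples per waveform")
```

For $N=40$ and $f_{s}=0.1 f_{r f}$ this gives a maximum delay of $9.5 T_{r f}$ at four samples per waveform, which matches the range discussed in the text.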

### 4.2.2 Time Interleaved N-path

While the N-path can achieve relatively low delay variation over large bandwidths, high delay resolution and a large tuning range, the approach is severely limited in total achievable delay by the capacitive loading of the sample and reconstruction branches. To achieve longer delays, the sampling clock pulse needs to be wider in time to alleviate the impact of the parasitic charging and discharging events. This feature will from here on be called time expansion. But since the effective sampling frequency of the N-path is directly proportional to the clock frequency, a more innovative approach is needed. If each saved sample could be sent to a second sample and hold stage, where it is held for longer, and $M$ of these second stages are coupled in parallel, the clock frequency of these stages could effectively be lower whilst the sampling frequency would still be given by the first sampling stage as $N f_{s}$. This type of circuit was proposed in [28] and is called a time interleaved (TI) N-path (or switched capacitor circuit). In figure 4.26, such a circuit is shown, where the grey squares are the interleaved part of the circuit. As seen, each sampling stage now also includes an output buffer that isolates the sampled state from the output.


Figure 4.26: Schematic of an $N$ by $M$ TI N-path.

In figure 4.27, the clock programming for this type of circuit is shown with $N=8$ input phases and an arbitrary number $M$ of phases in the interleaved stage. As seen, the $\Psi$ clock pulse width, $\tau_{2}$, is equal to the $\Phi$ clock pulse width $\tau_{1}$ times $N$, and thus the TI stage has gone through time expansion. Assume that the circuit in figure 4.26 cycles through the $\Phi$ sampling stages. It starts at $\Phi_{1}$, and when it has gone through all the sample states down to $\Phi_{N}$, it needs to sample at $\Phi_{1}$ once again. The TI part, $\Psi_{11}$ to $\Psi_{1 M}$, effectively stores the initial $\Phi_{1}$ state, so that when the first stage needs to sample again, it is free to do so without losing the information. Notice the indexing here: the TI clocks are indexed both by which $N$ phase they interleave and by which $M$ state they are saving within this $N$ phase ( $\Psi_{N M}$ ).


Figure 4.27: Clock programming for a TI N-path. In this example, the input sample and output reconstruct clocks ( $\Phi$ and $\Phi_{d}$ respectively) are set to the same phase. $d_{c l k \Psi}$ and $d_{c l k \Psi d}$ are the delays of the interleaved clocks relative to the input sample clock.

As mentioned in section 4.2.1, the limiting factor for the maximum achievable delay of the original N-path is the parasitic loading of the branches when the number of phases increases. In this circuit, this is mitigated in the interleaved stages since the clock frequencies are much lower, which gives an increased settling time, meaning that parasitic effects have less of an impact on performance. Assume that $f_{s}$ is the sampling clock from which the $N$ input sample clocks are generated. The clock pulse widths for the first sample stage and the interleaved stage are then $\tau_{1}=1 /\left(f_{s} N\right)$ and $\tau_{2}=N \cdot 1 /\left(f_{s} N\right)=1 / f_{s}$ respectively, and the total delay achievable in a TI N-path can be formulated as:

$$
\begin{equation*}
t_{\operatorname{maxTI}}=(M-1) \frac{1}{f_{s}} \tag{4.32}
\end{equation*}
$$
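
The effect of time expansion on the maximum delay can be illustrated by evaluating eq.4.31 and eq.4.32 side by side, as in the sketch below. The clock frequency and the values of $N$ and $M$ are hypothetical and are not the design values of this thesis.

```python
def ti_pulse_widths(n, f_s):
    """Pulse widths tau_1 (first sample stage) and tau_2 (interleaved stage)."""
    tau_1 = 1.0 / (f_s * n)
    tau_2 = n * tau_1  # equals 1 / f_s
    return tau_1, tau_2


def plain_max_delay(n, f_s):
    """Maximum delay of the plain N-path (eq. 4.31)."""
    return (n - 2) / (n * f_s)


def ti_max_delay(m, f_s):
    """Maximum delay of the time interleaved N-path (eq. 4.32)."""
    return (m - 1) / f_s


# Hypothetical example: 5 GHz sample clock, N = 8 phases, M = 10 interleaved stages.
tau_1, tau_2 = ti_pulse_widths(8, 5e9)
t_plain = plain_max_delay(8, 5e9)
t_ti = ti_max_delay(10, 5e9)
print(f"tau_1 = {tau_1 * 1e12:.0f} ps, tau_2 = {tau_2 * 1e12:.0f} ps")
print(f"plain: {t_plain * 1e12:.0f} ps, interleaved: {t_ti * 1e12:.0f} ps")
```

In this assumed case the maximum delay grows from 150 ps to 1.8 ns, while the first stage still samples at $N f_{s}$ and its capacitive loading is unchanged.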

TI techniques can increase the delay of an N-path circuit, but they have also long been used in analog-to-digital converter design to achieve higher sampling rates [32]. However, interleaving leads to spurious tones at the output due to mismatches between the different branches. These need to be kept under control to ensure good performance of the circuit. Consider $A$ interleaved sample and hold stages, each consisting of an input switch, a sampling capacitor and an output buffer. The total sample frequency achieved is $f_{A}$. In this circuit, three different sources of mismatch give rise to spurs: DC (direct current) offset, gain error and time skew. The DC offset mismatch comes from the difference in the DC offset of each output buffer. This periodic mismatch is superimposed on the reconstructed output and shows up at frequencies of:

$$
\begin{equation*}
f_{D C o f f s e t}=\frac{k}{A} f_{A} \tag{4.33}
\end{equation*}
$$

where $k$ is an integer equal to $0,1,2, \ldots$. The power of the DC offset tones is proportional to the relative DC offset mismatch between stages. The gain error mismatch arises from the relative error in gain between stages, and its power is proportional to the magnitude of the mismatch. The periodic nature of this mismatch can be compared to applying an amplitude modulation to the reconstructed output. These spurs show up at frequencies of:

$$
\begin{equation*}
f_{\text {Gerror }}= \pm f_{\text {in }}+\frac{k}{A} f_{A} \tag{4.34}
\end{equation*}
$$

Here $k$ is equal to $1,2,3, \ldots$. The third and final source of spurs in interleaved circuits is time skew, which arises from clock mismatches. These spurs show up at the same frequencies as the gain error spurs, and their power depends on the amount of timing error and on the input frequency (a higher frequency results in a larger relative time error).

$$
\begin{equation*}
f_{\text {TimeSkewing }}=f_{\text {Gerror }}= \pm f_{\text {in }}+\frac{k}{A} f_{A} \tag{4.35}
\end{equation*}
$$
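
The spur locations given by eq.4.33 to eq.4.35 can be listed with the short sketch below. Restricting $k$ to values below $A$, so that only the tones within one span of $f_{A}$ are listed, is an assumption made to keep the list finite. The example frequencies are hypothetical.

```python
def offset_spurs(a, f_a):
    """DC offset spur frequencies k/A * f_A for k = 0 .. A-1 (eq. 4.33)."""
    return [k * f_a / a for k in range(a)]


def gain_and_skew_spurs(a, f_a, f_in):
    """Gain error and time skew spurs +-f_in + k/A * f_A for k = 1 .. A-1.

    Both mechanisms produce tones at the same frequencies (eq. 4.34 and 4.35).
    """
    spurs = set()
    for k in range(1, a):
        spurs.add(f_in + k * f_a / a)
        spurs.add(-f_in + k * f_a / a)
    return sorted(spurs)


# Hypothetical example: A = 4 stages, 8 GHz total sample rate, 0.5 GHz input tone.
dc_spurs = offset_spurs(4, 8e9)
mismatch_spurs = gain_and_skew_spurs(4, 8e9, 0.5e9)
print([f / 1e9 for f in dc_spurs])
print([f / 1e9 for f in mismatch_spurs])
```

Only the locations are computed here. The spur power depends on the actual offset, gain and timing mismatches, which have to come from simulation or measurement.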

### 4.2.3 Adding a Second Time Interleaved Stage and Branching the TI N-path

The TI N-path is proficient in creating large delays; however, each N-path can only create one individual delay. In order to cancel SI signals from multiple reflecting surfaces, several delays are required. To achieve this with a TI N-path, several separate circuits could be used in parallel, but considering that the TI N-path in its stand-alone version is hardware intensive, this setup is very inefficient. The proposal is instead to re-use earlier interleaving parts and branch out from these to create several delays, which reduces the total number of stages.

Before this branching technique is discussed, a second TI stage is implemented. The reason for this is that a second TI stage enables more opportunity in the branching and it also introduces the possibility of bypassing an interleaved stage, something which will add re-configurability to the circuit. The principle of the second interleaved stage is the same as when the first TI stages were added, each interleaved node is split into several sample and hold stages with lower frequency clocks, creating an additional time expansion. In figure 4.28, a double TI N-path is shown. It can be realized that the core essence here is the same as in the single TI circuit by comparing the middle part, the large rectangle, with the figure of the single TI circuit in figure 4.26.


Figure 4.28: Introducing a second TI stage into the N-path circuit.

A second interleaving stage does not increase the total delay achieved per number of sample nodes compared to only a single interleaved circuit, but it also does not decrease it. Instead the benefit, in terms of total delay, is the ability to increase the number of phases in the second TI time expansion region without worrying about the capacitive loading. This may be useful if the N-path circuit is implemented at higher frequencies where the time expansion, introduced by the first TI stage, is not enough to effectively mitigate the parasitic effects of the switches. However, in the case of the discussion of the branching technique, the second interleaving stage is not added to provide longer total delay.

Moving on, while the total number of stages is not increased in the double TI N-path circuit (for a given delay), there is additional complexity in the clock programming, since an additional set of lower frequency clocks is added, as can be seen in figure 4.29, which shows an example of the clock programming.


Figure 4.29: Clock programming of a double TI N-path circuit with the following number of stages: $N=5, M=4, K=3$.

If the clock pulse width of the first sample stage and the first TI stage is defined as it is in the case of the single TI in eq.4.32, that is $\tau_{1}=1 /\left(f_{s} N\right)$ and $\tau_{2}=$ $N \cdot 1 /\left(f_{s} N\right)=1 / f_{s}$ respectively, then the pulse width of the second TI stage can be written as $\tau_{3}=N M \cdot 1 /\left(f_{s} N\right)=M / f_{s}$ and the total achievable delay of the circuit is written as:

$$
\begin{equation*}
t_{\text {maxTIdouble }}=M(K-1) \frac{1}{f_{s}} \tag{4.36}
\end{equation*}
$$

The double TI N-path can achieve a large range of delays via programming of the clocks in the TI stages in different ways. By reprogramming the TI 2 clocks, different coarse delays, separated by $\tau_{3}$, are achieved and it is this stage that sets the maximum achievable delay. After the coarse tuning, a medium tuning is achieved using reprogramming of the TI 1 clocks with the resolution of $\tau_{2}$. The sample and reconstruction clocks achieve the finest clock reprogramming tuning step with $\tau_{1}$ whilst the last ultra-fine tuning is based on a small delay $\Delta \varphi$ introduced in the output reconstruction clock. All of the tuning mechanisms are visualised in figure 4.30.


Figure 4.30: Visual representation of the delay tuning mechanism for a double TI N-path circuit.
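
The step sizes of the tuning hierarchy and the maximum delay of eq.4.36 can be sketched as below. The stage counts follow the example of figure 4.29 ($N=5$, $M=4$, $K=3$), while the 1 GHz clock, the target delay and the splitting routine itself are assumptions. The routine only illustrates how a delay divides into coarse, medium, fine and ultra-fine parts, and it ignores any fixed minimum delay and the allowed range of each stage.

```python
from fractions import Fraction


def double_ti_steps(n, m, f_s_hz):
    """Tuning steps tau_1, tau_2, tau_3 in seconds, as exact fractions."""
    tau_1 = Fraction(1, f_s_hz * n)
    tau_2 = Fraction(1, f_s_hz)
    tau_3 = Fraction(m, f_s_hz)
    return tau_1, tau_2, tau_3


def double_ti_max_delay(m, k, f_s_hz):
    """Maximum delay of the double TI N-path (eq. 4.36)."""
    return Fraction(m * (k - 1), f_s_hz)


def split_delay(target, tau_1, tau_2, tau_3):
    """Split a target delay into (coarse, medium, fine) step counts and a remainder.

    The remainder is what the ultra-fine delay of the reconstruction clock
    would have to cover. This is an illustrative decomposition only.
    """
    coarse, rest = divmod(target, tau_3)
    medium, rest = divmod(rest, tau_2)
    fine, ultra_fine = divmod(rest, tau_1)
    return coarse, medium, fine, ultra_fine


PS = Fraction(1, 10 ** 12)  # one picosecond
t1, t2, t3 = double_ti_steps(5, 4, 10 ** 9)
t_max = double_ti_max_delay(4, 3, 10 ** 9)
parts = split_delay(5250 * PS, t1, t2, t3)
print(t1 / PS, t2 / PS, t3 / PS, t_max / PS)
print(parts[0], parts[1], parts[2], parts[3] / PS)
```

Exact fractions are used instead of floating point numbers so that the step counts do not suffer from rounding. For the assumed 5.25 ns target, the split is one coarse, one medium and one fine step plus a 50 ps ultra-fine remainder.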

Now that the second TI stage is properly introduced, the discussion continues with the branching technique. As mentioned earlier, if earlier interleaved stages could be re-used, the overall number of switches could be reduced whilst still achieving the same delay. The idea is reinforced by an example, presented in figure 4.31, in which six delays are created in clusters of two using a double TI circuit. To the left in the figure, the conventional way of implementing each delay with an individual N-path tap is seen, while to the right the proposed branched structure is seen.


Figure 4.31: Comparison of the conventional parallel TI N-path and the proposed branched TI N-path.

Using the same branching as in the example, then the total number of switches in the parallel approach can be written as:

$$
\begin{equation*}
\# \text { Switches }_{\text {Parallel }}=(2 N+2 N M+2 K M N) \cdot 6 \tag{4.37}
\end{equation*}
$$

And for the branched technique, the total number of switches can be written as:

$$
\begin{equation*}
\# \text { Switches }_{\text {Branched }}=N+N M+2 \cdot 3 N M K+6 M N+6 N \tag{4.38}
\end{equation*}
$$

From eq.4.37 and eq.4.38, the total number of stages in a branched structure can be compared to that of the parallel structure. If $N=8$ and $M$ and $K$ are swept from 2 to 10, the total number of stages can be reduced by almost $50 \%$ ( $49.17 \%$ for $M=K=10$ ). The way the nodes branch also affects the total reduction in the number of stages. If, in figure 4.31, the second TI were to branch into 2 nodes and the de-interleaving of the first TI into 3 nodes, the total number of stages would be reduced by almost $65 \%$ ( $64.19 \%$ for $M=K=10$ ). The general trend is that the later in the chain the nodes branch out, the higher the reduction in the number of stages. However, this comes with a trade-off. Since each delay now shares more of the lower frequency stages (TI 1 and TI 2), the delays will be less separated within the cluster and thus the clusters become narrower. An extreme example of this is if all the branching happens at the reconstruction stage. Then the delays will only be separated by the fine tuning of the reconstruction stage, with a maximum separation of $(N-1) \tau_{1}$.
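
The quoted reduction can be checked numerically with the sketch below. The branched count is written with $6 N$ switches for the six reconstruction stages, which is the form that reproduces the $49.17 \%$ stated above. The function names are assumptions.

```python
def switches_parallel(n, m, k, delays=6):
    """Switch count for six separate double TI N-path circuits (eq. 4.37)."""
    return (2 * n + 2 * n * m + 2 * k * m * n) * delays


def switches_branched(n, m, k):
    """Switch count for the branched structure of figure 4.31.

    Terms: input sampling, first TI, three branched second TI blocks,
    six de-interleaving first TI blocks and six reconstruction stages.
    """
    return n + n * m + 2 * 3 * n * m * k + 6 * m * n + 6 * n


def reduction(n, m, k):
    """Relative reduction in switch count obtained by branching."""
    return 1 - switches_branched(n, m, k) / switches_parallel(n, m, k)


saving = reduction(8, 10, 10)
print(switches_parallel(8, 10, 10), switches_branched(8, 10, 10))
print(f"{100 * saving:.2f} %")
```

For $N=8$ and $M=K=10$ this gives 10656 switches for the parallel structure and 5416 for the branched one. Other branching patterns, such as the one giving the $64.19 \%$ figure, need their own count expression, which is not derived here.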

### 4.2.4 $g_{m}$-based delays

This subsection briefly mentions one other option of active circuits which could be used for TTD generation, namely $g_{m}$-based delays. An introductory analysis of the common first-order all-pass approach shows that this circuit, on its own, can not achieve the required delay range of the RP SI targeted in this report. Different approaches, such as the linearization techniques presented in [33] or higher-order $g_{m}$ all-pass filters, could increase the single-stage delay at higher frequencies. By combining this with a tuning method such as the binary weighted delay chain presented in section 4.1.2.1, greater delay ranges could be covered.

The $g_{m} C$-cell or the $g_{m} R C$-cell is based on all-pass filters constructed of transconductances and capacitances. An active component, such as a MOSFET, is used to implement the transconductance. A common approach when designing $g_{m}$-based delays is to start from all-pass filter designs. The transfer function of an ideal TTD with a delay of $\tau$, $H(s)=e^{-s \tau}$, can be approximated by a first-order all-pass filter using a bilinear technique, where the expression is first re-written in exact form and then Taylor expanded to first order [34]:

$$
\begin{equation*}
H(s)=e^{-s \tau}=\frac{e^{-s \tau / 2}}{e^{s \tau / 2}} \approx \frac{1-\frac{s}{\omega_{0}}}{1+\frac{s}{\omega_{0}}} \tag{4.39}
\end{equation*}
$$

In this equation $\omega_{0}=2 / \tau$ is the characteristic angular frequency. However, since this is an approximation, the delay or phase response of the first-order all-pass filter will start to deviate at higher frequencies. In figure 4.32 the phase and resulting delay of the first-order all-pass filter are compared with the ideal TTD. As the frequency increases, the phase shift of the all-pass filter starts to drop off, resulting in a non-constant delay over frequency.


Figure 4.32: The left side plot presents the phase of an ideal TTD, the magenta line, compared to a first order approximation, the blue line. To the right, the resulting time delay is plotted.
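
The deviation can be illustrated numerically from eq.4.39. The following is a minimal sketch, not the script used to generate the figures: it evaluates both the phase delay $-\phi / \omega$ and the group delay $-d \phi / d \omega$ of the first-order all-pass, with $\phi=-2 \arctan \left(\omega / \omega_{0}\right)$, and the value $\tau=100 \mathrm{ps}$ is chosen only as an example.

```python
import math


def allpass_phase_delay(f_hz, tau_s):
    """Phase delay -phi/omega of (1 - s/w0)/(1 + s/w0) with w0 = 2/tau."""
    w0 = 2.0 / tau_s
    w = 2.0 * math.pi * f_hz
    return 2.0 * math.atan(w / w0) / w


def allpass_group_delay(f_hz, tau_s):
    """Group delay -dphi/domega of the same first-order all-pass."""
    w0 = 2.0 / tau_s
    w = 2.0 * math.pi * f_hz
    return (2.0 / w0) / (1.0 + (w / w0) ** 2)


if __name__ == "__main__":
    tau = 100e-12  # example target delay, an ideal TTD would give this at all f
    for f in (0.1e9, 1e9, 5e9, 10e9):
        pd = allpass_phase_delay(f, tau) * 1e12
        gd = allpass_group_delay(f, tau) * 1e12
        print(f"f = {f / 1e9:5.1f} GHz: phase delay {pd:6.1f} ps, "
              f"group delay {gd:6.1f} ps")
```

Both quantities approach $\tau$ at low frequency and fall below it as the frequency approaches and exceeds $\omega_{0}$.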

This deviation from ideal TTD behaviour is evident also in the time-domain by looking at the delay response of a first-order all-pass as presented in figure 4.33.


Figure 4.33: Delay plots for different values of $\omega_{0}$ to showcase the increased delay variation over frequency

It is evident that the first-order approximation is inherently flawed if it is to be used to create large delays at high frequencies. Two approaches can be taken to work around this problem. Either the circuit is designed using higher-order filter approximations, or multiple lower-order $g_{m} C$-cells are cascaded. In reality, if the targeted delay for the circuit is several tens of $\lambda$ at 10 GHz, then a combination of these techniques ought to be used.

## Circuit Design

After presenting the theory of the active and passive TTD in chapter 4, the design and simulation of these circuits will be presented in this chapter. The designs of the passive and active TTD circuits are very different and will therefore be presented separately.

### 5.1 Design of Passive TTD

In the theoretical chapter about the passive TTD (chapter 4), several circuits for generating a delay were presented, along with two methods for tuning a passive TTD. In this chapter the knowledge from the theory will be used together with the simulation software to design an entire passive TTD circuit.

The first step is to define the aim for the passive TTD delay. In the SIC pre-study a hybrid topology was proposed, in which the passive TTD was to be used for a pre-LNA injection SIC. This is because a passive TTD can be implemented with lower noise degradation and less intermodulation than an active TTD. However, the passive TTD can not achieve as long delays as an active TTD. Therefore, the passive TTD will be aimed at DP interference with a delay of $15 \lambda$ or shorter. Since the DP interference has the highest signal power, this is the interference of most importance to cancel out before the LNA. If an interference signal with too high power is sent into the LNA, it could compress the LNA and cause non-linearity.

The aim of the passive TTD is to achieve long enough delays to cover all DP interference of up to approximately $15 \lambda$ of delay time, as well as achieving high enough cancellation for the DP SI so that the LNA is not compressed by it. The achieved cancellation depends on the delay error and amplitude error of the TTD according to figure 3.9. Therefore, the delay error from the passive TTD should be low enough to provide the corresponding required cancellation. Apart from making sure the LNA does not saturate, the passive pre-LNA SIC should together with the active post-LNA SIC provide the required total analog RF cancellation. The amplitude error from the passive TTD should ideally be as low as possible in order to reduce the needed tuning span for the tunable attenuator.

The lattice filters can generate a delay of $\lambda / 2$ for unlimited BW due to their all-pass behavior, while the lumped $L C$ transmission line is limited to $\lambda / 4$ due to its low-pass behavior. In order to achieve $15 \lambda$ of delay, a lot of stages would therefore need to be cascaded for both the lattice filter and the lumped $L C$ option. A single stage of the delay generating circuit would therefore need to have very low loss, and the $L C-C L$ lattice filter and the lumped $L C$ transmission line are viable options since they do not have any resistors.

The low-pass behavior of the lumped $L C$ transmission line could be used to implement the switching mechanism needed for a binary weighted delay chain by intentionally decreasing the cut-off frequency below the operating frequency. Also, the lumped LC transmission line does not require crossed routings like the lattice filters, which facilitates layout implementation. Therefore, the lumped LC transmission line is found to be a suitable choice for implementation of the binary weighted delay chain.

For fine tuning, the tunable coupled inductor method could be applied to a delay generating circuit containing inductors. In order to maintain a low amplitude error, the $L R-R L$ lattice filter is chosen as the delay generating circuit due to its all-pass behavior. Using both the binary weighted delay chain using cascaded lumped $L C$-transmission lines and the $L R-R L$ lattice filter with tunable coupled inductors, the entire delay span can be covered as well as achieving a high delay resolution.

The arrangement of the coarse and fine tuning delays is shown in figure 5.1. The binary weighted delay chain is placed before the $L R-R L$ lattice filter because it has much higher loss and therefore reduces the power handling requirement on the $L R-R L$ lattice filter.


Figure 5.1: The entire passive binary weighted TTD, with its two fundamental parts

This system is designed for operation at 10 GHz, which means that one $\lambda$ corresponds to 100 ps. The aim for the passive TTD is to achieve $15 \lambda$ of maximum delay, which at 10 GHz is 1500 ps. Since the binary weighted delay will have a total delay of about twice the delay of the Most Significant Bit (MSB), the delay of the MSB should be at least 750 ps. To leave some design margin and ease the design process, the delay of the MSB is chosen to be 800 ps. The next step is to choose the delay of the Least Significant Bit (LSB) of the binary weighted delay chain. Since the delay of the MSB, $T d_{M S B}$, is set at 800 ps, the LSB will depend on the number of bits $N$ according to eq.5.1.

$$
\begin{equation*}
T d_{L S B}=\frac{T d_{M S B}}{2^{N-1}}=\frac{800 \mathrm{ps}}{2^{N-1}} \tag{5.1}
\end{equation*}
$$

In order to cover the entire delay range, the tuning span of the fine tuning $L R-R L$ lattice filter needs to be as big as the delay of the LSB in the binary weighted delay. Therefore, the LSB of the binary weighted delay block should be small enough that the fine tuning $L R-R L$ lattice filter is capable of covering a tuning span that big. The tuning span of the $L R-R L$ lattice filter can not be too large since the phase shift of lattice filters follows a non-linear arctan curve as shown in figure 4.2. The lattice filter has a maximum tuning span of 50 ps and is most linear around the midpoint frequency, at 25 ps of delay. A reasonable tuning span for the $L R-R L$ lattice filter that keeps it somewhat near 25 ps, and that is also a possible LSB according to eq.5.1, is 12.5 ps. This results in 7 bits of tuning for the binary weighted delay chain.

The tuning span of the $L R-R L$ lattice filter $T d_{\text {span }}$ only has to be equal to the LSB of the binary weighted delay chain minus the delay resolution $T d_{\text {res }}$. This is because the difference between the tuning span of the $L R-R L$ lattice filter and the LSB of the binary weighted delay chain can be the same as the delay resolution and still cover the entire delay span. For $N_{L R R L}$ bits of tuning in the $L R-R L$ lattice filter, the delay resolution will be

$$
\begin{equation*}
T d_{\text {res }}=\frac{T d_{\text {span }}}{2^{N_{L R R L}}-1}=\frac{12.5 \mathrm{ps}-T d_{\text {res }}}{2^{N_{L R R L}}-1} \quad \Longrightarrow \quad T d_{\text {res }}=\frac{12.5 \mathrm{ps}}{2^{N_{L R R L}}} \tag{5.2}
\end{equation*}
$$

By choosing 3 bits of tuning, the delay resolution will be about 1.56 ps which with ideal amplitude matching will result in about 20 dB of cancellation according to figure 3.9. The tuning span for the $L R-R L$ lattice filter will then be about 11 ps which should be a feasible tuning span for the $L R-R L$ lattice filter. In order to keep the tuning span centered around 25 ps , the $L R-R L$ lattice filter is set to have a tuning span between 20-31 ps.

In total the passive TTD will theoretically have more than 1600 ps of maximum delay and 1.56 ps of delay resolution using 10 bits of tuning. This is shown in table 5.1, where the delay span, resolution and number of bits are shown for the coarse tuning binary weighted delay chain and the fine tuning $L R-R L$ lattice filter individually as well as in total.

Table 5.1: Delay requirements of the passive delay

|  | Binary Weighted $L C$ | $L R-R L$ lattice | Total |
| :--- | :--- | :--- | :--- |
| Delay span $[\mathbf{p s}]$ | $[1587.5-0]$ | $[31-20]$ | $[1618.5-20]$ |
| Resolution $[\mathbf{p s}]$ | 12.5 | 1.56 | 1.56 |
| Bits $[-]$ | 7 | 3 | 10 |
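
The delay budget in table 5.1 can be reproduced with a few lines of arithmetic. The sketch below is only illustrative; the function and key names are hypothetical, and it takes the LSB as the MSB delay divided by $2^{N-1}$ so that 7 bits with an 800 ps MSB give the 12.5 ps LSB used in the text.

```python
def passive_ttd_budget(td_msb_ps=800.0, coarse_bits=7, fine_bits=3,
                       fine_min_ps=20.0):
    """Delay budget of the coarse chain plus the fine tuning lattice filter."""
    td_lsb = td_msb_ps / 2 ** (coarse_bits - 1)     # smallest coarse step
    coarse_max = td_lsb * (2 ** coarse_bits - 1)    # all coarse bits switched on
    td_res = td_lsb / 2 ** fine_bits                # eq. 5.2
    fine_span = td_lsb - td_res                     # required lattice tuning span
    total_max = coarse_max + fine_min_ps + fine_span
    return {
        "td_lsb_ps": td_lsb,
        "coarse_max_ps": coarse_max,
        "td_res_ps": td_res,
        "fine_span_ps": fine_span,
        "total_max_ps": total_max,
        "total_bits": coarse_bits + fine_bits,
    }


if __name__ == "__main__":
    for name, value in passive_ttd_budget().items():
        print(f"{name}: {value}")
```

The fine span evaluates to 10.9375 ps, which the text rounds to 11 ps (a lattice range of 20-31 ps), so the total maximum delay is marginally below the 1618.5 ps listed in the table.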

The next step will be to design the two parts of the passive TTD individually according to the chosen requirements, which will be conducted in the two following subsections.

### 5.1.1 Binary weighted delay with lumped $L C$ transmission lines

The binary weighted delay blocks are weighted to different delay times by cascading different amounts of lumped $L C$ transmission line stages (LC stages) for each delay block. The switching mechanism for the delay path of a certain delay block differs depending on whether the delay time of the delay block is $\geq \lambda$ or $<\lambda$. For delay blocks with a delay time $\geq \lambda$, the topology shown in figure 5.2 is used.


Figure 5.2: Delay block with $\lambda$ or longer delay time

The first and last $L C$ stage of the delay block are designed as quarter wavelength transmission lines. This means that the inductance and capacitance of the $L C$ stage are designed for a $-90^{\circ}$ phase shift, which occurs at the angular cut-off frequency given in eq.4.12. In order to also match the impedance of the $L C$ stage, given in eq.4.13, to a characteristic impedance of $50 \Omega$ at 10 GHz, the inductance $L$ and capacitance $C$ of the quarter wavelength $L C$ stage are calculated as:

$$
\begin{aligned}
\mathrm{Z}_{0}=50 \Omega & =\sqrt{\frac{L}{C}} \quad \& \quad \omega_{c}=2 \pi \cdot 10 \mathrm{GHz}=\frac{1}{\sqrt{L C}} \\
\Longrightarrow L & =796 \mathrm{pH} \quad \& \quad C=318 \mathrm{fF}
\end{aligned}
$$
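
As a quick check of these component values, the two conditions can be solved for $L=Z_{0} / \omega_{c}$ and $C=1 /\left(Z_{0} \omega_{c}\right)$. The sketch below assumes that $\omega_{c}$ is the angular frequency $2 \pi \cdot 10 \mathrm{GHz}$; the function name is hypothetical.

```python
import math


def quarter_wave_lc(z0_ohm, f0_hz):
    """Solve Z0 = sqrt(L/C) and w_c = 1/sqrt(L*C) for L and C."""
    w_c = 2.0 * math.pi * f0_hz
    inductance = z0_ohm / w_c
    capacitance = 1.0 / (z0_ohm * w_c)
    return inductance, capacitance


if __name__ == "__main__":
    L, C = quarter_wave_lc(50.0, 10e9)
    print(f"L = {L * 1e12:.0f} pH, C = {C * 1e15:.0f} fF")
```

For $Z_{0}=50 \Omega$ and 10 GHz this returns values that round to the 796 pH and 318 fF given above.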

By placing shunt switches after these stages, the input impedance can be switched from low to high by switching the shunt switches on. When the shunt switches are turned off, the impedance is high at that side of the quarter wavelength transmission line but inverted at the other side. As shown in figure 5.3, the input impedance of the delay block will therefore be low when the shunt switches are turned off and high when they are turned on. As well as functioning as a switching mechanism for the delay path, the two quarter wavelength $L C$ stages contribute a quarter wavelength of delay each.


Figure 5.3: The impedance is inverted when switching on and off the switch after the quarter wavelength transmission line

The two middle blocks, called $L C_{L P}$, are modified $L C$ stages and also function as switching mechanisms for the delay path. The low-pass behavior of the lumped $L C$ transmission line can be used to switch the circuit on and off. By increasing the parallel capacitance, the cut-off frequency of the lumped $L C$ transmission line will decrease and create isolation for the delay path. In order to be able to switch a bigger capacitance on and off, two capacitances are used in parallel on either side of the inductors as shown in figure 5.4.


Figure 5.4: The $L C_{L P}$ stage with a low-pass switching mechanism

When the switches are off, the total capacitance on each side of the inductors should be designed such that the angular cut-off frequency is equal to the angular operating frequency. The $L C$ stage will then generate a delay of $\lambda / 4$. The capacitances $C_{l o}$ should be designed such that the total capacitance from $C_{l o}$ and the off-capacitance of the switches $C_{o f f}$ is equal to the capacitance needed for an angular cut-off frequency equal to the angular operating frequency. Using eq.4.12 the size of $C_{l o}$ can be calculated.

$$
\begin{aligned}
\omega_{c} & =\frac{1}{\sqrt{L\left(C_{l o}+C_{o f f}\right)}} \\
& \Longrightarrow C_{l o}=\frac{1}{\left(\omega_{c}\right)^{2} \cdot L}-C_{o f f}
\end{aligned}
$$

The capacitance $C_{h i}$ should be high enough so that the cut-off frequency is decreased sufficiently below the operating frequency to provide enough isolation.

For setting the delay time of the entire delay block, the block called $L C_{c c}$ is used, which contains $N_{c c}$ cascaded $L C$ stages as shown in figure 5.5. These $L C$ stages are halved in size to minimize the delay variation over frequency. Halving the stages increases the cut-off frequency, which causes the delay to vary less over frequency since the cut-off frequency is now further away from the operating frequency. The delay of each $L C$ stage is also halved, producing only 12.5 ps of delay, which means that twice as many stages have to be used to achieve the same delay as when using $L C$ stages with 25 ps of delay. However, the total delay variation was found to be lower for the same delay when using halved $L C$ stages.


Figure 5.5: The $L C_{c c}$ stage with $N_{c c}$ cascaded $L C$ stages

The number of cascaded $L C$ stages in each delay block depends on the total delay time $T d_{\text {block }}$ that the entire delay block should have. Together the $L C_{\lambda / 4}$ and $L C_{L P}$ blocks generate 100 ps of delay, which means that the two blocks of cascaded $L C$ stages should each generate a delay of $T d_{c c}$ given in eq.5.3.

$$
\begin{equation*}
T d_{c c}=\frac{T d_{\text {block }}-100 \mathrm{ps}}{2} \tag{5.3}
\end{equation*}
$$

Using the LC stages with 12.5 ps of delay, the number of cascaded stages $N_{c c}$ should therefore be

$$
\begin{equation*}
N_{c c}=\frac{T d_{c c}}{12.5 \mathrm{ps}} \tag{5.4}
\end{equation*}
$$

Using eq.5.3 and eq.5.4, the delay $T d_{c c}$ and number of cascaded stages $N_{c c}$ are calculated for all delay blocks with $\geq \lambda$ of delay and shown in table 5.2.

Table 5.2: The delay and number of cascaded stages for all delay block with $\geq \lambda$ of delay

|  | $8 \lambda$ | $4 \lambda$ | $2 \lambda$ | $\lambda$ |
| :--- | :--- | :--- | :--- | :--- |
| $T d_{c c}[\mathrm{ps}]$ | 350 | 150 | 50 | 0 |
| $N_{c c}[-]$ | 28 | 12 | 4 | 0 |
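
The entries of table 5.2 follow directly from eq.5.3 and eq.5.4. A minimal sketch of the calculation, with a hypothetical function name and assuming $\lambda=100 \mathrm{ps}$ at 10 GHz, is:

```python
def cascaded_stages(td_block_ps, fixed_delay_ps=100.0, stage_delay_ps=12.5):
    """Return (Td_cc, N_cc) for one delay block, following eq. 5.3 and eq. 5.4."""
    td_cc = (td_block_ps - fixed_delay_ps) / 2.0   # delay per LC_cc block
    n_cc = round(td_cc / stage_delay_ps)           # number of 12.5 ps stages
    return td_cc, n_cc


if __name__ == "__main__":
    wavelength_ps = 100.0  # one wavelength at 10 GHz
    for multiple in (8, 4, 2, 1):
        td_cc, n_cc = cascaded_stages(multiple * wavelength_ps)
        print(f"{multiple} lambda: Td_cc = {td_cc:.0f} ps, N_cc = {n_cc}")
```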

For the smaller delay blocks with less than 100 ps of delay time, the topology is different, since the delay time would be too long if both quarter wavelength stages with shunt switches and low-pass stages were used. Therefore, only quarter wavelength stages are used for the $\lambda / 2$ delay block as shown in figure 5.6. In addition, series switches are used for the smaller delay blocks in order to achieve high enough isolation through the delay path. Shunt switches are placed both between the quarter wavelength stages and after the series switches, again to achieve high enough isolation.


Figure 5.6: Topology for the $\lambda / 2$ delay block

The two smallest delay blocks used are the $\lambda / 4$ and $\lambda / 8$ delay blocks, which both use the same topology but with differently sized $L C$ stages as shown in figure 5.7.


Figure 5.7: Topology for the $\lambda / 4$ and $\lambda / 8$ delay blocks

Two inductors were designed to optimize the quality factor and minimize the loss in the delay line: a bigger inductor for the 25 ps $L C$ stage, shown in figure 5.8, and a smaller one for the 12.5 ps $L C$ stage, shown in figure 5.9.


Figure 5.8: Layout of the inductor for the $L C$ stage with 25 ps of delay


Figure 5.9: Layout of the inductor for the $L C$ stage with 12.5 ps of delay

The inductors were drawn on layout and simulated using an electromagnetic simulation tool. The design parameters for the inductor coils, including inner diameter, separation and width, and the results for the inductance and the quality factor are disclosed in table 5.3.

Table 5.3: Design parameters and results of the designed inductors

|  | In. Dia. [um] | Sep. [um] | Width [um] | L [pH] | Q |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Bigger Ind. | 65 | 5 | 9 | 395.7 | 24.92 |
| Smaller Ind. | 32 | 4 | 6 | 215.0 | 19.66 |

The binary weighted delay chain should have the delay blocks ordered from the biggest to the smallest. This way the signal power will be reduced for the delay blocks later in the chain, since the biggest delay blocks have the most attenuation as shown in figure 5.10.


Figure 5.10: Signal power being attenuated the most by the biggest delay block

The biggest delay block in the front of the chain will need to handle the highest signal power. Therefore, the bypass switches for the biggest delay blocks need to be designed for high linearity. For all of the bypass switches the isolation also needs to be high while maintaining low loss. The isolation and linearity are increased by stacking transistors in series and by using ground connected transistors, as explained in the theory for the binary weighted delay chain in section 4.1.2.1.

The bypass switches are designed to attenuate the signal as much as the delay path by adjusting the gate width of the transistors. This way the needed tuning range for the variable attenuator in the SIC path is reduced. Since the bypass switches inevitably introduce some parasitic delay, the capacitances in the $L C$ stages are increased somewhat to make sure that the difference in delay time between the delay path and the bypass path is what the delay blocks are designed for. In practice there will always be some process variations which affect the delay time, which is solved by fine calibration of the capacitances.

After calibrating the capacitances and the bypass switches for all delay blocks, the delay blocks were simulated with the results in table 5.4. It is seen that the parasitic delay and loss scale with the size of the delay blocks for the bigger delay blocks. However, for the smaller delay blocks, the loss and parasitic delay could not be reduced further. This is because the delay blocks below $\lambda$ use series switches in the delay path.

The isolation should ideally be over 60 dB for all delay blocks in order to hinder the signal from passing through the wrong path. If not enough isolation is achieved, some of the signal will have another delay time apart from the desired one, which will cause insertion of uncorrelated SI signal into the Rx chain. For two of the delay blocks the isolation is found to be slightly below 60 dB. This is not detrimental to performance, but is a possible future improvement. In general this design has met the major design criteria for the binary weighted delay chain and is now ready for use.

Table 5.4: Simulation results for all delay blocks in the binary weighted delay chain

| Delay block | $8 \lambda$ | $4 \lambda$ | $2 \lambda$ | $\lambda$ | $\lambda / 2$ | $\lambda / 4$ | $\lambda / 8$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Td ON [ps] | 813.7 | 406.1 | 203.4 | 102.8 | 54.7 | 29.7 | 16.8 |
| Td OFF [ps] | 13.8 | 6.1 | 3.3 | 2.6 | 4.7 | 4.7 | 4.4 |
| Td Diff [ps] | 800.0 | 400.0 | 200.1 | 100.2 | 50.0 | 25.0 | 12.4 |
| Loss ON [dB] | 15.23 | 7.60 | 3.90 | 1.80 | 1.45 | 1.46 | 1.42 |
| Loss OFF [dB] | 15.21 | 7.22 | 3.60 | 1.86 | 1.43 | 1.47 | 1.41 |
| Iso. Delay [dB] | 102.4 | 68.4 | 61.9 | 61.4 | 97.8 | 102.7 | 54.9 |
| Iso. Bypass [dB] | 185.6 | 147.1 | 114.8 | 77.0 | 55.5 | 61.1 | 60.05 |

### 5.1.2 LR-RL lattice fine tuning filter with coupled inductors

The $L R-R L$ lattice filter incorporates the tunable coupled inductors as shown in figure 5.11. The effective inductance of the primary inductor will increase when the tunable capacitor is increased. This will cause the midpoint frequency to decrease, which will increase the phase shift at the operating frequency and therefore also increase the delay.


Figure 5.11: The tunable coupled inductor implemented in an $L R-R L$ lattice filter

The first step in the design process is to determine the resistances $R$ in the lattice filter. The lattice filter must fulfill eq. 4.2 in order to be an all-pass filter and have the same input resistance for all frequencies. This can not be fulfilled for all tuning levels, because the impedance of the lattice filter must be altered for tuning the delay time. Therefore, the best option is to design the lattice filter as an all-pass filter at the midpoint frequency. At the midpoint frequency

$$
\frac{R}{L}=2 \pi f_{0} \quad \Longrightarrow R=2 \pi f_{0} L
$$

which means the all-pass criterion is

$$
\frac{Z}{Z_{0}}=\frac{2 \pi f_{0} L}{Z_{0}}=\frac{R}{Z_{0}}=\frac{Z_{0}}{R}
$$

For a characteristic impedance of $50 \Omega$ the lattice resistance $R$ should preferably also be $50 \Omega$. The next step is to find the inductance span that corresponds to the desired delay span of [20-31] ps. Since the phase shift is equal to

$$
\phi=-2 \arctan \left(\frac{\omega_{0}}{\omega_{m}}\right)=-2 \arctan \left(\frac{2 \pi f_{0} L}{R}\right)
$$

and the delay time is equal to

$$
\begin{equation*}
T d=-\frac{\phi}{360^{\circ} f_{0}}=\frac{\arctan \left(\frac{2 \pi f_{0} L}{R}\right)}{180^{\circ} f_{0}} \tag{5.5}
\end{equation*}
$$

the inductance can be written as

$$
\begin{equation*}
L=\tan \left(\pi f_{0} T d\right) \cdot \frac{R}{2 \pi f_{0}} \tag{5.6}
\end{equation*}
$$

The maximum and minimum inductance corresponding to the desired tuning will therefore be

$$
\begin{aligned}
& L_{\max }=\tan \left(\pi f_{0} \cdot 31 \mathrm{ps}\right) \cdot \frac{R}{2 \pi f_{0}} \approx 1171 \mathrm{pH} \\
& L_{\min }=\tan \left(\pi f_{0} \cdot 20 \mathrm{ps}\right) \cdot \frac{R}{2 \pi f_{0}} \approx 578 \mathrm{pH}
\end{aligned}
$$
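
These inductance limits can be verified by evaluating eq.5.6 directly. The sketch below uses hypothetical function names and works in radians, so the delay of eq.5.5 becomes $\arctan \left(2 \pi f_{0} L / R\right) /\left(\pi f_{0}\right)$.

```python
import math


def lattice_inductance(td_s, f0_hz=10e9, r_ohm=50.0):
    """Eq. 5.6: inductance that gives a delay td_s at the operating frequency."""
    return math.tan(math.pi * f0_hz * td_s) * r_ohm / (2.0 * math.pi * f0_hz)


def lattice_delay(l_h, f0_hz=10e9, r_ohm=50.0):
    """Eq. 5.5 in radians: delay produced by an inductance l_h."""
    return math.atan(2.0 * math.pi * f0_hz * l_h / r_ohm) / (math.pi * f0_hz)


if __name__ == "__main__":
    for td_ps in (31.0, 20.0):
        l_ph = lattice_inductance(td_ps * 1e-12) * 1e12
        print(f"Td = {td_ps:.0f} ps -> L = {l_ph:.0f} pH")
```

The second function is simply the inverse of the first and can be used to map an effective inductance back to a delay.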

The effective inductance can be plotted using eq. 4.24 over capacitance for different primary and secondary inductances as shown in figure 5.12 and 5.13. A coupling factor of 0.8 is assumed for these plots and the maximum and minimum effective inductances, $L_{\max }$ and $L_{\min }$ are also shown in the plots as straight lines. The primary inductance determines the lowest effective inductance, that is when no capacitance is added. Since the binary weighted tunable capacitor will inevitably have some parasitic capacitance when the switches are turned off, the primary inductance should be set slightly below the minimum effective inductance. This way the parasitic capacitance can be accounted for and the correct delay span can be achieved.

It is seen that a pole occurs at a capacitance of approximately 500 fF when a secondary inductance of 500 pH is used. The capacitance at which the pole occurs is higher when lower secondary inductances are used. The secondary inductance therefore determines the rate at which the effective inductance increases in relation to the capacitance. If a large secondary inductance is used, the tuning span of the tunable capacitor will therefore be very small. The tuning span for the tunable capacitor can not be too small because that will require very small capacitors, which can be difficult to achieve. Therefore, the secondary inductor should be smaller than the primary inductor to increase the tuning span of the capacitor. However, if a too small secondary inductor is used the required capacitance span will be very big, which would require a high on/off ratio for the switchable capacitors, which could also be difficult to achieve.


Figure 5.12: Effective inductance as a function of capacitance for different primary inductances


Figure 5.13: Effective inductance as a function of capacitance for different secondary inductances

The coupled inductors are drawn in layout and simulated using an electromagnetic simulation tool in order to test the coupling effect more realistically, since the Process Design Kit (PDK) does not include any suitable transformer. The primary inductor, figure 5.14, and the secondary inductor, figure 5.15, are drawn on neighbouring metal layers with similar diameter and wire width to increase the coupling coefficient. Since the secondary inductance is to be made smaller than the primary to allow a wider capacitance span, the size difference is achieved by using a turn ratio of $2: 1$. The down-conversion of the voltage due to the $2: 1$ turn ratio also improves linearity, since less voltage swing is exerted on the switches in the tunable capacitor. The inductances of the two inductors are chosen to fulfil the requirements mentioned above, i.e. a primary inductance slightly below $L_{\min }$ for off-capacitance margin and a secondary inductance of about 200 pH for a wide capacitance span.


Figure 5.14: Layout of the primary inductor coil


Figure 5.15: Layout of the secondary inductor coil

The inductances, quality factors and coupling factor for the final design of the coupled inductor at 10 GHz are shown in table 5.5. The primary inductance of 566 pH was chosen in order to leave a little margin to the theoretical minimum inductance for the inevitable off-capacitance, while still being high enough to achieve the required inductance tuning span. The secondary inductance of 174 pH was more or less a result of using a $2: 1$ turn ratio, and turns out to give a suitable tuning span for the tunable capacitor.

Table 5.5: Simulation results for the layout of the coupled inductors at 10 GHz

| parameter | value | unit |
| :--- | :--- | :---: |
| $L_{p}$ | 566 | pH |
| $L_{s}$ | 174 | pH |
| $k$ | 0.739 | - |
| $Q_{p}$ | 25.7 | - |
| $Q_{s}$ | 19.1 | - |

Using the simulated results of the coupled inductor, the approximate maximum and minimum capacitance for the tunable capacitor can be calculated using eq.4.25.

$$
\begin{equation*}
C=\left(\frac{\omega^{2} L_{p} L_{s} k^{2}}{L_{e f f}-L_{p}}+\omega^{2} L_{s}\right)^{-1} \tag{5.7}
\end{equation*}
$$

Based on eq.5.7, the maximum and minimum capacitance, $C_{\max }$ and $C_{\min }$, should be approximately 963 fF and 55 fF. This, however, is only an approximation of the capacitance span based on the assumption that there are no losses in the inductors. The delay time of the entire $L R-R L$ lattice filter is simulated using the coupled inductors drawn in layout and sweeping the capacitance of an ideal capacitor, as shown in figure 5.16 together with straight lines for the required maximum and minimum delay.
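
The lossless estimate from eq.5.7 can be evaluated as follows. This is a sketch under stated assumptions: the function name is hypothetical, the inductor values are those of table 5.5, and the effective inductance limits are the rounded values $L_{\max }=1171 \mathrm{pH}$ and $L_{\min }=578 \mathrm{pH}$ from above.

```python
import math


def tuning_capacitance(l_eff, l_p, l_s, k, f0_hz=10e9):
    """Eq. 5.7: capacitance on the secondary side for a wanted effective inductance."""
    w2 = (2.0 * math.pi * f0_hz) ** 2
    return 1.0 / (w2 * l_p * l_s * k ** 2 / (l_eff - l_p) + w2 * l_s)


if __name__ == "__main__":
    l_p, l_s, k = 566e-12, 174e-12, 0.739   # simulated values from table 5.5
    c_max = tuning_capacitance(1171e-12, l_p, l_s, k)
    c_min = tuning_capacitance(578e-12, l_p, l_s, k)
    print(f"C_max = {c_max * 1e15:.1f} fF, C_min = {c_min * 1e15:.1f} fF")
```

Because $L_{\min }$ lies only about 12 pH above $L_{p}$, the result for $C_{\min }$ is sensitive to how $L_{\min }$ is rounded, so small deviations from the quoted 55 fF are to be expected.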


Figure 5.16: Delay vs capacitance of the $L R-R L$ lattice filter using the drawn coupled inductor and an ideal capacitor

A more accurate estimation of the maximum and minimum capacitance is then found by extracting the capacitances at which the simulated delay is equal to the required maximum and minimum delay. The maximum capacitance of the tunable capacitor is found to be $C_{\max }=1125.2 \mathrm{fF}$ and the minimum capacitance is found to be $C_{\min }=255.8 \mathrm{fF}$. Since $N=3$ bits of tuning are to be used, the on- and off-capacitance of the LSB for the tunable capacitor should approximately be

$$
\begin{align*}
& C_{B, o n}=\frac{C_{\max }}{2^{N}-1}=\frac{1125.2 \mathrm{fF}}{2^{3}-1} \approx 160.7 \mathrm{fF}  \tag{5.8}\\
& C_{B, o f f}=\frac{C_{\min }}{2^{N}-1}=\frac{255.8 \mathrm{fF}}{2^{3}-1} \approx 36.5 \mathrm{fF} \tag{5.9}
\end{align*}
$$

which means the series capacitance $C_{s}$ should approximately be

$$
\begin{equation*}
C_{s}=C_{B, o n} \cdot 2=321.5 f F \tag{5.10}
\end{equation*}
$$
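
The sizing in eq.5.8 to eq.5.10 can be summarised in a short helper. The sketch below uses a hypothetical function name, works in fF and assumes that the full-scale capacitance is shared over $2^{N}-1$ LSB units.

```python
def lsb_capacitor_targets(c_max_ff, c_min_ff, bits=3):
    """Target on/off capacitance of the LSB cell and its series capacitance."""
    units = 2 ** bits - 1            # number of LSB units at full scale
    c_on = c_max_ff / units          # eq. 5.8
    c_off = c_min_ff / units         # eq. 5.9
    c_series = 2.0 * c_on            # eq. 5.10
    return c_on, c_off, c_series


if __name__ == "__main__":
    c_on, c_off, c_series = lsb_capacitor_targets(1125.2, 255.8)
    print(f"C_B,on = {c_on:.1f} fF, C_B,off = {c_off:.1f} fF, "
          f"C_s = {c_series:.1f} fF")
```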

For the switchable capacitors, the off-capacitance is calibrated by adjusting the transistor width. The gate can not be too wide, because that results in too much off-capacitance, but it can also not be too narrow, because that increases the on-resistance, which degrades the Q-factor. After optimizing the series capacitance and the transistor parameters to fulfill the requirements, the final design in table 5.6 with the results in table 5.7 was reached. Here $m_{c}$ is the multiplicity factor of the series capacitances, $W$ is the total gate width, $n f$ is the number of gate fingers, $L$ is the gate length and $m_{t}$ is the multiplicity factor of the transistor. The number of fingers is set to an odd number in order to have an equal number of drain and source connections, which is beneficial for a transistor working as a switch. For the switchable capacitors with binary weights two and four, the multiplicity factors are doubled and quadrupled respectively.

Table 5.6: Design parameters of the LSB switchable capacitor

| parameter | value | unit |
| :--- | :--- | :--- |
| $C_{s}$ | 325 | fF |
| $m_{c}$ | 1 | - |
| $W$ | 38 | $\mu \mathrm{~m}$ |
| $n f$ | 9 | - |
| $L$ | 20 | nm |
| $m_{t}$ | 2 | - |

Table 5.7: Simulation results of the LSB switchable capacitor

| parameter | value | unit |
| :--- | :--- | :--- |
| $C_{o n}$ | 189.7 | fF |
| $C_{o f f}$ | 51.0 | fF |
| $Q$ | 16.7 | - |

The finished tunable capacitor produces the capacitances shown in figure 5.17 which corresponds to the effective inductances in figure 5.18. It is seen that the capacitance is very linear across the levels while the effective inductance is not as linear. This is because the effective inductance is non-linearly dependent on the capacitance. To improve the linearity of the effective inductance, the tunable capacitor would have to be made non-linear, which would not be possible with a binary weighted tuning scheme.


Figure 5.17: Capacitance of the tunable capacitor for all the levels


Figure 5.18: Effective inductance for all the levels

The delay time for the $L R-R L$ lattice filter is shown across the levels in figure 5.19. In figure 5.20 the delay for all levels is shown over frequency. It is seen that the tuning of the delay is not completely linear because of the non-linear inductance tuning. The delay time also varies over frequency, which limits the bandwidth for the SIC. Further improvement of the delay tuning could be achieved by weighting the capacitance tuning such that the delay is tuned linearly, but this will not be covered in this report.


Figure 5.19: Delay of the $L R-R L$ lattice filter for all levels at 10 GHz


Figure 5.20: Delay of the $L R-R L$ lattice filter for all levels over frequency

### 5.2 Design of the N-path TTD circuit

In section 4.2.1.3, the design trade-offs between large total delay, clock frequency and number of stages are presented. Initially, the number of stages $N$ and the clock frequency $f_{s}$ are chosen based on the effective sampling rate $N f_{s}$ and the total delay achieved, $(N-2) /\left(N f_{s}\right)$. If a sampling frequency of $4 f_{R F}=40 \mathrm{GHz}$ and a total delay of $1500 \mathrm{ps}$ ($15 \lambda$ at 10 GHz) are required, the total number of stages can be calculated to be $N=62$ with a clock frequency of $f_{s}=645 \mathrm{MHz}$. Branching out into 62 parallel nodes at 10 GHz poses its own challenges. Add to that the parasitic effect of the 62 switches, described in eq.4.29 and eq.4.30 and shown in figure 4.23, and this design can be ruled out as infeasible. In more general terms, this example shows that the original N-path cannot achieve the targeted delays of $15 \lambda$ and above. However, by utilizing TI in the N-path, the effects of this problem are reduced by the time expansion (or frequency reduction) in the interleaved stage. The following design section covers, in order, the design of two original N-path filters with $f_{s}=5 \mathrm{GHz}$ and $f_{s}=3 \mathrm{GHz}$ respectively, the principle of interleaving the N-path to increase the achieved delay and, finally, a novel way of branching TI N-paths to achieve multiple delays with a reduced number of stages.
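The stage count and clock frequency in the example above follow directly from the two quoted relations. The snippet below is a minimal sketch of that arithmetic; the function name and its arguments are illustrative and not part of the thesis.

```python
import math

def n_path_requirements(f_eff_hz, delay_s):
    """Return (N, f_s) for an original N-path.

    Uses the two relations quoted in the text:
      effective sampling rate  f_eff = N * f_s
      total delay              t_d   = (N - 2) / (N * f_s)
    """
    # t_d = (N - 2) / f_eff  =>  N = t_d * f_eff + 2
    # The small offset guards against floating-point noise before rounding up.
    n = math.ceil(delay_s * f_eff_hz + 2 - 1e-9)
    f_s = f_eff_hz / n
    return n, f_s

n, f_s = n_path_requirements(f_eff_hz=40e9, delay_s=1500e-12)
print(n, round(f_s / 1e6), "MHz")
```

For the 40 GHz and 1500 ps targets this reproduces $N=62$ and a clock of roughly 645 MHz.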

### 5.2.1 Original N-path

Two different N-path circuits were designed to showcase different combinations of $N$ and $f_{s}$. The first one utilizes a 5 GHz clock, which is half of $f_{R F}$, placing moderate restrictions on the parasitic loading $(N=8)$ while keeping the clock frequency relatively low. The second design point has a lower sampling clock frequency ($3 \mathrm{GHz}$) and thus more stages $(N=16)$, but achieves a larger delay. From this section onward, only the first design is considered when TI and branching are introduced.

### 5.2.1.1 Sizing of the switches and sampling capacitors

For the N-path to function as a TTD circuit, the input RC constant should be much smaller than the clock pulse width $\tau$ and the output RC constant should be much larger. The expressions for these RC constants are found in eq.4.28. A second constraint is the charge sharing between the different phase nodes, which is formulated in eq.4.29 and eq.4.30. In table 5.8, these equations are summarized and combined with the constraints described in [31].

Table 5.8: Summary of design equations for the original N-path including constraints from [31].

| Design aspect | Expression | Constraint |
| :--- | :---: | :---: |
| Low input RC constant | $\left(R_{S}+R_{s w 1}\right) C$ | $<\tau / 10$ |
| High output RC constant | $\left(R_{L}+R_{s w 2}\right) C$ | $>10 \tau$ |
| Charge sharing at input | $R_{S}\left(N C_{p D 1}+C_{\text {out } A 1}\right)$ | $<\tau / 20$ |
| Charge sharing at output | $N C_{p D 2}+C_{\text {inA2 }}$ | $<C / 5$ |

Designing the N-path circuit starts from the input and output buffers. The main parameters are the output resistance of the input buffer $R_{S}$, the input resistance of the output buffer $R_{L}$ and their respective parasitic capacitances $C_{o u t, A 1}$ and $C_{i n, A 2}$. For this project, these buffers were not implemented with real circuits but simulated as ideal blocks. The following parameters were chosen: $R_{S}=20 \Omega$, $R_{L}=1 \mathrm{k} \Omega$ and $C_{\text {out }, A 1}=13 \mathrm{fF}$. The output buffer capacitance $C_{i n, A 2}$ was set to zero for the initial design; its effect is treated in section 5.2.1.2. From constraint number three in table 5.8, the switch resistance of the left hand switch (switch number one) can be found. Introducing the process related parameter $\Gamma=R_{s w} C_{p a r}$, which is assumed to be equal for different transistor sizes within the same technology, the following expression for the switch resistance is derived:

$$
\begin{equation*}
R_{S}\left(N C_{p D 1}+C_{out, A 1}\right)=R_{S}\left(N \frac{\Gamma}{R_{s w 1}}+C_{out, A 1}\right)=\frac{\tau}{20} \Leftrightarrow R_{s w 1}=\frac{\Gamma N}{\frac{\tau}{20 R_{S}}-C_{out, A 1}} \tag{5.11}
\end{equation*}
$$

Once the left hand switch resistance is found, the switch itself is sized. The next step uses constraint number one from table 5.8, where the sampling capacitance is given by:

$$
\begin{equation*}
\left(R_{S}+R_{s w 1}\right) C=\frac{\tau}{10} \Leftrightarrow C=\frac{\tau}{10\left(R_{S}+R_{s w 1}\right)} \tag{5.12}
\end{equation*}
$$

Finally the right hand switch can be sized by utilizing the second constraint:

$$
\begin{equation*}
\left(R_{L}+R_{s w 2}\right) C=10 \tau \Leftrightarrow R_{s w 2}=\frac{10 \tau}{C}-R_{L} \tag{5.13}
\end{equation*}
$$

If the resulting $R_{s w 2}$ is greater than the maximum achievable switch resistance of the used technology, i.e. that of the minimum device width, there are two options. The value of the sampling capacitance could be increased, and the left hand switch resized correspondingly, allowing for a smaller $R_{s w 2}$. Note however, that this would deteriorate the design constraint set in table 5.8 and lead to worsened delay performance. The second approach is to introduce a second transistor in series in the switch, effectively doubling the on-resistance, but also increasing the parasitic capacitance. The final values for the design equations and their respective constraints are presented in table 5.9 for both designs. In design one, with lower $N$, the on-resistances of the input and output switches were calculated to be $R_{s w 1}=36.8 \Omega$ and $R_{s w 2}=2.9 \mathrm{k} \Omega$, while the sample capacitance was designed to be $C=44 \mathrm{fF}$. For the second design, with a higher number of phases, the double transistor approach was used to better meet the constraints and the following values were calculated and used: $R_{s w 1}=94.2 \Omega$, $R_{s w 2}=2.3 \mathrm{k} \Omega$ and $C=20 \mathrm{fF}$.
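The sizing order described by eq. 5.11 to eq. 5.13 can be sketched as a short script. This is only an illustration with every constraint taken at equality. The value of $\Gamma$ is not stated in this section, so the number used below is an assumption chosen to land near the reported $R_{s w 1}$ of design one.

```python
def size_n_path(tau, n, r_s, r_l, c_out_a1, gamma):
    """Apply eq. 5.11-5.13 with each constraint taken at equality."""
    # eq. 5.11: input charge-sharing constraint sets the sampling switch
    r_sw1 = gamma * n / (tau / (20 * r_s) - c_out_a1)
    # eq. 5.12: low input RC constant sets the sampling capacitor
    c = tau / (10 * (r_s + r_sw1))
    # eq. 5.13: high output RC constant sets the reconstruction switch
    r_sw2 = 10 * tau / c - r_l
    return r_sw1, c, r_sw2

# Design one: f_s = 5 GHz and N = 8, so tau = 1 / (N * f_s) = 25 ps.
N, F_S = 8, 5e9
TAU = 1 / (N * F_S)
GAMMA = 228e-15  # ASSUMED R_sw * C_par product in seconds, not a thesis value

r_sw1, c, r_sw2 = size_n_path(TAU, N, r_s=20.0, r_l=1e3,
                              c_out_a1=13e-15, gamma=GAMMA)
print(f"R_sw1 = {r_sw1:.1f} ohm, C = {c * 1e15:.1f} fF, R_sw2 = {r_sw2:.0f} ohm")
```

With this assumed $\Gamma$, the first two results come out close to the reported $36.8 \Omega$ and $44 \mathrm{fF}$. Eq. 5.13 taken at equality then asks for roughly $4.7 \mathrm{k} \Omega$, which is more than the $2.9 \mathrm{k} \Omega$ used in design one and is in line with the output RC constraint not being met in table 5.9.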

Table 5.9: Constraints check for the final design of the original N-path.

| Expression | Design 1 (5 GHz, N=8) |  | Design 2 (3 GHz, N=16) |  |
| :--- | :--- | :--- | :--- | :--- |
|  | Value | Constraint | Value | Constraint |
| $\left(R_{S}+R_{s w 1}\right) C$ | 2500 fs | $<2500 \mathrm{fs}$ | 2083 fs | $<2083 \mathrm{fs}$ |
| $\left(R_{L}+R_{s w 2}\right) C$ | 176 ps | $>250 \mathrm{ps}$ | 142 ps | $>208 \mathrm{ps}$ |
| $R_{S}\left(N C_{p D 1}+C_{out, A 1}\right)$ | 1154 fs | $<1250 \mathrm{fs}$ | 1135 fs | $<1041 \mathrm{fs}$ |
| $N C_{p D 2}+C_{in, A 2}$ | 1.4 fF | $<8.8 \mathrm{fF}$ | 2.7 fF | $<4 \mathrm{fF}$ |

Neither of the circuits meets the output RC constraint (second row of table 5.9). However, the values are still of the same order of magnitude as the constraint, and thus the impact on performance is not detrimental. In fact, as presented in figure 5.21, the only difference seems to be in the achieved delay and not in the TTD performance, at least between 9 and 11 GHz. The designs were set to generate 100 ps and 104 ps respectively in the figure. Increasing $R_{L}$ by a factor of five only increased the delay of both designs by a relatively small amount (1.2 ps and 2.2 ps respectively at 10 GHz).


Figure 5.21: Impact of output buffer input impedance $R_{L}$ on TTD performance of the original N-path. Both designs were set to generate around 100 ps of delay. For design one, this means a total of four clock pulses ($\tau=1 /(8 \cdot 5 \mathrm{GHz})=25 \mathrm{ps}$) and for design two a total of five clock pulses ($\tau=1 /(16 \cdot 3 \mathrm{GHz})=20.8 \mathrm{ps}$).

Finally, figure 5.22 showcases the difference in total delay achieved for the two designs. As seen, design two achieves a higher delay because it utilizes more phases $N$ and slower clocks, but it is still limited to around 300 ps, and increasing the number of phases $N$ further will make the TTD behavior less ideal.


Figure 5.22: Total delay for design one versus design two.

### 5.2.1.2 Resonating LC tank at reconstruction node

The N-path design started with the specifications of the input and output buffers. In these, the parasitic input capacitance $C_{i n, A 2}$ of the output buffer was set to zero. The reason for this is that the parasitic capacitance $N C_{p D 2}$, originating from the $N$ output switches, forms a low-pass filter with the switch resistance $R_{s w 2}$ that, even without $C_{i n, A 2}$ included, has a pole frequency close to 10 GHz. This low-pass characteristic is highlighted in figure 5.23, where the magnitude response of design one is simulated for different values of $C_{i n, A 2}$. It can be seen that already at $C_{i n, A 2}=20 \mathrm{fF}$, the signal at 10 GHz is attenuated by well over 3 dB, i.e. the 3 dB cut-off frequency lies well below 10 GHz. Although the input node shares the same capacitive loading effect, which is proportional to the number of phases $N$, the low output impedance of the input buffer $R_{S}$ ensures that the 3 dB frequency of this low-pass filter falls well above 10 GHz.


Figure 5.23: Output magnitude of first design ( $f_{s}=5 \mathrm{GHz}, N=8$ ) N-path circuit with increasing output buffer input capacitance $C_{i n, A 2}$

To alleviate the design requirements of the output buffer, and to filter the signal, an inductor in series with a capacitor is placed in parallel with $C_{in, A 2}$ to form an LC resonator. The series capacitor $C_{S}$ blocks the DC path to ground and ensures that the bias at the output is not lost. When the DC blocking capacitor is added, it forms a series resonance circuit with the inductance $L_{S}$, as drawn to the left in figure 5.24.


Figure 5.24: LC resonator placed at the reconstruction node to resonate the parasitic capacitance of the output buffer and switches. $C_{S}$ prevents a DC short to ground through the inductor.

Assuming that the quality factor of the series resonance circuit is dominated by the inductor, the resistive part can be written as $R_{S}=\omega_{0} L_{S} / Q_{L}$, where $Q_{L}$ is the quality factor of the inductor at $\omega_{0}$ (in this section $R_{S}$ denotes the series resistance of the resonator, not the output resistance of the input buffer). Combining the reactances of $L_{S}$ and $C_{S}$ into $X_{S}=\omega L_{S}-1 /\left(\omega C_{S}\right)$ allows for a series-to-parallel conversion of the series resonator, as pictured to the right in figure 5.24, where $Q=X_{S} / R_{S}$. The newly formed reactive element consists of a series $L C$ network with a resonance frequency of $f_{X r}=1 /\left(2 \pi \sqrt{L_{X} C_{X}}\right)$, above which the circuit can be considered inductive and thus forms a parallel resonance circuit with $C_{in, A 2}+N C_{p D 2}$. It is however important that the resonance of the reactive LC element is below 10 GHz, since the closer $f_{X r}$ is to 10 GHz, the less inductive the circuit is and the worse the matching to $C_{in, A 2}+N C_{p D 2}$ becomes. When sizing the resonator, the starting point is the combination of the parasitic capacitance and the inductive part of the reactive element. These should resonate at a frequency $f_{r}=10 \mathrm{GHz}$, which, in combination with the relation $L_{X}=L_{S}\left(Q^{2}+1\right) / Q^{2}$, gives the inductor in the series resonance circuit a value of:

$$
\begin{equation*}
L_{S}=\frac{Q^{2}}{\left(Q^{2}+1\right)\left(2 \pi f_{r}\right)^{2}\left(C_{in, A 2}+N C_{p D 2}\right)} \tag{5.14}
\end{equation*}
$$

By looking at the expression for the reactive element, $X_{S}=\omega L_{S}-1 /\left(\omega C_{S}\right)$, and assuming that the reactive part of the inductor dominates at 10 GHz, it is realized that $Q \approx \omega L_{S} / R_{S}=Q_{L}$. Once the size of $L_{S}$ is determined, the DC blocking capacitance $C_{S}$ can be sized by placing the resonance frequency of the reactive element well below 10 GHz. Simulations suggested that an order of magnitude lower, 1 GHz, was sufficient. The size of $C_{S}$ can thus be written as:

$$
\begin{equation*}
C_{S}=\frac{1}{\left(2 \pi f_{r} / 10\right)^{2} L_{S}} \tag{5.15}
\end{equation*}
$$
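Eq. 5.14 and eq. 5.15 can be evaluated numerically for the capacitance values considered in this section. The sketch below assumes $Q \approx Q_{L}=20$ and takes $N C_{p D 2}=1.4 \mathrm{fF}$ from the design one entry in table 5.9. The function and variable names are illustrative only.

```python
import math

def size_resonator(c_par, f_r=10e9, q=20.0, ratio=10.0):
    """Size L_S (eq. 5.14) and C_S (eq. 5.15) for a node capacitance c_par."""
    w_r = 2 * math.pi * f_r
    # eq. 5.14: L_X = L_S (Q^2 + 1) / Q^2 must resonate with c_par at f_r
    l_s = q**2 / ((q**2 + 1) * w_r**2 * c_par)
    # eq. 5.15: series resonance of L_S and C_S placed at f_r / ratio
    c_s = 1 / ((w_r / ratio) ** 2 * l_s)
    return l_s, c_s

N_C_PD2 = 1.4e-15  # switch parasitics of design one, read from table 5.9
for c_in_a2 in (20e-15, 50e-15, 100e-15):
    l_s, c_s = size_resonator(c_in_a2 + N_C_PD2)
    print(f"C_in,A2 = {c_in_a2 * 1e15:5.0f} fF -> "
          f"L_S = {l_s * 1e9:5.2f} nH, C_S = {c_s * 1e12:5.2f} pF")
```

Under these assumptions the smallest capacitance requires an inductor of roughly 12 nH, and the required inductance drops as the parasitic capacitance grows.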

First, the resonance network is set up in an isolated environment with only $C_{in, A 2}+N C_{p D 2}$ and the series resonator, without any loading via $R_{L}$. As presented in figure 5.25 for $C_{in, A 2}=20 \mathrm{fF}$, $50 \mathrm{fF}$ and $100 \mathrm{fF}$, the resonance falls close to 10 GHz and, when the value of the inductor increases for lower values of $C_{i n, A 2}$, so does the impedance at 10 GHz. This is because, in the resistive part $R_{S}\left(1+Q^{2}\right)$, the quality factor is constant $\left(Q \approx Q_{L}\right)$ but the resistance $R_{S}$ is proportional to $L_{S}$. As a consequence, the loaded quality factor of the final parallel resonance circuit increases with increasing parasitic capacitance, as seen by the dotted lines in figure 5.25 and also when the resonance circuit is implemented in the real N-path circuit, as presented in figure 5.26.


Figure 5.25: Isolated simulations of the LC resonator to be implemented at the reconstruction node of the N-path, carried out for unloaded and loaded circuits with $C_{i n, A 2}=20 \mathrm{fF}$, $50 \mathrm{fF}$ and $100 \mathrm{fF}$ and $Q_{L}=20$.

The increased quality factor of the resonance circuit is not necessarily a wanted result since a larger quality factor means that the 90 degree phase response of the resonance circuit is sharper - which deteriorates the TTD behaviour of the circuit as shown in figure 5.27.


Figure 5.27: Phase response of the N-path with an LC resonator at the reconstruction node for different values of $C_{i n, A 2}$. As the capacitance increases, the quality factor also increases and the 90 degree phase shift is more visible.

Finally, the filtering of the signal should be mentioned as a side effect of the resonator. At the reconstruction node, the input signal is reconstructed as a sampled version of the input, which now also contains some higher frequency components. Given its band-pass behaviour, the resonator filters both the higher and the lower frequency components. In figure 5.28, the output waveform is shown for the N-path circuit with a sinusoidal input at 10 GHz. When the resonator is not present and the signal is not filtered, the charge and discharge nature of the sample and hold circuit is quite evident. However, once the resonator is implemented, the signal is filtered and the output waveform becomes cleaner. Note that although this filtering is a side effect of the resonance circuit, more filtering could be implemented later on in the circuit as well.


Figure 5.28: Transient output signal of the N-path without the resonator (in magenta) and with the resonator (in red).

### 5.2.2 Time-Interleaving the N-path

In the previous design section, it is shown that for any reasonable number of phases $N$ in the N-path, the maximum achieved delay is an order of magnitude smaller than the required $15 \lambda$ (1.5 ns at 10 GHz). To solve this issue, the TI stage is introduced. Although two versions of the original N-path were designed, only the TI of the $N=8$ design is carried out. The main underlying reason for this is that the TI increases circuit complexity and simulation time significantly. Since the number of nodes in the TI N-path is directly proportional to the number of interleaved stages $M$, the increase in simulation time limits the size of the circuit implemented, and thus also the total achieved delay. However, this design of the TI N-path is still able to showcase the strength of the interleaving principle and gives firm indications that this may be a way forward when creating multiples of $10 \lambda$ in delay. Starting from design one of the original N-path, with $f_{s}=5 \mathrm{GHz}$ and $N=8$, a TI N-path is designed using $M=10$ interleaving stages. The total achieved delay should then, according to eq.4.32, be $t_{M=10}=\frac{10-1}{5 \mathrm{GHz}}=1800 \mathrm{ps}$.
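The dependence of the TI delay on the number of interleaved stages can be tabulated with a few lines of code. The sketch below assumes the $(M-1) / f_{s}$ form used in the calculation above. The helper `min_stages` is a hypothetical addition that inverts this relation for a target delay.

```python
import math

def ti_delay(m, f_s):
    """Total delay of a TI N-path with m interleaved stages: (m - 1) / f_s."""
    return (m - 1) / f_s

def min_stages(target_delay, f_s):
    """Smallest m for which ti_delay(m, f_s) reaches target_delay."""
    # The small offset guards against floating-point noise before rounding up.
    return math.ceil(target_delay * f_s - 1e-9) + 1

F_S = 5e9
for m in (2, 5, 10):
    print(f"M = {m:2d}: {ti_delay(m, F_S) * 1e12:.0f} ps")
print("stages needed for 1.5 ns:", min_stages(1.5e-9, F_S))
```

Under this relation, nine interleaved stages would be the minimum for the 1.5 ns target at a 5 GHz clock, so $M=10$ leaves some margin.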

### 5.2.2.1 Buffers and sizing of switches and sampling capacitors

At the input and output, the TI N-path is similar to the original N-path, with $N=8$ switches coupled in parallel, the difference being that the N-path is now split by the TI stage (see figure 5.29). In the TI circuit, there are also two regions with different clock pulse widths as a result of the time expansion. The sample and reconstruction switches operate at $\tau_{1}=25 \mathrm{ps}$, while the inner TI stages operate at $\tau_{2}=200 \mathrm{ps}$.


Figure 5.29: Different clock settle times in TI part versus the fast sampling and reconstruction stages.

The constraints from table 5.8 [31] for the original N-path can be reused for the input and output switches of the TI N-path. However, since the switches are now split and isolated from each other, the approach is to calculate the widths of the switches separately. The input parameters are carried over from the previous section as $R_{\text {out }, A 0}=R_{S}=20 \Omega$, $C_{\text {out }, A 0}=13 \mathrm{fF}$ and $R_{\text {in }, A 3}=1 \mathrm{k} \Omega$. For the left hand side switch, the procedure is exactly the same as for the original N-path: starting from the charge sharing constraint at the input (eq.4.29), the following expression gives the switch resistance $R_{s w 1}$:

$$
\begin{equation*}
R_{s w 1}=\frac{\Gamma N}{\frac{\tau_{1}}{20 R_{out, A 0}}-C_{out, A 0}} \tag{5.16}
\end{equation*}
$$

Once $R_{s w 1}$ is found, and the switch is sized accordingly, the sampling capacitance at the input is sized by using the low input $R C$ constant constraint formulated in eq.4.28 and specified in table 5.8:

$$
\begin{equation*}
C_{1}=\frac{\tau_{1}}{10\left(R_{S}+R_{s w 1}\right)} \tag{5.17}
\end{equation*}
$$

Because the N-path is split, the right side is sized without regard to the values calculated for the left side. The two governing constraints are the high output RC constant from eq.4.28 and the charge sharing at the output as formulated in eq.4.30. For the TI N-path, these constraints can be rewritten as:

$$
\begin{equation*}
N \frac{\Gamma}{R_{s w 2}}+C_{i n, A 3}<\frac{C_{3}}{5}, \quad\left(R_{i n, A 3}+R_{s w 2}\right) C_{3}>10 \tau_{1} \tag{5.18}
\end{equation*}
$$

In this case, since $C_{3}$ does not affect the input, the sizing of the output switch is not limited by the sample capacitor. To minimize chip area, the output switch was set to minimum width, so that $R_{s w 2}$ was maximized and $C_{3}$ could be scaled down in value and physical size while still achieving the high output $R C$ constraint. The calculated values for the input side switch and capacitor are $R_{s w 1}=32 \Omega$ and $C_{1}=22 \mathrm{fF}$ respectively, while the output side switch and capacitor were sized to $R_{s w 2}=2.9 \mathrm{k} \Omega$ and $C_{3}=57 \mathrm{fF}$. For these settings, the constraints are given in table 5.10.

Table 5.10: Constraints check for the final design of TI N-path.

| Design Aspect | Expression | TI Design |  |
| :--- | :--- | :--- | :--- |
|  |  | Value | Constraint |
| Low RC input | $\left(R_{out, A 0}+R_{s w 1}\right) C_{1}$ | 2456 fs | $<2500 \mathrm{fs}$ |
| High RC output | $\left(R_{in, A 3}+R_{s w 2}\right) C_{3}$ | 250 ps | $>250 \mathrm{ps}$ |
| Charge sharing input | $R_{out, A 0}\left(C_{out, A 0}+N C_{p D 1}\right)$ | 1116 fs | $<1250 \mathrm{fs}$ |
| Charge sharing output | $C_{i n, A 3}+N C_{p D 2}$ | 1.4 fF | $<11.4 \mathrm{fF}$ |

After the sample stage of $C_{1}$, the TI stage is isolated from the input via the buffer $A 1$. The output impedance $R_{\text {out }, A 1}$ and the switch resistance $R_{s w 2}$ of the inner TI stage form an $R C$ constant with the capacitance $C_{2}$ of the inner TI stage. This should be compared to the time extended TI clock pulse width of $\tau_{2}=8 \tau_{1}=200 \mathrm{ps}$, in a similar manner to what was done for the input $R C$ constant of the original N-path. If it is assumed that the $A 1$ buffer has the same output impedance as $A 0$, i.e. $R_{out, A 1} \approx R_{out, A 0}$, then the resistance of the switch in the TI part can be sized as $R_{s w 2}=N R_{s w 1}$, since it scales with the settling time of the clock. The final switch value can be adjusted if the output impedances of $A 1$ and $A 0$ do not match. Once $R_{s w 2}$ is found, it in turn gives the value of the capacitance from the following expression:

$$
\begin{equation*}
C_{2}=\frac{\tau_{2}}{10\left(R_{s w 2}+R_{\text {out }, A 1}\right)} \tag{5.19}
\end{equation*}
$$
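As an illustration of eq. 5.19, the snippet below scales the switch resistance with the time expansion and then evaluates the capacitor. The thesis does not report a value for $C_{2}$, so the numbers printed here are only what the stated scaling rule and the assumption $R_{out, A 1} \approx R_{out, A 0}=20 \Omega$ would give. The function name is hypothetical.

```python
def size_ti_stage(tau_1, n, r_sw1, r_out_a1):
    """Size the inner TI stage from the input-stage values (eq. 5.19)."""
    tau_2 = n * tau_1        # time-expanded clock pulse width
    r_sw_ti = n * r_sw1      # switch resistance scaled with the settling time
    c_2 = tau_2 / (10 * (r_sw_ti + r_out_a1))  # RC constant set to tau_2 / 10
    return tau_2, r_sw_ti, c_2

# ASSUMPTION: R_out,A1 equals R_out,A0 = 20 ohm, as suggested in the text.
tau_2, r_sw_ti, c_2 = size_ti_stage(tau_1=25e-12, n=8, r_sw1=32.0, r_out_a1=20.0)
print(f"tau_2 = {tau_2 * 1e12:.0f} ps, R_sw = {r_sw_ti:.0f} ohm, "
      f"C_2 = {c_2 * 1e15:.1f} fF")
```

If the simulated output resistance of $A 1$ differs from this assumption, `r_out_a1` can be replaced accordingly and the switch value adjusted as described above.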

The buffer topologies of $A 1$ and $A 2$ for the TI N-path implemented in this report are found in [28] and are shown in figure 5.30. Both buffers implement common source inputs, which will ensure that the sample capacitor sees a high impedance during the release event and is effectively isolated from the next stage.


Figure 5.30: Buffer topologies used in the TI N-path[28].

For the buffer $A 1$, seen in figure 5.30.a, the common source input transistor has a gain of $-g_{m 1} R_{L}$. If it is loaded by a diode-connected stage with an input admittance of $G_{L}=g_{m 2}$, the structure can be designed to achieve a gain of unity magnitude if $g_{m 1} / g_{m 2} \approx 1$. To mitigate channel-length modulation effects, both M1 and M2 were implemented with non-minimum lengths. Apart from the unity gain criterion, the widths of M1 and M2 were also balanced so that the output DC bias was placed in the vicinity of the mid supply level of 450 mV. The final transistor sizes and performance parameters are presented in table 5.11.

Table 5.11: Sizing of A1 and A2 buffer with performance parameters.

| Parameters | Buffer A1 | Buffer A2 |
| :---: | :---: | :---: |
|  | Values |  |
| Widths | $w_{\mathrm{M} 1} / w_{\mathrm{M} 2}=20 / 16 \mu \mathrm{m}$ | $w_{\mathrm{Mn}} / w_{\mathrm{Mp}}=2.4 / 2.4 \mu \mathrm{m}$ |
|  |  | $w_{\mathrm{M}^{\prime} \mathrm{n}} / w_{\mathrm{M}^{\prime} \mathrm{p}}=12 / 12 \mu \mathrm{m}$ |
| Lengths | $l_{\mathrm{M} 1} / l_{\mathrm{M} 2}=70 / 70 \mathrm{~nm}$ | $l_{\mathrm{Mn}} / l_{\mathrm{Mp}}=20 / 20 \mathrm{~nm}$ |
|  |  | $l_{\mathrm{M}^{\prime} \mathrm{n}} / l_{\mathrm{M}^{\prime} \mathrm{p}}=20 / 20 \mathrm{~nm}$ |
| Gain | 0 dB | 5 dB |
| $R_{out}$ | $41 \Omega$ | $603 \Omega$ |
| $C_{in} / C_{out}$ | 18.8/12.0 fF | 7.4/5.7 fF |

The second buffer is implemented using a push-pull pair in the form of $\mathrm{M}_{\mathrm{n}}$ and $\mathrm{M}_{\mathrm{p}}$ in figure 5.30.b. To add switch functionality, $\mathrm{M}_{\mathrm{n}}^{\prime}$ and $\mathrm{M}_{\mathrm{p}}^{\prime}$ switch the supply ON and OFF via the enable pin EN. When the buffer is turned ON, the on-resistances of $\mathrm{M}_{\mathrm{n}}^{\prime}$ and $\mathrm{M}_{\mathrm{p}}^{\prime}$ degenerate the source terminals of $\mathrm{M}_{\mathrm{n}}$ and $\mathrm{M}_{\mathrm{p}}$ respectively. Table 5.11 also includes the sizing and performance parameters for the A2 buffer. Finally, a self-biased inverter load was placed after the A2 buffer, as shown in figure 5.31, to ensure a DC bias around the mid supply of 450 mV at the output.


Figure 5.31: Inverter loading of the output of the TI part.

### 5.2.2.2 Occurrence of Time Interleaved Spurs

When TI is introduced, the creation of spurious tones from the interleaved stages needs to be taken into account. In this report, the single TI N-path circuit was designed with $N=8$ input and reconstruction stages with an effective sample frequency of $N f_{s}=8 \cdot 5 \mathrm{GHz}=40 \mathrm{GHz}$, each interleaved by $M=10$ stages, which in turn have an effective sample frequency of $f_{s}=5 \mathrm{GHz}$. These settings, according to eq.4.33, will generate DC offset related spurs at:

$$
\begin{aligned}
f_{D C o f f s e t, N c l k} & =\frac{k}{8} \cdot 40 \mathrm{GHz}=k \cdot 5 \mathrm{GHz} \\
f_{D C o f f s e t, M c l k} & =\frac{k}{10} \cdot 5 \mathrm{GHz}=k \cdot 500 \mathrm{MHz} \\
k & =0,1,2,3, \ldots
\end{aligned}
$$

Further, another set of spurs related to gain error and time skew is created at the frequencies described in eq.4.34 and eq.4.35 respectively. The worst case scenario is when all of the spurs overlap, which happens when the input frequency $f_{R F}$ is equal to 10 GHz. The resulting spurs should then end up at:

$$
\begin{gathered}
f_{\text {TimeSkewing,Nclk }}=f_{\text {Gerror,Nclk }}= \pm 10 \mathrm{GHz}+\frac{k}{8} \cdot 40 \mathrm{GHz}= \pm 10 \mathrm{GHz}+k \cdot 5 \mathrm{GHz} \\
f_{\text {TimeSkewing,Mclk }}=f_{\text {Gerror,Mclk }}= \pm 10 \mathrm{GHz}+\frac{k}{10} \cdot 5 \mathrm{GHz}= \pm 10 \mathrm{GHz}+k \cdot 500 \mathrm{MHz}
\end{gathered}
$$
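The spur locations given by the expressions above can be listed numerically to see the overlap. The following sketch simply enumerates the offset spurs at $k$ times the spacing and the gain error and time skew spurs at $\pm f_{R F}$ plus $k$ times the spacing, up to a chosen maximum frequency. The function name and the 20 GHz limit are illustrative choices.

```python
def spur_frequencies(f_in, f_clk_eff, ways, f_max):
    """Spur locations of a time-interleaved sampler up to f_max (all in Hz).

    Offset spurs sit at k * f_clk_eff / ways; gain-error and time-skew
    spurs sit at +/- f_in + k * f_clk_eff / ways.
    """
    step = f_clk_eff / ways
    k_max = int((f_max + f_in) // step)
    offset = {k * step for k in range(k_max + 1) if k * step <= f_max}
    gain_skew = set()
    for k in range(k_max + 1):
        for f in (f_in + k * step, -f_in + k * step):
            if 0 <= f <= f_max:
                gain_skew.add(f)
    return sorted(offset), sorted(gain_skew)

F_RF = 10e9
# N clock: 8 phases at an effective rate of 40 GHz, giving 5 GHz spacing
n_off, n_gs = spur_frequencies(F_RF, 40e9, 8, f_max=20e9)
# M clock: 10 stages at an effective rate of 5 GHz, giving 500 MHz spacing
m_off, m_gs = spur_frequencies(F_RF, 5e9, 10, f_max=20e9)

print([f / 1e9 for f in n_off])   # N-clock offset spurs in GHz
print([f / 1e9 for f in n_gs])    # N-clock gain-error and time-skew spurs in GHz
print(len(m_off), len(m_gs))      # number of M-clock spur locations up to 20 GHz
```

With a 10 GHz input, both lists for a given clock contain the same frequencies, which is the overlap that makes this the worst case.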

The output frequency spectrum is simulated for the worst case scenario described in the equation above and shown in figure 5.32. The spurs at 5 GHz and 15 GHz are the largest spurs apart from DC and the input signal at 10 GHz. There are also spurs separated by 500 MHz, mostly visible at lower frequencies, which can be attributed to the interleaved clock.


Figure 5.32: Simulated frequency content of the output signal of the TI N-path with an input signal power of -20 dBm

### 5.2.3 Branched Time-Interleaved N-path

The branched TI N-path implements a second TI stage, splitting up the first interleaved stage into two parts in the same way as the single TI stage split up the original N-path (see figure 5.33). Since section 5.2.1 covers the design of the high frequency sample and reconstruction part, and section 5.2.2 shows how an additional interleaving step was added, this section is kept shorter.


Figure 5.33: The principle of introducing a second TI stage, effectively splitting the already interleaved N-path once again.

In this work, the double TI is implemented using the following number of stages: $N=8$, $M=5$ and $K=8$. The switch and sample capacitor in the second TI stage are kept the same size as in the previous TI stage (from section 5.2.2). The buffer $A 3$, which isolates the sampled state at $C_{3}$ from the output and also acts as the release switch for the second TI stage, is copied from the topology used for A2 (figure 5.30.b). The splitting is introduced at the output node of the second TI stage so that two outputs are formed, creating two delays. These are separated by the tuning of the first TI stage and the output reconstruction stage, as shown in figure 5.34.


Figure 5.34: Showcase of the implemented branching structure. One RF input is sampled by the N-path sample stage, interleaved twice and branched after the second TI into two reconstructed RF outputs.

This setup should achieve a maximum delay as described in eq.4.36:

$$
t_{\text {maxTIdouble }}=M(K-1) \frac{1}{f_{s}}=5 \cdot(8-1) \frac{1}{5 \mathrm{GHz}}=7 \mathrm{~ns}
$$

Further, the percentage by which the branching technique reduces the number of switch stages, compared to implementing the same circuit as two parallel paths, can be calculated as (using $N=8$, $M=5$ and $K=8$):

$$
\% \text { Reduced }=1-\frac{\text { Switches }_{\text {Branched }}}{\text { Switches }_{\text {Parallel }}}=1-\frac{N+N M+2 N M K+2 M N+2 N}{(2 N+2 N M+2 K M N) \cdot 2}=46.7 \%
$$
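The maximum delay and the switch count comparison can be checked with a short calculation. In the sketch below the two switch count expressions are copied from the numerator and denominator above, and the 46.7 % figure corresponds to one minus their ratio. The `outputs` argument is an illustrative generalization that is not taken from the thesis.

```python
def branched_switches(n, m, k):
    """Switch count of the branched double-TI N-path (numerator above)."""
    return n + n * m + 2 * n * m * k + 2 * m * n + 2 * n

def parallel_switches(n, m, k, outputs=2):
    """Switch count when each output gets its own complete double-TI N-path."""
    return (2 * n + 2 * n * m + 2 * k * m * n) * outputs

def max_delay_double_ti(m, k, f_s):
    """Maximum delay of the double-TI N-path, in the form M (K - 1) / f_s."""
    return m * (k - 1) / f_s

N, M, K, F_S = 8, 5, 8, 5e9
branched = branched_switches(N, M, K)
parallel = parallel_switches(N, M, K)
reduction = 1 - branched / parallel
print(branched, parallel, f"{100 * reduction:.1f} %")
print(f"{max_delay_double_ti(M, K, F_S) * 1e9:.1f} ns")
```

For $N=8$, $M=5$ and $K=8$ this gives 784 switches for the branched structure against 1472 for two parallel paths.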

## Results and Discussion

This chapter starts with the results of the TTD circuits themselves, where the passive and active TTD are covered separately. The TTD circuits were not implemented in a complete SIC system to simulate the cancellation. However, the theoretical cancellation that a SIC system could achieve using the investigated TTD circuits can be estimated from the simulation results of the TTD circuits. In the later part of this chapter, the cancellation of the entire hybrid SIC is estimated, together with a discussion of the entire SIC system.

### 6.1 True-Time Delay Performance for Active and Passive Techniques

The results from the passive binary weighted TTD and the active N-path TTD are presented separately, but with focus on the same parameters. These parameters are simulated and presented for both TTD circuits and are used to estimate the cancellation of the entire hybrid SIC in the next section.

The delay variation $T d_{v a r}$ is defined as the difference in delay between the center frequency at 10 GHz and the edges of the bandwidth at 9.95 GHz and 10.05 GHz, i.e. the edges of the in-band carrier width. The delay resolution $T d_{\text {res }}$ is defined as the difference in delay between adjacent levels at 10 GHz. When tuning the TTD circuit to a certain delay time, the largest difference between the targeted delay and the delay generated by the TTD occurs if the targeted delay lies midway between two delay levels. This difference is further increased at the edges of the bandwidth by the delay variation $T d_{\text {var }}$. Therefore, the delay error $T d_{\text {err }}$ is defined as

$$
\begin{equation*}
T d_{e r r}=\frac{T d_{r e s}}{2}+T d_{v a r} \tag{6.1}
\end{equation*}
$$

These parameters will be different for all the levels and the root mean square (rms) of all values can be calculated to show the mean value. It can also be of interest to know what the maximum delay error of all the levels is since this will show the worst performance of the TTD.
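The per-level evaluation of eq. 6.1 and the two summary statistics can be sketched as follows. The per-level numbers below are hypothetical placeholders, not simulation data from the thesis.

```python
import math

def delay_error(t_res, t_var):
    """Per-level delay error, eq. 6.1: half the step plus in-band variation."""
    return [r / 2 + v for r, v in zip(t_res, t_var)]

def rms(values):
    """Root mean square of a list of values."""
    return math.sqrt(sum(x * x for x in values) / len(values))

# HYPOTHETICAL per-level data in ps, for illustration only
t_res = [1.2, 1.6, 2.0, 4.0, 1.4]
t_var = [0.8, 0.9, 1.0, 1.2, 1.1]

t_err = delay_error(t_res, t_var)
print(f"rms = {rms(t_err):.2f} ps, worst = {max(t_err):.2f} ps")
```

Note that the rms of the per-level errors is in general not equal to half the rms resolution plus the rms variation, so rms entries in a summary table need not satisfy eq. 6.1 exactly.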

For the amplitude, there will also be some variation over the bandwidth $A_{\text {var }}$, which is defined in the same manner as for the delay. The TTD will have different losses for all its different levels. However, amplitude error $A_{\text {err }}$ does not depend on the amplitude deviation between these levels since a tunable attenuator is used to correct for this. The amplitude error is instead dependent on the resolution of the tunable attenuator $A_{\text {res }}$. But since a tunable attenuator is not designed in this report, a reasonable amplitude resolution of a tunable attenuator is assumed in order to estimate the cancellation. The maximum amplitude error found over the BW also depends on the amplitude variation, and the total amplitude error is therefore found as

$$
\begin{equation*}
A_{\text {err }}=A_{\text {res }}+A_{\text {var }} \tag{6.2}
\end{equation*}
$$

### 6.1.1 Results of the Passive Binary Weighted TTD

The entire passive TTD, consisting of the binary weighted delay chain and the $L R-R L$ fine tuning filter, was simulated for all tuning levels. This TTD circuit is tuned using 10 bits in total, which corresponds to 1024 levels. All of these levels were simulated over frequency, as shown in figure 6.1. The TTD covers a range of delays of 62-1661 ps, which is a span of nearly $16 \lambda$. A sample range of $800-850 \mathrm{ps}$ is shown in figure 6.2 in order to showcase the difference between each level. It is seen that the separation between the levels, the delay resolution $T d_{r e s}$, differs over the delay span. The delay also differs over frequency, with increasing delay for higher frequency. This is because the phase shift of the lumped $L C$ transmission lines and the $L R-R L$ lattice filter increases with frequency faster than the period time decreases with frequency.


Figure 6.1: Delay for the passive TTD circuit for all levels vs frequency


Figure 6.2: Zoomed in view of delay vs frequency for the passive TTD circuit

The delay as a function of level shows that the delay tuning is very linear when shown for all levels in figure 6.3. However, a closer look in figure 6.4 shows that the delay does not change entirely linearly.


Figure 6.3: Delay vs level at 10 GHz for the passive TTD circuit


Figure 6.4: Zoomed in view of delay vs level at 10 GHz for the passive TTD circuit

The delay error is calculated and plotted for all levels along with the root mean square (rms) in figure 6.5. The maximum delay error is 3.20 ps and the rms is 1.77 ps. A closer look is shown in figure 6.6, where it is seen that the delay error fluctuates in a repeating pattern. The more widely spaced peaks are 32 levels apart and are therefore likely caused by the fifth smallest delay block $\left(2^{5}=32\right)$, which is the $\lambda / 4$-delay block. In addition, the delay error repeatedly increases gradually for about 8 levels before quickly decreasing at the next level. This indicates that it is caused by the $L R-R L$ lattice fine tuning, since it has 3 bits of tuning corresponding to 8 levels. The cause for this is likely the non-linear behavior of the fine tuning, as noted in the design.


Figure 6.5: Delay error for the passive TTD circuit at 10 GHz vs level with the rms of all levels


Figure 6.6: Zoomed in view of delay error vs level at 10 GHz for the passive TTD circuit

The main results concerning the delay of the passive TTD circuit are summarized in table 6.1.

Table 6.1: Summary of the delay results for the passive TTD

|  | $T_{d, \text{res}}$ | $T_{d, \text{var}}$ | $T_{d, \text{err}}$ | Delay Range [Min/Max] |
| :--- | :--- | :--- | :--- | :---: |
| Rms [ps] | 1.76 | 0.95 | 1.77 | $62.3 \mathrm{ps} / 1661.3 \mathrm{ps}$ |
| Worst [ps] | 4.13 | 1.38 | 3.20 |  |

The signal should ideally have a constant amplitude after passing through the delay circuit, i.e. not have any amplitude error. For the passive TTD, the loss is different for different levels and also varies over frequency as shown in figure 6.7.


Figure 6.7: Loss for all the levels vs frequency for the passive TTD circuit

The loss at 10 GHz as a function of level is shown in figure 6.8, where it is seen that the loss fluctuates around the rms value of 39.17 dB. Figure 6.9 zooms in on levels 100-200. The loss peaks above 43 dB for some levels, with a maximum loss of 43.23 dB.


Figure 6.8: Loss for the passive TTD circuit vs level with the rms of all levels


Figure 6.9: Zoomed in view of Loss vs level for the passive TTD circuit

The amplitude variation is plotted for all levels along with the rms in figure 6.10 and zoomed in on levels 100-200 in figure 6.11. The amplitude variation peaks at around 0.4 dB for some levels, with a maximum of 0.46 dB. The rms is, however, only half of that at 0.23 dB.


Figure 6.10: Amplitude variation for the passive TTD circuit vs level with the rms of all levels


Figure 6.11: Zoomed in view of amplitude variation vs level for the passive TTD circuit

The main results concerning the amplitude of the passive TTD circuit are summarized in table 6.2. If an antenna isolation of about 60 dB is assumed, the loss of the passive TTD circuit is on average still about 20 dB below that, which means that about 20 dB of attenuation must be provided by the tunable attenuator in the SIC path. The span between the maximum and minimum loss is $43.23 - 36.77 = 6.46$ dB, which is therefore the tuning range needed for the tunable attenuator.

Table 6.2: Summary of the amplitude results for the passive TTD

|  | $A_{\text {var }}$ | Loss [Min/Max] | Loss [Rms] |
| :--- | :---: | :---: | :---: |
| Rms [dB] | 0.23 | $36.77 / 43.23$ | 39.17 |
| Worst [dB] | 0.46 |  |  |
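The attenuator budget discussed above is plain arithmetic on the simulated loss figures. A minimal sketch in Python, using the values from table 6.2 and the assumed antenna isolation of 60 dB; the function name `attenuator_budget` is illustrative only and not part of the design flow:

```python
def attenuator_budget(isolation_db, loss_rms_db, loss_min_db, loss_max_db):
    """Return (nominal attenuation, tuning span) in dB for the tunable attenuator.

    Assumption: the SIC path must match the antenna isolation, so the
    attenuator supplies whatever the TTD loss does not already provide.
    """
    nominal_db = isolation_db - loss_rms_db  # average attenuation still needed
    span_db = loss_max_db - loss_min_db      # range needed to level out all delay levels
    return nominal_db, span_db


# Values from table 6.2 and the 60 dB isolation assumed in the text
nominal, span = attenuator_budget(60.0, 39.17, 36.77, 43.23)
print(f"nominal attenuation: {nominal:.2f} dB, tuning span: {span:.2f} dB")
```

The nominal value corresponds to the "about 20 dB" quoted in the text, and the span to the 6.46 dB tuning range.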

The linearity of the passive TTD mostly depends on the first and largest binary weighted delay block, since the signal power is highest there. Figures 6.12 and 6.13 show the third-order intercept point (IP3) for a level with the $8 \lambda$ delay block turned on and turned off, respectively. The OIP3 is significantly lower for the level with the $8 \lambda$ delay block turned off. This is caused by the bypass switch in the $8 \lambda$ delay block, which has to handle a much higher signal power than the rest of the delay chain. Since the bypass path was designed to have as much attenuation as the delay path, the transistors were made quite small in order to produce this amount of loss. These small transistors therefore cause considerable non-linearity. When the $8 \lambda$ delay block is instead turned on, the linearity is much better because no series switches are used in the delay path. In future work, the linearity could be further improved by creating the attenuation of the bypass path in some other way, thereby avoiding the non-linearity of the transistors.


Figure 6.12: IP3 for the passive TTD circuit with the $8 \lambda$-delay block turned on


Figure 6.13: IP3 for the passive TTD circuit with the $8 \lambda$-delay block turned off

The simulated noise figure and loss of the passive TTD are shown in figure 6.14 for the same two levels as for the linearity, with the $8 \lambda$ delay block turned on and turned off. The noise is slightly higher for the OFF-state, which is explained by the series switch in the bypass path. The noise appears very high in both cases, with a noise figure of almost 40 dB. This is due to the loss of the TTD circuit, which in theory is always smaller than or equal to the noise figure according to eq. 3.1. However, the loss is higher than the noise figure for both levels, which does not conform with theory. This is explained by mismatches in the TTD causing return loss, which makes it possible for the loss to be slightly larger than the noise figure. Since the noise figure is about the same as the loss, the noise figure of the passive TTD is due only to resistive losses, and the only noise injected into the Rx is thermal noise.


Figure 6.14: Noise figure and loss for the passive TTD circuit with the $8 \lambda$ delay block turned on and turned off

A summary of the linearity and noise performance of the passive TTD is shown in table 6.3.

Table 6.3: Summary of linearity and noise for the passive TTD

|  | $8 \lambda$ ON | $8 \lambda$ OFF |
| :--- | :--- | :--- |
| IIP3 [dBm] | 45.5 | 26.1 |
| OIP3 [dBm] | 4.7 | -14.4 |
| NF [dB] | 39.4 | 39.9 |

### 6.1.2 Results of the TI N-path TTD

To reduce the simulation time, the results for the TI N-path are structured into two parts. First, the coarse tuning mechanism is stepped through to capture the delay and amplitude variation across the full delay range. Second, a fine tuning sweep with fixed coarse tuning settings is carried out to calculate the delay resolution. To further reduce the simulation time, the LC tank and the output buffer input capacitance were removed during the delay simulation, which significantly reduced the settling time of the initial Periodic-Steady-State (PSS) transient.

The coarse delay tuning across the full delay range is presented in figure 6.15. In this part, the stepping is based on reprogramming the TI clock and the output reconstruction clock, which results in a final coarse tuning step of $\tau_{1}=25 \mathrm{ps}$. The clock programming starts with the TI clock $\Psi_{d}$, which determines the coarsest setting in steps of 200 ps. Then, the output reconstruction clock $\Phi_{d}$ sets the next coarse setting in steps of 25 ps. Along with the delay, the simulated gain across the full delay range is presented in figure 6.16. There is an increase in amplitude when the lowest TI clocks are chosen, which increases the overall span of the loss.


Figure 6.15: Full delay range of the N-path using coarse tuning. The $\Psi_{d}$ clock sets coarse tuning in steps of 200 ps and clock programming sets medium tuning in steps of 25 ps.


Figure 6.16: Amplitude of the output signal for the coarse tuning settings. Note the discrepancy of the lowest TI clock $\left(\Psi_{d}\right)$ highlighted in the figure.

In figure 6.17 the delay at 10 GHz is plotted against all the setting levels. It is seen that the full range, from 22 ps to 1772 ps, is covered in steps of 25 ps using the coarse tuning. The 3 ps discrepancy in the minimum and maximum delay is attributed to small clock offsets, which are introduced to ensure that the clocks do not overlap. The delay variation for each of the coarse tuning levels is presented in figure 6.18, which shows that the rms variation is 0.98 ps with a maximum value of 2 ps.


Figure 6.17: Delay at 10 GHz for the TI N-path with all the coarse tuning levels.


Figure 6.18: Delay variation over the 100 MHz bandwidth at 10 GHz for all coarse tuning levels

The gain across all coarse tuning stages is summarized in figure 6.19, along with the amplitude variation over the 100 MHz bandwidth in figure 6.20. Since the maximum gain is -0.33 dB and the minimum gain is -4.86 dB, the tunable attenuator should have a tuning span of 4.53 dB in order to level out the amplitude deviation. The impact of the amplitude increase for the lowest $\Psi_{d}$ clock setting is noticeable in these figures, and it is the main contributor to the large gain span.


Figure 6.19: Gain for all coarse tuning levels


Figure 6.20: Amplitude variation of the output signal for all coarse tuning levels.

Fine tuning of the delay can be achieved by introducing a small delay to the output reconstruction clock $\Phi_{d}$ of the TI N-path. This is shown in figure 6.21, in which the clock is stepped through a 10 ps span in 1 ps increments, centered at the coarse setting of 900 ps (the fourth TI setting and the fourth reconstruction clock setting). In figure 6.22, the delay at 10 GHz is plotted for each level.


Figure 6.21: Showcase of the fine tuning mechanism in which a delay is applied to the output reconstruction clock to tune the delay achieved by the TI N-path.


Figure 6.22: Delay at 10 GHz for each step in the fine tuning example.

In figure 6.23 it is seen that the 1 ps increment introduced in the output clock translates directly into the delay resolution of the whole TI N-path. In other words, the tunability is determined by the generation of the N-path input sampling and output reconstruction clocks.


Figure 6.23: Simulated delay resolution of the TI N-path for 1 ps tuning increment.
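The three tuning mechanisms described above (200 ps steps from the TI clock $\Psi_{d}$, 25 ps steps from the reconstruction clock $\Phi_{d}$, and a fine offset on $\Phi_{d}$) can be illustrated with a small idealized model. The sketch below is an assumption-laden illustration, not the clock generator used in the simulations: the function names are hypothetical, the settings are counted from zero, the fine offset is taken as non-negative, and the few-picosecond clock offsets seen in the simulated delays are ignored.

```python
PSI_STEP_PS = 200  # TI clock step (coarsest setting), from the text
PHI_STEP_PS = 25   # output reconstruction clock step, from the text


def program_delay(target_ps, fine_step_ps=1):
    """Split a nominal target delay into (n_psi, n_phi, n_fine) settings.

    Idealized model: delay = n_psi*200 ps + n_phi*25 ps + n_fine*fine_step.
    """
    if target_ps < 0:
        raise ValueError("target delay must be non-negative")
    n_psi, remainder = divmod(target_ps, PSI_STEP_PS)
    n_phi, remainder = divmod(remainder, PHI_STEP_PS)
    n_fine = round(remainder / fine_step_ps)
    return int(n_psi), int(n_phi), n_fine


def nominal_delay_ps(n_psi, n_phi, n_fine, fine_step_ps=1):
    """Nominal delay implied by a set of clock settings."""
    return n_psi * PSI_STEP_PS + n_phi * PHI_STEP_PS + n_fine * fine_step_ps


print(program_delay(900))   # coarse setting used in the fine tuning example
print(program_delay(1337))  # arbitrary target inside the tuning range
```

In this model the 900 ps setting of the fine tuning example maps to index 4 on both clocks, since $4 \cdot 200 + 4 \cdot 25 = 900$ ps.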

The delay and loss characteristics of the TI N-path are summarized in tables 6.4 and 6.5.

Table 6.4: Summary of the delay results for TI N-path. The resolution was calculated from the fine tuning simulations and the variation was calculated across the entire delay range using the coarse tuning steps.

|  | $T_{d, \text{res}}$ | $T_{d, \text{var}}$ | $T_{d, \text{err}}$ | Delay Range [Min/Max] |
| :---: | :--- | :--- | :--- | :---: |
| Rms [ps] | 1.00 | 0.98 | 1.49 | $22.4 \mathrm{ps} / 1772.0 \mathrm{ps}$ |
| Max [ps] | 1.00 | 2.01 | 2.01 |  |

Table 6.5: Summary of the output amplitude of the TI N-path across the entire tuning range.

|  | $A_{\text {var }}$ | Gain [Min/Max] | Gain [Rms] |
| :---: | :---: | :---: | :---: |
| Rms [dB] | 0.05 | $-4.86 \mathrm{~dB} /-0.33 \mathrm{~dB}$ | -4.00 dB |
| Max [dB] | 0.16 |  |  |

The 1 dB compression point of the TI N-path was simulated using PSS and is presented in figure 6.24, both with and without the LC resonator. The OIP3 of the input and output buffers was set to 99 dBm. It is seen that when the resonator is not used, the compression curve is shifted up and the 1 dB compression point increases from -13.62 dBm to -9.07 dBm, since the passive gain added by the resonator is no longer present. Further, the noise figure of the circuit is presented in figure 6.25, which shows a periodic dependency on frequency. For this simulation, the noise figures of the input and output buffers were both set to 3 dB. A minimum is seen around 10 GHz, and the integrated noise figure over the 100 MHz bandwidth is calculated to be 4.06 dB without the LC resonator. When the resonator is added, the noise figure is degraded to 6.35 dB. Note that both the compression and the noise figure were simulated for one delay setting in the middle of the 1800 ps delay range.


Figure 6.24: Simulated large signal compression curve of the output power for the TI N-path. The black marker marks the 1 dB compression point at -9.07 dBm when the LC resonator is not implemented.


Figure 6.25: Simulated noise figure for the TI N-path with and without the LC resonator at the output. The LC tank increases the integrated NF from 4.06 dB to 6.35 dB in the bandwidth of 9.95-10.05 GHz.

### 6.2 Showcasing high delay with Branched Double TI N-path

The simulation results for the branched double TI N-path are presented in a different manner from those of the single TI N-path, as a result of complications with the PSS and PAC simulations. Apart from being time consuming, these periodic steady-state simulations are notoriously difficult to run with low frequency clocks operating orders of magnitude below the input RF frequency, due to the excessive number of harmonics needed. Instead, the simulations in this part focus on showcasing the clock programming and the total delay achieved.

In figure 6.26 a transient simulation is shown where the input is fed with a 10 GHz single tone with a delay of 25 ns. The measured delays at the outputs of branch one (in red) and branch two (in blue) are 5600 ps and 6150 ps respectively. Note that the amplitude rise time during the first waveform periods is due to the charging of the LC resonator.


Figure 6.26: Transient simulation of the Branched TI N-path. The two outputs are shown to achieve different delays of 5600 ps and 6150 ps respectively.

Continuing with the clock programming, the TI2 release clock was programmed to cycle 6 times ($5 \tau_{3}$ to $6 \tau_{3}$, and +400 ps in clock offset), which gives a span of 5.4 ns to 6.4 ns for the programmed delay. Within this span, the tuning of the separate TI1 clocks and the reconstruction stage determines the exact delay for each branch, as shown in figure 6.27.


Figure 6.27: Clock timing used to achieve the delay presented in figure 6.26.

### 6.3 Estimated Performance of the Combined Hybrid SIC

In this section the entire hybrid SIC, which consists of both the passive binary weighted pre-LNA SIC and the active TI N-path post-LNA SIC, is benchmarked. The results for the TTD circuits used for these SICs were presented in the previous section. However, in a real implementation of an SIC, a tunable attenuator is required to match the amplitude level of the SIC signal to the amplitude level of the SI. Each injection path provides a cancellation that depends on both the delay error and the amplitude error, as shown in figure 3.9. The amplitude error depends on the resolution of the tunable attenuator. Thus, in this chapter an amplitude resolution of 0.1 dB is assumed when estimating the cancellation.

To calculate the theoretical cancellation, the delay error and amplitude variation across the bandwidth are simulated for one delay level. Levels with values close to the rms are chosen in order to be representative of the entire delay range. The total amplitude error is given by adding the amplitude variation to the amplitude resolution of the variable attenuator. Once the amplitude and delay errors have been determined, the cancellation is simulated for 11 frequencies across the bandwidth. These results are shown in figure 6.28 for both the passive binary weighted pre-LNA SIC and the TI N-path post-LNA SIC.


Figure 6.28: Estimated cancellation across the entire 100 MHz BW for the passive pre-LNA SIC, the TI N-path post-LNA SIC using 1 ps and 2 ps of clock resolution and the combined pre- and post-LNA SIC using 1 ps of clock resolution
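To make the dependence of the cancellation on delay and amplitude error concrete, the sketch below evaluates the textbook two-path subtraction model, in which the residual after cancellation is $\left|1-a e^{-j 2 \pi f \Delta \tau}\right|$ for an amplitude ratio $a$ and a delay error $\Delta \tau$. This is an assumed model used for illustration and not necessarily the exact procedure behind figure 3.9 or figure 6.28. The function names are hypothetical, and the band value is defined here by averaging the residual power over the frequency points.

```python
import cmath
import math


def residual_amplitude(delay_error_ps, amp_error_db, freq_hz):
    """Residual |1 - a*exp(-j*2*pi*f*dtau)| after subtracting the SIC signal."""
    phase = 2 * math.pi * freq_hz * delay_error_ps * 1e-12
    ratio = 10 ** (amp_error_db / 20)
    return abs(1 - ratio * cmath.exp(-1j * phase))


def cancellation_db(delay_error_ps, amp_error_db, freq_hz):
    """Cancellation in dB at a single frequency (infinite for a perfect match)."""
    residual = residual_amplitude(delay_error_ps, amp_error_db, freq_hz)
    return math.inf if residual == 0 else -20 * math.log10(residual)


def band_cancellation_db(delay_errors_ps, amp_errors_db, freqs_hz):
    """Cancellation over a band, from the mean residual power (assumed definition)."""
    powers = [residual_amplitude(d, a, f) ** 2
              for d, a, f in zip(delay_errors_ps, amp_errors_db, freqs_hz)]
    return -10 * math.log10(sum(powers) / len(powers))


# 11 frequency points across the 100 MHz bandwidth around 10 GHz
freqs = [9.95e9 + i * 10e6 for i in range(11)]
# Illustrative constant errors: rms delay error of the passive TTD and
# rms amplitude variation plus the assumed 0.1 dB attenuator resolution
print(band_cancellation_db([1.77] * 11, [0.23 + 0.1] * 11, freqs))
```

In the thesis the delay error and amplitude variation are simulated per frequency, so the constant inputs used above are a simplification.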

The rms cancellation over the 100 MHz BW is estimated as 18.64 dB for the passive pre-LNA SIC and 19.53 dB for the active post-LNA SIC. If these two SICs were to target the same DP SI, with a delay of up to 1661 ps, the total rms cancellation would theoretically be 38.17 dB (18.64 dB + 19.53 dB). This is more than the requirement in section 3.5. However, in the simulations, the clocks of the TI N-path are implemented as ideal pulsed voltage sources, which have perfect timing and zero time tolerances. For real applications, where the frequency generation circuits are implemented, clock mismatches will increase the delay error, which in turn decreases the SI suppression. For example, if a resolution of 2 ps is assumed for the clocks, the total cancellation would be reduced by 4.1 dB, as shown in figure 6.28.

The maximum delay error of the passive TTD was found to peak at 3.20 ps, which seemed to occur for a few specific delay levels. This indicates that the maximum delay error could be further decreased by fine tuning some of the delay blocks, which in turn would increase the minimum cancellation. This way the cancellation would be more consistent over the entire delay span, such that no bad delay levels would exist.

By adding multiple parallel TTD circuits in the passive pre-LNA SIC, centered at different frequencies over the BW, the rms cancellation over the BW could be further increased.

The simulated NF for each TTD was used to calculate the noise power injection for both pre- and post-LNA. This was used together with the Rx specifications to calculate the total NF degradation for the entire Rx chain according to appendix A.1. The additional NF of the injection points and the total NF degradation are shown in table 6.6.

Table 6.6: Noise figure for the injection points and the total noise figure degradation of the Rx chain

| NF of pre-LNA inj. | NF of post-LNA inj. | Total NF deg. |
| :--- | :--- | :--- |
| 3.01 dB | 0.02 dB | 1.52 dB |

Injection of non-linear distortion from the SIC was targeted to fall 15 dB below the noise floor of the injection point in order to minimize the increase of the noise floor. The OIP3 simulated for each TTD circuit is used to calculate the IMD3 at the injection points using eq. A.3, with the specifications of a Tx power of 23 dBm and an antenna isolation of 60 dB. It is found that the IMD3 levels at the injection points are roughly -30 dBm and -67 dBm for the pre- and post-LNA SIC respectively. Both power levels are significantly higher than the target of 15 dB below the noise floor.

The linearity of the passive binary weighted TTD could be improved by designing the bypass switches for higher Tx power. In this design the bypass switches were made to match the signal loss of the delay paths by decreasing the gate width of the transistors. By attenuating the signal through the bypass path in some other way, so that the signal loss over the transistors is lower, non-linear distortion could be avoided.

One of the targets for the hybrid SIC is achieving the high delay range that is required to suppress RP SI (10s of $\lambda$). Due to time limitations, the single TI N-path was only designed for a maximum delay of 1800 ps. It is expected that a scaled up version with longer delays, targeting RP SI, would achieve similar cancellation over the bandwidth. However, this is not verified by simulations. Furthermore, to achieve wideband cancellation of the RP SI, several delays would be required due to clusters of multi-surface reflections. This, in addition to a high delay range, can be achieved using the Branched Double TI N-path. This was showcased in section 6.2, where two branches were created, achieving delays of 5600 ps and 6150 ps respectively. The full design and simulation of a double TI N-path utilizing several branches would require further work. In such a project, one design aspect is the complex clock generation structure and the resulting spurs from the interleaving stages. First and foremost, a better way of determining and analysing the spurious frequencies of a double TI circuit should be created. Further, a careful choice of the number of stages and of the clock frequency plan could avoid, or at least reduce, the creation of high power in-band spurs.

## Summary and Conclusion

In this thesis the authors investigate TTD generation for RF SIC to be used within FD systems operating at 10 GHz. Analysis of the interference paths, noise injection and linearity requirements motivates the proposed hybrid solution, which contains a passive pre-LNA SIC and an active post-LNA SIC. For use at pre-LNA injection, a passive binary weighted topology was designed using lumped LC transmission line filters and an $L R-R L$ lattice filter with tunable inductors to target the DP SI with a delay of up to $1661 \mathrm{ps}(>15 \lambda)$. The passive binary weighted TTD pre-LNA SIC achieved an rms cancellation of 18.64 dB and the active TI N-path TTD post-LNA SIC achieved an rms cancellation of 19.53 dB. To target the longer delayed RP SI signals, the N-path circuit was investigated for post-LNA injection. It was found that introducing TI techniques vastly increased the delay capabilities of the circuit, and a TI N-path achieving a delay range of 22 ps to 1772 ps was showcased. Further, a branching structure of the TI N-path was used to enable the generation of multiple long delays at a reduced chip-area. Such a circuit could be used to cancel complex multi-surface reflections. Although time restrictions limited the size of the designed TI N-path, the straightforward scalability of the circuit makes it a good candidate for the creation of longer delays.

Together, the entire proposed hybrid SIC was estimated to provide a cancellation of 38.17 dB for the DP SI. An overall noise figure degradation of 1.52 dB was estimated, which is reasonable compared to other state-of-the-art RF SICs. However, neither of the circuits met the stringent linearity required for a Tx power of 23 dBm. Thus, it is recommended that future work focus on increasing the linearity of both circuits to enable higher Tx output power.

## Bibliography

1. Ericsson Mobility Report tech. rep. (Ericsson AB, November 2023).
2. Auktioner, 3.5 GHz -bandet tech. rep. (Post- och Telestyrelsen, 2022).
3. Sabharwal, A. et al. In-Band Full-Duplex Wireless: Challenges and Opportunities. IEEE Journal on Selected Areas in Communications 32, 1637-1652 (2014).
4. Ahmed, E., Eltawil, A. M. \& Sabharwal, A. Rate Gain Region and Design Tradeoffs for Full-Duplex Wireless Communications. IEEE Transactions on Wireless Communications 12, 3556-3565 (2013).
5. Kolodziej, K. E., Perry, B. T. \& Herd, J. S. In-Band Full-Duplex Technology: Techniques and Systems Survey. IEEE Transactions on Microwave Theory and Techniques 67, 3025-3041 (2019).
6. Zhang, T., Su, C., Najafi, A. \& Rudell, J. C. Wideband Dual-Injection Path Self-Interference Cancellation Architecture for Full-Duplex Transceivers. IEEE Journal of Solid-State Circuits 53, 1563-1576 (2018).
7. Zhang, Z., Long, K., Vasilakos, A. V. \& Hanzo, L. Full-Duplex Wireless Communications: Challenges, Solutions, and Future Research Directions. Proceedings of the IEEE 104, 1369-1409 (2016).
8. Khaledian, S., Farzami, F., Smida, B. \& Erricolo, D. Inherent self-interference cancellation at 900 MHz for in-band full-duplex applications, 1-4 (2018).
9. Laughlin, L., Zhang, C., Beach, M. A., Morris, K. A. \& Haine, J. L. Passive and Active Electrical Balance Duplexers. IEEE Transactions on Circuits and Systems II: Express Briefs 63, 94-98 (2016).
10. Everett, E., Sahai, A. \& Sabharwal, A. Passive Self-Interference Suppression for Full-Duplex Infrastructure Nodes. IEEE Transactions on Wireless Communications 13, 680-694 (2014).
11. Nawaz, H. \& Tekin, I. Three dual polarized 2.4 GHz microstrip patch antennas for active antenna and in-band full duplex applications, 1-4 (2016).
12. Snow, T., Fulton, C. \& Chappell, W. J. Multi-antenna near field cancellation duplexing for concurrent transmit and receive, 1-4 (2011).
13. Krishnaswamy, H. et al. Full-duplex in a hand-held device - From fundamental physics to complex integrated circuits, systems and networks: An overview of the Columbia FlexICoN project, 1563-1567 (2016).
14. Yang, X. \& Babakhani, A. A single-chip in-band full-duplex low-IF transceiver with self-interference cancellation, 1-4 (2016).
15. Radunovic, B. et al. Rethinking Indoor Wireless Mesh Design: Low Power, Low Frequency, Full-Duplex, 1-6 (2010).
16. Van den Broek, D.-J., Klumperink, E. A. M. \& Nauta, B. 19.2 A self-interference-cancelling receiver for in-band full-duplex wireless with low distortion under cancellation of strong TX leakage, 1-3 (2015).
17. Bojja-Venkatakrishnan, S., Alwan, E. A. \& Volakis, J. L. Wideband RF and analog self-interference cancellation filter for simultaneous transmit and receive system, 933-934 (2017).
18. Kiayani, A., Anttila, L. \& Valkama, M. Active RF cancellation of nonlinear TX leakage in FDD transceivers, 689-693 (2016).
19. Adams, M. \& Bhargava, V. K. Use of the Recursive Least Squares Filter for Self Interference Channel Estimation in 2016 IEEE 84th Vehicular Technology Conference (VTC-Fall) (2016), 1-4.
20. Anttila, L., Korpi, D., Syrjälä, V. \& Valkama, M. Cancellation of power amplifier induced nonlinear self-interference in full-duplex transceivers, 1193-1198 (2013).
21. Krishnaswamy, H. \& Zhang, L. Analog and RF Interference Mitigation for Integrated MIMO Receiver Arrays. Proceedings of the IEEE 104, 561-575 (2016).
22. Katanbaf, M., Chu, K.-D., Zhang, T., Su, C. \& Rudell, J. C. Two-Way Traffic Ahead: RF/Analog Self-Interference Cancellation Techniques and the Challenges for Future Integrated Full-Duplex Transceivers. IEEE Microwave Magazine 20, 22-35 (2019).
23. Zhang, T., Najafi, A., Su, C. \& Rudell, J. C. 18.1 A 1.7-to-2.2GHz full-duplex transceiver system with $>50 \mathrm{~dB}$ self-interference cancellation over 42MHz bandwidth, 314-315 (2017).
24. Dastjerdi, M. B., Jain, S., Reiskarimian, N., Natarajan, A. \& Krishnaswamy, H. 28.6 Full-Duplex $2 \times 2$ MIMO Circulator-Receiver with High TX Power Handling Exploiting MIMO RF and Shared-Delay Baseband Self-Interference Cancellation, 448-450 (2019).
25. Nagulu, A. et al. 6.6 Full-Duplex Receiver with Wideband Multi-Domain FIR Cancellation Based on Stacked-Capacitor, N-Path Switched-Capacitor Delay Lines Achieving $>54 \mathrm{~dB}$ SIC Across 80 MHz BW and $>15 \mathrm{dBm}$ TX Power-Handling. 64, 100-102 (2021).
26. Nagulu, A. et al. A Full-Duplex Receiver With True-Time-Delay Cancelers Based on Switched-Capacitor-Networks Operating Beyond the Delay-Bandwidth Limit. IEEE Journal of Solid-State Circuits 56, 1398-1411 (2021).
27. Friis, H. Noise Figures of Radio Receivers. Proceedings of the IRE 32, 419-422 (1944).
28. Forbes, T. et al. A $0.2-2 \mathrm{GHz}$ Time-Interleaved Multistage Switched-Capacitor Delay Element Achieving 2.55-448.6 ns Programmable Delay Range and $330 \mathrm{~ns} / \mathrm{mm}^{2}$ Area Efficiency. IEEE Journal of Solid-State Circuits 58, 2349-2359 (2023).
29. Zobel, O. US patent 1792523. https://patents.google.com/patent/US1792523A/en (1931).
30. Webb, K. Second-Order Filters, ENGR 202 - Electrical Fundamentals II tech. rep. (Oregon State University).
31. Zolkov, E. \& Cohen, E. A 0.2-3-GHz N-Path True Time Delay Circuit Achieving $<1 \%$ Delay Variation Over Frequency. IEEE Transactions on Microwave Theory and Techniques 70, 3224-3233 (2022).
32. Manganaro, G. \& Robertson, D. Interleaving ADCs: Unraveling the Mysteries. Analog Dialogue 49 (2015).
33. Garakoui, S. K., Klumperink, E. A. M., Nauta, B. \& van Vliet, F. E. Compact Cascadable $g_{m}$-C All-Pass True Time Delay Cell With Reduced Delay Variation Over Frequency. IEEE Journal of Solid-State Circuits 50, 693-703 (2015).
34. Christensen, R. 039: Allpass Systems: Phase and Time Delay tech. rep. (Acculution, 2017). https://www.acculution.com/single-post/039-allpass-phase-and-time-delay.
35. Ghione, G. \& Pirola, M. Microwave Electronics (Cambridge University Press, 2017).
36. Karki, J. Calculating noise figure and third-order intercept in ADCs in (2005). https://api.semanticscholar.org/CorpusID:15277998.
37. Razavi, B. RF Microelectronics (2nd Edition) (Prentice Hall Communications Engineering and Emerging Technologies Series) 2nd. ISBN: 0137134738 (Prentice Hall Press, USA, 2011).

## Appendix

## A.1 RF SIC Block Level Calculations

In this subsection, the method by which the $R x$ performance was calculated is explained, starting with the noise floor equation. Thermal noise distributed over the bandwidth of the channel sets a noise floor power level at the Rx input. With the thermal noise density expressed in $\mathrm{dBm} / \mathrm{MHz}$, the noise floor is given as [35]:

$$
\begin{equation*}
N_{t h}=10 \log \left(\frac{k T \cdot 1 \mathrm{MHz}}{1 \mathrm{~mW}}\right)+10 \log (B W[\mathrm{MHz}])=-113.8 \mathrm{dBm}+10 \log (B W[\mathrm{MHz}]) \tag{A.1}
\end{equation*}
$$

where $k$ is the Boltzmann constant in $\mathrm{m}^{2} \mathrm{~kg} \mathrm{~s}^{-2} \mathrm{~K}^{-1}$, $T$ is the temperature (assumed to be 300 K) and $B W$ is the channel bandwidth given in MHz.
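As a numerical check of equation A.1, the following minimal Python sketch evaluates the thermal noise floor for a given bandwidth. The function name is illustrative, and the temperature default of 300 K follows the assumption stated above.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant in J/K


def thermal_noise_floor_dbm(bw_mhz, temp_k=300.0):
    """Thermal noise floor in dBm over a bandwidth given in MHz (eq. A.1)."""
    # Noise power in 1 MHz, referred to 1 mW
    density_dbm_per_mhz = 10 * math.log10(BOLTZMANN_J_PER_K * temp_k * 1e6 / 1e-3)
    return density_dbm_per_mhz + 10 * math.log10(bw_mhz)


print(f"{thermal_noise_floor_dbm(1):.1f} dBm in 1 MHz")
print(f"{thermal_noise_floor_dbm(100):.1f} dBm in 100 MHz")
```

For the 100 MHz bandwidth used in this thesis, the bandwidth term adds 20 dB to the 1 MHz value.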

In terms of non-linear components, this analysis focuses on the third-order intermodulation product $I M D_{3}$. Assume a stage with a third-order output referred intercept point $O I P 3_{\text {stage }}$ that sees two tones at the input, each with a power equal to $P_{I N} / 2$. The intermodulation products $I M D_{3}$ created by this stage are related to the intercept point by the following expression [36]:

$$
\begin{equation*}
O I P 3_{\text {stage }}[\mathrm{dBm}]=\left(P_{I N}[\mathrm{dBm}]-3 \mathrm{~dB}\right)-\frac{I M D_{3}[\mathrm{dBc}]}{2} \tag{A.2}
\end{equation*}
$$

In the equation above, the $I M D_{3}$ product is given in dBc, i.e. relative to the carrier $\left(P_{I N}\right)$. The expression can be rearranged to give the $I M D_{3}$ level in dBm instead:

$$
\begin{equation*}
I M D_{3}[\mathrm{dBm}]=P_{I N}[\mathrm{dBm}]-2\left(O I P 3_{\text {stage }}[\mathrm{dBm}]-\left(P_{I N}[\mathrm{dBm}]-3 \mathrm{~dB}\right)\right) \tag{A.3}
\end{equation*}
$$

Note that this does not take into account intermodulation products generated by earlier stages. In the final calculations, the term $G_{\text {stage }}+I M D_{3, I N}$ is added to correct for this.

Moving on, to track the noise floor in the system, it is necessary to know how much each stage raises the noise floor. If the input noise power is called $N_{I N}$ and the stage has some gain, the input noise power is simply increased by the gain of the stage at the output. The second part is the noise introduced on top of this by the stage itself. Fortunately, this is exactly the definition of the NF of the stage (given that the NF is referred to the same level as the input noise). Summarizing all of this in an expression:

$$
\begin{equation*}
N_{\text {OUT }}=N_{I N}+G_{\text {stage }}+N F_{\text {stage }} \tag{A.4}
\end{equation*}
$$

Finally, Friis' formula for cascaded stages was used to calculate the cascaded noise figure NF. For a two-block chain, the formula can be written as [37]:

$$
\begin{equation*}
N F=N F_{1}+\frac{N F_{2}-1}{G_{1}} \tag{A.5}
\end{equation*}
$$

where $G_{1}$ and $N F_{1}$ are the gain and noise figure of the first stage respectively and $N F_{2}$ is the noise figure of the second stage. All terms are in linear scale.

With equations A.3 and A.4 above, the noise power level and the $I M D_{3}$ power level are tracked throughout the system by stepping through each block, as pictured in figure A.1.


Figure A.1: Visualisation of the looped calculation method for the system calculations.

In the figure, some noise and $I M D_{3}$ power is assumed at the input, as well as some input power. Equation A.3 takes the input power and calculates the power level of the $I M D_{3}$ components generated by the stage. Note that the amplified input $I M D_{3}$ power $\left(I M D_{3, I N}+G_{\text {stage }}\right)$ is also added to the output. Equation A.4 takes the input noise power and adds the gain and the NF of the stage to obtain the raised noise floor at the output. The cascaded noise figure is also calculated using equation A.5 during the looping. In figure A.2 below, the process of calculating the non-idealities is shown.


Figure A.2: Flow chart displaying the method by which the performance degradation parameters were calculated.
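The looped calculation described above can be sketched in a few lines of Python. This is a minimal illustration of equations A.3, A.4 and A.5 applied stage by stage, not the script used to produce the results in this thesis. The function names, the stage tuple layout and the example stage values are hypothetical, and the $I M D_{3}$ carried over from earlier stages is assumed to add in power (linear units) to the newly generated $I M D_{3}$.

```python
import math


def db_to_lin(x_db):
    return 10 ** (x_db / 10)


def lin_to_db(x_lin):
    return 10 * math.log10(x_lin)


def imd3_dbm(p_in_dbm, oip3_dbm):
    """IMD3 level generated by one stage, following eq. A.3 as written."""
    return p_in_dbm - 2 * (oip3_dbm - (p_in_dbm - 3))


def cascade(stages, p_in_dbm, n_in_dbm, imd3_in_dbm=-300.0):
    """Step through stages given as (gain_db, nf_db, oip3_dbm) tuples.

    The default input IMD3 of -300 dBm stands in for "no distortion yet".
    """
    f_total, g_total = 1.0, 1.0  # cascaded noise factor and gain (linear)
    p, n, imd = p_in_dbm, n_in_dbm, imd3_in_dbm
    for gain_db, nf_db, oip3_dbm in stages:
        generated = imd3_dbm(p, oip3_dbm)  # eq. A.3
        carried = imd + gain_db            # G_stage + IMD3_IN
        imd = lin_to_db(db_to_lin(generated) + db_to_lin(carried))  # assumed power sum
        n = n + gain_db + nf_db            # eq. A.4
        f_total += (db_to_lin(nf_db) - 1) / g_total  # Friis, eq. A.5 extended per stage
        g_total *= db_to_lin(gain_db)
        p += gain_db
    return {"p_out_dbm": p, "n_out_dbm": n,
            "imd3_out_dbm": imd, "nf_total_db": lin_to_db(f_total)}


# Hypothetical two-stage chain: a 3 dB passive loss followed by an amplifier
result = cascade([(-3.0, 3.0, 30.0), (20.0, 2.0, 20.0)],
                 p_in_dbm=-10.0, n_in_dbm=-93.8)
print(result)
```

For this example the cascaded noise figure is the 3 dB loss plus the 2 dB amplifier NF, which is the expected Friis result for a passive stage ahead of an amplifier.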

