

## LUND UNIVERSITY

#### A low complexity threshold detector making MLSD decisions in a multiuser environment

Nilsson, Rickard; Sjöberg, Frank; Edfors, Ove; Ödling, Per; Eriksson, Håkan; Wilson, Sarah Kate; Börjesson, Per Ola

Published in: Proc. IEEE Vehicular Technology Conference

DOI: 10.1109/VETEC.1998.686590

1998

Link to publication

Citation for published version (APA): Nilsson, R., Sjöberg, F., Edfors, O., Ödling, P., Eriksson, H., Wilson, S. K., & Börjesson, P. O. (1998). A low complexity threshold detector making MLSD decisions in a multiuser environment. In *Proc. IEEE Vehicular Technology Conference* (Vol. 1, pp. 333-337). IEEE - Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/VETEC.1998.686590

Total number of authors:

#### General rights

Unless other specific re-use rights are stated the following general rights apply:

- Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the
- legal requirements associated with these rights

· Users may download and print one copy of any publication from the public portal for the purpose of private study or research.

- You may not further distribute the material or use it for any profit-making activity or commercial gain
   You may freely distribute the URL identifying the publication in the public portal

Read more about Creative commons licenses: https://creativecommons.org/licenses/

#### Take down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

LUND UNIVERSITY

**PO Box 117** 221 00 Lund +46 46-222 00 00

# A Low Complexity Threshold Detector Making MLSD Decisions in a Multiuser Environment

Rickard Nilsson\*, Frank Sjöberg\*, Ove Edfors<sup>†</sup>, Per Ödling\*,

Håkan Eriksson<sup>‡</sup>, Sarah Kate Wilson<sup>\*</sup>, Per Ola Börjesson<sup>\*</sup>

\*Luleå University of Technology, Div. of Signal Processing, SE\_971 87 Luleå, Sweden <sup>†</sup>Lund University, Dept. of Applied Electronics, SE-221 00 Lund, Sweden <sup>‡</sup>Ericsson Mobile Communications AB, Nya vattentornet, SE-221 83 Lund, Sweden

Abstract— In this paper we present a pipelined implementation of an iterative threshold detector, which makes the same decisions as a maximum likelihood sequence detector (MLSD) on some, but not necessarily all, bits. We apply this as a first stage detector in a direct sequence code division multiple access (DS-CDMA) interferencelimited multiuser system. In combination with the singleuser matched filter (MF) detector we obtain improved performance and lower complexity than with the decorrelating receiver for a limited number of simultaneous users, e.g. up to 25 users with a spreading factor of 127.

#### I. INTRODUCTION

The direct sequence code division multiple access (DS-CDMA) multiuser maximum-likelihood sequence detector (MLSD) [1] is known to be very complex. Therefore, suboptimal detectors exhibiting lower complexity are often proposed. Examples of these are the single-user matched filter We applied the pipelined Delta detector in combination with different second stage detectors in a DS-CDMA uplink environment using pseudo-noise (PN)-sequences with 127 chips per bit over Rayleigh fading channels. The singleuser MF receiver [2] and the successive non-decision directed interference canceller (NDDIC) [4] were both applied as the second receiver stage.

#### II. THE DELTA DETECTOR

In this section we briefly describe the detection algorithm presented in [7] and [8, 9]. Consider a binary modulated data sequence  $\mathbf{b} \in \{-1, +1\}^P$  transmitted through a known linear channel with additive Gaussian noise. In matrix notation we can represent this as

$$\mathbf{y} = \mathbf{H}\mathbf{b} + \mathbf{n},\tag{1}$$

where y is a vector containing the received sequence, H is

(MF)-[2], decorrelating-[2], minimum mean square error-[3], and interference cancellation-[4, 5, 6] detectors.

In this paper we derive a detector with both much lower complexity and improved bit error rate (BER) performance compared to the decorrelating detector.

A low-complexity iterative threshold receiver, which we refer to as the Delta detector, was derived in [7, 8, 9]. This Delta detector makes the same decisions as an MLSD on some, but not necessarily all, bits. We apply this detector to DS-CDMA multiuser detection and develop a new pipelined implementation, exhibiting even lower complexity.

The proposed implementation consists of a fixed number of equal, and independent, processing units. These processing units are coupled in series to form the first stage of the DS-CDMA multiuser detector. Since the Delta detector stage leaves some bits undetermined, a second detector stage is required. This second stage detects the remaining bits, typically using one of the above-mentioned suboptimal detectors. As with many other multiuser detectors, a drawback with the Delta detector is that it requires the knowledge of the users' spreading sequences and channels. This important issue, however, is beyond the scope of this paper. a matrix representing a channel and **n** is a white Gaussian noise vector with zero mean and covariance matrix  $\sigma^2 \mathbf{I}$ .

Let z denote the MF output, *i.e.* 

$$\mathbf{z} \triangleq \mathbf{H}^H \mathbf{y} = [z_1, \dots, z_P]^T \tag{2}$$

and define the correlation matrix

$$\mathbf{M} \triangleq \mathbf{H}^{H} \mathbf{H}.$$
 (3)

Now, MLSD decisions may be found by simply comparing the MF output  $z_k$  with two thresholds,  $\Delta_k^+$  and  $\Delta_k^-$ . The following decision rules ensure MLSD decisions:

$$\operatorname{Re}\left\{z_k\right\} > \Delta_k^+ \quad \Rightarrow \quad \widehat{b}_k = +1 \tag{4}$$

 $\operatorname{and}$ 

$$\operatorname{Re}\left\{z_k\right\} < \Delta_k^- \quad \Rightarrow \quad \widehat{b}_k = -1, \tag{5}$$

where the above thresholds depend on already detected bits and can be determined as

$$\Delta_k^+ \triangleq \sum_{\substack{i \mid i \neq k, \ \hat{b}_i \ \text{det.}}} m_{i,k} \widehat{b}_i + \sum_{\substack{i \mid i \neq k, \ \hat{b}_i \ \text{not det.}}} |m_{i,k}|, \qquad (6)$$

 $\operatorname{and}$ 

$$\Delta_k^- \triangleq \sum_{\{i \mid i \neq k, \ \hat{b}_i \ \det.\}} m_{i,k} \widehat{b}_i - \sum_{\{i \mid i \neq k, \ \hat{b}_i \ \text{not det.}\}} |m_{i,k}|, \qquad (7)$$

where  $m_{i,k}$  are the real parts of the elements in **M**.

The detection algorithm, as described in [7], is iterative and works on blocks of data of length P. If neither (4) nor (5) is satisfied, the detector fails to detect  $b_k$ . However, as any neighboring bits are detected, the thresholds tighten and new bits may be detected in a succeeding iteration. The algorithm continues to iterate until no further bits can be detected, which is the case when all bits are detected or no  $z_k$ 's corresponding to undetected bits exceed (6) or (7). The total number of detected bits and the required number of iterations depend on the correlation matrix  $\mathbf{M}$ , the transmitted data  $\mathbf{b}$ , and the received noise  $\mathbf{n}$ . However, for most cases we can use a fixed number of iterations and still detect almost all detectable bits.

#### III. PIPELINED DELTA DETECTOR

As mentioned in the introduction, the structure of the Delta detector lends itself to a pipelined implementation. In this section we pursue this idea and describe a pipelined version of the detector. The pipeline consists of N processing units, each unit performing one iteration of the Delta algorithm.

#### A. The basic processing unit

Assume that the Delta detector detects bit  $b_k$  according to (4) or (5). The thresholds should then be updated according to (6) and (7). This can be done recursively, given the previous values on the thresholds, as

$$\Delta^+ \cdot - \Delta^+ - |m_{\ell}| + \widehat{h} \cdot m_{\ell} \cdot (8)$$



Figure 1 – The basic processing unit as used in the pipelined detector.

The block performing threshold updates in Figure 1 consists of two shift registers, like the ones used for z and  $\hat{b}$ . However, the shift registers for  $\Delta^+$  and  $\Delta^-$  also require logic for performing the update. This logic is relatively simple for two reasons: Firstly, evaluation of (8) and (9) show that only one of the thresholds needs to be updated (see Table 1). Therefore, the threshold update requires only one addition per delay in the shift registers. Secondly, the detected bits' thresholds need not be updated, since they will not be used anymore. This implies a total of 2D additions

$$\Delta_l := \Delta_l - [m_{k,l}] + o_k m_{k,l} \tag{6}$$

 $\operatorname{and}$ 

$$\Delta_l^- := \Delta_l^- + |m_{k,l}| + b_k m_{k,l} \tag{9}$$

for all l. If the dependence between a detected bit and the thresholds is constrained to a certain neighborhood, *i.e.* 

$$m_{k,l} = 0$$
, when  $|k - l| > D$ , (10)

then a processing unit performing the threshold updates can have a finite constraint length, extending D bits in opposite directions. Such a processing unit is displayed in Figure 1, where the memory length is (2D + 1) bits. The processor uses ternary values  $\{-1, 0, 1\}$  for  $\hat{b}_k$ , where  $\hat{b}_k = 0$  denotes that bit k has not yet been detected.

The operation of the processing unit is as follows: If the current bit,  $\hat{b}_k$ , is already detected, *i.e.*  $\hat{b}_k \in \{+1, -1\}$ , no processing is performed. However, if the current bit is undetected, *i.e.*  $\hat{b}_k = 0$ , the MF value,  $z_k$ , is compared to the thresholds  $\Delta_k^+$  and  $\Delta_k^-$ . If detection is possible, as described by (4) and (5), then the detected value is stored in  $\hat{b}_k$ , and the thresholds are updated. If the bit cannot be detected, no further operations are performed. Finally, all sequences are shifted one position to the right and the next bit is processed.

per detected bit.

|   | Table 1. Threshold updates |           |                             |                            |
|---|----------------------------|-----------|-----------------------------|----------------------------|
|   | $\widehat{b}_{k}$          | $m_{k,l}$ | $\Delta_l^+$ update         | $\Delta_l^-$ update        |
|   | -1                         | > 0       | $-2 \left  m_{k,l} \right $ | 0                          |
| 7 | -1                         | < 0       | 0                           | $2 \left  m_{k,l} \right $ |
|   | 1                          | > 0       | 0                           | $2  m_{k,l} $              |
|   | 1                          | < 0       | $-2\left m_{k,l} ight $     | 0                          |

**m**1

### B. Pipelined detector

A pipelined detector consisting of N processing units is displayed in Figure 2. This detector takes the MF outputs,  $z_k$ , as its inputs and outputs  $\hat{b}_{k-D_{tot}}$  containing the MLSD decisions, interspersed with not detected bits indicated by zeros. The outputs from one processing unit are inputs to the next. The total memory length in the pipeline detector is N times the memory length in one processing unit, *i.e.* 

$$D_{tot} = N(2D+1).$$
 (11)

When initiating the pipeline processing, settings are required on the inputs  $\Delta_{in}^+$ ,  $\Delta_{in}^-$ , and  $\hat{b}_{in}$  to the first processing unit. Initially, no bits are detected and all  $\hat{b}_k = 0$ . This also



Figure 2 – Pipelined detector structure.

implies that the initial setting on the thresholds (4) and (5)become  $\Delta_k^+ = -\Delta_k^- = \sum_{i \neq k} |m_{i,k}|$ . After the first processing unit, one iteration of the Delta detector is complete. A bit not detected in one processing unit may be detected in the next because of tighter thresholds. After the whole pipeline, N iterations of the Delta detector are completed.

As mentioned in Section II, and in [7], we do not know beforehand the number of iterations required to exhaust the detection capacity of the Delta detector. The number of required iterations depends on the application at hand and the received signal. But for the pipelined detector the number of processing units (and hence iterations) must be fixed. However, experiments for the DS-CDMA case indicate that only five to six processing units suffice (see Section V).

#### **DS-CDMA** MULTIUSER DETECTION IV.

We have applied the pipelined Delta detector to a DS-CDMA system and evaluated it with two different second stage detectors: the single-user MF-receiver and the NDDIC receiver.

The transmitted signal from user m is



Figure 3 – Multiuser detector for DS-CDMA, using the pipelined detector.

the (channel and spreading-code) matched filtering in expression (2) can be implemented as a bank of RAKE receivers [10], one for each user. The input to the pipelined detector,  $z_k$ , is a multiplex of the soft RAKE outputs, as shown in Figure 3. The multiplexing is done sequentially from RAKE 1 to K in a repetitive manner, which determines the ordering of bits in **b** (see Appendix A).

If all channel responses are shorter than one bit interval, then the correlation

 $m_{k,l} = 0$  when |k-l| > 2K-1,

where K is the number of users. The memory length in the pipeline detector will then be  $D_{tot} = N(4K-1)$ , see Section III. The resulting system delay introduced by the detector, measured in bit intervals, will be  $D_{tot}/K$ .

Bits not detected in this first receiver stage are detected in a second stage. In this paper we apply both the single-user MF receiver and the NDDIC as the second receiver stage. Before applying the second receiver stage, we subtract the interference from the already detected bits as

$$\widetilde{\mathbf{z}} = \mathbf{z} - \mathbf{M} \widehat{\mathbf{b}}$$

where M is the correlation matrix (3) and  $\widehat{\mathbf{b}}$  is the output from the pipelined detector. For this purpose bits not detected are represented by zeros in  $\hat{\mathbf{b}}$ .

$$s_m[l] = \sum_{i=-\infty}^{\infty} c_{m,i} \left[l - iL\right] b_{m,i},$$

where L is the number of chips per bit,  $b_{m,i}$  is the transmitted bit at signalling interval i and  $c_{m,i}[l] \in \{-1, +1\}$ is the spreading sequence used with bit i. Assuming that the channel is constant during one bit interval, the received composite discrete-time signal from K simultaneous asynchronous users becomes

$$y[l] = \sum_{i=-\infty}^{\infty} \sum_{m=1}^{K} b_{m,i} \left( c_{m,i} \left[ l - iL \right] * h_{m,i} \left[ l - d_{m,i} \right] \right) + n[l],$$
(12)

where  $h_{m,i}[l]$  is the *m*th user's channel response at signalling interval  $i, d_{i,m} \in \{0, 1, \dots, L-1\}$  is the asynchronous delay, n[l] is additive white Gaussian noise, and \* denotes convolution with respect to l. The interference consists of both inter-symbol interference and multiple-access interference.

The pipelined detector can be used as a DS-CDMA multiuser detector. This can be realized by rephrasing (12) in terms of the matrix expression (1). Then the vector **b** contains the transmitted bits from all users. For DS-CDMA,

The single-user MF receiver, on the one hand, detects the remaining bits by comparing the corresponding elements in  $\tilde{\mathbf{z}}$  with a zero threshold, *i.e.* 

$$\widehat{b}_{k,MF} = \operatorname{sign}\left(\widetilde{z}_k\right), \text{ for all } \widehat{b}_k = 0.$$

The NDDIC receiver, on the other hand, detects the remaining bits by successive non decision-directed cancellation of the remaining interference in  $\tilde{z}$ , producing an output  $\hat{b}_{k,NDDIC}$  for the bits not previously detected.

Combining the pipelined detector with a single-user MF receiver gives a complete detector with very low total complexity. The multistage NDDIC detector is much less complex than the decorrelator receiver, but it is more complex than the pipelined Delta receiver. This is because the Delta algorithm requires only a certain number of additions whereas the NDDIC has to perform about the same number of multiplications. Therefore the combination of the pipelined Delta and the NDDIC gives higher computational complexity than the Delta/single-user MF combination but still lower than the NDDIC used alone.

## V. SIMULATIONS

In this section we present simulation results for the pipelined Delta receiver used in the uplink of a DS-CDMA system. We assume a Rayleigh fading environment, where log-normal fading is eliminated by power control. Using length L = 127spreading codes, chosen as pseudo-random sequences, and a chip rate of 5 MHz, we generate the transmitted signals according to (12). All transmitters are asynchronous and their respective delays have uniform distribution over the length of one bit duration. We use independent "ETSI vehicular A" channel models [11] to simulate each user's channel. In addition, we have assumed perfect channel knowledge at the receiver.

Figure 4 shows the probability of detection versus the number of users for an increasing number of processing units in the pipelined Delta detector. After only a few processing units, *i.e.* 5 to 6, most of the detectable bits are detected by the pipelined detector. Hence, by fixing the number of processing units to 5 or 6 we obtain almost the same decision rate as the basic Delta algorithm in Section II. For up to 25 users the decision rate is higher than 99% if 5 or more pipeline units are used.

Figure 5 shows the uncoded BER for different receiver combinations with the pipelined detector using 6 processing units. The pipelined Delta detector with a single-user MF as the second stage receiver gives bit error rates that are somewhat lower than the decorrelator receiver used alone, for up to 25 users. When using the multistage NDDIC detector presented in [4] (implemented with five stages) as second stage detector we obtain better performance than for all the other compared detectors and for any number of users. In our simulations the multistage NDDIC detector used alone, gave BER performance slightly better<sup>1</sup> than the decorrelator receiver and the two curves are inseparable in Figures 5,6, and 7. Figure 6 and Figure 7 show uncoded BER versus SNR for 25 and 30 users respectively. In these graphs we find that for 25 users the pipelined detector in combination with the single-user MF is slightly better than the decorrelator receiver for all SNRs. However, for 30 users the performance is worse than for the decorrelator receiver, but still better than for the single-user MF used alone. This depends on low decision rate in the Delta detector when 30 or more users are active.



Figure 4 – Decision rate for increasing number of processing units at 12 dB SNR, and 127 chips per bit.

however, the pipelined detector needs knowledge of all users' spreading sequences and channels.

By using the pipelined threshold detector and a singleuser MF receiver as second stage detector we obtain, at the same time, lower BER and lower computational complexity than both the decorrelating receiver and the successive nondecision directed interference canceller (NDDIC) [4] for up to 25 users. For any number of users, the combination of the pipelined detector with the NDDIC as the second stage receiver results in both lower complexity and lower BER than the decorrelator detector.

## VI. CONCLUSIONS

We have developed a pipelined implementation of a simple threshold detector which makes MLSD decisions on a random number of bits.

We applied this detector in a DS-CDMA environment and found that its decision rate is high for up to a limited number of active users. Like many other multi-user detectors,

<sup>&</sup>lt;sup>1</sup>In [4] it is pointed out that as the number of stages in the NDDIC increases the BER performance converges to that of the decorrelating detector. However for some unknown number of stages the performance is even better.



Figure 5 – Uncoded BER for different receiver combinations with the pipelined Delta detector. 6 processing units, 12 dB SNR, and 127 chips per bit.



Figure 6 – Uncoded BER versus SNR for 25 users and 127 chips per bit. The pipelined Delta detector has 6 processing units.

#### A MATRIX REPRESENTATION

The multiplexing described in Section IV determines the ordering of elements in **b** and **H**. We assume that the channel is constant during one bit interval. In matrix notation we denote this as in (1) for I bits per user with

$$\mathbf{b} = (b_{1,1}, \cdots, b_{K,1}, b_{1,2}, \cdots , b_{K,2}, \cdots, b_{K,I})^T$$

The elements  $hc_{m,i}(j)$ ,  $j \in \{0, 1, \dots, LI + T - 1\}$ , in the column vector  $\mathbf{hc}_{m,i}$  are calculated as

$$hc_{m,i}(j) = (c_{m,i}[l-iL] * h_{m,i}[l-d_{m,i}])|_{l=j}$$
,

where \* denotes convolution with respect to l, and T is the length of the longest channel impulse response,  $h_{m,i}$ .

## References

- S. Verdú, "Minimum probability of error for asynchronous Gaussian multiple-access channels," *IEEE Trans. Inform. Theory*, vol. 32(1), pp. 85–96, Jan. 1986.
- [2] R. Lupas and S. Verdú, "Linear multiuser detectors for synchronous code-division multiple-access channels," *IEEE Trans. Inform. Theory*, vol. 35, pp. 123–136, Jan. 1989.
- [3] U. Madhow and M. L. Honig, "MMSE interference supression for direct-sequence spread-spectrum CDMA," *IEEE Trans. Commun.*, vol. 42, pp. 3178–3188, Dec. 1994.
- [4] A.-L. Johansson, "Interference cancellation for DS/CDMA systems in flat fading channels", Licentiate thesis, Chalmers University of Technology, 1996.
- [5] P. Patel and J. Holtzman, "Analysis of a simple successive interference cancellation scheme in a DS/CDMA," *IEEE J. Select. Areas Commun.*, vol. 12, pp. 796–807, June 1994.
- [6] M. Varanasi and B. Aazhang, "Multistage detection in asynchronous code-division multiple access communications," *IEEE Trans. Commun.*, vol. 38, pp. 509–519, Apr. 1990.

and

$$\mathbf{H} = (\mathbf{h}\mathbf{c}_{1,1}, \cdots, \mathbf{h}\mathbf{c}_{K,1}, \mathbf{h}\mathbf{c}_{1,2}, \cdots, \mathbf{h}\mathbf{c}_{K,2}, \cdots, \mathbf{h}\mathbf{c}_{K,I}).$$



Figure 7 – Uncoded BER versus SNR for 30 users and 127 chips per bit. The pipelined Delta detector has 6 processing units.

- [7] P. Ödling, H. B. Eriksson, P. O. Börjesson, "Making MLSD-decisions on some individual bits in a sequence by thresholding the matched filter output", ISIT'97, p. 75, 1997.
- [8] H. B. Eriksson, Digital Block Transmission and Timeof-Flight Estimation, - Receiver Design and Performance Analysis. Doctoral thesis, Luleå University of Technology, May 1995.
- [9] P. Ödling, Design and Analysis of Digital Receivers. Doctoral thesis, Luleå University of Technology, May 1995
- [10] J. Proakis, Digital communications. Prentice-Hall, 3rd ed., 1995.
- [11] ETSI/SMG, Overall requirements on the radio interface(s) of the UMTS ETR/SMG 21.01, v.3.0.0, ETSI, Valbonne, France, 1997.