# ALTERNATIVES FOR ON-BOARD DIGITAL MULTICARRIER DEMODULATION

ENRICO DEL RE AND ROMANO FANTACCI

Dipartimento di Ingegneria Elettronica, Università di Firenze, Via S. Marta, 3 50139, Firenze, Italy

#### **SUMMARY**

In this paper alternatives for digital multicarrier demodulators (MCD) suitable for advanced digital satellite communications systems are presented. The MCD permits the direct on-board interfacing of FDMA and TDM communication links by digital signal processing techniques. Two main functions are implemented by an MCD: demultiplexing (DEMUX) and demodulation (DEMOD). We focus here only on a digital implementation of the MCD, in view of its advantages of flexibility, better performance and VLSI integrability.

The DEMUX may be implemented in a number of ways: the analytic signal method, the fast Fourier transform with polyphase network technique, or multistage methods. For all the implementation methods considered it is shown that a certain degree of integration of the DEMUX and DEMOD functions is possible. To this end, in the proposed MCD schemes the receiver pulse-shaping filter has been integrated in the DEMUX structure, reducing the overall implementation complexity. It is also shown that a highly modular and flexible implementation can be achieved for the per-channel structure based on the analytic signal method.

Coherent demodulation is used to reduce the signal-to-noise ratio required to achieve a specified bit error rate. The coherent demodulation is carried out by using the maximum likelihood (ML) estimation method. Two different approaches to receiver synchronization have been studied. For the first, the carrier phase and symbol timing estimates are independently derived by suitable techniques. The second approach makes use of the maximum a posteriori probability method to estimate both the carrier phase and symbol timing of the received signal. In particular, for this technique it is shown that, by a suitable choice of the architecture of the digital coherent receiver, the ML demodulator can be integrated in the joint carrier and clock recovery circuit, with no increase in the overall system complexity.

The digital architecture of the proposed MCD can be adapted to different digital modulation techniques. However, here we consider only the application for QPSK signals, as this modulation scheme is the most promising for digital satellite communications.

A theoretical analysis and computer simulation have been used to evaluate the performance degradation of the proposed MCD, including finite-arithmetic implementation effects.

KEY WORDS: Satellite communications; On-board processing; Multicarrier demodulators

### 1. INTRODUCTION

The evolution of satellite communications is pointing to a new generation of 'user-oriented' satellites, which will operate with earth stations much smaller and less complex than at present. Digital communication systems will play a key role in the development and establishment of the new, value-added services of the planned advanced communication networks. Digital transmission will be used in areas that traditionally have been, and still are, the domain of analogue transmission, such as radio-relay links and satellite communications. Efficient and cost-effective solutions will require new approaches and implementations for both the transmission process (e.g. new coding/modulation schemes) and the communication system architecture. For example, satellites have in the past operated with analogue modulation of the carrier, and access to the satellite was provided by frequency-division multiple access (FDMA).

The satellite simply translated the carrier frequency and retransmitted the signal in a wide beam covering a large geographical area. The new systems employ time-division multiple access (TDMA), new efficient modulation techniques, multiple-beam antennas and on-board processing for higher system efficiency. On-board signal processing offers several advantages to satellite communications. A typical and interesting feature is the separation of the up-links and down-links, which allows their separate and independent optimization. Regenerative satellites allow different modulation and multiple-access schemes to be used in the up-links and down-links; for example, up-link random access and down-link TDMA techniques can be envisaged.<sup>1</sup> Recently, mixed frequency- and time-division multiple access to the satellite has been proposed, for example for mobile or fixed communications services. In such cases the use of up-link FDMA techniques (with their low-cost earth stations) and down-link TDMA techniques (which can fully exploit the satellite transponder output power without intermodulation distortion) is an attractive solution. The feasibility of this approach, however, depends on the availability of an efficient means of translating between the two multiple-access formats on board the satellite. The complexity of the on-board system (including the VLSI design) and its power consumption are, of course, of primary concern. The on-board processing system receives an input FDMA signal and supplies an output to interface the TDM link; it must therefore be able to separate each radio channel, perform demodulation and switch each channel correctly to the appropriate down-link channel. The on-board processing system performing the first two operations is appropriately called the 'multicarrier demodulator' (MCD). The MCD has two main functions: demultiplexing (DEMUX) and demodulation (DEMOD).

This paper is concerned only with a digital implementation of the MCD, because it offers several advantages such as flexibility, VLSI integrability and better efficiency. The function of the DEMUX is to separate the individual input FDMA channels and to supply each of them to a demodulator input after the appropriate down-conversion to baseband. In principle, therefore, its operation corresponds to a bank of bandpass filters followed by down-converters. In a digital implementation the down-conversion can be performed by a decimation operation. However, the direct implementation of a bank of digital filters is not the most convenient solution. This paper deals with efficient approaches to the digital implementation of the DEMUX, namely the block method, the per-channel method and the multistage technique.<sup>2–8</sup>

Coherent demodulation is usually employed in satellite communications in order to achieve the required bit error rate with an acceptable signal-to-noise ratio. The performance of a coherent demodulator depends rather critically on the design of the synchronization circuit used to estimate the received carrier phase and bit synchronization reference from the received signal. Carrier recovery can be achieved in different ways: the Mth power method, the Costas loop and the decision-directed feedback circuit. With M=2, the Mth power method is known as a squaring loop.<sup>9</sup> Clock recovery is usually achieved by performing a non-linear operation on the received signal, because the signal does not contain discrete spectral lines at the clock frequency.<sup>9</sup> Clock recovery can be done before carrier recovery or at the same time. In the latter case the recovery circuits operate on the demodulated (not necessarily detected) baseband waveform, whereas in the former case the circuits operate directly on the modulated carrier signal.

The complexity of the clock recovery circuit could be drastically reduced if some form of network clock synchronization were implemented. The carrier recovery circuit could also be simplified by using differential detection schemes where applicable. In this paper we consider the most general structure, where both carrier and clock recovery circuits are required for each demultiplexed channel. The digital architecture of the receiver studied here can be adapted to different digital modulation techniques; nevertheless, we focus only on its application to QPSK signals, as this modulation scheme is the most promising in digital satellite communications.

Two different approaches to receiver synchronization are considered. In the first, the carrier recovery circuit makes use of the non-linear estimation technique proposed by Viterbi and Viterbi.<sup>10</sup> This approach has been selected because it gives a good estimate, is relatively insensitive to finite-arithmetic implementation, and requires a short, well-defined acquisition time. Moreover, it can be used with continuous as well as burst-mode carriers, and a certain degree of integration of the DEMUX and DEMOD functions is possible. In this case the clock recovery circuit is assumed to be implemented according to the clock estimation method proposed by Gardner.<sup>11</sup> This clock recovery approach has been selected because its estimation operations are independent of the carrier phase, and some degree of integration of the demultiplexing and clock recovery functions is possible. In the second approach, a maximum a posteriori probability (MAP) method is used to estimate jointly the parameters that require synchronization. In particular, it will be shown that by a suitable choice of the architecture of the digital coherent receiver, the ML demodulator can easily be integrated with the joint carrier and clock recovery circuit with no increase in the overall system complexity.<sup>12</sup>

The multicarrier demodulator described in this paper represents a complete solution for a processing system interfacing FDMA and TDM links. In particular its design allows for implementation by means of custom VLSI digital circuits.

### 2. DEMULTIPLEXER

A recent approach to the FDMA–TDM translation problem is the use of a demultiplexer, a piece of equipment that provides the required interface without recovering the analogue baseband signals.<sup>3–8</sup> This solution is made possible by the techniques of digital signal processing (DSP).

Three main approaches to this problem have been presented in the literature: the per-channel approach, the block approach and the tree approach. The block technique requires a block processor of the fast Fourier transform (FFT) type, which is not employed in the other methods. At present the per-channel and tree approaches appear to lend themselves readily to custom VLSI implementation. All these demultiplexer implementation techniques are considered here. In particular, we focus on the analytic signal (AS) method for the per-channel approach, the FFT with polyphase network for the block technique, and the multistage method for the tree approach.

#### 2.1 The analytic signal approach

The AS approach<sup>5</sup> is a per-channel method that avoids any digital product modulator and any block processor. Its specific advantage is that it allows the filter specifications to be relaxed, thus achieving a lower implementation complexity than other per-channel approaches. Furthermore, it leads directly to a highly modular structure that is naturally matched to the per-channel implementation of the demodulators. Therefore, as will be shown, integration of the DEMUX and DEMOD functions is possible. Another advantage of the AS approach is its high flexibility: unlike the other methods, if a specific application would benefit from unequal channel bandwidths, the AS structure can change on demand the bandwidth assigned to each channel simply by switching to a suitable new set of DEMUX parameters. The principle of operation of the AS method is illustrated in Reference 5 and is only briefly described here. The structure of the DEMUX in the AS method is shown in Figure 1. The FDMA input signal, after appropriate analogue down-conversion of the received signal to a low frequency range, is sampled according to the sampling theorem at the high rate $f_u = 1/T_u$ and processed in order to obtain $N_c$ TDM digital signals, each sampled at the low rate $f_d = 1/T_d$, $N_c$ being the number of multiplexed channels. In Figure 1, $H_i(fT_{\rm u})$ and $H'_{i}(fT_{\rm u})$ represent the conjugate symmetric and antisymmetric parts, respectively, of the high-rate complex bandpass filter $\mathbf{H}_i(fT_{\rm u})$, which can be regarded as a frequency-translated version of a low-pass prototype $\mathbf{H}(fT_{\rm u})$ such that<sup>5</sup>

$$\mathbf{H}_{i}(fT_{\mathrm{u}}) = H_{i}(fT_{\mathrm{u}}) + \mathrm{j}H'_{i}(fT_{\mathrm{u}}) = \mathbf{H}[2\pi(f-iW-\tfrac{1}{2}W)T_{\mathrm{u}}] \tag{1}$$

where W is the channel spacing. In Figure 1, $G_i(fT_d)$ and $G'_i(fT_d)$ represent the conjugate symmetric and antisymmetric parts, respectively, of the complex low-rate filter $\mathbf{G}_i(fT_d)$, which can be defined as<sup>5</sup>

$$\mathbf{G}_{i}(fT_{d}) = G_{i}(fT_{d}) + jG'_{i}(fT_{d}) = \mathbf{G}\{[f - \tfrac{1}{2}(-1)^{i}W]T_{d}\} \tag{2}$$



Figure 1. Demultiplexing in the analytic signal approach: block diagram

Thus, according to equation (2), each filter $\mathbf{G}_i(fT_d)$ is related to a single low-pass prototype $\mathbf{G}$. It can be noted from equation (2) that there are actually only two different filters $\mathbf{G}_i(fT_d)$: one for the odd channels and one for the even channels. Taking into account equations (1) and (2), we have in the frequency domain<sup>5</sup>

$$X_i(fT_d) = S[(fT_d + \tfrac{1}{2}i)/N_c] \tag{3}$$

for the implementation structure shown in Figure 1. Note that decimation by a factor equal to the number $N_c$ of multiplexed channels is required. As shown in Figure 1, $S(fT_u)$ and $X_i(fT_d)$ represent the spectrum of the input signal and the spectrum of the *i*th output of the DEMUX, respectively.
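As a rough numerical illustration of equation (1), the following Python/NumPy sketch builds the complex high-rate filter $\mathbf{H}_i$ for one channel by shifting a low-pass prototype to the centre frequency $(i+\frac{1}{2})W$, and checks that a real input only ever meets real coefficient sets: the real and imaginary parts of the coefficients, whose frequency responses are $H_i$ and $H'_i$. It is a sketch under stated assumptions, not the design of Reference 5: the Hamming-windowed sinc prototype, $N_c = 8$, the 129-tap length and the relation $f_u = 2N_cW$ (channels of spacing W filling the band 0 to $f_u/2$) are all assumptions made for illustration.

```python
import numpy as np

NC = 8                 # assumed number of channels
W_TU = 1 / (2 * NC)    # channel spacing normalized to f_u (assumes f_u = 2*NC*W)

def lowpass_prototype(num_taps, cutoff):
    """Hamming-windowed sinc low-pass; cutoff is normalized to the sampling rate."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    return 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)

def channel_filter(proto, i):
    """Complex high-rate filter of eq. (1): prototype shifted to (i + 1/2) W."""
    n = np.arange(len(proto))
    return proto * np.exp(2j * np.pi * (i + 0.5) * W_TU * n)

def freq_response(h, f_norm):
    """DTFT of h at a single frequency normalized to the sampling rate."""
    n = np.arange(len(h))
    return np.sum(h * np.exp(-2j * np.pi * f_norm * n))

proto = lowpass_prototype(129, W_TU / 2)   # cutoff at W/2
h3 = channel_filter(proto, 3)              # hypothetical example: channel i = 3

# Filtering a real input with the complex filter is the same as filtering it
# with two real coefficient sets, Re(h) (response H_i) and Im(h) (response H'_i).
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y_complex = np.convolve(x, h3)
y_real_ops = np.convolve(x, h3.real) + 1j * np.convolve(x, h3.imag)
x3 = y_real_ops[::NC]      # decimation by N_c gives the low-rate channel signal
```

The response of `h3` is close to unity at its own centre frequency $3.5W$ and strongly attenuated at the neighbouring centre $4.5W$, which is what allows the subsequent decimation by $N_c$.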

The pulse-shaping filter, which is generally used to reduce the effects of noise at the receiver and to avoid intersymbol interference (ISI) at the instant of detection, can be implemented by cascading the two digital filters $\mathbf{H}_i(fT_u)$ and $\mathbf{G}_i(fT_d)$. Since the high-rate filter $\mathbf{H}_i(fT_u)$ is essentially a bandpass filter, the desired pulse-shaping function can be implemented by the low-rate filter $\mathbf{G}_i(fT_d)$. For example, it will be shown later that a pulse-shaping filter with a 40 per cent raised-cosine roll-off factor<sup>9</sup>, shared equally between the transmitter and receiver, can easily be integrated in the DEMUX, lowering the overall implementation complexity.
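The receiver half of a 40 per cent roll-off characteristic shared equally between transmitter and receiver is a square-root raised-cosine filter. The Python sketch below assumes the standard square-root raised-cosine formula, 3 samples per symbol (the DEMUX output rate chosen later in this section) and truncation to ±8 symbols; it generates such taps and checks that the cascade of the two halves has negligible intersymbol interference at the symbol instants. How the paper actually folds this response into $\mathbf{G}_i(fT_d)$ is not reproduced here.

```python
import numpy as np

def rrc_taps(beta, sps, span):
    """Square-root raised-cosine taps, unit energy, `span` symbols on each side."""
    t = np.arange(-span * sps, span * sps + 1) / sps   # time in symbol periods
    h = np.empty_like(t)
    for k, tk in enumerate(t):
        if np.isclose(tk, 0.0):
            h[k] = 1.0 - beta + 4.0 * beta / np.pi
        elif np.isclose(abs(tk), 1.0 / (4.0 * beta)):
            # Limit value at the removable singularity t = +-T/(4 beta)
            h[k] = (beta / np.sqrt(2.0)) * (
                (1.0 + 2.0 / np.pi) * np.sin(np.pi / (4.0 * beta))
                + (1.0 - 2.0 / np.pi) * np.cos(np.pi / (4.0 * beta)))
        else:
            num = (np.sin(np.pi * tk * (1.0 - beta))
                   + 4.0 * beta * tk * np.cos(np.pi * tk * (1.0 + beta)))
            den = np.pi * tk * (1.0 - (4.0 * beta * tk) ** 2)
            h[k] = num / den
    return h / np.sqrt(np.sum(h ** 2))

# 40 per cent roll-off at an assumed 3 samples/symbol
h = rrc_taps(beta=0.4, sps=3, span=8)
p = np.convolve(h, h)                    # transmitter half followed by receiver half
centre = len(p) // 2                     # peak of the overall raised-cosine response
isi = p[centre + 3 * np.arange(1, 8)]    # samples at nonzero symbol offsets
```

The residual values in `isi` come only from truncating the taps; lengthening `span` reduces them further at the cost of more multiplications per output sample.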

From the implementation structure of the AS method (Figure 1) it can be seen that only real quantities need be processed. The frequency demultiplexing performed by the AS method is shown in Figure 2. It must be pointed out that the AS method is outlined here under the assumption of ideal filtering masks, but in real applications there are nonzero transition bands for the filters $G_i(fT_d)$ and transition bands wider than the channel spacing W for the filters $H_i(fT_u)$. This freedom allows more relaxed filter specifications and thus reduces the overall system complexity (i.e. the total number of multiplications). The number of multiplications required per input channel per second is given by

$$M_{\rm AS} = (L_{\rm G} + \tfrac{1}{2}L_{\rm H})\, 2W \tag{4a}$$

where $L_{\rm G}$ and $L_{\rm H}$ are the numbers of coefficients of the low-rate filters ${\bf G}_i$ and of the high-rate filters ${\bf H}_i$, respectively. Since these can be estimated as functions of the channel spacing W, the number of channels $N_{\rm c}$ and the filtering bandwidth B, equation (4a) becomes<sup>13</sup>

$$M_{\rm AS} = K W^2 \frac{W(N_{\rm c} + 4) - 2B(N_{\rm c} + 2)}{(W - B)(W - 2B)} \tag{4b}$$

where K is given by

$$K = -\frac{2}{3}\log\left[5\delta_1\delta_2\right] \tag{5}$$

The parameters $\delta_1$ and $\delta_2$ denote the acceptable in-band and out-of-band ripples, respectively, derived from the given system specifications; a filter design procedure is reported in Reference 13.

Figure 2. Frequency demultiplexing in the analytic signal method: (a) FDM input signal, (b,c) frequency response of the high-rate channel filter, (d,e) spectra of the filtered FDM signal, (f,g) spectra of the complex signal obtained by decimation over $N_c$, (h,i) frequency response of the low-pass prototype, (k,l) spectra of the complex demultiplexed signal, (m,n) recovered baseband spectra

From equation (4), for specified values of B and $N_{\rm c}$, an optimum value $W_0$ of the channel spacing can be found which gives the lowest $M_{\rm AS}$. However, since an integer number of samples per symbol is convenient for the subsequent DEMOD operation, a suboptimum value of W close to $W_0$ is generally used. A suitable choice turned out to be a DEMUX output sampling frequency 2W corresponding to 3 samples per symbol, i.e. W=3R/4, with R the transmission rate: for QPSK the symbol rate is R/2, so that 2W = 3R/2.
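Equations (4b) and (5) can be evaluated numerically to locate $W_0$. The Python sketch below is illustrative only: it assumes base-10 logarithms in equation (5) (the text writes only 'log'), the values of B, $N_c$, $\delta_1$ and $\delta_2$ are hypothetical, and a simple grid search stands in for whatever optimization the authors used. It also encodes the samples-per-symbol relation 2W = 3 × (R/2) for QPSK at bit rate R.

```python
import numpy as np

def k_factor(delta1, delta2):
    """Eq. (5), assuming the logarithm is base 10."""
    return -(2.0 / 3.0) * np.log10(5.0 * delta1 * delta2)

def m_as(W, B, Nc, delta1, delta2):
    """Eq. (4b): multiplications per channel per second (valid for W > 2B)."""
    K = k_factor(delta1, delta2)
    return K * W**2 * (W * (Nc + 4) - 2 * B * (Nc + 2)) / ((W - B) * (W - 2 * B))

def spacing_for_sps(R, sps=3):
    """Channel spacing W such that 2W = sps * (R / 2), i.e. QPSK at bit rate R."""
    return sps * (R / 2) / 2

# Hypothetical parameters (B in arbitrary frequency units)
B, Nc, d1, d2 = 1.0, 64, 0.01, 0.001
W_grid = np.linspace(2.01 * B, 10 * B, 20000)   # equation (4b) requires W > 2B
M = m_as(W_grid, B, Nc, d1, d2)
W0 = W_grid[np.argmin(M)]                       # numerical optimum channel spacing
```

In a real design the chosen W would then be moved from `W0` to the nearest value giving an integer number of samples per symbol, as done in the text with `spacing_for_sps`.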

#### 2.2 The fast Fourier transform/polyphase network approach

This method implements the DEMUX by means of a polyphase network and a fast Fourier transform (FFT) processor.<sup>6</sup>

The demultiplexer, whose block diagram is shown in Figure 3, is implemented by cascading a set of digital filters and an FFT processor. The set of $N_c$ filters is obtained by shifting a basic complex low-pass filter function along the frequency axis. The frequency response of this low-pass prototype is shown in Figure 4(d). The transfer function H(z) of this filter relates the z-transforms of the filter input and output sequences, which are assumed to have the same sampling rate $f_u$.

Assuming an FIR filter with $kN_c$ coefficients, we can write



Figure 3. Demultiplexing implemented by a digital polyphase network and an FFT processor

$$H(z) = \sum_{i=0}^{kN_c-1} a_{i} z^{-i} = \sum_{n=0}^{N_c-1} z^{-n} H_{n}(z^{N_c}) \tag{6}$$

with

$$H_n(z^{N_c}) = \sum_{m=0}^{k-1} a_{mN_c+n} z^{-mN_c} \tag{7}$$

The filter H(z) can be implemented by a network with  $N_c$  paths, as shown in Figure 3; this is called a polyphase network because each path has a frequency response which approximates that of a pure phase shift.<sup>3,6</sup> The phase shifts are constant in frequency and are integer multiples of  $2\pi/N_c$ .

A change in sampling frequency by a factor of $N_{\rm c}$ can be introduced, thus allowing the circuits in the different paths of the network to operate at the low frequency $f_{\rm d}$.

The set of filters used to implement the DEMUX consists of $N_c$ filters which cover the band from 0 to $f_u/2$; the frequency response of these channel filters is sketched in Figure 4. This response can be obtained by shifting the basic complex low-pass filter function shown in Figure 4(d) along the frequency axis by an integer multiple of $f_d/2$.



Figure 4. Frequency demultiplexing performed by a polyphase network with a FFT processor: (a) input FDM signals, (b,c) frequency response of the channel filters, (d) basic filter response

From H(z), the basic filter z-transfer function, a translation in frequency by  $(mf_{\rm u}/2N_{\rm c})$ , with m an integer, appears as a change in the variable from z to  $z{\rm exp}[{\rm j}2\pi m/N_{\rm c}]$ . Thus the filter with index m has a transfer function  $B_m(z)$  given by

$$B_m(z) = H(z \exp \left[j2\pi m/N_c\right]) \tag{8}$$

Applying the decomposition of H(z) given in equations (6) and (7), this becomes

$$B_m(z) = \sum_{n=0}^{N_c-1} z^{-n} \exp[-j2\pi mn/N_c] H_n(z^{N_c}) \tag{9}$$

Since the functions $H_n(z^{N_c})$ are the same for all the filters $B_m(z)$, $m=0,1,\ldots,N_c-1$, a factorization can be introduced which results in the matrix equation

$$\begin{bmatrix} B_{0}(z) \\ B_{1}(z) \\ \vdots \\ B_{N_{c}-1}(z) \end{bmatrix} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & V & \cdots & V^{N_{c}-1} \\ \vdots & \vdots & & \vdots \\ 1 & V^{N_{c}-1} & \cdots & V^{(N_{c}-1)(N_{c}-1)} \end{bmatrix} \begin{bmatrix} H_{0}(z^{N_{c}}) \\ z^{-1}H_{1}(z^{N_{c}}) \\ \vdots \\ z^{-(N_{c}-1)}H_{N_{c}-1}(z^{N_{c}}) \end{bmatrix} \tag{10}$$

where  $V = \exp(-j2\pi/N_c)$ .

The square matrix is a DFT matrix. Thus the set of filters is realized by cascading the polyphase network with a discrete Fourier transform processor, which in practice is implemented as an FFT.
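The factorization in equations (8) to (10) can be checked numerically. The Python sketch below computes the decimated outputs of the filter bank $B_m(z) = H(z\exp[j2\pi m/N_c])$ in two ways: by filtering with each $B_m$ directly and keeping every $N_c$th sample, and by running the polyphase branches of equation (7) at the low rate followed by one DFT per output instant. It assumes a prototype length that is an exact multiple of $N_c$ and relies on `numpy.fft.fft` using the kernel $\exp(-j2\pi mn/N_c) = V^{mn}$; the random prototype and input are placeholders, since the identity holds for any coefficients.

```python
import numpy as np

def polyphase_dft_bank(x, a, Nc):
    """Decimated outputs y[m, k] = sum_i b_m[i] x[k*Nc - i], b_m[i] = a[i] V**(m*i).

    Computed as low-rate polyphase branches (eq. 7) followed by a DFT (eq. 10).
    """
    K = len(a) // Nc
    assert K * Nc == len(a), "prototype length must be a multiple of Nc"
    p = a.reshape(K, Nc).T              # p[n, r] = a[r*Nc + n]: branch n of eq. (7)
    n_out = len(x) // Nc
    y = np.empty((Nc, n_out), dtype=complex)
    for k in range(n_out):
        u = np.zeros(Nc, dtype=complex)  # branch outputs z^-n H_n(z^Nc) at time k*Nc
        for n in range(Nc):
            for r in range(K):
                idx = (k - r) * Nc - n
                if 0 <= idx < len(x):
                    u[n] += p[n, r] * x[idx]
        y[:, k] = np.fft.fft(u)          # kernel exp(-j 2 pi m n / Nc) = V**(m n)
    return y

def direct_bank(x, a, Nc):
    """Reference: filter with each b_m separately, then keep every Nc-th sample."""
    i = np.arange(len(a))
    n_out = len(x) // Nc
    return np.array([np.convolve(x, a * np.exp(-2j * np.pi * m * i / Nc))[: n_out * Nc : Nc]
                     for m in range(Nc)])

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
a = rng.standard_normal(4 * 8)           # placeholder prototype with k = 4, Nc = 8
y_poly = polyphase_dft_bank(x, a, 8)
y_direct = direct_bank(x, a, 8)
```

The polyphase form only ever evaluates the outputs that survive decimation, which is where its saving over the direct filter bank comes from.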

We now outline the overall number of multiplications required per second per channel, $M_{\rm FFT}$, when the DEMUX is implemented by the fast Fourier transform with polyphase network approach. The polyphase network consists of $N_{\rm c}$ digital filters, generated from a basic low-pass prototype. The number of coefficients of this prototype is given by

$$L_{\rm FT} = \frac{2}{3} \log \left[ \frac{1}{10\delta_1 \delta_2} \right] \frac{2N_c W}{W - 2B} \tag{11}$$

where W is the channel spacing, B the (one-sided) filtering bandwidth,  $N_c$  the number of input channels,  $\delta_1$  the acceptable in-band filtering ripple and  $\delta_2$  the required out-of-band filtering ripple.

Each filter of the polyphase network operates at the rate  $f_d$ , and has a number of coefficients given by:

$$L_{\rm f} = L_{\rm FT}/N_{\rm c} \tag{12}$$

The FFT processor, connected in cascade with the polyphase network, operates at the rate $f_{\rm d}$ and requires $M_{\rm FT}$ real multiplications per second, where $M_{\rm FT}$ is given by

$$M_{\rm FT} = 8WN_{\rm c}\log_2 N_{\rm c} \tag{13}$$

Hence the total number of real multiplications required per second per channel can be expressed as

$$M_{\rm FFT} = [L_{\rm f} + 4\log_2 N_{\rm c}]\, 2W \tag{14}$$

From equations (11), (13) and (14), the optimum channel spacing $W_0$ that minimizes the required number of multiplications $M_{\rm FFT}$ can be determined; a near-optimum value of W is then chosen, again leading to W=3R/4.
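The same kind of search applies to equations (11), (12) and (14). The Python sketch below again assumes a base-10 logarithm in equation (11) and uses hypothetical values of B, $N_c$, $\delta_1$ and $\delta_2$. Substituting (11) and (12) into (14) gives a cost of the form $4cW^2/(W-2B) + 8W\log_2 N_c$ with $c > 0$, whose derivative is positive at $W = 4B$; the minimum therefore always lies between 2B and 4B, which the grid search should reproduce.

```python
import numpy as np

def m_fft(W, B, Nc, delta1, delta2):
    """Eqs. (11), (12), (14): multiplications per channel per second (W > 2B).

    Assumes the 'log' in eq. (11) is base 10.
    """
    L_ft = (2.0 / 3.0) * np.log10(1.0 / (10.0 * delta1 * delta2)) * 2 * Nc * W / (W - 2 * B)
    L_f = L_ft / Nc                          # taps per polyphase branch, eq. (12)
    return (L_f + 4 * np.log2(Nc)) * 2 * W   # eq. (14), FFT share included

# Hypothetical parameters (B in arbitrary frequency units)
B, Nc, d1, d2 = 1.0, 64, 0.01, 0.001
W_grid = np.linspace(2.01 * B, 10 * B, 20000)
M = m_fft(W_grid, B, Nc, d1, d2)
W0 = W_grid[np.argmin(M)]                    # numerical optimum channel spacing
```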

#### 2.3 The multistage approach

The multistage DEMUX technique is appropriate when the number of channels to be separated is a power of two.<sup>8</sup> The signal is split into two bands by half-band filters and decimated by two. Both filter outputs are again split by two filters and decimated, leading to a division into four bands. After L stages of filtering and decimation, $2^L$ channels are obtained.

The filters are assumed to be complex in order to allow a very large transition bandwidth, which leads to a small number of taps for each filter. Hence only a small amount of processing is required, even though it is complex, also because about half of the coefficients of a half-band filter with an odd number of coefficients are zero. The spectrum of the signal is represented in Figure 5, and a block diagram is shown in Figure 6. The structure (quite similar to a binary tree) is very modular because the filters are replicated at each stage. It also provides a certain degree of flexibility, because channels with different bandwidths can be obtained from the intermediate stages.
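The tree structure can be sketched in a few lines. The Python version below is a simplified variant, not the filter design of Reference 8: each node translates the lower or upper half of its complex band to baseband with a ±$f_s/4$ mixer, filters it with a real half-band low-pass and decimates by 2. This is equivalent to filtering with a complex half-band bandpass filter followed by a frequency shift; the 31-tap Hamming-windowed design and the frequency ordering of the outputs are assumptions of the sketch.

```python
import numpy as np

def halfband_lowpass(num_taps=31):
    """Hamming-windowed half-band low-pass (cutoff f_s/4); even-offset taps are zero."""
    n = np.arange(num_taps) - (num_taps - 1) // 2
    return 0.5 * np.sinc(0.5 * n) * np.hamming(num_taps)

def split(x, h):
    """One tree node: lower and upper halves of a complex band, each decimated by 2."""
    n = np.arange(len(x))
    lower = np.convolve(x * np.exp(+0.5j * np.pi * n), h, mode="same")[::2]
    upper = np.convolve(x * np.exp(-0.5j * np.pi * n), h, mode="same")[::2]
    return lower, upper

def tree_demux(x, stages, h):
    """After `stages` splits, returns 2**stages channels ordered by frequency."""
    channels = [x]
    for _ in range(stages):
        channels = [half for ch in channels for half in split(ch, h)]
    return channels

h = halfband_lowpass()
n = np.arange(2048)
tone = np.exp(2j * np.pi * 0.125 * n)   # centre of channel 2 of 4, i.e. [0, f_s/4)
outputs = tree_demux(tone, 2, h)
energies = [float(np.sum(np.abs(ch[20:-20]) ** 2)) for ch in outputs]
```

Because the same half-band filter `h` is reused at every node, this also illustrates the modularity noted in the text.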

The modularity brings some advantages in terms of redundancy in the sense that the same filter could be used in every stage, provided the processing rate is suitable. It should be observed that, after the last stage of the tree structure, a pulse-shaping filter (not shown in Figure 6) is necessary to limit the bandwidth of each DEMUX channel. The overall implementation complexity in terms of multiplications required per channel per second,  $M_{\rm MS}$ , is given by

$$M_{\rm MS} = \left[ \left( \frac{N_{\rm F} + 1}{2} + 1 \right) \left( \log_2 N_{\rm c} - \frac{1}{2} \right) + N_{\rm G} \right] 2W \tag{15}$$

where  $N_{\rm F}$  and  $N_{\rm G}$  denote the number of coefficients of the complex half-band filters and of the last filter of the tree (including the required pulse-shaping function), respectively. As for the other methods, the number of coefficients  $N_{\rm F}$  and  $N_{\rm G}$  can be estimated from the overall acceptable in-band and out-of-band ripples, the filter bandwidth and the channel spacing.



Figure 5. Multistage spectra and filters



Figure 6. Multistage implementation: block diagram

### 3. DIGITAL DEMODULATOR

In this section two possible alternatives for the digital implementation of a DEMOD suitable for QPSK signals are considered. In particular, two different carrier and clock recovery procedures for the coherent demodulation of the demultiplexed QPSK signals are described. The first uses a non-linear estimation method for the carrier phase and a timing error detector to achieve receiver synchronization; the second uses a MAP criterion for the joint estimation of the parameters required for synchronization.

#### 3.1 Non-linear estimation of QPSK-modulated carrier phase

Figure 7 illustrates the general structure of the phase estimator considered here. Let the estimation period be  $T_{\rm E}$ , and let it encompass  $N_{\rm E}$  m-ary symbols (each T long), where  $T_{\rm E}=N_{\rm E}T$ . Suppose we wish to estimate the phase at the mid-point of the estimation interval, and we let  $N_{\rm E}=2N+1$ , where N is the number of signal intervals before and after the interval where the phase is to be estimated. <sup>10</sup>

In this context, and in the presence of additive white Gaussian noise (AWGN) and zero frequency uncertainty, Figure 7 with the dashed box eliminated (so that $x'_n = x_n, y'_n = y_n$) represents the optimal (maximum-likelihood) estimator for m = 1, which corresponds to an unmodulated carrier.

Obviously, if the carrier is phase-modulated to one of m discrete phases (m=4 for QPSK signals), the above linear estimator is useless since for each successive symbol the phase takes a different value. Suppose, however, that within the dashed box we insert the two-dimensional (complex) non-linear function

$$x'_n + jy'_n = F(\rho_n) \exp(jm\Phi_n) \tag{16}$$

where $\rho_n = \sqrt{x_n^2 + y_n^2}$ and $\Phi_n = \tan^{-1}(y_n/x_n)$. That is, for each symbol we perform a rectangular-to-polar transformation, multiply the phase $\Phi_n$ by m, apply an arbitrary non-linear transformation to $\rho_n$ and finally perform a polar-to-rectangular transformation on the result. We avoid describing the non-linearity in this manner in Figure 7 because in a practical implementation it becomes a read-only memory, transforming a quantized two-dimensional vector into another such vector.
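The estimator of Figure 7 with the non-linearity of equation (16) reduces to a few array operations. The Python sketch below assumes the particular choice $F(\rho) = \rho^2$ (one member of the family considered by Viterbi and Viterbi; the text leaves F arbitrary), QPSK points at phases $k\pi/2$ (for points at $\pi/4 + k\pi/2$ the fourth-power term changes sign), and a noise-only test signal with a hypothetical window of $N_E = 21$ symbols. The returned estimate is only defined modulo $2\pi/m$, reflecting the m-fold ambiguity discussed below.

```python
import numpy as np

def vv_phase_estimate(r, m=4, power=2):
    """Carrier phase estimate from the 2N+1 received samples r_n = x_n + j y_n.

    Applies eq. (16) with the assumed non-linearity F(rho) = rho**power, sums
    over the estimation window and divides the resulting angle by m.
    The result lies in (-pi/m, pi/m], i.e. it is m-fold ambiguous.
    """
    rho = np.abs(r)
    phi = np.angle(r)
    z = np.sum(rho ** power * np.exp(1j * m * phi))
    return np.angle(z) / m

rng = np.random.default_rng(1)
theta = 0.3                                            # hypothetical carrier phase (rad)
symbols = np.exp(0.5j * np.pi * rng.integers(0, 4, 21))  # QPSK at phases k*pi/2 (assumed)
noise = 0.05 * (rng.standard_normal(21) + 1j * rng.standard_normal(21))
r = symbols * np.exp(1j * theta) + noise
theta_hat = vv_phase_estimate(r)
```

In a ROM-based implementation, the per-sample map from $(x_n, y_n)$ to $F(\rho_n)\exp(jm\Phi_n)$ would be tabulated rather than computed.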

Multiplying the phase by m, together with the final operation of dividing the $\tan^{-1}$ function by m, gives rise to an m-fold ambiguity in the phase estimates. A practical m-PSK modulation system resolves this by coding the data transitions rather than the data themselves (differential encoding) and performing differential decoding at the receiver.

Figure 7. General structure of the carrier phase non-linear estimation system

This requires the transmission of an additional initial (unmodulated) phase symbol per burst, and results in an approximate doubling of the bit error rate, since each symbol demodulation error gives rise to two transition errors upon differential decoding. For the m-PSK modulated waveform, let $\hat{\theta}_n(m)$ denote the estimator for the nth symbol phase of a symmetric estimation interval $(-N \le n \le N)$. The variance of the estimator, $\text{var}[\hat{\theta}_n(m)]$, is derived in closed form in Reference 10, where it is shown to be independent of n. The analysis in Reference 10 assumes that the timing uncertainty is a negligibly small fraction of the symbol time T, i.e. that highly accurate and well-controlled timing is available.

The acquisition time for the non-linear estimation of the QPSK-modulated carrier phase is $N_{\rm E}$ symbols. The influence of finite-arithmetic implementation on the estimated carrier phase can only be evaluated by simulation. In Figure 8 the r.m.s. error in the carrier phase is shown as a function of $E/N_0$ for a floating-point implementation (curve a) and a finite-arithmetic implementation with $b_{\rm f}=6$ bits (curve b). A carrier phase r.m.s. error of less than 5° is achieved for $E/N_0>5$ dB with $b_{\rm f}=6$ bits. The implementation complexity of the proposed carrier phase estimator is:

$$M_{\rm v} = 3R \ \text{multiplications/second}, \qquad A_{\rm v} = 2(N+1)R \ \text{additions/second} \tag{17}$$

The non-linear estimation method for a QPSK-modulated carrier phase thus gives an accurate estimate, is relatively insensitive to finite-arithmetic implementation and requires a well-defined (and short) acquisition time equal to $N_{\rm E}=(2N+1)$ symbols.



Figure 8. Carrier phase error as function of  $E/N_0$ : (a) floating-point implementation, (b) finite-precision implementation at 6 bits

#### 3.2 A QPSK timing-error detector

In this section an algorithm for timing error detection in a digital receiver is presented. It will be shown that the algorithm operations are simple and that only two samples of the signal are required for each data symbol, one of which also serves as the symbol strobe (i.e. the sample on which the symbol decision is made). This method was proposed in Reference 11 and is similar to the approach proposed in Reference 14. The algorithm is intended for synchronous binary baseband signals with 40–100 per cent excess bandwidth. The two samples per symbol are derived by a suitable interpolation/decimation operation.

Figure 9 shows a block diagram of a typical I–Q receiving modem. A passband signal is demodulated to baseband in a pair of quadrature-driven mixers. The phase of the local carrier must be adjusted to agree with that of the signal. (The necessary carrier-recovery branch is omitted from the diagram, and is irrelevant to the clock algorithm and discussion.) Data filters are required; they perform receiver filtering to shape signal pulses, minimize noise and suppress unwanted mixer products.

Our interest is in sampled receivers. We do not specify where the sampling takes place, other than to stipulate that the filter outputs are available only in sampled form, as the pair of real sequences $\{y_{\rm I}(\cdot)\}$ and $\{y_{\rm Q}(\cdot)\}$. Timing information must be retrieved from these sequences. Symbols are transmitted synchronously, separated by the time interval T. Each sequence has two samples per symbol interval, and the samples are time-coincident between the sequences. One sample occurs at the data strobe time, and the other mid-way between data strobe times.
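To make the two-samples-per-symbol idea concrete, the Python sketch below assumes the standard form of Gardner's detector from Reference 11, $u_t(r) = y_{\rm I}(r-\frac{1}{2})[y_{\rm I}(r) - y_{\rm I}(r-1)] + y_{\rm Q}(r-\frac{1}{2})[y_{\rm Q}(r) - y_{\rm Q}(r-1)]$, where $y(r)$ is the strobe sample of the rth symbol and $y(r-\frac{1}{2})$ the sample mid-way before it. The signal model is also an assumption: noiseless QPSK with raised-cosine pulses of 50 per cent roll-off, perfect carrier phase and a fixed sampling offset τ (in symbol periods). Averaged over random data, the detector output is zero for correct timing and changes sign between early and late sampling.

```python
import numpy as np

def raised_cosine(t, beta):
    """Raised-cosine pulse, t in symbol periods (limit value at t = +-1/(2 beta))."""
    t = np.asarray(t, dtype=float)
    denom = 1.0 - (2.0 * beta * t) ** 2
    singular = np.isclose(denom, 0.0)
    safe = np.where(singular, 1.0, denom)
    p = np.sinc(t) * np.cos(np.pi * beta * t) / safe
    return np.where(singular, (np.pi / 4.0) * np.sinc(1.0 / (2.0 * beta)), p)

def sample_signal(symbols, times, beta=0.5, span=12):
    """y(t) = sum_k a_k p(t - k), truncated to `span` symbols around each time."""
    offsets = np.arange(-span, span + 1)
    k = np.round(times).astype(int)[:, None] + offsets[None, :]
    valid = (k >= 0) & (k < len(symbols))
    kc = np.clip(k, 0, len(symbols) - 1)
    pulses = raised_cosine(times[:, None] - k, beta)
    return np.sum(np.where(valid, symbols[kc] * pulses, 0.0), axis=1)

def gardner_errors(strobe, mid):
    """u_t(r) = Re{conj(y(r - 1/2)) [y(r) - y(r - 1)]}, one value per symbol r >= 1.

    strobe[r] = y(r) and mid[r] = y(r - 1/2), with y = y_I + j y_Q.
    """
    return np.real(np.conj(mid[1:]) * (strobe[1:] - strobe[:-1]))

def mean_timing_error(tau, n_sym=8000, seed=0):
    """Average detector output for a fixed sampling offset tau (symbol periods)."""
    rng = np.random.default_rng(seed)
    a = rng.choice([-1.0, 1.0], n_sym) + 1j * rng.choice([-1.0, 1.0], n_sym)
    r = np.arange(n_sym, dtype=float)
    strobe = sample_signal(a, r + tau)
    mid = sample_signal(a, r - 0.5 + tau)
    return np.mean(gardner_errors(strobe, mid)[50:-50])

late = mean_timing_error(+0.125)
early = mean_timing_error(-0.125)
on_time = mean_timing_error(0.0)
```

With this sign convention a late sampling phase tends to give a positive mean error, so a loop would retard its timing in response; the smoothing of the error sequence that the text goes on to describe is not modelled here.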

The index r is used to designate the symbol number. It is convenient to denote the strobe values of the rth symbol as  $y_{\rm I}(r)$  and  $y_{\rm Q}(r)$ . As a formalism, we denote the values of the pair of samples lying mid-way between the (r-1)th and the rth strobe as  $y_{\rm I}(r-\frac{1}{2})$  and  $y_{\rm Q}(r-\frac{1}{2})$ . A timing error detector operates upon samples and generates one error sample  $u_{\rm t}(r)$  for each symbol. The error sequence is smoothed



Figure 9. QPSK timing error detector circuit