by a loop filter and then used to adjust a timing error corrector. The detector algorithm is

$$u_{t}(r) = y_{I}\left(r - \tfrac{1}{2}\right) \left[ y_{I}(r) - y_{I}(r - 1) \right] + y_{Q}\left(r - \tfrac{1}{2}\right) \left[ y_{Q}(r) - y_{Q}(r - 1) \right] \tag{18}$$

This algorithm is suitable for both tracking and acquisition modes of operation. It is proved in Reference 11 that  $u_t(r)$  is independent of carrier phase, so that timing lock can be achieved without depending upon prior carrier phase acquisition.

The physical interpretation of equation (18) is as follows. The detector samples the data stream midway between strobe locations in both the I and Q channels. If there is a transition between symbols, the average mid-way value should be zero in the absence of timing error. A timing error gives a nonzero sample whose magnitude depends upon the amount of error, but the slope information necessary for the correction (lead or lag) is missing. To choose between the two possibilities, the algorithm examines the two strobe values on either side of the mid-way sample. If there is no transition, the strobe values are the same and their difference is zero, so the mid-way sample is rejected. (No timing information is available in the absence of a transition.) If a transition is present, the strobe values will be different, and the difference between them provides the slope information. The product of the slope information and the mid-way sample provides the timing error information. It may be worthwhile to use the signs of the strobe values instead of the actual values; this eliminates the effects of noise. If all data filtering has been performed before the strobe point, then the sign of the strobe value is the optimum hard decision on the symbol and the algorithm effectively becomes decision-directed. This expedient is known to improve tracking capability. However, acquisition performance may suffer in decision-directed operation.
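The interpretation above can be illustrated with a minimal sketch of equation (18) for two samples per symbol. The function name `timing_error`, the index convention (even indices are strobes, odd indices are mid-way samples) and the triangular test waveform are assumptions made for illustration only; they are not taken from Reference 11.

```python
def timing_error(y_i, y_q, use_signs=False):
    """Evaluate u_t(r) of equation (18) for every symbol in the block.

    y_i, y_q: I and Q samples at two samples per symbol.
    Assumed layout: index 2r is strobe r, index 2r - 1 is the mid-way sample.
    use_signs: replace the strobe values by their signs (decision-directed form).
    """
    def sgn(v):
        return (v > 0) - (v < 0)

    errors = []
    for n in range(2, len(y_i), 2):
        strobes = [y_i[n], y_i[n - 2], y_q[n], y_q[n - 2]]
        if use_signs:
            strobes = [sgn(v) for v in strobes]
        i_now, i_prev, q_now, q_prev = strobes
        # mid-way sample times strobe difference, summed over both channels
        errors.append(y_i[n - 1] * (i_now - i_prev) + y_q[n - 1] * (q_now - q_prev))
    return errors


# Toy data: I alternates +1, -1, +1 with linear transitions; Q has no transition.
q_flat = [1.0, 1.0, 1.0, 1.0, 1.0]
on_time = timing_error([1.0, 0.0, -1.0, 0.0, 1.0], q_flat)
late = timing_error([0.8, -0.2, -0.8, 0.2, 0.8], q_flat)
early = timing_error([0.8, 0.2, -0.8, -0.2, 0.8], q_flat)
late_signs = timing_error([0.8, -0.2, -0.8, 0.2, 0.8], q_flat, use_signs=True)
```

In this toy example the Q channel contributes nothing because it carries no transition, the on-time block gives zero error, and the late and early blocks give errors of opposite sign. The mapping of sign to lead or lag depends on the assumed index convention.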

The use of the strobe signs in place of the actual values eliminates the need for multiplications in the algorithm—an attractive feature for digital processors. The decision-directed version of the algorithm will be recognized as very similar to the digital transition tracking loop used by Lindsey and Simon. The implementation complexity for the timing error detector, including the interpolator filter used in Reference 16, can be expressed as

$$M_c = 16R \ \text{multiplications/second}, \qquad A_c = 16R \ \text{additions/second} \tag{19}$$

where two samples per symbol have been assumed.

The performance of the described clock recovery method, including the effects of a finite-arithmetic implementation, has been examined by computer simulation. Figure 10 shows the symbol error probability, obtained by considering both the timing offset error due to the clock recovery circuit and the channel noise, as a function of $E/N_0$. A finite-arithmetic implementation of the clock recovery circuit at 8 bits has been considered. The figure shows that a degradation of less than 0.1 dB is introduced in this case.

Figure 10. Symbol error probability versus $E/N_0$ when the QPSK timing error detector is used: (a) floating-point implementation, (b) finite-precision implementation at 8 bits

## 3.3. MAP joint carrier phase and symbol timing estimation

The receiver synchronization approach described in this section makes use of the maximum a posteriori probability (MAP) criterion to obtain a joint estimate of the parameters necessary for synchronization. The coherent demodulation of a received signal usually requires correlator circuits or matched filters and, in some applications, a trellis decoder.<sup>9</sup> It is shown here that, by a suitable choice of the architecture of the digital receiver, the coherent demodulator can easily be integrated in the joint carrier and clock recovery circuit, with no increase in the overall system complexity. The digital architecture of the receiver can be adapted to different digital modulation techniques, so we focus here on the application to QPSK signals, as this modulation scheme is the most promising for advanced satellite digital communication systems. The QPSK signal can be written as

$$s(t) = \sqrt{\frac{2E}{T_b}} \sum_{i} \left[ d_{\rm I}(i)\,p(t-iT_b+\epsilon)\cos(2\pi f_c t + \theta) + d_{\rm Q}(i)\,p(t-iT_b+\epsilon)\sin(2\pi f_c t + \theta) \right] \tag{20}$$

where $d_{\rm I}(i)$ and $d_{\rm Q}(i)$ are the data sequences for the in-phase and quadrature channels, respectively. They are related to the input bit data sequence and are equal to $\pm 1$ with the same probability. The function $p(x)$ is a rectangular function defined as

$$p(x) = \begin{cases} 1, & 0 \le x \le 2T_{b} \\ 0, & \text{otherwise} \end{cases} \tag{21}$$

The term $1/T_b=2/T=R$ represents the bit data rate, $f_c$ is the carrier frequency and $E$ is the energy per bit. The carrier phase $\theta$ is a random variable uniformly distributed on the interval $[-\pi,\pi]$, and $\epsilon$ is the symbol synchronization offset, assumed to be independent of $\theta$ and uniformly distributed on the interval $[-T_b,T_b]$. Now let us assume that the received signal $r(t)$ consists of $s(t)$ plus bandlimited Gaussian noise $n(t)$. The noise spectrum is considered to be flat and equal to $N_0$ (W/Hz) in the signal bandwidth $f_c-B$ to $f_c+B$, where we assume $B < f_c$.

The signal $r(t)$, supplied at the input of the digital receiver, is sampled, according to the sampling theorem, at the rate $1/t_s = f_s$. The system which jointly estimates the carrier phase $\theta$ and clock offset $\epsilon$ processes a block of samples $\mathbf{r}$ of predetermined length. A MAP method<sup>17</sup> is used in order to give a simultaneous estimate of $\theta$ and $\epsilon$. The MAP estimation procedure for $\theta$ and $\epsilon$ consists of choosing $\hat{\theta}$ and $\hat{\epsilon}$ such that the a posteriori probability density of $\theta$ and $\epsilon$ given the vector $\mathbf{r}$, i.e. $f(\theta, \epsilon | \mathbf{r})$, is maximized.

We now define the function $f(x)$ to be the probability density function (p.d.f.) of the random variable $x$. Since the joint p.d.f. $f(\theta, \epsilon)$ is uniform and the received signal p.d.f. $f(\mathbf{r})$ is independent of $\theta$ and $\epsilon$, it follows from Bayes' theorem that the MAP estimation procedure becomes: choose $\hat{\theta} = \theta$ and $\hat{\epsilon} = \epsilon$ such that $f(\mathbf{r}|\theta, \epsilon)$ is maximum. In order to evaluate the density $f(\mathbf{r}|\theta, \epsilon)$, we write $\mathbf{r}$ as the sum of a signal vector and a noise vector:

$$\mathbf{r} = \mathbf{s} + \mathbf{n} \tag{22}$$

The components of the vectors  $\mathbf{s}$  and  $\mathbf{n}$  are just samples of the time waveforms s(t) and n(t) in accordance with the sampling theorem. The components of the vector  $\mathbf{n}$  are statistically independent, zero-mean, Gaussian random variables with variances of  $N_0B$ .

Thus the density  $f(\mathbf{r}|\theta, \epsilon)$  can be written as<sup>18</sup>

$$\begin{aligned}
f(\mathbf{r}|\theta, \epsilon) = {} & C_0(\mathbf{r}) \prod_{i=0}^{L} \cosh\left\{ \frac{2t_s}{N_0} \sqrt{\frac{2E}{T_b}} \sum_{k=1}^{2M} r(t)\, p(t-iT_b+\epsilon)\cos(2\pi f_c t + \theta) \Big|_{t=iT_b+kt_s} \right\} \\
& \times \prod_{i=0}^{L} \cosh\left\{ \frac{2t_s}{N_0} \sqrt{\frac{2E}{T_b}} \sum_{k=1}^{2M} r(t)\, p(t-iT_b+\epsilon)\sin(2\pi f_c t + \theta) \Big|_{t=iT_b+kt_s} \right\}
\end{aligned} \tag{23}$$

where $C_0(\mathbf{r})$ is a term which does not depend on $\theta$ and $\epsilon$, $2M$ is the number of samples per symbol time and $L+1$ is the number of symbol intervals required to estimate $\theta$ and $\epsilon$. A necessary, but not sufficient, condition that must be satisfied by $\hat{\theta}$ and $\hat{\epsilon}$ in order for them to be MAP estimates of $\theta$ and $\epsilon$ is given in Reference 18 with reference to an analogue implementation and an MSK modulation scheme; the corresponding digital version is given in Reference 12. It is easy to show that the necessary conditions for a digital version of the MAP carrier and bit timing estimation for a QPSK modulation scheme then become

$$\begin{aligned}
\frac{\partial \ln f(\mathbf{r}|\theta, \epsilon)}{\partial \theta} \bigg|_{\substack{\theta = \hat{\theta} \\ \epsilon = \hat{\epsilon}}} = {} & -\sum_{i=0}^{L} \tanh \left\{ \frac{2t_{s}}{N_{0}} \sqrt{\frac{2E}{T_{b}}} \sum_{k=1}^{2M} r(t)\, p(t - iT_{b} + \hat{\epsilon}) \cos(2\pi f_{c} t + \hat{\theta}) \bigg|_{t = iT_{b} + kt_{s} - \hat{\epsilon}} \right\} \\
& \quad \times \sum_{k=1}^{2M} r(t)\, p(t - iT_{b} + \hat{\epsilon}) \sin(2\pi f_{c} t + \hat{\theta})\, t_{s} \bigg|_{t = iT_{b} + kt_{s} - \hat{\epsilon}} \\
& + \sum_{i=0}^{L} \tanh \left\{ \frac{2t_{s}}{N_{0}} \sqrt{\frac{2E}{T_{b}}} \sum_{k=1}^{2M} r(t)\, p(t - iT_{b} + \hat{\epsilon}) \sin(2\pi f_{c} t + \hat{\theta}) \bigg|_{t = iT_{b} + kt_{s} - \hat{\epsilon}} \right\} \\
& \quad \times \sum_{k=1}^{2M} r(t)\, p(t - iT_{b} + \hat{\epsilon}) \cos(2\pi f_{c} t + \hat{\theta})\, t_{s} \bigg|_{t = iT_{b} + kt_{s} - \hat{\epsilon}} = 0
\end{aligned} \tag{24a}$$

$$\begin{aligned}
\frac{\partial \ln f(\mathbf{r}|\theta, \epsilon)}{\partial \epsilon} \bigg|_{\substack{\theta = \hat{\theta} \\ \epsilon = \hat{\epsilon}}} = {} & -\sum_{i=0}^{L} \tanh \left\{ \frac{2t_{s}}{N_{0}} \sqrt{\frac{2E}{T_{b}}} \sum_{k=1}^{2M} r(t)\, p(t - iT_{b} + \hat{\epsilon}) \cos(2\pi f_{c}t + \hat{\theta}) \bigg|_{t=iT_{b} + kt_{s} - \hat{\epsilon}} \right\} \\
& \quad \times \left\{ r(t) \cos(2\pi f_{c}t + \hat{\theta}) \Big|_{t=iT_{b} - \hat{\epsilon}} - r(t) \cos(2\pi f_{c}t + \hat{\theta}) \Big|_{t=(i+1)T_{b} - \hat{\epsilon}} \right\} \\
& + \sum_{i=0}^{L} \tanh \left\{ \frac{2t_{s}}{N_{0}} \sqrt{\frac{2E}{T_{b}}} \sum_{k=1}^{2M} r(t)\, p(t - iT_{b} + \hat{\epsilon}) \sin(2\pi f_{c}t + \hat{\theta}) \bigg|_{t=iT_{b} + kt_{s} - \hat{\epsilon}} \right\} \\
& \quad \times \left\{ r(t) \sin(2\pi f_{c}t + \hat{\theta}) \Big|_{t=iT_{b} - \hat{\epsilon}} - r(t) \sin(2\pi f_{c}t + \hat{\theta}) \Big|_{t=(i+1)T_{b} - \hat{\epsilon}} \right\} = 0
\end{aligned} \tag{24b}$$

These expressions can be considered as error signals; thus the MAP estimate of  $\theta$  at the instant (k+1)T, i.e.  $\hat{\theta}_{k+1}$ , and the MAP estimate of  $\epsilon$  at the instant (k+1)T, i.e.  $\hat{\epsilon}_{k+1}$ , can be evaluated according to

$$\begin{aligned}
\hat{\theta}_{k+1} &= \hat{\theta}_k + K_{\theta} \frac{\partial \ln f(\mathbf{r}|\theta, \epsilon)}{\partial \theta} \bigg|_{\substack{\theta = \hat{\theta}_k \\ \epsilon = \hat{\epsilon}_k}} \\
\hat{\epsilon}_{k+1} &= \hat{\epsilon}_k + K_{\epsilon} \frac{\partial \ln f(\mathbf{r}|\theta, \epsilon)}{\partial \epsilon} \bigg|_{\substack{\theta = \hat{\theta}_k \\ \epsilon = \hat{\epsilon}_k}}
\end{aligned} \tag{25}$$

where $K_{\theta}$ is the carrier loop gain and $K_{\epsilon}$ is the clock loop gain. These parameters must be determined so as to guarantee both system stability and good tracking performance. The configuration of the digital system which jointly estimates $\theta$ and $\epsilon$, suggested by equations (24) and (25), is depicted in Figure 11. The carrier and clock recovery circuit previously described can be 180° out of phase; in order to avoid a high bit error rate, a suitable differential encoding should be introduced in the modulation scheme.

Figure 11. Digital receiver with integrated MAP synchronization and ML demodulation for QPSK signals: block diagram
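The recursion for the carrier phase can be sketched in a deliberately simplified form. The following example assumes ideal symbol timing and works on complex baseband correlator outputs $z_i$ (one per symbol) instead of the sampled passband signal, so that the log-likelihood reduces to $\sum_i [\ln\cosh(aI_i) + \ln\cosh(aQ_i)]$ with $I_i + jQ_i = z_i e^{-j\theta}$. The scale factor `a` (standing in for the constant multiplying the sums in the likelihood), the loop gain, the noise-free test data and all function names are assumptions made for illustration; this is not the receiver of Figure 11.

```python
import cmath
import math


def phase_gradient(z, theta, a):
    """Derivative with respect to theta of the simplified log-likelihood
    sum_i [ln cosh(a*I_i) + ln cosh(a*Q_i)], where I_i + jQ_i = z_i*exp(-j*theta).
    Uses dI/dtheta = Q and dQ/dtheta = -I."""
    g = 0.0
    for zi in z:
        w = zi * cmath.exp(-1j * theta)
        g += a * (math.tanh(a * w.real) * w.imag - math.tanh(a * w.imag) * w.real)
    return g


def track_phase(z, theta_start=0.0, gain=0.05, a=2.0, iterations=100):
    """Iterate theta <- theta + K * gradient, with K normalised by the block length."""
    theta = theta_start
    for _ in range(iterations):
        theta += gain * phase_gradient(z, theta, a) / len(z)
    return theta


# Noise-free QPSK symbols rotated by an unknown carrier phase (toy data).
symbols = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, -1 + 1j, 1 - 1j]
true_phase = 0.3
received = [d * cmath.exp(1j * true_phase) for d in symbols]
estimate = track_phase(received)
```

The loop settles on the stationary point of the likelihood nearest to the starting value, so a phase ambiguity remains in this sketch; this is the situation that the differential encoding mentioned above is intended to handle.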

The demodulator for QPSK signals can be integrated in the joint carrier and clock recovery circuit with no increase in overall system complexity. A coherent demodulator for QPSK signals evaluates the correlations of r(t) with the two quadrature carriers  $\cos(2\pi f_c t)$  and  $\sin(2\pi f_c t)$  over a symbol interval. In the digital receiver architecture shown in Figure 11, these two products can be derived at the points B and B'; then, the same hardware is able to perform joint carrier and clock recovery and coherent demodulation with a substantial reduction in the receiver implementation complexity.

The total number of multiplications and additions for the coherent demodulation of QPSK signals by the digital receiver with integrated MAP synchronization and ML DEMOD is

$$M_D = 2(L+1)(2M+3)R \ \text{multiplications/second}, \qquad S_D = [4(L+1)(M+1)+2]R \ \text{additions/second} \tag{26}$$

An important result is that the performance of the joint carrier and clock recovery circuit with integrated coherent demodulation depends only on the error in the carrier phase estimate. In Figure 12 the r.m.s. carrier phase error is shown versus the parameter $E/N_0$. The figure shows that a good phase estimate can be obtained, in particular for $E/N_0 > 6$ dB.

Figure 12. Carrier phase error as a function of $E/N_0$

The performance degradation due to a finite-precision implementation of the joint carrier and clock recovery circuit must also be taken into account. The finite-precision design of the MAP carrier and clock recovery circuit with integrated ML DEMOD can be carried out by following the procedure reported in Reference 13. A finite-arithmetic word-length $b_{\rm e}$ of 8 bits can be used to implement the joint carrier and clock recovery circuit and the integrated ML demodulator with an overall degradation, at a specified bit error rate (i.e. $10^{-9}$), of less than or equal to 0.05 dB; in practice a degradation of 0.029 dB is introduced.

# 4. SYSTEM DESIGN AND PERFORMANCE

In this section, the design of an MCD is presented for different DEMUX and DEMOD implementation techniques. As the number of channels processed by the DEMUX $(N_c)$ influences the input sampling frequency, the processing rate and the complexity of the first stage of the MCD are also affected. In particular, a feasible constraint is to require that the input A/D converter sampling frequency (clock) should be close to its maximum possible value. Starting from these considerations, as the design goal we have selected $N_c = 8$ and $N_c = 10$ at $R = 2048$ kb/s.

A further design is presented for the analytic signal approach, in which it is possible to vary the transmission rate $R$ in a suitable range. The number of channels $N_c$ should therefore be such that the variation in transmission rate has the least possible impact on the overall MCD structure. In particular, a feasible constraint is to require the input A/D converter sampling frequency to be held constant at its maximum possible value. Starting from these considerations, as a design goal we have selected $N_c = 3i$ at $R = 4368/i$ kb/s, with $i = 4, 8, 16, 32$.

For the analytic signal approach the high-rate and low-rate low-pass prototypes have been designed as FIR linear-phase filters by using the equi-ripple method.<sup>19</sup> The low-rate low-pass prototype has been designed to include the required pulse-shaping function with a 40 per cent roll-off factor equally shared between the transmitter and the receiver. The implementation complexity in terms of multiplications per second per channel is given in Table I for various values of $R$ and $N_c$.

Table I. Implementation complexity for the analytic signal method: R, transmission rate;  $L_{\rm H}$ , high-rate, low-pass filter coefficient number;  $L_{\rm G}$ , low-rate, low-pass filter coefficient number;  $M_{\rm AS}$ , overall number of multiplications per channel per second

| $N_{\rm c}$ | $R$ (kb/s) | $L_{\rm H}$ | $L_{\rm G}$ | $M_{\rm AS}$ ($\times R$) |
|-------------|-------------|---------|------------------|-------------------|
| 8           | 2048        | 35      | 27               | 66.75             |
| 10          | 2048        | 44      | 27               | 73.5              |
| 12          | 1092        | 38      | 27               | 69                |
| 24          | 546         | 79      | 27               | 99.75             |
| 48          | 273         | 167     | 27               | 165.75            |
| 96          | 137         | 351     | 27               | 303.75            |
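The variable-rate rows of Table I are chosen so that the A/D sampling frequency can be held constant. A rough consistency check, sketched below, is that the aggregate rate $N_c R$ stays essentially the same from row to row. Treating $N_c R$ as a proxy for the required sampling frequency is an assumption made for this illustration, and the variable names are hypothetical.

```python
# (N_c, R in kb/s) pairs copied from Table I
table_i = [(8, 2048), (10, 2048), (12, 1092), (24, 546), (48, 273), (96, 137)]


def aggregate_rate_kbps(n_channels, rate_kbps):
    """Total bit rate handled by one MCD, in kb/s."""
    return n_channels * rate_kbps


# Variable-rate designs are the rows with N_c = 3i, i = 4, 8, 16, 32.
variable_rate_rows = [(n, r) for n, r in table_i if n % 12 == 0]
aggregates = [aggregate_rate_kbps(n, r) for n, r in variable_rate_rows]

# Relative spread of the aggregate rate across the variable-rate rows.
spread = (max(aggregates) - min(aggregates)) / min(aggregates)
```

The small residual spread comes from the rounding of the lowest rate to 137 kb/s in the table.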

The implementation of a digital signal processing system requires finite arithmetic. Although it is possible to conceive and actually implement floating-point arithmetic for digital signal processing systems, a fixed-point arithmetic implementation will remain the more convenient solution for the near term. The filtering specifications and the DEMUX finite-precision design have therefore been chosen such that at each demultiplexer output there is an acceptable degradation with respect to the input signal-to-noise ratio SNR<sub>i</sub>. The finite-arithmetic word-lengths are given in Table II. In Figures 13 and 14 the degradations of the output signal-to-noise ratio introduced by the digital implementation of the DEMUX are shown as a function of $E/N_0$. There is good agreement between the results derived by theoretical analysis and those obtained in computer simulations.
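As a rough illustration of how a word-length translates into a signal-to-noise degradation, the sketch below uses the textbook model of a uniform quantizer whose error is uniform, with variance $\Delta^2/12$, and uncorrelated with the input. This is not the design procedure of Reference 13; the function name, the full-scale range and the channel noise power used in the example are assumptions.

```python
import math


def quantization_loss_db(bits, full_scale, noise_power):
    """SNR degradation when quantization noise adds to existing channel noise.

    bits: word-length of a uniform quantizer covering [-full_scale, +full_scale]
    noise_power: channel noise power already present at the quantizer input
    """
    step = 2.0 * full_scale / (2 ** bits)   # quantization step (Delta)
    q_noise = step ** 2 / 12.0              # variance of uniform quantization error
    return 10.0 * math.log10(1.0 + q_noise / noise_power)


# Assumed operating point: unit full scale, channel noise power 0.01.
loss_8_bits = quantization_loss_db(8, 1.0, 0.01)
loss_6_bits = quantization_loss_db(6, 1.0, 0.01)
```

Under this model each additional bit reduces the quantization noise power by a factor of four, so the loss falls quickly as the word-length grows.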

Table II. Finite-arithmetic DEMUX design for the analytic signal method: $b_{\rm q}$, input signal word-length; $b_{\rm c}$, filter coefficient word-length; $b_{\rm m}$, filter arithmetic word-length; $b_{\rm a}$, filter output word-length

|             |            | Input signal quantization | Filters $H(fT_{\rm u})$ |             |             | Filters $G(fT_{\rm d})$ |             |             |
|-------------|------------|---------------------------|-------------------------|-------------|-------------|-------------------------|-------------|-------------|
| $N_{\rm c}$ | $R$ (kb/s) | $b_{\rm q}$               | $b_{\rm c}$             | $b_{\rm m}$ | $b_{\rm a}$ | $b_{\rm c}$             | $b_{\rm m}$ | $b_{\rm a}$ |
| 8           | 2048       | 8                         | 12                      | 11          | 8           | 11                      | 11          | 8           |
| 10          | 2048       | 8                         | 12                      | 11          | 8           | 11                      | 11          | 8           |
| 12          | 1092       | 8                         | 10                      | 14          | 8           | 8                       | 14          | 8           |
| 24          | 546        | 8                         | 10                      | 14          | 8           | 9                       | 14          | 8           |
| 48          | 273        | 8                         | 10                      | 14          | 8           | 11                      | 14          | 8           |
| 96          | 137        | 8                         | 10                      | 14          | 8           | 11                      | 14          | 8           |

Figure 13. Performance degradation due to a finite-precision implementation of DEMUX ($N_c = 8$, $R = 2048$ kb/s)

Figure 14. Performance degradation due to a finite-precision implementation of DEMUX ($N_c = 10$, $R = 2048$ kb/s)

The design of the DEMUX in the polyphase approach is now presented. As for the multistage approach, only $N_{\rm c}$ values that are an integer power of 2 can be used. The low-pass digital filter prototype has been designed as a FIR linear-phase filter. The required pulse-shaping function is included in this filter in order to reduce the overall implementation complexity. In Table III the overall DEMUX complexity in terms of multiplications per second per channel $(M_{\rm FFT})$ is given for various values of $N_{\rm c}$ and $R$. The finite-precision design follows the scheme described in detail in Reference 13. It can be shown that in the complexity evaluation of this DEMUX approach the control circuitry for the FFT processor must also be taken into account. From experience, we estimate the amount of FFT processor control circuitry to be about 1.5 to 2 times the overall number of multiplications per second. This added control complexity has not been included in Table III. In Table IV the finite-arithmetic word-lengths derived to introduce a degradation in the output signal-to-noise ratio of less than 0.05 dB at $E/N_0 = 8$ dB are reported for $N_{\rm c} = 8$ and $R = 2048$ kb/s.

Table III. Implementation complexity for the fast Fourier transform with polyphase network approach: $R$, transmission rate; $L_{\rm FT}$, low-pass prototype coefficients including pulse shaping; $M_{\rm FFT}$, overall number of multiplications per channel per second

| $N_{\rm c}$ | $R$ (kb/s) | $L_{\rm FT}$ | $M_{\rm FFT}$ ($\times R$) |
|-------------|------------|--------------|----------------------------|
| 8           | 2048       | 224          | 60                         |
| 32          | 137        | 896          | 72                         |

Table IV. Finite-arithmetic demultiplexer design (FFT with polyphase network, $N_{\rm c} = 8$, $R = 2048$ kb/s): $b_{\rm q}$, input signal word-length; $b_{\rm c}$, filter coefficient word-length; $b_{\rm m}$, filter arithmetic word-length; $b_{\rm a}$, filter output word-length; $b_{\rm Fc}$, FFT coefficients word-length; $b_{\rm Fm}$, FFT arithmetic word-length; $b_{\rm Fa}$, FFT output word-length

| Input signal quantization | Polyphase filters |             |             | FFT processor |              |              |
|---------------------------|-------------------|-------------|-------------|---------------|--------------|--------------|
| $b_{\rm q}$               | $b_{\rm c}$       | $b_{\rm m}$ | $b_{\rm a}$ | $b_{\rm Fc}$  | $b_{\rm Fm}$ | $b_{\rm Fa}$ |
| 8                         | 12                | 13          | 10          | 11            | 13           | 8            |

Table V gives the overall implementation complexity with the multistage demultiplexer approach for $N_{\rm c} = 8$ and $R = 2048$ kb/s. In this case the required pulse-shaping function is performed in the DEMUX. The finite-precision design is such as to make losses acceptable. The relevant finite-precision word-lengths are given in Table VI. In Table VII the corresponding actual degradations are given for different values of $E/N_0$.

Table V. Implementation complexity for the multistage approach

| $N_{\rm c}$ | $R$ (kb/s) | $M_{\rm MS}$ ($\times R$) |
|-------------|------------|---------------------------|
| 8           | 2048       | 74.25                     |

Table VI. Finite-arithmetic demultiplexer design (multistage approach):  $b_{\rm q}$ , input signal word-length;  $b_{\rm c}$ , filter coefficient word-length;  $b_{\rm m}$ , filter arithmetic word-length;  $b_{\rm a}$ , filter output word-length

|             |            | Input signal quantization | Half-band filters |             |             | Pulse-shaping filters |             |             |
|-------------|------------|---------------------------|-------------------|-------------|-------------|-----------------------|-------------|-------------|
| $N_{\rm c}$ | $R$ (kb/s) | $b_{\rm q}$               | $b_{\rm c}$       | $b_{\rm m}$ | $b_{\rm a}$ | $b_{\rm c}$           | $b_{\rm m}$ | $b_{\rm a}$ |
| 8           | 2048       | 8                         | 12                | 14          | 8           | 12                    | 14          | 8           |

Table VII. DEMUX loss for  $N_c = 8$  and R = 2048 kb/s (multistage approach)

| $E/N_0$ (dB) | DEMUX loss (dB) |
|--------------|-----------------|
| 8            | 0.05            |
| 10           | 0.09            |
| 12           | 0.16            |
| 14           | 0.22            |

The design of the digital DEMOD is now discussed. The resulting implementation complexity is independent of the number of channels processed by a single MCD; from equations (17), (19) and (26) it is equal to 19R multiplications per channel per second with  $N_{\rm E}$ =33 for the first approach, and 36R multiplications per channel per second for the joint carrier and clock recovery method with L=2 and M=3. The degradation of the demodulator (DEMOD loss) is caused by the phase jitter introduced by the carrier phase estimate and by a symbol timing offset introduced by the symbol timing estimate.

In order to evaluate the DEMOD loss, the phase jitter and symbol timing offset have been assumed to be independent noise contributions. The loss (in dB) due to phase jitter can be shown to be<sup>20</sup>

$$4.34 \frac{1+2\Gamma}{\beta} \left( 1 + \frac{1+2\Gamma}{2\beta} \right) \tag{27}$$

where $\Gamma$ is equal to $E/N_0$, and $\beta$ (for moderate to high signal-to-noise ratios) can be assumed to be equal to $1/\sigma_{\theta}^2$, with $\sigma_{\theta}$ the r.m.s. value of the phase error. Simulations have shown that, for the non-linear estimation method, in the steady state $\sigma_{\theta} = 5^{\circ}$. The degradation is shown in Figure 15 as a function of $E/N_0$ with a finite-precision implementation at 6 bits. It can be noted that an acceptable loss (< 0.25 dB) can be achieved. The degradation due to the symbol timing offset has been derived by computer simulations. The overall DEMOD losses, also including the effects of a finite-precision implementation, for the two different approaches are 0.33 and 0.03 dB at $E/N_0 = 8$ dB.

Figure 15. Degradation due to the carrier recovery circuit (finite-precision implementation at 6 bits)

From the previous results the analytic signal approach seems to be the most convenient for implementing a digital DEMUX. This choice has been made on the basis of the following criteria: implementation complexity; possible integration of some demultiplexer and demodulator functions; low sensitivity to a finite-precision implementation; and low implementation degradations.

The analytic signal method has a further advantage over the other approaches: its suitability for the baseline and enhanced modes of operation, i.e. variable bit rate or number of channels processed by a single MCD.

For the DEMOD implementation the most suitable approach seems to be to use the non-linear carrier phase estimate method with a timing error detector. A lower implementation complexity is achieved, together with good tracking performance and lower sensitivity to a finite-precision implementation. Further, it is possible to integrate some DEMOD functions in the low-rate stage of the DEMUX, for example the separation of the in-phase and quadrature components of the received QPSK signal. An implementation block diagram of the proposed MCD system is shown in Figure 16. The overall MCD losses are given in Table VIII for different values of $E/N_0$, with $N_c = 8$ and $R = 2048$ kb/s.

The MCD structure shown in Figure 16 represents an attractive solution for processing systems which interface FDMA and TDM links directly. It achieves a good performance with an acceptable on-board system implementation complexity and allows for a high degree of system flexibility.

# 5. CONCLUDING REMARKS ON MCD DESIGN

The design of a completely digital multicarrier demodulator (MCD) for use at different data rates has been presented.

The value of the channel spacing $W$ that minimizes the overall implementation complexity and guarantees an integer number of samples per symbol has been determined: this value also achieves good spectrum utilization. The number of carriers to be processed by a single MCD influences the sampling frequency and the complexity of the MCD. This number has been chosen with state-of-the-art technology in mind.

Table VIII. Overall MCD loss for $N_c = 8$ and $R = 2048$ kb/s

| $E/N_0$ (dB) | DEMUX loss (dB) | DEMOD loss (dB) | Overall loss (dB) |
|--------------|-----------------|-----------------|-------------------|
| 8            | 0.004           | 0.330           | 0.334             |
| 10           | 0.007           | 0.305           | 0.322             |
| 12           | 0.010           | 0.280           | 0.290             |
| 14           | 0.016           | 0.230           | 0.246             |

Figure 16. Implementation structure of the MCD

The digital demultiplexer (DEMUX) is a signal processor for translating signals from the FDMA to the TDM format. Three different DEMUX designs have been considered: the analytic signal method, the polyphase network technique and the multistage approach.

The analytic signal method leads to a per-channel, highly modular structure with a high degree of flexibility: unlike the other methods, the bandwidth assigned to each channel may be varied as required simply by switching to a suitable new set of filter coefficients and decimation factor. This approach is suitable when it is desired to vary the transmission bit rate and channel bandwidth.

The digital filters required for the DEMUX implementation are linear phase FIR filters; this choice introduces no group delay distortion.

Furthermore, integration of the demultiplexer and demodulator functions is conceivable. In particular, the separation of the in-phase and quadrature components of the received QPSK signal as well as the required pulse-shaping function can be performed by the low-data-rate stage of the demultiplexer.

The digital coherent demodulator has three main parts: the carrier recovery circuit, the clock recovery circuit and the data decision circuit. Two different implementations of the carrier recovery circuit have been considered: a MAP joint carrier and clock estimation method, and a non-linear carrier phase estimation technique together with a symbol timing error detector. The non-linear estimation method has been found to be accurate and less sensitive to a finite-arithmetic implementation, and to require a short, definite acquisition time. This carrier phase estimation approach can be applied to continuous- as well as burst-mode carriers.

The clock recovery is performed by the QPSK direct timing-error detector described in section 3.1. This method has a low implementation complexity and exploits the integration of the pulse-shaping function and separation of the in-phase and quadrature components of the received QPSK signal. Another advantage of this approach is that it works whatever the carrier phase offset, and so the acquisition time is not influenced by the carrier phase error. In particular, it requires only a short acquisition time and so can be applied to continuous and burst-mode carriers.

The digital MCD system represents an appropriate solution for advanced digital communication systems directly interfacing FDMA and TDM links on board the satellite. In particular, it is suitable for implementation by custom or semicustom VLSI digital circuits.

# ACKNOWLEDGEMENT

This work has been partly supported by the European Space Agency under the ESTEC Contract 6096/84/GM(SC).

# REFERENCES

- 1. G. Benelli, E. Del Re, R. Fantacci and F. Mandelli, 'Performance of uplink random-access and downlink TDMA techniques for packet satellite networks', Proc. IEEE, 72, 1583-1593 (1984).
- 2. R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1983.
- 3. M. Bellanger, Digital Processing of Signals: Theory and Practice, Wiley, Chichester, 1984.
- 4. E. Del Re, F. Ronconi, P. Salvi and P. Semenzato, 'Comparison of non-FFT methods of TDM-FDM transmultiplexing', Alta Freq., 51, 9-21 (1982).
- 5. E. Del Re and P. L. Emiliani, 'An analytic signal approach for transmultiplexers: Theory and design', IEEE Trans. Commun., COM-30, 1623-1628 (1982).
- 6. M. G. Bellanger and J. L. Daguet, 'TDM-FDM transmultiplexer: Digital polyphase and FFT', IEEE Trans. Commun., COM-22, 1199-1205 (1974).
- 7. E. S. Yam and M. D. Redman, 'Development of a 60 channel FDM-TDM transmultiplexer', COMSAT Tech. Rev., 13, 1-55 (1983).
- 8. H. Scheuermann and H. Göckler, 'A comprehensive survey of digital transmultiplexing methods', Proc. IEEE, 69, 1419-1450 (1981).
- 9. V. K. Bhargava, D. Haccoun, R. Matyas and P. P. Nuspl, Digital Communication by Satellite, Wiley, New York, 1984.
- 10. A. J. Viterbi and A. M. Viterbi, 'Non-linear estimator of PSK modulated carrier phase with application to burst digital transmission', IEEE Trans. Inf. Theory, IT-29, 543-551 (1983).
- 11. F. M. Gardner, 'A BPSK/QPSK timing-error detector for sampled receivers', IEEE Trans. Commun., COM-34, 423-429 (1986).
- 12. E. Del Re and R. Fantacci, 'A digital communication receiver with integrated MAP synchronization and ML demodulation', Electron. Lett., 21, 862-864 (1985).
- 13. E. Del Re, R. Fantacci, et al., 'Multicarrier demodulator design', ESTEC Contract 6096/84/GM(SC), Final Report, 1986.
- 14. O. Agazzi, C. P. J. Tzeng, D. G. Messerschmitt and D. A. Hodges, 'Timing recovery in digital subscriber loops', IEEE Trans. Commun., COM-33, 558-569 (1985).
- 15. W. C. Lindsey and M. K. Simon, Telecommunication Systems Engineering, Prentice-Hall, Englewood Cliffs, NJ, 1973.
- 16. F. M. Gardner, 'On-board processing for mobile-satellite communications', ESTEC Contract 5889/84/NL/GM, Final Report, 1985.
- 17. H. L. Van Trees, Detection, Estimation and Modulation Theory, Wiley, New York, 1968.
- 18. R. W. Booth, 'An illustration of the MAP estimation method for deriving closed loop phase tracking topologies: The MSK signal structure', IEEE Trans. Commun., COM-28, 1137-1142 (1980).
- 19. ASSP Digital Signal Processing Committee, Programs for Digital Signal Processing, IEEE Press, New York, 1979.
- 20. R. Matyas, 'Effect of noisy phase references on coherent detection of FFSK signals', IEEE Trans. Commun., COM-26, 807-815 (1978).