An MPSoC-Based QAM Modulation Architecture with Run-Time Load-Balancing
© Christos Ttofis et al. 2011
Received: 28 July 2010
Accepted: 15 January 2011
Published: 23 January 2011
QAM is a widely used multilevel modulation technique, with a variety of applications in data radio communication systems. Most existing implementations of QAM-based systems use high levels of modulation in order to meet the high data rate constraints of emerging applications. This work presents the architecture of a highly parallel QAM modulator, using MPSoC-based design flow and design methodology, which offers multirate modulation. The proposed MPSoC architecture is modular and provides dynamic reconfiguration of the QAM utilizing on-chip interconnection networks, offering high data rates (more than 1 Gbps), even at low modulation levels (16-QAM). Furthermore, the proposed QAM implementation integrates a hardware-based resource allocation algorithm that can provide better throughput and fault tolerance, depending on the on-chip interconnection network congestion and run-time faults. Preliminary results from this work have been published in the Proceedings of the 18th IEEE/IFIP International Conference on VLSI and System-on-Chip (VLSI-SoC 2010). The current version of the work includes a detailed description of the proposed system architecture, extends the results significantly using more test cases, and investigates the impact of various design parameters. Furthermore, this work investigates the use of the hardware resource allocation algorithm as a graceful degradation mechanism, providing simulation results about the performance of the QAM in the presence of faulty components.
Quadrature Amplitude Modulation (QAM) is a popular modulation scheme, widely used in various communication protocols such as Wi-Fi and Digital Video Broadcasting (DVB). The architecture of a digital QAM modulator/demodulator is typically constrained by several, often conflicting, requirements, such as demanding throughput, high immunity to noise, flexibility across communication standards, and low on-chip power. The majority of existing QAM implementations follow a sequential approach and rely on high modulation levels to meet the emerging high data rate constraints [1–5]. At a given transmission power, however, high modulation levels are more vulnerable to noise, which reduces the reliable communication distance. This problem has been addressed by increasing the number of modulators in a system through emerging Software-Defined Radio (SDR) systems, which are mapped on MPSoCs in an effort to boost parallelism [6, 7]. These works, however, treat the QAM modulator as an individual system task, whereas the modulator itself can be further optimized and parallelized to achieve high data rates, even at low modulation levels.
Designing the QAM modulator in a parallel manner can be beneficial in many ways. Firstly, the resulting parallel modulated streams can be combined at the output, yielding a system in which most of the logic runs at lower clock frequencies while still delivering high throughput, even at low modulation levels. This is particularly important because lower modulation levels are less susceptible to multipath distortion, are more power-efficient, and achieve a lower bit error rate (BER) [1, 8]. Furthermore, a parallel modulation architecture can benefit multiple-input multiple-output (MIMO) communication systems, where information is sent and received over two or more antennas, often shared among many users [9, 10]. Using multiple antennas at both the transmitter and the receiver offers a significant capacity enhancement in many modern applications, including IEEE 802.11n, 3GPP LTE, and mobile WiMAX systems, providing increased throughput at the same channel bandwidth and transmit power [9, 10]. Realizing the benefits of MIMO systems requires corresponding design considerations in the modulation and demodulation architectures: primarily, transmitter architectures with multiple output ports and the more complicated receiver architectures with multiple input ports. The demodulation architecture, however, is beyond the scope of this work and is left as future work.
This work presents an MPSoC implementation of the QAM modulator that provides a modular and reconfigurable architecture to facilitate the integration of the different processing units involved in QAM modulation. The work investigates how the performance of a sequential QAM modulator can be improved by exploiting parallelism in two forms: first, by developing a simple, pipelined version of the conventional QAM modulator, and second, by using design methodologies employed in present-day MPSoCs to map multiple QAM modulators on an underlying MPSoC interconnected via a packet-based network-on-chip (NoC). Furthermore, this work presents a hardware-based resource allocation algorithm that enables the system to gain further performance through dynamic load balancing. The resource allocation algorithm can also act as a graceful degradation mechanism, limiting the influence of run-time faults on the average system throughput. Additionally, the proposed MPSoC-based system can support variable data rates and protocols simultaneously, taking advantage of resource sharing mechanisms. The proposed system architecture was simulated using a high-level simulator and implemented and evaluated on an FPGA platform. Moreover, although this work currently targets QAM-based modulation scenarios, the methodology and reconfiguration mechanisms can target QAM-based demodulation scenarios as well; the design and implementation of an MPSoC-based demodulator is left as future work.
While an MPSoC implementation of the QAM modulator is beneficial in terms of throughput, the on-chip network introduces overheads. The MPSoC-based modulator was therefore compared to a straightforward implementation featuring multiple QAM modulators, in an effort to identify the conditions that favor the MPSoC implementation. The comparison was carried out under varying incoming rates, system configurations, and fault conditions. Simulation results showed, on average, double the throughput during normal operation and ~25% less throughput degradation in the presence of faulty components, at the cost of approximately 35% more area, as obtained from FPGA implementation and synthesis results. The hardware overheads, which stem from the NoC and the resource allocation algorithm, are well within the typical values for NoC-based systems [11, 12] and are adequately balanced by the high throughput rates obtained.
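As a rough, back-of-the-envelope reading of these averages (the throughput figure comes from simulation and the area figure from FPGA synthesis, so the two are only loosely comparable), the throughput obtained per unit of area is:

```latex
\frac{\text{throughput}_{\text{MPSoC}} / \text{throughput}_{\text{multi-pipeline}}}
     {\text{area}_{\text{MPSoC}} / \text{area}_{\text{multi-pipeline}}}
\approx \frac{2}{1.35} \approx 1.48
```

That is, under these approximate figures, the MPSoC-based modulator delivers roughly 48% more throughput per unit of area.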
The rest of this paper is organized as follows. Section 2 briefly presents conventional QAM modulation and discusses previous related work. Section 3 presents the proposed QAM modulator system and the hardware-based allocation algorithm. Section 4 provides experimental results in terms of throughput and hardware requirements, and Section 5 concludes the paper.
2. Background-Related Work
2.1. QAM Modulator Background
The phase accumulator addresses the sine/cosine LUTs, which convert phase information into values of the sine/cosine wave (amplitude information); together, the phase accumulator and the LUTs form the numerically controlled oscillator (NCO). The I and Q words are first filtered by FIR filters and then multiplied by the cosine and sine NCO outputs, respectively. Typically, Raised Cosine (RC) or Root-Raised Cosine (RRC) filters are used. Filtering is necessary to counter problems such as Inter-Symbol Interference (ISI), and to pulse-shape the rectangular I and Q pulses into sinc-like pulses, which occupy a narrower channel bandwidth.
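A minimal behavioral sketch of this datapath follows. It assumes ideal floating-point LUTs, a 10-bit phase accumulator that wraps modulo 1024 and indexes the LUT directly, the convention s = I·cos - Q·sin, and I/Q words that have already passed through the FIR filters (filtering is omitted). The names `nco` and `qam_modulate` are hypothetical, and the phase increment M is passed as `phase_increment`.

```python
import math
from typing import Iterator, List, Sequence, Tuple

LUT_SIZE = 1024  # number of LUT entries (constant in the proposed system)

# One carrier period of cosine/sine sampled at LUT_SIZE phase points.
COS_LUT = [math.cos(2 * math.pi * k / LUT_SIZE) for k in range(LUT_SIZE)]
SIN_LUT = [math.sin(2 * math.pi * k / LUT_SIZE) for k in range(LUT_SIZE)]


def nco(phase_increment: int, num_samples: int,
        phase: int = 0) -> Iterator[Tuple[float, float]]:
    """Phase accumulator driving the LUTs; yields (cos, sin) carrier samples.

    Assumes the accumulator wraps modulo LUT_SIZE and that its value is
    used directly as the LUT address.
    """
    for _ in range(num_samples):
        yield COS_LUT[phase], SIN_LUT[phase]
        phase = (phase + phase_increment) % LUT_SIZE


def qam_modulate(i_words: Sequence[float], q_words: Sequence[float],
                 phase_increment: int) -> List[float]:
    """Multiply the (already filtered) I/Q words by the NCO outputs and sum."""
    carrier = nco(phase_increment, len(i_words))
    return [i * c - q * s for i, q, (c, s) in zip(i_words, q_words, carrier)]
```

For example, with M = 256 the carrier period spans 1024/256 = 4 samples, so a constant input I = 1, Q = 0 produces the samples 1, 0, -1, 0 (up to floating-point rounding).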
2.2. Related Work
Most of the existing hardware implementations involving QAM modulation/demodulation follow a sequential approach and simply consider the QAM as an individual module. There has been limited design exploration, and most works allow limited reconfiguration, offering inadequate data rates at low modulation levels [2–5]. The latter issue has been addressed through emerging SDR implementations mapped on MPSoCs, which also treat QAM modulation as an individual system task integrated into the system, rather than focusing on optimizing the performance of the modulator itself [6, 7]. The works in [2, 3] use a specific modulation type; they can, however, be extended to higher modulation levels in order to increase the resulting data rate. Higher modulation levels, though, involve finer divisions of both amplitude and phase and can introduce decoding errors at the receiver: for a given transmission power level, the symbols are very close together, and one amplitude level may be confused, due to noise, with an adjacent level, distorting the received signal. To avoid this, wide margins are necessary, which can be obtained by increasing the available amplitude range through power amplification of the RF signal at the transmitter (effectively spreading the symbols further apart); otherwise, data bits may be decoded incorrectly at the receiver, increasing the bit error rate (BER) [1, 8]. However, increasing the amplitude range drives the RF amplifiers well into their nonlinear (compression) region, causing distortion. Alternative QAM implementations avoid multipliers and sine/cosine memories by using the CORDIC algorithm [4, 5]; however, they still follow a sequential approach.
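The shrinking margin can be quantified for square M-QAM: after normalizing the constellation to unit average symbol energy, the minimum distance between neighboring symbols is sqrt(6/(M-1)), a standard result. The brute-force sketch below checks this numerically; it assumes a square constellation on an odd-integer grid (so it does not cover non-square formats such as the 32-QAM of [3]), and the function names are illustrative.

```python
import itertools
import math


def square_qam_points(m: int):
    """Square M-QAM constellation on the odd-integer grid (+-1, +-3, ...)."""
    side = math.isqrt(m)
    if side * side != m:
        raise ValueError("square QAM requires M to be a perfect square")
    levels = [2 * k - (side - 1) for k in range(side)]
    return [complex(x, y) for x in levels for y in levels]


def normalized_min_distance(m: int) -> float:
    """Minimum distance between symbols after scaling to unit average energy."""
    pts = square_qam_points(m)
    avg_energy = sum(abs(p) ** 2 for p in pts) / len(pts)
    d_min = min(abs(a - b) for a, b in itertools.combinations(pts, 2))
    return d_min / math.sqrt(avg_energy)
```

Going from 16-QAM to 256-QAM shrinks the normalized minimum distance from about 0.63 to about 0.15, roughly a factor of four, which is why higher modulation levels need more transmit power to keep the same error margin.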
Software-based solutions design SDR systems mapped on general-purpose processors and/or digital signal processors (DSPs), where the QAM modulator is usually considered a system task to be scheduled on an available processing unit. The works in [6, 7] utilize the MPSoC design methodology to implement SDR systems, treating the modulator as an individual system task. Results in  show that the problem with this approach is that several competing tasks running in parallel with the QAM may hurt the performance of the modulation, making this approach inadequate for demanding wireless communications in terms of throughput and energy efficiency. Another issue, raised in , is the efficiency of the allocation algorithm: the allocation algorithm is implemented on a processor, which makes allocation slow. Moreover, the policies used to allocate tasks to processors (random allocation and distance-based allocation) may lead to on-chip contention and unbalanced loads at each processor, since the utilization of each processor is not taken into account. In , a hardware unit called CoreManager is used for run-time task scheduling, with the aim of speeding up the allocation algorithm. The conclusions stemming from  motivate moving more tasks, such as reconfiguration and resource allocation, into hardware rather than running them in software on dedicated CPUs, in an effort to reduce power consumption and improve the flexibility of the system.
This work presents a reconfigurable QAM modulator using MPSoC design methodologies and an on-chip network, with an integrated hardware resource allocation mechanism for dynamic reconfiguration. The allocation algorithm takes into consideration not only the distance between partitioned blocks (hop count) but also the utilization of each block, so that the proposed MPSoC-based QAM modulator can achieve robust performance under different incoming data stream rates and different modulation levels. Moreover, the allocation algorithm inherently acts as a graceful degradation mechanism, limiting the influence of run-time faults on the average system throughput.
3. Proposed System Architecture
3.1. Pipelined QAM Modulator
In the proposed QAM modulation system, the LUTs have a constant number of 1024 entries, while the value of the phase increment M can vary during operation, as shown in Figure 2. The maximum number of pipeline stages is determined by the overall hardware budget. In this work, we used 16 pipeline stages; hence, since 1024/16 = 64, the value of M can be greater than or equal to 64.
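The constraint on M can be related to the carrier frequency with the usual NCO relations. This sketch assumes a 10-bit phase accumulator whose value directly addresses the 1024-entry LUT and is updated once per sample period 1/f_s; the symbols f_s (sample rate) and f_out (carrier frequency) are introduced here for illustration only.

```latex
f_{\text{out}} = \frac{M}{1024}\, f_s, \qquad
\frac{f_s}{f_{\text{out}}} = \frac{1024}{M}, \qquad
M \ge \frac{1024}{16} = 64 \;\Longrightarrow\; \frac{f_s}{f_{\text{out}}} \le 16
```

Under these assumptions, M ≥ 64 corresponds to at most 16 LUT samples per carrier period, the same number as the pipeline stages; this correspondence is our reading of the constraint, not a statement from the design description.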
3.2. MPSoC-Based QAM Modulator
Next, we used MPSoC design methodologies to map the QAM modulator onto an MPSoC architecture, which uses an on-chip, packet-based NoC. This allows a modular, "plug-and-play" approach that permits the integration of heterogeneous processing elements, in an attempt to create a reconfigurable QAM modulator. By partitioning the QAM modulator into different stand-alone tasks mapped on Processing Elements (PEs), we construct a set of stand-alone basic components necessary for QAM modulation. This set includes a Stream-IN PE, a Symbol Mapper PE, an FIR PE, and a QAM PE. Multiple instances of these components can then be used to build a variety of highly parallel and flexible QAM modulation architectures.
The Stream-IN PEs receive input data from the I/O ports and dispatch data to the Symbol Mapper PEs. The NIs of the Stream-IN PEs assemble the input data streams into packets, which also contain the modulation level n and the phase increment M, given as input parameters. By utilizing multiple Stream-IN PEs, the proposed architecture allows multiple transmitters to send data at different data rates and carrier frequencies. The packets are then sent to one of the possible Symbol Mapper PEs, to be split into symbols of I and Q words. The Symbol Mapper PEs are designed to support modulation levels of 16, 64, 256, 1024, and 4096. I and Q words are then created and packetized in the Symbol Mapper NIs and transmitted to the corresponding FIR PEs, where they are pulse shaped. The proposed work implements different forms of FIR filters, such as transpose filters, polyphase filters, and filters with oversampling. The filtered data is then sent to the QAM PEs (pipelined versions). Finally, the modulated data from each QAM PE is sent to a D/A converter before driving an RF antenna.
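A minimal sketch of the Symbol Mapper's bit-splitting step is given below. It assumes MSB-first ordering, a bit buffer that carries leftover bits into the next 32-bit word (needed for 64, 1024, and 4096 levels, whose symbol sizes do not divide 32), and a plain binary square mapping in which the upper half of each symbol selects the I level and the lower half the Q level. Practical mappers often use Gray coding; the mapping and the name `map_symbols` are assumptions, not the paper's implementation.

```python
from typing import Iterable, List, Tuple

SUPPORTED_LEVELS = (16, 64, 256, 1024, 4096)


def map_symbols(words: Iterable[int], n: int,
                word_bits: int = 32) -> List[Tuple[int, int]]:
    """Split input words into log2(n)-bit symbols and map each to (I, Q)."""
    if n not in SUPPORTED_LEVELS:
        raise ValueError(f"unsupported modulation level: {n}")
    bits_per_symbol = n.bit_length() - 1   # log2(n) for powers of two
    half = bits_per_symbol // 2            # bits used for I and for Q
    side = 1 << half                       # levels per axis, sqrt(n)
    buffer, buffered_bits = 0, 0
    symbols = []
    for word in words:
        buffer = (buffer << word_bits) | (word & ((1 << word_bits) - 1))
        buffered_bits += word_bits
        while buffered_bits >= bits_per_symbol:
            buffered_bits -= bits_per_symbol
            sym = (buffer >> buffered_bits) & ((1 << bits_per_symbol) - 1)
            i_idx, q_idx = sym >> half, sym & (side - 1)
            # Map indices 0..side-1 to odd levels -(side-1)..(side-1).
            symbols.append((2 * i_idx - (side - 1), 2 * q_idx - (side - 1)))
        buffer &= (1 << buffered_bits) - 1  # keep only the unconsumed bits
    return symbols
```

With one 32-bit word, 16-QAM yields 8 symbols and 256-QAM yields 4, the same symbol counts used in the congestion example of Section 3.3.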
The proposed modulator can be used in multiple-input multiple-output (MIMO) communication systems, where the receiver needs to rearrange the data in the correct order. Such a scenario involves multiple RF antennas at the output (used in various broadcasting schemes [9, 10]) and multiple RF antennas at the input (receiver). MIMO systems and data rearrangement are beyond the scope of this paper; we refer interested readers to [9, 10]. Alternatively, the resulting parallel streams can be combined at the output, resulting in a system in which most of the logic runs at lower clock frequencies while achieving high throughput.
Under uniform input streams (i.e., all inputs receive the same data rate), each source PE has a predetermined destination PE with which it communicates, and the system functions as multiple pipelined QAM modulators. In the likely case, however, that the incoming data stream rate at one (or more) input ports is much higher than at the other input ports, the MPSoC-based modulator can exploit resource allocation, a technique enabled by the on-chip network, to divert data streams to less active PEs and improve the overall throughput of the system. A source PE can select its destination from a set of alternative PEs that are identical in operation, rather than always communicating with its predetermined destination PE. This is facilitated by a dynamic allocation algorithm called Network Interface Resource Allocation (NIRA), integrated inside the NI of each PE, which is a contribution of this paper. The NIRA algorithm chooses the next destination PE and is described in the following subsection.
3.3. NIRA Resource Allocation Algorithm
The resource allocation algorithm proposed in this work relies on a market-based control technique. This technique is based on the interaction of local agents, which we call NIRA (Network Interface Resource Allocation) agents, through which a coherent global behavior is achieved. A simple trading mechanism is used between these local agents in order to meet the required global objectives. In our case, the local agents are autonomous, identical hardware units distributed across the NIs of the PEs. The hardware agents exchange minimal data between NIs to dynamically adjust the dataflow between PEs, in an effort to achieve better overall performance through load balancing.
This global, dynamic, and physically distributed resource allocation algorithm ensures low per-hop latency under unloaded network conditions and a manageable growth in latency under loaded network conditions. The agent hardware monitors the PE load conditions and the network hop count between PEs, and the algorithm uses these parameters to dynamically find a route between each possible pair of communicating nodes. The algorithm can be applied to other MPSoC-based architectures that have inherent redundancy due to the presence of several identical components.
The proposed NIRA hardware agents have identical structure and functionality and are distributed among the various PEs, since they are part of every NI as shown in Figure 4. NIRA is instantiated with a list of the addresses of its possible source PEs and stores the list in its Send Unit Register File (SURF). It also stores the hop count distances between its host PE and each of its possible source PEs (i.e., PEs that send QAM data to that particular PE). Since the mapping of PEs and their addresses is known at design time, SURF can be loaded at design time for all the NIRA instances.
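A sketch of how SURF contents could be computed at design time is shown below. It assumes a 2D mesh in which PE positions are (row, column) router coordinates, as in the Figure 3 example, and minimal XY routing, so that the hop count equals the Manhattan distance. The `Surf` class and `build_surfs` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (row, column) position of a PE's router in the mesh


def hop_count(a: Coord, b: Coord) -> int:
    """Router hops between two mesh nodes, assuming minimal XY routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


@dataclass
class Surf:
    """Hypothetical model of a NIRA Send Unit Register File."""
    host: Coord
    entries: List[Tuple[Coord, int]] = field(default_factory=list)  # (source, hops)


def build_surfs(sources: List[Coord], destinations: List[Coord]) -> Dict[Coord, Surf]:
    """Design-time loading: each destination NIRA lists every possible source PE."""
    return {
        dst: Surf(dst, [(src, hop_count(src, dst)) for src in sources])
        for dst in destinations
    }
```

Because the PE mapping is fixed before run-time, this table can be generated once by the design flow and loaded into every NIRA instance.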
While NIRA is executed dynamically at run-time, it is still important to initially map the processing elements of the QAM system onto the MPSoC in a way that satisfies the expected operation of the QAM. This can be done by mapping algorithms such as those proposed in [20, 21]. After the initial placement of PEs in the network, the decision about the destination PE for a source PE is made by the NIRA algorithm. NIRA is particularly useful in cases of network congestion, which is mainly caused by two factors: the incoming data rate at the Stream-IN PEs and the modulation level at the Symbol Mapper PEs.
We next provide an example that illustrates the efficiency of NIRA under a congestion scenario created by using different modulation levels at the Symbol Mapper PEs. Consider the architecture shown in Figure 3 and assume that the Symbol Mapper PE at location (1,1) uses a modulation level of 16, while the remaining Symbol Mapper PEs use a modulation level of 256. When the incoming rate of data at the Stream-IN PEs is constant (assume 32 bits/cycle), congestion can occur at the link between router (0,1) and router (1,1). This is because the Symbol Mapper PE at (1,1) splits each 32-bit input into more symbols (8 four-bit symbols for 16-QAM compared to 4 eight-bit symbols for 256-QAM). In this case, the incoming stream rate at Stream-IN PE (0,1) could be lowered to match the rate at which the data is processed by Symbol Mapper PE (1,1), in order not to lose data. Our solution, however, is not to lower the incoming rate, but to divert data from Stream-IN PE (0,1) to the less active Symbol Mapper PEs (1,0), (1,2), or (1,3). This is made possible by integrating the NIRA allocation algorithm inside the NIs of the PEs. When the NI of Stream-IN PE (0,1) receives the load conditions of all possible destination PEs (Symbol Mapper PEs), the NIRA algorithm is run to decide the next destination Symbol Mapper PE. The algorithm takes into consideration the received load conditions as well as the hop count distances between Stream-IN PE (0,1) and the Symbol Mapper PEs, and solves (6) and (7) to select the next destination PE. In this example, since the rates of Stream-IN PEs (0,0), (0,2), and (0,3) are equal, the utilizations of Symbol Mapper PEs (1,0), (1,2), and (1,3) will be almost equal; therefore, the next Symbol Mapper PE for Stream-IN PE (0,1) will be selected according to the hop count distance. Symbol Mapper PEs (1,0) and (1,2) are more likely to be selected, since they are closer to Stream-IN PE (0,1).
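Equations (6) and (7) are not reproduced in this section, so the following is only a hedged stand-in for the selection step: a weighted score that rewards free input-queue slots and penalizes hop distance, each normalized to [0, 1], with full destinations skipped and the source blocking when every candidate is full. The weights `w_free` and `w_hops` play the role of the NIRA parameters explored in Section 4.3, but which of L and K corresponds to which term is not stated here, so the naming, the normalization, and the formula are all assumptions.

```python
from typing import Dict, Optional, Tuple

Coord = Tuple[int, int]  # (row, column) router coordinate in the mesh


def hop_count(a: Coord, b: Coord) -> int:
    """Hops between routers, assuming minimal XY routing on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nira_select(source: Coord, free_slots: Dict[Coord, int], queue_depth: int,
                w_free: float = 0.5, w_hops: float = 0.5) -> Optional[Coord]:
    """Choose the next destination PE for `source` (hypothetical scoring).

    free_slots maps each candidate destination PE to the number of free
    input-queue slots it last reported; PEs with no free slots are skipped.
    Returns None when every candidate is full (the source must wait).
    """
    candidates = [pe for pe, free in free_slots.items() if free > 0]
    if not candidates:
        return None
    max_hops = max(hop_count(source, pe) for pe in free_slots) or 1

    def score(pe: Coord) -> float:
        return (w_free * free_slots[pe] / queue_depth
                - w_hops * hop_count(source, pe) / max_hops)

    return max(candidates, key=score)
```

With Symbol Mapper (1,1) nearly full and the other three equally loaded, the score ties between (1,0) and (1,2), the two closest uncongested PEs, mirroring the outcome described above. Setting `w_free` to 0 reduces the rule to pure hop-count selection, which keeps sending to the congested neighbor (1,1) as long as it reports a free slot.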
4. Experimental Results
4.1. Experimental Platform and Methodology
The performance of the proposed QAM communication system was evaluated using an in-house, cycle-accurate, on-chip network and MPSoC simulator [22, 23]. The simulator was configured to meet the targeted QAM modulation architecture and the behavior of each QAM component. The NIRA agents were also integrated. The individual components of the proposed system, as well as the conventional and pipelined QAM modulators, were implemented on a Xilinx Virtex-5 LX110T FPGA in order to derive comparative area results.
Table 1: MPSoC-based system configuration (MPSoC and NoC parameters: number of VCs, number of Stream-IN PEs, number of Symbol Mapper PEs, number of FIR PEs, number of QAM PEs, link data width, and number of flits per packet).
We evaluated the targeted QAM architectures using different incoming rates of data streams at the Stream-IN PEs, in order to compare the architectures in terms of performance (throughput). For each data stream scenario, we also explored the impact of the NIRA parameters L and K on the overall system performance by varying their values (given that L + K = 1) and determining the values that yielded the best performance. The L and K parameters were explored using floating-point values during simulation, but were rounded to the nearest power of 2 for hardware mapping purposes.
Lastly, we studied the impact of NIRA as a graceful degradation mechanism, by randomly creating fault conditions inside the QAM, where a number of PEs experience failures. Again, we compared the MPSoC-based architecture (with NIRA) to its equivalent system that integrates multiple pipelined QAM instances. We measured the average throughput of both architectures and observed their behavior under different fault conditions and fault injection rates.
4.2. Performance Results
Conventional versus pipelined QAM modulator.
Description of example cases used for simulation (deterministic cases: constant interarrival times at Stream-IN PEs 0 to 3; random cases: mean values of the stream interarrival times at Stream-IN PEs 0 to 3).
The above analysis shows that the MPSoC-based system outperforms its equivalent system that integrates four instances of the pipelined QAM modulator. In particular, as the number of data streams and the number of available QAM components increase, the MPSoC-based architecture will be able to handle the increased data rate requirements and varying input data rates, taking full advantage of the load-balancing capabilities of the NIRA algorithm. These capabilities are explored in the next subsection.
4.3. NIRA Parameters Exploration
Simulation results for the deterministic cases (Case D.1 to Case D.4) indicate that the parameter combinations that returned the maximum throughput are (0.6–0.4) and (0.4–0.6), as shown in Figure 8(a). Since these cases are relatively symmetric in terms of the data rate per Stream-IN PE, the two parameters are expected to have a roughly equal impact. If only the free slots parameter is taken into account, the performance degrades, whereas if only the hop count parameter is taken into account, the data rate is adequate only in Case D.1, since this case involves a uniform data rate at all inputs. It is important to note, however, that these observations apply only to the example cases; for the random cases (Figure 8(b)), simulation results showed that the optimal NIRA parameters are not always (0.6–0.4) or (0.4–0.6), suggesting that for other data rates, possibly targeting a specific application, new simulations will be necessary to determine the optimal values of L and K.
Correspondingly, NIRA parameters need to be explored for different network sizes as well. As the network size increases, potential destination PEs can be far away from their source PEs, which adds significant communication delays. In such cases, it may be better to wait in a blocking state until some slots of the destination PE's queue become available, rather than sending data to an alternative PE that is far away: the penalty due to network-associated delays (i.e., router, crossbar, and buffering delays) involved in sending the packet to the alternative PE may exceed the penalty of waiting in the source PE until the original destination PE becomes eligible to accept new data. It is therefore more reasonable to give more weight to NIRA's hop count parameter, in order to reduce communication delays and achieve the maximum possible throughput.
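The trade-off can be made concrete with a simple latency model. The sketch assumes a fixed per-hop latency that lumps together router, crossbar, and buffering delays, and a known expected waiting time at the source; the model and the function name `divert_is_faster` are illustrative and are not part of NIRA itself.

```python
def divert_is_faster(hops_original: int, hops_alternative: int,
                     expected_wait_cycles: float, cycles_per_hop: float) -> bool:
    """Compare waiting for the original PE against diverting to a farther one.

    Staying costs the expected wait plus the trip to the original PE;
    diverting costs only the (longer) trip to the alternative PE.
    """
    stay = expected_wait_cycles + hops_original * cycles_per_hop
    divert = hops_alternative * cycles_per_hop
    return divert < stay
```

As the mesh grows, `hops_alternative`, and with it the cost of diverting, grows too, which is why a larger weight on the hop count term becomes attractive in larger networks.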
4.4. NIRA as a Graceful Performance Degradation Mechanism
Besides its advantage in dynamically balancing the load under loaded network conditions, NIRA can also be beneficial in the presence of faulty PEs, acting as a graceful degradation mechanism. To investigate this, we used a simulation-based fault injection methodology, assuming that faults occur according to a random distribution. Without loss of generality, we assumed that faults affect only whole PEs, while the remaining system, including the interconnection network and the NIs, is fault free.
To illustrate the impact of NIRA as a graceful degradation mechanism, we first compared the MPSoC-based architecture (with NIRA) to the architecture with 4 pipelined QAM instances. Performance simulations were first done for the system configuration listed in Table 1, with up to 4 out of the 16 PEs being subject to faults. The type and ID number of the faulty PEs were selected randomly from a uniform distribution, while the time of occurrence of a failure was assumed to be a random variable following an exponential distribution with mean . The stream arrivals at the Stream-IN PEs were Poisson processes with equal rates (Case R.5), in order to study only the influence of NIRA as a graceful degradation mechanism and not as a mechanism for dynamic load balancing. We ran the simulator for 10^6 clock cycles and compared the throughput of the MPSoC-based system with NIRA against the system with the 4 pipelined QAM instances, under the same number and type of faults.
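The fault injection setup can be sketched as follows, assuming exponentially distributed times between successive failures (one reading of the description above) and faulty PEs drawn uniformly, without repetition, across the four PE types. The mean is left as a parameter, `mean_interval`, because its value is not given here; the function name `inject_faults` is hypothetical.

```python
import random
from typing import List, Tuple

PE_TYPES = ("Stream-IN", "Symbol Mapper", "FIR", "QAM")


def inject_faults(num_faults: int, pes_per_type: int, mean_interval: float,
                  horizon: int, seed: int = 0) -> List[Tuple[int, str, int]]:
    """Return (cycle, PE type, PE index) fault events in time order."""
    rng = random.Random(seed)
    all_pes = [(t, i) for t in PE_TYPES for i in range(pes_per_type)]
    victims = rng.sample(all_pes, num_faults)      # uniform choice, no repeats
    events, t = [], 0.0
    for pe_type, idx in victims:
        t += rng.expovariate(1.0 / mean_interval)  # exponential inter-failure time
        if t >= horizon:
            break                                  # remaining faults fall after the run
        events.append((int(t), pe_type, idx))
    return events
```

The resulting event list can drive a cycle-level simulation by disabling each listed PE from the given cycle onward.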
Figure 11(b) illustrates how both systems behave in the presence of the same component failures (the 4 faults injected during simulation) by showing the throughput between successive fault occurrences. The two systems behave differently because they follow different failure models. The multi-pipelined system follows the single-failure model, where the lack of reconfiguration causes an entire QAM instance to fail whenever a single component inside that instance fails. The proposed system, on the other hand, takes advantage of the NoC architecture and follows the compound-failure model, where all components (PEs) from the same set of identical PEs must fail in order for the entire system to fail. As can be seen from Figure 11(b), the system without NIRA exhibits higher degradation rates, since each component failure causes an entire QAM instance to stop working and decreases the throughput significantly.
Note that when a new fault occurs in a component belonging to an already failed QAM instance of the 4-instance pipelined system, the throughput does not decrease, as that instance is already offline. One example of such a scenario is shown in Figure 11(b) when the fourth fault is injected, as it happened to affect a PE of an already failed QAM instance. In the MPSoC-based system, each fault does cause a throughput drop; however, this drop is minimal, as the NIRA algorithm acts as a graceful degradation mechanism, forwarding the traffic destined for the faulty components to less utilized, active PEs of the same type. As a result, the system with NIRA degrades more gracefully.
Graceful degradation also occurs in extreme scenarios. To show this, we simulated 8 QAM modulators partitioned onto an NoC (8 PEs per type), using higher fault injection rates (14 out of the 32 PEs fail). Following the same comparison methodology, we compared this system against a system consisting of 8 pipelined QAM instances, in order to investigate how the two systems behave under such extremes. We evaluated two fault injection schemes that are deterministic in terms of fault location, labeled Case 1 and Case 2, each of which aims at creating different failure conditions in the systems. Case 1 was constructed to show the best-case scenario for the MPSoC-based system; this is the case where at least one PE out of the four different types of PEs that make up a QAM modulator (or, equivalently, one component inside each QAM instance) fails. This implies that when a new fault occurs, an entire QAM instance in the multi-pipelined system is marked as faulty. Case 2, on the other hand, constitutes the worst-case scenario for the MPSoC-based system, where failures occur mostly on PEs of the same type. As an example, assume that all FIR PEs except one fail. This creates a bottleneck for the MPSoC system, as all data generated by the Symbol Mapper PEs must be forwarded to the working FIR PE, creating conditions equivalent to those of a single pipelined QAM modulator instance.
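The two failure models can be compared with an idealized capacity count. The sketch assumes that each PE type has n identical instances, that a failed PE contributes nothing, and, for the MPSoC, that NIRA rebalances traffic perfectly, so that the scarcest PE type sets the number of usable modulation chains. It ignores NoC contention and hop-count effects, so it is an upper-bound sketch rather than a model of the simulated system; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

PE_TYPES = ("Stream-IN", "Symbol Mapper", "FIR", "QAM")
Fault = Tuple[str, int]  # (PE type, instance index); faults assumed distinct


def pipelined_capacity(n: int, faults: Iterable[Fault]) -> int:
    """Single-failure model: instance i works only if all four of its PEs work."""
    failed_instances = {idx for _, idx in faults}
    return n - len(failed_instances)


def mpsoc_capacity(n: int, faults: Iterable[Fault]) -> int:
    """Compound-failure model (idealized): the scarcest PE type is the bottleneck."""
    failed_per_type = Counter(pe_type for pe_type, _ in faults)
    return min(n - failed_per_type[t] for t in PE_TYPES)
```

Four faults spread over four different instances leave the 4-instance pipelined system with no working chain but the MPSoC with three; with 7 of 8 FIR PEs failed, as in Case 2, both systems are reduced to a single working chain.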
In conclusion, the results of the above simulations confirm the applicability and efficiency of NIRA as a graceful degradation mechanism, even for large network sizes and different failure conditions. The proposed system can tolerate more faults than the multi-pipelined one, mainly due to its ability to dynamically reconfigure itself in the presence of faulty components, limiting the influence of PE failures on the average system throughput.
4.5. Synthesis Results
Synthesis results on the Xilinx Virtex-5 LX110T FPGA (available resources: 69,120 slice LUTs, 69,120 slice registers, and 64 DSP48E slices). Components reported: NI with NIRA agent, NI without NIRA agent, 4 × 4 NoC, 16-tap FIR, Symbol Mapper PE, FIR PE (transpose), 4 × 4 MPSoC-based QAM modulator, and MPSoC-based system with NIRA.
5. Conclusion and Future Work
This paper presented a parallel, reconfigurable MPSoC-based QAM modulation system, developed using MPSoC design methodologies. The proposed architecture provides high data rates even at lower modulation levels and can therefore provide higher noise immunity. The MPSoC-based system also achieves higher data rates than its equivalent system with multiple pipelines, mainly due to resource sharing and reconfiguration. The MPSoC system features a hardware-based resource allocation algorithm (NIRA) for dynamic load balancing, which enables the system to detect emerging network congestion and adjust its operation accordingly. This is especially useful when the QAM components function as part of a larger, complete SoC-based radio communication system running several radio applications in parallel, where the network carries traffic from many applications. Moreover, the NIRA algorithm can also offer graceful performance degradation, owing to its ability to inherently monitor the operational status of the system's components and adjust the behavior of the system accordingly. Such behavior is usually implemented at the system level, whereas the NIRA agents allow it to be integrated into the hardware itself.
Future work includes the integration of Fast Fourier Transform (FFT) and Forward Error Correction (FEC) PEs, in order to make the system applicable to a variety of other radio standards. Moreover, we are exploring algorithm-specific optimization techniques for area and power reduction, at both the NoC level and the PE level. Additionally, we plan to apply the MPSoC-based design flow and design methodologies to develop a parallel QAM demodulator that will also integrate the NIRA allocation algorithm.
- Webb WT, Hanzo L: Modern Quadrature Amplitude Modulation: Principles and Applications for Fixed and Wireless Channels. Wiley-IEEE Press, New York, NY, USA; 1994.
- Koukourlis CS: Hardware implementation of a differential QAM modem. IEEE Transactions on Broadcasting 1997, 43(3):281-287. doi:10.1109/11.632929
- Tariq MF, Nix A, Love D: Efficient implementation of pilot-aided 32 QAM for fixed wireless and mobile ISDN applications. Proceedings of the Vehicular Technology Conference (VTC '00), May 2000, Tokyo, Japan, 1:680-684.
- Vankka J, Kosunen M, Hubach J, Halonen K: A CORDIC-based multicarrier QAM modulator. Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM '99), December 1999, Rio de Janeireo, Brazil 1: 173-177.Google Scholar
- Banerjee A, Dhar AS: Novel architecture for QAM modulator-demodulator and its generalization to multicarrier modulation. Microprocessors and Microsystems 2005,29(7):351-357. 10.1016/j.micpro.2005.02.001View ArticleGoogle Scholar
- Schelle G, Fifield J, Grunwald D: A software defined radio application utilizing modern FPGAs and NoC interconnects. Proceedings of the International Conference on Field Programmable Logic and Applications (FPL '07), August 2007, Amsterdam, The Netherlands 177-182.Google Scholar
- Limberg T, et al.: A heterogeneous MPSoC with hardware supported dynamic task scheduling for software defined radio. Proceedings of the Design Automation Conference (DAC '09), July 2009, San Francisco, Calif, USAGoogle Scholar
- HEWLETT® PACKARD : Digital Modulation in Communications Systems—An Introduction, Application Note 1298. 1997, http://www.hpmemory.org/an/pdf/an_1298.pdf
- Biglieri E, Calderbank R, Constantinides A, Goldsmith A, Paulraj A: MIMO Wireless Communications. Cambridge University Press, New York, NY, USA; 2007.View ArticleGoogle Scholar
- Catreux S, Erceg V, Gesbert D, Heath RW: Adaptive modulation and MIMO coding for broadband wireless data networks. IEEE Communications Magazine 2002,40(6):108-115. 10.1109/MCOM.2002.1007416View ArticleGoogle Scholar
- Ogras UY, Marculescu R, Lee HG, Choudhary P, Marculescu D, Kaufman M, Nelson P: Challenges and promising results in NoC prototyping using FPGAs. IEEE Micro 2007,27(5):86-95.View ArticleGoogle Scholar
- Vangal S, Howard J, Ruhl G, Dighe S, Wilson H, Tschanz J, Finan D, Iyer P, Singh A, Jacob T, Jain S, Venkataraman S, Hoskote Y, Borkar N: An 80-Tile 1.28TFLOPS network-on-chip in 65nm CMOS. In Proceedings of the 54th IEEE International Solid-State Circuits Conference (ISSCC '07), February 2007. IEEE CS Press; 98-100.Google Scholar
- Haykin S: Communication Systems. 3rd edition. John Wiley & Sons, Toronto, Canada; 1994.
- Lathi BP: Modern Digital and Analog Communication Systems. 3rd edition. Oxford University Press, New York, NY, USA; 1998.
- Goldberg BG: Digital Techniques in Frequency Synthesis. McGraw-Hill, New York, NY, USA; 1996.
- Meyer-Baese U: Digital Signal Processing with Field Programmable Gate Arrays. 2nd edition. Springer, New York, NY, USA; 2004.
- Shannon CE: Communication in the presence of noise. Proceedings of the IEEE 1998,86(2):447-457. 10.1109/JPROC.1998.659497
- Clearwater SH: Market-Based Control: A Paradigm for Distributed Resource Allocation. World Scientific Publishing, River Edge, NJ, USA; 1996.
- Chavez A, Moukas A, Maes P: Challenger: a multi-agent system for distributed resource allocation. Proceedings of the 1st International Conference on Autonomous Agents, February 1997 323-331.
- Murali S, De Micheli G: Bandwidth-constrained mapping of cores onto NoC architectures. Proceedings of the Design, Automation and Test in Europe (DATE '04), February 2004 2: 896-901.
- Tornero R, Orduna JM, Palesi M, Duato J: A communication-aware task mapping technique for NoCs. Proceedings of the 2nd Workshop on Interconnection Network Architectures: On-Chip, Multi-Chip, January 2008, Goteborg, Sweden.
- Ttofis C, Theocharides T: A C++ simulator for evaluating NoC communication backbones. Proceedings of the 3rd Greek National Student Conference of Electrical and Computer Engineering, April 2009, Thessaloniki, Greece 54.
- Ttofis C, Kyrkou C, Theocharides T, Michael MK: FPGA-based NoC-driven sequence of lab assignments for manycore systems. Proceedings of the IEEE International Conference on Microelectronic Systems Education (MSE '09), July 2009 5-8.
- Ross S: Introduction to Probability Models. Academic Press, New York, NY, USA; 2003.
- Pham H (Ed): Springer Handbook of Engineering Statistics. Springer; 2006.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.