USING VHDL ONLY*I DO NOT HAVE THE SOFTWARE OR AN ACCESS CODE*Please submit a single PDF containing the following:1. Your code of the following:• controller.vhd • datapath.vhd• prefix_tb1.vhd •...


USING VHDL ONLY

*I DO NOT HAVE THE SOFTWARE OR AN ACCESS CODE*


Please submit a single PDF containing the following:

1. Your code of the following:

• controller.vhd • datapath.vhd

• prefix_tb1.vhd • prefix_tb2.vhd • prefix_tb3.vhd

2. The screenshots listed below. For the waveform screenshots, please clearly show a, b, m, reset, clk, ready, LDX, SDX, load, shift[1:0], state, and R1 - R4. Some of these will need to be added to the waveform from the controller / datapath modules.

• The output waveform from prefix_tb1.vhd from 0ns to 100ns.

• The output waveform from prefix_tb2.vhd from 0ns to 130ns

• The output waveform from prefix_tb3.vhd from 240ns to 320ns

• The TCL consoles from the above 3 test benches. Make sure that your report messages and clock cycle numbers are clearly shown.

7




University of Connecticut ECE 3401 / CSE 3302: Digital Systems Design Spring 2023 Programming Assignment 3: Hardware Implementation of Prefix Sum Due April 7, 2023 (Friday) @ 11:59 PM on HuskyCT 1 Introduction The prefix sum is a useful primitive with many applications in engineering and computer science. It is computed very simply: Given an input sequence a = {a1, a2, ..., an}, the prefix sum of a is the sequence b = {b1, b2, ..., bn} where bi = { ai i = 1 ai + bi−1 1 < i="" ≤="" n="" this="" makes="" the="" prefix="" sum="" a="" sort="" of="" "running="" total"="" algorithm;="" as="" we="" scan="" a,="" we="" accumulate="" its="" sum="" in="" b.="" in="" this="" assignment,="" you="" will="" design="" hardware="" that="" computes="" the="" prefix="" sum="" of="" an="" input="" sequence.="" there="" are="" three="" main="" parts="" in="" this="" design:="" 1.="" the="" moore="" state="" machine="" design,="" which="" is="" responsible="" for="" control="" signals.="" 2.="" the="" synchronous="" data="" path="" design,="" which="" is="" responsible="" for="" computing="" parts="" of="" the="" output="" sequence.="" 3.="" the="" top="" level="" design,="" which="" incorporates="" the="" design="" and="" serves="" as="" i/o="" to="" the="" test="" benches.="" the="" state="" machine="" module="" is="" contained="" in="" controller.vhd,="" the="" datapath="" is="" contained="" in="" datapath.vhd,="" and="" the="" top="" level="" module="" is="" pa3.vhd.="" additionally,="" array.vhd="" contains="" the="" definition="" for="" the="" array_32="" type,="" which="" is="" used="" as="" the="" input/output="" of="" pa3.vhd="" (you="" should="" not="" modify="" this="" file).="" there="" are="" three="" test="" benches:="" prefix_tb1.vhd,="" prefix_tb2.vhd,="" and="" prefix_tb3.vhd.="" more="" details="" will="" be="" provided="" later="" on="" each="" module.="" goals="" of="" this="" assignment="" are="" to="" implement="" the="" design="" using="" vhdl:="" •="" implement="" the="" datapath="" design.="" •="" implement="" a="" moore="" state="" machine="" to="" provide="" control="" to="" the="" datapath.="" •="" evaluate="" design="" correctness="" and="" performance="" using="" test="" benches.="" 2="" background="" calculating="" the="" prefix="" sum="" sequentially="" is="" a="" trivial="" task.="" let="" us="" assume="" that="" a="" and="" b="" are="" implemented="" as="" n="" length="" arrays="" for="" our="" algorithm.="" we="" could="" use="" the="" following="" algorithm="" to="" compute="" the="" prefix="" sum="" sequentially:="" algorithm="" 1="" sequential="" prefix="" sum="" procedure="" prefixseq(a[1="" :="" n],="" n)="" b[1]←="" a[1]="" for="" i="2" to="" n="" do="" b[i]←="" b[i−="" 1]="" +="" a[i]="" end="" for="" return="" b[1="" :="" n]="" end="" procedure="" 1="" since="" we="" are="" iterating="" through="" the="" sequence="" once,="" this="" algorithm="" is="" o(n)="" in="" terms="" of="" the="" sequence="" length="" n.="" however,="" assuming="" that="" addition="" and="" position="" shifting="" can="" be="" done="" in="" parallel,="" we="" can="" increase="" the="" runtime="" efficiency="" (we="" assume="" that="" n="" is="" a="" power="" of="" 2="" for="" simplicity):="" algorithm="" 2="" parallel="" prefix="" sum="" procedure="" prefixparallel(a[1="" :="" n],="" n)="" b[1="" :="" n]←="" a[1="" :="" n]="" for="" i="0" to="" log="" n−="" 1="" do="" b[1="" :="" n]←="" b[1="" :="" n]="" +="" rshift(b[1="" :="" n],="" 2i,="" n)="" end="" for="" return="" b[1="" :="" n]="" end="" procedure="" procedure="" rshift(x[1="" :="" n],="" k,="" n)="" y[1="" :="" k]←="" 0="" for="" i="k" +="" 1="" to="" n="" do="" y[i]←="" x[i−="" k]="" end="" for="" return="" y[1="" :="" n]="" end="" procedure="" as="" you="" can="" see,="" this="" parallel="" algorithm="" is="" only="" o(log="" n)="" time,="" but="" requires="" n="" adders,="" shifters,="" and="" registers="" rather="" than="" 1,="" as="" well="" as="" hardware="" that="" can="" "shift="" over"="" register="" values="" in="" constant="" time.="" this="" is="" clearly="" difficult="" to="" scale="" as="" n="" increases.="" it="" may="" appear="" that="" we="" must="" choose="" between="" either="" hardware="" area="" and="" speed="" of="" computation,="" but="" in="" our="" design,="" we="" will="" strike="" a="" balance="" between="" the="" two:="" we="" will="" create="" a="" sequential="" implementation="" that="" uses="" elements="" of="" the="" parallel="" adder="" to="" give="" a="" speedup="" over="" the="" classic="" sequential="" implementation.="" if="" we="" have="" access="" to="" n="" adder/shifter/register="" elements,="" and="" the="" total="" array="" size="" is="" now="" m,="" we="" can="" compute="" the="" output="" with="" a="" total="" of="" mn="" ·="" (log="" n+="" 1)="" operations="" (accounting="" for="" loading="" input="" values="" into="" registers).="" 3="" design="" overview="" 3.1="" algorithm="" and="" examples="" in="" our="" design,="" you="" will="" implement="" the="" partially-parallel="" algorithm="" to="" compute="" the="" prefix="" sum.="" you="" will="" use="" a="" total="" of="" n="" adder/shifter/register="" elements="" in="" the="" datapath.="" the="" pseudocode="" for="" the="" design="" is="" given="" in="" algorithm="" 3="" (you="" may="" assume="" that="" the="" input="" array="" size="" m="" is="" a="" multiple="" of="" n):="" algorithm="" 3="" partially-parallel="" prefix="" sum="" 1:="" procedure="" partialprefix(a[1="" :="" m],="" n,="" m)="" 2:="" r[1="" :="" n]←="" zeros="" .="" initialize="" an="" empty="" accumulator="" array="" length="" n="" 3:="" for="" i="1" to="" i="m−" n+="" 1="" in="" increments="" of="" n="" do="" 4:="" r[1="" :="" n]←="" a[i="" :="" i+="" n−="" 1]="" +="" [r[n],="" 0,="" .="" .="" .="" ,="" 0]="" .="" parallel="" load="" 5:="" r[1="" :="" n]←="" parallelprefix(r[1="" :="" n],="" n)="" .="" compute="" n="" elements="" in="" parallel="" in="" log="" n="" cycles="" 6:="" if="" i=""> 1 then 7: b[i− n : i− 1]← r[1 : n] . Parallel store of previous iteration output 8: end if 9: end for 10: b[m− n+ 1 : m]← r[1 : n] . Parallel store of final iteration 11: return b[1 : m] 12: end procedure Note that in this implementation, you will be always using 4 parallel adders/shifters/registers (i.e. n = 4), 2 and the input size will always be a multiple of 4. Note that when m > n, lines 4 and 7 are performed in parallel after the first iteration. This enables the design to perform loads and stores to the input and output arrays in parallel. For a better understanding of the design, two examples are shown below. 3.1.1 Example 1: m = 4 Let’s say our input array is a = [1, 2, 3, 4]. We can use the above algorithm to compute the output b. Note that n = 4, m = 4, so the FOR loop will only execute once. • Cycle 1: Parallel load r[1 : 4]← a[1 : 4] + [r[4], 0, 0, 0] = [1, 2, 3, 4] + [0, 0, 0, 0] = [1, 2, 3, 4] • Cycle 2: Shift and add 1 r[1 : 4]← r[1 : 4] + [0, r[1], r[2], r[3]] = [1, 2, 3, 4] + [0, 1, 2, 3] = [1, 3, 5, 7] • Cycle 3: Shift and add 2 r[1 : 4]← [r[1 : 4] + [0, 0, r[1], r[2]] = [1, 3, 5, 7] + [0, 0, 1, 3] = [1, 3, 6, 10] • Cycle 3+: Parallel Store and Termination b[1 : 4]← r[1 : 4] = [1, 3, 6, 10] After the third cycle, b will contain the correct result of b = [1, 3, 6, 10] 3.1.2 Example 2: m = 8 Now lets increase our input size and say a = [1, 2, 3, 4, 5, 6, 7, 8]. Note that n = 4, m = 8, so now the FOR loop will execute twice! • Cycle 1: Parallel load r[1 : 4]← a[1 : 4] + [r[4], 0, 0, 0] = [1, 2, 3, 4] + [0, 0, 0, 0] = [1, 2, 3, 4] • Cycle 2: Shift and add 1 r[1 : 4]← r[1 : 4] + [0, r[1], r[2], r[3]] = [1, 2, 3, 4] + [0, 1, 2, 3] = [1, 3, 5, 7] • Cycle 3: Shift and add 2 r[1 : 4]← [r[1 : 4] + [0, 0, r[1], r[2]] = [1, 3, 5, 7] + [0, 0, 1, 3] = [1, 3, 6, 10] • Cycle 4: Parallel load (and previous iteration store) *r[1 : 4]← a[5 : 8] + [r[4], 0, 0, 0] = [5, 6, 7, 8] + [10, 0, 0, 0] = [15, 6, 7, 8] *b[1 : 4]← r[1 : 4] = [1, 3, 6, 10] *Occurs in parallel • Cycle 5: Shift and add 1 r[1 : 4]← r[1 : 4] + [0, r[1], r[2], r[3]] = [15, 6, 7, 8] + [0, 15, 6, 7] = [15, 21, 13, 15] • Cycle 6: Shift and add 2 r[1 : 4]← r[1 : 4] + [0, 0, r[1], r[2]] = [15, 21, 13, 15] + [0, 0, 15, 21] = [15, 21, 28, 36] • Cycle 6+: Termination b[5 : 8]← r[1 : 4] = [15, 21, 28, 36] After the sixth cycle, b will contain the correct result of b = [1, 3, 6, 10, 15, 21, 28, 36] 3.2 Structure of PA Modules The overall design is split into three components i) The PA3 top module, ii) The state machine controller module, and iii) The datapath module. The overall figure of the PA structure with each module’s ports is shown below: 3 controller.vhd a() m clk reset b() ready pa3.vhd clk reset m datapath.vhd load shift[1:0] a() LDX SDX Test Benches ready_flag R4 D Q clk reset ready clk reset b() Figure 3.1: Overall structure of the assignment In the top module pa3.vhd, the design performs port maps and the ready signal logic. In controller.vhd, a state machine keeps track of the algorithm phase and provides the corresponding control signals (load, shift[1:0], and ready_flag), starting load (LDX) index for input array, and starting store index (SDX) for the output array. In datapath.vhd, the inputs a, clk, and reset are ported from the top module, while the inputs load, shift, LDX, and SDX are generated by the controller. The output b is ported back through the top module to the test benches. 3.3 PA3 Top Module The input array a is initialized in the testbench (using array.vhd). Additionally, the m integer is initialized in the testbench with size of the input array. For example, for an array of four unsigned integers, m=4. You may assume that the size of input arrays are always a multiple of 4. The clk and reset signals are also from the testbench. The output array b is initialized in the testbench, and updated in the datapath synchronously. It contains the prefix sum values as they are computed by the design. The signal ready is written to the testbench to indicate when the computation is completed. It takes the ready_flag from the controller and delays its by a cycle to indicate the testbench that all updates to the output array are complete. The port maps for the controller and datapath are provided, as is the ready signal synchronization process. Therefore, the pa3.vhd module should not be modified. 4 3.4 Datapath In datapath.vhd, the logic for producing the output must be implemented correctly given the control signals load and shift[1:0] produced by the state machine. The structure of the datapath is given below: a() clk reset R2 D Q+ Input Select Logic Input Select Logic load shift[1:0] clk reset R1 D Q+ Input Select Logic Input Select Logic clk reset R4 D Q+ Input Select Logic Input Select Logic clk reset R3 D Q+ Input Select Logic Input Select Logic clk reset R1 R2 R3 R4 b() LDX SDX Figure 3.2: Inner operation of the datapath Four elements at a time are loaded into the datapath from a, with the first being indexed with LDX. As such, a(LDX), a(LDX+1), a(LDX+2), and a(LDX+3) are used to perform a parallel load of the appropriate array indices for parallel prefix sum computations. Additionally, the registers R1, R2, R3, and R4, and the output array b are updated synchronously. The outputs are indexed into the 4 elements of b given by b(SDX), b(SDX+1), b(SDX+2), and b(SDX+3). Within each "Input Select Logic" block, you must use load and shift as MUXing logic to select the correct input into each adder. There are 9 possibilities for each input: i) The indexed array values a(LDX) - a(LDX+3) ii) The four internal register values R1 - R4 or iii) 0 (zero). The load and shift signal values will correspond to a certain operation that needs to be performed (i.e. load from input and accumulate, shift registers by 1 and add, shift registers by 2 and add, or hold register values) based on the state of the controller module (discussed in the next section). The addition block outputs should be combinational with your selected inputs, and the addition block should just be implemented using the "+" operator. When the reset signal is asserted, asynchronously the outputs R1 - R4 are cleared to zero. Otherwise on the rising edge of the clock, the outputs of the adder modules update the registers R1 - R4. Also on the rising edge, the output entries b(SDX) - b(SDX+3) are also updated with current values of registers R1 - R4. 5 3.5 State Machine In the state machine module controller.vhd, the control signals are produced for pa3.vhd and datap- ath.vhd. You are responsible for designing the state machine that produces the datapath control signals with correct transition logic. Guidelines for the state machine are given below: • The state machine module should have one state for each phase (i.e. "Start / First Load", "Load","Shift by 1 and Add", "Shift by 2 and Add", and "Done". You can follow the examples given in section 3. The state definitions along with the "current state" signal will be internal to the controller.vhd module.
Apr 07, 2023
SOLUTION.PDF

Get Answer To This Question

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here