Altera opt 2 #2602

AngelaGonzalezMarino · 2024-11-13T15:37:40Z

The second optimization for Altera FPGAs moves the BHT to LUTRAM. As with the previous optimization, the one already done for Xilinx does not work here because it relies on asynchronous RAM primitives, which Altera does not support. This optimization therefore uses synchronous RAM for the BHT.
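To make the difference concrete, here is a minimal behavioral sketch in Python (not the RTL from this PR). The class names `AsyncReadRam` and `SyncReadRam` are hypothetical, and the sketch assumes that a synchronous read returns the old content when the same address is written in the same cycle.

```python
class AsyncReadRam:
    """Clocked write, combinational read: data is visible without a read clock edge."""

    def __init__(self, depth):
        self.mem = [0] * depth

    def read(self, addr):
        # Combinational read: returns the current array content immediately.
        return self.mem[addr]

    def clock(self, we=False, waddr=0, wdata=0):
        # One clock edge: only the write is registered.
        if we:
            self.mem[waddr] = wdata


class SyncReadRam:
    """Clocked write and clocked reads, with two read ports and one write port."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.rdata = (0, 0)

    def clock(self, raddr0, raddr1, we=False, waddr=0, wdata=0):
        # One clock edge. Both read ports sample the array before the write
        # lands (old-data behaviour, an assumption of this sketch).
        self.rdata = (self.mem[raddr0], self.mem[raddr1])
        if we:
            self.mem[waddr] = wdata
        return self.rdata
```

In this model a value written on one edge is only visible on a synchronous read port after the following edge, which is the extra latency the rest of this description compensates for.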

The main changes to the existing code are:

- New RAM module (SyncThreePortRam.sv) to infer synchronous RAM on Altera, with two independent read ports and one write port.
- Changes in frontend.sv: the input to the vpc_i port of the BHT is modified by advancing the read address, to compensate for the delay of synchronous RAM.
- Changes in bht.sv: this case is more complex because of the logic operations performed inside the BHT. First, the entry for the pc pointed to by bht_update_i is read from the memory, modified according to the saturation counter and valid bit, and finally written back to the memory. The prediction output is based on vpc_i. With asynchronous memory, the new data written via update_i is available one clock cycle after writing it, so if vpc_i reads the address that was previously written by update_i, everything is fine. With synchronous memory, however, there are three clock cycles of latency: one for reading the pc content (read port 1), one for writing it, and one for reading on the other port (read port 0). The design therefore has to be adapted to these new latency constraints:
  - First, the write address of the synchronous RAM must be delayed, to wait for the preceding pc read so that the correctly modified data is stored.
  - Once this is solved, similarly to the FIFO case, an auxiliary buffer is needed that stores the data written to the RAM, making it available 2 clock cycles after update_i was valid. This is because, once the correct data is ready, the RAM takes 2 clock cycles until it is available at the output (one clock cycle for writing and one for reading).
  - Finally, a multiplexer at the output delivers the correct prediction by selecting the data from the update logic (1 cycle of delay), the auxiliary register (2 cycles of delay), or the RAM (3 or more cycles of delay), depending on the delay since update_i was valid (i.e. written to the memory).
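The selection rule of the output multiplexer can be sketched as a small Python model. This is only an illustration of the three cases listed above, not the code in bht.sv: the function names, the `age` argument (cycles since update_i was valid), the explicit address comparison, and the 2-bit counter width are all assumptions of this sketch.

```python
def saturating_update(counter, taken):
    """Update a 2-bit saturating counter (width assumed): up on taken, down otherwise."""
    if taken:
        return min(counter + 1, 3)
    return max(counter - 1, 0)


def select_prediction(read_idx, update_idx, age, update_data, aux_data, ram_data):
    """Choose the prediction source for a read of read_idx.

    age is the number of cycles since the update of update_idx was valid.
    Forwarding only applies when the read hits the recently updated entry.
    """
    if read_idx != update_idx:
        return ram_data      # unrelated entry: the RAM content is current
    if age == 1:
        return update_data   # straight from the update logic
    if age == 2:
        return aux_data      # from the auxiliary register
    return ram_data          # 3 or more cycles: the write is readable from RAM
```

For example, a read of the same index two cycles after the update would take its data from the auxiliary register, while a read of any other index always uses the RAM output.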

core/frontend/bht.sv

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

github-actions · 2024-11-13T16:29:52Z

❌ failed run, report available here.

github-actions · 2024-11-13T16:36:13Z

❌ failed run, report available here.

github-actions · 2024-11-15T15:02:30Z

❌ failed run, report available here.

AngelaGonzalezMarino added 3 commits November 13, 2024 16:33

bht in synchronous ram

2b25bab

3 port synchronous ram

003960a

verible

f2cc340

AngelaGonzalezMarino requested review from JeanRochCoulon and zarubaf as code owners November 13, 2024 15:37

github-actions bot reviewed Nov 13, 2024

core/frontend/bht.sv (outdated, resolved)

Update core/frontend/bht.sv

aafac73

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Merge branch 'openhwgroup:master' into altera_opt_2

9c039ef
