Badwolf wrote: ↑Mon May 09, 2022 12:43 pm
Long one coming up:
Long reply here as well
(Re: about the different DTACK behavior when accessing the PSG?)
do you know the details of the differences? Would it affect anything other than sound and serial? Ta.)
GLUE delays DTACK by a couple of cycles when accessing the PSG. This delay affects both the assertion and the deassertion of DTACK, which means that GLUE is still asserting DTACK when the next bus cycle has already started. This is not a problem for a CPU running at normal speed: by the time the CPU checks DTACK on the next cycle, GLUE has already deasserted the (old) DTACK. But if the CPU is fast enough, it might end the next bus cycle before the target device is ready. As Exxos said, it affected his older accelerator at 64 MHz. Not sure if this is your problem at "just" 16 MHz.
It doesn't affect the access to the PSG chip itself. It affects the next bus cycle. This was fixed in the STE combo chip, where DTACK is deasserted immediately together with AS.
So should I take it as a rule of thumb that input clocks are better going through GCKs than outputs?
It depends on the specific device, or at least on the specific family. For this CPLD family, the XC9500XL, yes, global clocks are only available at pin inputs.
In short, flexibility. In DFB1, for example, the RAMCLK slows down with the CPU clock to ensure synchronisation during switching (it's a synchronous [STERM-based] RAM cycle there).
I understand the idea, but IMHO the cost you are paying here is too high. This delays RAMCLK by almost a full 66 MHz cycle. The worst part is that you have no control over this delay: it might be one cycle or it might be half a cycle. It is very difficult to meet timing with such uncertainty.
The ideal solution for this is to use a device with its own PLL (like a MAX-10). You can then do almost what you want with the clock and still keep the edges aligned. I do realize this might be too much for such a low cost project.
Combinational I/O output is slow. Sometimes you can't help it but I think you can avoid combinational I/O here.
This sounds great, but I confess to not knowing exactly how to do what you mean. Is the latter just a case of declaring output reg RAS, for example?
No. Registering in this context means that the signal should be the direct output of a flip flop, with no combinational logic after. Registered signals are faster, have less skew, and do not glitch (the latter is not very important in this case, but it is very important in other cases).
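To illustrate the difference, here is a minimal sketch. The module and signal names are made up for the example and are not taken from your code. Declaring a port as "output reg" says nothing by itself; what matters is whether the assignment happens in a clocked block.

```verilog
// Hypothetical example: all names here are illustrative only.
module ras_example (
    input  wire clk,
    input  wire sel,
    input  wire a,
    input  wire b,
    output reg  ras_comb,  // declared "reg", but still combinational
    output reg  ras_reg    // true registered output
);
    // Combinational: the pin is driven through the mux logic,
    // so it is slower and may glitch when sel, a or b change.
    always @(*) begin
        ras_comb = sel ? a : b;
    end

    // Registered: the mux feeds the D input of a flip-flop and the
    // pin is driven directly by Q, changing only after a clock edge.
    always @(posedge clk) begin
        ras_reg <= sel ? a : b;
    end
endmodule
```

Both outputs compute the same function, but only ras_reg comes straight from a flip-flop.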
The multiplexed output is to avoid extra logic within the state machine (which was my limiting factor). What would you suggest instead? Simply setting the new 'RAS' register in an always block with the same multiplexer, or something else?
I doubt the size of the state machine is your limiting factor. Combine both synchronous blocks with something like this:
Code: Select all
if (READY == 1'b1) begin
    ...
    CMD <= CMD_PRECHARGE;
    ...
end
else begin // when READY == 1'b0
    ...
    CMD <= CMD_NOP;
    ...
end

// Outside the clocked block: each SDRAM pin is just one bit of the CMD register
assign RAS = CMD[2];
// Same for all other SDRAM signals
...
Any particular reason you are using two different clocks at the ram controller?
Mostly to need fewer bits in the counter which is only used to initialise the RAM in the SETUP phase ...
This is not really a good idea. You are transferring data between two unrelated clock domains without any synchronization. You can't freely mix two clocks like that. It is perfectly possible that you would read the wrong value at the target.
Code: Select all
always @( posedge CLK )
state <= nextstate;
always @( negedge CLK ) begin
...
case(state)
nextstate <= STATE_REFRESH_NOP1;
This is bad. You are writing on one edge of the clock and reading on the other edge. You are effectively cutting the cycle time in half. In other words, these signals need to be as fast as if the clock had double the frequency (133 MHz). And unsurprisingly, with only 7.5 ns from edge to edge, you don't meet timing here.
Furthermore, you don't need this at all. Eliminate nextstate altogether, and just write directly to the "state" signal. Or if you prefer, set nextstate separately in a combinational process (this is just a matter of style).
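A minimal sketch of the single-edge version could look like the following. Only STATE_REFRESH_NOP1 comes from your code; the other states, the encoding and the refresh_req input are assumptions for the sake of the example.

```verilog
// Hypothetical sketch: state names (other than STATE_REFRESH_NOP1),
// encoding and the refresh_req input are assumptions.
module fsm_example (
    input  wire       clk,
    input  wire       refresh_req,
    output reg  [1:0] state
);
    localparam STATE_IDLE         = 2'd0;
    localparam STATE_REFRESH      = 2'd1;
    localparam STATE_REFRESH_NOP1 = 2'd2;

    // Power-up value for simulation; a real design would
    // normally use a proper reset instead.
    initial state = STATE_IDLE;

    // One clocked block, one edge: every register-to-register
    // path gets the full clock period.
    always @(posedge clk) begin
        case (state)
            STATE_IDLE:         if (refresh_req) state <= STATE_REFRESH;
            STATE_REFRESH:      state <= STATE_REFRESH_NOP1;
            STATE_REFRESH_NOP1: state <= STATE_IDLE;
            default:            state <= STATE_IDLE;
        endcase
    end
endmodule
```

Here the state register is written and read on the same (rising) edge, so nothing has to settle within half a period.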
My flipflops aren't clocked to CLKOSC but to the derived output lines which are routed to GCKs.
It doesn't work like that. Your flip-flops are indeed clocked by CLKOSC. You can't use an actual output as a clock, or as any kind of input for that matter. The actual clock is whatever is driving that output. The fact that you connected that output to a GCK pin is not relevant.
That's another benefit of performing a timing analysis. The results would give you hints about such issues. In this case you can see how much faster the signals clocked by CLK8 are (the only clock for which you are actually using a global clock). It is also very useful to use the Technology Schematics Viewer (in the Tools menu) to see exactly what the compiler synthesized from your code.
I do treat them as asynchronous, there's nothing synchronous in the bus cycle logic at all, only within the SDRAM module are things synchronous (and that's to RAMCLK).
If you consider them async, then you had better synchronize "altram_access_int". It is not a good idea to feed an unsynchronized signal to a state machine. You should also check for any potential hazards.
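A common way to do this is a two flip-flop synchronizer. The following is only a sketch: the module and port names are made up, and only altram_access_int comes from your design. Note that it adds up to two clock cycles of latency, so you would need to check whether that is acceptable for your bus timing.

```verilog
// Hypothetical sketch: module and port names are illustrative only.
module sync_2ff (
    input  wire clk,       // clock of the state machine
    input  wire async_in,  // e.g. altram_access_int
    output wire sync_out   // safe to use inside the clk domain
);
    reg [1:0] sync;

    // The first flip-flop may go metastable when async_in changes
    // close to the clock edge; the second one gives it a full
    // clock period to settle before the state machine sees it.
    always @(posedge clk) begin
        sync <= {sync[0], async_in};
    end

    assign sync_out = sync[1];
endmodule
```

The state machine would then look only at sync_out, never at the raw asynchronous signal.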
Btw, are you aware that AS is kept asserted all the time on a RMW bus cycle (when using TAS)? Did you check this doesn't break your logic?
I understand that the SDRAM controls are all synchronous (the clue's in the name!) but, for example, if my state machine is occasionally missing a step because there's one too many multiplexes within one block for the wires to be stable in time for the next clock, I'm afraid I simply don't know how or what I should be examining to establish that. So I iterate & test.
In the first place, it is not only the SDRAM that is synchronous. Your design is also synchronous. Even when you are using some async logic, it still has many flip-flops. If you violate the timing specs of any flip-flop, the results are unpredictable. This is as true for the internal flip-flops as for the ones in the SDRAM chip.
In the second place, I think you are too concerned with the number of terms, what is known as the combinational path. This is a typical problem on other devices such as FPGAs, but the architecture of this CPLD is very different. As I said in my previous message, the number of terms is usually not as significant as other internal delays. Using a non-global clock and using both edges of the same clock are certainly much more significant.
So how do you find out whether your state machine meets timing? Well, as I have said since the beginning, you constrain the design and perform a timing analysis. The timing analyzer will tell you. Constraining the external interface is not easy (and with such a delay on the RAMCLK output it probably won't be reliable anyway). But the state machine itself (the internal timing) is, from this point of view, rather simple.