DFB1r4 design discussion thread

MegaSTEarian
Posts: 80
Joined: Wed Aug 19, 2020 12:56 pm

Re: DFB1r4 design discussion thread

Post by MegaSTEarian »

Looks very good :bravo:

(The next milestone will be when you hardware guru guys manage to use both the internal and the new CPU for a multi-CPU Falcon :) )
Badwolf
Posts: 2228
Joined: Tue Nov 19, 2019 12:09 pm

Re: DFB1r4 design discussion thread

Post by Badwolf »

exxos wrote: Sun Dec 05, 2021 10:46 pm Pretty darn good ! Bit slow on RAM access though ?
Yeah, two reasons for that. Firstly, I gated RAM access to CPUCLK, and I can't remember why. If that was while I was trying to get the DSP to work, it's probably redundant now. Secondly, I removed my speed-switching hold-off* code when debugging the FPU. Now that I've got independent DSACKx lines, I may not need to leave it out any more.

So could be one easy optimisation coming up!

BW

* What's my speed-switching hold-off?

Well, switching from the high-speed to the low-speed clock takes up to one full slow cycle to complete, thus incurring an average delay of half a slow cycle on every access to the motherboard. I could perhaps come up with a better clock-switching algorithm, but this one seems infeasibly reliable.

However, this does cause a ~10% reduction in RAM access speed on average (mitigated by the faster processing outside of motherboard accesses).
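To put a number on that half-slow-cycle average, here's a rough sketch. It assumes (my assumption, not a measured DFB1 figure) a 16MHz motherboard clock, and that a switch request lands at an evenly distributed point within the slow period and then waits for the next slow edge. The helper name is hypothetical.

```python
def mean_switch_delay_ns(slow_period_ns: float, samples: int = 1000) -> float:
    """Average wait for the next slow-clock edge when a switch request
    arrives at evenly spaced phases across one slow period."""
    total = 0.0
    for i in range(samples):
        # Phase of the request within the slow period.
        offset = i * slow_period_ns / samples
        # Time remaining until the next slow edge (0 if we land exactly on one).
        wait = (slow_period_ns - offset) % slow_period_ns
        total += wait
    return total / samples


# Assumption: a 16 MHz motherboard clock, i.e. a 62.5 ns slow period.
SLOW_PERIOD_NS = 1e3 / 16

worst = SLOW_PERIOD_NS
mean = mean_switch_delay_ns(SLOW_PERIOD_NS)
print(f"worst case {worst:.2f} ns, average {mean:.2f} ns")
```

The average converges on half the slow period, which is the per-access penalty the hold-off is trying to avoid paying twice.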

I postulated that motherboard accesses are fairly often done in longword chunks (not least cache fills during reads), which are two tightly-packed bus accesses. If I switch up to full speed immediately at the end of the first cycle, I'll incur a substantial delay switching back down for the second word.

So I normally hold off switching from slow back to fast until XDTACK has been deasserted for two consecutive (slow) cycles. This gives a decent average performance boost to motherboard accesses, but complicates access to AltRAM.

I was concerned it may also affect FPU accesses, so I disabled it.
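A toy model of why the hold-off helps longword accesses: count how many fast-to-slow switches a trace of motherboard accesses triggers with and without a two-cycle hold-off. The trace and function name are hypothetical, treating "exactly two idle cycles" as the point where the clock goes fast again is my reading of the description, and the model deliberately ignores the cost of running non-bus code at slow speed during the hold-off window, which is the other side of the trade-off.

```python
from typing import Sequence


def count_switch_downs(gaps: Sequence[float], hold_off_cycles: int) -> int:
    """Count fast-to-slow clock switches for a run of motherboard accesses.

    gaps[i] is the idle time, in slow cycles, between the end of access i-1
    and the start of access i (use float('inf') for the first access).
    With hold_off_cycles == 0 the clock returns to fast straight after every
    access, so every access pays for a switch-down. Otherwise the clock only
    returns to fast once the bus has been idle for hold_off_cycles slow cycles.
    """
    switch_downs = 0
    for gap in gaps:
        if gap >= hold_off_cycles:  # clock had already gone back to fast
            switch_downs += 1
    return switch_downs


# Hypothetical trace: three longword reads, each split into two back-to-back
# word accesses (gap 0), separated by 5 idle slow cycles.
trace = [float("inf"), 0, 5, 0, 5, 0]

for hold in (0, 2):
    n = count_switch_downs(trace, hold)
    # Each switch-down costs about half a slow cycle on average.
    print(f"hold-off {hold}: {n} switch-downs, ~{0.5 * n:.1f} slow cycles lost")
```

With the hold-off, the second word of each longword finds the clock still slow, so only half as many switch-downs are paid for in this trace.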
DFB1 Open source 50MHz 030 and TT-RAM accelerator for the Falcon
DSTB1 Open source 16MHz 68k and AltRAM accelerator for the ST
Smalliermouse ST-optimised USB mouse adapter based on SmallyMouse2
FrontBench The Frontier: Elite 2 intro as a benchmark
Badwolf

Re: DFB1r4 design discussion thread

Post by Badwolf »

JezC wrote: Mon Dec 06, 2021 8:24 am Any more plans for future developments beyond a spin of the PCB?
There's loads of work that could be done with the firmware. Plenty of optimisations (I've aimed for reliable over fast where there's been a choice) and extra features could be added.

One example I've just mentioned above -- the mobo access speed is below optimal -- but there's also one wasted cycle in the AltRAM access that a better verilog developer could prune away. The CPLD also has an 8 bit data bus attached that could be used for software-driven options or a speed counter, etc.

But none of these is in the base spec, so I'm hoping to get a stable next spin and open source the lot of it.

With luck, we'll see forks with all sorts of fancy firmware features.

BW
exxos
Site Admin
Posts: 23488
Joined: Wed Aug 16, 2017 11:19 pm
Location: UK

Re: DFB1r4 design discussion thread

Post by exxos »

Badwolf wrote: Mon Dec 06, 2021 2:35 pm Well, switching from high speed to low speed clock takes up to one full slow cycle to complete, thus incurring an average half-slow-cycle delay on every access to the motherboard. I could perhaps come up with a better clock switching algorithm, but this one seems infeasibly reliable.
Ah yeah. Glitch-free switching between two unrelated clocks can take a couple of cycles to complete :( I tried glitch-free switching on some of my early boosters, but by the time you had switched, the bus cycle had completed anyway. Some dedicated switching chips even took 3 slow clocks to complete, and actually kept the clock low the entire time. So I gave up on that approach. Switching between 2 related clocks, as my current V2.X series booster does, can be done pretty much instantly. Bizarrely, some of the Falcon bus speeders used related-clock switching, even though they were running from unrelated clocks. How they ever got it working is beyond me :roll:
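For anyone curious why glitch-free switching between unrelated clocks is so slow, here's a rough latency model of the textbook two-flop glitch-free clock mux (not of any particular chip, booster or the DFB1 logic). The select change is first synchronised to the old clock's falling edges before that clock is gated off, then synchronised to the new clock's falling edges before the new clock is released, with the output held low in between. The 50MHz/16MHz periods, the two synchroniser stages and the function names are all assumptions for illustration.

```python
import math
from typing import Tuple


def nth_edge_after(t: float, period: float, phase: float, n: int) -> float:
    """Time of the n-th falling edge strictly after t, for a clock whose
    falling edges occur at phase + k * period."""
    first = math.floor((t - phase) / period) + 1
    return phase + (first + n - 1) * period


def glitch_free_switch(t_request: float,
                       old_period: float, old_phase: float,
                       new_period: float, new_phase: float,
                       sync_stages: int = 2) -> Tuple[float, float]:
    """Latency model of a two-flop glitch-free clock mux.

    Returns (time the old clock is gated off, time the new clock starts).
    Between the two, neither clock is enabled and the output sits low.
    """
    # Select passes through sync_stages flops clocked by the old clock.
    t_off = nth_edge_after(t_request, old_period, old_phase, sync_stages)
    # Only then does the new side's synchroniser start counting its own edges.
    t_on = nth_edge_after(t_off, new_period, new_phase, sync_stages)
    return t_off, t_on


# Assumed clocks: 50 MHz fast (20 ns), 16 MHz slow (62.5 ns).
FAST_NS, SLOW_NS = 20.0, 62.5
for slow_phase in (0.0, 10.0, 31.25, 50.0):
    t_off, t_on = glitch_free_switch(0.0, FAST_NS, 0.0, SLOW_NS, slow_phase)
    print(f"slow phase {slow_phase:5.2f} ns: output low for {t_on - t_off:6.2f} ns, "
          f"slow clock running after {t_on / SLOW_NS:.2f} slow cycles")
```

In this model the output always sits low for more than a full slow period, because the new side needs two of its own edges after the old clock stops. That lines up with the "couple of cycles" and "kept the clock low" behaviour described, and with related clocks being able to switch almost instantly when the edge relationship is already known.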

Then when I moved to the SEC booster series, I just ran the CPU at full speed all the time. Given the same clock speeds as my V2.X series boosters, running the CPU fast constantly still increased overall speed, which I was a little surprised about.

I think it was the PAK boards that also had something like 90% or 95% ST-RAM access speed. I think only the TerribleFire series of boards actually had proper full-speed RAM access. But I've not looked into this for a long time, so memory is a bit fuzzy :lol: :roll:
https://www.exxosforum.co.uk/atari/ All my hardware guides - mods - games - STOS
https://www.exxosforum.co.uk/atari/store2/ - All my hardware mods for sale - Please help support by making a purchase.
viewtopic.php?f=17&t=1585 Have you done the Mandatory Fixes ?
Just because a lot of people agree on something, doesn't make it a fact. ~exxos ~
People should find solutions to problems, not find problems with solutions.
Badwolf

Re: DFB1r4 design discussion thread

Post by Badwolf »

exxos wrote: Mon Dec 06, 2021 2:45 pm
Badwolf wrote: Mon Dec 06, 2021 2:35 pm but this one seems infeasibly reliable.
Bizarrely some of the Falcon bus speeders used a related clock switching, even though they were running from unrelated clocks.How they ever got it working is beyond me :roll:
:lol: The reason I said my logic works infeasibly well is that it's technically a related-clock switching method, but provided I keep the slow clock on one side and the fast on the other, it all comes out in the wash, with the proviso that you occasionally get an 8MHz cycle (down to 4MHz in theory during boot -- the absolute bare minimum the 030 can handle!).
Then when I went to the SEC booster series, I just ran the CPU at full speed all the time. Given the same clock speeds as my V2.X series boosters, running the CPU constantly fast will still increase overall speed which I was a little bit surprised about.
I did try this extensively. I could get an *almost* reliable system, but certain things didn't play nicely. Bus arbitration became very hard. Palette switching in games seemed off. It was, on average, faster though.

This technique appeals to what little OCD I ever exhibit. It's the way it *should* work, and it irks me that it doesn't work as well as the bad-bad-naughty way. BUT: reliability first, so I'm OK with my method.
I think it was the PAK boards which also had something like a 90% or 95% ST-RAM speed access. I think it was only the terrible fire series of boards is actually a proper RAM speed access. But I've not looked into this for a long time so memory is a bit fuzzy :lol: :roll:
My suspicion is the 90-95% figures are a little bit of an artefact of the way the tests are done, but I might get the stopwatch out on my Frontier Benchmark (my approximation of a general workload) and try a head to head comparison.

BW
Badwolf

Re: DFB1r4 design discussion thread

Post by Badwolf »

OK, so because @exxos keeps needling me about syncing motherboard accesses to clock edges (because he's a big meanie and doesn't like 90% RAM access speeds), this week I've mostly been looking to find a way to keep it all stable whilst letting XAS fall the moment the CPU's ready.

The current status, which looks pretty promising:-
  • only perform the DSP-enabling UDS/LDS hold-on when actually accessing the DSP;
  • don't allow the clock to return to full speed until two full (slow) negative edges after a motherboard access.
I was hoping to avoid this second one, but its speed penalty doesn't seem great and it greatly improves AltRAM reliability (I can't quite see why yet, but the great SDRAM optimisation has not yet happened and may never happen if it just bloody works).
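The two rules in that status list can be sketched as a per-edge state machine. This is a Python model for illustration only, not the CPLD Verilog: the DSP address window is an assumption (check it against the Falcon hardware documentation), "two full neg edges" is read as two slow-clock falling edges with no motherboard access in progress, and the names are hypothetical.

```python
# Assumed Falcon DSP host-port window; verify against the hardware docs.
DSP_BASE = 0xFFFFA200
DSP_SIZE = 8


def needs_strobe_hold(address: int) -> bool:
    """Only hold UDS/LDS on for accesses that actually hit the DSP."""
    return DSP_BASE <= address < DSP_BASE + DSP_SIZE


class SpeedGate:
    """Counts slow-clock falling edges since the last motherboard access.

    Full speed is only allowed again after `hold_edges` complete falling
    edges have passed with no motherboard access in progress.
    """

    def __init__(self, hold_edges: int = 2) -> None:
        self.hold_edges = hold_edges
        self.idle_edges = hold_edges  # start idle, i.e. fast allowed

    def slow_negedge(self, mobo_access_active: bool) -> None:
        if mobo_access_active:
            self.idle_edges = 0  # any access restarts the count
        elif self.idle_edges < self.hold_edges:
            self.idle_edges += 1  # saturate at hold_edges

    @property
    def fast_allowed(self) -> bool:
        return self.idle_edges >= self.hold_edges


gate = SpeedGate()
history = []
for active in (True, True, False, False, False):
    gate.slow_negedge(active)
    history.append(gate.fast_allowed)
print(history)
```

Keeping the strobe hold behind an address decode means only DSP accesses pay for it, while the saturating counter is the cheap way to express "two idle edges" in logic.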

Here are the current figures:

Firstly, with regular TOS404 in motherboard ROM:-

[Image: IMG_4922.jpeg]


Secondly with regular TOS404 in flash ROM:-

[Image: IMG_4921.jpeg]


And lastly with MAPROM employed to map TOS404 into AltRAM:-

[Image: IMG_4923.jpeg]

Tests performed and passed so far include:
  • GB6 full run
  • BadMood first level playthrough
  • MiNT + networking + ssh + lines.app + Doom in truecolour mode.
This combination is normally sufficient to ferret out DSP and AltRAM issues. I do need to let Quake run for a bit to look for FPU issues too.



For reference, here's where I was on Monday:-


[Image: Monday's results]


(Ignore the headline figure -- the FPU has been reduced to 25MHz to let me fully close the case [the 40MHz oscillator is a bodge and is a bit too tall], so that figure's dragging the headline down, but it's intentional -- look at the individual measurements of your choice.)

BW
exxos

Re: DFB1r4 design discussion thread

Post by exxos »

Badwolf wrote: Fri Dec 10, 2021 1:56 pm OK, so because @exxos keeps needling me about syncing motherboard accesses to clock edges (because he's a big meanie and doesn't like 90% RAM access speeds), this week I've mostly been looking to find a way to keep it all stable whilst letting XAS fall the moment the CPU's ready.
:lolbig: :bravo:

Yeah, now you're too fast :lol: But that's pretty much normal because of the caches anyway. If you wanted to make 100% sure you're on track, you would have to benchmark a stock system with the instruction and data caches turned off, "save defaults", then run the tests with your booster turned on (caches still disabled); you should then, in theory, see closer to 100% RAM speed. Probably not worth the hassle anyway.

Great work nonetheless! :2k2:
Steve
Posts: 2570
Joined: Fri Sep 15, 2017 11:49 am

Re: DFB1r4 design discussion thread

Post by Steve »

@Bestwolf
Badwolf

Re: DFB1r4 design discussion thread

Post by Badwolf »

Cheers, guys.

So I had Quake (FPU variant) running under MiNT, with a background telnetd connection on the go monitoring the process. It ran for an hour at 50/25 (CPU/FPU MHz) before freezing.

I postulated this was cooking off as I don't have a decent heatsink on my 030 yet. So I reduced the clock to 40/20 and tried again.

This ran happily for two hours before quake itself crashed (telnetd was happily running underneath so I could still use the machine, albeit the screen had frozen). This suggests the first problem was very likely thermal.

So I'm going to call that good enough for this phase of optimisation. No-one's really going to try to run Quake on an 030 (about 2 seconds per frame, if you're interested!) whilst also running background tasks, and thermal management is left as an exercise for the reader.

I'm going to move on to incorporating the bodges to date into a rev5 board, consider building up a second one of these with the bodges in from the start to prove it's not a one-off, and maybe (maybe!) look at squeezing another cycle out of the SDRAM while I'm waiting for boards.

I've pretty much got what I wanted to build now.

There are some odd questions that I'd like to investigate in case they show up issues with my board. One is: why is EmuTOS not stable under MiNT when AltRAM is enabled, when TOS is? I postulated HD driver differences, so, after some tips from Christian, I tried to hack EmuTOS to bootstrap HDDriver like TOS does, but didn't get very far. I have another idea or two there, though.

Anyway. Very pleased. :)

BW