Show HN: A 16-bit Forth machine written in VHDL

jevinskie · on Aug 4, 2014

Have you heard of the two process design method? [0]

It makes your port maps super simple (and you can change them by editing just one file), and each entity consists of just two processes: one to update the entity's state on the clock and the other to generate the combinational logic using procedural, not dataflow, programming. With procedural code, you can step through it with a debugger just like with SW! Makes debugging much easier. Also, using records for the port maps makes the ModelSim waveform viewer much easier to use.
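To make the idea concrete, here is a minimal sketch of the two-process style on a made-up 8-bit counter. The names `counter_pkg`, `counter_in_t`, `counter_out_t`, `reg_t`, `r` and `rin` are only illustrative and are not taken from the paper or from either project in this thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package: one record per direction keeps the port map short.
-- Adding a signal later means editing only these record declarations.
package counter_pkg is
  type counter_in_t is record
    enable : std_logic;
    clear  : std_logic;
  end record;

  type counter_out_t is record
    count : unsigned(7 downto 0);
  end record;
end package;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.counter_pkg.all;

entity counter is
  port (
    clk : in  std_logic;
    rst : in  std_logic;
    d   : in  counter_in_t;
    q   : out counter_out_t
  );
end entity;

architecture two_proc of counter is
  -- All registered state lives in a single record.
  type reg_t is record
    count : unsigned(7 downto 0);
  end record;

  signal r, rin : reg_t;  -- r: current state, rin: next state
begin
  -- Process 1: combinational. Computes the next state procedurally
  -- in a variable, so a simulator can single-step through it.
  comb : process (d, r, rst)
    variable v : reg_t;
  begin
    v := r;                          -- default: hold state
    if d.clear = '1' then
      v.count := (others => '0');
    elsif d.enable = '1' then
      v.count := r.count + 1;
    end if;
    if rst = '1' then                -- synchronous reset
      v.count := (others => '0');
    end if;
    rin     <= v;                    -- drive next state
    q.count <= r.count;              -- drive outputs from current state
  end process;

  -- Process 2: sequential. Only copies next state into the registers.
  seq : process (clk)
  begin
    if rising_edge(clk) then
      r <= rin;
    end if;
  end process;
end architecture;
```

The sequential process never changes shape; all design changes happen in the record declarations and in the body of `comb`.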

You can see an example of it in a MIPS subset I wrote. [1] Any file with a _r.vhd or _p.vhd suffix is written in the two process style. I'd suggest looking at the ALU and I-cache, they are probably the simplest and cleanest entities.

I like your use of numeric_std. =) I'm surprised by how few times I've seen it used.

[0]: http://www.gaisler.com/doc/vhdl2proc.pdf

[1]: https://github.com/jevinskie/mips--/tree/master/project4/sou...

inforichland · on Aug 4, 2014

Yes, I have seen it, and I have used it a few times. For some types of modules, I really do like how it separates out the state from the logic itself; as a person who started writing firmware first, and then started logic design, the two-process method can make it feel less like software sometimes.

Everything I've read these days says to use numeric_std (I've read the reasons why, and it makes sense), so I just never even bothered using the std_logic_*signed libraries. I'll check out your MIPS project; I'm intending my next project to be a fully pipelined RISC machine :-)

jaekwon · on Aug 4, 2014

If this were to become a more serious (but still open/free) project, what is the projected speed, in your opinion, that can be achieved by a hobbyist with an FPGA board or similarly available technology, say by the year 2020?

I ask because this seems like one avenue to create a full stack open source/hardware machine whose security can be vetted by the community. I wonder if by 2020 we might be running your core on handsets.

On a related note, I wish I could run a USB stick that runs a ROM-BIOS burned into an FPGA stick. Is that possible today?

_wiv7 · on Aug 4, 2014

Is there any reason to trust closed-source FPGA synthesis and place and route tools any more than an Intel CPU?

jaekwon · on Aug 4, 2014

I hear of advances in program obfuscation. They're working on improving performance. Whether they are secure or not, I do not know.

But if it is possible to obfuscate a program, I think it should be possible to create something on an FPGA that is secure and private.

weland · on Aug 4, 2014

Sadly, there's more in an FPGA than "just" a dumb lookup table for implementing logic functions. There's not much more reason to trust an FPGA than an Intel CPU.

drdaeman · on Aug 4, 2014

Sounds interesting. I thought most of an FPGA is logic blocks, then there's some MCU that handles loading the design, maybe some ADCs and/or DACs, and not much more than that. The MCU, theoretically, could try to analyze your design and modify it, but that would require either a targeted adversary that knows the design or a good amount of computational power and fancy algorithms to analyze what's going on.

Do you have any links to good further reading on that topic?

retroencabulato · on Aug 4, 2014

The point is there may be an additional, secret, logic block which allows malicious access to the flops.

drdaeman · on Aug 4, 2014

There could be, but wouldn't it have to be aware of the schematic of the board the FPGA is soldered into? A secret block that just manipulates flops isn't enough; it must be controllable by someone.

Well, in theory it need not be, because it could detect "oh, this looks exactly like one of the popular Ethernet cores, so I'll tap those pins and have networking", but this seems like a hard task. Or it could be that every pin is hooked and a secret block waits for a specifically crafted code (somewhat like port knocking), but I'm not sure this is a feasible approach.

inforichland · on Aug 4, 2014

Honestly, I'm still on the new-ish end of FPGA design (a year or so), so I'm probably not qualified to answer the first one, but I'd have to guess probably not much faster? A fully pipelined CPU could theoretically achieve the maximum clock rate that a given FPGA family supports (i.e. not more than 1 level of logic between flops). As for the second question: yes, you could write your own BIOS in some HDL; there are several FPGA dev boards that are "USB sticks." See [1] and [2].

[1]: http://www.latticesemi.com/en/Products/DevelopmentBoardsAndK...

[2]: http://www.altera.com/b/nios-bemicro-evaluation-kit.html

fernly · on Aug 4, 2014

So what are some of the possible things one could do with this? What remains to make it the basis of an interactive FORTH system that could be hooked to a serial terminal and used, e.g. in a classroom? (Would that be a useful thing to do? Or would it be better as an embedded node?)

inforichland · on Aug 4, 2014

I guess the unstated goal of this was to create a small, simple and fast soft-core processor that could be used to augment FPGA designs (<15% of a small device's resources @ 100MHz).

Honestly, I wrote this just to scratch an itch. I've always loved the elegance of Forth, and having stumbled upon this (http://users.ece.cmu.edu/~koopman/stack_computers/index.html), I decided to create my own, with the goal of having single-cycle execution of all non-control-flow operations. Having said that, one could easily take it, probably fairly quickly implement the Forth interpreter, and use it in a classroom if they desired. I guess the explicit goal was an embedded node, but it's fairly flexible.

krivx · on Aug 4, 2014

How does the performance compare to a Forth implementation on an MCU? There are cheap, fast ARM Cortex parts available; is the "100 MHz 16-bit dual stack processor" part directly comparable?

inforichland · on Aug 4, 2014

Most likely it would be faster, as the Forth primitive words are implemented in hardware. The single-cycle words are: +, -, 2*, 2/, dup, not, @, !, >r, drop, r>, rot, -rot, swap, nip, tuck, over, =. Un/conditional jumps and calling a word take 2 cycles. Naturally, though, this only makes sense if you already have an FPGA in your design.

VLM · on Aug 4, 2014

No, it's not directly comparable at all.

The softcore is pretty small; it takes up roughly a tenth of a small, cheap (by FPGA standards) dev board. There are bigger, and smaller, soft cores.

If you insist on "one chip" vs "one chip" comparisons, you could be running benchmarks on about ten simultaneous cores in one FPGA, so the FPGA is roughly ten times faster than a single-core benchmark makes it appear. The advantage an FPGA provides is really smart custom peripherals. So if for whatever reason you need to do lots of floating point divides in your app, or perhaps in your benchmark, you stick 100 hardware FP dividers on the chip and suddenly your division benchmark absolutely smokes the ARM, which I believe has only one hardware FP division unit (or was it two?).

supahfly_remix · on Aug 4, 2014

alu.vhd, lines 40 - 42

      -- arithmetic
      alu_results.add_result <= std_logic_vector( signed( nos ) + signed( tos ) );
      alu_results.sub_result <= std_logic_vector( signed( nos ) - signed( tos ) );

This logic can be done with one adder instead of two. In two's complement, invert the 2nd operand and assert the carry in.
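A rough sketch of what that could look like, assuming the ALU selects add or subtract with a single control bit. The entity name `addsub`, the `sub` port and the `WIDTH` generic are made up for illustration and are not from the project's alu.vhd. The carry-in is folded into one `+` by appending an extra low bit to each operand, a common idiom for getting a single adder with carry-in out of synthesis tools.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical shared adder/subtractor; not the project's actual ALU.
entity addsub is
  generic ( WIDTH : positive := 16 );
  port (
    nos    : in  std_logic_vector(WIDTH - 1 downto 0);
    tos    : in  std_logic_vector(WIDTH - 1 downto 0);
    sub    : in  std_logic;  -- '0': nos + tos, '1': nos - tos
    result : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity;

architecture rtl of addsub is
  signal b            : unsigned(WIDTH - 1 downto 0);
  signal a_ext, b_ext : unsigned(WIDTH downto 0);
  signal sum          : unsigned(WIDTH downto 0);
begin
  -- Invert the second operand when subtracting: -tos = (not tos) + 1.
  b <= unsigned(not tos) when sub = '1' else unsigned(tos);

  -- Append a low bit to each operand. Bit 0 adds '1' + sub, which
  -- produces a carry into bit 1 exactly when sub = '1', so the "+ 1"
  -- of the two's complement negation rides along as the carry-in.
  a_ext <= unsigned(nos) & '1';
  b_ext <= b & sub;

  sum <= a_ext + b_ext;  -- the only adder in the design

  -- Drop the helper bit 0; the carry out of the top bit is discarded.
  result <= std_logic_vector(sum(WIDTH downto 1));
end architecture;
```

Since two's complement addition is the same bit pattern for signed and unsigned operands, using `unsigned` here gives the same `result` bits as the `signed` expressions quoted above. The trade-off is that add and subtract results are no longer available in the same cycle, so this only fits if the ALU needs one of them at a time.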