Old C code – how to upgrade it?
103 points by mikewarot on April 8, 2022 | 137 comments
I've got this code I'm interested in [1], and asked about 2 years ago, actually. My efforts at patching it didn't go well, in retrospect. [2]

It's a Forth variant that does typing, and lots of things that normal forth can't do... likely as powerful as Lisp, I think. It's become an obsession to get this thing running.

I do a wget, unzip, untar, ./configure, make, su make install and it works perfectly... under Debian 3

Anything newer, even with -std=c99, and it dies a horrible death.

Taking the executable from Debian 3 to Debian 11 (32 bit) and it works perfectly.

How do I, a non-C programmer, who is willing to dump tons of time into it, migrate this code into the year 2022?

Here's an example of something that seems opaque to me, from src/words.c:2012

     *(--(char*)hash_ptr.parm.v.p) = fpop(sst);
I have a hint as to what's going on here, but when you throw * in there twice, I think you can see how it's confusing.

1 - http://stoical.sourceforge.net/download.php

2 - https://github.com/mikewarot/stoical



I've been coding some 10-15% of my two-decade coding career in C (and C++), and have been mentoring people over video conferencing and a shared server accessed via VNC. Usually I do that for pay, but I'm interested in programming language implementations (I'm doing some of that myself)[1], and I've occasionally been working on implementing a Forth-like language myself[2], and so I'm willing to spend some time with you (say, 10 sessions, ~2 hours each) for free. Let's see what we can achieve, and I'll tell you at any point what the best way forward will be. I don't really share the opinion with some others who suggest to re-implement it in another language--for that you really have to understand the original code base first, and if you do, then it shouldn't be hard to fix the issues in C, either. I personally will probably suggest to leave it in C, it will most likely be less work, and it will have lighter dependencies. You can find my contact info from my profile.

PS. some of the first things to do will be to:

- if there's a way to get hold of previous versions, make a fresh Git history that includes them, so that the changes can be correlated with release notes

- use the sanitizers offered by today's compilers, or valgrind if having to compile it with old compilers.

- read the code about how it encodes pointers, and adapt that to 64 bit if needed

- maybe instrument the garbage collector if issues show up there, maybe move to allocating memory via malloc so that the memory sanitizer will catch violations

- write documentation about the workings that we study (that helps getting a clear mind and staying focused, and may help people in the future)

[1] https://github.com/pflanze/lilyvm [2] https://github.com/pflanze/copycat


Maybe also:

- Write lots of tests with high test coverage before migration. Then you can rely on them on a continuous basis to make sure any updates do not break the existing functionality


This sounds like it could be a fun exercise all by itself (I'm a test engineer, and I enjoy testing :D )


Yes, extending the test suite may definitely be worthwhile. Also, using AFL++ to collect non-crashing test cases can make this quick and somewhat exhaustive (while admittedly not giving hints about what might be broken as easily as manually created tests would).

Just in case you're interested in joining the sessions that I initiated, see https://news.ycombinator.com/item?id=30973570


This is the only way to ensure sanity is maintained through this arduous process.


I would pay just to watch over your shoulder while you did this.

It would be hard for me to not ask questions, though. :)


I'm not averse to doing a livestream on YouTube, or a Skype call, etc.


OK, I'm fine with opening it up, too, and we should probably do so, as that may give more input. There have been lots of fixes posted already, and I haven't checked whether they solve everything (this would be a bit counter to helping in a natural way), so I don't know whether there's actually anything left to do. In either case, I've put up a page which will contain updates on the events here[1].

Public live streaming is new to me and so far I've evaded any public records of my video and voice, maybe that's pointless since this will become increasingly impossible into the future, but I'm undecided and so far am planning for conference calls that, while open for anyone to join (see [1]), will not make recordings public. I might be swayed. I'm also still not omniscient in the C world (the most glaring is probably that I've never used the GNU autotools!), OTOH I'm yearning for somewhat higher levels of abstraction so my C coding is often untypical, so YMMV if you're after learning standard ways of working in C.

[1] https://github.com/pflanze/stoical-mentoring


My concern with the current setup is that I am in the GMT+10 timezone. This means that some 'livestream' events are not able to be viewed due to work or life commitments. I will keep an eye on this topic though.


Fair point. I might make an edited version of a recording available, but can't make promises at this time.


I would definitely watch a livestream on YouTube



I think lots of people will be interested, keep us posted if the live streaming really happens!


See https://news.ycombinator.com/item?id=30973570 -- just join the Jitsi session, if that's OK with you (say if not, I'd be wondering why).


Livestream preferred for me, twitch/yt. no skype here.



Thanks for the offer, I'll get in touch.

Copycat looks interesting. I can see how Stoical and Copycat are mirrors of each other.

As for LilyVM, trying to cram anything into a 6502 is an interesting exercise.


> cram anything into a 6502

I noticed some people working on an LLVM backend for the 6502; I may revisit the C-64 target once that matures (it will generate better code than cc65, and I'm now relying on modern C features that cc65 doesn't support). I currently concentrate on getting LilyVM to do anything useful for me on modern systems. (I've got some more work that I haven't pushed yet; but I mainly need to finally start working on the compiler to bytecode, which is a larger can of worms than the VM. I might first re-implement Copycat on it.)


I'll be up to help. I've been programming in C for 30 years now (and still do). One of my "when I'm bored" projects is updating an early 90s C code base [1]. And I do have an interest in Forth, so this sounds like an interesting project. My email is on my profile.

[1] Viola, one of the first graphical web browsers that assumes sizeof(int) == sizeof(long) == sizeof(every pointer).


I'd be very happy for you to join some of the sessions that I initiated (if they will take place): https://news.ycombinator.com/item?id=30973570


Taking a quick look at the source it doesn't look too bad all things considered. First off, assume that there's going to be some sort of 32bit->64bit issues (likely casting pointers into 32bit words). So, start off on a 32bit setup, get things to cleanly build on -std=gnu99 (no need to cause yourself extra pain if it's optional). Then tidy things up with warnings (-Wall -Wextra on clang&gcc) such that the compiler can help you spot any existing bugs. Next up is the transition to 64bit. You're likely going to have to spot any pointer manipulation and where possible change 'int', 'long', etc into types which specify their length to make reading the code easier (subjective, but that's my opinion) e.g. uint32_t, uint64_t, etc. Then you ought to be pretty close to having all system tests pass on 32bit and the 64bit port.
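
To make the 32-bit to 64-bit concern concrete, here is a minimal sketch of storing a pointer in an integer-sized cell without truncating it. The names `cell_t`, `ptr_to_cell`, `cell_to_ptr` and `roundtrip` are hypothetical and not taken from the Stoical source; the only point is that `uintptr_t` is sized to hold a pointer, whereas a plain `int` is not on x86-64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cell type: wide enough to hold a data pointer on any
 * platform that provides uintptr_t, unlike a 32-bit int on x86-64. */
typedef uintptr_t cell_t;

/* Compile-time check of an assumption old code often leaves implicit. */
_Static_assert(sizeof(cell_t) >= sizeof(void *),
               "cell_t must be able to hold a data pointer");

static cell_t ptr_to_cell(void *p)
{
    return (cell_t)p;
}

static void *cell_to_ptr(cell_t c)
{
    return (void *)c;
}

/* Sends a pointer through a cell and back, then reads through it. */
static int roundtrip(int *p)
{
    int *back = cell_to_ptr(ptr_to_cell(p));
    assert(back == p);   /* would fail if the cell truncated the address */
    return *back;
}
```

With a 32-bit `int` in place of `uintptr_t`, the same round trip can silently drop the upper half of the address on a 64-bit system, which is the class of bug to look for first.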

I'd be worried if the code was old enough you were seeing K&R C notation or if it was a huge codebase, but it doesn't look like either case has occurred.

Of course if you're very much not a C programmer, learn some C. It's a comparatively small language overall (IMO) even if it's a low level one. Newer versions of the K&R book should get you up to speed pretty quick.


Here's a set of C compiler flags that I find especially useful to prevent bugs from occurring:

-Werror=implicit-function-declaration -Werror=implicit-fallthrough=3 -Werror=maybe-uninitialized -Werror=missing-field-initializers -Werror=incompatible-pointer-types -Werror=int-conversion -Werror=redundant-decls -Werror=parentheses -Wformat-nonliteral -Wformat-security -Wformat -Winit-self -Wmaybe-uninitialized -Wold-style-definition -Wredundant-decls -Wstrict-prototypes


On small projects, with clang, I use "-Weverything -Werror" and then I start fixing the issues one by one. I also either disable anything I'm willing to live with using -Wno-<warning> or I use '#pragma clang diagnostic ignored "-W<warning>"' if it's only relevant for specific portions of the code.


I'll be trying that magic incantation shortly.

[Edit] I tried it, and it didn't make anything worse.

What I was hoping to find (and spent a few days looking for) was some set of flags I could give to a modern copy of gcc to make it hold its nose and compile this code as-is, as a starting point.

Git under Debian 3 seems to be a no-go. I'm tempted to just have two virtual machines that are never on at the same time mount a separate virtual disk that holds the Stoical source code and git repository.

Make changes / test in Debian 3... when happy shut it down, fire up Debian 11 and do a commit and push to github. Shut it down, fire up Debian 3, repeat.


I'm not sure how much this helps, but better fire up your Debian 3 inside a virtual machine. Then you can copy files with scp and commit them to git or whatever other version control on your main host.

And those flags should be used first on the Debian 3 machine, at least those which are supported there.

In my experience, jumping straight to the new version and making it "hold its nose" only ever works if you already know what you're doing.


With vagrant, the guest's /vagrant directory is whatever host directory contains the Vagrantfile. Using that seems even easier than scp. You can also set up some other shared folder if you're less lazy than me.


Note that this may detect a lot of problems that are not necessarily fatal. It's just that enabling those warnings and sticking to it prevents errors from appearing once you earnestly start working with the code - like you will need to in order to modernize it.


If you use Docker/vagrant you should be able to do your editing/version control on the host machine and handle only compilation on the VM.


What problems are you running into getting git on debian 3?


I tried installing git... it turns out that in Debian 3 git is the Gnu Interactive Tool, not a source code management program ;-)

So I downloaded the latest version of git source, using wget from

https://mirrors.edge.kernel.org/pub/software/scm/git/git-2.9...

Then I found I needed zlib, from

https://www.zlib.net/zlib-1.2.12.tar.xz

And I also needed Tcl/Tk

https://prdownloads.sourceforge.net/tcl/tcl8.6.12-src.tar.gz

And I also needed autoconf

http://ftp.gnu.org/gnu/autoconf/autoconf-2.71.tar.gz

Autoconf needed a newer version of m4

http://ftp.gnu.org/gnu/m4/m4-1.4.19.tar.gz

m4 couldn't install... I'm not sure why... so I gave up at that point. 8(


Yes! Was going to suggest much the same: crank up the warnings and treat them as errors!


I'll second the suggestion to attack one problem at a time. A very long time ago I was on a team that built a desktop application in Qt + Visual C++ for Windows (98/ME/2K/XP), and we wanted to port it to Mac OS X (10.3) on PowerPC.

First, we addressed the compiler issue, getting it to build with GCC instead of VC++ on Windows. This was the most time-consuming step.

Second, we built on Linux x86 with GCC, because we were quite familiar with it, and this enabled..

Third, we built on Linux PowerPC with gcc, which was the little -> big endian step.

Finally, we got it running on OS X.

For your issue in particular, I'd try to stick with 32-bit x86 over x86-64 to start, and inch my way through newer Debian and gcc versions. You could install gcc-4.2 as late as Debian 5, I think.


The code is doing a lot of preprocessor macro magic, though, which rather obscures some of what is going on.


Macro-expanding the files via the -E option (and then passing the result through clang-format) can help in these cases.


Maybe the compiler is telling you all the time what is wrong, see this warning:

    kernel.h:181:19: warning: operation on ‘sst’ may be undefined [-Wsequence-point]
      181 | #define fpop(s) ((--s)->v.f)
          |                  ~^~~~
    words.c:2185:27: note: in expansion of macro ‘fpop’
     2185 |         if ( fpop(sst) <= fpop(sst) )
          |                           ^~~~
For this piece of code:

    /**(binary) ge
     * "1 1 ge"
     * TRUE if TOS-1 is greater than or equal to TOS.
     */
    begin(ge)
     if ( fpop(sst) <= fpop(sst) )
      fpush(sst,TRUE);
     else
      fpush(sst,FALSE);
    end()
That code does assume that the first fpop pops the top of the stack (TOS), and the second one pops the element below that (TOS-1), so "0 1 ge" should return false. But current gcc versions evaluate the calls in the opposite order, so it will return true: https://godbolt.org/z/v4TdvE83b

Also, you win your nerd sniping badge.
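
One way to remove the dependence on evaluation order is to pop into named temporaries, one statement per pop. The following is only a sketch: it uses a simplified stack of doubles with hypothetical names (`struct fstack`, `word_ge`) instead of the real cell structures and macros, and it assumes 1.0 and 0.0 as stand-ins for TRUE and FALSE.

```c
#include <assert.h>

/* Simplified stand-in for the interpreter stack; the real one holds cells. */
struct fstack {
    double data[16];
    int top;            /* number of elements currently on the stack */
};

static void fpush(struct fstack *s, double v)
{
    assert(s->top < 16);
    s->data[s->top++] = v;
}

static double fpop(struct fstack *s)
{
    assert(s->top > 0);
    return s->data[--s->top];
}

/* "a b ge" leaves TRUE if a (TOS-1) is greater than or equal to b (TOS).
 * Each pop is its own statement, so the order is fixed by the language
 * instead of being left to the compiler. */
static void word_ge(struct fstack *s)
{
    double tos  = fpop(s);   /* b, popped first  */
    double tos1 = fpop(s);   /* a, popped second */
    fpush(s, tos1 >= tos ? 1.0 : 0.0);
}
```

With this shape, "0 1 ge" yields FALSE and "1 1 ge" yields TRUE regardless of compiler version. The same treatment would apply to any other word that uses the pop macro twice in one expression.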


Basically, "dump tons of time" will probably involve some aspect of becoming a C programmer.

The program has some kind of portability bug or bugs that make it sensitive to the compiler. That the old executable works probably means that it's not a library or environment issue, though that can't be ruled out 100% because an executable from Debian 3 may be using compatibility functions in glibc that a Debian 11 executable wouldn't use.

A C programmer would just debug this on that platform where the problem reproduces.

There are tools which can help pinpoint the origins of behavior that leads to a crash.

- compiling the code with the "undefined behavior sanitizer a.k.a. ubsan" using -fsanitize=undefined.

- compiling the code also with the "address sanitizer (asan)" using -fsanitize=address.

- using valgrind on it

Of course you have the debugger gdb, and the printf function.

Interestingly, the project isn't using optimization (-O0 is specified in the CFLAGS of the root Makefile) so the failure is likely not coming from that class of issue whereby clever optimization defeats the programmer's incorrectly expressed intent.


P.S. turn on -O2 optimization; it could flush out some more warnings. For instance, I think, unused local variables aren't diagnosed without optimization. That could hide a bug (because the intent was to use the variable, but it was misspelled, landing on a different variable). Basically, you don't get all the diagnostics you could be getting at -O0 (zero) optimization.


+1 for all of this!

Other things that may help find issues:

* the -pedantic -Wall -Werror flags

* building with -ggdb3, run the program via gdb. With any luck it will lead you to a stack trace


> Basically, "dump tons of time" will probably involve some aspect of becoming a C programmer.

Yep, I figured as much. I want this thing to exist, and be multi-platform, and supported... so I'm willing to do that.


> It's a Forth variant that does typing, and lots of things that normal forth can't do

congratulations ! like a pop song you cannot get out your head, you have found and bonded with something rare, special and worthy.. but like an old, old automobile from your Uncle's backyard, it might take endless efforts for modest (or worse) rewards

> make install and it works perfectly... under Debian 3

you have your first target.. make a VirtualBox VM with Debian 3, install your target software, and guard it well !! backups, small environment improvements.. connection to the Internet even?! small steps are called for..

> Anything newer, even with -std=c99, and it dies a horrible death

standard C99 is not a terrible thing.. the libraries that are linked are the big weakness there, but despite lots of complaining from modern coders, the libraries actually do work for you know, your own use.. that's what you are building, something for one person to use, you!

> I, a non-C programmer, who is willing to dump tons of time into it, migrate this code into the year 2022

you already know the answer to this.. there is no way to port C code to standard C99 without knowing C. So, you have to cooperate with others somehow. This post is not a bad start, but if you are looking for volunteers, there has to be benefit for them. A random other person who also finds this obsessively compelling.. probably not a realistic outcome, even with the massive audience at YNews. There are legions of middle-aged guys who know C at the C99 level, but for whatever reason are not professionally programming right now. This might be a local search, in "meat space" or, alternatively worded invites, in other forums or mailing lists. Good luck!


has push song been tried to get the pop song out of head?


Thanks so much for all the help.

Backstory - About a week ago, I got the original compiled, learning about autoconf and make a bit along the way. I turned on ALL of the debugging options to try to figure it out.

Stoical is a variant of Forth from BEFORE Fig-Forth, so I didn't even realize it was working at first. It uses = to print, not . which took a while to figure out. Once the penny dropped, and I had it working, I was able to turn debugging back off and find that it works pretty much out of the box. I replicated it on a new Debian 3 VM and it does indeed do so.

Unfortunately, GIT doesn't come with Debian 3, and my attempts to compile it ran into far too many prerequisites, including zlib, Tcl/TK, autoconf (which requires m4, but a newer m4), eventually I quit that approach.

How it works:

There's a small set of built-in words, which it then uses to bootstrap an interpreter (14 words long) which then loads in the rest of the language. All of this works right out of the box in Debian 3. The executable compiled under Debian 11 starts to run, but can't bootstrap itself. Interestingly, the image compiled under Debian 3 does work in its place.


I have bad news for you: If the project works perfectly on an older C compiler, but crashes hard on a more recent one, then that means the code has undefined behaviour.

Probably it's doing some type-punning - using casts or unions to access the same memory as multiple different types. This is a pretty normal thing to do in an interpreter; sometimes you will bend the rules for performance reasons, and get away with it, too. At least you used to get away with it in the old days. But as modern compilers get better at exploiting the rules of undefined behaviour to improve performance in correct programs, some of the things you would get away with before no longer work.

Undefined behaviour can be superficial and easy to fix, but I'm afraid that in this type of project it can be deeply rooted in the data structures.

And that is what makes me a bit pessimistic on your behalf, esp. as you say you're not a C programmer. Because you need a solid understanding of the C memory model and undefined behaviour rules to deal with something like this. Although, after taking a quick look at the source, I have to say that it does look well-structured and pretty thoroughly commented, so maybe there is a chance.

What I would look for is unions and void pointers, where the program writes using one type and reads using a different type. That ".v.p" thing looks like a good candidate for where to look: Is a "value" union field always read from the same field that was last written? Is the value.p void* always cast to the correct type that was assigned to it?
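
One way to audit the "is the field read the one that was last written?" question is to add a tag next to the union and check it in every accessor. This is a sketch with a hypothetical layout and hypothetical names (`struct cell`, `cell_float`, `cell_ptr`); the real Stoical cell may be organised differently.

```c
#include <assert.h>

enum cell_type { CELL_FLOAT, CELL_POINTER };

/* Hypothetical cell: the tag records which union member was last written. */
struct cell {
    enum cell_type type;
    union {
        double f;
        void  *p;
    } v;
};

static struct cell cell_from_float(double f)
{
    struct cell c;
    c.type = CELL_FLOAT;
    c.v.f = f;
    return c;
}

static struct cell cell_from_ptr(void *p)
{
    struct cell c;
    c.type = CELL_POINTER;
    c.v.p = p;
    return c;
}

/* Accessors abort (unless NDEBUG is defined) when the wrong member is read. */
static double cell_float(const struct cell *c)
{
    assert(c->type == CELL_FLOAT);
    return c->v.f;
}

static void *cell_ptr(const struct cell *c)
{
    assert(c->type == CELL_POINTER);
    return c->v.p;
}
```

Routing every `.v.f` and `.v.p` access through checked accessors like these turns a silent mismatch into an immediate assertion failure with a file and line number.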


Some of this you may be able to hack around via disabling compiler optimizations. For example, -fno-strict-aliasing will disable optimizations that rely on assuming objects of different types will never alias (an assumption that gets violated by type punning).

For that matter, if you're not starting on -O0, you probably should. Don't enable optimizations until it's working correctly.

Of course, if the issue is that the sizes of types changed (e.g. 32 to 64 bit), no amount of flags will fix that automatically for you. Though you could still install a 32 bit compiler on a modern distro.
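
Where the code really does need to reinterpret the bytes of one type as another, copying with `memcpy` is a standard-conforming alternative to pointer-cast punning and does not depend on -fno-strict-aliasing. A minimal sketch follows; `float_bits` and `bits_to_float` are hypothetical helpers, and the code assumes a 32-bit `float`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pointer-cast punning such as *(uint32_t *)&f breaks the aliasing rules
 * that -fno-strict-aliasing papers over. Copying the bytes is well defined,
 * and optimizing compilers typically turn the memcpy into a plain move. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);   /* assumption: float is 32 bits */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

This only addresses aliasing. It does nothing for size changes between 32-bit and 64-bit builds, which need to be handled separately.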


Use the gcc -fdump-translation-unit switch: compile the old code, then compile the new code.

Compare the differences between the old gcc AST and the new gcc AST (or the differences between the ASTs generated when different compile switches are used).

Doesn't mean there aren't assembly language differences, but will narrow things down.


I've got a similar issue: I still have my own source code from a cool little DOS game from 1991 or so (running at 50 or 60 Hz, don't remember which: but in any case it was hard stuff for the times). It's a mix of C and some assembly but... I don't have the tooling anymore and, heck, have no memory whatsoever as to which compiler I was using back then. I still have some .exe and they run fine under DOSBox (for example).

I don't know if my best bet is to start from another project that works/compiles/run and then port my old code to that or if I should first try to compile it again.

I was "wise" enough to backup all my code and assets (gfx, music, sound fx) and executables, but foolish enough to not backup the tooling. I never thought that 30 years+ later I'd like to compile it again.


Well, DOS is still a thing. Was it Turbo C? I know DOS and Assembly.

If you have the .exe, we can figure out which compiler was used.


Oh I didn't think of analyzing the .exe to determine which compiler was used, silly me.

I'll drop you an email (but I'm in the middle of moving to another country so things are a bit hectic atm).


I don’t blame you for having a hard time, as that snippet you’re showing is not legal C (you can’t assign to a cast). --(char*)x is not legal C. Old compilers were more permissive though. Unfortunately, you are going to have to both become a proficient C programmer and then figure out what the intent of that code was and rewrite it to legal constructs.


What do you mean? A cast to a char pointer is absolutely valid C. For most intents and purposes, a pointer "type" in C is merely the step size, and only void pointers cannot be decremented.

The line of code is basically two lines:

  (char*)hash_ptr.parm.v.p = (char*)hash_ptr.parm.v.p - 1; // move the pointer back by one byte
  *((char*)hash_ptr.parm.v.p) = fpop(sst); //At the memory location, write the output of fpop(sst)


It’s not the cast to the char pointer that’s a problem, it’s decrementing the casted pointer that is the problem. (char*)hash_ptr.parm.v.p is not an lvalue so you cannot assign to it.

    (char*)hash_ptr.parm.v.p = (char*)hash_ptr.parm.v.p - 1;
is not legal C, neither is

   --(char*)hash_ptr.parm.v.p


It's probably a use of this: https://gcc.gnu.org/onlinedocs/gcc-3.2.3/gcc/Lvalues.html - not clear that recent gcc versions support it. Frankly, this one line of code suggests to me that anybody looking to get started on their C career with this program is going to have a harder time than necessary ;) - though, who knows.

(If I were doing this, I'd probably change the p field to a char *. That's probably what it wants to be anyway. Then see what comes of that.)


No, that extension has been deprecated in 3.4 and removed in 4.0.


The OP is running this on Debian Woody, which used GCC 2.95.


The OP is wondering why this does not work with a current gcc.


char --> signed 8 bits, no less/no more than 8 bits used.

(*char) --> compiler will detect how to align char address for given hardware.

aka, 32bit machine,there are 4 places that a byte can evenly fit in 32 bit address/pointer.

May not be a gcc issue.

Directly manipulating a memory address (aka the numeric location where value is stored) usually prohibited by OS (or not allowed by hardware used) for security reasons.

May also be reason why works under Debian <X> and not Debian <Y>


It is a gcc issue, in that this code is written using a non-standard gcc-specific dialect of C.

In standard C, -- operates on what in C terms is known as a modifiable lvalue, a value that may be used on the left hand side of an assignment expression (https://en.cppreference.com/w/c/language/value_category). And a cast returns a value that is specifically not an lvalue of any kind (https://en.cppreference.com/w/c/language/cast), meaning a cast expression is not a valid operand for the -- operator.


The 'fun' part about minimal addressable memory sizes (assuming hardware supports) is this:

if addresses are 64bits, and only 48 bits are used for addressing, there are 16 bits available for other use!


A cast does not yield an lvalue in C (in C++ it can, if it is to a reference type). See:

http://port70.net/~nsz/c/c89/c89-draft.html#3.3.4

Footnote 36.


So, after all I've got it to compile and more or less work with GCC 9.4.0. It uses some GCC-isms (goto the address of a label), so it is probably not very portable:

https://github.com/fhars/stoical
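
For readers who have not met the "goto the address of a label" GCC-ism: it is the labels-as-values extension, where `&&label` yields the address of a label and `goto *expr` jumps to it. Clang accepts it as well, but it is not standard C. The sketch below shows how a small interpreter loop can dispatch with it; the opcodes and the `run` function are hypothetical and not taken from Stoical.

```c
#include <assert.h>

enum { OP_PUSH1, OP_ADD, OP_HALT };

/* Runs a tiny bytecode program and returns the top of the stack. */
static int run(const unsigned char *code)
{
    /* GNU C extension: &&label is the label's address as a void *.
     * The table order must match the opcode enum above. */
    static void *dispatch[] = { &&op_push1, &&op_add, &&op_halt };
    int stack[16];
    int sp = 0;

#define NEXT() goto *dispatch[*code++]
    NEXT();

op_push1:
    assert(sp < 16);
    stack[sp++] = 1;
    NEXT();

op_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT();

op_halt:
    assert(sp >= 1);
    return stack[sp - 1];
#undef NEXT
}
```

A program of `OP_PUSH1, OP_PUSH1, OP_ADD, OP_HALT` returns 2. Porting such code to a compiler without this extension usually means replacing the jump table with a `switch` inside a loop.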


Wow... words fail me... thank you SO much for this.


I had actually got this to build and run. These are the changes I made to get it to compile under gcc 4.6.3 with -std=gnu89:

  --- stoical-0.1.8-old/src/debug.c 2002-05-05 06:41:19.000000000 -0400
  +++ stoical-0.1.8-new/src/debug.c 2020-10-01 14:32:34.000000000 -0400
  @@ -139,7 +139,8 @@
     else if ( strcmp( (*p)->name, "l()" ) == 0 )
     {
      p++;
  -   st_pcell( ((cell*)p)++ );
  +   st_pcell( ((cell*)p) );
  +   p = (cell*)p + 1;
      p--;
     }
     else if ( strcmp( (*p)->name, "r()" ) == 0 ||
  diff -ru stoical-0.1.8-old/src/help.h stoical-0.1.8-new/src/help.h
  --- stoical-0.1.8-old/src/help.h 2002-04-25 16:06:20.000000000 -0400
  +++ stoical-0.1.8-new/src/help.h 2020-10-01 09:26:45.000000000 -0400
  @@ -1,18 +1,18 @@
  -const char help[] = "S t o i c a l  %s 
  -Copyright (c) 2002 Jonathan Moore Liles.
  -Released under the GNU General Public License (see COPYING)
  -
  -usage:
  -stoical [ option ] ... [ file ] [ arg ] ... 
  -
  -The available options include:
  -
  --h   - Print this help and exit.
  --v   - Print version information and exit.
  --e code   - Exectue the instructions 'code'.
  --l path   - Change STOICAL root library path.
  --f file   - Run 'file' (useful when file begins with a '-').
  -
  -";
  +const char help[] = "S t o i c a l  %s "
  +"Copyright (c) 2002 Jonathan Moore Liles."
  +"Released under the GNU General Public License (see COPYING)"
  +""
  +"usage:"
  +"stoical [ option ] ... [ file ] [ arg ] ... "
  +""
  +"The available options include:"
  +""
  +"-h   - Print this help and exit."
  +"-v   - Print version information and exit."
  +"-e code   - Exectue the instructions 'code'."
  +"-l path   - Change STOICAL root library path."
  +"-f file   - Run 'file' (useful when file begins with a '-')."
  +""
  +"";
  diff -ru stoical-0.1.8-old/src/words.c stoical-0.1.8-new/src/words.c
  --- stoical-0.1.8-old/src/words.c 2002-05-07 06:54:02.000000000 -0400
  +++ stoical-0.1.8-new/src/words.c 2020-10-01 14:33:49.000000000 -0400
  @@ -2009,8 +2009,8 @@
   begin(hash_put)
    /* Place character represented by the value at TOS into
     * the buffer. Decrement the pointer, and increment the count. */
  - *(--(char*)hash_ptr.parm.v.p) = fpop(sst);
  - 
  + *((char*)hash_ptr.parm.v.p) = fpop(sst);
  + hash_ptr.parm.v.p = (char*)hash_ptr.parm.v.p - 1;
    hash_cnt.parm.v.f++;
   end()
   begin(hash_a)
  @@ -2789,7 +2789,8 @@
    ip++;
    a = (cell*)ip;
    
  - ((cell*)ip)++;
  + ip = (cell*)ip + 1;
    ip--;
   
    push(sst,*a);


> ...

     /* Place character represented by the value at TOS into
     * the buffer. Decrement the pointer, and increment the count. */
  - *(--(char*)hash_ptr.parm.v.p) = fpop(sst);
  - 
  + *((char*)hash_ptr.parm.v.p) = fpop(sst);
  + hash_ptr.parm.v.p = (char*)hash_ptr.parm.v.p - 1;

Are you sure that making assignment before the predecrement is how this fragment should be functioning?


The two lines should be swapped.
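
Written out in standard C, the original `*(--(char*)p) = x` decrements first and stores second. A minimal sketch, using a hypothetical helper `put_char_before` in which `*pp` stands in for the `void *` field `hash_ptr.parm.v.p`:

```c
#include <assert.h>

/* Standard C equivalent of the old GCC-only "*(--(char *)*pp) = c":
 * step the pointer back one byte first, then store through it. */
static void put_char_before(void **pp, char c)
{
    assert(pp && *pp);
    char *q = (char *)*pp - 1;   /* decrement ...                 */
    *q = c;                      /* ... then store ...            */
    *pp = q;                     /* ... and save the new position */
}
```

Storing before decrementing, as in the posted patch, writes each character one byte too high and leaves the pointer aimed at a byte that has not been written yet.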


Did the examples work for you? I got the threads example to run on an M1 building with clang, but the dir example fails.


I didn't test the examples. I sort of stopped once I got `1 2 +` to work.


I think the third chunk should have its two new lines swapped.


There's nothing exceptional about the example really; the inner * is a cast to string (or char pointer), and the outer is a de-reference to set the value at the specified address (which for unknown reasons is 1 byte before hash_ptr.parm.v.p) to whatever fpop(sst) returns.

Writing interpreters in C is a solid path to really learning the language from my experience, but jumping head first into a code base like that without knowing C is a recipe for frustration and suffering.

There are a million more or less advanced, custom Forths out there; here are some of my own implemented in various languages:

https://github.com/codr7/ampl

https://github.com/codr7/snabel

https://github.com/codr7/alang

https://github.com/codr7/gfoo

https://github.com/codr7/forthy2


There are likely problems with the code even under Debian3 too if it won't build on more modern toolchains.

First, make some simple test case scripts that work on Debian 3, so you can cheaply get a feel for whether it still works once it's building again. Run these after every build.

You might want to run it under valgrind on Debian 3 and fix any bugs coming up there before trying to clean it on a modern toolchain. Because it will be much easier to find and fix those when it runs well and you don't have to guess if the problem is the newer toolchain.

You might want to crank up the compiler warnings step by step on Debian 3 first, and fix those first.

Then bring the cleaned and still working code to a new toolchain, under valgrind.

Don't worry if there are pages of compile errors... these are usually a handful of things wrong appearing in multiple places.


I would compile it with address sanitizer. It's very easy and already included in gcc. You basically only need to add -fsanitize=address to the compiler and linker flags, plus -static-libasan.

Then you run it, and when it dies, address sanitizer tells you where the memory corruption occurred. If the issue is pointer casting you 100% find it.

Then you can try to fix it. Most likely it's something like this:

Say you have a "long*" and the code assumes "ok a long has 4 bytes". Then casts the long* to a char* and does some fancy low level algorithm c programmers love.

But now you compile on debian 11 and suddenly a long is 8 bytes. And then ... it dies a horrible death.


There's a tool in the zig toolchain called `zig translate-c`, which does exactly what it sounds like. You might find some success in understanding the translated zig code for C snippets that you don't understand.


> *(--(char*)hash_ptr.parm.v.p) = fpop(sst);

Yeah that’s not very portable. It will work on x86 but will die on MIPS/SPARC a horrible bus error based death unless there are special invariants on that pointer dictated elsewhere.


No, that doesn't even work on X86:

    In file included from kernel.c:181:   
    words.c:2012:11: error: lvalue required as decrement operand
     2012 |         *(--(char*)hash_ptr.parm.v.p) = fpop(sst);
          |           ^~


It used to, as a GCC extension, which was removed in version 4.0.


Every time I approach this problem, it frustrates and educates, even in the last 24 hours.

I got a message from Florian Hars, who got it to compile and mostly working on Ubuntu 20.04 with GCC 9.4.

Amazing

https://github.com/fhars/stoical

I also got a message from Joshua Saxby, who put a ton of work into it as well. Like me, just getting it past the compiler was the goal.

https://github.com/saxbophone/stoical/tree/josh/cmake-build

Since Florian actually got it working, my instinct is to just import Florian's version into git. Does that seem reasonable? I can then start digging into the warnings, documentation, etc. and improve it from there.

Thanks again to everyone for your help!


> ...Since Florian actually got it working, my instinct is to just import Florian's version into git. Does that seem reasonable?

Bringing the original code to a runnable state is fairly straightforward using the latest gcc with mostly default options. Either with or without -Wstringop-overflow=0 (to take care of the string type design). Just the three funky lvalue decrements need to be untangled. Well, and the newlines added in the help.c

However, getting the resulting interpreter to pass the available tests (make test) could be a challenge. Not sure if anyone interested got these tests, which are mainly array and hash tests, to pass fully. It'd be helpful to have the included examples also run properly.

This may need either a code review or in-depth debugging to figure out the problems. Before embarking on extensive cleanup, modernization, porting to other platforms.


You could do a binary search on Debian 3 - start upgrading components of the toolchain until it breaks, and then play around with whatever you upgraded.

Likely there's a version of GCC where it works, and one where it doesn't, or the equivalent for a library.

Getting the latest GCC to build on Debian 3 might be an adventure in itself, of course.


I tried the opposite, compiling gcc 2.95.3 on Debian 11... it didn't like it one bit.


Whatever is breaking 2.95.3 on D11 is probably also breaking the other program, likely a 32bit library or something.


Before anything you need a test suite with good coverage of both lines and functionality. I recommend finding a way to mutate the code randomly then using that to find bad test coverage.

If you don't have good tests you will be completely lost and unable to make changes incrementally.


Probably do the same thing that the Linux Kernel did when it wanted to move to a newer version of C.[0] Make the change, see what breaks.

[0] https://lwn.net/Articles/885941/


I've been trying to do that... but the code is so opaque in terms of pointers and pointer math that I can't really tell what's going on.

Example:

     *(--(char*)hash_ptr.parm.v.p) = fpop(sst);

I know that pointer math in C is weird, and increment/decrement aren't necessarily +/- 1, but rather the size of the object pointed to??? This is one of the first things that blows up when you compile it with a modern gcc. src/words.c - line 2012


The char* cast there makes it explicit what the decrement operator does here: it decrements hash_ptr.parm.v.p by one because the value is treated as a pointer to char.

Maybe the problem is that fpop() returns something else than a char?


The operand of -- must be a modifiable lvalue. The result of a cast is never an lvalue.

EDIT: This might be use of generalized lvalues: https://gcc.gnu.org/onlinedocs/gcc-3.2.3/gcc/Lvalues.html - latest gcc manual doesn't mention it though, so perhaps it's been removed?


Yes, it has been gone since gcc-4.


So it decrements the pointer by 1, then uses that updated pointer to store a character in memory (the outer *)?


Yup. -- decrements by 1 * sizeof(type_being_pointed_to) bytes. Since the pointer was cast to (char *) first, the type being pointed to is char, and sizeof(char) is 1 by definition in C.
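A small sketch to make the scaling visible. The two function names are invented for this illustration; each one measures, in bytes, how far a single `--` moves a pointer of the given type:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes moved by one "--" on a char pointer. */
size_t step_of_char_ptr(void)
{
    char buf[16];
    char *p = buf + 8;
    char *q = p;
    --q;                                   /* one element back */
    assert(q < p);
    return (size_t)(p - q) * sizeof *p;    /* 1 element of 1 byte */
}

/* Bytes moved by one "--" on a long pointer. */
size_t step_of_long_ptr(void)
{
    long buf[4];
    long *p = buf + 2;
    long *q = p;
    --q;                                   /* one element back */
    assert(q < p);
    return (size_t)((char *)p - (char *)q); /* distance measured in bytes */
}
```

The first function gives 1 everywhere; the second gives `sizeof(long)`, so the result depends on the target.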

As others have mentioned, take some time to learn C on its own -- it will make the porting effort go faster, and serve you well generally.

Have you read "The C Programming Language" by Brian Kernighan and Dennis Ritchie? (but unless that code base is really old, I suggest later-than-first edition)


In your mind, you can expand the code to make it clearer:

  char *p;

  p = --(char*)hash_ptr.parm.v.p;
  *p = fpop(sst);
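That expansion still leans on the old cast-as-lvalue extension. A sketch of the same steps in standard C, assuming the field is a `void *` (I have not checked its real type in STOICAL); `push_byte_down` and `slot` are hypothetical names standing in for the statement and for hash_ptr.parm.v.p:

```c
#include <assert.h>
#include <stddef.h>

/* Standard-C equivalent of   *(--(char*)slot) = c;
 * "slot" stands in for hash_ptr.parm.v.p and is assumed to be a void*. */
void push_byte_down(void **slot, char c)
{
    char *p;

    assert(slot != NULL && *slot != NULL);
    p = (char *)*slot;  /* view the stored pointer as a char pointer */
    --p;                /* step back one byte */
    *p = c;             /* store the byte at the new position */
    *slot = p;          /* write the decremented pointer back */
}
```

The last assignment is the part the old extension did implicitly: the decremented value ends up back in the original variable.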


The (char) cast looks like it's being used for address alignment.

Need to check what the address alignment for the target environment is and change the cast to the appropriate alignment, e.g. SPARC addresses are not 8-bit aligned!

hash_ptr.parm.v.p points to the top of the sst stack, containing an 8-bit-aligned pointer address. p = --(.....) gets the address to pop off the stack. A step/statement is missing: sst needs to be in the struct at the address given by p, i.e. hash_ptr.parm.v.sst. p should then point to the result of fpop(hash_ptr.parm.v.sst).


fpop() is a typed-pop and yields a float from the value union. This design also uses float storage for any number including integers.

So in the code fragment above, it seems that the fpop() number is saved into a character value (apparently with some certitude that it should fit into char type) at some pointer destination in what seems to be an array of chars.
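A hedged sketch of what "should fit into char type" could look like if it were checked instead of assumed. `float_to_byte` is a hypothetical helper, not something in STOICAL:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: convert a float cell to a byte.
 * Returns 1 on success, 0 when the value does not fit in unsigned char. */
int float_to_byte(float f, unsigned char *out)
{
    assert(out != NULL);
    if (!(f >= 0.0f && f <= (float)UCHAR_MAX))
        return 0;               /* out of range; this also rejects NaN */
    *out = (unsigned char)f;    /* the fractional part is discarded */
    return 1;
}
```

Converting an out-of-range float to an integer type is undefined behavior in C, which is why the range test comes before the cast.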


My guess is that the problem isn't with this line per-se, but that some of those components in that string of dots are unions, that the program is doing one of those things with unions that have always officially been undefined behavior (so older standards aren't going to help) but which used to happen to work reliably, and that that's tripping up the modern compilers.

I'd look at the types of hash_ptr, hash_ptr.parm, hash_ptr.parm.v, and hash_ptr.parm.v.p; I'd try building without optimizations (-O0) and see if that changes anything; and as others have said I'd try pulling it up in a debugger so I can poke things.


Compilers are pretty good at telling you what they’re unhappy about these days so you’re well on the track as opposed to chasing bugs in gdb.

Usually when I get to the stage where whatever I’m working on is ready to build I pipe stderr to a file and just go down the list fixing all the little things until it works (or usually segfaults).

I’ve successfully updated quite a few super old codebases to play around with by just fixing each compiler error until it works or I have to go hunting real bugs.


1) Byte align the value of hash_ptr.parm.v.p

2) Decrement hash_ptr.parm.v.p

3) hash_ptr.parm.v.p points to the return value of fpop(sst)

commentary:

1) Hardware / variable-type specific! It can be 8, 16, 32, or 64 bits; C can pack the bit representation in an unaligned fashion.

2) Hardware dependent -- some do not permit operations on addresses!

3) Looks like fpop(sst) returns an address containing a value (vs. returning the value), which implies that the lower, unused bits in the address are being used for storage.

short answer: OS and/or hardware may not permit non-zero values in lower, unused address. At hardware level, executable & data bits may be required to be in different locations.


Assuming the license terms on the working binary allow decompilation:

You could run a decompiler to pseudo-C on the original binary and on a new 32-bit compile on the same hardware (ideally made with the same compiler (gcc/clang/etc.) used for the original binary).

Diff the resulting pseudo source code to see where things get interpreted differently.

Then use compiler explorer on the "differences", https://godbolt.org/z/v4TdvE83b

open source decompile to pseudo-C:

ghidra: ghidra-sre.org

retdec: https://github.com/avast/retdec


Evaluate how 'similar' the pseudo source code is via source code plagiarism detection tools.


Use the gcc -fdump-translation-unit switch when compiling the old code and when compiling the new code.

Compare the old gcc AST against the new gcc AST (or the ASTs generated when different compile switches are used). That doesn't mean there aren't assembly-language differences, but it will narrow things down.


Learning c via pointers : https://learncodethehardway.org/c/

Different topic of containing values in an address vs. storing values in address given by address number -> https://libcello.org/learn/a-fat-pointer-library


> ...It's become an obsession to get this thing running.

I know that feeling... Well, if you just want to have it running somewhere so that you can play with it, then it's just a matter of setting up a VM with an old enough build toolchain.

On the other hand, if you indeed want to "modernize" the whole thing, then instead of simply throwing the existing project at a new compiler/tool chain, I'd rather embark on code analysis first to make sense of the following:

* the project structure (it appears to be rather compact and modular)

* dependencies, both external and internal

* for the internal dependencies, figure out the main primitives used and their APIs (like string, cell, ccell, voc_entry, hash etc.). These could be exercised separately to establish the expected behavior. Maybe write some tests.

* the build system. Well, it is autotools, so possibly it should just be plug-n-play now, unless some special config was used (like for packaging)

Once the code structure is more or less clear, the primitives isolated and possibly buildable separately, I'd delve into a boring and mindful review of the core operations code with the objective to itemize the assumptions/limitations (memory model, type sizes, some funky expressions like you showed).

While in the review, I believe it'll get clearer what part of the project indeed will need the modernization. This will guide your choice of compiler options to use with the newer compiler.

Well, it's very much doable, as the project seems compact and has some tests and examples. As for the expected performance of the "ported" code, well, this may be trickier to gauge, as some of the older gains may be unrealizable under the new architecture or may need some additional review just for that.

It's a nice project, and being already in github may attract some interest/help, if you find it worthwhile to put effort into it. Good luck!

P.S. I wonder, what is a simple use-case for a language like STOICAL? How could it or a similar lang be used in a modern day?


>P.S. I wonder, what is a simple use-case for a language like STOICAL?

Forth can, in theory, do anything Lisp can do... except it can't in practice. A big part of that is the ability to manipulate trees and strings as native objects. STOICAL solves that issue, and thus should make it possible to build almost anything. I want to explore that idea, but first I have to get this thing running.

As for the archaic vocabulary, once I can get it compiled, and grok the structure of it, I can change those words around and use anything. It becomes a matter of personal preference.


Use the -m32 flag when compiling.


That comment is rather pithy, but may be the most useful one of the lot. You can compile 32bit x86 executables on a x86_64 system, this should be one of the first things to try. If this works (try -O0, everything else is just asking for trouble at this stage of your project), your problem is that the interpreter will most likely assume that integers and pointers are the same size and cast freely among them. Fixing that will be ... tedious, at best.

edit: but then the code supports alpha, so it should be 64bit clean...


> (--(char)hash_ptr.parm.v.p) = fpop(sst);

If I'm understanding correctly, it looks like it's saying "assign the string (i.e. char) returned from `fpop(sst)` into the slot before `hash_ptr.parm.v.p`". It definitely seems unnecessarily obtuse; casting a value to a pointer, then decrementing it to get the address of the location before it and then dereferencing that to assign to. I honestly can't recall off the top of my head if doing `--some_pointer_value` would mean `some_pointer_value - 1` or `some_pointer_value - sizeof(char)`, which is always something that used to trip me up.


Char doesn't imply string, but byte, in C.


It looks like the `*` that I copied got interpreted as markdown by HN's parser, and unfortunately it's too late for me to edit it. The code snippet I copied should show a cast to `char *`. I suppose it's possible that OP's code could actually just be using it as a pointer to a single byte, but it seems fairly reasonable to assume that the value is a C string (i.e. pointer to a set of bytes with a null terminator) given that this is a Forth implementation (i.e. a stack based language) and the value that's being assigned looks like it's being popped off a stack.


There's no way around learning C if you really want to take this on. You said you're willing to dump lots of time in it, so do that first, and then you'll be in a much better position.


Original C was designed to map as closely as possible to a one-to-one pairing between assembler and C statements.

You will also need to understand the hardware assumptions implied in the C code being ported.


You forgot to link to your efforts from 2 years ago: https://github.com/mikewarot/stoical


At your suggestion, I added it to the post. In retrospect, that was a botched attempt. Once I finally got the original code up and running I didn't even realize it was working because the dialect of Forth is so different. It has almost none of the FIG standard Forth words.


I recommend two things. 1. A static analyzer, lint, etc. This will take out some problems in advance of the migration. 2. Maybe create part of the toolchain for the migration; some of the new compilers are modular and can make the C99-to-C11 transition easier, if you don't want to re-implement the code by hand (which is always an option).

https://m.youtube.com/watch?v=_cIVa-RctcA&t=325s


> How do I, a non-C programmer, who is willing to dump tons of time into it, migrate this code into the year 2022?

Unfortunately, I suspect the answer is going to strongly involve dumping time into becoming a C programmer. (I say this as someone in a somewhat similar situation; I had no intention of learning C, but I wanted to get into dev work on existing unix-family OSs, which are pretty much all written in C, so guess what I'm learning?)


Fixed ONLY the compilation (it still crashes) for gcc 11.2.0 under Ubuntu 21.10: https://filetransfer.io/data-package/Xp1NUg99#link (that should still build/run on your Debian 3)

There is a string struct whose inner char s is super-evil

in kernel.h

    /* all strings on the stack will have this format. */
    typedef struct {
     unsigned int l; /* length of string */
     char s;  /* string itself */
            // the rest of the string data
    } string;

Strings get allocated by malloc(sizeof(string)+stringlen+1), and the member char s is then used as the first char, followed by the remaining chars from the attached memory.

The memory management is overall a little dirty, and I think it will take some time for an experienced C developer to get that clean and running with today's compilers/checks etc.
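For comparison, a sketch of the same "header plus trailing bytes" layout written with a C99 flexible array member, which tells the compiler that the data really does extend past the struct. `fstring` and `fstring_new` are illustrative names and not part of STOICAL:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same idea as the original struct, but the trailing storage is declared. */
typedef struct {
    unsigned int l;  /* length of string */
    char s[];        /* string bytes follow the header */
} fstring;

/* Allocates header and data in one block and copies src, including its
 * terminating NUL. Returns NULL if the allocation fails. */
fstring *fstring_new(const char *src)
{
    size_t len;
    fstring *str;

    assert(src != NULL);
    len = strlen(src);
    str = malloc(sizeof *str + len + 1);
    if (str == NULL)
        return NULL;
    str->l = (unsigned int)len;
    memcpy(str->s, src, len + 1);
    return str;
}
```

Switching the real code over would touch every place that takes the address of `s`, so treat this as a direction rather than a drop-in patch.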


>... there is a string struct with this inner char s is super-evil.

Apparently, the author wanted to minimize the size of string object by getting rid of the data pointer. Makes sense in general.

However, with modern gcc this use triggers a security check meant to prevent memory overwrites / string overflows.

The check could be disabled/minimized by adding to CFLAGS -Wstringop-overflow=0. Just to prevent it from crashing on this.

Also this design could be salvaged by replacing direct references to string->s (string's data pointer) with an inline call to some func:

  char* c_str(string*)

which would return an offset pointer to the data, computed from the string* base pointer. If this is done, then the stringop-overflow option could probably be reverted.
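A possible shape for that accessor, as a sketch only. The struct is repeated from kernel.h so the fragment stands alone, and the use of `offsetof` is my assumption about how the offset would be computed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Layout as quoted from kernel.h. */
typedef struct {
    unsigned int l;  /* length of string */
    char s;          /* first byte of the string data */
} string;

/* Returns a pointer to the string data, derived from the base pointer of
 * the allocation instead of from the address of the one-byte member. */
static inline char *c_str(string *str)
{
    assert(str != NULL);
    return (char *)str + offsetof(string, s);
}
```

Usage would be along the lines of `memcpy(c_str(str), src, len + 1)` in place of writing through `&str->s`.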


Likely, as someone mentioned, the code assumes the size of an int and a pointer are both 32 bits, casts one to the other, and of course barfs on a modern system where that isn't the case.

Does it compile without warnings?

Did you try compiling with the -m32 flag to generate pure 32 bit code?

You can always try and run it under gdb


Need to know the hardware alignment requirements in order to align a given value after extracting it from a struct. https://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Variable-Attribut...


Does it compile with -std=c89?

Looking at your example though, I expect it assumes 32 bit when doing some pointer manipulation. I also assume you are building it now on a 64 bit system. That is likely the issue.

You will need to look at the pointer handling and port to 64 bit as required.
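As a rough illustration of what such a port tends to involve, here is a sketch that stores pointers in `intptr_t` instead of `int`. The function names are invented, and whether STOICAL actually keeps pointers in int-sized cells is an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Stores a pointer in an integer that is wide enough to hold it. */
intptr_t ptr_to_cell(void *p)
{
    assert(sizeof(intptr_t) >= sizeof(void *));
    return (intptr_t)p;
}

/* Recovers the pointer from the integer cell. */
void *cell_to_ptr(intptr_t c)
{
    return (void *)c;
}

/* Tells whether a plain int could hold a pointer on this build. */
int int_holds_pointer(void)
{
    return sizeof(int) >= sizeof(void *);
}
```

On common 64-bit Linux targets `int_holds_pointer()` returns 0, which is the case where `(int)ptr` silently drops the upper half of the address.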


Maybe you could try using c2rust [1] that transpile C to Rust, then try to port it back and forth. It would be much better if you can use Rust though.

[1]: https://github.com/immunant/c2rust


To my understanding the left side breaks down as follows: in (char*)hash_ptr.parm.v.p, the char* means to interpret hash_ptr.parm.v.p as a pointer to an 8-bit character.

The -- before it says to decrement the pointer by one; the value of hash_ptr.parm.v.p is changed, i.e. it goes backwards one character position.

The leading * means that the character pointed at by hash_ptr.parm.v.p is overwritten by the new value.

When migrating between versions of C, the most common problems I have seen are the use of int (whose size varies) and the size of a pointer.

Furthermore, when I have migrated code before, I have often encountered the case where a latent bug was discovered during the migration, usually involving the stack somehow.

I wish you the best of luck.


> it dies a horrible death

The answer depends on the exact meaning of that quote.


What sort of "horrible death"? Do you have a dump? As Admiral below said.. you will likely just need to make the changes then hack away at the problems one-by-one.


Sounds like C alignment mismatches, i.e. fetching a value from a struct where the value is not stored aligned in the struct, then using the value without aligning it and/or aligning on a byte boundary (char*) when the hardware requires 64-bit alignment.
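If misaligned reads do turn out to be involved, one common workaround is to copy the bytes out with memcpy instead of dereferencing a cast pointer. A generic sketch with an invented function name, not based on anything I have seen in the STOICAL source:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from an arbitrary byte address without assuming
 * that the address is suitably aligned for uint32_t. */
uint32_t load_u32_unaligned(const unsigned char *src)
{
    uint32_t v;

    assert(src != NULL);
    memcpy(&v, src, sizeof v);  /* byte-wise copy is valid at any address */
    return v;
}
```

Compilers usually turn this into a single load on hardware that allows unaligned access, so it tends not to cost anything where the cast version happened to work.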


Actually, why do you want to move to c99? Shouldn't the goal be that it just compiles on a current compiler? Why not solve that first?


What is your goal with the code? Is there a reason you can't just play with it within a VM running Debian 3?


How far can you go up from Debian 3 and still build the code and have it run fine?

Debian 3, 3.1, 4, 5?


I have done several porting projects regularly as part of my job over the decades (including learning a new language along the way), so here are some tips, adjusted for the fact that this is C (ouch!):

- Learn a modern lang well (Rust, Zig, modern C?, ... let's call it the porting LANG).

If you are not an OLD HAT C developer and were "raised" on more modern things, going straight to some modern alternative to C (Rust, Zig, ...) could pay off much more if you are really in it for the long term.

This does not replace the fact that you need to know enough C to do this, but it is for your long-term sanity (you can disregard this if you plan to be a C developer for long).

This is important because:

- Collect a list of requirements and dependencies. C requires you to recreate a lot of basic stuff (like hashes, strings, etc.).

You can certainly throw all of that away. Even if that implementation has some useful property, you don't need that burden (re-implementing it) right now.

Modern langs also have better idioms for using memory (no manual mallocs!), and that is another burden that is not worth it long term.

P.S.: From what I see, this project has TOO MUCH "re-creating basic stuff that any sane lang has in its std library" compared to actual code that makes the idea work. So time spent there is lost time, IMHO. If you stay on C, look for already-implemented alternatives that let you focus on the main tasks.

- Eliminate/simplify your tooling. If CMake sucks, it sucks. Replace it with the sanest alternative for your LANG.

- Prepare yourself for quickly and mercilessly refactoring everything, all the time. The faster you get into the cycle of "attempt to understand, implement, test, rewrite, restart" the better.

All the above is to improve your velocity at the start or in the long term.

In no particular order:

- Use a debugger. Step through the code.

- If you manage to compile the old code, and knowing this is a lang, you have a good chance to make this great for testing: feed input on STDIN, capture the output on STDOUT, and use that to confirm things!

- Collect a "call tree" using any means (notes, a tree editor, etc).

Implement things in a new blank project, and write the functions. Be sure you understand the small steps. When stuck, create another mini project and try to work out the idea in isolation.

This is harder than "just move forward", but the point is that you need to understand the project, not just make it compile, right?

- DO NOT punish yourself if the port is not 100% exact.

It is more important to match "outcomes" than "exact outputs". It is very likely that you will hit a bug or a hack, and recreating that stuff is not worth it in this case, IMHO.

- Ask the community! Go to a good place where people discuss your LANG and ask for help. Also, you can consider going to places like https://www.reddit.com/r/ProgrammingLanguages/ for META questions about language development & paradigms.

P.S.: Consider these ideas not as replacements for the others in this thread, just extra things to consider!


Advice to port it to another language when they don’t understand what it is doing in the original language seems like a recipe for failure to me.

I’m no professional coder but I like to play around with stuff and have learned enough C/C++ over the years to code my way out of a paper bag specifically from not trying to reimplement whatever I’m playing with in another language — well, unless that language is Ruby[0] then I fully agree.

[0] there was this project that took xml schema and output something (dunno cause ruby) that I ported over to python so I could actually make sense of it because the technique they used was kind of brilliant.


> Advice to port it to another language when they don’t understand what it is doing in the original language seems like a recipe for failure to me.

That depends. I have done A LOT of ports from OLD -> NEW in my career. In the very short term, it is not good.

But in the mid/long term it is certainly right if the new lang provides significant advantages (and in the case of C, almost any other lang provides significant advantages!).

The case of C makes some no-brainer stuff go out of the window. Making your own hash table? String? Memory management? Each of those is a mini-major project beside the core task.

If you don't know C very well, you will screw up your strings, your hash, your memory. How does that help when you ALSO screw up your core tasks?


What's the exact error code?


hello my friend


I don't understand what "migrate this code into the year 2022" means. What's the actual problem you're trying to solve? If:

    wget, unzip, untar, ./configure, make, su make install and it works perfectly

Then, you're done, right? Why do you need to make any changes? If it doesn't like -std=c99, and it works without it, just don't compile it with -std=c99.

Are you trying to port it to some other host or architecture? If so, you'll need to learn C to port it.

Are you trying to port it to a 64-bit target or something like that? If so, you'll need to learn some C.

Does the existing code compile and run but has show-stopper bugs for you? If so, you'll need to learn C and fix them.


You cut the operative "... under Debian 3" from your quote. The current Debian is Debian 11, that is eight major releases later, and Debian is not exactly famous for its blazingly fast release schedule.

(For context: When Debian 3 was released, the current browser versions were Netscape 6.2.3 and Internet Explorer 6.0)

See also https://www.debian.org/News/2002/20020719


He also said, "Taking the executable from Debian 3 to Debian 11 (32 bit) and it works perfectly." Maybe I'm the only one, but I still think the problem statement is totally missing. A simple "I'm trying to get it to compile using gcc [version x], glibc [version y] on Debian [version z]." would have clarified the end goal.


Poster wrote that it works when compiled on Debian 3, but not on Debian 11. Support for Debian 3 ended on 31 March 2008. Most recent Debian release is Debian 11. Probably the most important things that changed are GCC and glibc.


Looks like 64 bit support is his main goal. Sounds like lots of pointer arithmetic in there that doesn't port well.

Agree that they'll definitely have to learn C. Sounds like this will definitely teach them some of that!


No, the original code supports Alpha, so it should be 64bit clean. It's got loads of other issues, though. Not respecting sequence points seems to be the most obvious.
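For anyone unsure what a sequence-point problem looks like, here is a generic illustration (not taken from the STOICAL source; `copy_bytes` is a made-up name). The undefined form appears only in the comment, and the function shows the untangled equivalent:

```c
#include <assert.h>
#include <stddef.h>

/* Unsequenced, and therefore undefined behavior, shown only as a comment:
 *
 *     dst[i] = src[i++];
 *
 * The index is read on one side and modified on the other with nothing
 * ordering the two. The loop below separates the copy from the increment. */
size_t copy_bytes(char *dst, const char *src, size_t n)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);
    while (i < n) {
        dst[i] = src[i];  /* read and write use the same, unchanged index */
        i++;              /* the increment is its own statement */
    }
    return i;
}
```

Older compilers often happened to pick the evaluation order the author expected, which is why this kind of code can work on Debian 3 and misbehave later.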


Ah, yeah, but does Alpha handle addresses on 1-, 2-, 3-, or 4-byte boundaries? If it's more than one, it might compile but will not run. https://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Variable-Attribut...


Some implicit hardware references are implied by the use of C casts. The C compiler doesn't guarantee bit spacing/alignment in structs.

(char*) is used to force the pointer address inside a struct to align on a byte boundary, which is not true of all hardware! The address alignment may need to be 16, 32, or 64.


Perhaps not the answer you're looking for, but look for another library or a re-implementation of it.

> Anything newer, even with -std=c99, and it dies a horrible death.

Yes because the compiler is catching loads of undefined behavior that the code employs. I wouldn't suggest using this library and depending on what it does I'd just suggest finding a newer, more up to date library. Using it as-is probably worked for gcc at the time but it's anyone's guess as to how it'll run today.

If you don't care about any of this, just keep turning off warnings (`-Wno-unsequenced`) until it compiles correctly.



