Lessons learnt while trying to modernize some C code (2022)

smcameron · on Feb 19, 2023

> C has a type system, but the developers are ignoring it, preferring to move stuff around as void *. Obviously, it shouldn’t be necessary to do that, and this was one of the things I wanted to fix by moving the code into a C++ compiler.

> This is a case of one function infecting the entire code base. In this case, a function that is often invoked, the xmalloc family, a wrapper over malloc-like functions that return pointers to void.

This makes no sense to me. In C, you can assign a void pointer to any typed pointer, no casting required, i.e.:

    struct foo *f = malloc(sizeof(*f));

The answer here isn't to rewrite it all in C++, but to learn C in the first place.

Gibbon1 · on Feb 19, 2023

> move stuff around as void *

People do that because they don't understand how to define and use opaque pointers.

Also if someone would put the C standards committee on an ice flow and replace them with people willing to add first class types and tagged unions to C that would be nice.

avinassh · on Feb 20, 2023

> People do that because they don't understand how to define and use opaque pointers.

what are opaque pointers?

masklinn · on Feb 20, 2023

Pointers to a well-defined type, but one which does not reveal the underlying datatype so a user is unable to interact with the actual data without going through the library’s exposed API (or at least strongly disincentivised from doing it).

IIRC the normal way to do that in C is to use an incomplete type, aka just have a

    struct foo;

In the header file and define the actual struct contents in the implementation file.

Gibbon1 · on Feb 20, 2023

You can create a pointer with a type but no storage definition.

   // foo_t is opaque
   typedef struct foo_s foo_t;

   // function defined taking an opaque pointer
   int Baz(foo_t *f);

   // actual definition (implementation file only)
   struct foo_s
   {
      int bar;
   };

   // function def
   int Baz(foo_t *f)
   {
      return f->bar;
   }
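
For illustration, a minimal sketch of how the header/implementation split might look in full. The names `foo_new`, `foo_bar` and `foo_free` are hypothetical and not taken from any real library; both halves are shown in one listing only to keep it self-contained.

```c
#include <stdlib.h>

/* ---- what a header would expose: the type is declared, never defined ---- */
typedef struct foo_s foo_t;

foo_t *foo_new(int bar);          /* hypothetical constructor */
int    foo_bar(const foo_t *f);   /* hypothetical accessor    */
void   foo_free(foo_t *f);        /* hypothetical destructor  */

/* ---- what the implementation file would contain ---- */
struct foo_s {
    int bar;
};

foo_t *foo_new(int bar)
{
    foo_t *f = malloc(sizeof *f);   /* void * converts implicitly in C */
    if (f)
        f->bar = bar;
    return f;
}

int foo_bar(const foo_t *f)
{
    return f->bar;
}

void foo_free(foo_t *f)
{
    free(f);
}
```

Code that only sees the declarations can hold and pass a `foo_t *`, but `f->bar` or `sizeof(foo_t)` will not compile there because the type is incomplete.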

gsinclair · on Feb 20, 2023

ice floe*

bitwize · on Feb 19, 2023

No. For a project like this you absolutely want the stronger type guarantees and abstractions that C++ gives you.

Were it me, I'd RIIR, but C++ that uses the type system and const-correctness in the proper way is an improvement.

WalterBright · on Feb 19, 2023

Const-correctness doesn't actually work in C++. It's more of a documentation aid.

(It doesn't work because it can be cast away. It's also not transitive, so is ineffective with generic template code.)

D code won't allow const to be cast away (unless in system code), and is transitive. Sometimes people complain that this is too harsh, but it's reliable and effective.

maccard · on Feb 20, 2023

I can't believe I'm about to try and correct you of all people on this but you can't just const_cast away[0]. The actual problem is that you have no guarantee that what you're being told is const is actually const. If you have a const ref to an object, that doesn't mean it's actually initialised as const, for example.

[0] well of course technically you can, but casting a CV qualifier away and modifying it is UB.

WalterBright · on Feb 20, 2023

Yes, you are correct, it is UB. Try this code:

    void foo(const int * const p)
    {
        *(int*)p = 3; // UB? So sue me!
    }

It compiles with g++ and clang++ with no warnings or errors. gcc and clang, too. Sure, it's UB, but the compiler won't even warn you.

I.e. it serves no purpose, other than documentation. Let's try it in D:

    @safe
    void foo(const int *p)
    {
       *cast(int*)p = 3;
    }

    Error: cast from const(int*) to int* not allowed in safe code

maccard · on Feb 20, 2023

I think we're in agreement here, but speaking past each other! Your snippet isn't actually UB though, it's perfectly fine. See [0]

> I.e. it serves no purpose, other than documentation

Yeah, agreed.

[0] https://gcc.godbolt.org/z/v4qYME8fP
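
A minimal C sketch of the distinction being drawn here, using the hypothetical names `set_three` and `const_demo`: writing through a pointer whose `const` was cast away is only undefined when the object it points to was itself defined `const`.

```c
/* Writes through a pointer-to-const by casting the qualifier away. */
static void set_three(const int *p)
{
    *(int *)p = 3;
}

static int const_demo(void)
{
    int plain = 0;          /* the object itself is not const */
    set_three(&plain);      /* well-defined: only the pointer type was const */

    /* const int fixed = 0;
       set_three(&fixed);      undefined behaviour: the object itself is const */

    return plain;
}
```

This is why the qualifier on the parameter tells the callee nothing about whether the underlying object may legally change.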

shrimp_emoji · on Feb 19, 2023

One problem is that, when you cast to `void*`, you discard size information.

I have a `double*`, I know the size of the object behind that pointer is `sizeof(double)`. If I have a `void*`, and nothing else, all bets are off.
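
A small sketch of that point, assuming two hypothetical helpers: with a typed pointer the element size comes from the type, while a `void *` interface has to be told the size, much as `qsort` and `memcpy` are.

```c
#include <stddef.h>

/* Typed version: the compiler knows each object is sizeof(double) bytes. */
static void swap_doubles(double *a, double *b)
{
    double t = *a;
    *a = *b;
    *b = t;
}

/* Generic version: a void * carries no size, so the caller must supply it. */
static void swap_any(void *a, void *b, size_t size)
{
    unsigned char *pa = a;
    unsigned char *pb = b;
    for (size_t i = 0; i < size; i++) {
        unsigned char t = pa[i];
        pa[i] = pb[i];
        pb[i] = t;
    }
}
```

Passing the wrong `size` to `swap_any` compiles without complaint, which is the kind of mistake the typed version rules out.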

masklinn · on Feb 19, 2023

OP is saying that you don't need an explicit cast to cast from a void* (which is what malloc returns).

shrimp_emoji · on Feb 20, 2023

In that case, I guess

>C has a type system, but the developers are ignoring it, preferring to move stuff around as void *.

is alleging that C programmers insist on literally keeping their allocated variables behind `void*`s.

That's too weird to me. :p Because, yeah, you don't need an explicit cast from a `void*` returned from a `malloc`, and I assume everyone tacitly assigns that `void*` to a `Foo*` immediately.

The only case where you're converting stuff to `void*`s is when passing/taking it to/from "generic" APIs, where it could be any type. (Hence "move stuff around".) I thought OP was alleging there's no problems with this practice, although that interpretation of mine doesn't make sense from what they said in retrospect.

wizofaus · on Feb 19, 2023

"Some non-zero multiple of sizeof(double)", surely?

mananaysiempre · on Feb 19, 2023

s/non-zero //

  double x[1], *p;
  p = x + 1; /* legal though *p wouldn’t be */

wizofaus · on Feb 20, 2023

Surely in that case x[0] is the object "behind" the pointer (assuming adding 1 puts you ahead!). But using the originally intended meaning of "behind" here, logically there is no object at all...

void_ptr__ · on Feb 19, 2023

They may be referring to xmalloc(3).

What you said is true however they are transitioning to a C++ compiler so your complaint is off-base.

masklinn · on Feb 19, 2023

> however they are transitioning to a C++ compiler

OP says they want to transition to a C++ compiler but it's not clear why, the very first issue is not actually one. As GP notes, you don't need to cast out of malloc and you need a macro even less.

quietbritishjim · on Feb 19, 2023

When the article says that implicit conversions from void* "shouldn’t be necessary" they seem to mean they feel that, philosophically, it ought not to be necessary. Certainly it doesn't protect against obvious compile time mistakes, which ought to be the point of a type system:

    /* Compiles in C */
    int* x = malloc(sizeof(short));

    // Does not compile in C++
    int* x = new short;
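
One way to get a similar check while staying in C is to give the allocation a real type before it is assigned. The sketch below assumes a hypothetical `NEW` macro; it is not something the article or goaccess is known to use.

```c
#include <stdlib.h>

/* Hypothetical helper: the cast to T * gives the result a real type, so the
   compiler can compare it with the pointer it is assigned to. */
#define NEW(T) ((T *)malloc(sizeof(T)))

static int *make_int(void)
{
    /* int *bad = NEW(short);   diagnosed: short * is incompatible with int * */

    int *x = NEW(int);          /* types agree */
    if (x)
        *x = 42;
    return x;
}
```

The other common idiom, `int *x = malloc(sizeof *x);`, avoids the mismatch from the opposite direction by deriving the size from the pointer being initialised.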

ginko · on Feb 19, 2023

I mean sure, but that seems like a minor thing compared to the codebase passing around void*'s left and right. OP could have tackled that first while sticking with C.

I've had a brief look over the goaccess source code. It's 37kloc of pretty average looking C. If you think that's even close to the most complicated software projects written in C then I don't know what to tell you.

bornfreddy · on Feb 19, 2023

Link if anyone else is curious (and lazy) : https://github.com/allinurl/goaccess

EDIT: looked at the code... I don't see void* being passed around, data types are used, no excessive #ifdefs, it looks like a pretty clean C app to be honest. Maybe the problem is that OP is used to C++? (as the advice to use a special struct for strings would indicate)

Either way, goaccess project looks awesome.

mananaysiempre · on Feb 19, 2023

What are the typing-related issues xmalloc has that plain malloc doesn’t? As far as I can see, they’re equivalent in that respect.

jalino23 · on Feb 19, 2023

I just recently started learning c and am having a hard time wrapping my head around third party libraries.

Coming from Node/npm I’m used to the idea that your dependencies get bundled in your app. But it’s not like that in C.

you expect your users to install your app dependencies for you, and you have to figure out the path of your own dependencies.

cause they don’t live in one nice place like node_modules. and I haven’t even begun to figure out how I’m supposed to know what linker flag to use for the library I use.

the tutorials just give it to you without explaining where they get it from.

and most of the sources I’ve read almost always recommend dynamic linking cause static is heavy. which brings me back to this article that now recommends static linking and I agree with that.

much of the learning materials I’ve come across are just spoon fed to you without explaining why. And it’s so incredibly frustrating until you look at linux history, most of the assumed knowledge now is based on baggage upon baggage of layers of history.

enriquto · on Feb 19, 2023

> you expect your users to install your app dependencies for you

Yes, you do. And that is a good thing.

> and you have to figure out the path of your own dependencies.

No, you don't. Once a build dependency is installed, the compiler and the linker find it automatically.

All of this is easier than it sounds. For example, say your program depends on libpng. Then your users will have to "apt-get install libpng-dev" (or whatever your distro does). Then in your C files you just write #include <png.h>, on the call to the linker you will have -lpng and everything will work correctly.

The C ecosystem does not favor dependencies whose api changes every month, so this will work with whatever version of the libraries you have. Of course, you are also free to bundle all the dependencies that you want with your code. And you can easily distribute static executables that will work everywhere.

VWWHFSfQ · on Feb 19, 2023

> No, you don't. Once a build dependency is installed, the compiler and the linker find it automatically.

Except when they don't.

And you're left to figure out the right incantation of linker flags to use. Maybe give up and start symlinking stuff around until it works.

db48x · on Feb 20, 2023

Compiler and linker flags are so annoying that people built tools to make developers’ lives easier. They aren’t amazing tools, but they get the job done. In this case, you would run `pkg-config --cflags libfoo` and `pkg-config --libs libfoo` and use the results when you invoke the compiler and linker.

enriquto · on Feb 20, 2023

> Compiler and linker flags are so annoying (...)

Hmmm. I'd rather say that some people who package libraries are very annoying! If a library needs specific compiler flags, or linker flags beyond "-lfoo", this means that somebody is doing something wrong. Out of incompetence or out of malice.

db48x · on Feb 20, 2023

Well, I dunno. libpng needs -lpng16 and -lz. Are you saying that libpng should have a statically linked copy of zlib inside of it, so that if your application needs both libpng and zlib you will end up with both? That’s probably unpalatable to most people.

enriquto · on Feb 21, 2023

> libpng needs -lpng16 and -lz

Where? I'm compiling the same program with exactly the same makefile on several linux distributions, openbsd, freebsd and macos and "-lpng" is all it needs.

This is for dynamic linking. For static linking I need to add -lz explicitly, you are right about that.

woodruffw · on Feb 19, 2023

Linux and other Unix-likes have a standard for this: if you want to link against libfoo, you pass `-lfoo`. The standard linker knows to expand that to some variant of libfoo, whether static or dynamic, with additional versioning, etc. If libfoo isn't in a standard library search path (e.g. it's being built in a `./vendor/` directory), then you need to additionally tell the linker that with `-Lvendor -lfoo`.

There's also a lot of silly memorization involved (e.g. remembering that zlib is usually packaged as `libz` so you need `-lz`, remembering that the non-SSL parts of OpenSSL are `-lcrypto`, etc.). C programmers tend to build mental scar tissue around these things and forget how pointlessly complicated and friction-inducing they are.

JHonaker · on Feb 19, 2023

Yea, I have been learning C recently too. I have found the experience quite a lot more enjoyable than I thought I would. I was originally motivated so that I wouldn't be "scared" to involve myself in working with C FFI calls in the various other languages I use.

I've found it just as much fun as Scheme, my favorite language to work in. If you think of Scheme as the minimum-usable implementation of lambda calculus, then C is like the minimum-usable implementation of a Turing machine. They're kind of dual to one another in a very fun way.

The one thing I'm really struggling with, like you, is understanding the C "ecosystem". Anything past Makefiles is so complex, especially for my uses. Using a header-only library like STB or a simple library like Raylib is super straightforward, but figuring out what, when, and how to use things like pkg-config, Cmake, Meson, or other tools is a real headache. Does anyone have any favorite resources?

I'd also appreciate something more detailed about modern C style, when and why I'd need a custom allocator, or anything else you wouldn't see in K&R or other introduction.

PS A lot of people say things like "libc" sucks, you shouldn't be using XYZ function from libc. What should I be using instead? I'm all for learning via implementing things myself, but surely that's not the recommended route for everything in C world.

enriquto · on Feb 19, 2023

> Anything past Makefiles is so complex, especially for my uses. (...) to use things like pkg-config, Cmake, Meson, or other tools is a real headache. Does anyone have any favorite resources?

Yes. The GNU make documentation. Just don't use cmake and similar stuff. You can typically rewrite your complex, non-working CMakeLists as simple, straightforward Makefiles.

irowe · on Feb 19, 2023

The GNU Make documentation is excellent, and you can get a long way with handwritten Makefiles, but when you move beyond a few translation units that have different needs, it becomes much easier to maintain things with CMake.

lost_tourist · on Feb 20, 2023

Choose cmake, learn it well, and it will make learning other systems much easier. If you jump from one build system to the next and just get a cursory understanding you will be hopping around forever.

rattlesnakedave · on Feb 19, 2023

Yes, this is crap, and not useful on modern systems. The only reason many claim to not think so is because they’ve been forced to figure it out, and now want others to as well.

rwmj · on Feb 19, 2023

It works in conjunction with your distribution package manager, so in Fedora I install the dependencies I need using 'dnf install foo-devel' or if the program is already known to Fedora I can install all the dependencies with 'dnf builddep program'

JCWasmx86 · on Feb 19, 2023

You can use pkg-config for that. E.g. if you want to compile with glib-2.0, you can run `pkg-config --cflags glib-2.0`: -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/sysprof-4 -pthread (E.g. for compiling to object code, but not linking yet)

Add `--libs` to link against it, too: -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/sysprof-4 -pthread -lglib-2.0

With meson that is basically `dependency('glib-2.0')` that you add to your executable/library.

> you expect your users to install your app dependencies for you, and you have to figure out the path of your own dependencies.

If you use a good buildsystem, it will find those dependencies automagically or you can e.g. use wrap dependencies: https://mesonbuild.com/Wrap-dependency-system-manual.html Those do automatically download the dependencies and compile them for you

johndough · on Feb 19, 2023

> You can use pkg-config for that. E.g. if you want to compile with glib-2.0, you can run `pkg-config --cflags glib-2.0`

That works sometimes, but is there a general way to find out what to pass to pkg-config besides consulting StackOverflow (or now perhaps ChatGPT)? For example, my recent history looked something like:

    sudo apt install libopencv-dev
    pkg-config --libs opencv # did not work
    pkg-config --libs OpenCV # did not work
    pkg-config --libs opencv-core # did not work
    pkg-config --libs opencv45 # did not work
    pkg-config --libs OpenCV45 # did not work
    pkg-config --libs opencv4.5 # did not work
    pkg-config --libs OpenCV4.5 # did not work
    pkg-config --libs opencv-4.5 # did not work
    pkg-config --libs OpenCV-4.5 # did not work
    pkg-config --libs opencv-4 # did not work
    pkg-config --libs OpenCV-4 # did not work
    pkg-config --libs opencv4 # success!

kps · on Feb 19, 2023

The Unix answer was to read the man page. The Debian answer (since I see `apt`) is `dpkg-query -L libopencv-dev | grep /pkgconfig/`

JCWasmx86 · on Feb 20, 2023

At least for me, pkg-config has auto completion (Fedora), so typing e.g. `pkg-config open<TAB>` would probably give me some results. But I agree that's one weakness of pkg-config, that you have to guess the name a bit

MawKKe · on Feb 19, 2023

pkg-config --list-all | grep keyword

lost_tourist · on Feb 20, 2023

grep -i :) since case sensitivity matters

UltimateEdge · on Feb 25, 2023

Out of curiosity, what resources are you using to learn C? I too have started learning it recently, and I am working through "Modern C" (Manning).

bee_rider · on Feb 19, 2023

For now I’d skip worrying about managing dependencies. Just use system libraries or put your dependencies somewhere you can easily find them. Packaging and shipping code to users is a problem for after you are familiar with the language.

Anyway, the typical C program doesn’t pull in a ton of dependencies.

zabzonk · on Feb 19, 2023

learn c++ instead - despite what you may have heard, it is much easier to learn than c.

but neither language (and they are very different) has standardised library/package tools.

MattPalmer1086 · on Feb 19, 2023

I don't think this is good advice, although I don't agree with you being downvoted because of that.

C++ is a much more complex language than C. The things you need to worry about in C are far fewer than in C++.

_yvc3 · on Feb 19, 2023

>C++ is a much more complex language than C

While this might be true, generic containers, sane strings, RAII and the possibility of not having to deal with pointers make C++ an easier language to work with. You certainly need to worry about far fewer things as long as performance is not a priority.

bsder · on Feb 19, 2023

> sane strings

So, explain to me why every C++ program always has its own String implementation?

> long as performance is not a priority.

Then C++ is a disastrous choice. Almost anything is better.

zabzonk · on Feb 19, 2023

>So, explain to me why every C++ program always has its own String implementation?

i have no idea where you are reading about c++, but every c++ program i have read or written in the past 20 years or so does

    #include <string>

    std::string greet = "hello, world";

the full-stop in front of the hash is something with HN, not me or c++, and only seems to show up when editing.

inglor_cz · on Feb 19, 2023

Before C++11, std::string with non-ASCII characters was terrible. Old programming languages/standards tend to disregard i18n.

zabzonk · on Feb 19, 2023

as was (and is) c

_yvc3 · on Feb 19, 2023

>So, explain to me why every C++ program always has its own String implementation?

*saner :)

>Then C++ is a disastrous choice. Almost anything is better.

True, C and C++ are most of the time not the right tool for the job nowadays. But, if it's just a project that you're working on in your free time, it wouldn't harm to write it in your favorite language. I write in C++ a lot even if I could do the same thing much easier in other languages.

camel-cdr · on Feb 19, 2023

This is only the case, because the C standard library is quite small and has many bad parts. You can use something like STC [1] to even the playing field.

[1] https://github.com/tylov/STC

wizofaus · on Feb 19, 2023

The last sentence doesn't really follow from the previous one. C++, like other higher level languages, has complexity built in because it takes care of things that in C you have to take care of yourself. I can't imagine any serious C++ project becoming "less complex" to work on by rewriting it in C.

uecker · on Feb 19, 2023

I personally found it very relieving when I switched from C++ to C. (and I also rewrote some stuff, it did not get more complex.) I find readability of modern C code much better than C++. In C you see what you get: no templates, overloading, virtual inheritance, namespaces, exceptions, references etc you need to keep processing in your head just to understand what is going on.

wizofaus · on Feb 19, 2023

I don't doubt that can be true - but there are plenty of common operations like concatenating strings or converting between strings/numbers that are a good deal simpler to write and understand in C++ than they are in C. In particular there are far more ways to get them wrong in C!

uecker · on Feb 19, 2023

If you code them by hand, then yes. If you use a library, then it is the same.

It is certainly true that the C++ standard library provides more functionality out of the box.

wizofaus · on Feb 19, 2023

Almost any C library function that deals with strings is going to be more complex to use due to the lack of automatic memory handling, necessary for any use case where the max. size isn't sufficiently well-known at compile time for it to sit on the stack. I'd accept something like "atoi" or "strtol" is pretty simple to use (and hard to get wrong), but they hardly win any awards for obviousness. And the less said about the need (or at least strong tendency) to use sscanf for more complex parsing (hexadecimals etc.), the better.
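
A minimal sketch of what careful use of `strtol` can look like, assuming a hypothetical helper named `parse_long` (not part of libc). Because `strtol` takes a base argument, the same helper also covers hexadecimal input without `sscanf`.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse an entire string as a long in the given base.
   Returns 0 on success, or -1 if the text is empty, has trailing
   characters, or does not fit in a long. */
static int parse_long(const char *text, int base, long *out)
{
    char *end;

    errno = 0;
    long value = strtol(text, &end, base);

    if (end == text)        /* no digits were consumed */
        return -1;
    if (*end != '\0')       /* something follows the number */
        return -1;
    if (errno == ERANGE)    /* value was clamped to LONG_MIN or LONG_MAX */
        return -1;

    *out = value;
    return 0;
}
```

The three separate checks are the part that is easy to leave out: `atoi` offers none of them, and `strtol` only reports them if the caller looks at `end` and `errno`.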

uecker · on Feb 22, 2023

A library can do the memory management for you:

    string a = string("test");
    string b = string(" this");
    string n = string_concat(a, b);
    ...
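
A rough sketch of what such a library could look like underneath. The type `str` and the functions `str_from`, `str_concat` and `str_free` are hypothetical names chosen for this illustration, not the API of any particular library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical owning string: data is heap-allocated and NUL-terminated. */
typedef struct {
    char  *data;   /* NULL if an allocation failed */
    size_t len;
} str;

static str str_from(const char *s)
{
    str r = { NULL, 0 };
    size_t n = strlen(s);

    r.data = malloc(n + 1);
    if (r.data) {
        memcpy(r.data, s, n + 1);   /* includes the terminating NUL */
        r.len = n;
    }
    return r;
}

static str str_concat(str a, str b)
{
    str r = { NULL, 0 };

    if (!a.data || !b.data)
        return r;                   /* propagate an earlier failure */

    r.data = malloc(a.len + b.len + 1);
    if (r.data) {
        memcpy(r.data, a.data, a.len);
        memcpy(r.data + a.len, b.data, b.len + 1);   /* copies the NUL too */
        r.len = a.len + b.len;
    }
    return r;
}

static void str_free(str s)
{
    free(s.data);
}
```

The library hides the size arithmetic, but each `str` still has to be released with `str_free` by hand, since C has no destructors to do it at end of scope.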

another2another · on Feb 20, 2023

I started doing this a bit myself too, but soon found that I missed RAII too much.

Having things clean themselves up when going out of scope is very convenient, and means that the early return code style that I favour is much easier to implement.
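
For comparison, the usual C substitute is a single cleanup label that every early exit jumps to. A minimal sketch, assuming a hypothetical function `count_bytes`:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example: count the bytes in a file.
   Returns 0 and stores the count in *out, or -1 on any failure. */
static int count_bytes(const char *path, long *out)
{
    int status = -1;
    long total = 0;
    size_t n;
    char *buf = NULL;
    FILE *fp = fopen(path, "rb");

    if (!fp)
        goto done;                  /* nothing else acquired yet */

    buf = malloc(4096);
    if (!buf)
        goto done;

    while ((n = fread(buf, 1, 4096, fp)) > 0)
        total += (long)n;
    if (ferror(fp))
        goto done;

    *out = total;
    status = 0;

done:
    free(buf);                      /* free(NULL) is a no-op */
    if (fp)
        fclose(fp);
    return status;
}
```

It works, but every resource has to be listed under the label by hand, which is the bookkeeping RAII takes over.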

MattPalmer1086 · on Feb 20, 2023

I'm talking about language complexity, not the complexity of code written in it.

Sure, you have to do more for yourself in C, as it isn't as powerful as C++.

C++ is the only language I've worked with where you have to tell people which bits of the language they can use!

wizofaus · on Feb 20, 2023

Your last sentence was that "there are fewer things you need to worry about in C". Which doesn't follow from c++ being a more complex/powerful language.

MattPalmer1086 · on Feb 20, 2023

Yes, it does although we seem to be coming at this from different perspectives.

The C++ language spec is just vast. There are many footguns when doing apparently safe things. Different code bases use different subsets of the language. It is immensely powerful.

C, by contrast, is just a much simpler language than C++. I'm not entirely sure what you are disagreeing with.

WalterBright · on Feb 19, 2023

You can learn C in a few days. It takes 10 years to learn C++. My experience, anyway.

bitwize · on Feb 19, 2023

My experience is you can get to where you think you know C in a few days. But it may take decades to become aware of all its footguns, and much more time to learn how to program C in a way that avoids them -- if that can be done by humans at all. (Maybe our new AI overlords will figure it out.)

Nuke the site from orbit. It's the only way to be sure. Deprecate C for all greenfield development, and launch a big-science push to rewrite all the extant load-bearing C code in something else with safety guarantees (Rust=good, SPARK=better). If null pointers were a billion dollar mistake, then letting C become ubiquitous was a mistake that runs in the trillions.

zabzonk · on Feb 19, 2023

i respect your opinion, but for example you can do some very common tasks using std::string that will be really difficult in c. and much of programming is dealing with strings.

speaking as a c and c++ trainer here.

cozzyd · on Feb 20, 2023

The lazy way to do things with strings in C is almost always asprintf
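
Note that `asprintf` is a GNU/BSD extension rather than standard C. As a sketch of the same idea using only C99, here is a hypothetical stand-in called `format_alloc` that measures with `vsnprintf` first and then allocates:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical portable stand-in for asprintf(): returns a malloc'd,
   NUL-terminated string, or NULL on failure. The caller frees the result. */
static char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    char *out = NULL;

    va_start(ap, fmt);
    va_copy(ap2, ap);               /* the arguments are walked twice */

    int len = vsnprintf(NULL, 0, fmt, ap);   /* first pass: measure only */
    va_end(ap);

    if (len >= 0) {
        out = malloc((size_t)len + 1);
        if (out)
            vsnprintf(out, (size_t)len + 1, fmt, ap2);   /* second pass: write */
    }
    va_end(ap2);
    return out;
}
```

A call such as `format_alloc("%s-%d", name, id)` then replaces the manual length calculation, `malloc` and `strcat` sequence with one line plus a `free`.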

elteto · on Feb 20, 2023

If this was even remotely true we wouldn’t be living in a world full of kernel exploits and broken software, with a lot of it written in C.

Writing C code is hard.

WalterBright · on Feb 20, 2023

> Writing C code is hard

I found it very easy. I was writing production C code in a few days. (Of course, I was an experienced asm programmer before that, and C was a natural.)

I write code very differently today. But that's not learning C, it's learning technique.

zabzonk · on Feb 19, 2023

ok. a small challenge - read character input of undefined length, and then print the input out backwards. i guarantee the c++ code will be easier to write and understand than any c code.

a1369209993 · on Feb 19, 2023

It's unclear exactly what you mean by "backwards" (and some interpretations like "print UTF-8 characters in reverse order" are infamously literally impossible[0]), but here's a straightforward/naive version (bytes in each line):

  #include <stdlib.h>
  #include <stdio.h>
  #include <unistd.h>
  #include <string.h>
  #include <sysexits.h>
  #define die(E,...) ({ \
    fprintf(stderr,__VA_ARGS__); \
    exit(E); })
  void reverse(char* a,size_t z) {
    for(; z>=2 ;a++,z-=2) {
      char t=*a; *a=a[z-1]; a[z-1]=t; } }
  int main(void) {
    size_t z=16,i=0;
    char *a=malloc(z), *p=0;
    for(;;) {
      for(;;) {
        ssize_t n = read(0,a+i,z-i);
        if(n<0) die(EX_IOERR,"input error: %m");
        if(n+i==0) exit(0);
        if(n==0) die(EX_DATAERR,"error: %zu bytes of garbage at end of file",i);
        p = memchr(a,'\n', (i += n) );
        if(p) break;
        if(i > z/2) {
          a = realloc(a, (z = z+z/2) );
          if(!a) die(EX_OSERR,"realloc(%zu bytes): %m",z); } }
      while(p) {
        reverse(a,p-a);
        fwrite(a,p-a+1,1,stdout);
        memmove(a,p+1, (i -= p-a+1) );
        p = memchr(a,'\n',i); } } }

This produces:

  $ ./a.out
  cat
  tac
  warning: incompatible implicit declaration of built-in function
  noitcnuf ni-tliub fo noitaralced ticilpmi elbitapmocni :gninraw
  naïve resumé
  �muser ev�an

> i guarantee the c++ code will be easier to write and understand than any c code.

Put up or shut up, as the saying goes. At the very least it ought to be easy to abstract out some of the memory allocation and file handling to use C++ builtins, and I'd be interested to see what other improvements you can manage.

Edit: 0: Exercise for the reader: consider the input "abXc", where 'X' is some as-yet-undefined codepoint that will, at some point in the future, be defined as either a combining diacritic, or a standalone character. If 'X' is a character, the output should be "cXba", but if it's a diacritic, the output should be "cbXa". The information about which (mutually-exclusive) output is correct does not exist yet.

wizzwizz4 · on Feb 19, 2023

  #include <stdio.h>
  #include <stdlib.h>
  
  int main() {
    size_t len = 0, cap = 0;
    char *buffer = NULL;
    int c;
    while ((c = getchar()) != EOF) {
      if (len >= cap) {
        cap = (cap << 1) + 1;
        {
          char *newbuf = realloc(buffer, cap);
          if (!newbuf) {
            fprintf(stderr, "Could not reallocate to %zu. Printing what I've got.", cap);
            break;
          }
          buffer = newbuf;
        }
      }
      buffer[len++] = (char)c;
    }
    while (len --> 0) {
      if (putchar(buffer[len]) == EOF) {
        fprintf(stderr, "Output failure.");
        return 1;
      }
    }
    free(buffer);
    return 0;
  }

Most of the verbosity here is in correctly handling errors, something that isn't much easier to do in C++. (Be warned: I haven't tested this code.)

johndough · on Feb 19, 2023

You could check for ferror when getchar returns EOF. Otherwise very nice. I wrote almost exactly the same code before I saw yours. The only difference was that I used multiplication by 2 instead of left shift since the goal was easier understanding. gcc can perform this optimization (for unsigned integers).

I also looked up whether the cast from int to char is safe. Seems that it is not, but probably fine in practice https://stackoverflow.com/questions/19250521/best-way-to-por...

Of course all bets are off when someone tries to reverse emoji soup with this, but I am not knowledgeable enough about this topic to claim whether a general solution is even possible.
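
A minimal sketch of the `ferror` check mentioned above, using a hypothetical function `count_chars`. The value stays in an `int` until after the comparison with `EOF`, so a valid byte can never be mistaken for end-of-file.

```c
#include <stdio.h>

/* Hypothetical helper: consume a stream and count its characters.
   Returns 0 on a clean end-of-file, -1 if reading stopped on an error. */
static int count_chars(FILE *in, size_t *count)
{
    int c;   /* int, not char: EOF must stay distinguishable from data */

    *count = 0;
    while ((c = getc(in)) != EOF)
        (*count)++;

    return ferror(in) ? -1 : 0;
}
```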

wizzwizz4 · on Feb 19, 2023

> but I am not knowledgeable enough about this topic to claim whether a general solution is even possible.

The correct way to reverse a Unicode string is to blit it back to front.

Addendum: a real C programmer helped me write (i.e., wrote most of) the following program:

  #include <skalibs/skalibs.h>

  int main() {
    stralloc sa = STRALLOC_ZERO;
    slurp(&sa, 0);
    stralloc_reverse(&sa);
    if (allwrite(1, sa.s, sa.len) < sa.len) {
      strerror_diefusys(1, "output everything");
    }
  }

In the event of an allocation failure, this program will output nothing; other than that, it's the same as my C program above. Perhaps the problem isn't C, but its standard library?

steveklabnik · on Feb 20, 2023

> Perhaps the problem isn't C, but its standard library?

That's certainly at least a factor. In Rust, the "reverse a string byte by byte" bit could look like:

  let reversed_string = String::from_utf8(some_string.bytes().rev().collect::<Vec<u8>>())?;

This is just what I thought of first. Peeking at the standard library examples, this one is fun:

  unsafe {
      let vec = some_string.as_mut_vec();
    
      vec.reverse();
  }

This is unsafe because you may create invalid utf8, but we've already established that's not something we care about, so...

But yeah stuff like this isn't impossible in C, though you'd not have method call syntax, of course. Rust's libraries are just richer here.

zabzonk · on Feb 19, 2023

or in c++, simply call getline? i haven't traced thru your code, but i would not be surprised if there were errors in it, having written similar stuff myself.

compare with using std::string and std::getline - easy to use and battle-hardened.

wizzwizz4 · on Feb 19, 2023

How does getline handle the out-of-memory condition? How do I avoid leaking memory – is that handled by std::string's RAII destructor? To detect the end-of-file condition, there's something about a failbit, but I'm only four pages of documentation in, so I don't know what that is yet.

What would the equivalent code look like in C++, handling EOF and OOM conditions appropriately?

zabzonk · on Feb 19, 2023

if memory is exhausted, then getline throws a bad_alloc exception. if end of file is reached (which i am not sure your code deals with correctly) then getline returns a stream in a bad state. so something in outline like this (syntactically correct. std:: omitted for brevity, but probably not what you would really write):

    string input;

    try {
        if ( getline ( cin, input ) ) {
            // ok, read input, do something with it
        }
        else {
            // some sort of predictable error - look at state of cin to diagnose EOF
        }
    } catch (...) {
        // something terrible happened - as hard to recover as it would be in c
    }

_yvc3 · on Feb 19, 2023

probably std::string would throw std::bad_alloc, and there is also the failbit flag which std::getline sets

johndough · on Feb 19, 2023

Just to clarify: Do you make the assumption that the input only consists of a single line?

zabzonk · on Feb 19, 2023

that was my intention, but in the case of multiple lines c++ is even easier than in c - just use std::vector <std::string>.

camel-cdr · on Feb 19, 2023

That actually sounds like a fun way to explore different programming languages. Especially if you extend it to inputs that may take up more memory than you've got on your computer (let's say 100GB) and at least reasonably fast. I suppose it'd be kinda trivial if you read the input from a regular (seekable) file, but if you actually read the 100 GB from stdin then this could get quite complex.

zabzonk · on Feb 19, 2023

finding good teaching examples is hard - here's one i prepared earlier https://latedev.wordpress.com/2011/07/28/writing-a-real-c-pr... but could not complete as i was struck down with terrible depression, and then my mum & dad needed looking after, and then......

perhaps i will start it up again, now it is just me, and not exactly happy, but not depressed.

johndough · on Feb 19, 2023

> i guarantee the c++ code will be easier to write and understand than any c code.

That will depend on how you define "understand". For a superficial level of understanding, you might be right. But the C++ code will make use of many concepts that would take a while to explain, for example references, classes, inheritance, templates, operator overloading, STL, and probably a bunch of other things I forgot.

erik_seaberg · on Feb 19, 2023

There are books explaining those things, and as a professional I can expect to amortize the effort over years of other projects.

isomorphic- · on Feb 19, 2023

C uses `declaration follows use` for its declaration syntax. This means that instead of `char* p` we write `char *p`. https://www.quora.com/Why-doesnt-C-use-better-notation-for-p...

Also, "decay" is a terrible way to describe the conversion of arrays to pointers-to-their-first-element. "Decaying" implies permanence and that the array is changing/decaying. This isn't the case. The array doesn't permanently change into a pointer. It is the expression that is converted rather than the array itself.

This article is terrible.

frutiger · on Feb 19, 2023

I can’t speak for the rest of the article but for better or for worse “decay” is the standard word used for what happens to arrays as they are passed around as pointers. See for instance https://en.cppreference.com/w/cpp/types/decay.

renox · on Feb 19, 2023

It's the correct word because you lose information when this happens: the size of the array is 'lost' (for 'fixed' arrays, VLA don't have sizes known by the compiler).

uecker · on Feb 19, 2023

While the size of the VLA is not (usually) known at compile time, it is part of its type and known at run-time (it is a dependent type). And if you use a pointer to the VLA (and not a decayed pointer), you can recover the size or benefit from run-time bounds checking.

  int n = 16;            // any positive run-time value
  char buf[n];
  char (*p)[n] = &buf;   // non decayed pointer to array
  sizeof(*p);            // evaluated at run time, yields n
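
As a rough, compilable version of the snippet above (the function names are made up for illustration, and it assumes a compiler with VLA support, which is optional since C11):

```c
#include <assert.h>
#include <stddef.h>

/* Size recovered through a pointer to the whole VLA: the bound n is
   part of the pointed-to type, so sizeof is computed at run time. */
size_t size_via_array_pointer(size_t n)
{
    assert(n > 0);
    char buf[n];
    char (*p)[n] = &buf;
    return sizeof(*p);        /* n * sizeof(char), i.e. n */
}

/* After decay only a plain char * is left, and the bound is gone. */
size_t size_via_decayed_pointer(size_t n)
{
    assert(n > 0);
    char buf[n];
    char *q = buf;
    return sizeof(q);         /* size of the pointer itself */
}
```

With n = 10 the first function returns 10, while the second returns `sizeof(char *)` regardless of n.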

skribanto · on Feb 19, 2023

`char *p` is identical to `char* p`??

peterfirefly · on Feb 19, 2023

Yes, but "char* p,q;" is "char *p,q;" -- q is a char, only p is a pointer.

WalterBright · on Feb 19, 2023

In D we write it "char* p, q;" because both p and q are pointers.

quietbritishjim · on Feb 19, 2023

If you've gone to the trouble of changing the meaning, why not go the whole way and make a more decent syntax? Ideally I'd argue type second, but at least include a separator in there:

    // Completely clear that both are pointers
    var p, q: char*

I actually think that using the existing syntax for something different, even if it's "fixing" the meaning, is worse than just using the old behaviour.

WalterBright · on Feb 19, 2023

> Completely clear that both are pointers

Yes, it is, and "char* p, q;" is also completely clear! (And much more concise, too.)

> I actually think that D's use of an existing syntax for something different, even if it's "fixing" the meaning, is worse than just using the old behaviour.

At first blush it does seem like a problem. But my experience in translating many, many tens of thousands of lines of code from C to D is that making a mistake with that always results in an obvious semantic error that is trivial to fix.

jacquesm · on Feb 19, 2023

Yes, so don't do that. Do:

char * p;

char q;

WalterBright · on Feb 19, 2023

> C’s portability is a joke between #ifdefs and #endif.

Ironically, I just spent a couple days trying to get the C11 Standard .h files on various platforms to compile with ImportC (D's C11 C code importer). Every single platform D supports fails to have a way to compile their .h files using a Standard C11 compiler, each in its own unique, peculiar ways. They all rely on their own C compiler extensions.

Some make an attempt with #ifdef/#ifndef to be portable, but they all fail.

tpmx · on Feb 19, 2023

Somewhat off-topic: goaccess (the tool the author is trying to work on, which isn't written in Go but rather C) does look very appealing.

https://goaccess.io/features

All panels and metrics are timed to be updated every 200 ms on the terminal output and every second on the HTML output.

2021: https://news.ycombinator.com/item?id=28012307

2019: https://news.ycombinator.com/item?id=21890027

2016: https://news.ycombinator.com/item?id=13211913

greatgib · on Feb 19, 2023

  it’s a demonstration why you shouldn’t use C to write complex projects.

Not! It is a demonstration why you need good competent developers for a complex project and not script kiddies.

notbeuller · on Feb 19, 2023

I'd like to +1 the advocates of -x c++ here. There's just no reason not to. It's like building scaffolding around a building so you can retrofit it. It doesn't need to end up like a tower of c++, but the scaffolding lets you hoist things out of overly complex codebases a lot easier. The only immediate code changes encountered were adding explicit casts from malloc (and fixing actual bugs that had been hidden.) In one case recently, trivially converting some c structures to c++ and hiding internal representations let me delete thousands of lines of set(foo, xxx) calls because I could prove that there was no access to the computed result.

However, because I'm inherently chaotic neutral, I tend to use -x objective-c++, but that's a story for another day.

uecker · on Feb 19, 2023

One can hide internal representations just fine in C using pointers to incomplete struct types with the definition of the struct and the implementation of the functions operating on it in a separate file. I like this much better than what C++ does, because only the interface and not also implementation details end up in the header. This keeps things nicely separated and build times very short.
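
A minimal sketch of that layout, squeezed into one file for brevity. The `counter` type and its functions are hypothetical names used only to illustrate the split; in a real project the first part would be the header and the second part the separate implementation file.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- would live in counter.h: interface only ---- */
struct counter;                                /* incomplete type */
struct counter *counter_new(int start);
void counter_increment(struct counter *c);
int counter_value(const struct counter *c);
void counter_free(struct counter *c);

/* ---- would live in counter.c: the only place that sees the layout ---- */
struct counter {
    int value;
};

struct counter *counter_new(int start)
{
    struct counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = start;
    return c;                                  /* NULL on allocation failure */
}

void counter_increment(struct counter *c)
{
    assert(c != NULL);
    c->value++;
}

int counter_value(const struct counter *c)
{
    assert(c != NULL);
    return c->value;
}

void counter_free(struct counter *c)
{
    free(c);
}
```

Code that includes only the header part can hold and pass a `struct counter *`, but `c->value` or `sizeof(struct counter)` will not compile there, so the representation can change without touching callers.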

I see no real benefit from -xc++ and I would also miss some stuff such as variably modified types, designated initializers, etc.

jvanderbot · on Feb 19, 2023

Very relevant article for the times. I sympathize that nowadays people don't have the same expectations as the previous generations. But pretty much all of these are just learning curve issues. C is fine, it's just very different than today's languages, and I'm glad we're improving the ecosystem.

zabzonk · on Feb 19, 2023

completely agree with this. strong typing is why i made the change from c to c++, and have never looked back. and of course malloc makes no sense in c++, as constructors will not be called on the types you are allocating - and let's not get started on free().

uecker · on Feb 19, 2023

In what sense is C++ more strongly typed than C?

zabzonk · on Feb 19, 2023

in just about every sense. when K&R wrote the 2nd ed of The C Programming Language, they ran all their examples through Stroustrup's then new C++ compiler (there was no ANSI C compatible compiler at the time) - the number of type problems it diagnosed shocked them.

uecker · on Feb 19, 2023

I meant today, not in the past when C did not yet have prototypes. This is irrelevant today.

zabzonk · on Feb 19, 2023

well. the classic is:

    int * p = malloc( sizeof(int) );

which is horrible in either language, but is legal c, but not c++

uecker · on Feb 19, 2023

In C you could write the following, or have a macro which does the cast:

  int *p = malloc(sizeof *p);

In C++ you would have to add an explicit cast, which would not make it safer:

  int *p = (int*)malloc(sizeof(char));

So from a type system perspective this is not safer.
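
One possible shape for such a macro, as a sketch only: `NEW_ARRAY` and `make_filled` are hypothetical names, not from any library, and the size computation does not guard against overflow.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n objects of type T. The cast lives in
   one place, and the result has type T * rather than void *. */
#define NEW_ARRAY(T, n) ((T *)malloc(sizeof(T) * (n)))

int *make_filled(size_t n, int value)
{
    assert(n > 0);
    /* Writing NEW_ARRAY(char, n) here would draw a diagnostic, because
       char * is not compatible with int *. */
    int *p = NEW_ARRAY(int, n);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; ++i)
        p[i] = value;
    return p;                 /* caller frees with free() */
}
```

Because the element type is named once and drives both the size and the pointer type, a mismatch like the `sizeof(char)` one above can no longer hide behind a hand-written cast.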

zabzonk · on Feb 19, 2023

it is safer. because you have to make the explicit cast, which either the compiler or a linter can, and should, warn you about.

and your post does not illustrate a macro. and if it did, the compiler/linter could diagnose it.

uecker · on Feb 22, 2023

This is the fallacy: A cast does not make code safer if you have to use it for regular code that has no issue, because a linter warning about that is just noise. My example illustrated a bug in the C++ code hidden by the cast.

avinassh · on Feb 20, 2023

any source or article on this?

zabzonk · on Feb 22, 2023

the introduction to TCPL 2nd ed - the ansi one

rurban · on Feb 20, 2023

> CXXFLAGS: "-std=c++20 -g3 -O2 -flto -Werror"

-Werror without -Wall or -Wextra is pretty lame. It just errors on all suppressed warnings.

leni536 · on Feb 19, 2023

The combination of stuff like char*** and the lack of const usage by some C code is really horrible.

Like, yeah, maybe you do need something like char***, but do you really need that to be mutable at every single indirection level? But even if you sometimes see char***, you really never see char const* const* const*.
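
For what it's worth, a read-only three-level parameter can be spelled like this. The sketch assumes a NULL-terminated array of NULL-terminated string arrays, and `total_length` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sum the lengths of all strings in a NULL-terminated array of
   NULL-terminated string arrays. Nothing reachable through groups can
   be modified; only the local pointer variable itself is advanced. */
size_t total_length(char const *const *const *groups)
{
    assert(groups != NULL);
    size_t total = 0;
    for (; *groups != NULL; ++groups)
        for (char const *const *s = *groups; *s != NULL; ++s)
            total += strlen(*s);
    return total;
}
```

An accidental `**groups = NULL;` or `(*groups)[0][0] = 'x';` inside the function is rejected by the compiler, which is the whole benefit of the extra consts.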

bsder · on Feb 19, 2023

Actually, I put that kind of thing in my code all the time. However, you will see "char **" as "char const * const * const" in my code.

Yes, newer languages understand "char const * const" as "const String". And, yes, it stinks for brevity, but I'm not on a 300 baud TTY.

WalterBright · on Feb 19, 2023

> Add a string type and eradicate the zero terminated strings.

I tried that several times in my C code. I always wound up reverting to 0 terminated strings, for the simple reason that everything else in the C ecosystem is 0 terminated strings. For example, printf.

db48x · on Feb 20, 2023

Yes, you still need to be able to use null–terminated strings at the edges, when you read them in or print them out. But inside your program, resizable strings that know their own length and capacity are a huge benefit.
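
A minimal sketch of such a string, assuming the buffer is always kept NUL-terminated so it can still be handed to printf and friends at the edges. The names `struct str`, `str_append` and `str_free` are hypothetical, and size overflow checks are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct str {
    char  *data;  /* NUL-terminated whenever non-NULL */
    size_t len;   /* bytes in use, excluding the terminator */
    size_t cap;   /* bytes allocated, including room for the terminator */
};

/* Append a C string, growing the buffer geometrically.
   Returns 0 on success, -1 if allocation fails (s is left unchanged).
   text must not point into s->data. */
int str_append(struct str *s, const char *text)
{
    assert(s != NULL && text != NULL);
    size_t add = strlen(text);
    size_t need = s->len + add + 1;
    if (need > s->cap) {
        size_t cap = s->cap ? s->cap : 16;
        while (cap < need)
            cap *= 2;
        char *grown = realloc(s->data, cap);
        if (grown == NULL)
            return -1;
        s->data = grown;
        s->cap = cap;
    }
    memcpy(s->data + s->len, text, add + 1);   /* copies the terminator too */
    s->len += add;
    return 0;
}

void str_free(struct str *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

Starting from `struct str s = {0};`, appends are amortized O(1), `s.len` replaces strlen calls, and `s.data` can go straight to `printf("%s", ...)` once at least one append has succeeded.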

WalterBright · on Feb 20, 2023

Yes, that's why I'd make my own string type. But it always turned out to be more trouble than it was worth, because every other piece of C code wanted to use 0 terminated strings.

_gabe_ · on Feb 20, 2023

> goaccess is pretty neat for a C project, and it’s a demonstration why you shouldn’t use C to write complex projects

Or, it's an example of how not to write a complex project in C. None of the issues the author points out are issues with C in general, they're issues with how the developers coded the project.

The only issues the author brings up are:

* They don't use the type system

* They don't use const correctness (although I'm not sure how C const differs from C++ const if it does at all)

* They use unsafe functions like strlen instead of custom string types

* They complain about the lack of container types (but that's just an issue with C in general and you'll either have to forego type safety and use ugly macros or rewrite a ton of code afaik)

* They complain about portability via #ifdefs and then complain that code paths that aren't executed aren't going to be tested (I don't know what the author is expecting here. If you have tests you can use them, but if you have no way to run the tests for platform X, but somebody added the code once upon a time, what's wrong with leaving the code and putting a note that says YMMV. Also, what do #ifdefs have to do with this? Whether platform code is locked away in a #ifdef or an abstract interface that never gets tested, it's the same problem)

* They complain about dynamic linking. But that's something you can avoid if you want by statically linking like they mention

There are plenty of good reasons to hate on C, but all of the problems listed above can be (and usually are) avoided by C devs, so this feels like a hollow argument.

I was interested in hearing some of the challenges that the author ran into while trying to upgrade a C project to a modern language, but it sounds like they didn't even do that. They just complain about this particular project's dev style, which is fine, but it's not what the title says this post is about.

pengaru · on Feb 19, 2023

> C has a type system

TIL, apparently all that gobject code attempting to add something resembling a type system to C in glib is redundant...

Edit: After taking a quick glance at some random listings in goaccess, I don't have the impression this was written by particularly talented C developers.

Example: https://github.com/allinurl/goaccess/blob/master/src/sort.c#...

masklinn · on Feb 19, 2023

> TIL, apparently all that gobject code attempting to add something resembling a type system to C in glib is redundant...

GLib adds an object system, not a type system.

Not that C's type system is a great one, but GLib does not fundamentally change or add to it (though it aliases it, a lot, for reasons I never understood).

pengaru · on Feb 19, 2023

> GLib adds an object system, not a type system.

In service of GObject, glib implements GType, and GLib's goals necessitated adding a (dynamic) type system to C:

https://docs.gtk.org/gobject/concepts.html#the-glib-dynamic-...

We've had literally decades of glib-utilizing free software developers mocking C devs reinventing their own bespoke type systems (and/or object systems) in their projects in lieu of simply adopting glib.

It most definitely adds a type system, at least as much as one could manage something called that in C. It just doesn't end there, and obviously your object system will dovetail with your type system.

int0x80 · on Feb 19, 2023

gobject is trying to add OOP to C. C already has a type system indeed, however it can be considered weakly typed.

pengaru · on Feb 19, 2023

Calling the weak type checking of C a "type system" is an incredible stretch.

mtlmtlmtlmtl · on Feb 19, 2023

C has a static type system. You can call it limited and you'd be right. But to say it's not a type system is just silly, and just makes the term more confusing. There are types, they're checked, there's a type system.

blix · on Feb 19, 2023

Calling C's type system 'limited' is asking it to be something that it's not. A skateboard isn't limited because it doesn't have a steering wheel and airbag; if it did it wouldn't be a skateboard.

C's type system is different enough in scope and goals from many other type systems that it leads to confusion to lump them together. That's really what this article is about. The author has miscalibrated expectations.

db48x · on Feb 21, 2023

> The author has miscalibrated expectations.

No, they don’t. Nobody expects to dive into a C program and find that every single function parameter is a void*. That’s just abusive. Yea, the language technically allows it, but it eliminates all possibility of effective cooperation between developers.

> C's type system is different enough in scope and goals from many other type systems that it leads to confusion to lump them together.

C does have a type system, it makes C better than languages without one (like assembler), and faulting C for not incorporating ideas invented in the decades after C was invented is also abusive. Claiming that it isn’t good enough to be called “a type system” is just an attempt to change the language so that you are always right.

blix · on March 2, 2023

The difference between

    function(void *pointer) { 
      data_type *argument = pointer;
      ...
    }

and

    function(data_type *argument) { 
      ...
    }

is really small. If this hurdle is enough to 'eliminate all possibility of cooperation,' there was no possibility in the first place.

pengaru · on Feb 19, 2023

"type system" actually means something substantial in the era of C's relevance, and C specifically didn't have one.

But nobody apparently cares about the in-context meaning of these words anymore.

Operating system, UNIX system, the system() function, system calls are costly, try to find any mention of type system in the K&R C book [0]. Search for the word system, it's always in the context of something relatively huge, complex and usually runtime-dynamic vs. the C language.

To call C's type checking a type system is to completely misunderstand something fundamental about the language.

[0] https://ia803407.us.archive.org/35/items/the-ansi-c-programm...