
> I've never seen a case where it would be hard to tell which assertion failed.

There is a set of unit testing frameworks that do everything they can to hide test output (junit), or vomit multiple screens of binary control code emoji soup to stdout (ginkgo), or just hide the actual stdout behind an authwall in a uuid named s3 object (code build).

Sadly, the people with the strongest opinions about using a "proper" unit test framework with lots of third party tooling integrations flock to such systems, then stack them.

I once saw a dozen-person team's productivity drop to zero for a quarter because junit broke backwards compatibility.

Instead of porting ~100,000 legacy (spaghetti) tests, I suggested forking + recompiling the old version for the new JDK. This was apparently heresy.



You should write an episode of Seinfeld.

I was a TL on a project and I had two "eng" on the project who would write a test with a single method and then 120 lines of Tasmanian Devil test cases. One of those people liked to write 600 line cron jobs to do critical business functions.

This scarred me.


> One of those people liked to write 600 line cron jobs to do critical business functions.

I was a long-time maintainer of Debian's cron, a fork of Vixie cron (all cron implementations I'm aware of are forks of Vixie cron, or its successor, ISC cron).

There are a ton of reasons why I wouldn't do this, the primary one being that cron really just executes jobs, period. It doesn't serialize them, it doesn't check for load, logging is really rudimentary, etc.

A few years ago somebody noticed that the cron daemon could be DoS'ed by a user submitting a huge crontab. I implemented a 1000-line limit on crontabs, thinking "nobody would ever have 1000-line crontabs". I was wrong, and quickly received bug reports.

I then increased it to 10K lines, but as far as I recall, users were hitting even that limit. Crazy.


Is Dillon cron a fork of Vixie cron?


Hadn't heard of it before, and it appears not to be.

There indeed exist a few non-Vixie-cron-derivative implementations but as far as I'm aware, all major Linux and BSD distributions use a Vixie cron derivative.

Edit: I see now where I caused confusion. In my original post, I should have said all default cron implementations.


I thought Dillon cron was the default cron in Slackware? Hard to be a more major Linux distribution than Slackware, in terms of historical impact if not current popularity.


Could be. Slackware is a popular name, but I would call it "niche" rather than a major distribution. Just my personal view, obviously.


It always gives me a spell of cognitive dissonance when someone points out that Slackware is no longer a "major" distribution.

It used to be the major distribution. Funny how times change.


I just confirmed with a Slackware user today, it still does use Dillon cron. I had a vague memory from before I switched from Slackware to Debian late last millennium.


JUnit is especially bad about this. I often wonder how many of these maxims are from people using substandard Java tools and confusing their workarounds with deeper insights.


Yup, it's why I built `just-tap` [1], which tries to minimise the magic that a lot of these frameworks try to "help" you with.

1. https://github.com/markwylde/just-tap


Here are a few mistakes I've seen in other frameworks:

- Make it possible to disable timeouts. Otherwise, people will need a different runner for integration, long running (e.g., find slow leaks), and benchmark tests. At that point, your runner is automatically just tech debt.

- It is probably possible to nest before and afters, and to have more than one nesting per process, either from multiple suites, or due to class inheritance, etc. Now, you have a tree of hooks. Document whether it is walked in breadth first or depth first order, then never change the decision (or disallow having trees of hooks, either by detecting them at runtime, or by picking a hook registration mechanism that makes them inexpressible).
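
To make the hook-tree point concrete, here is a minimal sketch of one possible documented order: depth first, with after hooks unwinding in reverse. The `Suite` class and `run_suite` function are hypothetical names for illustration, not the API of any real framework, and the hooks here run once per suite rather than once per test.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Suite:
    """Hypothetical suite node: hooks, tests, and nested child suites."""
    name: str
    before: List[Callable[[], None]] = field(default_factory=list)
    after: List[Callable[[], None]] = field(default_factory=list)
    tests: List[Callable[[], None]] = field(default_factory=list)
    children: List["Suite"] = field(default_factory=list)


def run_suite(suite: Suite) -> None:
    """Walk the hook tree depth first.

    Order: the suite's before hooks, its own tests, each child suite in
    full (in registration order), then its after hooks.
    """
    for hook in suite.before:
        hook()
    try:
        for test in suite.tests:
            test()
        for child in suite.children:
            run_suite(child)
    finally:
        # After hooks run in reverse registration order, even if a test raised.
        for hook in reversed(suite.after):
            hook()
```

Whatever order a runner picks, writing it down as one small function like this makes it hard to change by accident.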


We recently switched our system to a heartbeat system instead of a timeout system. The testing framework expects to see messages (printf, console.log, etc...) often. So a test testing a bunch of combinations might take 45 seconds to run but for each combination it's printing "PASS: Combination 1, 2, 3" every few ms.

This way the framework can kill the test if it doesn't see one of these messages in a short amount of time.

This fixed our timeout issues. We had tests that took too long, especially in debug builds, and we'd end up having to set too large a timeout. Now, though, we can keep the timeout for the heartbeat really short and our timeout issues have mostly gone away.
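
A rough sketch of what such a heartbeat watchdog could look like, assuming the test is a child process and every stdout line counts as a heartbeat. `run_with_heartbeat` and `python_child` are made-up names, and this is not the commenter's actual system; stderr is ignored for brevity.

```python
import queue
import subprocess
import sys
import threading


def python_child(source):
    """Command line for a Python snippet with unbuffered output (-u), so
    each print reaches the watchdog immediately instead of sitting in a
    buffer."""
    return [sys.executable, "-u", "-c", source]


def run_with_heartbeat(cmd, heartbeat_timeout):
    """Run cmd and kill it if it prints nothing for heartbeat_timeout seconds.

    Returns (exit_code, lines). exit_code is None when the child was killed
    for going silent.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    lines_q = queue.Queue()

    def pump():
        # Forward each stdout line to the main thread as it arrives.
        for line in proc.stdout:
            lines_q.put(line)
        lines_q.put(None)  # sentinel: the child closed stdout

    threading.Thread(target=pump, daemon=True).start()

    lines = []
    while True:
        try:
            line = lines_q.get(timeout=heartbeat_timeout)
        except queue.Empty:
            # No heartbeat within the window: treat the test as hung.
            proc.kill()
            proc.wait()
            return None, lines
        if line is None:
            return proc.wait(), lines
        lines.append(line.rstrip("\n"))
```

The total run time is unbounded here; only the gap between messages is limited, which is the property described above.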


Never disable the timeouts. What you want is a way to set the timeouts once for an entire suite. Unit, functional, and integration tests all have a different threshold from each other. But in general within one kind your outliers almost always have something wrong with them. They’re either written wrong or the code is. And once in a while it’s okay to override the timeout on one test while you’re busy working on something else.

The problem isn’t with breaking rules. The problem is with promising yourself or others that you will fix it “later” and then breaking that promise.


I'd add one more: clearly document what determines the order in which tests are run.

On the one hand, running tests in any order should produce the same result, and would in any decent test suite.

On the other hand, if the order is random or nondeterministic, it's really annoying when 2% of PRs randomly fail CI, not because of any change in the code, but because CI happened to run unrelated tests in an unexpected order.


Test order should be random, so that the ability to run them in parallel and distribute them across multiple hosts is not lost by missing enforcement of test isolation.


> On the one hand, running tests in any order should produce the same result, and would in any decent test suite.

Therefore the tool should run the tests in random order, to flush out the non-decent tests. IMHO.


If you do this, then the tool should be running the tests all the time, not just on new commits.


The tests are fatally broken. It means you can't trust them to properly check new work even.

The solution is to use random ordering and print the ordering seed with each run so it can be repeated when it triggers an error. Immediately halt all new work until randomly run tests don't have problems.

This isn't as bad as it sounds: generally a few classes of problems cause the interference, and fixing each one repairs many tests. It's unlikely that the code actually has a 2%+ density of global-variable use, for example.
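
The seeded-shuffle idea is small enough to sketch. This assumes tests can be identified by name; `shuffled_order` is a hypothetical helper, not part of any particular runner.

```python
import random


def shuffled_order(test_names, seed=None):
    """Return (seed, order): a shuffled list of test names and the seed used.

    Passing the same seed back in reproduces the same order, so a failing
    CI run can be replayed locally.
    """
    if seed is None:
        # Pick a fresh seed when none is supplied; a runner would print it.
        seed = random.SystemRandom().randrange(2**32)
    # Sort first so the order depends only on the seed and the set of
    # names, not on the order in which tests happened to be discovered.
    order = sorted(test_names)
    random.Random(seed).shuffle(order)
    return seed, order
```

A runner built on this would print something like `ordering seed: <seed>` at the top of every run and accept the seed back through a flag or environment variable.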


The Ruby `minitest` API used to have a way to disable non-deterministic test ordering, but it was intentionally named in a condescending way: https://www.rubydoc.info/gems/minitest/Minitest%2FTest.i_suc...!

I sometimes run into issues not so much due to order dependencies specifically, but due to tests running in parallel sometimes causing failures due to races. It's almost always been way more work to convert a fully serial test suite into a parallel one than it is to just write it that way from the start, so I think there's some merit in having test frameworks default to non-deterministic ordering (or parallel execution if that's feasible) with the ability to disable that and run things serially. I'm not dogmatic enough to think that fully parallel/random order tests are the right choice for every possible use case, but I think there's value in having people first run into the ordering/race issues they're introducing before deciding to run things fully serially so that they hopefully will consider the potential future work needed if they ever decide to reverse that decision.


I’ll disagree with this. Every time I’ve seen that, the interference between tests was also possible between requests in production. I’d rather my test framework give me a 2% chance of noticing the bug than 0%.


What's annoying is not being able to reproduce the 2% cases so you can't fix it even when you've noticed them. Sensible test tools give you the random seed they used to order the tests so you can reproduce the sequence.


TAP is better than some things, but it has some serious issues that I wrote about on my blog a while back - https://blog.urth.org/2017/01/21/tap-is-great-except-when-it...

Basically it's nearly impossible to fully parse it correctly.


Is test2 a flag for TAP?

If you have to pick one or the other, then you're breaking the common flow (human debugging code before pushing) so that management can have better reports.

The right solution would be to add an environment variable or CLI parameter that tells TAP to produce machine-readable output, preferably with a separate tool that could convert the machine-readable junk to whatever TAP currently writes to stdout/stderr.


Test2 is a Perl distribution that replaces a bunch of older test stuff. See https://metacpan.org/pod/Test2

But unlike TAP, it's fairly Perl-specific as opposed to just being an output format. I imagine you could adapt the ideas in it to Node but it'd be more complex than simply implementing TAP in JS.

And yes, I think the idea of having different output formats makes sense. With Test2, the test _harness_ produces TAP from the underlying machine-readable format, rather than having the test code itself directly produce TAP. The harness is a separate program that executes the tests.
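
As an illustration of that split, here is a minimal sketch of a harness-side renderer. The JSON-lines record shape (`name`, `passed`) is invented for this example and is not Test2's actual event format; the output is a deliberately small subset of TAP (a plan line plus `ok` / `not ok` lines) with no escaping of special characters in descriptions.

```python
import json


def events_to_tap(json_lines):
    """Render JSON-lines result records as a minimal TAP stream.

    Each record is assumed to look like {"name": str, "passed": bool}.
    """
    results = [json.loads(line) for line in json_lines if line.strip()]
    out = [f"1..{len(results)}"]  # the plan: how many tests to expect
    for number, result in enumerate(results, start=1):
        status = "ok" if result["passed"] else "not ok"
        out.append(f"{status} {number} - {result['name']}")
    return "\n".join(out)
```

The test code only ever emits records; anything human-facing is derived from them afterwards, so adding another output format does not touch the tests.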


What is this madness?

Nothing should have to be parsed. Write test results to sqlite, done. You can generate reports directly off those test databases using anything of your choice.

    your-program test-re.sqlite output.html
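
A minimal sketch of the sqlite approach using Python's standard `sqlite3` module. The `results` table and its columns are an assumed schema chosen for this example, not an established convention.

```python
import sqlite3


def record_results(conn, results):
    """Store (name, passed, seconds) tuples in a results table."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            "name TEXT NOT NULL, "
            "passed INTEGER NOT NULL, "
            "seconds REAL NOT NULL)"
        )
        conn.executemany("INSERT INTO results VALUES (?, ?, ?)", results)


def failing_tests(conn):
    """Return the names of failed tests, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM results WHERE passed = 0 ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

A report generator then becomes a handful of queries against the file rather than a parser for someone else's text format.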


Yeah, but sqlite doesn't scale, and SQL isn't a functional language:

https://web.archive.org/web/20110114031716/https://browserto...


> There is a set of unit testing frameworks that do everything they can to hide test output (junit), or vomit multiple screens of binary control code emoji soup to stdout (ginkgo), or just hide the actual stdout behind an authwall in a uuid named s3 object (code build).

The test runner in VS2019 does this too, and it's incredibly frustrating. I get to see debug output about DLLs loading and unloading (almost never useful), but not the test's stdout and stderr (always useful). Brilliant. At least their command line tool does it right.



