verberant's comments | Hacker News

In a past life, I pushed hard for more investment in seL4 for certain defense applications. The nuance isn't to be underestimated though – I can say from experience that it's often a hard sell to folks who aren't already versed in the ins and outs of formal methods, security, kernels/operating systems, etc. There is plenty of lower-hanging fruit (like networking hardware) compared to the applications I worked on, yet commercial adoption still seems low. Maybe this is due to that nuance. To the credit of the folks at Trustworthy Systems and UNSW, they've done a great job producing literature that outlines the business case in a manner palatable to decision makers. This paper [0] in particular came in handy to me many times.

[0] https://ts.data61.csiro.au/publications/csiro_full_text/Klei...


We had very similar 'discussions' about using Ada (or other 'safe' language) and/or TCSEC rated systems for security sensitive systems in the 80s and 90s. With abundant rationales for why that was a good idea. We've ended up with (mostly) C/C++ and COTS OSes, hopefully dressed up with a STIG (or similar) to close the really blatant holes.

I don't think the market has reached the point where the overhead of these technologies has been offset by the pain of endless security incidents. I don't even think we're close to being willing to take on a 'better way (YMMV)' instead of 'move fast and break things (no matter the cost to the consumer)'.


> And this leads me to the conclusion that software quality is mostly driven by the company and economics. If the project doesn't pay for solid quality assurance or has too much time pressure on developers, the software you get might be bad at least to some degree. Even good developers can't make up for that.

I think this is absolutely the case. As an extreme example, an organization that uses formal methods is probably going to ship higher quality software because they've meticulously worked out design bugs before they've even written a line of code and, once the program is implemented, they've rigorously verified its correctness – the economics of this methodology, however, are not really feasible in most circumstances and so instead we opt for duct tape fixes and crunch time.

The author also mentions the lack of gatekeeping for software engineering – this is often pointed to as one reason for bad software existing. I tend to agree, since "good code" and "bad code" that both do what they're supposed to under the common case can appear the same to the user and the manager who signs off on it (excluding bad UI). Point being, if someone can code just well enough to get by in their field, they will probably remain employed. This is a tricky problem to solve though, because so much of the practice of software engineering is qualitative (bordering on aesthetic, I would argue) and involves designing abstractions. While much of software engineering is analytical and must obey a certain set of axioms, higher-level systems and software design is really more akin to art and architecture than civil or mechanical engineering.


Space flight computers in the public sector are generally 15-20 years behind the types of hardware we commonly work with on the ground, as I think this page shows.

We now have pretty capable low-power SoCs and FPGAs that we've yet to see broadly leveraged for govt. space applications. SpaceX flies Starlink with Xilinx FPGAs, while NASA and DoD are still baselining new platforms on incredibly expensive (albeit rad-hard) PowerPC RAD750 and similar. This is a huge bottleneck for any computationally intensive task we might want to do on-orbit, and I'm curious if or when it will change. It's one technical reason, in my opinion, that the private sector is currently calling the shots in space.


The RAD750 (edit - the whole RAD family, there are newer models available) remains the standard because it's the highest performance rad-hard design available, period. If you're putting an expensive satellite in orbit for 5-10 years, the cost of the processors is insignificant compared to everything else.

The real problem is that we don't have good solutions for improving the performance of rad-hard designs, so we're stuck with older, larger process sizes that limit what can be implemented. Look at the lengths involved in getting A* to run on Curiosity, and you see just how limiting the hardware is. Everyone, NASA especially, wants more compute available.

In low earth orbits and shorter mission durations, you can get away with redundant hardware instead of rad-hard. Most of the damage done by radiation is upsets, so you can reboot the affected hardware and keep going. But on an unprotected design some of the damage can be permanent, and thus redundancy alone isn't enough for longer/farther missions.


I had to select a processor that controls the camera in the GOES-R ABI. The image processing is all done by custom hardware so all that was needed was microcontroller level performance. It turns out there are very limited options in this space and all of them are quirky outdated architectures with limited available tooling.

The RAD750 in particular is a bit of a nightmare because of the high pin count, need for a support chip, and the 32-bit bus forces the use of more RAM and ROM than a smaller micro would need. I took a pass on that. I also never have liked IBM's reverse bit numbering and the implications it has on SRAM power consumption.


How does the PowerPC’s reverse bit numbering impact power consumption?


If you wire the bits as numbered to a conventional memory device designed with LSB as bit-0, the internal address bus will induce more switching from sequential access than normal. The internal row and column decoders will be working overtime consuming more power than necessary. Reversing the bus to deal with that isn't always straightforward on a space constrained board.
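To make the effect concrete, here is a toy simulation (my own sketch; the 16-bit width and pure sequential-access pattern are assumptions, not details from the GOES-R design) counting how often each memory-side address line toggles under straight wiring versus the naive hookup of IBM-numbered bit 0 (the MSB) to the SRAM's bit 0:

```python
# Toy model of the switching described above: count toggles on each
# memory-side address line during sequential access. "Straight" wiring
# puts the CPU's LSB on the memory's LSB; the naive IBM-numbered hookup
# (bit 0 = MSB wired to the SRAM's bit 0) lands the CPU's LSB on the
# memory's highest line, i.e. on the row decoder.
WIDTH = 16  # assumed bus width, for illustration

def transitions(addresses, wiring):
    """Toggle count per memory-side line; wiring[i] is the memory line
    driven by CPU bit i (bit 0 = LSB of the integer address)."""
    counts = [0] * WIDTH
    prev = 0
    for addr in addresses:
        diff = prev ^ addr
        for bit in range(WIDTH):
            if (diff >> bit) & 1:
                counts[wiring[bit]] += 1
        prev = addr
    return counts

seq = range(1, 1 << 10)                 # 1023 sequential accesses
straight = list(range(WIDTH))           # LSB-to-LSB
naive = list(range(WIDTH - 1, -1, -1))  # IBM bit 0 (MSB) to memory bit 0

s = transitions(seq, straight)
n = transitions(seq, naive)
print("straight, lines 0-3 :", s[:4])   # heavy toggling on column-mux lines
print("naive, lines 12-15  :", n[-4:])  # same toggling hits the row decoder
```

The toggle counts are identical either way; the wiring only decides whether they land on the cheap column-mux lines or on the power-hungry row-decoder lines.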


If I understand correctly, you are saying that with the reversed bit numbering, bit 31 (in a 32-bit address bus) changes most frequently with sequential accesses because it is the LSB but when wired to the MSB of SRAM, it causes switching in the column decoder for every single access.

That makes sense, but I didn’t realize that it was difficult to simply swap the wiring. Are the physical pins ordered backwards as well (that is, do PowerPC’s A31 and A30 appear where A0 and A1, respectively, would be on a “normal” system)?


I have absolutely no idea why someone would connect the upper bits of the CPU address bus to the lower bits of the memory, if this is what the GP refers to. Their naming scheme seems irrelevant.

Almost all modern memory is built in a large matrix where the upper bits select the row into a buffer and the lower bits control a multiplexer that selects a slice of that row. Scanning incrementally through the memory will hit the fast multiplexer path and result in much faster access.

Propagating into the whole matrix at each increment is not only a power draw but a massive slowdown.


Since you have recently been through this, what were the other available options? There were a couple other rad hard processors under development years ago when I left the industry. Do you know what happened to those?


> The RAD750 remains the standard because it's the highest performance rad-hard design available, period.

RAD5500?


You're right, the RAD5500 and family are available. I should have said the whole BAE RAD family.

The reality hasn't changed much, though, there's really only one game in town for high rad-hard performance, and it's still well behind conventional processors.


> Look at the lengths involved in getting A* to run on Curiosity

Can you link to something that goes into detail? Googling it doesn't turn up anything relevant, but it sounds like it'd be interesting to read about.


Could they not offload a lot of compute to ground-based computers and submit results back via radio? Or are these real-time applications?


The whole point of implementing A* on Curiosity was to give it some navigation autonomy. The time delay in getting sensor data back to earth, coming up with a motion plan, then sending the plan back to be executed imposes tight limits on how fast the rover can drive, what kinds of terrain it can cover, and ultimately how much science can be done. Local autonomy for basic "go over to that weird-looking rock" tasks is a major improvement.
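For reference, the core of grid-based A* fits in a few dozen lines (a minimal, illustrative Python sketch, not the flight implementation); on a rover the hard part is the sensing, terrain costing, and limited compute, not the search itself:

```python
import heapq

def astar(grid, start, goal):
    """Minimal A* on a 2D occupancy grid (1 = obstacle).
    Manhattan-distance heuristic; returns a list of cells or None."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, [start])]  # (f, g, cell, path)
    best_g = {start: 0}
    while open_set:
        _, g, cur, path = heapq.heappop(open_set)
        if cur == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and not grid[nxt[0]][nxt[1]]):
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(
                        open_set, (ng + h(nxt), ng, nxt, path + [nxt]))
    return None  # goal unreachable

grid = [[0, 0, 0],
        [1, 1, 0],   # a wall of "rocks" blocking the direct route
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))  # routes around the blocked row
```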


You could A* your way around the whole planet using Earth-based computation, but only if you knew where every rock was.

There must be some equation of motion in space robots that combines terrain difficulty, robot speed, round trip time to Earth, and how far ahead you’d need to be able to see.

Curiosity moves about as fast as a Roomba. The ping is 10/24/40 minutes (min/avg/max). Ergo, it needs to be able to see X yards ahead of itself to plan A* from Earth, requiring a camera boom Y feet tall producing images with Z megapixels of resolution.

I wonder what X, Y and Z are.


Some very rough numbers, if you want plans valid until you see the results and travel at a good speed:

- You want a travel speed of 0.5 m/s

- Worst-case round trip time from command to result is 40 minutes, or 2400 seconds

- Max distance covered is 1200 meters

- Assume you can drive over any rock 10cm or less in size (the rover can do better, but at a reduced speed)
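The arithmetic behind those numbers, as a two-line sanity check:

```python
# Back-of-envelope check of the remote-planning numbers above.
speed_mps = 0.5                 # assumed travel speed
rtt_s = 40 * 60                 # worst-case Mars round trip, in seconds
distance_m = speed_mps * rtt_s  # ground covered before feedback arrives
print(distance_m)               # 1200.0, the side of the area to scan
```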

So you need to scan an area 1200m x 1200m, with range accuracy of, say, +/-5cm. Forget the camera boom height, or the time it might take to scan and process: there's no sensor that will give you that kind of accuracy. The stereo baseline would be huge, and leave you entirely at the mercy of whatever texture (or lack thereof) a given part of the Martian surface has to offer. LIDAR is OK if your definition of "long range sensing" is larger objects at 200m. The time cost for shipping back all the raw data for processing would kill performance as well, and if you wanted to process it locally the compute requirements would be just as high as doing the planning locally.

Onboard, or at least much closer, compute is the only way forward in autonomy. Honestly, the best bet for improving local compute would be to send a robot bulldozer, some C4, a rack full of milspec servers, and a big RTG. Blow a nice crater, push the servers to the bottom, and bury them in dirt for shielding. If you get really lucky, you could find some old cave or lava tube.


That's more or less what they used to do with Sojourner, only with humans setting out the waypoints rather than A*. It never managed to get more than 10 meters from its lander though; I assume this was partly driven by the limits of a human's patience in operating a slow vehicle with a 28-minute feedback loop.

One drawback of the remote A* approach is that you end up using more energy as the rover would have to be constantly communicating with its onboard antenna. Its relay satellites are only in range for a limited period each day. Fine grained maneuvers (like drive around that big rock to get to this small rock) would also prove difficult because of likely errors in the rover's inertial-navigation system.

https://en.wikipedia.org/wiki/Sojourner_(rover)


Does Curiosity really move that fast? A Roomba is maybe 2 mph. I thought Curiosity was closer to 0.1 mph.


His or her estimate seems wildly off. I had the same response because my Roomba moves fast!

Some quick googling says:

Curiosity's max speed is 0.08699 mph.

A Roomba does about a foot per second, which is 0.682 mph.

So the estimate was off by about 7.8x.


Thanks: my estimate of Roomba slowness was way off. I’ve only ever seen them on TV.

I remember once hearing that Curiosity, flat out, could do 1km a day.


Random but I highly recommend the new "mapping" type roombas. Best thing I bought all year.


With what kind of processing power did the recent Chang'e 5 probe perform its autonomous docking manoeuvre in lunar orbit?

It also seems to have done some type of image processing to identify a suitable landing spot and guide itself onto that point.


I was thinking: instead of that, what if you had a separate, isolated tiny computer on the spacecraft, powered by its own solar panels (so there's no electrical wiring or other connection to it), with its own radio? This separate computer could use the latest bleeding-edge CPU and be encased in a radiation-hardened shell. It would use its radio to talk to the slower main computer, do math really fast locally, and if need be, beam the results to Earth or to a nearby orbiting satellite.


Unfortunately there isn't really any practical way to make a radiation-hardened shell that is sufficiently effective. E.g., 5cm of aluminium stops only 30% of galactic radiation. And heavier elements (e.g. gold) can make things worse: incoming particles scatter off them, producing secondary heavy ions that cause even more damage.

So practically it would still experience significant radiation.

But having the main compute for Mars remain in orbit with the relay isn't a bad idea.


In the power-limited regime of deep-space links, the Shannon limit implies a roughly linear relation between bit rate and transmit power. The only way to get fast enough transfer would be to spend tons of the power budget on the radio.
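That near-linearity falls out of the Shannon capacity C = B·log2(1 + P/(N0·B)) at low SNR; a quick illustration (the bandwidth and noise-density values here are made up for the example, not real link-budget numbers):

```python
import math

def capacity_bps(power_w, bandwidth_hz, n0):
    """Shannon capacity of an AWGN channel, in bits per second."""
    return bandwidth_hz * math.log2(1 + power_w / (n0 * bandwidth_hz))

# Low-SNR (power-limited) regime: doubling power ~doubles bit rate.
B, N0 = 1e6, 4e-21  # 1 MHz bandwidth, noise density -- illustrative only
for p in (1e-16, 2e-16, 4e-16):
    print(f"P = {p:g} W -> C = {capacity_bps(p, B, N0):.0f} bit/s")
```

At high SNR the relation turns logarithmic instead, which is why the linear approximation only applies to weak, distant links.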


For Mars, at least, that would be tens of minutes round trip because of the speed of light.

It works for some things, but for pathfinding it isn’t a great fit.

The other issue is bandwidth between the craft and Earth, which is quite limited.

Maybe there would be benefits to an “orbiting datacenter” around Mars carrying a bunch of rad-hardened compute? I assume NASA has considered this and decided it would be a bad idea.


SpaceX doesn't have the same requirements: the radiation environment near Mercury or halfway to Jupiter is drastically different from LEO.

SpaceX missions are also a lot shorter. Having one unrecoverable latchup a week isn't a big deal if your mission is 2 weeks long. If your mission is 10 years, it starts to become a problem (especially since some radiation damage is cumulative).

>NASA and DoD are still baselining new platforms on incredibly expensive (albeit rad-hard) PowerPC RAD750 and similar

NASA and DOD have also been sending up Xilinx and Altera boards for ages (even the non space-grade ones). However you can get rad-hard ARM CPUs that are cheaper and more powerful than the ones in a Zynq board.


It would be interesting if someone put a Raspberry Pi inside and outside the space station, in a completely unprotected environment, and ran continuous tests: how long until we start to see failures, and what kind of failures would they be?


This has been done multiple times. Amateur radio satellites and some cubesat kits [1] use primarily COTS components.

The lifetime and radiation environment for those applications are very limited, though. It seems that for short missions (e.g. <2 years) and low orbits (<500km), COTS hardware should be fine if properly shielded.

It would be interesting to see what difference it actually makes for HEO or even BEO missions, especially if a high degree of redundancy is introduced as well.

[1] http://www.cubesatkit.com


Typically those sorts of tests can be done on Earth if you have access to a cyclotron. My guess is that the SD card would be the weak link.


"Having one unrecoverable latchup a week isn't a big deal if your mission is 2 weeks long.Having one unrecoverable latchup a week isn't a big deal if your mission is 2 weeks long."

Unless it happens in your attitude control system or your command and control system, causing you to lose control of or communication with your spacecraft.


I guess I didn't actually say so, but my implicit assumption was that you have a voting setup where a single failure isn't necessarily a problem


I think the assumption is that a redundancy scheme is in place. So you have your unrecoverable issue in some module of compute A, but compute B and C vote them down and life proceeds. The problem is when your mission is long enough that the same module gets hit in one of the other two units, and now you're in trouble.
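The scheme being described is classic triple modular redundancy. A minimal sketch (illustrative Python; real flight systems vote in hardware or lockstep, not like this):

```python
from collections import Counter

def tmr_vote(a, b, c):
    """Majority vote across three redundant compute results.
    Masks a single faulty unit; fails loudly if all three disagree."""
    value, n = Counter([a, b, c]).most_common(1)[0]
    if n >= 2:
        return value
    raise RuntimeError("no majority: all three units disagree")

# A single radiation-induced upset in unit B is outvoted:
print(tmr_vote(42, 7, 42))  # -> 42
```

The failure mode in the comment above is exactly the case the exception guards: once a second unit accumulates permanent damage in the same module, the vote can no longer mask it.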


> Starlink with Xilinx FPGAs, while NASA and DoD are still baselining new platforms on incredibly expensive (albeit rad-hard) PowerPC RAD750 and similar.

Ignoring that Starlink isn't very far away, I would assume NASA stuff would also have FPGAs and ASICs on them - they aren't CPUs and aren't used like them.


It’s pretty common in space to implement a soft core CPU (or redundant ones) on a space-grade FPGA.


Some NASA orgs have tried using FPGAs as a way to get around software requirements, to varying levels of success


Interesting, although I was more thinking about FPGAs in things like acquisition and processing rather than overall logic, as the parent comment seemed to imply.


The high level NASA requirements cast a pretty wide net (to include data acquisition and processing) as to what falls under the purview of those requirements. From 7150.2:

“ A.30 Software. Computer programs, procedures, scripts, rules, and associated documentation and data pertaining to the development and operation of a computer system. Software includes programs and data. This also includes COTS, GOTS, MOTS, reused software, auto generated code, embedded software, firmware, and open source software components.”

https://nodis3.gsfc.nasa.gov/displayCA.cfm?Internal_ID=N_PR_...


My understanding is that certification is the bottleneck, in both time and cost. No one wants to spend the money or time to flight-certify something new when something already battle-tested will suffice.

But your comment makes me wonder if the private sector doesn't have those certification requirements?

The other differentiating factor is that the private sector is not sending multi-year (indeed multi-decade) deep space missions, where the need for battle tested systems is paramount.


Flagship multi-year science missions are generally conservative with technology choices, but some NASA projects are intended as technology demonstrations and can take on more risks.

So, for example, the Perseverance rover on its way to Mars is powered by redundant RAD750s (same as Curiosity), but the Ingenuity helicopter along for the ride is powered by a Snapdragon 801.

It will be interesting to see how it holds up.


How do you battle-test a RAD prototype? Stick it in a microwave-like device with ionizing radiation and see how many bit flips occur?


It depends on where the spacecraft is going as radiation environments differ. I've taken parts to be exposed by a proton line at a particle accelerator. For some environments they just use Cobalt-60 as a radiation source.


This reminds me of a relevant anecdote: back in the aughts, I was doing research in cosmic rays at a large nuclear research facility. I did simulation and data analysis - office/computer work mostly. One day, a person with a clipboard comes into my office and asks about the whereabouts of some radiation source. I look at them confused - I had not touched sources since teaching nuclear physics labs. They show me their clipboard and lo and behold, it has my name next to a really high intensity source and they're looking to locate it.

After a few minutes of awkward shock and denying all involvement, we realized it was a colleague at the same research institute (and of the same name). He was, one building down, doing his PhD research on radiation hardened detectors for the CMS experiment at CERN(1). He was using the source for the testing. But I had a minute of real stress before that came together in my mind...

(1) I think this was his work: https://onlinelibrary.wiley.com/doi/abs/10.1002/pssa.2007763...


Pretty much!


To answer - no we don't have the same certification requirements. NASA steps in when there's human lives and/or a lot of money on the line, but most smaller projects and just about every independent project is free to assume its own level of risk.


You know (and probably are implying) this but it’s completely program/project specific. Some projects out of Armstrong, for example, must meet FAA certification requirements


NASA has used Xilinx FPGAs on a number of missions (though still mostly smaller missions). They are doing so for precisely this reason: on-spacecraft computation for intensive tasks such as image processing.

Here’s the website for the SpaceCube platform (developed at NASA Goddard). This is a little out of date (I worked on flight software for a mission called STP-H6 which I don’t see listed here), but gives an idea of how this idea is slowly but surely gaining steam in NASA.

https://spacecube.nasa.gov/


The private sector has decided to put regular ground chips in spacecraft and just deal with errors using triple redundancy. Low Earth orbit, where most satellites hang out, doesn't have much radiation anyway.

The cost savings from using regular chips are so high that I bet SpaceX will continue to use them even in deep space, and just surround them with shielding. When a $400 desktop CPU is 500x faster than a $40,000 space-rated one, a couple pounds of shielding is well worth it.


The kind of radiation you want to protect against is not "easily" shielded.

The effectiveness of shielding is proportional to its mass and thickness, and both are at a premium for spacecraft.


There’s more to it than just specs. Consumer-grade silicon will not survive in space; radiation will just kill it.


While serious, this should come as no surprise to anyone who has had the "pleasure" of using a government IT system. The OPM hack a few years ago demonstrated this and the current SolarWinds crisis just reminds us of it.

You could liken the security issue to climate change – our entire global economy appears to depend on consumption, which appears to be accelerating climate change. But are we going to actually change anything significantly to address the problem? Uh no, not now, maybe later. Most people barely understand the problem and, even if they care, are powerless to change it. Furthermore, we are now completely reliant on the status quo and seemingly incapable of imagining a different world. In the same manner, these software systems which now underlie every part of our day-to-day lives are taken as a given. They are now simply too convenient and ingrained in our lives to ever go away.

How do we overcome the inertia of change? Most likely, from what I can see, we will simply change our expectations – it is impossible to build a completely secure software system, so we should instead change how we use it/what we expect it to do.

We also feel pressure to constantly modernize the infrastructure without having a parallel discussion about the security impact of these innovations. As we get further and further from the bare metal with newer and more convenient abstractions, our engineers understand less and less about the realities of the systems they are constructing. And, arguably, as software becomes easier for users to use, they too lose sight of what the system is actually doing and how something can go wrong.


> How do we overcome the inertia of change? Most likely, from what I can see, we will simply change our expectations – it is impossible to build a completely secure software system, so we should instead change how we use it/what we expect it to do.

Hardening systems further would hopefully make breaches more difficult and less common, but never prevent them entirely, therefore we should instead start focusing security efforts on limiting the blast-radius of potential damage a breach can cause, resilience of organizations in the face of breaches, and mitigation and recovery from breaches.

There is probably a need/niche for a security equivalent to Netflix's Chaos Monkey that randomly breaches your own systems in order to encourage/enforce that resilience, mitigation and recovery.


> We also feel pressure to constantly modernize the infrastructure without having a parallel discussion about the security impact of these innovations.

We also have PE firms, like the ones that controlled SolarWinds, dictating the level of security investment and dumping their positions when the bill for their negligence comes due. PE whiz kids call that "optionality". They love optionality.

Congress should listen to Dan Geer and adopt product liability for closed-source software. They should also do something about PE control over the economy, they are doing serious damage.


> our entire global economy appears to depend on consumption

I'm puzzled by this statement. For there to be consumption there must be production, i.e., supply and demand, which is the economy, not some separate dependent thing.


The existence of a multi-trillion dollar advertising industry means you shouldn't be puzzled. The grandparent comment is talking about pushed consumption, i.e. that our economies are based on constantly pushing up consumption, not having it simply be based upon natural, unforced demand. Not to mention things like built-in obsolescence, designed-to-be-thrown-away products, etc, etc.


> our economies are based on constantly pushing up consumption, not having it simply be based upon natural, unforced demand.

Setting aside the appeal to nature fallacy, there's nothing special about the market process in "our economies". Of course suppliers want to increase profits, and one way is to increase the quantity supplied -- but that takes willing consumers.

If consumers prefer a cheaper thing now with a shorter life span, or to pay for something with their attention instead of their money, is the "problem" that the market process gives people what they want, or that their preferences should be substituted with your own?


> the market process gives people what they want

It doesn't. Advertising heavily manipulates and subverts people's 'desires'. That was my point.

And just as an aside, discounting points by citing various dubious 'fallacies' is poor discursive etiquette.

