That's a beautifully concise numbering system for sharing being used there.
Now if only we had descriptions of chemistry that were this terse. Imagine if this kind of problem solving, collaboration, simulation, and instant verification were the norm for synthetic chem. One of the comments -- "[Let's] use gencols to rub the ship against gliders and *WSSs to see whether there is a useful collision to maybe build a puffer" -- just blew me away. If this were chemistry, that commentator would have been suggesting automatic nanomachine factory discovery.
(InChI appears to be close. But vast amounts of data are locked up in obtuse formats are either Assigned-Names-And-Numbers style formats which are useless to indexing and similarity searches, or formats that embed non-relative coordinates in 3d space, etc, in such a way that computing a deterministic ID for sharing is practically a nonstarter.)
InChI is not designed as, nor can used for, what you propose. Last I checked, conversion from an InChI to a structure format was not a supported option, and only exists for testing purposes. Which is not to say that people don't use for that. (Also, last I checked, there are inputs which cause InChI to segfault. And there's no spec for how to interpret InChIs other than to run the reference code.)
The InChI normalization step may also change the chemistry. For example, it may put things into a preferred tautomer form or charge state. This is essential for its goal of linking disparate records, but it is not chemistry preserving, and can be worse than passing an SD file to your nanomachine factory.
It was designed as "deterministic ID for sharing", yes, but not for sharing structure information but rather information related to the structure.
The most "concise" solution is to record the positions of the nuclei and the total charge. Then any quantum mechanics program will be able to reconstruct (to the desired level of approximation) the electron density of the molecule. This is the one you reject.
For good reasons. As you say, you want a nomenclature that can help with identity and similarity, and electron density is a very slow way to do that.
But such nomenclature is driven by human concepts. Fundamentally, there are different reaction rates depending on the relative orientation of two molecules. We see this in molecular beam experiments. For example, from 1979, http://www.sciencedirect.com/science/article/pii/00092614798... "Molecular beam reaction of K atoms with sideways oriented CF3I".
InChI doesn't have a special name for "sideways oriented CF3I", even though a complete, concise system would have to be able to provide a name for it.
Instead, we've come up with concise approximations - a SMILES string, a bitstring fingerprint, etc. - which are tractable for the purposes you want, so long as we stay within relatively normal chemistry. But it's still humans who categorize things into, say, "aromatic" or "chiral", so our approximate nomenclature has to partially encode how humans think of chemistry.
The Game of Life is much simpler than the Reality of Life.
All of this is true. The word "close" was chosen very, very deliberately.
Though the Game of Life is surely simpler than the Reality of Life... if one were to flat out name the core concept as "godel numbers for chemical formations", it's certainly possible, don't you think? InChI isn't it, I agree; just wanted to toss the seed of the idea out there, with an example that's a lot "closer" to the target semiotics than a bunch of unnormalized floating point coordinates drifting around in xml.
"good enough is the enemy of good" can apply in spades here, so I appreciate your thorough analysis of specific weak points in InChI.
InChI is not closer but actually further than SMILES or SD files for many of the use cases you wanted. It is not a supported capability to turn an InChI into a structure, which makes it hard to generate, say, a fingerprint for a Tanimoto search system. (Yes, the command-line tools and its wrappers support that, but the InChI developers do not condone its use. Even if you do so, the output may differ from your input structure in ways you disagree with.)
InChI is only closer for the classical purpose of chemical documentation - record association and text-based search. That is not a bad thing, but given the entirety of your goals, it is equally correct to use "close" for SMILES, SLN, SD, Mol2, etc.
Side question: are there any open source synthetic chemistry projects that I can look at to get an idea of the techniques and problems encountered? I try to learn about a new non work related subject every few years and it piqued my interest but I haven't been able to find anything but textbooks - code is easier for me to understand.
It does help that Game of Life physics is somewhat simpler than Real Life physics. Of course, Real Life physics is a little less apt to collapse into a seething mass of undifferentiated matter of you poke it in the wrong place.
This is the slowest spaceship known, right? (Which isn't a machine made out of other parts?)
Oh, of course, simulations without perfect replays would be an abomination! Not even worth mentioning! -- What are we, heathens who can't even manage the basics of reproducible builds?
Yes, GoL physics is simpler than Real Life physics.
RL follows quantum mechanics. As far as we know, its state cannot be recorded perfectly.
GoL is a Turing Machine. Its state can be recorded perfectly.
RL follows relativity. There is no equivalent in GoL.
Hashlife gives a method to compute GoL boards faster than processing each cell, and gives an exact answer. There is no equivalent for RL. At best there are probabilistic predictions. (There might be a 1E-50000 chance that the Sun won't rise tomorrow because all of the atoms in the Sun fuse at the same time and it blows up.)
> RL follows relativity. There is no equivalent in GoL
Of course there is an equivalent of relativity! There's a fixed, ultimate speed-limit (i.e. c) of once-cell/time-step. It is not possible to transmit information faster than this in GoL.
You can have C without relativity. There are no relative reference frames in GoL. Reactions only work at specific absolute speeds, and most of them only work at absolute speed 0.
Um, my first comment was just referring to the fact that in an absolutely framed turing-complete environment. One can sprout a quantum vacuum full of constant noise, ie. constant producing and annihilating structures(analagous to virtual particles.)
This is already experimentally observed in our own universe. Quantum noise exists in vacuums.
If you look at C in GoL as the fastest an actual black square can propagate, then of course you'll validate your naive assumptions that our universe could never be digitally replicated by GoL. But if you take the idea of implementing a primordial sea of quantum noise, and looking at the GoL board not as physical particles directly, but as a deeper construct. Then you can actually replicate relativity.
> (There might be a 1E-50000 chance that the Sun won't rise tomorrow because all of the atoms in the Sun fuse at the same time and it blows up.)
To estimate probability of an event you must have a sample were both have occurred, to provide a probability for something like the Sun not rising tomorrow is pure non-sense.
We have complete, verified descriptions of what the possible states look like. You can predict precisely how long it will take a cannonball to fall to the ground (at least in a vacuum), even if you've never dropped it from exactly that particular height before.
"Precise" - to within some tolerance. At some point you reach the uncertainty limits allowed by quantum mechanics.
You also have to make an arbitrary decision about when the cannonball touches the ground. How much interaction is needed between the atoms of the cannonball and ground before you say there's contact?
It also helps that we GoL comes with the capacity to instantiate the universe in any configuration we wish, rather than being forced to coerce it into that state from scratch.
It makes about as much as uncountable infinite numbers.
I can say everything in the universe is within a countably infinite distance of me. Thus, anything an uncountable infinite distance from me is outside the universe.
Not a very tangible idea, but not useless of purpose of reasoning about the affairs of things.
It's a bit of an abuse of notation. Talking about the inside definitely makes sense. (Talking about the outside is a bit like talking about the empty set.)
Topologically, encoding a 2D structure into a 1D line of text is dramatically simpler than encoding a 3D structure into a 1D line of text. To motivate this simply, recall that no graph whose minors include K5 or K3,3 can be drawn in the plane, whereas any graph can be drawn in 3D. It gets even simpler because all GoL graphs are confined to a grid.
> Now if only we had descriptions of chemistry that were this terse.
Chemistry is dealing with a lot, lot more of concepts than the study of the Game of Life. Brevity only works when you have few enough information that you can map it to only concise words.
(Entropy strikes again, even if it's information entropy for a change.)
if InChI doesn't do it for you, i suppose SMILES[1] isn't sufficient either? By-the-bye, you're not ever going to be able to "concisely" encode 3d (or even 2d coordinate) structures, because there isn't just one such. They're a thermally sampled average of structures.
heavenlyhash wants three things: 1) pass structure information around so it can be simulated or reproduced, 2) a deterministic ID for sharing and linking chemical information, and 3) viable methods for determining identity and similarity from the cloud of all chemical configurations.
SMILES is better at #1 than InChI, but not as good as a 3D structure representation. But the 3D structure isn't as compact, and offers few advantages for structure identity for small molecules where the 3D configuration is more easily predicted from geometry.
InChI is better for #2 than SMILES because different tools generate a different canonical structure, and because general purpose text search engines like Google don't do so well with SMILES as InChI keys.
Fingerprints, either binary or count, are more often used for the similarity aspect of #3, than n-gram solutions using the SMILES directly (like LINGO), or any method based on InChI. Also common are 3D shape and electrostatics comparison methods, with attempts to sample configuration space.
And for certain chemical structures other approximations are even more important. DNA similarity is based on very different principles than used for small molecules.
In my humble experience "1)" and "2)" are sufficiently overlapping as to be indistinguishable. Whereas "3)...identity" only demands that the same system be used for both the search compound and the database ("cloud"). For that, a particular implementation of "canonical SMILES" is perfectly respectable. But it's a notably different world once one gets into "3)...similarity". With that you've opened a vast library of doctoral theses, commercial promises, and ingenuity, (and essentially my entire career). And yes, as you suggest, DNA similarity is a wholly different beast.
Taking dalke's enumeration and running with it... I'd support that "1)" and "2)" are ever so slightly different: defining a serialized form that can reheat to a consistent snapshotted simulation state is one thing; and is necessary but not quite sufficient for "2)". Defining the serialized form to be deterministic is quite an important next step, because that makes a lot of things go from `O(some)` to `O(1)`.
Very close, yes. But if we compare to, say, reproducible builds (because it's also dear to my heart), the consequences of that difference are legion. For example: the `tar` command can definitely extract you a filesystem, yes, absolutely. However: whether or not two `tar` implementations (or even the same implementation, run twice in a row) produce the same byte stream is make-or-break for whether someone else can follow your procedure and get the same result.
If the process is deterministic, the identity check is cheap: hash it. If the process requires unpacking again to do semantic checks on the contents, the amount of time is... well, if we're hand-waving on the format, essentially unbounded.
So... yeah, "canonical SMILES" is a valid answer; that "canonical" part is just really important.
InChI Key is an example of #2 that does not imply #1.
For a variation on the theme, if we all agreed on a canonical SMILES method, then the SHA256 of that could be used as a key to connect records, without revealing what the underlying structure is.
This is done now, with InChI Key, when two companies want to compare overlap without sharing information about internal compounds. (There was a paper about this last year or so, but I can't find it.)
I also pointed out that InChI Keys are better for text-based search engines. As an extreme example, I could produce a line notation which uses only the characters "(", ")", ".", and "-". It would be fine for #1, but poor for #2 for any database search engine which expects some concept of "words" made of letters.
This is why I distinguish #1 from #2, even though they are indeed close.
Chemistry isn't even fully expressible using naive graph structures (atoms = vertices, bonds = edges), for example, eta-2 metal-pi bonds where the chemical bond is between an atom and another bond.
The terseness vanishes when considering the complexities of interactions and possible classifications. In fact, I'd say that the above terseness is only possible because it's about half of ALL the interesting effects possible in the game of life.
Some members recently have put together a distributed soup search project to catalog objects that appear naturally. While the c/10 spaceship wasn't found in the search, a number of interesting things have been discovered. For more information about the catalog effort (named "catagolue" for Game of Life), the object database, and the distributed client, see http://catagolue.appspot.com/home
"Trying to perfect a rake so it does not create Methuselah which eventually evolves into loaves, beehives and traffic lights isn't normal. But on Conway's Life, it is. Life. Not even once." - 'muzik
"- Use gencols to rub the ship against gliders and *WSSs to see whether there is a useful collision to maybe build a puffer." - 'HartmutHolzwart
And the excitement exhibited over this discovery. Very cool.
You might be interested in a simple proof I found of why c/2 and c/3 are speed limits for orthogonal and diagonal spaceships respectively.
Definition: In a gameplay of life, an "infinite lifeline" is a sequence of pairs (c_i,n_i) such that each c_i is alive in generation n_i and either c_(i+1)=c_i or c_(i+1) is adjacent to c_i.
Lemma ("Two Forbidden Directions"): Let x,y be any two 'forbidden' directions from among N,S,E,W,NE,NW,SE,SW. In any gameplay of life that starts finite and doesn't die out, there is an infinite lifeline that never goes in either direction x or y.
The lemma's proof uses biology. Say that (c,n) is a "father" of (c',n+1) if c' is the cell adjacent to c in direction x or y. Otherwise, (c,d) is a "mother" of (c',n+1). By the rules of the game of life it's easy to show every living (c,n+1) has at least one living father and at least one living mother. It follows (modulo some more details) that since the gameplay doesn't die out, there must be an infinite lifeline where each cell is a mother of the next, i.e., an infinite lifeline that never goes direction x or y.
Proof of c/2 orthogonal speed limit: If a spaceship went faster than c/2, say, northward, by the lemma, it would have an infinite lifeline that never goes N or NE. The only way it could ever go northward would be to go NW. Every NW step would have to be balanced out by an eastward step (of which NE is forbidden) or the spaceship would drift west. So every northward step requires a non-northward step, QED.
Proof of c/3 speed limit for diagonal: A diagonal spaceship faster than c/3, say, northeastward, would have an infinite lifeline that never goes N or NE. The only way for it to go northward would be to go NW. Each NW step would need at least two eastward steps in order for the ship to go eastward, QED.
I'm no life expert, but this proof doesn't work for me. Just because there exists some configuration that doesn't generate lifelines in your chosen directions, doesn't mean that any particular configuration doesn't generate lifelines in those directions.
The lemma says that in any configuration which starts finite and never dies out, there DOES exist at least one lifeline which DOESN'T go in a forbidden direction. (There are almost certainly lots of other lifelines which do go in forbidden directions, that's fine.)
Okay, I read that and now I get it. Each live cell has at least three ancestors. Two or fewer of these ancestors come from a forbidden direction, so at least one comes from an allowed direction. Among all of the ancestors from an allowed direction, arbitrarily choose one to be the “mother”. All of the other ancestors are called “fathers”. There's no requirement that a father has to come from a forbidden direction.
I'm thinking about writing a Conway's Life MMO, where you can activate "lanterns" that illuminate rectangles in the grid with Conway Life squares. These lanterns are fueled with "living" Conway Life cells, which are harvested by the player. Sound interesting?
Honestly no. Building an MMO that ran entirely on life would be amazing but probably impossible. Building something that used the trappings of life without having it mean much would just be infuriating. Unless I'm misunderstanding the proposal?
The first thing that comes to mind, is that players would build glider guns that would cause live cells to "fall into" collector cells to gather resources. So it would be a farming/RTS game, except all of the crops and weapons would be emergent from Conway's Life.
I think that's going to be unplayably difficult for anyone except serious programmers, and it will be very difficult to design life structures that have the behaviour you need to make the game fun, and even if you manage it you'll have trouble running them fast enough to perform well. I mean best of luck if you do do it, but it's going to be a real uphill struggle.
You're totally misunderstanding my proposal. (Either on purpose as a troll, or you're just having a particularly obtuse day.)
The life structures you need to start would just be glider guns. The glider guns would fire into walls of "collector cells" which are not Conway Life entities, but are implemented by the game directly.
Likewise, the character is not a Conway Life entity, but implemented directly by the game.
I think that's going to be unplayably difficult for anyone except serious programmers
What I've envisioned might well be appreciated by Life enthusiasts.
A glider gun is still incomprehensibly complex for non-programmers, I fear.
To the extent that resource collection involves stuff happening in Life you will struggle to balance it, and to the extent that it doesn't it's not really life-based at all.
MMOs need two things: mechanics that make them fun as a game, and themeing that will attract players. As far as I can see including a life aspect doesn't help with either.
Maybe a turn-based roguelike where the monsters are lifeforms that follow the rules of GoL. Could either be very interesting or exceedingly boring once it settled into a self-sustaining pattern.
People have been trying to write good games based on Conway's Life for decades, but it seems as if most game mechanisms end up having too steep a learning curve -- mostly because the system is too sensitive, so one bit out of place tends to cause huge, unpredictable, uncontrollable effects.
Seems to me Life has a lot of potential as abstract art, though. I'm almost curious enough about what they're doing in Second Life these days, to sign up and go have a look:
It is. Someone discovered that if you start with that initial pattern you get a "spaceship" that moves. I gather that it is an impressive find both because of how small it is (you'd expect someone or a computer to have already discovered it) and how slow it moves.
At speed c/10, "one tenth of a cell in each iteration", or, better said: after ten iterations, the pattern has moved one cell, but is otherwise unchanged. That's why it is called a glider; it glides through the grid (a puffer stays at its location and returns to its original snape after some time, but periodically sends out gliders)
What's new here, apparently, is that this is the first glider discovered with that speed. It also is surprisingly small.