"Nobody really understands git" is the truest part of that. While hyperbolic, it really has a lot of truth.
It's always a bit frustrating when working with a team because everyone understands a different part of git and has slightly different ideas of how things should be done. I still routinely have to explain to others what a rebase is and others have to routinely explain to me what a blob really is.
Even in a moderately sized team, teaching and learning git from each other is a regular task.
People say git is simple underneath, and that if you just learn its internal model, you can ignore its complex default UI. I disagree. Even after learning its internal model you run into surprises all the time, like the blobs whose name I keep questioning: why aren't they just called files?
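For what it's worth, the blob confusion is at least easy to poke at directly: a blob is file content only, with no filename attached (names live in tree objects). A minimal sketch; `git hash-object` doesn't even need a repository:

```shell
# Two files with different names but identical content hash to the SAME blob,
# because a blob stores content only; filenames are recorded in tree objects.
echo 'hello' > a.txt
echo 'hello' > b.txt
git hash-object a.txt
git hash-object b.txt   # prints the same id as the line above
```

That's also why git dedups identical files for free: the tree simply points at the same blob twice.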
The day I got past what felt like the steep part of the learning curve, everything made so much sense. Everything became easy to do. I've never been confused or unsure of what was going on in git since.
What git needs is a chair lift up that hill. A way to easily get people there. But I have no idea what that would look like. Lots of people try, few do very well at it.
The whole point about abstractions is you shouldn't need to understand the internals to use them. If the best defense of git is "once you understand the inner workings, it's so clear" then it is by definition a poor abstraction.
Who said it's supposed to be an abstraction? The point, theoretically, of something like Git is that the actual unvarnished model is clear enough that you don't need an abstraction. The problem IMO is that the commands are kind of random and don't map cleanly to the model.
There are a couple of projects that try to tackle this problem by providing an alternative CLI (on top of git's own plumbing), like gitless and g2. I haven't used any of them myself, but I'd be interested in others' experience.
Any interface means you'll build a mental model of the system you're manipulating. How else could you possibly know what you want to do and what commands to issue?
So given that a mental model is inevitable, it seems reasonable that that model should be the actual model.
You don't need to understand how media is encoded to watch a movie or listen to a song. You don't need to understand the on-disk format of a Word document to write a letter. When writing a row to an SQL database I don't always understand how that software is going to record the data, but I do know I can use the SQL abstraction to get it back out.
> You don't need to understand how media is encoded to watch a movie or listen to a song.
I recall the time when mp3 was too demanding for many CPUs, so you had to convert to non-compressed formats. Today you do need to know that downloading non-compressed audio will cost you a lot of network traffic. Once performance is a concern, all abstractions have to be discarded.
Exactly, if you stick to the very basics with git, you can live a happy life never caring about the internals. If you however want to dig into the depths of Git and use all its power, I don’t get why people don’t think there would be an obvious learning curve.
Same exact thing above applies to so many things in software development, from IDEs, to code editors (Vim/Emacs/Sublime/etc), to programming languages, to deploy tools, the list goes on. There’s a reason software development is classified as skilled labor and not a low end job generally. You’re expected to have knowledge of, or be willing to learn a lot, to do your job.
The difference is that the video model abstracts over the encoding; the git model does not abstract over the storage model, it exposes it. Git commands are operations on a versioned blob store.
> I think the longevity of SQL has proved there's value in non-leaky abstracted interfaces.
How is SQL non-leaky? To be proficient with SQL you have to understand how results are stored on disk, how indexes work, how joins work, etc. To debug and improve queries you need to look at the query plan, which is the database exposing its inner workings to you.
You have to know about the abstractions an SQL server sits on as well. Why is it faster if it's on an SSD instead of an HDD? Why does the data disappear if it's an in-memory DB?
> To be proficient with sql you have to understand how results are stored on disk, how indexes work, how joins work, etc
No, you don’t. As far as I know, the data is stored in discrete little boxes and indexes are a separate stack of sorted little boxes connected to the main boxes by spaghetti. This is the abstraction, it works, and I don’t need to know about btrees, blocksizes, how locks are implemented, or anything else to grok a database.
You've never had to look at a query plan that explains what the database is doing internally? If not then I wouldn't consider you proficient, or you've only ever worked with tiny data sets.
Have you created an index? Was it clustered or non-clustered? That's not a black box, that's you giving implementation details to the database.
I don’t think being a professional DBA managing an enterprise Oracle installation is isomorphic to the general populace that might use git.
There’s no question that knowing more will get you more, but I think for the question of “when will things go sideways and I need to understand internals to save myself”, one would be able to use a relational database with success longer than git, getting by on abstractions alone. Running a high-performance installation of either is really outside the scope of the original point.
Those things don't generally influence how you structure the query, though - you can choose to structure your query to fit the underlying structure better, or you can modify the underlying structure to better fit your data and the manipulations you are trying to perform.
Yes, most of us will have to do both at some point, but they can be thought of as discrete skills.
This isn't a bad analogy though. Git itself is similar - once you understand the graph-like nature of commits (which isn't all that complicated to begin with), it's generally not hard to skim through a repository and understand its history. Diffing etc. is also simple enough this way.
If, on the other hand, you are working to create said history (and devise/use an advanced workflow for that), it's very helpful if you understand the underlying concepts. Which also goes for designing database layouts - someone who doesn't understand the basics of the optimizer will inevitably run into performance problems, just as someone who doesn't understand Git's inner workings will inevitably bork the repository.
You don't need to know more than SQL to manipulate the data. The semantics of your query are fully contained in the SQL.
You may need to go deeper and understand the underlying model if you want performance, but sticking to normal forms can make that unnecessary for a lot of people a lot of the time.
You can have a useful separation of work between a developer understanding and using SQL, and a DBA doing the DDL part and the optimization when needed.
> Relational databases abstract away the physical nature of the disk, just as file systems do; but instead of storing informal arrays of bytes, relational databases provide access to sets of fixed sized records.
This isn't true. SQLite does not use fixed size records.
This suggests to me that a lot of people who consider themselves proficient with SQL don't know how the results are stored on disk, nor the difference between the SQL model and the actual implementation details, making them not proficient under your definition.
> That is, I am not proficient with relational databases, and I can handwave why an SDD is faster, and why data may disappear from an in-memory DB.
Because you know that information for other reasons as most people would. Just because the information is gained for other reasons does not make it irrelevant when using a database though.
> This isn't true. SQLite does not use fixed size records.
It's actually true of most, if not all, modern databases these days. The point isn't knowing the exact structure the database uses to store its information (even though that can be useful) but knowing how efficiently it can find the information for any given request. Knowing when a database is doing an index lookup versus a full table scan is very important, and I wouldn't consider someone who can't make a reasonable guess to be proficient in SQL. Many of these details are even exposed in the SQL itself: when you create an index and decide whether it's clustered or non-clustered, you're giving the database specific directions about how the data will be physically stored.
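The index-lookup-versus-table-scan distinction is easy to see for yourself with `EXPLAIN QUERY PLAN`. A sketch using the sqlite3 CLI (other engines have an equivalent `EXPLAIN`; the exact output wording varies by engine and version):

```shell
sqlite3 demo.db <<'SQL'
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
-- No index on email yet: the planner reports a full table scan.
EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'a@example.com';
CREATE INDEX idx_users_email ON users(email);
-- Same query again: now it is an index search instead of a scan.
EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'a@example.com';
SQL
```

Recent SQLite versions report something like `SCAN users` for the first query and `SEARCH users USING INDEX idx_users_email` for the second.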
The fact that you need to know anything about how they do their work internally to be reasonably competent at using them makes them a leaky abstraction.
SQL leaks for complex queries and schemas if performance needs to be optimized. I argue virtually all abstractions leak heavily when performance is considered, some more than others. SQL leaks relatively little in comparison to some other technologies IME.
Also, SQL has well-established processes and formalisms to design schemas which generally result in solid performance by themselves. That's what RDBMS are around for, after all: enabling efficient and consistent record-oriented data manipulation. This is quite difficult to do correctly in reality; for example, if you write your own transaction mechanism for disk/solid-state storage, you are going to do it wrong. This is genuinely difficult stuff.
There is a ton of internals that SQL abstracts so well that very few DB programmers know or (have to) care about them. Things like commit and rollback protocols, checkpointing, on-disk layouts, I/O scheduling, page allocation strategies, caching etc.
You seem to be talking about a different kind of leakiness. In my mind, there are two kinds: conceptual and performance leakiness. You are talking about the latter. Pretty much any non-trivial system on modern hardware leaks performance details. From what I understand, git's UI tries to provide a different model than the actual implementation, but still leaks a lot of the implementation model's details.
I disagree with that. The point of an abstraction is not having to know the implementation. Understanding the principles behind it will always lead to much better use of the abstraction.
I'd also say an abstraction could be carrying its weight even if it only reduces the amount you have to think about the implementation details when using it.
To be fair, most ORMs poorly implement the "leaky" principle. When implemented well, like with SQLAlchemy, the end result is a much nicer ORM.
In fact, one of the things in common among the ORMs that have left a bad taste in my mouth is that they all tried to abstract away SQL without leaking enough of it.
Picking the ideal interface to abstract is critically important (and very hard).
In the case of ORMs, available solutions abstract the schema (tables, rows, fields), the objects, or use templates. My solution abstracted JDBC/ODBC. The only leak in my abstraction was missing metadata, which I was able to plug (with much effort!).
My notions for interfaces, modularity, abstractions are mostly informed by the book "Design Rules: The Power of Modularity". http://a.co/hXOGJq1
This might sound a little out of touch, but am I the only one who doesn't think git is that hard? It is a collection of named pointers and a directed acyclic graph. The internals aren't really important once you have that concept down.
That said, I do feel some "porcelain" git commands are poorly named and operate inconsistently -- compared to the plumbing of the acyclic graph concepts which is good but limited.
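To make the "named pointers on a DAG" framing concrete: a branch really is just a small file containing a commit hash (until git packs it into `.git/packed-refs`). A sketch, run inside any repo with at least one commit:

```shell
branch=$(git symbolic-ref --short HEAD)   # current branch name, e.g. "master"
git rev-parse HEAD                        # the commit the branch points at
cat ".git/refs/heads/$branch"             # same hash, stored as a plain text file
                                          # (absent if the ref has been packed)
git log --graph --oneline --all           # the DAG those pointers index into
```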
I mean, one of these looks just a little more straightforward than the other, doesn't it?
Also, a cursory test in a local git repo just now showed that command seems to print out only immediate descendants--i.e., unless that commit is the start of a branch, it's only going to tell you the single commit that comes immediately after it, not the timeline of activity that fossil will--and all it gives you is the hash of those commit(s), with no other information.
I use git myself, not fossil, but if this is something you really want in your workflow, fossil is a pretty clear win.
I don't know why they have the need to retrieve the hash of the descendant commit, but usually what I'm doing is: I use a decent visual tool and just follow the branch (sourcetree).
`git log` stays in the current branch unless you give it the `--all` option. But when you give it the `--all` option, the limitation by `<COMMIT>..` no longer works. So not a solution.
Getting only the changed filenames is a fairly specialised operation. Often in normal use you can get away with a more generic operation that comes close to what you need, but is way more common, e.g.
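That said, git does expose the specialised operation directly if you want it. A couple of variants, alongside the more generic one:

```shell
git diff --name-only HEAD~1 HEAD     # just the filenames changed by the last commit
git show --name-only --oneline HEAD  # same idea via show, with the commit header
git show --stat HEAD                 # the more generic operation that comes close
```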
If you're new to tech or you've got a different mental model of how version control works, getting across the gap to git is a challenge.
My current team are mostly controls engineers, working on PLCs. But the software we're now working with has its configurations tracked in git. These aren't dumb people, they're quite talented, but their education wasn't in CS, and "directed acyclic graph" is not a thing they have a mental model for.
No, you're definitely not the only one. Git is one of the simplest and dumbest tools developers have at our disposal. People's inability to conceptualize a pretty straightforward graph is something no amount of shiny UI can ever fix.
Sure, and a piece table is a simple way to represent a file's contents. But if anyone wrote a shell or a text editor that required you to directly interact with the piece table to edit a file—instead of something sane—then they'd rightfully be called out on it. It wouldn't matter how much you argued about how simple the piece table is to understand, and it wouldn't matter how right you were about how simple the piece table is to understand. It's the wrong level of abstraction to expose in the UI.
The only thing Git can really fix is changing its command flags to be consistent across aliases/internal commands. That's about it. The whole point of an SCM is that graph that you want to move away from. People have asserted your claim many times but can't ever give specific things to fix about the "abstraction."
There are about five or six fundamental operations you do in git/hg. If that's too much then, again, there's no abstraction that is going to help you out.
See, you're trying to foist a position on me that isn't mine—that I'm scared of the essential necessities of source control. And you act as if source control were invented with Git. Neither of these are true.
> git/hg
Mercurial was a great solution to the same problem that Git set out to tackle, virtually free of Git's foibles. The tradeoff was a few minor foibles of its own, but a much better tool. It's a fucking shame that Git managed to suck all the air out of the room, and we're left with a far, far worse industry standard.
>Mercurial was a great solution to the same problem that Git set out to tackle, virtually free of Git's foibles.
No, Mercurial's design is fundamentally inferior to Git, and practically the entire history of Mercurial development is trying to catch up to what Git did right from the start. For example having ridiculous "permanent" branches -> somebody makes "bookmarks" plugin to imitate Git's lightweight branches -> now there are two ways to branch, which is confusing. No way to stash -> somebody writes a shelve plugin -> need to enable plugin for this basic functionality instead of being proper part of VCS. Editing local history is hard -> Mercurial Queues plugin -> it's still hard -> now I think they have something like "phases". In Git all of this was easy from the start.
Another simple thing. How to get the commit id of the current revision. Let's search stack overflow:
The problem is, this answer is wrong! This simple command can execute for hours on a large enough repository, and requires write privileges to the repository! Moreover, it returns only a part of the hash. There's literally no option to display the full hash.
The "correct" answer is `hg parent --template '{node}'`. Except `hg parent` is apparently deprecated, so the actual correct way is some `hg log` invocation with a lot of arguments.
I would not call "hg log -r tip" a lot of arguments.
Also, on the git/hg debate, I feel I've had problems (like "stash your modifications and redownload everything") more often with git than hg. Perhaps that says something about my ability to understand a directed acyclic graph, but hg seems less brittle when I'm using it.
I disagree with some of your comments. Is git stash really essential, or unneeded complexity? That's debatable; I never use it personally.
What I don't like in git is the loss of history associated with squashing commits. I would prefer having a 'summary' that keeps the full history but by default is used like a single commit.
In git you can use merge commits as your "summary" and `--first-parent` or other DAG depth flags to `git log` (et al.) to see only summaries first. From the command line you can easily set that up with a few aliases and not worry about it. I think that if GitHub had a better way to surface that in their UI (i.e., default to `--first-parent` and have accordions or something to dive deeper), there would be a lot less squashing in git life. (Certainly, I don't believe in branch squashing.)
The DAG is already powerful enough to handle both the complicated details and the top-level summaries, it's just dumb that the UIs don't default to smarter displays.
(I find git stash essential given that `git add --interactive` is a painful UX compared to darcs and git doesn't have anything near darcs' smarts for merges when pulling/merging branches. Obviously, your mileage will vary.)
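The merge-commits-as-summary workflow above can be sketched like this (the alias name is just a suggestion):

```shell
# After merging feature branches with real merge commits (--no-ff),
# history can be read at two depths:
git log --oneline --first-parent   # summary view: one line per merged branch
git log --oneline                  # full view, including branch-internal commits

# Make the summary view a habit:
git config alias.summary 'log --oneline --first-parent'
```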
>you're trying to foist a position on me that isn't mine
I just said you can't give specifics on what to change, because there isn't much to change.
>And you act as if source control were invented with Git
No I'm not?
>and we're left with a far, far worse industry standard.
Yeah, we definitely should have gone with the system that can't do partial checkouts correctly or even roll things back. Branching name conflicts across remote repositories and bookmark fun! Git won for a reason, because it's good and sane at what it does.
No, the reason is mercurial sucked at performance with many commits at the time, and was extra slow when merging.
Lacked a few dubious features such as merging multiple branches at the same time too.
It has improved but git is still noticeably more efficient with large repositories.
(Almost straight comparison is any operation on Firefox repository vs its git port.)
Git's main target is Linux, obviously. Performance on the truly secondary platform was not relevant, and the slowness there is mostly caused by the slow lstat call.
Mercurial instead uses an additional cache file, which is slower on Linux with big repos but happens to be faster on Windows.
And the octopus merge is used by kernel maintainers sometimes, if not quite a lot. That feature is impossible to add to Mercurial, as it does not allow more than two parents per commit.
Which reinforces the position that git should have stayed a Linux kernel specific DVCS, as the Bitkeeper replacement it is, instead of forcing its use cases on the rest of us.
...as I get stares (okay, mostly of fear) if I point out that we need a branch in my workplace. What you can/can't do (sanely) with your tool shapes how you think about its problem space.
To emphasize that even more: Try to explain the concept of an ML-style sum type (i.e. a discriminated union in F#) to someone who only knows languages with C++-based type systems. You'll have a hard time to even explain why this is a good idea, because they will try to map it to the features they know (i.e. enums and/or inheritance hierarchies), and fail to get the upsides.
> Many people complain that Git is hard to use. We think the problem lies deeper than the user interface, in the concepts underlying Git. Gitless is an experiment to see what happens if you put a simple veneer on an app that changes the underlying concepts
> The whole point of an SCM is that graph that you want to move away from.
I think that's an exaggeration. For example, Darcs and Pijul aren't based around a "graph of commits" like Git is, they use sets of inter-dependent patches instead. I'm sure there are other useful ways to model DVCS too.
Whilst this is mostly irrelevant for Git users, you mentioned Mercurial so I thought I'd chime in :)
> The only thing Git can really fix is changing it's command flags to be consistent across aliases/internal commands.
I mostly agree with this: Git is widespread enough that it should mostly be kept stable; anything too drastic should be done in a separate project, either an "overlay", or a separate (possibly Git-compatible) DVCS.
>For example, Darcs and Pijul aren't based around a "graph of commits" like Git is, they use sets of inter-dependent patches instead.
I said graph, I didn't say which graph. Both systems still use graphs, and it's still a graph you have to understand in order to edit it with each tool. The abstraction is still the same, and if you have problems with Git, you're going to have problems with either of those tools as well. The abstraction is not the problem; it's the developer's inability to conceptualize the model in their head.
You said "that graph" which, in context, I took to mean the git graph.
> Both systems still use graphs
True
> The abstraction is still the same
Not at all, since those graphs mean different things. Each makes some things easier and some things harder. For example, time is easy in git ("what did this look like last week?"). Changes are easy in Darcs ("does this conflict with that?"). Both tools allow the same sorts of things, but some are more natural than others. I think it's easy enough to use either as long as we think in its terms; learning to think in those terms may be hard. For git in particular, I think the CLI terminology doesn't help with that (e.g. "checkout").
> if you have problems with Git, you're going to have problems with either of those tools as well
Not necessarily. As a simple example, some git operations "replay" a sequence of commits (e.g. cherrypicking). I've often had sequences which introduce something then later remove it (bugs, workarounds, stubs, etc.). If there's a merge conflict during the "replay", I'll have to spend time manually reintroducing those useless changes, just so i can resume the "replay" which will remove them again.
From what I understand, in Darcs such changes would "cancel out" and not appear in the diff that we end up applying.
> Where is the exaggeration?
The idea that "uses a graph" implies "equally hard to use". The underlying datastructure != the abstraction; the semantics is much more important.
For example, the forward/back buttons of a browser can be implemented as a linked list; blockchains are also linked lists, but that doesn't mean that they're both the same abstraction, or that understanding each takes the same level of knowledge/experience/etc.
>The idea that "uses a graph" implies "equally hard to use".
What I'm getting at is that if you don't understand what the graph entails, and what you need to do to the graph, any system is going to be "hard to use." This idea that things should immediately make sense without understanding what you need to do, or even what you're asking the system to do, is just silly.
I've never seen someone who understands git, darcs, mercurial, pijul, etc go "I totally understand how this data is being stored but it's just so hard to use!" I don't think that can be the case, because any of the graphs those applications choose to use have some shared cross section of operations:
* add
* remove
* merge
* reorder
* push
* pull
I see people confused about the above, because they don't understand what they're really asking the system to do. I don't think any abstraction is ever going to solve that.
Git does have a problem with its command line (or at least how consistent and ambiguous it can sometimes be), but you really should get past it after a week or two of using it. The rest is on you. If you know what you want/need to do getting past the CLI isn't hard. People struggle with the former and so they think the latter is what's stopping them.
Can you tell the other guy to not post false and disingenuous statements? Because I'm pretty sure that is what degrades discussions, not any tone I choose to exhibit. I highly encourage you to read the thread thoroughly. If I switched my position on git we wouldn't be having this discussion, as evidenced elsewhere in the thread where people are taking a notably blunter tone than I am just with the side with popular support on this forum.
I posted a bald statement. He replied directly with snide remarks and fallacies. Look at the timestamps and edits. I have every right to be annoyed and make it known that I am annoyed in my posts when the community refused to consistently adhere to guidelines.
Enforce guidelines that keep discussions rational, not because people don't want to be accosted in public for their misleading, emotionally bloated statements.
>Don't say things you wouldn't say face-to-face. Don't be snarky.
"Every day humans make me again realize that I love my dogs, and respect my dogs, more than humans. There are exceptions but they are few and far between." [2]
>Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize.
"Which sort of doesn't matter since everyone thinks GitHub is source management." [1]
>Please don't post shallow dismissals
"You all lost out on "the most sane and powerful" as a result." [1]
"Calling it a sane and powerful source control tool is just not supported by the facts, calling "the most ..." is laughable." [1]
"Calling Git sane just makes it clear that you haven't used a sane source management system." [1]
"Lots of people are too busy/whatever to know what they are missing, maybe that's you. It's not me" [3]
>When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."
"Arguing with some random dude who thinks he knows more than me is not really fun." [3]
Some of the things you quoted there are admittedly borderline, but you went much further across the nastiness line. Could you please just not do that? It isn't necessary, and it weakens whatever substantive points you have.
>borderline, but you went much further across the nastiness line.
I didn't insinuate that people are worth less than pets that I bought and own (who can't even choose who to be dependent on) because they don't agree with my perspectives over a piece of software. In what context would this be an acceptable statement to make face to face or in a public setting and you go "well, you know, it's kind of okay to say!"
I'm exceedingly interested in where I crossed that line in a considerable manner because that's one distant line to cross. Next time someone says something I perceive to be incorrect, or they get on my nerves for continually disagreeing with me, I'll be sure to tell them my dog is worth more than them since that's actively being allowed and has a precedent of moderator support.
And for the record, my tone is probably "abrasive" in this post because the above actions and outright blind eye towards outright lies and uncalled for statements is aggravating. I have a feeling you're not doing anything just because of who he is, and not because what he is saying is warranted or even accurate (it's definitely not, as I demonstrated across several different posts).
Exactly. It was no longer mysterious for me after I had to prepare a written branching procedure for our team, starting from how to branch off, commit, and rebase, up to doing resets and working with the reflog. While doing that I thoroughly read the official docs, examined lots of examples, and created a local repo with a couple of text files to test various commands. And then it became so clear and simple! Especially the reflog: so powerful!
So, my advice is to try to write some instructions for yourself for all the common cases you might run into during your work. It will not only help you realise what you actually need from git, but also will serve as a good cheat-sheet.
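One cheat-sheet entry worth writing down first is the reflog recovery dance, since it's what makes experimenting with resets safe. A sketch:

```shell
# The reflog records every position HEAD has had, so a bad reset is recoverable:
git reset --hard HEAD~1      # oops: the last commit vanished from the branch
git reflog                   # shows where HEAD was before the reset
git reset --hard 'HEAD@{1}'  # put the branch right back where it was
```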
I start with simple examples and work up from there. It's based on training I've conducted at various companies, and avoids talk of Merkle trees or DAG.
I am not a git expert or anything, but I have helped resolve weird git issues for my teammates usually using a lot of Google and StackOverflow.
I just know 5 basic commands: pull, push, commit, branch, and merge. Never ran into any issues. People who run into issues are usually rewriting git history or doing something fancy with "advanced" commands. I have a feeling that these people get into trouble with git because they issue commands without really knowing what those commands do, or even what they want to achieve.
I use submodules every day, never had a problem with them. What do people complain about when it comes to them?
My mental model is basically that they're separate repos, and the main repo has a pointer to a commit in the submodule. Do your work that needs to be done for the submodule, push your changes, and then check out that new commit. Make a commit in the main repo to officially bump the submodule to that new commit. Done.
The annoying part is when you do a pull on the main repo, you have to remember to run git submodule update --recursive.
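For reference, the incantation in question; the `--init` matters after a fresh clone, and reasonably recent git (2.14+, if I remember right) can be told to recurse automatically:

```shell
git pull
git submodule update --init --recursive   # sync submodule checkouts to the
                                          # commits the superproject records

# Or let pull/checkout recurse into submodules by default:
git config submodule.recurse true
```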
Because you have the .gitmodules file, the .git/config file, the index, and .git/modules directory, each of which can get out of sync with the others.
If, for example, you add a submodule with the wrong url, then want to change the url, then you instinctively change .gitmodules. But that won't work, and it won't even nearly work.
If you add a submodule, then remove it, but not from all of those places, and try to add the submodule again (say, to a different path), then you also get weird errors.
If you add a submodule and want to move it to another directory then just no.
Oh and also one time a colleague ran into problems because he had added the repo to the index directly - with git add ..
Oh and let's talk about tracking submodule branches and how you can mess that up by entering the submodule directories and running commands...
But seriously, the fact that there is a .gitmodules file lulls you into a sense that that file is "the configuration file". If you don't know about the other files, then it's natural to edit .gitmodules. When you make errors, fixing those errors is pretty hard. There is no "git submodule remove x" or "git submodule set-url" or "git submodule mv".
For example, do you know, off the top of your head, how to get an existing submodule to track a branch?
How do you think someone who does not quite understand git would do it? Even with a pretty OK understanding of git internals, you can put yourself deep in the gutter. (Case in point: if you enter the submodule directory and push HEAD to a new commit, you can just "git add submodule-directory" to point the submodule at the new commit. But if you were to change the upstream url or branch or something else in the submodule, you're screwed. That's not intuitive by a long shot.)
Edit: git submodule sync is not enough by the way... You can fuck up your repo like crazy even if you sync the two configuration files.
Right, it’s not that hard, but there are some gotchas. The most common problem I see is the local submodule being out of sync with the remote superproject. Pushes across submodules are not atomic. Accidentally working from a detached head then trying to switch to a long out of date branch can be an issue, as can keeping multiple submodules synced to the head on the same branch. Recursive submodules are, as you mentioned, even more fun.
What's the alternative? Managing all dependencies by an external dependency manager does not exactly reduce complexity (if you're not within a closed ecosystem like Java + Maven that has a mature, de-facto standard dependency manager; npm might count, too).
It's absolutely not feasible for C++ projects; all projects that do this have horrible hacks upon hacks to fetch and mangle data and usually require gratuitous "make clean"s to untangle.
I use git sub-trees. Actually I love the thing. They give you a 'linear' history, and allow you to merge/pull/push into their original tree, keeping the history (if you require it).
I never had any problems the past 6 years I've been using Git professionally. But then someone asked me what to do when Git prevented them from switching branches. Not knowing what they had and hadn't staged, I told them to stash or commit. They stashed and the changes were gone.
My point is, while your basic commands do the work, your habits and knowledge keep you from losing code like this without you knowing.
I do like Git, most of the time, but really, not a single problem, in six years?
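For what it's worth, stashed changes in a story like that are rarely actually gone; the stash is a stack you can list and pop. A quick illustration in a throwaway repo (file names invented):

```shell
tmp=$(mktemp -d) && cd "$tmp"
g() { git -c user.email=dev@example.com -c user.name=dev "$@"; }
g init -q -b main
g commit -q --allow-empty -m "initial"

echo "work in progress" > notes.txt
g add notes.txt          # untracked files aren't stashed unless staged (or -u is used)
g stash push -q          # the working tree is clean now; the work looks "gone"...
g stash pop -q           # ...but the stack still has it, and pop brings it back
cat notes.txt
```

The habit that saves you is checking `git stash list` before assuming anything is lost.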
When using Git daily we never really did anything complicated, just a few feature branches per developer, commit, push, pull-request, merge. Basic stuff. We had Git crap out all the time. Never something that couldn't be fixed, but sometimes the fix was: copy your changes somewhere else, nuke your local repo, clone, copy changes in and then commit and continue as normal.
I’ve been using git since 2007 and never ever even wanted to try nuking a checkout and starting over to recover from anything, much less did so. (Did have a nameless terrible Java ide plug-in do it for me once.)
I think it's one of the most important sources of my cognitive dissonance around git. It strengthens the illusion that a working directory is somehow related to a git store, which it really isn't.
You have a working directory/checkout - that can be: identical (apart from ignored files) to some version in git; or different.
If it's different, some or all changes can be marked for storing in the git repo - most commonly as a new commit.
It's a bit unfortunate that the repo typically is inside your work directory/checkout - under '.git' along with some files like hooks, that are not in the repo at all...
I use `git config pull.rebase true` too, but that doesn't mean you _have_ to stash first, just as rebase manually wouldn't - depends if there's a conflict.
It's quite a considerable saving. I suppose by "fix UX" you mean make it so the saving would be less anyway, but I think really they're just conceptually different:
- branch: pointer to a line of history, i.e. a commit and inherently its ancestors
- stash: a single commit-like dump of patches
If stashing disappeared from git tomorrow, I think I'd use orphan commits rather than branches to replace it.
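That "commit-like dump" is literal, by the way: `git stash create` writes the dump as a dangling commit and just prints its hash, without touching any ref. A sketch in a throwaway repo (names invented):

```shell
tmp=$(mktemp -d) && cd "$tmp"
g() { git -c user.email=dev@example.com -c user.name=dev "$@"; }
g init -q -b main
g commit -q --allow-empty -m "initial"

echo change > file.txt && g add file.txt
dump=$(g stash create "wip dump")   # no ref is updated; the hash is the only handle
git cat-file -t "$dump"             # it's an ordinary commit object
# `git stash store "$dump"` would push it onto the stash stack after the fact.
```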
`cherry-pick` is just plucking a single commit and adding it to the commit history of the current branch, and `rebase` is what civilized people use when they don't want merge commits plaguing their entire code base.
merge is what civilized people who care about getting history and context in their repository use ;) ... I worked a lot in git using both rebase and merge workflows and I'll be darned if I understand the fear of the merge commit ... If work happened in parallel, which it often does, we have a way of capturing that so we can see things in a logical order ...
Polluting the master repo with a bunch of irrelevant commits isn't giving you context, it's giving you pollution. There's nothing to fear about merge commits. It's about wasting everyone's time by adding your 9 commits to fix a single bug to the history. I work on teams, and we care about tasks. The fact that your task took you 9 commits is irrelevant to me. What is relevant is the commit that shows you completed the task.
It's not really a fear of the merge commit. In a massively collaborative project, almost everything is happening in parallel, and most of that history is not important. The merge makes sense when there is an "official" branch in the project, with a separate effort spent on it. It's likely that people working on that branch rebase within the branch when collaborating, and then merge the branch as a whole when it is ready to join the mainstream.
Ah, you can learn the beauty of merge AND rebase at the same time then...
Here, to 'present' feature branches, we take a feature development branch with all the associated crud... Once it's ready to merge, the dev checks out a new 'please merge me' branch, resets (or rebase -i --autosquash) to the original head, and re-lays all the changes as a set of 'public' commits to the subsystems, with proper headings, documentation etc.
At the end, he has the exact same code as the dirty branch, but clean...
So he merges --no-ff the dirty branch in (no conflicts, same code!) and then the maintainer can merge --no-ff that nice, clean branch in the trunk/master.
What it gives us is a real, true history of the development (the dirty branch is kept) -- and a nice clean set of commits that is easy to review/push (the clean branch).
Sometimes I want to take a subset of the commits out of a coworker's merge on staging to push to production, and then put all non-pushed commits on top of the production branch to form a new staging branch. I find having a linear history with no merges helpful for reasoning about conflict resolution during this process. What advantages do merged timelines give in this context?
What I like about merges it that it shows you how the conflicts were resolved. You can see the two versions and the resolved and you can validate it was resolved properly. With a rebase workflow you see the resolutions as if nothing else existed, you can't tell the difference between an intentional change and a bad resolution...
Yes, my direct team is small of 4 devs but the main repo we work on is used by 100+ devs. We use git workflow (new branch for each feature) for the main repo and github style workflow (clone and then submit PR) for some other repos.
The number 1 reason my team has not moved from Subversion to Git is we can't decide what branching model to use. Use flow, don't use flow, use this model, use that model, no, only a moron would use that model, use this one instead. Rebase, don't rebase, etc. No doubt people will say that it all depends on the project/team/environment/etc., but nobody ever says "If your project/team/environment/etc. look like this, then use this model." So we keep on using Subversion and figure that someday we will run across information that convinces us that it is the one true branching model.
I have another solution: just switch to mercurial. I switched some big projects to mercurial from svn many years ago. Migration was painless, tooling was similar but better, the interface is simpler than git, and haven't regretted it once.
This is the path I took for a few projects years ago when Google Code didn’t support git.
Switched to mercurial from svn and workflow was painless for the team. Interestingly, we slowly started adopting more distributed techniques like developer merges being common. With svn, I think I was the only one who could merge and it would be rare and added product risk.
Then after about a year of mercurial we switched to git and our brains had adapted. Our team was small, 5-10 people.
Somewhat relatedly, in 2002, I worked in a large team of 75 people or so with a large codebase of a few hundred thousand lines of active dev. It used Rational ClearCase had “big merges” that happened once or twice a release with thousands of files requiring reconciliation. There was a team who did this so it was annoying to dev in, but largely I didn’t care.
Company went through layoffs and the team was down to one. He quit, the company couldn’t merge, so couldn’t release new software versions.
There was a big crisis so they went to the architects and pulled a few out of dev work. It turns out I was the one who could figure it out and dumb enough to admit it.
That sucked. It took a few weeks to sort out and modify our dev process to make merges easy and common. But it was not fun. Upside is we ended up not having any “non-programmer” op/configuration management people since the laid-off/quit team were ClearCase users, who didn’t code.
Moral- don’t let people know you can do hard, mundane tasks.
> but nobody ever says "If your project/team/environment/etc. look like this, then use this model."
Honestly, it's because a lot of it comes down to preference and what value you gain from using version control. It is very much like code style standards -- it doesn't matter what is in the standard so much as your teammates all using the same one.
If part of the blocker for your team is that no one is experienced enough with git to have a strong opinion, I'd be happy to brainstorm with you for an hour to learn about your current process and offer a tailored opinion.
Why not replicate whatever you are doing in Subversion in Git? You'll still be able to take advantage of the better merging algorithms, while maintaining whatever political momentum seems to be driving the team's decisions.
If it is important to switch to Git, I suggest a technical leader, imbued with authority from management, make those decisions and just do it. However, I don't necessarily think a team should switch away from Subversion if it's working for them.
> everyone understands a different part of git and has slightly different ideas of how things should be done
This was a big problem that bugged me too, so for every team I've worked with I've created a few scripts for the team's most common version control operations.
Most devs, including me, are pretty lazy so they'd all rather run this script than go to Stack Overflow to figure out git arcana.
This helps standardize conventions too: Feature branches/linear DAGs/topic branches/dev branches/prod branches/whatever weird thing a team does they all just do that using the script so it's standardized.
Rebase is "rewind local changes" "pull" "replay local changes"
Basically it makes it so that all of the local-only commits are sequenced after any remote changes that you have not seen yet.
[edit]
YZF is correct. In the context of pulling (i.e. "git pull --rebase") my description is correct. However, in general, rebasing branch Y onto X, where the two diverged at commit C, is:
rewind branch Y to commit C (call the old tip of Y Y'), then replay the commits C..Y' on top of X and point Y at the result.
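Acted out in a throwaway repo (branch and file names invented), the rewind-and-replay looks like:

```shell
tmp=$(mktemp -d) && cd "$tmp"
g() { git -c user.email=dev@example.com -c user.name=dev "$@"; }
g init -q -b main
echo base > base.txt && g add base.txt && g commit -qm "C: common ancestor"

g checkout -qb topic
echo local > local.txt && g add local.txt && g commit -qm "local change"

g checkout -q main
echo upstream > up.txt && g add up.txt && g commit -qm "upstream change"

# Rebase topic onto main: rewind topic to C, then replay "local change"
# on top of "upstream change":
g checkout -q topic
g rebase -q main
git log --format=%s    # local change / upstream change / C: common ancestor
```

The result is a linear history with the local commit re-parented onto the upstream one, which is exactly what `git pull --rebase` does with your unpushed commits.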
"pull" might be the first thing I'd throw out, if I thought there was any hope of fixing git UX. Then add a working `merge --dry-run` (do I have conflicts?).
I think a default of --ff-only would be fine for pull. It's great for when I'm merely a consumer of a project, and it will never silently perform a merge or rebase.
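There's no `merge --dry-run` today, but a `--no-commit` merge that you immediately abort is a workable stand-in, and the ff-only default is one config line. A sketch in a throwaway repo (branch and file names invented):

```shell
tmp=$(mktemp -d) && cd "$tmp"
g() { git -c user.email=dev@example.com -c user.name=dev "$@"; }
g init -q -b main
echo base > base.txt && g add base.txt && g commit -qm "base"
g checkout -qb topic
echo t > topic.txt && g add topic.txt && g commit -qm "topic work"
g checkout -q main
echo m > main.txt && g add m*.txt up 2>/dev/null; g add main.txt && g commit -qm "main work"

# Refuse pulls that would silently merge or rebase:
g config pull.ff only

# Poor man's merge --dry-run: attempt the merge without committing, then back out:
if g merge --no-commit --no-ff topic >/dev/null 2>&1
then result="no conflicts"
else result="conflicts"
fi
g merge --abort
echo "$result"
```

Here the two branches touch different files, so the check reports no conflicts and the abort leaves the repo exactly as it was.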
rebase can do a lot more. Try `git rebase -i` to squash smaller commits, edit the commit msg, or even drop a commit before you push it to your colleagues.
Last time our devop did 20 commits to get something on elasticbeanstalk right, I squashed it all into just one clean commit that got merged into master branch.
It will help you to commit more often without worry until the moment you have to hand in your work.
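A non-interactive way to get the same squash (easier to explain than driving `rebase -i`, and handy in scripts) is a soft reset followed by one commit. Throwaway repo, invented names:

```shell
tmp=$(mktemp -d) && cd "$tmp"
g() { git -c user.email=dev@example.com -c user.name=dev "$@"; }
g init -q -b main
echo v1 > conf.txt && g add conf.txt && g commit -qm "try 1"
echo v2 > conf.txt && g commit -qam "try 2"
echo v3 > conf.txt && g commit -qam "try 3"

# Squash the last two fixups: move the branch pointer back two commits
# while keeping the final tree staged, then make one clean commit:
g reset -q --soft HEAD~2
g commit -qm "one clean, reviewed commit"
git log --format=%s    # one clean, reviewed commit / try 1
```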
Rebase is a controversial history altering operation and makes it easy to paint yourself into a corner and get weird error messages or wrong results. Its very different from pull/merge.
History altering is only controversial on things that are published. There is nothing wrong with reordering, combining or splitting your local commits to give more clarity to what you are doing. Keeping this in mind will give you the freedom to commit frequently.
This confusion happens because many popular SCMs historically combined the "commit" and "push" operations into a single step. Git keeps them separate.
There is no tracking by git on what is published, so it's easy to make the mistake of rebasing things that are published and shared by others. Then you will have a bad time later when you try to sync with others, possibly days later.
Um... git kind of does with remote tracking branches. You can also make it very obvious by your workflow? If you use local feature branches (which you should for juggling between development tasks, etc.), what you are working on vs what's upstreamed should be pretty clear. Sounds like you are not using local branches.
Not using local branches is another confusion caused by the perspective of historical/traditional SCMs (people thinking branches are the domain of a centralized server and outside of their control.)
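Concretely, once a branch has an upstream, git can already tell you what you haven't published. The repo layout below is invented for the demo:

```shell
tmp=$(mktemp -d) && cd "$tmp"
g() { git -c user.email=dev@example.com -c user.name=dev "$@"; }
g init -q -b main origin-repo
g -C origin-repo commit -q --allow-empty -m "published work"

g clone -q origin-repo work
cd work
g commit -q --allow-empty -m "local only"

# Everything on HEAD that the upstream (origin/main) hasn't seen:
git log --format=%s "@{upstream}..HEAD"    # local only
```

If that range is non-empty, rebasing is still safe; once it's empty, your commits are out in the world.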
Often you want to push changes to a remote, but not yet merge or PR them to upstream.
Keeping "local feature branches" just on your dev machine is bad for many many reasons:
- you want to encourage low barrier cooperation in your team -> sharing changes
- you want changes to the CI pipeline early so the potentially slow testing machinery works in parallel with the developer
- you want to keep the team up to date on what changes you make
- you don't want to lose work if the machine/OS dies, or the developer leaves/becomes sick/goes on a 4 week vacation during which they forget their disk crypto password
So, in practice you can try to use rebase opportunistically, when by chance your WIP work is still unpushed because the change was only made very recently. This is error prone. Or you can rebase published branches explicitly, by destroying the original branches in the PR merge phase. But all this is a big bother if the purpose is just to beautify history and at the same time hide the real trial and error that went into making the changes.
Did you notice that y2kenny was talking about how, if you use local feature branches, then the remote tracking branches make it really clear what's been published vs not? The implicit meaning is that we should use local feature branches but also publish them to the repo while we're working on them.
But maybe to you, 'publish' means 'publish to master'? In that case I can assure you, they are not necessarily the same thing. I regularly work on a local feature branch, publish that branch to the shared repo, rebase it on top of master, then force-push to the shared tracking branch. When I'm done I merge it into master and don't rebase master on top of anything.
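The force-push in that workflow is less scary with `--force-with-lease`, which refuses to overwrite the remote branch if it has moved since you last fetched. A sketch with invented names (an amend stands in for the rebase):

```shell
tmp=$(mktemp -d) && cd "$tmp"
g() { git -c user.email=dev@example.com -c user.name=dev "$@"; }
g init -q -b main origin-repo
g -C origin-repo commit -q --allow-empty -m "base"

g clone -q origin-repo work
cd work
g checkout -qb feature
g commit -q --allow-empty -m "wip"
g push -q -u origin feature

# Rewrite the published feature branch:
g commit -q --amend --allow-empty -m "wip, cleaned up"

# A plain push is now rejected as non-fast-forward; --force-with-lease
# succeeds only while the remote ref is still where we last saw it,
# so it can't clobber a teammate's push we haven't fetched:
g push -q --force-with-lease origin feature
```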
I'm not sure if you are being serious? The answer is that published advice on rebase overwhelmingly warns against rebasing published code, and for good reason.
I LOVE rebase but when I run into merge conflicts I rather `rebase --abort` and leave that merge commit as it is.
But those instances are rare and having a merged branch's commits nice and compact in the log makes me happy every time.
What I find ironic is that github is massively popular as a central way to use a distributed version control system. The distributed nature only adds to the complexity and I am sure it is only used by a fraction of git users.
Yes...? What's surprising about using a central repo to collaborate? There needs to be a single source of truth for a coherent project, otherwise you're just going to have chaos.
The distributed nature of git led to the simple and secure contribution model of everyone working on their own repos and not needing to give write access to anyone else. This pretty directly led to an explosion of open source software.
Is there any really good tutorial on git that teaches the internal model? Ideally, it would illustrate each command and show the before and after of the internal objects.
https://learngitbranching.js.org/ is the best guide I've seen. It shows you the complete commit graph and all refs on that graph, and updates the graph when you type in commands. It covers and displays workflows involving remotes as well.
Indeed. When the article said "younger developers only know git" I immediately thought, no, they don't know anything. These people don't even know what a DAG is. Git was made for people who know these concepts. I've tried explaining git to people and they just don't understand. They just don't.
What's annoying is that git is just expected knowledge these days and having a github account is enough to claim it. There's not a good way to sell the fact that you're a bit more into it than that.
I've even said to git "experts" that branches should really be called refs and their eyes glaze over. It's difficult for me to understand what git is in their heads.
I started naming branches 'post-its', as to me that's what they are, labels you place on the real 'branches' (the commit tree). You can take them off easily, move them, discard them, whatever you want. They are just volatile.
A symbolic ref is a ref that points to another ref instead of a ref that points to a commit. `HEAD` is a symbolic ref. (It should be your only symbolic ref.)
Hrm, but a ref is a file containing a hash, right? So if the hash is equivalent to the file, then surely a ref is equivalent to a symlink? A symbolic ref, in turn, should be a symlink to a symlink... Or something like that...
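That's pretty much it, and with the default files ref backend you can see it directly (fresh repo, so the ref is loose rather than packed):

```shell
tmp=$(mktemp -d) && cd "$tmp"
g() { git -c user.email=dev@example.com -c user.name=dev "$@"; }
g init -q -b main
g commit -q --allow-empty -m "initial"

# A branch ref is (usually) just a file holding a hash:
cat .git/refs/heads/main
git rev-parse main

# HEAD is the symbolic ref: it holds the *name* of another ref, not a hash,
# so it's the one that behaves like a symlink to a ref:
cat .git/HEAD              # ref: refs/heads/main
git symbolic-ref HEAD      # refs/heads/main
```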
Git is the solution to the problem of doing distributed development on the Linux kernel. People who aren’t doing that, I wonder if they’re entirely clear in their own minds why they use it. I’m certainly not... other than that it’s just the default choice these days, the path of least resistance...